Add fixed-trajectory system tests with cross-track error metrics by pvkumara · Pull Request #365 · castacks/AirStack

pvkumara · 2026-06-09T21:02:31Z

- What features did you add and/or bugs did you address?*

- Which GitHub issue does this address?

This PR does not address an existing GitHub issue. Instead, it adds a new fixed-trajectory test suite that automatically measures path-tracking error in simulation.

- Additional description if not fully described in the GitHub issue

This PR adds automated fixed-trajectory evaluation tests for the autonomy stack and fixes trajectory-tracking bugs that caused path execution to fail. It also improves the test results workflow so maintainers get one readable summary file instead of many per-test logs.

Please add videos and images to demonstrate the feature. Please upload videos to somewhere persistent (e.g. YouTube or Vimeo) for archival purposes.

https://youtu.be/zaaZqLUzqZ8

How did you implement it?

Algorithm details, design decisions, engineering notes, and any other relevant information about the implementation should be included

How do you run and use it?

What commands and button presses do you use to manually launch the stack to use your new feature?

To run these tests, bring the stack up with airstack up, then create a test run based on your needs using the global CLI options. The basic tests that I ran to validate this testing stack are included below, along with the global CLI options.

Write a detailed procedure with EXACT BASH COMMANDS so that another maintainer can replicate and understand the benefits of your feature, and reproduce the videos and images you added above.

Testing with PyTest

What pytests did you add to ensure the feature is reliable and robust? What metrics are used?

What's the exact command to run the pytests that test your feature? i.e. airstack test -m ...

What are the expected results of the tests? What should a maintainer look at to understand whether the test succeeded?

After running an airstack test command, a maintainer should see in the console that all the tests have passed. They should then go to the testing folder, find the folder for their test run, and open its summary.txt file to see all the metrics the test produced.

Documentation

Was mkdocs.yml updated? (y/n)

Yes, mkdocs.yml was updated to add a trajectory testing page.

Do the docs have sufficient scope such that a newcomer can easily reproduce and use your feature?

Yes, there is now a docs page that explains the full pipeline for trajectory testing.

Is there sufficient visual media?

I believe that there is sufficient visual media from the YouTube Video above but I can generate more if needed.

Versioning

Did you make sure to bump the version number in the .env file according to semantic versioning?

Yes, the versioning was changed to 0.19.0-alpha.4.

New tests/test_fixed_trajectory.py evaluates drone performance on Circle, Figure8, Racetrack, and Line trajectories: takeoff -> execute -> land, with cross-track error, path RMSE, execution time, and success metrics recorded to metrics.json for baseline comparison.

- Python ideal-path generators mirror fixed_trajectory_task.cpp equations
- Cross-track error uses robot pose snapshot at dispatch to transform base_link ideal path to world frame for odom comparison
- 5m loose tolerance documents the known circle failure without stranding drone
- conftest.py gains --trajectory-types CLI option and generalised phase-order sorting/ID-rewriting for both autonomy test modules
- tests/README.md documents the new module, all 11 metrics, and run commands

Made-with: Cursor
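As an illustration of the cross-track and RMSE metrics described in this commit message, here is a minimal sketch. It is not the code in tests/test_fixed_trajectory.py: the function names, the circle parameterisation (starting at the robot origin, tangent to +x), and the 2D simplification are all assumptions made for the example.

```python
import math

def circle_path(radius, n=200):
    """Ideal circle in the robot (base_link) frame.

    ASSUMPTION: the circle starts at the origin, is tangent to +x, and is
    centred at (0, radius). The real fixed_trajectory_task.cpp may differ.
    """
    angles = (2.0 * math.pi * i / n for i in range(n + 1))
    return [(radius * math.sin(t), radius * (1.0 - math.cos(t))) for t in angles]

def to_world(path, x0, y0, yaw0):
    """Rotate and translate base_link points by the pose snapshot taken at dispatch."""
    c, s = math.cos(yaw0), math.sin(yaw0)
    return [(x0 + c * px - s * py, y0 + s * px + c * py) for px, py in path]

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so it stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def cross_track_errors(odom_xy, ideal_world):
    """Per-sample distance from each odometry point to the nearest ideal segment."""
    segments = list(zip(ideal_world, ideal_world[1:]))
    return [min(point_segment_distance(p, a, b) for a, b in segments) for p in odom_xy]

def rmse(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

In this sketch, the ideal path is transformed once with the pose captured at dispatch, and every later odometry sample is compared against that fixed world-frame polyline. A maximum or RMSE over `cross_track_errors(...)` can then be checked against a tolerance.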


…esting Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # .agents/skills/configure-multi-robot/SKILL.md # .agents/skills/run-system-tests/SKILL.md # .env # AGENTS.md # docs/development/intermediate/testing/index.md # mkdocs.yml # robot/docker/docker-compose.yaml # robot/ros_ws/src/sensors/lidar_point_cloud_filter/README.md # robot/ros_ws/src/sensors/lidar_point_cloud_filter/scripts/validate_lidar_filter_clouds.py # robot/ros_ws/src/sensors/lidar_point_cloud_filter/setup.cfg # robot/ros_ws/src/sensors/lidar_point_cloud_filter/setup.py # simulation/isaac-sim/extensions/PegasusSimulator # tests/README.md # tests/conftest.py # tests/parse_metrics.py # tests/pytest.ini # tests/sensor_probes.py # tests/test_liveliness.py

JohnYanxinLiu · 2026-07-01T01:38:42Z

 ```

-**One log file per test execution**, plus separate `airstack_env.*.log` files for fixture narration (the `up`/`down` of each parametrize tuple). The fixture log file is named to track the rewritten test ID so it lands next to the triggering test.
+There is **no `logs/` subdirectory**. Live output streams to the terminal during


Was there a reason logs were taken out? I feel they would be useful. Feel free to push back tho, I could see it going either way.

Basically, I created a single summary.txt that takes the key metrics from every run and puts all the info in one clean file. When we ran with the logs, there were just way too many of them, and they were formatted inconsistently. Now all the info from the logs gets cleanly formatted into a single file that you can read easily without diving into a ton of logs.
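To make the idea of a consolidated summary concrete, here is a small sketch of condensing per-run metrics into one block of text. The metric keys and the output layout are hypothetical and are not taken from the actual summary.txt produced by this PR.

```python
import statistics

# Hypothetical metric keys; the real metrics.json schema may differ.
NUMERIC_KEYS = ("cross_track_error_m", "path_rmse_m", "execution_time_s")

def format_summary(runs):
    """Condense per-run metric dicts into one human-readable block of text."""
    passed = sum(1 for run in runs if run["success"])
    lines = [f"runs: {len(runs)}  passed: {passed}"]
    for key in NUMERIC_KEYS:
        values = [run[key] for run in runs]
        lines.append(
            f"{key}: mean={statistics.mean(values):.3f} "
            f"min={min(values):.3f} max={max(values):.3f}"
        )
    return "\n".join(lines)

# Made-up numbers, for illustration only.
example_runs = [
    {"success": True, "cross_track_error_m": 0.25, "path_rmse_m": 0.125, "execution_time_s": 40.0},
    {"success": False, "cross_track_error_m": 0.75, "path_rmse_m": 0.375, "execution_time_s": 44.0},
]
print(format_summary(example_runs))
```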

JohnYanxinLiu · 2026-07-01T01:45:12Z

Do you think we could actually start a new file called end_to_end_testing.md instead of the very specific "fixed_trajectory_testing"? This would match the rest of the structure and better set the precedent for a proper CI/CD structure in the future.

Yeah that works for me John. We could do that for sure

And then we can have a specific fixed_trajectory testing subsection as well, but a lot of these things can build into stronger e2e tests later on I feel like.

Thanks for the e2e testing suggestion — I reworked the docs around it in this PR. Here's what I did and what I'm leaving as follow-up.

What changed (docs-only in this PR)

Renamed fixed_trajectory_testing.md → end_to_end_testing.md (via git mv, so history is preserved).

Reframed the page as the end-to-end testing home: the H1 is now End-to-End Testing with an intro defining e2e (the full flight chain — takeoff → action → land — exercising the whole stack rather than one module), and it lists the two e2e suites we have today: takeoff/hover/land (takeoff_hover_land) and the fixed-trajectory path-tracker benchmark (autonomy). The existing fixed-trajectory content now sits as a section under that page.

Updated every reference to match: the mkdocs.yml nav, the testing index.md link + heading, and tests/README.md.

Added a "Future work" note in the page itself so the intent isn't lost.

I kept this docs-only on purpose so the change stays focused and doesn't touch the test/CI interface in this PR.

Follow-up (separate PR)
The two flight suites are really the same 4-phase chain (px4_ready → takeoff → [hover | trajectory] → land) — they differ only in the middle phase and the swept parameter, and conftest.py already orders them with shared logic. So the follow-up is to unify the two marks into a single e2e mark (-m e2e instead of -m "takeoff_hover_land or autonomy"), and optionally factor the shared phases/helpers (px4_ready, takeoff, landing, odom + ground-truth capture) into a common base to remove the duplication.

I'm deferring that because it's cross-cutting: it touches pytest.ini, the conftest.py ordering, the CI workflow, the README/AGENTS mark tables, and the metrics baselines keyed on the current mark names — and renaming the marks changes the -m interface. Doing it here would bloat this PR and risk the baselines/CI, so I'd rather land it as its own change.
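The shared 4-phase chain mentioned above can be sketched as a single ordering rule that treats hover and trajectory as interchangeable middle phases. This is an illustrative sketch only: the `(parameter, phase)` tuples and function names are hypothetical and do not reproduce the ordering logic in conftest.py.

```python
# The two suites share every phase except the middle one.
PHASES = ["px4_ready", "takeoff", ("hover", "trajectory"), "land"]

def phase_index(phase):
    """Position of a phase in the chain; hover and trajectory share one slot."""
    for index, entry in enumerate(PHASES):
        names = entry if isinstance(entry, tuple) else (entry,)
        if phase in names:
            return index
    raise ValueError(f"unknown phase: {phase}")

def order_chain(items):
    """Order (swept_parameter, phase) pairs so each parameter runs its full chain.

    Parameters keep the order in which they were first seen.
    """
    first_seen = {}
    for param, _ in items:
        first_seen.setdefault(param, len(first_seen))
    return sorted(items, key=lambda item: (first_seen[item[0]], phase_index(item[1])))

collected = [
    ("Circle", "land"), ("Line", "takeoff"), ("Circle", "px4_ready"),
    ("Line", "trajectory"), ("Circle", "trajectory"), ("Circle", "takeoff"),
    ("Line", "land"), ("Line", "px4_ready"),
]
for param, phase in order_chain(collected):
    print(param, phase)
```

With a rule like this, a hover sweep and a trajectory sweep would go through the same sort key, which is the kind of duplication the follow-up PR proposes to factor out.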

JohnYanxinLiu · 2026-07-01T01:58:21Z

+
+### Observed baseline (Circle, Isaac Sim, 10 headless runs)
+
+Validated on branch `pkumaraTrajectoryTesting` — see `tests/results/2026-06-05_18-26-52/summary.txt`:


I don't think we really need to reference internal branches. These docs are meant to become somewhat public, and we may archive old branches. I think it's fine to just leave the suggested numbers. Also make sure to specify what hardware these were run on: AirStations? Cloud instances (through GitHub's CI/CD integration)? Your local machine?

I removed the mention of my internal branch and made sure to say that the code was validated on an AirStation.

JohnYanxinLiu · 2026-07-01T02:03:37Z

+
+## Running tests (complete CLI reference)
+
+### Prerequisites


Could these prereqs be moved to index.md, and then this doc just references index.md?

Yeah, this is actually a much better setup. I put the prereqs into index.md and then added a reference to the index.md prereq section in the end-to-end testing .md file.

JohnYanxinLiu · 2026-07-01T02:05:40Z

+
+---
+
+## Path tracker bug fixes (this PR)


I feel like this should go into the PR template instead of into the direct documentation.

yeah lol, I kept that there for myself and forgot to remove it. I removed that section from the file now.

JohnYanxinLiu · 2026-07-01T02:08:36Z

+
+## Manual stack usage (without pytest)
+
+To fly a fixed trajectory interactively:


Isn't most of this stuff in the beginner docs for AirStack? Can we remove this fluff? Or is there something about this section that the other docs don't have?

Back here in the getting_started docs is probably a better place to be updating these docs:
(https://github.com/castacks/AirStack/blob/78f5772e8fc3a3bc28a9f5f0a1fbea4e4142975c/docs/getting_started/index.md)

I removed the initial boilerplate that covers the airstack up bring-up, the takeoff action block, the land action block, and the airstack down step, and left only the fixed trajectory task dispatch command. I think we should leave the fixed trajectory task material in this document because it is separate from, and more complicated than, the getting started material.

JohnYanxinLiu · 2026-07-01T02:10:34Z

 for the multi-drone Pegasus script. Details: **`tests/README.md`** → *Isaac Sim and
 the sensors mark*.

+### Fixed-trajectory path-tracker benchmark


I guess once the fixed_trajectory.md file is renamed, this should be switched to e2e (end_to_end) benchmarking.

yep switched that with my new end to end testing file changes.

JohnYanxinLiu · 2026-07-01T02:55:45Z

        <param name="virtual_tracking_ahead_time" value="0.5" />
        <param name="min_virtual_tracking_velocity" value="0.5" />
-        <param name="sphere_radius" value="1.0" />
+        <param name="sphere_radius" value="2.0" />


Just curious, what was the reason for this?

Yeah, I think this is an artifact of me trying to fix the drone stalling during simulation: I increased the radius to give the drone more lookahead so the pure pursuit path tracker could work better. It has no effect, because velocity_sphere_radius_multiplier=1.0 makes the radius velocity-proportional, so this value is just a fallback and isn't actually used. I'll change it back to 1.0 to make it consistent, though.
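The reasoning in this reply can be sketched as follows. This is a sketch of the rule as described here, not the path tracker's actual code: the function name, the exact condition, and the absence of any clamping are assumptions.

```python
def effective_sphere_radius(speed_mps, sphere_radius=1.0,
                            velocity_sphere_radius_multiplier=1.0):
    """Lookahead radius under the ASSUMED rule described in the reply above.

    If the multiplier is positive, the radius scales with speed and the fixed
    sphere_radius is never consulted; otherwise sphere_radius is the fallback.
    """
    if velocity_sphere_radius_multiplier > 0.0:
        return velocity_sphere_radius_multiplier * speed_mps
    return sphere_radius

# With the multiplier at 1.0, changing sphere_radius from 1.0 to 2.0 changes nothing.
print(effective_sphere_radius(3.0, sphere_radius=1.0))
print(effective_sphere_radius(3.0, sphere_radius=2.0))
```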

JohnYanxinLiu · 2026-07-01T02:56:13Z

        <param name="virtual_tracking_ahead_time" value="0.5" />
        <param name="min_virtual_tracking_velocity" value="0.5" />
-        <param name="sphere_radius" value="1.0" />
+        <param name="sphere_radius" value="2.0" />


Same question as in local.launch.xml. What is the purpose of this?

This is the same issue as in local.launch.xml. I will change it back to 1.0 to make it consistent.

JohnYanxinLiu · 2026-07-01T02:59:46Z

    liveliness: Container and process health (Docker, tmux, sentinel ROS 2 nodes)
    sensors: Sim and robot sensor topic rates, LiDAR validation, sim RTF
    takeoff_hover_land: End-to-end takeoff / hover / land action tests
+    autonomy: Fixed-pattern trajectory path-tracker benchmark (test_fixed_trajectory.py)


Following the e2e testing precedent above, it might be nice to begin putting things together. I feel takeoff_hover_land could be combined with your new autonomy testing to become a general e2e testing pipeline.

Yeah, in this PR I made a doc that lays out how everything is going to be combined, but I think we should do the full overhaul in another PR down the line, so this PR doesn't get too big.

JohnYanxinLiu · 2026-07-01T03:00:30Z

If we're combining other things into an e2e categorization, make sure to update this documentation

yeah once the next PR happens and we're good, I'm going to update the README documentation.

krrishj18 · 2026-07-01T16:35:26Z

 # `pytest tests/` and `airstack test -m unit` discover them without any
 # sys.path manipulation here.  Each proxy file sets up its own paths.
 RUN_DIR = None
-LOGS_DIR = None


Why are we removing this? Can't we leave it in as optional logs?

The logs are still all in the terminal, so you can technically still see all of them. The issue with the log files was that they spit out a bunch of unstructured information for every test, so if you are doing tons of runs, that's a ton of files that are super hard to wade through. The summary.txt file gives all of the information from the logs, just structured and easy to read, all in one place.


Rename fixed_trajectory_testing.md to end_to_end_testing.md (history preserved), add e2e intro and future-work note, fix stale test path to tests/system, and update mkdocs nav, testing index, and tests/README references. Co-authored-by: Cursor <cursoragent@cursor.com>

pvkumara5 and others added 30 commits April 27, 2026 05:15

Remove module docstring from test_fixed_trajectory.py

b616dbf

Made-with: Cursor

Aj/GitHub ci cd (#347)

d06effb

* Add link to PAT * Change to new orchestrator instance workflow * Add availability zone * Bump version to 0.18.0-alpha.7

Add fix for boot volume size blocking orchestrator

ee99e88

Add floating IPs to CI/CD

78f50e7

Bump gh runner_version to latest

d22aa32

Update cicd defaults

a74b591

Rename integration-tests.yml to system-tests.yml

0a2c7c8

Add debugging tips and add to mkdocs

862e12b

Use venv instead of pip3 to fix error: externally-managed-environment

6b70d16

Explicitly fail autonomy test if images not yet built

504644c

Enable using docker cache from docker registry to speed up docker image build tests for ci/cd

8d200d2

Fix bug

cbe56f9

Update docs and change docker image build/push to also run on self-hosted runner

5d494e3

Enable trigger docker build workflow on via manual dispatch

144a7fd

Increase instance volume size so that space doesn't run out when building docker images

4a5f184

Update to always try build all images

7512e5b

Create dummy file for docker compose push to pass

f27e432

Add omni_pass.env with guest access to AirLab nucleus

be68158

Update ci/cd tests to make sure image is present before running tests

5cd5f7e

Make sure images for profiles get built

70e1584

Update system tests to not build images if pull available

073aa69

Make build/pull quiet

c70e7fd

Pin empy version to fix ROS2 jazzy version bug

3aa9326

Switch image to desktop so that tests run successfully

3bbd970

Add docker image signing to workflow

bca80ef

Change pytest mark 'autonomy' to 'takeoff_hover_land'

53b1286

update comments on workflow

2e0130b

Recurisve checkout of airstack

6fa3843

Log more to GitHub

622e510

pvkumara5 added 7 commits June 9, 2026 14:22

Spherical lookahead bug that fixed the circle test and caused the circle test to pass

cb9c96c

Added in code that consolidated all the results code so the user can easily see their results in one file without having to wade through a ton of log files to get what they need

59eef1b

Results for 10 tries headless summary statistics

c6ba9d7

Fixed the logging files so now it only outputs one summary file and it doesn't inundate the user with a ton of log files for no reason

df16cb9

deleted cleanup_old_results.sh which was a local tool for cleaning up everything

6e147ef

Added preliminary docs to explain changes made

e4df6c6

Changed .env to say 0.19.0-alpha.4

6182b93

pvkumara requested review from JohnYanxinLiu and krrishj18 June 23, 2026 18:47

pvkumara5 added 2 commits June 29, 2026 19:47

Resolved all the merge conflicts that are in this file

f288c37

JohnYanxinLiu reviewed Jul 1, 2026


krrishj18 reviewed Jul 1, 2026


pvkumara5 and others added 7 commits July 2, 2026 11:40

Revert sphere_radius to 1.0; velocity_sphere_radius_multiplier=1.0 makes the fixed value inert

8209379

Co-authored-by: Cursor <cursoragent@cursor.com>

Remove internal branch reference from baseline; note AirStation hardware

4d40505

Co-authored-by: Cursor <cursoragent@cursor.com>

Remove parameter tuning bullet from docs after reverting sphere_radius

70b091d

Co-authored-by: Cursor <cursoragent@cursor.com>

Move system-test prerequisites to index.md and reference it from fixed-trajectory doc

b232871

Co-authored-by: Cursor <cursoragent@cursor.com>

Remove path tracker bug fixes section from docs (covered in PR description)

d46ccf2

Co-authored-by: Cursor <cursoragent@cursor.com>

Trim duplicated stack bring-up from manual usage; link to Getting Started

f5d1b1e

Co-authored-by: Cursor <cursoragent@cursor.com>

