Add fixed-trajectory system tests with cross-track error metrics #365
pvkumara wants to merge 71 commits into
New tests/test_fixed_trajectory.py evaluates drone performance on Circle, Figure8, Racetrack, and Line trajectories: takeoff -> execute -> land with cross-track error, path RMSE, execution time, and success metrics recorded to metrics.json for baseline comparison.
- Python ideal-path generators mirror fixed_trajectory_task.cpp equations
- Cross-track error uses robot pose snapshot at dispatch to transform base_link ideal path to world frame for odom comparison
- 5m loose tolerance documents the known circle failure without stranding drone
- conftest.py gains --trajectory-types CLI option and generalised phase-order sorting/ID-rewriting for both autonomy test modules
- tests/README.md documents the new module, all 11 metrics, and run commands

Made-with: Cursor
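For readers who want a concrete picture of the cross-track computation described in the commit message, here is a minimal sketch. It assumes a planar (x, y) comparison, a dispatch-time pose given as (x, y, yaw), and an ideal path given as a list of points; the function names are hypothetical and this is not the code in tests/test_fixed_trajectory.py, which may handle altitude and sampling differently.

```python
import math


def to_world(path_base, pose):
    """Transform base_link (x, y) points into the world frame.

    pose is the robot pose snapshot at dispatch: (x, y, yaw in radians).
    """
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(px + c * x - s * y, py + s * x + c * y) for x, y in path_base]


def _dist_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter, clamped so the closest point stays on the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def cross_track_errors(odom_xy, path_world):
    """Distance from each odom sample to the nearest segment of the ideal path."""
    segments = list(zip(path_world, path_world[1:]))
    return [min(_dist_to_segment(p, a, b) for a, b in segments) for p in odom_xy]


def rmse(errors):
    """Root-mean-square of a non-empty list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

As a usage example, a 10 m straight line in base_link dispatched from pose (1, 2, pi/2) becomes a world-frame segment from (1, 2) to (1, 12); odom samples at (1.5, 4), (0.5, 8), and (1, 10) then have cross-track errors of 0.5, 0.5, and 0.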
* Add link to PAT
* Change to new orchestrator instance workflow
* Add availability zone
* Bump version to 0.18.0-alpha.7
…ge build tests for ci/cd
…ding docker images
…easily see their results in one file without having to wade through a ton of log files to get what they need
…t doesn't inundate the user with a ton of log files for no reason
…esting
Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
#   .agents/skills/configure-multi-robot/SKILL.md
#   .agents/skills/run-system-tests/SKILL.md
#   .env
#   AGENTS.md
#   docs/development/intermediate/testing/index.md
#   mkdocs.yml
#   robot/docker/docker-compose.yaml
#   robot/ros_ws/src/sensors/lidar_point_cloud_filter/README.md
#   robot/ros_ws/src/sensors/lidar_point_cloud_filter/scripts/validate_lidar_filter_clouds.py
#   robot/ros_ws/src/sensors/lidar_point_cloud_filter/setup.cfg
#   robot/ros_ws/src/sensors/lidar_point_cloud_filter/setup.py
#   simulation/isaac-sim/extensions/PegasusSimulator
#   tests/README.md
#   tests/conftest.py
#   tests/parse_metrics.py
#   tests/pytest.ini
#   tests/sensor_probes.py
#   tests/test_liveliness.py
| ``` | ||
| **One log file per test execution**, plus separate `airstack_env.*.log` files for fixture narration (the `up`/`down` of each parametrize tuple). The fixture log file is named to track the rewritten test ID so it lands next to the triggering test. | ||
| There is **no `logs/` subdirectory**. Live output streams to the terminal during |
Was there a reason logs were taken out? I feel they would be useful. Feel free to push back though; I could see it going either way.
Basically, I created a single summary.txt that takes the key metrics from every run and puts all the info in one clean file. When we ran the tests, there were just way too many logs, and they were formatted inconsistently. Now all the info from the logs gets cleanly formatted into a single file that you can read easily without diving into a ton of logs.
Do you think we could start a new file called end_to_end_testing.md instead of the very specific "fixed_trajectory_testing"? This would match the rest of the structure and set a better precedent for a proper CI/CD structure in the future.
Yeah, that works for me, John. We could do that for sure.
And then we can have a specific fixed_trajectory testing subsection as well, but I feel like a lot of these things can build into stronger e2e tests later on.
Thanks for the e2e testing suggestion — I reworked the docs around it in this PR. Here's what I did and what I'm leaving as follow-up.
What changed (docs-only in this PR)
- Renamed `fixed_trajectory_testing.md` → `end_to_end_testing.md` (via `git mv`, so history is preserved).
- Reframed the page as the end-to-end testing home: the H1 is now End-to-End Testing with an intro defining e2e (the full flight chain, takeoff → action → land, exercising the whole stack rather than one module), and it lists the two e2e suites we have today: takeoff/hover/land (`takeoff_hover_land`) and the fixed-trajectory path-tracker benchmark (`autonomy`). The existing fixed-trajectory content now sits as a section under that page.
- Updated every reference to match: the `mkdocs.yml` nav, the testing `index.md` link + heading, and `tests/README.md`.
- Added a "Future work" note in the page itself so the intent isn't lost.
I kept this docs-only on purpose so the change stays focused and doesn't touch the test/CI interface in this PR.
Follow-up (separate PR)
The two flight suites are really the same 4-phase chain (px4_ready → takeoff → [hover | trajectory] → land) — they differ only in the middle phase and the swept parameter, and conftest.py already orders them with shared logic. So the follow-up is to unify the two marks into a single e2e mark (-m e2e instead of -m "takeoff_hover_land or autonomy"), and optionally factor the shared phases/helpers (px4_ready, takeoff, landing, odom + ground-truth capture) into a common base to remove the duplication.
I'm deferring that because it's cross-cutting: it touches pytest.ini, the conftest.py ordering, the CI workflow, the README/AGENTS mark tables, and the metrics baselines keyed on the current mark names — and renaming the marks changes the -m interface. Doing it here would bloat this PR and risk the baselines/CI, so I'd rather land it as its own change.
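To make the shared 4-phase chain concrete, here is a rough sketch of how a common ordering helper could look once the two suites are unified. The test names (`test_px4_ready`, `test_takeoff`, and so on) and both helper functions are hypothetical; the real ordering logic lives in conftest.py and is not reproduced here.

```python
def phase_chain(middle):
    """Return the four-phase chain for one suite.

    Only the middle phase differs between the two flight suites:
    'hover' for takeoff_hover_land, 'trajectory' for autonomy.
    """
    if middle not in ("hover", "trajectory"):
        raise ValueError(f"unknown middle phase: {middle!r}")
    return ("px4_ready", "takeoff", middle, "land")


def phase_sort_key(test_name, middle):
    """Sort key that orders (hypothetical) test names by their phase.

    Assumes tests are named 'test_<phase>'; raises ValueError if the
    name does not map to a phase in the chain.
    """
    chain = phase_chain(middle)
    phase = test_name.removeprefix("test_")  # Python 3.9+
    return chain.index(phase)
```

For example, `sorted(names, key=lambda n: phase_sort_key(n, "trajectory"))` puts `test_px4_ready`, `test_takeoff`, `test_trajectory`, and `test_land` in flight order regardless of collection order.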
| ### Observed baseline (Circle, Isaac Sim, 10 headless runs) | ||
| Validated on branch `pkumaraTrajectoryTesting` — see `tests/results/2026-06-05_18-26-52/summary.txt`: |
I don't think we really need to reference internal branches. These docs are meant to become somewhat public, and we may archive old branches. I think it's fine to just leave the suggested numbers. Also, make sure to specify what hardware these were run on: AirStations? Cloud instances (through GitHub's CI/CD integration)? Your local machine?
I removed the mention of my internal branch and noted that the code was validated on an AirStation.
| ## Running tests (complete CLI reference) | ||
| ### Prerequisites |
Could these prereqs be moved to index.md, and then this doc just references index.md?
Yeah, this is actually a much better setup. I moved the prereqs into index.md and added a reference in end_to_end_testing.md that points to the index.md prereq section.
| --- | ||
| ## Path tracker bug fixes (this PR) |
I feel like this should go into the PR template instead of into the direct documentation.
Yeah lol, I kept that there for myself and forgot to remove it. I've removed that section from the file now.
| ## Manual stack usage (without pytest) | ||
| To fly a fixed trajectory interactively: |
Isn't most of this stuff in the beginner docs for AirStack? Can we remove this fluff? Or is there something about this section that the other docs don't have?
The getting_started docs are probably a better place to update this material:
(https://github.com/castacks/AirStack/blob/78f5772e8fc3a3bc28a9f5f0a1fbea4e4142975c/docs/getting_started/index.md)
I removed the initial boilerplate (the airstack up bring-up, the takeoff action block, the land action block, and the airstack down teardown) and left only the fixed-trajectory task dispatch command. I think we should keep the fixed-trajectory task material in this document because it is separate from, and more complicated than, the getting-started material.
| for the multi-drone Pegasus script. Details: **`tests/README.md`** → *Isaac Sim and | ||
| the sensors mark*. | ||
| ### Fixed-trajectory path-tracker benchmark |
I guess once the fixed_trajectory.md file is renamed, this should be switched to e2e (end_to_end) benchmarking.
Yep, switched that as part of my new end-to-end testing file changes.
| <param name="virtual_tracking_ahead_time" value="0.5" /> | ||
| <param name="min_virtual_tracking_velocity" value="0.5" /> | ||
| <param name="sphere_radius" value="1.0" /> | ||
| <param name="sphere_radius" value="2.0" /> |
Just curious, what was the reason for this?
Yeah, I think this is an artifact of me trying to fix why the drone was stalling in simulation: I changed the radius to give the drone more look-ahead so the pure pursuit path tracker could work better. It has no effect, because velocity_sphere_radius_multiplier=1.0 makes the radius velocity-proportional, so this value is just a fallback and isn't actually used. I'll change it back to 1.0 to keep it consistent, though.
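A small sketch of why the fixed value is inert, under an assumed selection rule: the tracker uses a velocity-proportional radius whenever the multiplier is positive and only falls back to sphere_radius otherwise. This rule and the function name are assumptions made for illustration; the path tracker's actual code is not shown in this PR and may differ (for example, in how it handles zero speed).

```python
def lookahead_radius(speed, sphere_radius=1.0, velocity_sphere_radius_multiplier=1.0):
    """Pick the look-ahead sphere radius in metres (assumed rule, not the tracker's code).

    With a positive multiplier the radius scales with speed and the
    fixed sphere_radius is never consulted; it is only a fallback.
    """
    if velocity_sphere_radius_multiplier > 0.0:
        return speed * velocity_sphere_radius_multiplier
    return sphere_radius
```

Under this assumption, `lookahead_radius(3.0, sphere_radius=1.0)` and `lookahead_radius(3.0, sphere_radius=2.0)` return the same value, which matches the observation that changing the parameter from 1.0 to 2.0 had no effect.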
| <param name="virtual_tracking_ahead_time" value="0.5" /> | ||
| <param name="min_virtual_tracking_velocity" value="0.5" /> | ||
| <param name="sphere_radius" value="1.0" /> | ||
| <param name="sphere_radius" value="2.0" /> |
Same question as in local.launch.xml. What is the purpose of this?
This is the same issue as in local.launch.xml; I will change it back to 1.0 to keep it consistent.
| liveliness: Container and process health (Docker, tmux, sentinel ROS 2 nodes) | ||
| sensors: Sim and robot sensor topic rates, LiDAR validation, sim RTF | ||
| takeoff_hover_land: End-to-end takeoff / hover / land action tests | ||
| autonomy: Fixed-pattern trajectory path-tracker benchmark (test_fixed_trajectory.py) |
Following the e2e testing precedent above, it might be nice to begin putting things together. I feel takeoff_hover_land could be combined with your new autonomy testing to become a general e2e testing pipeline.
Yeah, I wrote a doc in this PR laying out how everything is going to be combined, but I think we should do the full overhaul in a later PR, once we've confirmed the plan, so this PR doesn't get too big.
If we're combining other things into an e2e categorization, make sure to update this documentation.
Yeah, once the next PR lands and we're good, I'll update the README documentation.
| # `pytest tests/` and `airstack test -m unit` discover them without any | ||
| # sys.path manipulation here. Each proxy file sets up its own paths. | ||
| RUN_DIR = None | ||
| LOGS_DIR = None |
Why are we removing this? Can't we leave it in as optional logs?
The logs are still all in the terminal, so you can technically still see all of them. The issue with the log files was that they spit out a bunch of unstructured information for every test, so if you are doing tons of runs, that's a ton of log files that are super hard to wade through. The summary.txt file gives all of the information from the logs, just structured and easy to read, all in one place.
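As an illustration of the "one structured file" idea, here is a minimal sketch of rendering per-test metrics into a single plain-text summary. The function name, the input shape, and the metric keys used in the example are hypothetical; the real summary.txt layout produced by the test suite may look different.

```python
def format_summary(results):
    """Render {test_id: {metric_name: value}} as one plain-text summary.

    Tests and metrics are sorted so repeated runs produce comparable files.
    """
    lines = []
    for test_id in sorted(results):
        lines.append(f"[{test_id}]")
        for name, value in sorted(results[test_id].items()):
            lines.append(f"  {name}: {value}")
    return "\n".join(lines) + "\n"
```

Calling it with `{"circle": {"success": True, "path_rmse_m": 0.5}}` yields a three-line block with the test ID as a header and one indented line per metric.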
…kes the fixed value inert Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…d-trajectory doc Co-authored-by: Cursor <cursoragent@cursor.com>
…ption) Co-authored-by: Cursor <cursoragent@cursor.com>
…rted Co-authored-by: Cursor <cursoragent@cursor.com>
Rename fixed_trajectory_testing.md to end_to_end_testing.md (history preserved), add e2e intro and future-work note, fix stale test path to tests/system, and update mkdocs nav, testing index, and tests/README references. Co-authored-by: Cursor <cursoragent@cursor.com>
- What features did you add and/or bugs did you address?

- Which GitHub issue does this address?
This PR does not address a GitHub issue; it adds a new fixed-trajectory test suite that automatically measures path-tracking error in simulation.
- Additional description if not fully described in the GitHub issue
This PR adds automated fixed-trajectory evaluation tests for the autonomy stack and fixes trajectory-tracking bugs that caused path execution to fail. It also improves the test results workflow so maintainers get one readable summary file instead of many per-test logs.
https://youtu.be/zaaZqLUzqZ8
How did you implement it?
How do you run and use it?
To run these tests, bring the stack up with airstack up, then compose a test run for your needs using the global CLI options. Some basic tests that I ran to validate this testing stack are included below, along with the global CLI options.
Testing with PyTest
airstack test -m ...
After running an airstack test command, a maintainer should see in the console that all the tests have passed. They can then go to the test results folder, find the folder for their run, and open its summary.txt to see all the metrics output by the test.

Documentation
Yes, mkdocs.yml was updated to add a trajectory testing page.
Yes, there is now a doc that explains the full pipeline for trajectory testing.
I believe there is sufficient visual media in the YouTube video above, but I can generate more if needed.
Versioning
.env file according to semantic versioning? Yes, the version was changed to 0.19.0-alpha.4.