Retry transient failures during upstream sync #900
…-upstream-sync-retries Signed-off-by: Bornunique911 <69379200+Bornunique911@users.noreply.github.com>
OWASP#898) * Improved boilerplate, github link scrolling to readme, and completing the list of standards with the new AI ones * Update Search.tsx * Improved boilerplate, github link scrolling to readme, and completing… * Update Search.tsx Signed-off-by: Bornunique911 <69379200+Bornunique911@users.noreply.github.com> --------- Signed-off-by: Bornunique911 <69379200+Bornunique911@users.noreply.github.com> Co-authored-by: Rob van der Veer <rob@vdveer.net>
northdpole left a comment
Thanks for the work on #471 — a lot of effort here.
I'm leaving this as a review comment (not an approval) because #900 is several PRs in a trenchcoat: the title describes retry logic (~50 lines), but the diff is ~1700 lines across parsers, JSON data, scripts, and tests. Last updated 2026-06-08, so not stale — but too large to review or merge as one piece. Please close #900 without merging and open separate PRs, one per concern, each with its own CI run.
The retry logic is excellent and very useful — thank you. That should be its own small PR against current main.
JSON mappings
We don't normally merge mappings in JSON format in the repository. In this case they will be useful for validating the GSoC ETL work, so we can use them in an open PR — as test/fixture/benchmark data for Modules B/C/D, not wired in as production importers under application/utils/.../data/.
Please put the JSON in something like application/tests/fixtures/owasp_mappings/ with a small test that loads them and sanity-checks schema and cre_id references.
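A minimal sketch of what such a sanity check could look like. The schema here is an assumption, not taken from the PR: each fixture is treated as a top-level JSON list of objects carrying a string `cre_id` in the `NNN-NNN` form. The names `check_mapping_file` and `check_fixture_dir` are hypothetical, and the check is format-only; resolving each `cre_id` against real CRE data would need a lookup that is not shown.

```python
import json
import re
from pathlib import Path
from typing import List

# Assumed CRE id shape (three digits, dash, three digits); adjust if the
# fixtures use a different convention.
CRE_ID_RE = re.compile(r"\d{3}-\d{3}")


def check_mapping_file(path: Path) -> List[str]:
    """Return human-readable problems found in one fixture file."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path.name}: invalid JSON ({exc})"]

    # Assumed schema: a top-level list of mapping objects.
    if not isinstance(data, list):
        return [f"{path.name}: expected a top-level list"]

    problems: List[str] = []
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"{path.name}[{index}]: entry is not an object")
            continue
        cre_id = entry.get("cre_id")  # hypothetical field name
        if not isinstance(cre_id, str) or not CRE_ID_RE.fullmatch(cre_id):
            problems.append(f"{path.name}[{index}]: bad cre_id {cre_id!r}")
    return problems


def check_fixture_dir(directory: Path) -> List[str]:
    """Collect problems across every owasp_*.json file in a directory."""
    problems: List[str] = []
    for path in sorted(directory.glob("owasp_*.json")):
        problems.extend(check_mapping_file(path))
    return problems
```

A pytest test could then assert that `check_fixture_dir(Path("application/tests/fixtures/owasp_mappings"))` returns an empty list, so a failure message names the offending file and entry.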
How to split (commands)
cd OpenCRE
git fetch origin main
git fetch origin review/issue-471-upstream-sync-retries
MONO=origin/review/issue-471-upstream-sync-retries
# PR A — upstream retry only
git checkout -b feat/upstream-sync-retry origin/main
git checkout $MONO -- application/cmd/cre_main.py application/tests/cre_main_test.py
# Edit cre_main.py: keep fetch_upstream_json + download_graph_from_upstream wiring only.
# Remove: parse_file, owasp_* register_resource blocks, unrelated hunks.
# Edit cre_main_test.py: keep upstream retry tests only.
make lint && make mypy
python -m pytest application/tests/cre_main_test.py -k upstream -q
git add application/cmd/cre_main.py application/tests/cre_main_test.py
git commit -S -m "feat: retry transient failures during upstream sync (#471)"
gh pr create --base main --title "feat: retry transient failures during upstream sync" \
--body "Split from #900."
# PR B — GSoC mapping fixtures (JSON only, no production parsers)
git checkout -b test/gsoc-owasp-mapping-fixtures origin/main
mkdir -p application/tests/fixtures/owasp_mappings
git checkout $MONO -- application/utils/external_project_parsers/data/owasp_*.json
git mv application/utils/external_project_parsers/data/owasp_*.json \
application/tests/fixtures/owasp_mappings/
# Add application/tests/owasp_mapping_fixtures_test.py (load + schema/cre_id checks)
python -m pytest application/tests/owasp_mapping_fixtures_test.py -q
git add -A && git commit -S -m "test: OWASP mapping fixtures for GSoC ETL validation"
gh pr create --base main --title "test: OWASP mapping fixtures for GSoC ETL validation" \
--body "Split from #900. Fixture data only — not production importers."
# Close #900 once A and B are open:
# gh pr close 900 --comment "Superseded by #___ (retry) and #___ (fixtures). Thanks!"
Merge retry first, then fixtures. Rebase each onto main before merge.
Related PRs #877 / #858 / #863 overlap the same JSON stack — please close or retarget the same way.
Happy to review the split PRs quickly once they're up.
Summary
This PR improves reliability by retrying transient failures during upstream synchronization.
This is the fifth upstream PR in the stacked #471 review series.
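For readers unfamiliar with the pattern, retrying a transient failure generally looks like the sketch below. This is an illustration under stated assumptions, not the code in this PR: `retry_transient` and `TransientError` are hypothetical names, and the attempt count and backoff delays are placeholder values.

```python
import time
from typing import Any, Callable, Tuple, Type


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx)."""


def retry_transient(
    fetch: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            # Exponential backoff: base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

Only exceptions listed in `retry_on` are retried, so permanent errors (for example a malformed response) fail immediately. Passing `sleep` as a parameter lets tests run without real delays.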
What changed
Validation
Why this is split out
The full #471 work is too large to review effectively as one PR.
This PR isolates one OWASP resource family so the parser/data model can be reviewed independently before the later Kubernetes, cheat sheet, backend analysis, and frontend changes.