refactor(assets): centralize asset upload and compare truth translation #616
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: lino <lino@rapidata.ai>
Code Review

Overview
This PR centralizes a repeated "upload media asset / wrap text" pattern across six call sites into two methods on [...]

Positives
[...]

Issues / Suggestions

1. Behavior change: [...]
The old [...] This is strictly better behavior (fail fast beats silent corruption), but it is a breaking behavior change that isn't flagged in the PR description. If any existing caller constructs a [...]

2. Pre-upload failures are silently discarded
In [...]

    if assets_to_upload:
        AssetUploadOrchestrator(self._openapi_service).upload_all_assets(
            assets_to_upload
        )

3. [...]

    AssetUploadOrchestrator(self._openapi_service).upload_all_assets(assets_to_upload)

The [...]

4. [...]

    if rapid.media_context:
        assets_to_upload.update(rapid.media_context)

If [...]

Verdict
Approve with minor notes. The core refactoring is clean, the bug fix is correct, and the pre-upload batching for validation sets is a genuine win. The main thing worth flagging before merge is the [...]

What
Asset upload + IAssetInput construction was duplicated across six call sites (datapoint uploader, validation rapid uploader, audience example handler ×3, benchmark participant), each re-implementing the same "if media: upload then wrap as ExistingAssetInput/MultiAssetInput, else wrap as TextAssetInput" branching. This PR centralizes it and fixes a real bug in the audience flow. No public API changes.

1. Single entry point for "asset → IAssetInput"
AssetUploader.build_asset_input(asset, data_type) and build_asset_input_with_names(...) (which also returns the original-path → uploaded-name map needed for compare-truth translation). All six call sites now use them; media_context call sites keep using upload_and_map_asset.

2. Shared, strict compare-truth translation
ValidationRapidUploader._translate_compare_truth moved to datapoints/_truth_translator.py as a module-level function, handling both CompareTruth and MultiCompareTruth. Behavior change: if a referenced asset is not in the name map on a media rapid, it now raises a clear ValueError instead of silently shipping the raw path/URL as winnerId to the backend.

3. Bug fix: audience compare-truth examples
AudienceExampleHandler._add_rapid_example never translated compare truths at all: media compare rapids added as audience examples sent the caller's raw local path/URL as winnerId instead of the uploaded asset name. It now runs the shared translation before the IValidationTruthModel → IExampleTruth dict conversion.

4. Perf: validation sets pre-upload assets
ValidationSetManager._submit now collects every media asset (including media_context) across all rapids and runs AssetUploadOrchestrator.upload_all_assets before the per-rapid loop. Since AssetUploader caches by path/URL, the per-rapid uploads become cache hits, so validation sets get the same batched-URL + parallel-file upload path the dataset flow already has. Pre-upload failures don't abort the loop; the per-rapid upload still raises, so the existing failed_rapids reporting keeps working.

5. Dead code removal
RapidataOrderManager and RapidataJobManager constructed an AssetUploader they never used; the field and imports are removed.

Validation
uv run pyright src/rapidata/rapidata_client → 0 errors, 0 warnings
uv run python -c "from rapidata import RapidataClient" → OK

🔗 Session: session-916a26db
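For readers who want the shape of items 1 to 3 in code, the following is a minimal sketch, not the code in this PR. It assumes an "upload" callable that returns an uploaded asset name, and every class, function, and dict layout in it (CompareTruth and MultiCompareTruth as plain dataclasses, sketch_build_asset_input_with_names, sketch_translate_compare_truth, the "kind" keys) is a hypothetical stand-in for the real types. It shows the media/text branching producing a name map, and the strict translation raising ValueError when a truth references an asset that was never uploaded.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompareTruth:
    """Hypothetical stand-in: a compare truth naming one winning asset."""
    winner_id: str


@dataclass(frozen=True)
class MultiCompareTruth:
    """Hypothetical stand-in: a compare truth naming several winners."""
    winner_ids: Tuple[str, ...]


def sketch_build_asset_input_with_names(asset, is_media, upload):
    """Wrap one asset (or a list) and return (wrapped input, name map).

    The name map goes from the caller's original path/URL to the uploaded
    asset name; it is empty for text, since text is never uploaded.
    """
    items = asset if isinstance(asset, list) else [asset]
    if is_media:
        names = {path: upload(path) for path in items}
        wrapped = [{"kind": "existing", "name": names[path]} for path in items]
    else:
        names = {}
        wrapped = [{"kind": "text", "text": text} for text in items]
    if isinstance(asset, list):
        return {"kind": "multi", "assets": wrapped}, names
    return wrapped[0], names


def sketch_translate_compare_truth(truth, name_map, is_media):
    """Replace original paths/URLs with uploaded names, failing fast on gaps."""
    if not is_media:
        # Text rapids have no uploaded names, so the truth is left untouched.
        return truth

    def lookup(original):
        if original not in name_map:
            raise ValueError(
                f"Compare truth references {original!r}, which is not among "
                f"the uploaded assets {sorted(name_map)}"
            )
        return name_map[original]

    if isinstance(truth, CompareTruth):
        return CompareTruth(winner_id=lookup(truth.winner_id))
    if isinstance(truth, MultiCompareTruth):
        return MultiCompareTruth(
            winner_ids=tuple(lookup(w) for w in truth.winner_ids)
        )
    # Other truth types carry no asset references in this sketch.
    return truth
```

In this sketch, a caller that builds the input for ["imgs/a.png", "imgs/b.png"] and then translates CompareTruth("imgs/a.png") with the returned map gets the uploaded name back, while a truth pointing at a path outside the map fails immediately instead of reaching the backend as a raw path.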
🤖 Generated with Claude Code
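The pre-upload behavior described in item 4 can be illustrated with a small sketch. This is an assumption-laden toy, not the PR's implementation: CachingUploader, pre_upload_all, and submit are hypothetical names, the "upload" is simulated in memory, and the warm-up here is sequential where the real orchestrator batches URLs and uploads files in parallel. What it does show is the interaction the description relies on: a warm-up pass fills a path-keyed cache and swallows failures, and the per-rapid loop then hits the cache for good assets and still raises for bad ones, so failed rapids are reported as before.

```python
class CachingUploader:
    """Hypothetical uploader that caches uploaded names by path/URL."""

    def __init__(self, broken=()):
        self._cache = {}
        self._broken = set(broken)  # paths whose upload always fails
        self.network_calls = 0      # counts simulated uploads, not cache hits

    def upload(self, path):
        if path in self._cache:
            return self._cache[path]
        self.network_calls += 1
        if path in self._broken:
            raise RuntimeError(f"upload failed for {path}")
        name = f"asset-{len(self._cache)}"
        self._cache[path] = name
        return name


def pre_upload_all(uploader, paths):
    """Warm the cache; ignore failures so the per-rapid loop can report them."""
    for path in sorted(paths):
        try:
            uploader.upload(path)
        except RuntimeError:
            pass


def submit(uploader, rapids):
    """rapids maps a rapid id to the list of asset paths it needs."""
    all_paths = {path for paths in rapids.values() for path in paths}
    pre_upload_all(uploader, all_paths)

    uploaded, failed_rapids = {}, []
    for rapid_id, paths in rapids.items():
        try:
            # Cache hits for everything the warm-up pass managed to upload.
            uploaded[rapid_id] = [uploader.upload(path) for path in paths]
        except RuntimeError:
            failed_rapids.append(rapid_id)
    return uploaded, failed_rapids
```

With two rapids sharing one asset and one of them referencing a broken path, the shared asset is uploaded once, the healthy rapid completes entirely from the cache, and only the rapid with the broken asset ends up in failed_rapids.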