add eval capes to sdk#460
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
edwinpav
left a comment
Overall nice work!
Two main things:
- I'd make sure that the user-facing docs/descriptions are not overly complex. Not everyone will know or even care how the function works behind the scenes; they just care about the params, the returns, and the feature that the method provides.
- If you want to deploy a new sdk version with these changes, two more files need to be changed and added to this pr:
  - `CHANGELOG.md` should be updated. The tag link that the CHANGELOG references will be created after this pr is merged into master. You'd add a new release with a new tag here: https://github.com/scaleapi/nucleus-python-client/releases. Feel free to ping for any questions! The process isn't super clear lol
  - The sdk `version` under `tool.poetry` should be updated in `pyproject.toml` (see #457 as a reference pr)
        self.__dict__.update(updated.__dict__)
        return self

    def wait_for_completion(
Is this needed because this is not integrated with NucleusJobs? I thought this type of functionality comes built in for the other async functions (dedup async also uses temporal)
correct yeah I don't have any ties back to the nuc jobs currently (since this stuff isn't "technically" in nucleus)...I could set that up tho that would be simple
oh i see, ig if it's in the nucleus sdk might be worth doing that if it's simple. if it shows up on the nucleus jobs page ui that's probably fine but that's probably a call you have more context on to make
yeah i think that's fine too. I'll do that in its own PR set tho after this one (i'll have to update scaleapi too)
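For context on the thread above, a client-side `wait_for_completion` that is not backed by NucleusJobs usually reduces to a polling loop. The following is a minimal sketch of that pattern, not the PR's implementation: `wait_until_terminal`, `fetch_status`, and the status strings in `TERMINAL_STATES` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable

# Hypothetical terminal status names; the real EvaluationV2Status values may differ.
TERMINAL_STATES = {"completed", "failed"}


def wait_until_terminal(
    fetch_status: Callable[[], str],
    poll_interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() repeatedly until it returns a terminal status.

    Raises TimeoutError if the deadline passes before a terminal status is seen.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout_s} seconds")
        sleep(poll_interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting, which matters if the tests run against mocked connections.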
…ucleus-python-client into add-eval-capabilities
resolves https://linear.app/scale-epd/issue/DE-7460
tests won't pass until https://github.com/scaleapi/scaleapi/pull/142963 is merged
Greptile Summary
This PR adds a full Evaluations V2 SDK surface to the Nucleus Python client: COCO-style detection metrics on model runs stored as `evaluation_match_v2` rows. Three new `NucleusClient` methods (`create_evaluation_v2`, `get_evaluation_v2`, `list_evaluations_v2`) and an `EvaluationV2` resource class cover the complete lifecycle.
- `EvaluationV2` (new dataclass): supports `wait_for_completion()`, `charts()` (mAP, confusion matrix, PR curve, TIDE), `examples()` (paginated TP/FP/FN rows), `refresh()`, and `delete()`. Status comparisons against the `str, Enum` `EvaluationV2Status` work correctly.
- `EvaluationV2Charts`, `EvaluationV2ExamplesPage`, `EvaluationV2MatchExample`, `EvaluationV2FilterArgs`: nullable fields that could be absent for FN/FP rows are correctly declared `Optional` with `= None` defaults; camelCase filter serialization is well-tested.

Confidence Score: 5/5
Safe to merge — new functionality only, no changes to existing paths, and nullable DTO fields are correctly handled.
The change is entirely additive: new files, new public exports, and three new NucleusClient methods that follow existing delegation patterns. The only finding is a wrong release-tag URL in CHANGELOG.md, which has no runtime impact. DTO nullable fields (iou, prediction_metadata, item_metadata) are correctly declared Optional, the str-enum status comparisons are sound, and the test suite covers the key code paths with mocked connections.
No files require special attention.
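The two patterns the summary calls out (a `str, Enum` status that compares equal to raw strings, and `Optional` fields with `= None` defaults) can be sketched as follows. This is an illustration under assumptions, not the SDK's code: `Status`, `MatchExample`, `from_json`, and the payload keys are hypothetical; only the field names `iou`, `prediction_metadata`, and `item_metadata` come from the summary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class Status(str, Enum):
    # Hypothetical members; mixing in str lets them compare equal to plain strings.
    PENDING = "pending"
    COMPLETED = "completed"


@dataclass
class MatchExample:
    match_type: str
    # These may be absent on FN/FP rows, so they default to None.
    iou: Optional[float] = None
    prediction_metadata: Optional[Dict[str, Any]] = None
    item_metadata: Optional[Dict[str, Any]] = None

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "MatchExample":
        # dict.get returns None for missing keys instead of raising KeyError.
        return cls(
            match_type=payload["match_type"],
            iou=payload.get("iou"),
            prediction_metadata=payload.get("prediction_metadata"),
            item_metadata=payload.get("item_metadata"),
        )
```

With this shape, `Status.COMPLETED == "completed"` holds, and a row that carries only `match_type` parses without error.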
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant User
    participant NucleusClient
    participant API
    User->>NucleusClient: create_evaluation_v2(model_run_id, ...)
    NucleusClient->>API: "POST modelRun/{id}/evaluationsV2"
    API-->>NucleusClient: "{evaluation_id}"
    NucleusClient->>API: "GET evaluationsV2/{evaluation_id}"
    API-->>NucleusClient: EvaluationV2 payload
    NucleusClient-->>User: EvaluationV2
    loop poll until terminal
        User->>NucleusClient: wait_for_completion()
        NucleusClient->>API: "GET evaluationsV2/{id}"
        API-->>NucleusClient: "{status}"
    end
    User->>NucleusClient: "charts(iou_threshold=0.5)"
    NucleusClient->>API: "GET evaluationsV2/{id}/charts?iouThreshold=0.5"
    API-->>NucleusClient: EvaluationV2Charts
    NucleusClient-->>User: EvaluationV2Charts
    User->>NucleusClient: "examples(match_type=FP, limit=20)"
    NucleusClient->>API: "POST evaluationsV2/{id}/examples"
    API-->>NucleusClient: EvaluationV2ExamplesPage
    NucleusClient-->>User: EvaluationV2ExamplesPage
    User->>NucleusClient: delete()
    NucleusClient->>API: "DELETE evaluationsV2/{id}"
    API-->>NucleusClient: 200/204
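The diagram shows a snake_case argument (`iou_threshold`) reaching the API as a camelCase parameter (`iouThreshold`). A minimal sketch of that kind of conversion is below; `to_camel` and `serialize_filters` are hypothetical helpers written for illustration, and the SDK's own filter serialization may work differently.

```python
from typing import Any, Dict


def to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase, e.g. iou_threshold -> iouThreshold."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def serialize_filters(**filters: Any) -> Dict[str, Any]:
    """Build a camelCase payload, dropping arguments that were left as None."""
    return {to_camel(key): value for key, value in filters.items() if value is not None}
```

Dropping `None` values keeps unset filters out of the request, so the server only sees filters the caller actually set.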
Reviews (7): Last reviewed commit: "Merge branch 'add-eval-capabilities' of ..."