add eval capes to sdk by luke-e-schaefer · Pull Request #460 · scaleapi/nucleus-python-client

luke-e-schaefer · 2026-05-12T18:19:19Z

resolves https://linear.app/scale-epd/issue/DE-7460

Tests won't pass until https://github.com/scaleapi/scaleapi/pull/142963 is merged.

Greptile Summary

This PR adds a full Evaluations V2 SDK surface to the Nucleus Python client — COCO-style detection metrics on model runs stored as evaluation_match_v2 rows. Three new NucleusClient methods (create_evaluation_v2, get_evaluation_v2, list_evaluations_v2) and an EvaluationV2 resource class cover the complete lifecycle.

- EvaluationV2 (new dataclass): supports wait_for_completion(), charts() (mAP, confusion matrix, PR curve, TIDE), examples() (paginated TP/FP/FN rows), refresh(), and delete(). Status comparisons against the `(str, Enum)` class EvaluationV2Status work correctly.
- DTOs (EvaluationV2Charts, EvaluationV2ExamplesPage, EvaluationV2MatchExample, EvaluationV2FilterArgs): nullable fields that could be absent for FN/FP rows are correctly declared Optional with = None defaults; camelCase filter serialization is well-tested.
- Tests: comprehensive unit coverage via mocked connections, including filter serialization, pagination, polling, delete, and error paths.
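
To make the DTO point concrete, here is a minimal sketch of Optional fields with `None` defaults and a camelCase filter serializer. It uses stdlib dataclasses rather than the Pydantic models in the PR, and the filter field names (`min_confidence`, `class_names`) and both class names are hypothetical; only `iou`, `prediction_metadata`, and `item_metadata` are named in the summary.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


def to_camel(name: str) -> str:
    """Convert a snake_case field name to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass
class FilterArgsSketch:
    # Hypothetical filter fields; the real EvaluationV2FilterArgs may differ.
    match_type: Optional[str] = None
    min_confidence: Optional[float] = None
    class_names: Optional[List[str]] = None

    def to_payload(self) -> Dict[str, Any]:
        """Serialize the fields that are set, using camelCase keys."""
        return {
            to_camel(f.name): getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


@dataclass
class MatchExampleSketch:
    # Fields that may be missing on FN/FP rows default to None,
    # so constructing a row without them does not raise.
    match_type: str
    iou: Optional[float] = None
    prediction_metadata: Optional[Dict[str, Any]] = None
    item_metadata: Optional[Dict[str, Any]] = None


payload = FilterArgsSketch(match_type="FP", min_confidence=0.25).to_payload()
fn_row = MatchExampleSketch(match_type="FN")
```

Dropping unset fields from the payload is a design choice of this sketch; the actual serializer may send explicit nulls instead.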

Confidence Score: 5/5

Safe to merge — new functionality only, no changes to existing paths, and nullable DTO fields are correctly handled.

The change is entirely additive: new files, new public exports, and three new NucleusClient methods that follow existing delegation patterns. The only finding is a wrong release-tag URL in CHANGELOG.md, which has no runtime impact. DTO nullable fields (iou, prediction_metadata, item_metadata) are correctly declared Optional, the str-enum status comparisons are sound, and the test suite covers the key code paths with mocked connections.
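
The remark about str-enum comparisons can be illustrated with a small sketch. The member names and values below are assumptions, since the real EvaluationV2Status members are not shown on this page; the point is only how a `(str, Enum)` class behaves against raw strings from a JSON payload.

```python
from enum import Enum


class StatusSketch(str, Enum):
    # Hypothetical members; the real EvaluationV2Status may use other values.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


TERMINAL = {StatusSketch.COMPLETED, StatusSketch.FAILED}

raw_status = "completed"  # as it might arrive in an API response

# A str-mixin member compares equal to its raw string value.
is_equal = StatusSketch.COMPLETED == raw_status

# Calling the enum with the raw value returns the canonical member,
# which can then be checked against a set of terminal states.
parsed = StatusSketch(raw_status)
is_terminal = parsed in TERMINAL
```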

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nucleus/evaluation_v2.py | New EvaluationV2 resource class with full lifecycle: create, poll, charts, examples, delete. Logic and comparisons are correct for str-enum status fields. |
| nucleus/data_transfer_object/evaluation_v2.py | New Pydantic DTOs for filters, charts, and match examples. Nullable fields are correctly declared Optional with defaults; camelCase filter serialization helper is well-tested. |
| nucleus/__init__.py | Adds create_evaluation_v2, get_evaluation_v2, and list_evaluations_v2 to NucleusClient; exports all new public types. Follows existing patterns for make_request/get/post delegation. |
| tests/test_evaluation_v2.py | Unit tests covering filters, pagination, wait-for-completion, delete, and error paths using mocked connections. Good coverage of the new SDK surface. |
| CHANGELOG.md | Adds 0.18.4 entry, but the hyperlink in the header incorrectly points to the v0.18.3 release tag instead of v0.18.4. |
| docs/index.rst | Adds Evaluations V2 section with a working code example; correct Sphinx cross-references to new methods. |

Sequence Diagram

sequenceDiagram
    participant User
    participant NucleusClient
    participant API

    User->>NucleusClient: create_evaluation_v2(model_run_id, ...)
    NucleusClient->>API: "POST modelRun/{id}/evaluationsV2"
    API-->>NucleusClient: "{evaluation_id}"
    NucleusClient->>API: "GET evaluationsV2/{evaluation_id}"
    API-->>NucleusClient: EvaluationV2 payload
    NucleusClient-->>User: EvaluationV2

    loop poll until terminal
        User->>NucleusClient: wait_for_completion()
        NucleusClient->>API: "GET evaluationsV2/{id}"
        API-->>NucleusClient: "{status}"
    end

    User->>NucleusClient: "charts(iou_threshold=0.5)"
    NucleusClient->>API: "GET evaluationsV2/{id}/charts?iouThreshold=0.5"
    API-->>NucleusClient: EvaluationV2Charts
    NucleusClient-->>User: EvaluationV2Charts

    User->>NucleusClient: "examples(match_type=FP, limit=20)"
    NucleusClient->>API: "POST evaluationsV2/{id}/examples"
    API-->>NucleusClient: EvaluationV2ExamplesPage
    NucleusClient-->>User: EvaluationV2ExamplesPage

    User->>NucleusClient: delete()
    NucleusClient->>API: "DELETE evaluationsV2/{id}"
    API-->>NucleusClient: 200/204
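
The method-to-route mapping in the diagram can be sketched with a recording stand-in for the connection. Everything here is illustrative: `RecordingConnection`, `RouteSketch`, the `request` signature, and the POST body keys are hypothetical and are not the SDK's actual classes or payloads; only the HTTP verbs and routes are taken from the diagram.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingConnection:
    """Stand-in for an HTTP connection; records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def request(
        self, method: str, route: str, body: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        self.calls.append((method, route))
        return {}


class RouteSketch:
    """Hypothetical resource wrapper mirroring the routes in the diagram."""

    def __init__(self, connection: RecordingConnection, evaluation_id: str) -> None:
        self._conn = connection
        self.id = evaluation_id

    def charts(self, iou_threshold: float) -> Dict[str, Any]:
        route = f"evaluationsV2/{self.id}/charts?iouThreshold={iou_threshold}"
        return self._conn.request("GET", route)

    def examples(self, match_type: str, limit: int) -> Dict[str, Any]:
        # Assumed body shape: the diagram shows a POST with no query string.
        body = {"matchType": match_type, "limit": limit}
        return self._conn.request("POST", f"evaluationsV2/{self.id}/examples", body)

    def delete(self) -> None:
        self._conn.request("DELETE", f"evaluationsV2/{self.id}")


conn = RecordingConnection()
evaluation = RouteSketch(conn, "eval_123")
evaluation.charts(iou_threshold=0.5)
evaluation.examples(match_type="FP", limit=20)
evaluation.delete()
```

Recording calls this way is also roughly how a mocked-connection unit test could assert on the routes without touching the network.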

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
CHANGELOG.md:8
The release-tag URL in the 0.18.4 header points to `v0.18.3` instead of `v0.18.4`, so the changelog link will resolve to the wrong release.

```suggestion
## [0.18.4](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.4) - 2026-05-28
```
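
A mismatch like this one could be caught mechanically. The following is a rough sketch, assuming changelog headers follow the `## [x.y.z](url) - date` shape used in the suggestion; the function name and the sample text are made up for illustration.

```python
import re
from typing import List

HEADER_RE = re.compile(
    r"^## \[(?P<version>\d+\.\d+\.\d+)\]\((?P<url>\S+)\)", re.MULTILINE
)


def mismatched_release_links(changelog_text: str) -> List[str]:
    """Return versions whose header URL does not end in the matching v<version> tag."""
    bad = []
    for match in HEADER_RE.finditer(changelog_text):
        version, url = match.group("version"), match.group("url")
        if not url.endswith(f"/releases/tag/v{version}"):
            bad.append(version)
    return bad


# Illustrative sample reproducing the issue: the 0.18.4 header links to v0.18.3.
sample = """\
## [0.18.4](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.3) - 2026-05-28
## [0.18.3](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.18.3)
"""

problems = mismatched_release_links(sample)
```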

Reviews (7): Last reviewed commit: "Merge branch 'add-eval-capabilities' of ..."

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

edwinpav

Overall nice work!

Two main things:

- I'd make sure that the user-facing docs/descriptions are not overly complex. Not everyone will know or even care how the function works behind the scenes; they just care about the params, the returns, and the feature that the method provides.
- If you want to deploy a new SDK version with these changes, two more files need to be changed and added to this PR:
  1. CHANGELOG.md should be updated. The tag link that the CHANGELOG references will be created after this PR is merged into master. You'd add a new release with a new tag here: https://github.com/scaleapi/nucleus-python-client/releases. Feel free to ping me with any questions! The process isn't super clear lol
  2. The SDK version under tool.poetry should be updated in pyproject.toml (see #457 as a reference PR).
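
Since a release touches both files, a quick consistency check between them may help. This is only a sketch under assumptions: it reads the version with regexes rather than a TOML parser (on Python 3.11+, `tomllib` would be more robust), the helper names are hypothetical, and the sample file contents are invented.

```python
import re
from typing import Optional

# Captures the body of the [tool.poetry] table, up to the next table header.
POETRY_TABLE_RE = re.compile(
    r"^\[tool\.poetry\]\s*$(.*?)(?=^\[|\Z)", re.MULTILINE | re.DOTALL
)


def poetry_version(pyproject_text: str) -> Optional[str]:
    """Read `version = "..."` from the [tool.poetry] table."""
    table = POETRY_TABLE_RE.search(pyproject_text)
    if table is None:
        return None
    found = re.search(r'^version\s*=\s*"([^"]+)"', table.group(1), re.MULTILINE)
    return found.group(1) if found else None


def latest_changelog_version(changelog_text: str) -> Optional[str]:
    """Return the version in the first `## [x.y.z]` header, if any."""
    found = re.search(r"^## \[(\d+\.\d+\.\d+)\]", changelog_text, re.MULTILINE)
    return found.group(1) if found else None


# Invented sample contents, for illustration only.
pyproject_sample = """\
[tool.poetry]
name = "example-package"
version = "0.18.4"

[tool.poetry.dependencies]
python = "^3.8"
"""
changelog_sample = "## [0.18.4](https://example.invalid/releases/tag/v0.18.4)\n"

versions_match = poetry_version(pyproject_sample) == latest_changelog_version(
    changelog_sample
)
```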

edwinpav · 2026-05-27T15:46:48Z

+        self.__dict__.update(updated.__dict__)
+        return self
+
+    def wait_for_completion(


Is this needed because this is not integrated with NucleusJobs? I thought this type of functionality comes built in for the other async functions (dedup async also uses temporal)

luke-e-schaefer · May 28, 2026

correct yeah I don't have any ties back to the nuc jobs currently (since this stuff isn't "technically" in nucleus)...I could set that up tho that would be simple

edwinpav · May 28, 2026

oh i see, ig if it's in the nucleus sdk might be worth doing that if it's simple. if it shows up on the nucleus jobs page ui that's probably fine but that's probably a call you have more context on to make

luke-e-schaefer · May 28, 2026

yeah i think that's fine too. I'll run that in its own PR set tho after this one (i'll have to update scaleapi too)
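
For readers following the thread, a client-side polling loop of the kind being discussed might look roughly like this. It reuses the `self.__dict__.update(updated.__dict__)` refresh pattern from the diff hunk, but the class name, the `fetch` hook, the terminal status names, and the `poll_interval`/`timeout`/`sleep` parameters are all assumptions, not the PR's actual signature.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

TERMINAL_STATUSES = {"completed", "failed"}  # assumed status names


@dataclass
class PollingSketch:
    id: str
    status: str
    # Hypothetical hook returning a fresh copy of this evaluation from the API.
    fetch: Callable[[str], "PollingSketch"]

    def refresh(self) -> "PollingSketch":
        updated = self.fetch(self.id)
        # Copy the fresh fields onto this instance so existing references stay valid.
        self.__dict__.update(updated.__dict__)
        return self

    def wait_for_completion(
        self,
        poll_interval: float = 5.0,
        timeout: Optional[float] = None,
        sleep: Callable[[float], None] = time.sleep,
    ) -> "PollingSketch":
        deadline = None if timeout is None else time.monotonic() + timeout
        while self.refresh().status not in TERMINAL_STATUSES:
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError(f"evaluation {self.id} still {self.status!r}")
            sleep(poll_interval)
        return self


# Usage with a fake fetch that walks through a scripted status sequence.
statuses = iter(["running", "running", "completed"])


def fake_fetch(evaluation_id: str) -> PollingSketch:
    return PollingSketch(id=evaluation_id, status=next(statuses), fetch=fake_fetch)


evaluation = PollingSketch(id="eval_123", status="pending", fetch=fake_fetch)
evaluation.wait_for_completion(poll_interval=0, sleep=lambda _: None)
```

Injecting `sleep` keeps the loop testable without real delays; integrating with a jobs system, as suggested in the thread, would replace this loop rather than extend it.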

add eval capes to sdk

4c6083e

luke-e-schaefer requested review from edwinpav and vinay553 May 12, 2026 18:19

luke-e-schaefer self-assigned this May 12, 2026

greptile-apps Bot reviewed May 12, 2026

Comment thread nucleus/data_transfer_object/evaluation_v2.py Outdated

Comment thread nucleus/__init__.py Outdated

luke-e-schaefer and others added 2 commits May 12, 2026 13:49

Apply suggestion from @greptile-apps[bot]

36f6b4a

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Apply suggestion from @greptile-apps[bot]

3caaf8d

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps Bot reviewed May 12, 2026

Comment thread nucleus/data_transfer_object/evaluation_v2.py Outdated

luke-e-schaefer and others added 3 commits May 12, 2026 14:03

run hooks

13a91b2

merge remote

cce066e

Update nucleus/data_transfer_object/evaluation_v2.py

aced4aa

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps Bot reviewed May 12, 2026

Comment thread nucleus/data_transfer_object/evaluation_v2.py

fix p1

866ac71

edwinpav reviewed May 27, 2026

luke-e-schaefer added 2 commits May 28, 2026 17:31

address comments

6582163

Merge branch 'master' into add-eval-capabilities

ff6e671

luke-e-schaefer requested a review from edwinpav May 28, 2026 22:55

luke-e-schaefer added 2 commits May 28, 2026 18:21

fix lint

f88b665

Merge branch 'add-eval-capabilities' of https://github.com/scaleapi/n…

cd38ab6

…ucleus-python-client into add-eval-capabilities

luke-e-schaefer wants to merge 11 commits into master from add-eval-capabilities

2 participants