
feat(job): expose locate job definition#612

Open
RapidPoseidon wants to merge 6 commits into main from feat(job)/expose-locate-job-definition

Conversation

RapidPoseidon commented Jun 9, 2026

Contributor

What & why

Exposes the previously private _create_locate_job_definition as a public client.job.create_locate_job_definition, so customers can create locate jobs (labelers tap the points in a datapoint that match an instruction) directly from the SDK. It also documents the method.

This closes the gap raised in two Poseidon sessions: customers were told a locate endpoint exists, but it was private and undocumented in the 3.x docs.

While documenting locate, we also fixed a latent bug across all the 3.x example docs: every example created a brand-new empty audience and immediately assigned a job to it. A freshly created audience has no graduated labelers, so the job never collects responses and get_results() never returns; in other words, none of the examples ever ran to completion.

Changes

Locate job (original scope)

  • rapidata_job_manager.py — rename _create_locate_job_definition to create_locate_job_definition (now public; because mkdocstrings is configured with filters: ["!^_"], only the public name renders in the API reference).
  • docs/examples/locate_job.md — new worked example, wired into mkdocs.yml nav.
  • docs/job_definition_parameters.md — locate added to the datapoints table, a dedicated Locate section, and the parameter-availability matrix.
  • docs/starting_page.md — locate added to the "What you can do" landing-page table.
  • Docstring opener kept as "With this order…" to stay consistent with the other seven create_*_job_definition methods.
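The rename alone is what surfaces the method in the API reference. Below is a simplified sketch of how a filter list like ["!^_"] decides which members render. It approximates mkdocstrings' matching rule for illustration only and is not its actual implementation; is_documented is a hypothetical name.

```python
import re


def is_documented(name: str, filters: list[str]) -> bool:
    """Approximate mkdocstrings-style member filtering (illustrative only).

    A pattern prefixed with "!" hides matching names; any other pattern
    shows them. If several patterns match, the last one wins.
    """
    keep = None
    for pattern in filters:
        exclude = pattern.startswith("!")
        regex = pattern[1:] if exclude else pattern
        if re.search(regex, name):
            keep = not exclude
    # No pattern matched: show the name (the right default for exclude-only
    # lists such as ["!^_"]).
    return True if keep is None else keep


filters = ["!^_"]
for member in ("create_locate_job_definition", "_create_locate_job_definition"):
    print(member, is_documented(member, filters))
```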

Example docs that actually run (added in this PR)

  • docs/examples/classify_job.md and docs/examples/compare_job.md — restructured into Simple / Advanced content tabs (mirroring the 2.x docs):
    • Simple runs immediately against a curated audience (Coherence for classify, Alignment for compare) that already has graduated labelers.
    • Advanced builds a custom audience and trains labelers with qualification examples (add_classification_example / add_compare_example) before assigning the job.
  • docs/examples/locate_job.md — switched off the dead create_audience(...) onto the curated Global audience; no Advanced tab (the audience API has no locate qualification examples yet — noted inline).
  • docs/quickstart.md — same fix applied to Step 1 and the Complete Example.
  • All curated-audience lookups select the pool by exact name, via next(a for a in client.audience.find_audiences("X") if a.name == "X"), because find_audiences does a substring match sorted newest-first, so [0] would grab a stale custom audience and re-introduce the empty-audience bug.
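Why [0] goes wrong can be sketched with a hypothetical in-memory stand-in for find_audiences that follows the behaviour described above (substring match, newest first). The Audience class, the audience names and the dates are made up for illustration; the real SDK call talks to the API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Audience:
    name: str
    created_at: datetime


def find_audiences(pool: list[Audience], query: str) -> list[Audience]:
    """Hypothetical stand-in for client.audience.find_audiences:
    substring match on the name, newest first, as described in the PR."""
    matches = [a for a in pool if query in a.name]
    return sorted(matches, key=lambda a: a.created_at, reverse=True)


pool = [
    Audience("Alignment", datetime(2025, 1, 1)),            # curated, has graduated labelers
    Audience("Alignment (my test)", datetime(2026, 6, 9)),  # newer custom audience, no labelers
]

results = find_audiences(pool, "Alignment")
print(results[0].name)  # Alignment (my test): [0] picks the newer custom audience

curated = next(a for a in results if a.name == "Alignment")
print(curated.name)     # Alignment
```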

Consistency review (done in-session)

  • Naming follows the create_<type>_job_definition pattern of the public siblings (classification, compare). ✓
  • Shared parameter order matches siblings; correctly omits data_type / confidence_threshold / quorum_threshold, matching the parameter-availability matrix. ✓
  • Locate is now public across all three surfaces where it applies (order, validation, job). ✓
  • Example/quickstart audience selection is consistent across all four docs pages; verified against the live docs account that Coherence / Alignment / Global resolve and job definitions create. ✓
  • mkdocs build --strict introduces no new warnings (same 5 pre-existing mri.md + box.py baseline warnings). ✓

Note on deployment

docs.rapidata.ai is not auto-deployed on merge — the Deploy Documentation workflow is workflow_dispatch only. After merge, someone needs to run it (branch main, alias 3.x, set latest) for these docs + the new API-reference entry to go live.

🔗 Session: https://session-9964dbce.poseidon.rapidata.internal/

RapidPoseidon and others added 3 commits June 9, 2026 13:32
Promote create_locate_job_definition from private to public on the job
manager and add a Locate example to the docs (page + nav + llms.txt),
matching the existing classify/compare examples. The order-API equivalent
is deprecated in favour of jobs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: karl <karl@rapidata.ai>
Add a Locate column to the parameter availability matrix, a datapoints
format row, and a short Locate Job note. Locate has no job-specific
parameters and does not accept data_type / confidence_threshold /
quorum_threshold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: karl <karl@rapidata.ai>
Add the Locate job type to the "What you can do" table on the docs
landing page so it sits alongside Compare and Classify now that it is
public. Revert the locate docstring opener from "With this job" back to
"With this order" to match the other seven create_*_job_definition
methods in the manager.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: karl <karl@rapidata.ai>

claude Bot commented Jun 9, 2026


Code Review

Overview

This PR exposes the previously private _create_locate_job_definition method as public (create_locate_job_definition) and documents it across the SDK's three surfaces (order, validation, job). The change is minimal and focused: a single rename plus documentation additions.


Code Quality

rapidata_job_manager.py

  • The rename is clean: no internal callers of the old _create_locate_job_definition name exist anywhere in the codebase, so there's no breakage risk.
  • Naming, parameter order, and signature all match the public siblings (create_classification_job_definition, create_compare_job_definition). ✓
  • from __future__ import annotations and TYPE_CHECKING are already correctly in place. ✓
  • The tracer span name ("JobManager.create_locate_job") omits the _definition suffix, matching the pattern used by create_compare_job and create_classification_job spans. ✓

Docstring opener ("With this order…")
The PR acknowledges this intentional consistency compromise. Worth noting that all seven sibling create_*_job_definition docstrings have the same technically-imprecise opener — fixing it across the board would be the cleaner long-term move, but that's out of scope here.


Documentation

  • locate_job.md: Follows the same structure as classify_job.md and compare_job.md. The annotation callout is helpful.
  • job_definition_parameters.md: Locate is correctly added to the datapoints table, gets its own section, and the parameter availability matrix accurately reflects the missing data_type / confidence_threshold / quorum_threshold. ✓
  • starting_page.md: The table entry is consistent with the other entries.
  • mkdocs.yml: Wired into both the plugins nav and the top-level nav. ✓

Actionable Suggestions

  1. No test coverage — this is the main gap. The method was presumably tested indirectly before (e.g., via integration tests for orders using LocateWorkflow), but there's no unit test asserting the method is callable, builds the correct workflow, or passes the correct args to _create_general_job_definition. A simple happy-path unit test would guard against regressions when the signature is later extended.

  2. responses_per_datapoint=35 in the example has no explanation — the classify example uses 25 with no comment either, so this is consistent, but 35 is a specific number that may confuse readers. A callout annotation (like the one used for the instruction) explaining it's just a representative value would help.

  3. Example doesn't exercise contexts or media_contexts — given the use-case (locating artifacts in AI-generated images), adding a contexts example showing the generation prompt for each image would demonstrate a common real-world pattern.

  4. Pyright: Per CLAUDE.md, pyright src/rapidata/rapidata_client should be run before merging to confirm the rename introduces no type errors.
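For suggestion 1, a happy-path test could look roughly like the sketch below. The SDK's real signature is not shown in this thread, so JobManager here is a hypothetical stand-in: only the method names create_locate_job_definition and _create_general_job_definition and the default responses_per_datapoint=10 come from the PR discussion, while the name, instruction and workflow arguments are assumptions.

```python
import unittest
from unittest import mock


class JobManager:
    """Hypothetical stand-in for the SDK job manager (not the real class)."""

    def _create_general_job_definition(self, **kwargs):
        raise NotImplementedError("would call the API in the real SDK")

    def create_locate_job_definition(self, name, instruction, datapoints,
                                     responses_per_datapoint=10):
        # Assumed shape: build a locate workflow, then delegate.
        return self._create_general_job_definition(
            name=name,
            workflow={"type": "locate", "instruction": instruction},
            datapoints=datapoints,
            responses_per_datapoint=responses_per_datapoint,
        )


class TestCreateLocateJobDefinition(unittest.TestCase):
    def test_public_method_delegates_with_expected_args(self):
        manager = JobManager()
        # Replace the network-bound helper so only the forwarding is tested.
        with mock.patch.object(manager, "_create_general_job_definition") as general:
            manager.create_locate_job_definition(
                name="Artifacts",
                instruction="Tap the AI-generated artifacts",
                datapoints=["https://example.com/a.jpg"],
            )
        general.assert_called_once()
        kwargs = general.call_args.kwargs
        self.assertEqual(kwargs["workflow"]["type"], "locate")
        self.assertEqual(kwargs["datapoints"], ["https://example.com/a.jpg"])
        self.assertEqual(kwargs["responses_per_datapoint"], 10)


# Run the suite in-process (without unittest.main(), which calls sys.exit).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCreateLocateJobDefinition)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

In the real repository the test would patch the actual manager class instead, and the assertions would follow whatever arguments the method really forwards.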


Summary

Clean, well-scoped PR. The core change is a single-line rename with no downstream callers to update. Documentation is thorough and consistent. The only material gap is test coverage for the newly public method. Everything else is minor polish.

Karl-The-Man requested a review from LinoGiger June 9, 2026 17:41
…ce results

The classify/compare/locate example pages created a brand-new empty audience
and immediately assigned a job to it. A fresh audience has no graduated
labelers, so the job never collected responses and the examples never produced
results.

Mirror the 2.x Basic/Advanced split using the 3.x audience model:
- Simple tab runs on a curated audience that already has trained labelers
  (classify -> Coherence, compare -> Alignment, locate -> Global Audience),
  selected by exact name so -created_at ordering can't pick a stale custom
  audience.
- Advanced tab (classify, compare) builds a custom audience and trains it with
  qualification examples before assigning the job. Locate has no Advanced tab
  because the audience API exposes no locate qualification examples yet.

Co-Authored-By: karl <karl@rapidata.ai>

claude Bot commented Jun 11, 2026


Code Review: feat(job): expose locate job definition

Overview

This PR exposes create_locate_job_definition as a public method (renaming it from _create_locate_job_definition) and adds documentation, including a new example page and updates to the parameter matrix and starting page. The classify and compare example docs are also refactored to a Simple/Advanced tab layout.

The core change is a single-character rename. The implementation was already solid — this PR makes it accessible to customers.


Code Quality

rapidata_job_manager.py — rename only (lines 501–548)

  • The rename is correct and consistent with the create_classification_job_definition / create_compare_job_definition naming pattern.
  • The method signature, docstring, and implementation are identical to the private version — no regressions introduced.

private_metadata is missing from the parameter availability matrix

All three public create_*_job_definition methods accept private_metadata, but it doesn't appear anywhere in docs/job_definition_parameters.md. This pre-dates this PR, but since the matrix is being updated here, it would be a good time to add it (or document it in the core parameters section) to keep the matrix accurate.


Documentation

docs/examples/locate_job.md (new)

Clear and well-scoped. The note that locate doesn't support custom qualification examples sets the right expectations.

The next(a for a in ... if a.name == ...) pattern is used in all three examples. It will raise StopIteration with a confusing traceback if the named audience doesn't exist (e.g. a typo, a region without that audience, or a renamed audience). Consider documenting this in the callout annotation, or using the safer next(..., None) form with an explicit check. The same issue exists in the updated classify and compare examples.
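Here is a minimal sketch of the next(..., None) form wrapped in a reusable helper. pick_by_exact_name and the Audience dataclass are hypothetical, and the helper assumes only that each candidate has a .name attribute, as the audiences in the examples do.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol, TypeVar


class HasName(Protocol):
    name: str


T = TypeVar("T", bound=HasName)


def pick_by_exact_name(candidates: Iterable[T], name: str) -> T:
    """Return the first candidate whose .name equals `name` exactly.

    next(..., None) never raises StopIteration; the explicit ValueError
    says what went wrong instead.
    """
    match = next((c for c in candidates if c.name == name), None)
    if match is None:
        raise ValueError(f"No audience named {name!r}; check the spelling and the Dashboard")
    return match


@dataclass
class Audience:  # hypothetical stand-in for what find_audiences returns
    name: str


audiences = [Audience("Global"), Audience("Alignment")]
print(pick_by_exact_name(audiences, "Alignment").name)

try:
    pick_by_exact_name(audiences, "Alignmnet")  # typo
except ValueError as err:
    print(err)
```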

docs/job_definition_parameters.md

  • The matrix correctly shows contexts and media_contexts as available for Locate — matching the method signature.
  • The Locate section states "Locate has no job-specific parameters" — accurate and clear.
  • Minor inconsistency: the code sample uses bare filename strings ("image1.jpg", "image2.jpg") while the actual example uses full URLs. Using full URLs or placeholder URLs would be more consistent with the other sections.

docs/examples/classify_job.md and compare_job.md (refactored)

  • The Simple/Advanced tab structure is a meaningful improvement — it cleanly separates the "just run it" path from the "build your own audience" path.
  • The qualification example bias warning is a valuable addition.
  • In the Advanced classify tab, settings=[NoShuffleSetting()] is annotated on the add_classification_example call but not on the identical argument in the job_definition block just below. The asymmetry could confuse readers wondering whether it's intentional.

Potential Issues

  1. No tests — There are no new tests for create_locate_job_definition. If the private method had test coverage, verify those tests still pass under the renamed method. If there is no test coverage for job manager methods generally, this is a pre-existing gap rather than a regression introduced here.

  2. __all__ / exports — If any module uses __all__ to gate the public API, confirm create_locate_job_definition is reachable via client.job. Based on the pattern of the other public methods this should already be fine.


Summary

This is a clean, low-risk change. The rename is the right approach and the docs are accurate. The Simple/Advanced example refactor is a genuine improvement to the existing pages.

Suggested items before merge:

  • Should address: next(...) in all three examples raises StopIteration with no useful message when the audience name doesn't match. Document the risk in the callout or use next(..., None) with an explicit error.
  • Consider: Add private_metadata to the parameter availability matrix — this PR is already touching that table.
  • Minor: Replace bare filenames in the Locate docs section with URL examples for consistency with the other sections.

claude Bot commented Jun 11, 2026


Code Review

Overview

This PR exposes the previously private _create_locate_job_definition as a public create_locate_job_definition, adds a new locate job example page, and refactors the classify/compare example pages to show Simple/Advanced tabs with curated vs custom audiences. The core change is small and well-motivated.


Correctness Issues

1. Missing input validation on the newly-public method (src/rapidata/rapidata_client/job/rapidata_job_manager.py:501)

create_classification_job_definition and create_compare_job_definition both validate the datapoints argument at the start of the method body before doing anything else. create_locate_job_definition has no such check:

# classification does this:
if not isinstance(datapoints, list) or not all(
    isinstance(datapoint, str) for datapoint in datapoints
):
    raise ValueError("Datapoints must be a list of strings")

Since this is now a public API surface, passing a wrong type will currently fall through to an internal DatapointsValidator.map_datapoints call and produce a confusing error. Add the same guard for consistency.
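One way to apply the same guard without copy-pasting it a third time is a small shared helper. validate_datapoints is a hypothetical name; its body is the check quoted above.

```python
def validate_datapoints(datapoints: object) -> list[str]:
    """Hypothetical shared guard, using the same check as the classification method.

    Rejecting bad input up front gives a clear message instead of a
    confusing error from deeper in the call stack.
    """
    if not isinstance(datapoints, list) or not all(
        isinstance(datapoint, str) for datapoint in datapoints
    ):
        raise ValueError("Datapoints must be a list of strings")
    return datapoints


# A single URL string (a common slip) is rejected as well as non-string items.
for candidate in (["https://example.com/a.jpg"], "https://example.com/a.jpg", [1, 2]):
    try:
        validate_datapoints(candidate)
        print("ok:", candidate)
    except ValueError as err:
        print("rejected:", candidate, "->", err)
```

create_locate_job_definition could then call validate_datapoints(datapoints) as its first statement, matching its siblings.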

2. Stale find_audiences pattern in docs/starting_page.md (lines 21 and 44)

The PR correctly migrates quickstart.md, classify_job.md, and compare_job.md to the safer next(a for a in ... if a.name == ...) pattern, but starting_page.md still uses the old find_audiences("alignment")[0] on both snippet tabs. This leaves incorrect example code on the landing page.


Minor Observations

3. next() without a default — silent StopIteration risk

The new pattern used across all examples:

audience = next(
    a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment"
)

…raises StopIteration if the audience doesn't exist; if that happens inside a generator, Python 3.7+ re-raises it as RuntimeError (PEP 479). The old [0] raised IndexError. Neither is ideal for copy-paste docs. Consider adding a helpful default or a note, e.g.:

audience = next(
    (a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment"),
    None
)
if audience is None:
    raise ValueError("Curated 'Alignment' audience not found — check the Dashboard")

This is a docs concern rather than a blocker, but the landing-page examples especially will be copied verbatim by new users.
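The failure modes of a bare next() can be reproduced in plain Python, with strings standing in for audiences. Besides the RuntimeError from PEP 479, a StopIteration raised inside map() is treated as the end of iteration, which is where a truly silent failure can occur. first_match and resolve_all are hypothetical names used only for this illustration.

```python
def first_match(names: list[str], wanted: str) -> str:
    # Bare next() with no default: raises StopIteration when nothing matches.
    return next(n for n in names if n == wanted)


def resolve_all(names: list[str], wanted: list[str]):
    # A generator that calls first_match(). Since PEP 479 (Python 3.7+), a
    # StopIteration escaping a generator body is re-raised as RuntimeError.
    for w in wanted:
        yield first_match(names, w)


names = ["Global", "Alignment"]

# 1. Direct call: a plain StopIteration with no hint about which name was missing.
try:
    first_match(names, "Alignmnet")
except StopIteration:
    print("direct call: StopIteration")

# 2. Inside a generator: RuntimeError("generator raised StopIteration").
try:
    list(resolve_all(names, ["Global", "Alignmnet"]))
except RuntimeError as err:
    print("inside a generator:", err)

# 3. Inside map(): the StopIteration is taken as "end of iteration", so
#    everything from the missing name onward is dropped without any error.
truncated = list(map(lambda w: first_match(names, w), ["Global", "Alignmnet", "Alignment"]))
print("inside map():", truncated)
```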

4. locate_job.md example uses responses_per_datapoint=35

The function signature defaults to 10. 35 is perfectly valid as an example value, but a brief comment (or a footnote annotation) explaining why locate uses a higher count than the default might help users calibrate.

5. Parameter matrix row for contexts / media_contexts in locate

The matrix marks both :white_check_mark: for locate, which matches the method signature. I also checked the source: the implementation does pass both through DatapointsValidator.map_datapoints correctly.


What's Well Done

  • The rename itself is a one-line change and correctly removes only the underscore prefix — no logic changes, low risk.
  • The docstring opener ("With this order…") is deliberately kept consistent with the seven sibling methods.
  • The parameter-availability matrix is accurate and clearly shows what locate omits (data_type, confidence_threshold, quorum_threshold, answer_options, a_b_names).
  • The Simple/Advanced tab refactor on classify_job.md and compare_job.md is a genuine improvement — the warning block about qualification example quality is particularly good.
  • The PR description notes the manual deploy step required for docs — good operational awareness.

Summary

Two things worth fixing before merge:

  1. Add the missing datapoints type guard in create_locate_job_definition to match the pattern of its public siblings.
  2. Update docs/starting_page.md lines 21 and 44 to use the new next(... if a.name == ...) pattern to keep the landing page consistent with the rest of the docs.

Everything else is advisory.

LinoGiger left a comment

Collaborator


I think it would also make sense to add the ability to add locate examples to the audience.

Comment thread docs/examples/classify_job.md Outdated
Comment thread docs/examples/classify_job.md Outdated
Comment thread docs/examples/compare_job.md Outdated
Comment thread docs/examples/compare_job.md Outdated
Comment thread docs/examples/locate_job.md Outdated
Comment thread docs/examples/locate_job.md Outdated
Addresses review feedback on PR #612:
- Replace the brittle next(... if a.name ==) audience lookup with
  get_audience_by_id across the example, quickstart and landing pages
  (Coherence=aud_mr3NbeWa4Uo, Alignment=aud_MU1GZYoESyO, global="global").
- Move the qualification-example caveat to a short reference link and add
  a warning above the Advanced code blocks that building a new audience
  takes significantly longer.
- Correct the locate prose: a locate job can run on any audience, not
  only the global one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: karl <karl@rapidata.ai>
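The ID-based lookup this commit switches to can be sketched as below. The IDs are copied from the commit message and get_audience_by_id is the method it names, but the single-argument signature is an assumption; CURATED_AUDIENCE_IDS, curated_audience and the fake client are hypothetical. Unlike a name search, an ID lookup cannot pick up a similarly named custom audience.

```python
from types import SimpleNamespace

# Curated audience IDs as listed in the commit message above.
CURATED_AUDIENCE_IDS = {
    "Coherence": "aud_mr3NbeWa4Uo",
    "Alignment": "aud_MU1GZYoESyO",
    "Global": "global",
}


def curated_audience(client, name: str):
    """Hypothetical helper: resolve a curated audience by its fixed ID.

    Assumes client.audience.get_audience_by_id takes the ID as its only argument.
    """
    audience_id = CURATED_AUDIENCE_IDS.get(name)
    if audience_id is None:
        raise ValueError(
            f"Unknown curated audience {name!r}; expected one of {sorted(CURATED_AUDIENCE_IDS)}"
        )
    return client.audience.get_audience_by_id(audience_id)


# Fake client for illustration; a real client would call the Rapidata API here.
fake_client = SimpleNamespace(
    audience=SimpleNamespace(get_audience_by_id=lambda audience_id: {"id": audience_id})
)
print(curated_audience(fake_client, "Alignment"))  # {'id': 'aud_MU1GZYoESyO'}
```

Keeping the IDs in one mapping means a renamed or replaced curated audience needs only a single change.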
LinoGiger self-requested a review June 11, 2026 13:18

LinoGiger left a comment

Collaborator


I think this is mostly good now; it's just missing the ability to add locate examples. Maybe we can also have an Advanced example for that?


Labels

None yet

Projects

None yet


3 participants