From 190c1f1bba777d5e49333bb5980d1fce9e9a3680 Mon Sep 17 00:00:00 2001 From: RapidPoseidon Date: Tue, 9 Jun 2026 13:32:17 +0000 Subject: [PATCH 01/10] feat(job): expose locate job definition Promote create_locate_job_definition from private to public on the job manager and add a Locate example to the docs (page + nav + llms.txt), matching the existing classify/compare examples. The order-API equivalent is deprecated in favour of jobs. Co-Authored-By: Claude Opus 4.8 (1M context) Co-Authored-By: karl --- docs/examples/locate_job.md | 35 +++++++++++++++++++ mkdocs.yml | 2 ++ .../job/rapidata_job_manager.py | 4 +-- 3 files changed, 39 insertions(+), 2 deletions(-) create mode 100644 docs/examples/locate_job.md diff --git a/docs/examples/locate_job.md b/docs/examples/locate_job.md new file mode 100644 index 000000000..601dedab5 --- /dev/null +++ b/docs/examples/locate_job.md @@ -0,0 +1,35 @@ +# Locate Job Example + +To learn about the basics of creating a job, please refer to the [quickstart guide](../quickstart.md). + +In a locate job, labelers tap the points in a datapoint that match your instruction. In this example, we ask people to point out visual artifacts in AI-generated images — a common way to find where a generator went wrong. + +```python +from rapidata import RapidataClient + +IMAGE_URLS = [ + "https://assets.rapidata.ai/eac11c3e-ad57-402b-90ed-23378d2ff869.jpg", + "https://assets.rapidata.ai/04e7e3c6-5554-47ca-bdb2-950e48ac3e6c.jpg", + "https://assets.rapidata.ai/91d9913c-b399-47f8-ad19-767798cc951c.jpg", +] + +client = RapidataClient() + +audience = client.audience.create_audience(name="Artifact Detection Audience") + +job_definition = client.job.create_locate_job_definition( + name="Artifact Detection Example", + instruction="Tap on any visual glitches or errors in the image.", # (1)! 
+ datapoints=IMAGE_URLS, + responses_per_datapoint=35, +) + +job_definition.preview() + +job = audience.assign_job(job_definition) +job.display_progress_bar() +results = job.get_results() +print(results) +``` + +1. The instruction tells labelers what to locate. Each response is the set of points they tapped on that datapoint. diff --git a/mkdocs.yml b/mkdocs.yml index f46934382..793dc2714 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -68,6 +68,7 @@ plugins: Examples: - examples/classify_job.md - examples/compare_job.md + - examples/locate_job.md Benchmarks: - mri.md - mri_advanced.md @@ -124,6 +125,7 @@ nav: - Examples: - Classification: examples/classify_job.md - Comparison: examples/compare_job.md + - Locate: examples/locate_job.md - Ranking Flows: flows.md - Model Ranking: - Getting Started: mri.md diff --git a/src/rapidata/rapidata_client/job/rapidata_job_manager.py b/src/rapidata/rapidata_client/job/rapidata_job_manager.py index 4bc01ae4f..292218505 100644 --- a/src/rapidata/rapidata_client/job/rapidata_job_manager.py +++ b/src/rapidata/rapidata_client/job/rapidata_job_manager.py @@ -498,7 +498,7 @@ def _create_select_words_job_definition( settings=settings, ) - def _create_locate_job_definition( + def create_locate_job_definition( self, name: str, instruction: str, @@ -511,7 +511,7 @@ def _create_locate_job_definition( ) -> RapidataJobDefinition: """Create a locate job definition. - With this order you can have people locate specific objects in a datapoint (image, text, video, audio). + With this job you can have people locate specific objects in a datapoint (image, text, video, audio). The annotators will be shown a datapoint and will be asked to select locations based on the instruction. 
Args: From 1e80746e90db0baf8a54444f8be65a3cfcef28ed Mon Sep 17 00:00:00 2001 From: RapidPoseidon Date: Tue, 9 Jun 2026 13:49:20 +0000 Subject: [PATCH 02/10] docs(job): add locate to parameter reference Add a Locate column to the parameter availability matrix, a datapoints format row, and a short Locate Job note. Locate has no job-specific parameters and does not accept data_type / confidence_threshold / quorum_threshold. Co-Authored-By: Claude Opus 4.8 (1M context) Co-Authored-By: karl --- docs/job_definition_parameters.md | 41 ++++++++++++++++++++----------- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/docs/job_definition_parameters.md b/docs/job_definition_parameters.md index 188c8b6e7..b0b5697f2 100644 --- a/docs/job_definition_parameters.md +++ b/docs/job_definition_parameters.md @@ -67,6 +67,7 @@ The data to be labeled. The format depends on the job type: |----------|--------|-------------| | Classification | `list[str]` | Single items to classify | | Compare | `list[list[str]]` | Pairs of items (exactly 2 per inner list) | +| Locate | `list[str]` | Single items to locate within | **Supported Formats:** @@ -337,21 +338,33 @@ job_definition = client.job.create_compare_job_definition( ) ``` +### Locate Job + +Locate has no job-specific parameters — it uses only the core parameters. The `instruction` describes what labelers should locate, and each response is the set of points they tapped on the datapoint. 
+ +```python +job_definition = client.job.create_locate_job_definition( + name="Artifact Detection", + instruction="Tap on any visual glitches or errors in the image.", + datapoints=["image1.jpg", "image2.jpg"], +) +``` + --- ## Parameter Availability Matrix -| Parameter | Classification | Compare | -|-----------|:-:|:-:| -| `name` | :white_check_mark: | :white_check_mark: | -| `instruction` | :white_check_mark: | :white_check_mark: | -| `datapoints` | :white_check_mark: | :white_check_mark: | -| `responses_per_datapoint` | :white_check_mark: | :white_check_mark: | -| `data_type` | :white_check_mark: | :white_check_mark: | -| `contexts` | :white_check_mark: | :white_check_mark: | -| `media_contexts` | :white_check_mark: | :white_check_mark: | -| `confidence_threshold` | :white_check_mark: | :white_check_mark: | -| `quorum_threshold` | :white_check_mark: | :white_check_mark: | -| `settings` | :white_check_mark: | :white_check_mark: | -| `answer_options` | :white_check_mark: | :x: | -| `a_b_names` | :x: | :white_check_mark: | +| Parameter | Classification | Compare | Locate | +|-----------|:-:|:-:|:-:| +| `name` | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| `instruction` | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| `datapoints` | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| `responses_per_datapoint` | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| `data_type` | :white_check_mark: | :white_check_mark: | :x: | +| `contexts` | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| `media_contexts` | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| `confidence_threshold` | :white_check_mark: | :white_check_mark: | :x: | +| `quorum_threshold` | :white_check_mark: | :white_check_mark: | :x: | +| `settings` | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| `answer_options` | :white_check_mark: | :x: | :x: | +| `a_b_names` | :x: | 
:white_check_mark: | :x: | From 93aba031bd9f56f1d765171cf660d7d2a30baad4 Mon Sep 17 00:00:00 2001 From: RapidPoseidon Date: Tue, 9 Jun 2026 14:03:01 +0000 Subject: [PATCH 03/10] docs(job): list locate on landing page and align docstring wording Add the Locate job type to the "What you can do" table on the docs landing page so it sits alongside Compare and Classify now that it is public. Revert the locate docstring opener from "With this job" back to "With this order" to match the other seven create_*_job_definition methods in the manager. Co-Authored-By: Claude Opus 4.8 (1M context) Co-Authored-By: karl --- docs/starting_page.md | 1 + src/rapidata/rapidata_client/job/rapidata_job_manager.py | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/starting_page.md b/docs/starting_page.md index 63b902496..d11ccb664 100644 --- a/docs/starting_page.md +++ b/docs/starting_page.md @@ -149,5 +149,6 @@ The SDK is built around three concepts: |---|---|---| | **Compare** | Side-by-side comparison of images, video, audio, or text | [Comparison example](examples/compare_job.md) | | **Classify** | Categorize data with custom labels or Likert scales | [Classification example](examples/classify_job.md) | +| **Locate** | Point out objects, artifacts, or regions within an image | [Locate example](examples/locate_job.md) | | **Rank models** | Benchmark AI models on leaderboards with human evaluation | [Model Ranking](mri.md) | | **Continuous ranking** | Lightweight ongoing ranking without full job setup | [Ranking Flows](flows.md) | diff --git a/src/rapidata/rapidata_client/job/rapidata_job_manager.py b/src/rapidata/rapidata_client/job/rapidata_job_manager.py index 292218505..e83172de2 100644 --- a/src/rapidata/rapidata_client/job/rapidata_job_manager.py +++ b/src/rapidata/rapidata_client/job/rapidata_job_manager.py @@ -511,7 +511,7 @@ def create_locate_job_definition( ) -> RapidataJobDefinition: """Create a locate job definition. 
- With this job you can have people locate specific objects in a datapoint (image, text, video, audio). + With this order you can have people locate specific objects in a datapoint (image, text, video, audio). The annotators will be shown a datapoint and will be asked to select locations based on the instruction. Args: From 290e60b3d581e8020aacd37b5e51a978953f048b Mon Sep 17 00:00:00 2001 From: Poseidon Date: Thu, 11 Jun 2026 08:58:00 +0000 Subject: [PATCH 04/10] fix(docs): run example jobs on curated/custom audiences so they produce results The classify/compare/locate example pages created a brand-new empty audience and immediately assigned a job to it. A fresh audience has no graduated labelers, so the job never collected responses and the examples never produced results. Mirror the 2.x Basic/Advanced split using the 3.x audience model: - Simple tab runs on a curated audience that already has trained labelers (classify -> Coherence, compare -> Alignment, locate -> Global Audience), selected by exact name so -created_at ordering can't pick a stale custom audience. - Advanced tab (classify, compare) builds a custom audience and trains it with qualification examples before assigning the job. Locate has no Advanced tab because the audience API exposes no locate qualification examples yet. Co-Authored-By: karl --- docs/examples/classify_job.md | 151 +++++++++++++++++++++--------- docs/examples/compare_job.md | 171 ++++++++++++++++++++++++---------- docs/examples/locate_job.md | 11 ++- 3 files changed, 240 insertions(+), 93 deletions(-) diff --git a/docs/examples/classify_job.md b/docs/examples/classify_job.md index 5f7dd807d..8c97c7904 100644 --- a/docs/examples/classify_job.md +++ b/docs/examples/classify_job.md @@ -2,46 +2,111 @@ To learn about the basics of creating a job, please refer to the [quickstart guide](../quickstart.md). -In this example, we want to rate different images based on a Likert scale to assess how well generated images match their descriptions. 
The `NoShuffleSetting` setting ensures answer options remain in order since they represent a scale. - -```python -from rapidata import RapidataClient, NoShuffleSetting - -IMAGE_URLS = [ - "https://assets.rapidata.ai/tshirt-4o.png", - "https://assets.rapidata.ai/tshirt-aurora.jpg", - "https://assets.rapidata.ai/teamleader-aurora.jpg", -] - -CONTEXTS = ["A t-shirt with the text 'Running on caffeine & dreams'"] * len(IMAGE_URLS) - -client = RapidataClient() - -audience = client.audience.create_audience(name="Likert Scale Audience") -audience.add_classification_example( - instruction="How well does the image match the description?", - answer_options=["1: Not at all", "2: A little", "3: Moderately", "4: Very well", "5: Perfectly"], - datapoint="https://assets.rapidata.ai/tshirt-4o.png", - truth=["5: Perfectly"], - context="A t-shirt with the text 'Running on caffeine & dreams'" -) - -job_definition = client.job.create_classification_job_definition( - name="Likert Scale Example", - instruction="How well does the image match the description?", - answer_options=["1: Not at all", "2: A little", "3: Moderately", "4: Very well", "5: Perfectly"], - contexts=CONTEXTS, - datapoints=IMAGE_URLS, - responses_per_datapoint=25, - settings=[NoShuffleSetting()] # (1)! -) - -job_definition.preview() - -job = audience.assign_job(job_definition) -job.display_progress_bar() -results = job.get_results() -print(results) -``` - -1. Keeps the answer options in their specified order. Without this, options are randomized to reduce bias — but for Likert scales you want them ordered. +In this example, we rate images on a Likert scale to assess how well generated images match their descriptions. The `NoShuffleSetting` keeps the answer options in order, since they represent a scale. + +=== "Simple" + + The simple version runs straight away on a **curated** audience — a pre-existing pool of trained labelers — so the job starts collecting responses immediately. 
+ + ```python + from rapidata import RapidataClient, NoShuffleSetting + + IMAGE_URLS = [ + "https://assets.rapidata.ai/tshirt-4o.png", + "https://assets.rapidata.ai/tshirt-aurora.jpg", + "https://assets.rapidata.ai/teamleader-aurora.jpg", + ] + + CONTEXTS = ["A t-shirt with the text 'Running on caffeine & dreams'"] * len(IMAGE_URLS) + + client = RapidataClient() + + audience = next( # (1)! + a for a in client.audience.find_audiences("Coherence") if a.name == "Coherence" + ) + + job_definition = client.job.create_classification_job_definition( + name="Likert Scale Example", + instruction="How well does the image match the description?", + answer_options=["1: Not at all", "2: A little", "3: Moderately", "4: Very well", "5: Perfectly"], + contexts=CONTEXTS, + datapoints=IMAGE_URLS, + responses_per_datapoint=25, + settings=[NoShuffleSetting()] # (2)! + ) + + job_definition.preview() + + job = audience.assign_job(job_definition) + job.display_progress_bar() + results = job.get_results() + print(results) + ``` + + 1. Grabs the curated **Coherence** audience, which already has trained labelers. A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses — see the Advanced tab for how to build and train your own. You can browse the curated audiences in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). + 2. Keeps the answer options in their specified order. Without this, options are randomized to reduce bias — but for Likert scales you want them ordered. + +=== "Advanced" + + The advanced version builds a **custom** audience and trains labelers with qualification examples before running the job. Only labelers who answer the examples correctly join the audience, which raises label quality on nuanced tasks. 
+ + ```python + from rapidata import RapidataClient, NoShuffleSetting + + IMAGE_URLS = [ + "https://assets.rapidata.ai/tshirt-4o.png", + "https://assets.rapidata.ai/tshirt-aurora.jpg", + "https://assets.rapidata.ai/teamleader-aurora.jpg", + ] + + CONTEXTS = ["A t-shirt with the text 'Running on caffeine & dreams'"] * len(IMAGE_URLS) + + ANSWER_OPTIONS = ["1: Not at all", "2: A little", "3: Moderately", "4: Very well", "5: Perfectly"] + + # Qualification examples — each pairs an image with a description and the + # correct rating. Use only examples whose truth is clear and unambiguous. + EXAMPLES = [ + ("https://assets.rapidata.ai/tshirt-4o.png", "A t-shirt with the text 'Running on caffeine & dreams'", "5: Perfectly"), + ("https://assets.rapidata.ai/flux_duck.jpg", "A psychedelic duck with glasses", "5: Perfectly"), + ("https://assets.rapidata.ai/flux_flower.jpg", "A yellow flower sticking out of a green pot", "5: Perfectly"), + ("https://assets.rapidata.ai/teamleader-aurora.jpg", "A t-shirt with the text 'Running on caffeine & dreams'", "1: Not at all"), + ("https://assets.rapidata.ai/flux_book.jpg", "A psychedelic duck with glasses", "1: Not at all"), + ("https://assets.rapidata.ai/flux_duck.jpg", "A small blue book sitting on a large red book", "1: Not at all"), + ] + + client = RapidataClient() + + audience = client.audience.create_audience(name="Likert Scale Audience") # (1)! + for datapoint, context, truth in EXAMPLES: + audience.add_classification_example( + instruction="How well does the image match the description?", + answer_options=ANSWER_OPTIONS, + datapoint=datapoint, + truth=[truth], + context=context, + settings=[NoShuffleSetting()] # (2)! 
+ ) + + job_definition = client.job.create_classification_job_definition( + name="Likert Scale Example", + instruction="How well does the image match the description?", + answer_options=ANSWER_OPTIONS, + contexts=CONTEXTS, + datapoints=IMAGE_URLS, + responses_per_datapoint=25, + settings=[NoShuffleSetting()] + ) + + job_definition.preview() + + job = audience.assign_job(job_definition) + job.display_progress_bar() + results = job.get_results() + print(results) + ``` + + 1. Creates a new, empty audience. The `add_classification_example` calls below define who qualifies to join it. + 2. Qualify labelers on the same UI they'll see in the job. Since the job uses `NoShuffleSetting`, the examples use it too — see [Custom Audiences](../audiences.md#matching-the-job-ui-with-settings). + + !!! warning "Review your qualification examples carefully" + Every qualification example and its truth must be reviewed before use. An example with a wrong or ambiguous truth filters out good labelers while letting bad ones through — inverting your quality control. Use only examples with a clear, unambiguous correct answer, and add more than the few shown here for production workloads. See [Custom Audiences](../audiences.md) for the full guide. diff --git a/docs/examples/compare_job.md b/docs/examples/compare_job.md index a0e3d722e..d3b4aede9 100644 --- a/docs/examples/compare_job.md +++ b/docs/examples/compare_job.md @@ -4,50 +4,127 @@ To learn about the basics of creating a job, please refer to the [quickstart gui In this example, we compare images from two image generation models (Flux and Midjourney) to determine which more accurately follows the given prompts. -```python -from rapidata import RapidataClient - -PROMPTS = [ - "A sign that says 'Diffusion'.", - "A yellow flower sticking out of a green pot.", - "hyperrealism render of a surreal alien humanoid.", - "psychedelic duck", - "A small blue book sitting on a large red book." 
-] - -IMAGE_PAIRS = [ - ["https://assets.rapidata.ai/flux_sign_diffusion.jpg", "https://assets.rapidata.ai/mj_sign_diffusion.jpg"], - ["https://assets.rapidata.ai/flux_flower.jpg", "https://assets.rapidata.ai/mj_flower.jpg"], - ["https://assets.rapidata.ai/flux_alien.jpg", "https://assets.rapidata.ai/mj_alien.jpg"], - ["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"], - ["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"] -] - -client = RapidataClient() - -audience = client.audience.create_audience(name="Prompt Alignment Audience") -audience.add_compare_example( - instruction="Which image follows the prompt more accurately?", - datapoint=[ - "https://assets.rapidata.ai/flux_sign_diffusion.jpg", - "https://assets.rapidata.ai/mj_sign_diffusion.jpg" - ], - truth="https://assets.rapidata.ai/flux_sign_diffusion.jpg", - context="A sign that says 'Diffusion'." -) - -job_definition = client.job.create_compare_job_definition( - name="Example Image Prompt Alignment Job", - instruction="Which image follows the prompt more accurately?", - datapoints=IMAGE_PAIRS, - responses_per_datapoint=25, - contexts=PROMPTS -) - -job_definition.preview() - -job = audience.assign_job(job_definition) -job.display_progress_bar() -results = job.get_results() -print(results) -``` +=== "Simple" + + The simple version runs straight away on a **curated** audience — a pre-existing pool of trained labelers — so the job starts collecting responses immediately. + + ```python + from rapidata import RapidataClient + + PROMPTS = [ + "A sign that says 'Diffusion'.", + "A yellow flower sticking out of a green pot.", + "hyperrealism render of a surreal alien humanoid.", + "psychedelic duck", + "A small blue book sitting on a large red book." 
+ ] + + IMAGE_PAIRS = [ + ["https://assets.rapidata.ai/flux_sign_diffusion.jpg", "https://assets.rapidata.ai/mj_sign_diffusion.jpg"], + ["https://assets.rapidata.ai/flux_flower.jpg", "https://assets.rapidata.ai/mj_flower.jpg"], + ["https://assets.rapidata.ai/flux_alien.jpg", "https://assets.rapidata.ai/mj_alien.jpg"], + ["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"], + ["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"] + ] + + client = RapidataClient() + + audience = next( # (1)! + a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment" + ) + + job_definition = client.job.create_compare_job_definition( + name="Example Image Prompt Alignment Job", + instruction="Which image follows the prompt more accurately?", + datapoints=IMAGE_PAIRS, + responses_per_datapoint=25, + contexts=PROMPTS + ) + + job_definition.preview() + + job = audience.assign_job(job_definition) + job.display_progress_bar() + results = job.get_results() + print(results) + ``` + + 1. Grabs the curated **Alignment** audience, which already has trained labelers. A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses — see the Advanced tab for how to build and train your own. You can browse the curated audiences in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). + +=== "Advanced" + + The advanced version builds a **custom** audience and trains labelers with qualification examples before running the job. Only labelers who pick the correct image on the examples join the audience, which raises label quality. + + ```python + from rapidata import RapidataClient + + PROMPTS = [ + "A sign that says 'Diffusion'.", + "A yellow flower sticking out of a green pot.", + "hyperrealism render of a surreal alien humanoid.", + "psychedelic duck", + "A small blue book sitting on a large red book." 
+ ] + + IMAGE_PAIRS = [ + ["https://assets.rapidata.ai/flux_sign_diffusion.jpg", "https://assets.rapidata.ai/mj_sign_diffusion.jpg"], + ["https://assets.rapidata.ai/flux_flower.jpg", "https://assets.rapidata.ai/mj_flower.jpg"], + ["https://assets.rapidata.ai/flux_alien.jpg", "https://assets.rapidata.ai/mj_alien.jpg"], + ["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"], + ["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"] + ] + + # Qualification pairs where the first (Flux) image clearly follows the prompt + # better. The truth must point at the unambiguously better image. + QUALIFICATION_PAIRS = [ + ["https://assets.rapidata.ai/flux_sign_diffusion.jpg", "https://assets.rapidata.ai/mj_sign_diffusion.jpg"], + ["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"], + ["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"], + ["https://assets.rapidata.ai/flux_flower.jpg", "https://assets.rapidata.ai/mj_flower.jpg"], + ["https://assets.rapidata.ai/flux_store_front.jpg", "https://assets.rapidata.ai/mj_store_front.jpg"], + ["https://assets.rapidata.ai/flux_hand.jpg", "https://assets.rapidata.ai/mj_hand.jpg"], + ["https://assets.rapidata.ai/flux_traffic_lights.jpg", "https://assets.rapidata.ai/mj_traffic_lights.jpg"], + ["https://assets.rapidata.ai/flux_plane.jpg", "https://assets.rapidata.ai/mj_plane.jpg"], + ] + QUALIFICATION_PROMPTS = [ + "A sign that says 'Diffusion'.", + "A psychedelic duck with glasses", + "A small blue book sitting on a large red book.", + "A yellow flower sticking out of a bright green pot.", + "A store front with 'hello world' written on it.", + "A yellow hand on a black stone.", + "A green, yellow and red traffic light.", + "A plane flying over a person.", + ] + + client = RapidataClient() + + audience = client.audience.create_audience(name="Custom Prompt Alignment Audience") # (1)! 
+ for prompt, datapoint in zip(QUALIFICATION_PROMPTS, QUALIFICATION_PAIRS): + audience.add_compare_example( + instruction="Which image follows the prompt more accurately?", + datapoint=datapoint, + truth=datapoint[0], + context=prompt + ) + + job_definition = client.job.create_compare_job_definition( + name="Example Image Prompt Alignment Job", + instruction="Which image follows the prompt more accurately?", + datapoints=IMAGE_PAIRS, + responses_per_datapoint=25, + contexts=PROMPTS + ) + + job_definition.preview() + + job = audience.assign_job(job_definition) + job.display_progress_bar() + results = job.get_results() + print(results) + ``` + + 1. Creates a new, empty audience. The `add_compare_example` calls train and filter the labelers who join it. + + !!! warning "Review your qualification examples carefully" + Every qualification example and its truth must be reviewed before use. An example with a wrong or ambiguous truth filters out good labelers while letting bad ones through — inverting your quality control. Use only pairs with a clearly better image, and add more than the few shown here for production workloads. See [Custom Audiences](../audiences.md) for the full guide. diff --git a/docs/examples/locate_job.md b/docs/examples/locate_job.md index 601dedab5..785764042 100644 --- a/docs/examples/locate_job.md +++ b/docs/examples/locate_job.md @@ -4,6 +4,8 @@ To learn about the basics of creating a job, please refer to the [quickstart gui In a locate job, labelers tap the points in a datapoint that match your instruction. In this example, we ask people to point out visual artifacts in AI-generated images — a common way to find where a generator went wrong. +Locate jobs run on the **global** audience — the broadest pool of labelers, ready to work immediately. Unlike classification and comparison, locate tasks don't support custom qualification examples yet, so there's no custom-audience variant for this job type. 
+ ```python from rapidata import RapidataClient @@ -15,11 +17,13 @@ IMAGE_URLS = [ client = RapidataClient() -audience = client.audience.create_audience(name="Artifact Detection Audience") +audience = next( # (1)! + a for a in client.audience.find_audiences("Global Audience") if a.name == "Global Audience" +) job_definition = client.job.create_locate_job_definition( name="Artifact Detection Example", - instruction="Tap on any visual glitches or errors in the image.", # (1)! + instruction="Tap on any visual glitches or errors in the image.", # (2)! datapoints=IMAGE_URLS, responses_per_datapoint=35, ) @@ -32,4 +36,5 @@ results = job.get_results() print(results) ``` -1. The instruction tells labelers what to locate. Each response is the set of points they tapped on that datapoint. +1. The global audience already has labelers ready to work. A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses. +2. The instruction tells labelers what to locate. Each response is the set of points they tapped on that datapoint. From 142133a4b94385c13b72c8eff2a79d92463d6838 Mon Sep 17 00:00:00 2001 From: Karl-The-Man Date: Thu, 11 Jun 2026 11:53:45 +0200 Subject: [PATCH 05/10] docs(quickstart): update audience retrieval method for accuracy --- docs/quickstart.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/docs/quickstart.md b/docs/quickstart.md index 9c256f918..a3bca5326 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -61,10 +61,12 @@ client = RapidataClient(client_id="Your client ID", client_secret="Your client s The simplest way to get started is with a curated audience: ```py -audience = client.audience.find_audiences("alignment")[0] # (1)! +audience = next( # (1)! + a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment" +) ``` -1. Curated audiences are pre-existing pools of labelers trained on a specific type of task. +1. 
Curated audiences are pre-existing pools of labelers trained on a specific type of task. `find_audiences` returns matches ordered by recency, so selecting by exact name avoids accidentally picking a custom audience that happens to match the search term. You can browse the curated audiences in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). !!! note The curated audience gets you started quickly, but results may be less accurate than a custom audience trained with examples specific to your task. For higher quality, see [Custom Audiences](audiences.md). @@ -165,7 +167,9 @@ from rapidata import RapidataClient client = RapidataClient() -audience = client.audience.find_audiences("alignment")[0] +audience = next( + a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment" +) job_definition = client.job.create_compare_job_definition( name="Example Image Prompt Alignment", From f9537dea6bda7c1dc97be26e14e80d97e91d5608 Mon Sep 17 00:00:00 2001 From: RapidPoseidon Date: Thu, 11 Jun 2026 12:22:37 +0000 Subject: [PATCH 06/10] docs(examples): use get_audience_by_id and fix locate audience guidance Addresses review feedback on PR #612: - Replace the brittle next(... if a.name ==) audience lookup with get_audience_by_id across the example, quickstart and landing pages (Coherence=aud_mr3NbeWa4Uo, Alignment=aud_MU1GZYoESyO, global="global"). - Move the qualification-example caveat to a short reference link and add a warning above the Advanced code blocks that building a new audience takes significantly longer. - Correct the locate prose: a locate job can run on any audience, not only the global one. 
Co-Authored-By: Claude Opus 4.8 Co-Authored-By: karl --- docs/examples/classify_job.md | 13 +++++++------ docs/examples/compare_job.md | 13 +++++++------ docs/examples/locate_job.md | 8 +++----- docs/quickstart.md | 10 +++------- docs/starting_page.md | 4 ++-- 5 files changed, 22 insertions(+), 26 deletions(-) diff --git a/docs/examples/classify_job.md b/docs/examples/classify_job.md index 8c97c7904..4415ece55 100644 --- a/docs/examples/classify_job.md +++ b/docs/examples/classify_job.md @@ -21,9 +21,7 @@ In this example, we rate images on a Likert scale to assess how well generated i client = RapidataClient() - audience = next( # (1)! - a for a in client.audience.find_audiences("Coherence") if a.name == "Coherence" - ) + audience = client.audience.get_audience_by_id("aud_mr3NbeWa4Uo") # (1)! job_definition = client.job.create_classification_job_definition( name="Likert Scale Example", @@ -43,13 +41,16 @@ In this example, we rate images on a Likert scale to assess how well generated i print(results) ``` - 1. Grabs the curated **Coherence** audience, which already has trained labelers. A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses — see the Advanced tab for how to build and train your own. You can browse the curated audiences in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). + 1. Looks up the curated **Coherence** audience by id, which already has trained labelers. A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses — see the Advanced tab for how to build and train your own. You can browse the curated audiences and copy their ids from the [Rapidata Dashboard](https://app.rapidata.ai/audiences). 2. Keeps the answer options in their specified order. Without this, options are randomized to reduce bias — but for Likert scales you want them ordered. 
=== "Advanced" The advanced version builds a **custom** audience and trains labelers with qualification examples before running the job. Only labelers who answer the examples correctly join the audience, which raises label quality on nuanced tasks. + !!! warning "This takes significantly longer" + Unlike the Simple path, this first builds and trains an entirely new audience before the job can start collecting responses — expect it to take considerably longer to return results. + ```python from rapidata import RapidataClient, NoShuffleSetting @@ -108,5 +109,5 @@ In this example, we rate images on a Likert scale to assess how well generated i 1. Creates a new, empty audience. The `add_classification_example` calls below define who qualifies to join it. 2. Qualify labelers on the same UI they'll see in the job. Since the job uses `NoShuffleSetting`, the examples use it too — see [Custom Audiences](../audiences.md#matching-the-job-ui-with-settings). - !!! warning "Review your qualification examples carefully" - Every qualification example and its truth must be reviewed before use. An example with a wrong or ambiguous truth filters out good labelers while letting bad ones through — inverting your quality control. Use only examples with a clear, unambiguous correct answer, and add more than the few shown here for production workloads. See [Custom Audiences](../audiences.md) for the full guide. + !!! note + Review every qualification example and its truth carefully, and add more than the few shown here for production workloads — see [Custom Audiences](../audiences.md) for the full guide. diff --git a/docs/examples/compare_job.md b/docs/examples/compare_job.md index d3b4aede9..8fe9de3ee 100644 --- a/docs/examples/compare_job.md +++ b/docs/examples/compare_job.md @@ -29,9 +29,7 @@ In this example, we compare images from two image generation models (Flux and Mi client = RapidataClient() - audience = next( # (1)! 
- a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment" - ) + audience = client.audience.get_audience_by_id("aud_MU1GZYoESyO") # (1)! job_definition = client.job.create_compare_job_definition( name="Example Image Prompt Alignment Job", @@ -49,12 +47,15 @@ In this example, we compare images from two image generation models (Flux and Mi print(results) ``` - 1. Grabs the curated **Alignment** audience, which already has trained labelers. A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses — see the Advanced tab for how to build and train your own. You can browse the curated audiences in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). + 1. Looks up the curated **Alignment** audience by id, which already has trained labelers. A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses — see the Advanced tab for how to build and train your own. You can browse the curated audiences and copy their ids from the [Rapidata Dashboard](https://app.rapidata.ai/audiences). === "Advanced" The advanced version builds a **custom** audience and trains labelers with qualification examples before running the job. Only labelers who pick the correct image on the examples join the audience, which raises label quality. + !!! warning "This takes significantly longer" + Unlike the Simple path, this first builds and trains an entirely new audience before the job can start collecting responses — expect it to take considerably longer to return results. + ```python from rapidata import RapidataClient @@ -126,5 +127,5 @@ In this example, we compare images from two image generation models (Flux and Mi 1. Creates a new, empty audience. The `add_compare_example` calls train and filter the labelers who join it. - !!! warning "Review your qualification examples carefully" - Every qualification example and its truth must be reviewed before use. 
An example with a wrong or ambiguous truth filters out good labelers while letting bad ones through — inverting your quality control. Use only pairs with a clearly better image, and add more than the few shown here for production workloads. See [Custom Audiences](../audiences.md) for the full guide. + !!! note + Review every qualification example and its truth carefully, and add more than the few shown here for production workloads — see [Custom Audiences](../audiences.md) for the full guide. diff --git a/docs/examples/locate_job.md b/docs/examples/locate_job.md index 785764042..c0ced588a 100644 --- a/docs/examples/locate_job.md +++ b/docs/examples/locate_job.md @@ -4,7 +4,7 @@ To learn about the basics of creating a job, please refer to the [quickstart gui In a locate job, labelers tap the points in a datapoint that match your instruction. In this example, we ask people to point out visual artifacts in AI-generated images — a common way to find where a generator went wrong. -Locate jobs run on the **global** audience — the broadest pool of labelers, ready to work immediately. Unlike classification and comparison, locate tasks don't support custom qualification examples yet, so there's no custom-audience variant for this job type. +Like any other job, a locate job can be assigned to any audience. This example uses the **global** audience — the broadest pool of labelers, ready to work immediately — so it starts collecting responses right away. ```python from rapidata import RapidataClient @@ -17,9 +17,7 @@ IMAGE_URLS = [ client = RapidataClient() -audience = next( # (1)! - a for a in client.audience.find_audiences("Global Audience") if a.name == "Global Audience" -) +audience = client.audience.get_audience_by_id("global") # (1)! job_definition = client.job.create_locate_job_definition( name="Artifact Detection Example", @@ -36,5 +34,5 @@ results = job.get_results() print(results) ``` -1. The global audience already has labelers ready to work. 
A freshly created audience has no qualified labelers yet, so a job assigned to it would never collect responses. +1. The global audience (id `global`) already has labelers ready to work, so the job starts collecting responses immediately. You can assign a locate job to any audience — browse them in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). 2. The instruction tells labelers what to locate. Each response is the set of points they tapped on that datapoint. diff --git a/docs/quickstart.md b/docs/quickstart.md index a3bca5326..4f40bbdab 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -61,12 +61,10 @@ client = RapidataClient(client_id="Your client ID", client_secret="Your client s The simplest way to get started is with a curated audience: ```py -audience = next( # (1)! - a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment" -) +audience = client.audience.get_audience_by_id("aud_MU1GZYoESyO") # (1)! ``` -1. Curated audiences are pre-existing pools of labelers trained on a specific type of task. `find_audiences` returns matches ordered by recency, so selecting by exact name avoids accidentally picking a custom audience that happens to match the search term. You can browse the curated audiences in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). +1. Curated audiences are pre-existing pools of labelers trained on a specific type of task — this is the **Alignment** audience. You can browse the curated audiences and copy their ids from the [Rapidata Dashboard](https://app.rapidata.ai/audiences). !!! note The curated audience gets you started quickly, but results may be less accurate than a custom audience trained with examples specific to your task. For higher quality, see [Custom Audiences](audiences.md). 
@@ -167,9 +165,7 @@ from rapidata import RapidataClient client = RapidataClient() -audience = next( - a for a in client.audience.find_audiences("Alignment") if a.name == "Alignment" -) +audience = client.audience.get_audience_by_id("aud_MU1GZYoESyO") job_definition = client.job.create_compare_job_definition( name="Example Image Prompt Alignment", diff --git a/docs/starting_page.md b/docs/starting_page.md index d11ccb664..e2e6b1c6e 100644 --- a/docs/starting_page.md +++ b/docs/starting_page.md @@ -18,7 +18,7 @@ The SDK has three building blocks: **audiences** (who labels), **job definitions client = RapidataClient() - audience = client.audience.find_audiences("alignment")[0] + audience = client.audience.get_audience_by_id("aud_MU1GZYoESyO") job_definition = client.job.create_compare_job_definition( name="Example Image Comparison", @@ -41,7 +41,7 @@ The SDK has three building blocks: **audiences** (who labels), **job definitions client = RapidataClient() - audience = client.audience.find_audiences("alignment")[0] + audience = client.audience.get_audience_by_id("aud_MU1GZYoESyO") job_definition = client.job.create_compare_job_definition( name="Example Video Comparison", From 2ada47a1074d88d3a28073e2c842020c58694f0c Mon Sep 17 00:00:00 2001 From: RapidPoseidon Date: Thu, 11 Jun 2026 14:08:30 +0000 Subject: [PATCH 07/10] feat(audience): add add_locate_example for locate qualification audiences Lets users train a custom audience for locate jobs, mirroring the existing add_classification_example / add_compare_example. Locate truths are bounding boxes, so the method takes a list[Box]; the generic example endpoint and the LocateExamplePayload / LocateExampleTruth models already exist, so no backend change is needed. - Box.to_example_model() converts to ExampleBoxShape (0-100 scale). - Extract calculate_boxes_coverage() into box.py (shared with RapidsManager) and use it as the example's randomCorrectProbability. - Add a Simple/Advanced tab layout to the locate docs example. 
Co-Authored-By: Claude Opus 4.8 Co-Authored-By: karl --- docs/examples/locate_job.md | 111 ++++++++++++++---- .../audience/audience_example_handler.py | 70 +++++++++++ .../audience/rapidata_audience.py | 45 +++++++ .../rapidata_client/validation/rapids/box.py | 58 +++++++++ .../validation/rapids/rapids_manager.py | 57 +-------- 5 files changed, 265 insertions(+), 76 deletions(-) diff --git a/docs/examples/locate_job.md b/docs/examples/locate_job.md index c0ced588a..3e618a176 100644 --- a/docs/examples/locate_job.md +++ b/docs/examples/locate_job.md @@ -4,35 +4,98 @@ To learn about the basics of creating a job, please refer to the [quickstart gui In a locate job, labelers tap the points in a datapoint that match your instruction. In this example, we ask people to point out visual artifacts in AI-generated images — a common way to find where a generator went wrong. -Like any other job, a locate job can be assigned to any audience. This example uses the **global** audience — the broadest pool of labelers, ready to work immediately — so it starts collecting responses right away. +Like any other job, a locate job can be assigned to any audience — a ready-to-go curated one, or a custom audience you train with qualification examples. -```python -from rapidata import RapidataClient +=== "Simple" -IMAGE_URLS = [ - "https://assets.rapidata.ai/eac11c3e-ad57-402b-90ed-23378d2ff869.jpg", - "https://assets.rapidata.ai/04e7e3c6-5554-47ca-bdb2-950e48ac3e6c.jpg", - "https://assets.rapidata.ai/91d9913c-b399-47f8-ad19-767798cc951c.jpg", -] + The simple version runs straight away on a **generally available** audience — a pre-existing pool of labelers, ready to work immediately — so the job starts collecting responses right away. -client = RapidataClient() + ```python + from rapidata import RapidataClient -audience = client.audience.get_audience_by_id("global") # (1)! 
+ IMAGE_URLS = [ + "https://assets.rapidata.ai/eac11c3e-ad57-402b-90ed-23378d2ff869.jpg", + "https://assets.rapidata.ai/04e7e3c6-5554-47ca-bdb2-950e48ac3e6c.jpg", + "https://assets.rapidata.ai/91d9913c-b399-47f8-ad19-767798cc951c.jpg", + ] -job_definition = client.job.create_locate_job_definition( - name="Artifact Detection Example", - instruction="Tap on any visual glitches or errors in the image.", # (2)! - datapoints=IMAGE_URLS, - responses_per_datapoint=35, -) + client = RapidataClient() -job_definition.preview() + audience = client.audience.get_audience_by_id("global") # (1)! -job = audience.assign_job(job_definition) -job.display_progress_bar() -results = job.get_results() -print(results) -``` + job_definition = client.job.create_locate_job_definition( + name="Artifact Detection Example", + instruction="Tap on any visual glitches or errors in the image.", # (2)! + datapoints=IMAGE_URLS, + responses_per_datapoint=35, + ) -1. The global audience (id `global`) already has labelers ready to work, so the job starts collecting responses immediately. You can assign a locate job to any audience — browse them in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). -2. The instruction tells labelers what to locate. Each response is the set of points they tapped on that datapoint. + job_definition.preview() + + job = audience.assign_job(job_definition) + job.display_progress_bar() + results = job.get_results() + print(results) + ``` + + 1. The global audience (id `global`) already has labelers ready to work, so the job starts collecting responses immediately. You can assign a locate job to any audience — browse them in the [Rapidata Dashboard](https://app.rapidata.ai/audiences). + 2. The instruction tells labelers what to locate. Each response is the set of points they tapped on that datapoint. + +=== "Advanced" + + The advanced version builds a **custom** audience and trains labelers with qualification examples before running the job. 
Each example carries the bounding box(es) covering the region a correct labeler should tap; only labelers who tap inside them join the audience, which raises label quality. + + !!! warning "This takes significantly longer" + Unlike the Simple path, this first builds and trains an entirely new audience before the job can start collecting responses — expect it to take considerably longer to return results. + + ```python + from rapidata import RapidataClient, Box + + IMAGE_URLS = [ + "https://assets.rapidata.ai/eac11c3e-ad57-402b-90ed-23378d2ff869.jpg", + "https://assets.rapidata.ai/04e7e3c6-5554-47ca-bdb2-950e48ac3e6c.jpg", + "https://assets.rapidata.ai/91d9913c-b399-47f8-ad19-767798cc951c.jpg", + ] + + # Qualification examples — each pairs an image with the bounding box(es) + # covering the region a correct labeler should tap. Coordinates are image + # ratios (0.0–1.0); the boxes below are illustrative, mark the real artifact + # regions in your own images. Use only examples whose target is unambiguous. + EXAMPLES = [ + ("https://assets.rapidata.ai/eac11c3e-ad57-402b-90ed-23378d2ff869.jpg", + [Box(x_min=0.10, y_min=0.55, x_max=0.30, y_max=0.80)]), + ("https://assets.rapidata.ai/04e7e3c6-5554-47ca-bdb2-950e48ac3e6c.jpg", + [Box(x_min=0.60, y_min=0.20, x_max=0.85, y_max=0.45)]), + ("https://assets.rapidata.ai/91d9913c-b399-47f8-ad19-767798cc951c.jpg", + [Box(x_min=0.40, y_min=0.40, x_max=0.60, y_max=0.65)]), + ] + + client = RapidataClient() + + audience = client.audience.create_audience(name="Artifact Detection Audience") # (1)! 
+ for datapoint, truths in EXAMPLES: + audience.add_locate_example( + instruction="Tap on any visual glitches or errors in the image.", + datapoint=datapoint, + truths=truths, + ) + + job_definition = client.job.create_locate_job_definition( + name="Artifact Detection Example", + instruction="Tap on any visual glitches or errors in the image.", + datapoints=IMAGE_URLS, + responses_per_datapoint=35, + ) + + job_definition.preview() + + job = audience.assign_job(job_definition) + job.display_progress_bar() + results = job.get_results() + print(results) + ``` + + 1. Creates a new, empty audience. The `add_locate_example` calls train and filter the labelers who join it. + + !!! note + Review every qualification example and its truth regions carefully, and add more than the few shown here for production workloads — see [Custom Audiences](../audiences.md) for the full guide. diff --git a/src/rapidata/rapidata_client/audience/audience_example_handler.py b/src/rapidata/rapidata_client/audience/audience_example_handler.py index ded438c66..faa761768 100644 --- a/src/rapidata/rapidata_client/audience/audience_example_handler.py +++ b/src/rapidata/rapidata_client/audience/audience_example_handler.py @@ -19,6 +19,16 @@ from rapidata.api_client.models.i_example_truth_compare_example_truth import ( IExampleTruthCompareExampleTruth, ) +from rapidata.api_client.models.i_example_payload_locate_example_payload import ( + IExamplePayloadLocateExamplePayload, +) +from rapidata.api_client.models.i_example_truth_locate_example_truth import ( + IExampleTruthLocateExampleTruth, +) +from rapidata.rapidata_client.validation.rapids.box import ( + Box, + calculate_boxes_coverage, +) from rapidata.service.openapi_service import OpenAPIService from rapidata.api_client.models.i_example_payload import IExamplePayload from rapidata.api_client.models.i_example_truth import IExampleTruth @@ -186,6 +196,66 @@ def add_compare_example( ), ) + def add_locate_example( + self, + instruction: str, + 
datapoint: str, + truths: list[Box], + context: str | None = None, + media_context: list[str] | None = None, + explanation: str | None = None, + settings: Sequence[RapidataSetting] | None = None, + ) -> None: + """add a locate example to the audience + + Args: + instruction (str): The instruction telling the labeler what to locate. + datapoint (str): The media datapoint the labeler will be locating the target in. + truths (list[Box]): The bounding boxes covering the correct regions to tap. Coordinates are ratios of the image size (0.0 to 1.0). + context (str, optional): The context is text that will be shown in addition to the instruction. Defaults to None. + media_context (list[str], optional): A list of image URLs / paths that will be shown in addition to the instruction (can be combined with context). Pass a single-element list for one image, or multiple to display several images. Defaults to None. + explanation (str, optional): The explanation that will be shown to the labeler if the answer is wrong. Defaults to None. + settings (Sequence[RapidataSetting], optional): The list of settings to apply to the example as feature flags. Controls how the example is rendered to the labeler. Defaults to None. 
+ """ + from rapidata.api_client.models.add_example_to_audience_endpoint_input import ( + AddExampleToAudienceEndpointInput, + ) + + if not truths: + raise ValueError("Locate example requires at least one truth bounding box") + + asset_input = self._asset_uploader.upload_and_map_asset(datapoint) + + payload = IExamplePayload( + actual_instance=IExamplePayloadLocateExamplePayload( + _t="LocateExamplePayload", target=instruction + ) + ) + model_truth = IExampleTruth( + actual_instance=IExampleTruthLocateExampleTruth( + _t="LocateExampleTruth", + boundingBoxes=[truth.to_example_model() for truth in truths], + ) + ) + + self._openapi_service.audience.examples_api.audience_audience_id_example_post( + audience_id=self._audience_id, + add_example_to_audience_endpoint_input=AddExampleToAudienceEndpointInput( + asset=asset_input, + payload=payload, + truth=model_truth, + context=context, + contextAsset=( + self._asset_uploader.upload_and_map_asset(media_context) + if media_context + else None + ), + explanation=explanation, + randomCorrectProbability=calculate_boxes_coverage(truths), + featureFlags=[s._to_feature_flag() for s in settings] if settings else None, + ), + ) + def _add_rapid_example(self, rapid: Rapid) -> None: """Add a rapid example to the audience (private method). 
diff --git a/src/rapidata/rapidata_client/audience/rapidata_audience.py b/src/rapidata/rapidata_client/audience/rapidata_audience.py index 3929d4006..a45c1437a 100644 --- a/src/rapidata/rapidata_client/audience/rapidata_audience.py +++ b/src/rapidata/rapidata_client/audience/rapidata_audience.py @@ -15,6 +15,7 @@ ) from rapidata.rapidata_client.filter import RapidataFilter from rapidata.rapidata_client.validation.rapids.rapids import Rapid + from rapidata.rapidata_client.validation.rapids.box import Box from rapidata.rapidata_client.settings._rapidata_setting import RapidataSetting import pandas as pd @@ -266,6 +267,50 @@ def add_compare_example( self._try_start_recruiting() return self + def add_locate_example( + self, + instruction: str, + datapoint: str, + truths: list[Box], + context: str | None = None, + media_context: list[str] | None = None, + explanation: str | None = None, + settings: Sequence[RapidataSetting] | None = None, + ) -> RapidataAudience: + """Add a locate training example to this audience. + + Training examples help annotators understand the task by showing them + a sample datapoint with the correct regions before they start labeling. + + Args: + instruction (str): The instruction telling annotators what to locate. + datapoint (str): The media datapoint (URL or path) to use as the training example. + truths (list[Box]): The bounding boxes covering the correct regions to tap, as :class:`Box` objects with coordinates in image ratios (0.0 to 1.0). + context (str, optional): Additional text context to display with the example. Defaults to None. + media_context (list[str], optional): Additional image URLs / paths to display with the example. Pass a single-element list for one image, or multiple to display several images. Defaults to None. + explanation (str, optional): An explanation of why the truth is correct. Defaults to None. 
+ settings (Sequence[RapidataSetting], optional): Settings applied as feature flags on this example so the qualification example matches how the actual task will be rendered. Defaults to None. + + Returns: + RapidataAudience: The audience instance (self) for method chaining. + """ + media_context = coerce_media_context(media_context) + with tracer.start_as_current_span("RapidataAudience.add_locate_example"): + logger.debug( + f"Adding locate example to audience: {self.id} with instruction: {instruction}, datapoint: {datapoint}, truths: {truths}, context: {context}, media_context: {media_context}, explanation: {explanation}, settings: {settings}" + ) + self._example_handler.add_locate_example( + instruction, + datapoint, + truths, + context, + media_context, + explanation, + settings, + ) + self._try_start_recruiting() + return self + def get_examples( self, amount: int = 10, diff --git a/src/rapidata/rapidata_client/validation/rapids/box.py b/src/rapidata/rapidata_client/validation/rapids/box.py index 9b6b9041a..73ab80754 100644 --- a/src/rapidata/rapidata_client/validation/rapids/box.py +++ b/src/rapidata/rapidata_client/validation/rapids/box.py @@ -1,6 +1,7 @@ from rapidata.api_client.models.locate_box_truth_model_box import ( LocateBoxTruthModelBox, ) +from rapidata.api_client.models.example_box_shape import ExampleBoxShape from pydantic import BaseModel, field_validator, model_validator @@ -42,3 +43,60 @@ def to_model(self) -> LocateBoxTruthModelBox: xMax=self.x_max * 100, yMax=self.y_max * 100, ) + + def to_example_model(self) -> ExampleBoxShape: + return ExampleBoxShape( + xMin=self.x_min * 100, + yMin=self.y_min * 100, + xMax=self.x_max * 100, + yMax=self.y_max * 100, + ) + + +def calculate_boxes_coverage(boxes: list[Box]) -> float: + """Calculate the ratio of image area covered by a list of boxes. + + Args: + boxes: List of Box objects with coordinates in range [0, 1]. + + Returns: + float: Coverage ratio between 0.0 and 1.0. 
+ """ + if not boxes: + return 0.0 + + # Sweep line over x: at each x-interval, sum the merged y-coverage of the + # currently active boxes, weighted by the interval width. + events: list[tuple[float, str, int]] = [] + for i, box in enumerate(boxes): + events.append((box.x_min, "start", i)) + events.append((box.x_max, "end", i)) + + events.sort(key=lambda e: (e[0], e[1] == "end")) + + total_area = 0.0 + active_boxes: set[int] = set() + prev_x = 0.0 + + for x, event_type, box_id in events: + if active_boxes and x > prev_x: + y_intervals = sorted( + (boxes[i].y_min, boxes[i].y_max) for i in active_boxes + ) + merged: list[tuple[float, float]] = [] + for start, end in y_intervals: + if merged and start <= merged[-1][1]: + merged[-1] = (merged[-1][0], max(merged[-1][1], end)) + else: + merged.append((start, end)) + y_coverage = sum(end - start for start, end in merged) + total_area += (x - prev_x) * y_coverage + + if event_type == "start": + active_boxes.add(box_id) + else: + active_boxes.discard(box_id) + + prev_x = x + + return total_area diff --git a/src/rapidata/rapidata_client/validation/rapids/rapids_manager.py b/src/rapidata/rapidata_client/validation/rapids/rapids_manager.py index b01c2c396..617b30f59 100644 --- a/src/rapidata/rapidata_client/validation/rapids/rapids_manager.py +++ b/src/rapidata/rapidata_client/validation/rapids/rapids_manager.py @@ -6,7 +6,10 @@ from rapidata.rapidata_client.validation.rapids.rapids import Rapid from rapidata.service.openapi_service import OpenAPIService -from rapidata.rapidata_client.validation.rapids.box import Box +from rapidata.rapidata_client.validation.rapids.box import ( + Box, + calculate_boxes_coverage, +) class RapidsManager: @@ -431,57 +434,7 @@ def _calculate_boxes_coverage(self, boxes: list[Box]) -> float: Returns: float: Coverage ratio between 0.0 and 1.0 """ - if not boxes: - return 0.0 - - # Convert boxes to intervals for sweep line algorithm - events = [] - - # Create events for x-coordinates - for i, box in 
enumerate(boxes): - events.append((box.x_min, "start", i, box)) - events.append((box.x_max, "end", i, box)) - - # Sort events by x-coordinate - events.sort(key=lambda x: (x[0], x[1] == "end")) - - total_area = 0.0 - active_boxes = set() - prev_x = 0.0 - - for x, event_type, box_id, box in events: - # Calculate area for the previous x-interval - if active_boxes and x > prev_x: - # Merge y-intervals for active boxes - y_intervals = [(boxes[i].y_min, boxes[i].y_max) for i in active_boxes] - y_intervals.sort() - - # Merge overlapping y-intervals - merged_intervals = [] - for start, end in y_intervals: - if merged_intervals and start <= merged_intervals[-1][1]: - # Overlapping intervals - merge them - merged_intervals[-1] = ( - merged_intervals[-1][0], - max(merged_intervals[-1][1], end), - ) - else: - # Non-overlapping interval - merged_intervals.append((start, end)) - - # Calculate total y-coverage for this x-interval - y_coverage = sum(end - start for start, end in merged_intervals) - total_area += (x - prev_x) * y_coverage - - # Update active boxes - if event_type == "start": - active_boxes.add(box_id) - else: - active_boxes.discard(box_id) - - prev_x = x - - return total_area + return calculate_boxes_coverage(boxes) @staticmethod def _calculate_coverage_ratio( From cae1228a56e0df7c83dabbcb2d38ccca5e756acf Mon Sep 17 00:00:00 2001 From: Karl-The-Man Date: Fri, 12 Jun 2026 10:03:03 +0200 Subject: [PATCH 08/10] feat(docs): update example images and bounding boxes in locate job documentation --- docs/examples/locate_job.md | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/examples/locate_job.md b/docs/examples/locate_job.md index 3e618a176..bd5804c94 100644 --- a/docs/examples/locate_job.md +++ b/docs/examples/locate_job.md @@ -62,12 +62,20 @@ Like any other job, a locate job can be assigned to any audience — a ready-to- # ratios (0.0–1.0); the boxes below are illustrative, mark the real artifact # regions in your own images. 
Use only examples whose target is unambiguous. EXAMPLES = [ - ("https://assets.rapidata.ai/eac11c3e-ad57-402b-90ed-23378d2ff869.jpg", - [Box(x_min=0.10, y_min=0.55, x_max=0.30, y_max=0.80)]), - ("https://assets.rapidata.ai/04e7e3c6-5554-47ca-bdb2-950e48ac3e6c.jpg", - [Box(x_min=0.60, y_min=0.20, x_max=0.85, y_max=0.45)]), - ("https://assets.rapidata.ai/91d9913c-b399-47f8-ad19-767798cc951c.jpg", - [Box(x_min=0.40, y_min=0.40, x_max=0.60, y_max=0.65)]), + ("https://assets.rapidata.ai/544b1210-1e91-4351-a97c-fe8263b319b4.webp", + [Box(x_min=0.44, y_min=0.42, x_max=0.58, y_max=0.63)]), + ("https://assets.rapidata.ai/f1e11611-7c5b-4186-8ddf-51e06c0859ff.webp", + [Box(x_min=0.07, y_min=0.37, x_max=0.39, y_max=0.71)]), + ("https://assets.rapidata.ai/ad816f8f-f7a9-4c90-90dd-9c10bc556856.webp", + [Box(x_min=0.04, y_min=0.10, x_max=0.31, y_max=0.28)]), + ("https://assets.rapidata.ai/a076ae24-4d5c-415d-9d41-6afbe2fbfcde.webp", + [Box(x_min=0.25, y_min=0.40, x_max=0.70, y_max=0.96)]), + ("https://assets.rapidata.ai/38753cb4-4b77-4fb7-b601-8a5bc3d166d7.webp", + [Box(x_min=0.41, y_min=0.09, x_max=0.87, y_max=0.45)]), + ("https://assets.rapidata.ai/50109592-b521-4dcb-a00f-453f6c026a52.webp", + [Box(x_min=0.25, y_min=0.03, x_max=0.71, y_max=0.48)]), + ("https://assets.rapidata.ai/a5a954d0-91e8-4b4e-bec6-2bb739444be8.webp", + [Box(x_min=0.57, y_min=0.40, x_max=0.96, y_max=0.89)]), ] client = RapidataClient() From 50f73c0eb5285d8475e70e25bcb82e0e77d16173 Mon Sep 17 00:00:00 2001 From: Karl-The-Man Date: Fri, 12 Jun 2026 10:27:15 +0200 Subject: [PATCH 09/10] docs(examples): remove the comment about example boxes being illustrative --- docs/examples/locate_job.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/examples/locate_job.md b/docs/examples/locate_job.md index bd5804c94..23be957ec 100644 --- a/docs/examples/locate_job.md +++ b/docs/examples/locate_job.md @@ -59,8 +59,7 @@ Like any other job, a locate job can be assigned to any audience — a ready-to- # 
Qualification examples — each pairs an image with the bounding box(es) # covering the region a correct labeler should tap. Coordinates are image - # ratios (0.0–1.0); the boxes below are illustrative, mark the real artifact - # regions in your own images. Use only examples whose target is unambiguous. + # ratios (0.0–1.0). EXAMPLES = [ ("https://assets.rapidata.ai/544b1210-1e91-4351-a97c-fe8263b319b4.webp", [Box(x_min=0.44, y_min=0.42, x_max=0.58, y_max=0.63)]), From b094296e405dfe1ab11e45483e230361341f4f8f Mon Sep 17 00:00:00 2001 From: Karl-The-Man Date: Fri, 12 Jun 2026 10:34:52 +0200 Subject: [PATCH 10/10] fix(docs): correct audience description in locate job example --- docs/examples/locate_job.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/examples/locate_job.md b/docs/examples/locate_job.md index 23be957ec..26472983e 100644 --- a/docs/examples/locate_job.md +++ b/docs/examples/locate_job.md @@ -8,7 +8,7 @@ Like any other job, a locate job can be assigned to any audience — a ready-to- === "Simple" - The simple version runs straight away on a **curated** audience — a pre-existing pool of labelers, ready to work immediately — so the job starts collecting responses right away. + The simple version runs straight away on a **curated** audience — a pre-existing pool of labelers, ready to work immediately — so the job starts collecting responses right away. ```python from rapidata import RapidataClient
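
Reviewer note on patch 07: the `calculate_boxes_coverage` helper extracted into `box.py` is passed as the example's `randomCorrectProbability`, so it may help to see it run in isolation. The following is a minimal standalone sketch. The function body follows the sweep-line logic shown in the patch, but the `Box` dataclass here is a hypothetical stand-in for the SDK's pydantic model (an assumption made only so the snippet runs without `rapidata` installed), and the sample boxes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for the SDK's Box; coordinates are image ratios in [0, 1]."""

    x_min: float
    y_min: float
    x_max: float
    y_max: float


def calculate_boxes_coverage(boxes: list[Box]) -> float:
    """Return the fraction of the unit square covered by the union of the boxes."""
    if not boxes:
        return 0.0

    # One "start" and one "end" event per box, keyed by x-coordinate.
    events: list[tuple[float, str, int]] = []
    for i, box in enumerate(boxes):
        events.append((box.x_min, "start", i))
        events.append((box.x_max, "end", i))
    # At equal x, "start" events sort before "end" events (False < True).
    events.sort(key=lambda e: (e[0], e[1] == "end"))

    total_area = 0.0
    active: set[int] = set()
    prev_x = 0.0
    for x, kind, box_id in events:
        if active and x > prev_x:
            # Merge the y-intervals of the active boxes so overlaps count once.
            merged: list[tuple[float, float]] = []
            for start, end in sorted((boxes[i].y_min, boxes[i].y_max) for i in active):
                if merged and start <= merged[-1][1]:
                    merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                else:
                    merged.append((start, end))
            # Strip width times covered height for this x-interval.
            total_area += (x - prev_x) * sum(end - start for start, end in merged)
        if kind == "start":
            active.add(box_id)
        else:
            active.discard(box_id)
        prev_x = x
    return total_area


# A single quarter-image box, and two boxes that overlap in a 0.25 x 0.25 region.
single = calculate_boxes_coverage([Box(0.0, 0.0, 0.5, 0.5)])
overlapping = calculate_boxes_coverage(
    [Box(0.0, 0.0, 0.5, 0.5), Box(0.25, 0.25, 0.75, 0.75)]
)
print(single, overlapping)
```

Because overlapping regions are counted once, the result is the union area rather than the sum of box areas. Under that reading, a larger truth region gives a higher chance that a random tap lands inside it, which is presumably why the patch reuses the coverage as the random-correct probability.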