Skip to content

feat: create qc module#17

Merged
arjunsridhar12345 merged 29 commits into
devfrom
16-create-qc-module
Jun 25, 2026
Merged

feat: create qc module#17
arjunsridhar12345 merged 29 commits into
devfrom
16-create-qc-module

Conversation

@arjunsridhar12345

@arjunsridhar12345 arjunsridhar12345 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

This PR attempts to address #16.

Tries to follows the architecture diagram defined here

Structured as follows:

Core — shared infrastructure:

  • BaseQC — the abstract stage interface (run(...) -> List[QCMetric]).
  • QCResult — a lightweight per-check result type with to_metric() conversion to a schema QCMetric.
  • schema.py — QCMetric construction helpers, timestamp utilities, and the STATUS_CONVERTER mapping contraqctor statuses onto schema statuses.
  • build_quality_control — assembles a flat metric list into a QualityControl

Raw — the contract-QA stage. RawQC runs the aind_behavior_dynamic_foraging contraqctor runner over a dataset and maps each Result onto a QCMetric, saving any figure assets.

Processed — the behavior-metrics stage. ProcessedQC computes side-bias and inter-lick-interval checks from primitive per-trial/event-time arrays, renders supporting plots, and emits QCResult objects.

An example notebook has also been added, showing the resulting plots from synthetic data: https://github.com/AllenNeuralDynamics/dynamic-foraging-processing/blob/16-create-qc-module/examples/qc_example.ipynb

In addition, adds aind-data-schema to a qc group for installation and updates the lock file.

There's a lot going on here since its the first PR for this, and may have benefitted from breaking up into a couple PRs. But, I think most of the modules are relatively small. If it's confusing and hard to follow, happy to meet and do a group code review together.

@arjunsridhar12345 arjunsridhar12345 marked this pull request as ready for review June 22, 2026 17:49
@alexpiet

Copy link
Copy Markdown

Can you run this on the raw data assets we already have?

@alexpiet

Copy link
Copy Markdown

Feedback

  • In the example notebook, the average side bias is a long string of digits. We should round to 3 decimals points: value=0.1433020947617509 -> 0.143
  • I would change the description from description='Average of the per-trial side bias (right minus left) to description='Average side bias across the session (right is positive)'
  • The example notebook has lots of gaps in the side bias, but thats probably just the result of your synthetic data.
  • The lick interval values should also be rounded to three decimal points

Otherwise looks good. Please generate them for the existing data assets so we can assess on real data

----------
side_bias : numpy.ndarray
Per-trial side bias from the trial table (right minus left); positive
means a rightward bias. Trials with no response are ``nan``.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite correct. Side bias is computed over a sliding window, so trials with no response will still have a side bias value. The side bias will only be nan if the mouse hasn't responded in many trials

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried to address this and the description (2nd bullet point in above comment) here: 9121783

@arielleleon arielleleon left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments and will create an issue based off one of them

# Logical input -> trials-table column name. Centralized so the mapping is easy
# to correct against the trial-table builder; ``side_bias`` and the
# ``lickspout_*`` arrays are not yet pinned down in trials_table_mapping.md.
_COLUMNS = {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where pandera can be used for dataframe validation

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call!

The average-side-bias result followed by the four lick-interval results.
"""
side_bias = _column(trials, "side_bias")
results = [

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, this can be a list and each QC metric/plot can be part of a registry

CHECK_REGISTRY: dict[str, Callable[[ProcessedSession, str | None], list[QCResult]]] = {}

def register_check(name):
    def deco(fn): CHECK_REGISTRY[name] = fn; return fn
    return deco

@register_check("side_bias")
def _side_bias(session, results_folder=None):
    if results_folder: plot_side_bias(...)
    return [side_bias_result(session.trials.side_bias)]

def behavior_qc_results(session, results_folder=None, checks=None):
    names = checks if checks is not None else list(CHECK_REGISTRY)
    return [r for name in names for r in CHECK_REGISTRY[name](session, results_folder)]

This means that the user adding QC need not care about the orchestration module (results.py) since metrics and plots would be in the registry called through the orchestration function. I will create a ticket for this since it's a nice-to-have

from aind_data_schema.core.quality_control import QCMetric


class BaseQC(abc.ABC):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem DF specific. Should this be defined elsewhere, like in aind_data_schema?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good point. I think looking at it again, some of the classes from the diagram are generic and could live elsewhere and them imported here.

should we make an issue and think about it later on?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arjunsridhar12345 - yes, make an issue. I don't think aind-data-schema is the right home, maybe models or elsewhere

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created here - #21

@arjunsridhar12345

Copy link
Copy Markdown
Collaborator Author

going to merge this later today barring any critical feedback

@alexpiet

Copy link
Copy Markdown

going to merge this later today barring any critical feedback

It looks good on my end, but I'm sure there will be more feedback when we see actual sessions processed.

@arjunsridhar12345

Copy link
Copy Markdown
Collaborator Author

going to merge this later today barring any critical feedback

It looks good on my end, but I'm sure there will be more feedback when we see actual sessions processed.

sounds good. yeah similar to the trials table, will run it on real data and then see what happens and open issues and such

@arjunsridhar12345 arjunsridhar12345 merged commit 03b64f7 into dev Jun 25, 2026
3 checks passed
@arjunsridhar12345 arjunsridhar12345 deleted the 16-create-qc-module branch June 25, 2026 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants