Improve stability and performance #218

Merged
d33bs merged 12 commits into cytomining:main from d33bs:stability
Jul 2, 2026

Conversation

@d33bs d33bs commented Jun 28, 2026

Member

Description

Stability

  • Fix crash on single-channel grayscale images (common in microscopy) in the image brightness helpers.
  • Add behavioral tests found via mutation testing (previous tests only checked output shape, not values).
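
To make the grayscale fix concrete, here is a minimal sketch of layout handling for 2D, single-channel, RGB, and RGBA arrays. It is not the package's implementation: the name `to_grayscale_sketch` and the Rec. 601 luma weights are assumptions chosen for illustration.

```python
import numpy as np


def to_grayscale_sketch(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: collapse common image layouts to a 2D array."""
    arr = np.asarray(image)
    if arr.ndim == 2:
        # Already grayscale with shape (H, W).
        return arr
    if arr.ndim == 3 and arr.shape[-1] == 1:
        # Single-channel (H, W, 1): drop the trailing axis.
        return arr[..., 0]
    if arr.ndim == 3 and arr.shape[-1] in (3, 4):
        # RGB or RGBA: weight the color channels and ignore alpha.
        weights = np.array([0.299, 0.587, 0.114])  # assumed luma weights
        return arr[..., :3].astype(float) @ weights
    raise ValueError(f"Unsupported image shape: {arr.shape}")
```

Handling the `(H, W, 1)` case explicitly is the point of the sketch: code that assumes three color channels would otherwise fail on single-channel microscopy images.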

Performance — profiled the wide-table hot paths and cut redundant work:

  • find_image_columns now skips numeric columns instead of scanning every value.
  • Detection helpers no longer rebuild column lists / reconstruct frames on each render.
  • Stop treating Image_URL_* columns as image paths, which caused every image to be decoded twice per render.
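
The numeric-column skip described above could look roughly like the following. This is a hedged sketch rather than the real `find_image_columns`: the function name, the suffix list, and the "every non-null value must match" rule are assumptions.

```python
import pandas as pd

# Assumed list of image suffixes, for illustration only.
IMAGE_SUFFIXES = (".tif", ".tiff", ".png", ".jpg", ".jpeg")


def find_image_columns_sketch(df: pd.DataFrame) -> list[str]:
    """Hypothetical: return columns whose values look like image file names."""
    found = []
    for name in df.columns:
        column = df[name]
        # Numeric columns cannot hold file names, so skip them without
        # inspecting any of their values.
        if pd.api.types.is_numeric_dtype(column):
            continue
        values = column.dropna().astype(str).str.lower()
        if not values.empty and values.map(
            lambda value: value.endswith(IMAGE_SUFFIXES)
        ).all():
            found.append(name)
    return found
```

On a wide profiling table most columns are numeric features, so a dtype check per column avoids a value-by-value scan for nearly all of them.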

Result (JUMP plate): construct −63%, slice −59%, full render −55%. Rendered output unchanged.

Tests: new tests/test_performance.py regression guards; full suite passes (162), pre-commit green.

What kind of change(s) are included?

  • Documentation (changes docs or other related content)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • These changes pass all pre-commit checks.
  • I have added comments to my code to help provide understanding
  • I have added a test which covers the code changes found within this PR
  • I have deleted all non-relevant text in this pull request template.

Summary by CodeRabbit

Summary of changes

  • New Features
    • Added image-to-grayscale conversion supporting 2D, single-channel, RGB, and RGBA.
  • Bug Fixes
    • Improved image metadata detection and FileName→PathName mapping to avoid URL-like fields.
    • Improved bounding-box/center calculations and brightness adjustment (equalizes grayscale while preserving RGBA alpha).
  • Performance
    • Faster repeated renders via cached decoded/enhanced images; avoids redundant image-column rescanning.
  • Tests
    • Expanded image, frame, and performance regression coverage, plus package version resolution validation.
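
As an illustration of how cached decoded images can speed up repeated renders, the following sketch shows a small least-recently-used cache built on `OrderedDict`. The class name, the key scheme, and the default capacity are hypothetical and not taken from the package.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class DisplayCacheSketch:
    """Hypothetical bounded cache for decoded images, keyed by path."""

    def __init__(self, max_entries: int = 128) -> None:
        self.max_entries = max_entries
        self._entries: OrderedDict = OrderedDict()

    def get_or_load(self, key: Hashable, loader: Callable[[], object]) -> object:
        if key in self._entries:
            # Cache hit: mark as most recently used and skip decoding.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = loader()
        self._entries[key] = value
        # Evict least recently used entries once over capacity.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return value
```

A second render of the same frame would hit the cache for every image it has already decoded, so the decode cost is paid once per image rather than once per render.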

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds grayscale image conversion and brightness handling, resolves package version metadata at import time, tightens image-column detection and path mapping in frame.py, adds cached display and 3D volume lookup behavior, and expands unit and performance tests for these behaviors.

Changes

CytoDataFrame image and frame updates

| Layer | File(s) | Summary |
|---|---|---|
| Package version resolution | `src/cytodataframe/__init__.py`, `tests/test_frame.py` | `__version__` is resolved from generated version metadata or installed distribution metadata, with a fallback to `0.0.0`, and tests assert the imported version matches installed package metadata. |
| Grayscale and brightness handling | `src/cytodataframe/image.py`, `tests/test_image.py` | Adds `image_array_to_grayscale`, routes brightness checks through it, and rebuilds brightness-adjusted output from equalized grayscale while preserving RGBA alpha; tests cover supported layouts and brightness behavior. |
| Frame image detection and rendering | `src/cytodataframe/frame.py`, `tests/test_frame.py` | Precomputes column sets for bounding-box and center checks, narrows image-column and image-path detection to FileName/PathName conventions, adds cached image display loading, and updates HTML rendering to call helper methods on the prepared frame directly. |
| 3D volume cache behavior | `src/cytodataframe/frame.py` | `_get_3d_volume_from_cell` now uses shared cache-setting resolution for volume lookup and records non-3D cells in a negative cache set. |
| Frame, image, and performance tests | `tests/test_performance.py` | Adds regression tests for image-column detection, rendering call counts, cache reuse and disablement, and wide-frame timing. |
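
For readers unfamiliar with the version-resolution pattern summarized above, here is a minimal sketch using the standard library's `importlib.metadata`. It covers only the installed-metadata lookup and the `0.0.0` fallback; the generated-version-file step is omitted, and the function name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version_sketch(distribution: str, fallback: str = "0.0.0") -> str:
    """Return the installed distribution's version, or a fallback string."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # Not installed, for example when running from a source checkout.
        return fallback
```

A test in the spirit of the one described could compare the package's `__version__` attribute against `version(...)` for the same distribution name.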

Estimated code review effort: 4 (Complex) | ~45 minutes

Possibly related PRs

Suggested reviewers: jenna-tomkinson

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title is broadly related to the changes, but it is too generic to convey the main update clearly. | Use a more specific title that mentions the key fixes, such as grayscale image handling and image rendering performance improvements. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@d33bs d33bs marked this pull request as ready for review June 28, 2026 18:42
@d33bs d33bs requested a review from jenna-tomkinson as a code owner June 28, 2026 18:42

@jenna-tomkinson jenna-tomkinson left a comment

Member

LGTM! Will this solve some of the slowness with some datasets?

@jenna-tomkinson

Member

@d33bs I'm noticing that CytoDataFrame is very slow now, even with some of the test datasets from cosmicqc_in_a_nutshell.

@d33bs

d33bs commented Jun 30, 2026

Member Author

Thanks @jenna-tomkinson! Just double-checking: do you mean that this PR's version of CytoDataFrame is slow, or that the package is slow in general with a different release? If the changes here have slowed things down, we should keep making changes until the performance change is neutral or improved.

@jenna-tomkinson

Member

Hi @d33bs, apologies. I mean that I am using the Poetry environment from coSMicQC, with whatever we are setting the package to there, and when I use CytoDataFrame outside of coSMicQC to render images it is very slow. Maybe I am missing a parameter, but we can discuss in person.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_performance.py`:
- Around line 98-183: The performance guards are using the debug-only render
path, so they miss the slower real notebook flow in `_repr_html_`. Update the
tests to exercise `CytoDataFrame._repr_html_()` with `debug=False` (mocking
notebook/display side effects as needed) so the assertions cover the actual
`_try_render_trame_widget_table()` / `_find_3d_columns_for_display()` path where
`find_image_columns()` and `process_image_data_as_html_display()` are triggered.
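
The concern in this comment, that a guard on the debug path can pass while the default path still does the expensive work, can be illustrated with a small call-count check built on `unittest.mock`. `RendererSketch` and its methods are hypothetical stand-ins, not CytoDataFrame's API.

```python
from unittest import mock


class RendererSketch:
    """Hypothetical stand-in with a debug and a default render path."""

    def find_columns(self) -> list[str]:
        return ["Image_FileName_DNA"]

    def render(self, debug: bool = False) -> str:
        if debug:
            # In this stand-in, the debug path returns early and never
            # reaches column detection.
            return "<table/>"
        columns = self.find_columns()
        return f"<table data-columns='{len(columns)}'/>"


def count_find_columns_calls(debug: bool) -> int:
    """Render once and report how often column detection ran."""
    renderer = RendererSketch()
    with mock.patch.object(
        renderer, "find_columns", wraps=renderer.find_columns
    ) as spy:
        renderer.render(debug=debug)
    return spy.call_count
```

In this sketch a guard that asserts a call count only under `debug=True` would report zero calls and say nothing about the default path, which is why the test needs to drive the path users actually hit.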

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f55bdc55-7a44-4417-939a-6d31e3ae9cc0

📥 Commits

Reviewing files that changed from the base of the PR and between 8e0772c and 084888b.

📒 Files selected for processing (3)
  • src/cytodataframe/frame.py
  • tests/test_frame.py
  • tests/test_performance.py

Comment thread tests/test_performance.py Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/cytodataframe/frame.py (1)

2466-2490: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate cache-config resolution logic vs. _get_3d_volume_from_cell.

_get_image_display_cache's disable/max-entries resolution (and the OrderedDict-wrapping fallback) duplicates the near-identical block in _get_3d_volume_from_cell (lines 3118-3134). Extracting a shared helper (e.g. _resolve_cache_settings(option_prefix, default_max)) would reduce drift risk between the two cache implementations.
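
One possible shape for such a shared helper is sketched below. Everything here is an assumption made for illustration: the option key names, the return tuple, and the decision to fall back to the default on an unparsable maximum are not taken from `frame.py`.

```python
from collections import OrderedDict
from typing import Any, Mapping, Optional, Tuple


def resolve_cache_settings_sketch(
    display_options: Optional[Mapping[str, Any]],
    option_prefix: str,
    default_max: int,
    existing_cache: Optional[Mapping[Any, Any]] = None,
) -> Tuple[bool, int, OrderedDict]:
    """Hypothetical helper returning (disabled, max_entries, cache)."""
    options = display_options or {}
    # The option key names below are invented for this sketch.
    disabled = bool(options.get(f"{option_prefix}_cache_disable", False))
    raw_max = options.get(f"{option_prefix}_cache_max_entries", default_max)
    try:
        max_entries = max(0, int(raw_max))
    except (TypeError, ValueError):
        max_entries = default_max
    # Wrap any existing mapping so callers can rely on OrderedDict methods.
    if isinstance(existing_cache, OrderedDict):
        cache = existing_cache
    else:
        cache = OrderedDict(existing_cache or {})
    return disabled, max_entries, cache
```

With a helper like this, the image cache and the volume cache would differ only in the prefix and default they pass, which is the drift the nitpick is worried about.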

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cytodataframe/frame.py` around lines 2466 - 2490, The cache settings
logic in _get_image_display_cache duplicates the same disable/max-entries
resolution and OrderedDict fallback used in _get_3d_volume_from_cell, so factor
that shared behavior into a small helper (for example, a cache-settings
resolver) and have both methods call it. Keep the helper responsible for reading
display_options, normalizing the max entry value, and wrapping any existing
cache into an OrderedDict so the two cache paths stay consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cytodataframe/frame.py`:
- Around line 272-275: The 3D probe path in _find_3d_columns_for_display() is
still re-reading 2D TIFFs on every _repr_html_() because _volume_cache only
stores successful 3D loads and ignores failed probes. Update the probing logic
around _load_image_volume/_find_3d_columns_for_display() to memoize “not 3D”
results per resolved path or cell, or reuse the existing _image_cache to
short-circuit once a 2D render has already succeeded. Keep the change localized
to the 3D detection/rendering flow so repeated renders do not call
imageio.imread() for the same 2D .tif/.tiff inputs.

In `@tests/test_performance.py`:
- Around line 260-305: Update the cache tests to exercise the notebook rendering
path by calling _render_via_notebook_path instead of _repr_html_(debug=True),
since the debug branch in CytoDataFrame._repr_html_ bypasses the normal display
flow and won’t validate cached image reuse correctly. Keep the existing
decode-count assertions and HTML equivalence check, but ensure the test path
goes through the same rendering logic that uses _find_3d_columns_for_display and
image decoding/cache behavior.

---

Nitpick comments:
In `@src/cytodataframe/frame.py`:
- Around line 2466-2490: The cache settings logic in _get_image_display_cache
duplicates the same disable/max-entries resolution and OrderedDict fallback used
in _get_3d_volume_from_cell, so factor that shared behavior into a small helper
(for example, a cache-settings resolver) and have both methods call it. Keep the
helper responsible for reading display_options, normalizing the max entry value,
and wrapping any existing cache into an OrderedDict so the two cache paths stay
consistent.
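
The memoization of "not 3D" results requested in the first inline comment above can be sketched as a negative cache: a set of paths already known to be 2D, consulted before any file read. The class and its reader callback are hypothetical, and treating any 3-dimensional array as a volume is a simplifying assumption.

```python
from typing import Callable, Dict, Optional, Set

import numpy as np


class VolumeProbeSketch:
    """Hypothetical probe that remembers which paths are not 3D."""

    def __init__(self, reader: Callable[[str], np.ndarray]) -> None:
        self._reader = reader
        self._volumes: Dict[str, np.ndarray] = {}
        self._not_3d: Set[str] = set()  # negative cache

    def get_volume(self, path: str) -> Optional[np.ndarray]:
        if path in self._not_3d:
            # Known 2D input: skip the file read entirely.
            return None
        if path in self._volumes:
            return self._volumes[path]
        array = self._reader(path)
        if array.ndim != 3:
            # Remember the failed probe so later renders do not re-read.
            self._not_3d.add(path)
            return None
        self._volumes[path] = array
        return array
```

Without the negative set, every render would re-read each 2D file just to rediscover that it is not a volume.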

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4aa187a0-f8e7-4043-be2e-61811d54dbbd

📥 Commits

Reviewing files that changed from the base of the PR and between 3d90b55 and e45cadc.

📒 Files selected for processing (4)
  • src/cytodataframe/__init__.py
  • src/cytodataframe/frame.py
  • tests/test_frame.py
  • tests/test_performance.py

Comment thread src/cytodataframe/frame.py
Comment thread tests/test_performance.py
@d33bs

d33bs commented Jul 2, 2026

Member Author

Thanks @jenna-tomkinson !

@d33bs d33bs merged commit 16b7923 into cytomining:main Jul 2, 2026
9 checks passed
@d33bs d33bs deleted the stability branch July 2, 2026 16:00