Improve stability and performance#218
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds grayscale image conversion and brightness handling, resolves package version metadata at import time, tightens image-column detection and path mapping in ChangesCytoDataFrame image and frame updates
Estimated code review effort: 4 (Complex) | ~45 minutes Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
jenna-tomkinson
left a comment
There was a problem hiding this comment.
LGTM! Will this solve some of the slowness with some datasets?
|
@d33bs I am noting CytoDataFrame is very slow now, even with some of the test datasets from the cosmicqc_in_a_nutshell. |
|
Thanks @jenna-tomkinson ! Just double checking: are you meaning this PR's version of CytoDataFrame is slow or that the package is slow in general with a different release? If we've slowed down as a result of the changes here we should make further changes until performance change is neutral or improved. |
|
Hi @d33bs, apologises I mean I am using the poetry environment from coSMicQC and whatever we are setting the package with if I use CytoDataFrame outside of coSMicQC to render images it is very slow. Maybe I am missing a parameter, but we can discuss in person. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_performance.py`:
- Around line 98-183: The performance guards are using the debug-only render
path, so they miss the slower real notebook flow in `_repr_html_`. Update the
tests to exercise `CytoDataFrame._repr_html_()` with `debug=False` (mocking
notebook/display side effects as needed) so the assertions cover the actual
`_try_render_trame_widget_table()` / `_find_3d_columns_for_display()` path where
`find_image_columns()` and `process_image_data_as_html_display()` are triggered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f55bdc55-7a44-4417-939a-6d31e3ae9cc0
📒 Files selected for processing (3)
src/cytodataframe/frame.pytests/test_frame.pytests/test_performance.py
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/cytodataframe/frame.py (1)
2466-2490: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDuplicate cache-config resolution logic vs.
_get_3d_volume_from_cell.
_get_image_display_cache's disable/max-entries resolution (and theOrderedDict-wrapping fallback) duplicates the near-identical block in_get_3d_volume_from_cell(lines 3118-3134). Extracting a shared helper (e.g._resolve_cache_settings(option_prefix, default_max)) would reduce drift risk between the two cache implementations.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cytodataframe/frame.py` around lines 2466 - 2490, The cache settings logic in _get_image_display_cache duplicates the same disable/max-entries resolution and OrderedDict fallback used in _get_3d_volume_from_cell, so factor that shared behavior into a small helper (for example, a cache-settings resolver) and have both methods call it. Keep the helper responsible for reading display_options, normalizing the max entry value, and wrapping any existing cache into an OrderedDict so the two cache paths stay consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cytodataframe/frame.py`:
- Around line 272-275: The 3D probe path in _find_3d_columns_for_display() is
still re-reading 2D TIFFs on every _repr_html_() because _volume_cache only
stores successful 3D loads and ignores failed probes. Update the probing logic
around _load_image_volume/_find_3d_columns_for_display() to memoize “not 3D”
results per resolved path or cell, or reuse the existing _image_cache to
short-circuit once a 2D render has already succeeded. Keep the change localized
to the 3D detection/rendering flow so repeated renders do not call
imageio.imread() for the same 2D .tif/.tiff inputs.
In `@tests/test_performance.py`:
- Around line 260-305: Update the cache tests to exercise the notebook rendering
path by calling _render_via_notebook_path instead of _repr_html_(debug=True),
since the debug branch in CytodataFrame._repr_html_ bypasses the normal display
flow and won’t validate cached image reuse correctly. Keep the existing
decode-count assertions and HTML equivalence check, but ensure the test path
goes through the same rendering logic that uses _find_3d_columns_for_display and
image decoding/cache behavior.
---
Nitpick comments:
In `@src/cytodataframe/frame.py`:
- Around line 2466-2490: The cache settings logic in _get_image_display_cache
duplicates the same disable/max-entries resolution and OrderedDict fallback used
in _get_3d_volume_from_cell, so factor that shared behavior into a small helper
(for example, a cache-settings resolver) and have both methods call it. Keep the
helper responsible for reading display_options, normalizing the max entry value,
and wrapping any existing cache into an OrderedDict so the two cache paths stay
consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4aa187a0-f8e7-4043-be2e-61811d54dbbd
📒 Files selected for processing (4)
src/cytodataframe/__init__.pysrc/cytodataframe/frame.pytests/test_frame.pytests/test_performance.py
|
Thanks @jenna-tomkinson ! |
Description
Stability
Performance — profiled the wide-table hot paths and cut redundant work:
Result (JUMP plate): construct −63%, slice −59%, full render −55%. Rendered output unchanged.
Tests: new tests/test_performance.py regression guards; full suite passes (162), pre-commit green.
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.
Summary by CodeRabbit
Summary of changes