Webapp by MikeLippincott · Pull Request #78 · WayScience/gene-process-dependencies

MikeLippincott · 2026-06-09T00:58:21Z

This PR is for the webapp in development for Jacey's first author manuscript.

review-notebook-app · 2026-06-09T00:58:26Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

gwaybio

A few, mostly housekeeping, comments. @jaceybronte should be the one to give final approval.

Is it possible to run the app on this repo's GitHub Pages site (github.io) rather than on streamlit.app?

Also, the app link you sent should be listed in the README (and it is currently down).

gwaybio · 2026-06-09T13:00:26Z

 # - `CRISPRGeneEffect.parquet`: The data in this document are the Gene Effect Scores obtained from CRISPR knockout screens conducted by the Broad Institute. Negative scores notate that cell growth inhibition and/or death occurred following a gene knockout. Information on how these Gene Effect Scores were determined can be found [here](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02540-7)
 # - `depmap_gene_meta.tsv`: Genes that passed QC and were included in the training model for Pan et al. 2022. We use this data to filter genes as input to our models. The genes were filtered based 1) variance, 2) perturbation confidence, and 3) high on target predictions based on high correlation across other guides.
-# 
-# > Pan J, Kwon JJ, Talamas JA, Borah AA, Vazquez F, Boehm JS, Tsherniak A, Zitnik M, McFarland JM, Hahn WC. Sparse dictionary learning recovers pleiotropy from human cell fitness screens. Cell Syst. 2022 Apr 20;13(4):286-303.e10. doi: 10.1016/j.cels.2021.12.005. Epub 2022 Jan 31. PMID: 35085500; PMCID: PMC9035054.


This is the Webster citation and should probably be kept (see line 11).

Note: I've reverted these changes.

gwaybio · 2026-06-09T13:01:28Z


 # Load depmap metadata
-gene_meta_df = pd.read_parquet(qc_gene_file, sep="\t")
+gene_meta_df = pd.read_csv(qc_gene_file, sep="\t")


Why would this change to `read_csv`?

Note: I've reverted these changes.
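One way to make the choice of reader explicit is to dispatch on the file extension. The following is a minimal sketch, not the notebook's actual code: `READERS`, `reader_name_for`, and `load_table` are hypothetical names, and it assumes the metadata file may be tab-separated, like the `depmap_gene_meta.tsv` named in the notebook header. pandas documents `sep` for `read_csv` but not for `read_parquet`, so the right reader depends on the file's actual format.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to (pandas reader name, keyword arguments).
READERS = {
    ".parquet": ("read_parquet", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".csv": ("read_csv", {}),
}


def reader_name_for(path):
    """Return the pandas reader name and kwargs for a file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return READERS[suffix]


def load_table(path):
    """Load a table with the reader that matches its suffix."""
    import pandas as pd  # imported lazily so the dispatch logic has no dependencies

    name, kwargs = reader_name_for(path)
    return getattr(pd, name)(path, **kwargs)
```

With this in place, `load_table(qc_gene_file)` would pick the reader from the file name instead of hard-coding it at the call site.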

gwaybio · 2026-06-09T13:04:06Z

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1


I'm concerned about this approach, mostly because I don't understand what will happen when someone uses the webapp. Does using the webapp trigger a Git LFS pull? It must retrieve the data from somewhere. Rather than triggering a pull (which consumes Git LFS tokens, $$), please use a different approach; there are a few options (e.g., figshare).

Note: these files are no longer in LFS.
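To check whether a given checkout holds real data or only Git LFS pointers, one option is to look at the first bytes of each data file. This is a minimal sketch: `is_lfs_pointer` is a hypothetical helper, and the only fact it relies on is the pointer header line shown in the diff above.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as shown in the diff above.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts with the Git LFS pointer header."""
    with Path(path).open("rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER
```

Running this over the webapp's data directory before deployment would flag any file that still needs to be fetched from LFS rather than read directly.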

gwaybio · 2026-06-09T13:06:00Z

@@ -0,0 +1,88 @@
+altair==6.1.0


This dependency list is extremely strict and fragile; it will almost certainly break very soon. What is the convention for Streamlit? Is it possible to relax the restrictions? How is Streamlit tested? Consider digging into this a bit more, which will likely increase the longevity of the app.

Note this file is deleted

wait, hmm 🤔
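As an illustration of what relaxing the restrictions could look like, the sketch below rewrites exact pins such as `altair==6.1.0` into a lower bound with a cap on the next major version. This is one possible convention, not a Streamlit recommendation: `relax_pin` is a hypothetical helper, and the major-version cap assumes the package follows semantic versioning, which not every package does.

```python
import re

# Matches simple exact pins such as "altair==6.1.0" (name, major, rest of version).
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-\[\]]+)==(\d+)((?:\.\d+)*)\s*$")


def relax_pin(line):
    """Turn 'pkg==X.Y.Z' into 'pkg>=X.Y.Z,<X+1'; return other lines unchanged."""
    match = PIN.match(line)
    if match is None:
        return line.strip()
    name, major, rest = match.groups()
    return f"{name}>={major}{rest},<{int(major) + 1}"
```

Later commits in this PR take a different route: they delete `requirements.txt` and rely on `pyproject.toml` plus `uv.lock` for reproducible installs.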

gwaybio · 2026-06-09T13:06:46Z

+        name: isort (python)
+        args: ["--profile", "black", "--filter-files"]
+
+    #Code formatter for both python files and jupyter notebooks


Consider running pre-commit on this file (and all other files in this repo) as well.

gwaybio · 2026-06-09T13:07:32Z

+| [0.data-download](0.data-download/)       | Download required files                  | Download gene effect data and cell line information, and download gene QC and construct gene filtering dictionary                                                                                                                                                  |
+| [1.data-exploration](1.data-exploration/) | Explore and visualize data               | Create figures to visualize cell line information and split gene effect data into balanced test and train dataframes                                                                                                                                               |
+| [2.train-VAE](2.train-VAE/)               | Train Beta VAE and Beta TC VAE models    | Optimize hyperparameters and train Beta Variational Autoencoder/Beta Total Correlation Variational Autoencoder with optimal hyperparameters and previously created test and train dataframes                                                                       |
+| [3.analysis](3.analysis/)                 | Analyze Beta VAE and Beta TC VAE Outputs | Generate heatmaps to visualize death windows by cell line and by genes, run Gene Set Enrichment Analysis with BVAE and BTCVAE synthesized data, and analyze extracted BVAE/BTCVAE latent space data to compare similarity of cancer between different demographics |


This comment is likely for @jaceybronte: please update the README to mirror what we've now done. It is out of date.

Yep, will make a PR!

jaceybronte

Code looks good. I'm still trying to figure out how to make it work with all this data. LMK if you need me to edit the parquets.

jaceybronte · 2026-06-09T18:41:24Z

Yep, will make a PR!

jaceybronte · 2026-06-09T18:42:44Z

Are we going to run into conflicts when I merge my PR?

maybe! if the files are the same, then there should be no issue

Replace heavy latent_load_data() call at startup with a lightweight read of Model.parquet (188 KB) for disease/model ID lists. Gate the 194 MB CRISPR load behind an explicit Compute PCA button. Trim cached PCA output to only the 4 needed columns instead of the full gene matrix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Keep Mike's pre-computed PCA parquets and improved tab implementations. Apply lightweight startup fix: read Model.parquet (188 KB) for disease and model ID lists instead of calling latent_load_data() at import time. Drop the Compute PCA button — no longer needed with pre-computed embeddings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move small webapp parquets (Model, PCA embeddings) out of Git LFS and into regular git — they are 27–188 KB and don't need LFS. Remove CRISPRGeneEffect and CRISPR_gene_dictionary from the repo entirely; they are downloaded locally via the data-download pipeline and are not needed by the webapp. Also drop dead imports from app.py that were left over from before pre-computed PCA embeddings were introduced. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Revert all changes to 0.data-download — the DepMap version update belongs in a separate PR since it requires re-running all downstream analysis. Remove dead functions from app_utils.py (load_data, single_load_data, compute_single_pca, load_model_data, make_dropdown_pca_with_selection) that referenced the CRISPR files no longer used by the webapp. Remove unused imports from app.py and app_utils.py (ipywidgets, sklearn, numpy, os, plotly.subplots). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gwaybio · 2026-06-19T03:50:16Z

In 4688235, Claude and I reverted all changes this PR introduced to the data download module. Why were these changes necessary? They introduced, for example, a different DepMap release version, which is very concerning! If this is an old commit that needs to be in this repo (for example, if the current DepMap version in data download is wrong), then we should add it back, but probably in a different PR. Furthermore, the webapp is now self-contained, given the precomputed PCA. (Was the PCA precomputed on the same DepMap version BioBombe was applied to? I hope so!)

- Delete requirements.txt: pip freeze from Mike's machine with a hardcoded absolute path; uv.lock handles reproducible installs
- Delete Justfile: thin wrappers around uv with no added value; raw uv commands are documented in README instead
- Trim pyproject.toml: remove scikit-learn, umap-learn, ipywidgets, ipykernel, which are no longer used after dead code removal
- Fix ruff issues in app_utils.py: remove unused matplotlib.pyplot import, rename ambiguous variable l → lightness, simplify BASE_DIR
- Expand README webapp section: uv setup, deploy usage, data file table, note on raw DepMap files not required by the webapp

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copy all_reactome/corum/drug_results.parquet into 9.webapp/data/ so the app no longer depends on the repo directory structure above 9.webapp/. Update app_utils.py to read all data from DATA_DIR (BASE_DIR/data). Add 9.webapp/README.md with Hugging Face Spaces frontmatter, tab descriptions, local run instructions, and data file table. Update root README.md with HF Spaces link and simplified webapp section. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gwaybio · 2026-06-19T11:43:10Z

Webapp deployed to hugging face spaces! https://huggingface.co/spaces/WayScience/gene-dependency-explorer

gwaybio · 2026-06-19T11:47:44Z

thanks for your contribution @MikeLippincott! @jaceybronte - After you are able to take a look and request any changes, this PR is ready to merge.

It would be great to wrap up some additional gardening on this repo to introduce all your remaining uncommitted code. Please also work towards merging #77. In parallel, I will continue working on the webapp content, as it is not yet publication ready.

Renamed and relocated the one-time PCA precomputation script that generated 9.webapp/data/pca_embeddings_*.parquet. Inlined the helper functions (load_model_data, single_load_data, latent_load_data) that were removed from app_utils.py, and added a deprecation note. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MikeLippincott added 3 commits June 8, 2026 14:50

progress capture

8f86c6f

progress capture

ac290a9

add new webapp

d1c3ee2

MikeLippincott added 5 commits June 8, 2026 19:05

add new webapp

fa10bfa

add limitors

602836d

track file

df7e22e

adding new

1742338

add new site

4a32650

MikeLippincott requested review from gwaybio and jaceybronte and removed request for gwaybio June 9, 2026 02:52

gwaybio reviewed Jun 9, 2026

fix crashing by adding caching

8c8cf60

jaceybronte approved these changes Jun 9, 2026

gwaybio and others added 6 commits June 10, 2026 13:38

adding webapp page fixes

c06f75e

remove stale download_data.sh from 0.data-download

d20ef8a

This script was added in the webapp branch but copies CRISPR files into 9.webapp/data/ which are now gitignored and no longer needed by the app. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gwaybio and others added 7 commits June 18, 2026 21:56

add .hfignore and update README frontmatter for HF Spaces Docker/Stre…

f9ffaa8

…amlit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

add HF Spaces update instructions to README

9830a3c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add Dockerfile and README for HF Spaces Docker deployment

c28a170

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add HF Spaces badge to root README

a7d63b9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add GitHub Action to auto-deploy webapp to HF Spaces on main

d148713

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gwaybio and others added 2 commits June 19, 2026 05:39

Update READMEs to document auto-deploy via GitHub Actions

b84b9f4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

minor tweaks

df8ddd4

gwaybio and others added 2 commits June 19, 2026 05:50

Remove requirements.txt — pyproject.toml + uv.lock are authoritative

bae50c0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

3 participants