Cell killing plot pipeline #79

Open
jaceybronte wants to merge 1 commit into WayScience:main from jaceybronte:cell-killing-plots

Conversation

@jaceybronte

Member

This PR processes our collaborator data and provides latent scores for cell killing comparisons.

@review-notebook-app


Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


@MikeLippincott left a comment

Member

LGTM. I have a few efficiency suggestions, plus some concerns about notebook size and separation of concerns, but overall this looks good!

Comment on lines +67 to +71
# Calculate the Euclidean distance for each row from the mean values
distances = np.linalg.norm(data - mean_values, axis=1)

# Create a new DataFrame to store distances with SampleID
new_rnaseq_data['Euclidean_Distance'] = distances

Member

Consider merging these two steps into a single assignment to avoid the intermediate `distances` variable.
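A minimal sketch of what that merge might look like: the norm is assigned straight to the new column. The toy `data`, `mean_values`, and `new_rnaseq_data` objects below are invented stand-ins for illustration, not the notebook's real inputs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real frame comes from the notebook.
data = pd.DataFrame(
    {"gene_a": [0.0, 3.0], "gene_b": [0.0, 4.0]},
    index=["sample_1", "sample_2"],
)
mean_values = np.array([0.0, 0.0])  # assumed reference point for this toy example
new_rnaseq_data = data.copy()

# Compute the row-wise Euclidean distance and store it in one step,
# without binding the result to a separate variable first.
new_rnaseq_data["Euclidean_Distance"] = np.linalg.norm(
    data.to_numpy() - mean_values, axis=1
)
```

Behavior should be the same as the two-step version; the only change is that the intermediate name disappears.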

Comment on lines +73 to +76
# Print the SampleID and corresponding Euclidean Distance for each row
for idx, row in new_rnaseq_data.iterrows():
    print(f"SampleID: {idx}, Euclidean Distance: {row['Euclidean_Distance']}")

Member

Consider writing this to a log, or printing only a few rows, to avoid cluttering stdout.
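A rough sketch of both options, assuming the distances live in a column named `Euclidean_Distance`. The logger name and the toy DataFrame are hypothetical and only there to make the example self-contained.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("cell_killing")  # hypothetical logger name

# Hypothetical stand-in for the notebook's DataFrame.
new_rnaseq_data = pd.DataFrame(
    {"Euclidean_Distance": [1.5, 2.5, 3.5, 4.5]},
    index=pd.Index(["s1", "s2", "s3", "s4"], name="SampleID"),
)

# Option 1: show only the first few rows in the notebook output.
preview = new_rnaseq_data["Euclidean_Distance"].head(3)
print(preview.to_string())

# Option 2: send the per-sample detail to the log at DEBUG level, so it is
# only emitted when the logging level is lowered to DEBUG.
for sample_id, distance in new_rnaseq_data["Euclidean_Distance"].items():
    logger.debug("SampleID: %s, Euclidean Distance: %s", sample_id, distance)
```

With the level set to `INFO` as above, the per-row messages are suppressed and only the three-row preview is printed.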

latent_df = pd.DataFrame(latent_predictions, columns=["latent_score"])


print(latent_predictions)

Member

Same here for the printing!

Comment on lines +132 to +135
collab_preds_dir = pathlib.Path("../7.collab-data/results").resolve()
collab_preds_dir.mkdir(parents=True, exist_ok=True)

latent_pred_file = collab_preds_dir / "phgg_latent_predictions.parquet"

Member

Consider moving this to the top of the notebook.

# In[9]:


# Define the location of the saved models and output directory for results

Member

Same here

overall_counts["percent"] = overall_counts["count"] / total_modelids * 100

# 2. Subset for brain tumors: Neuroblastoma and Diffuse Glioma
brain_df = total_drugs[total_drugs["OncotreePrimaryDisease"].isin(["Neuroblastoma", "Diffuse Glioma"])]

Member

Are these the only brain tumors, or just the ones you are interested in?

Comment on lines +122 to +123
df = compute_and_plot_latent_scores(sample, latent_df, drug_max, "name", "pearson_correlation", "Drug")
drug_merge_df.append(df)

Member

Consider combining these into one line so the DataFrame is written once, directly into the list, rather than first to `df` and then to the list; this helps avoid memory leaks.
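A minimal sketch of the one-line form. The `compute_and_plot_latent_scores` stub here is hypothetical and far simpler than the real function (which takes more arguments and also plots); the sample IDs and `latent_df` values are invented for illustration.

```python
import pandas as pd


def compute_and_plot_latent_scores(sample, latent_df):
    """Hypothetical stub standing in for the notebook's real function."""
    return latent_df.assign(sample=sample)


latent_df = pd.DataFrame({"latent_score": [0.1, 0.2]})  # toy values
samples = ["sample_1", "sample_2"]  # hypothetical sample IDs

drug_merge_df = []
for sample in samples:
    # Append the returned DataFrame directly, with no intermediate `df` name.
    drug_merge_df.append(compute_and_plot_latent_scores(sample, latent_df))

# Combine the per-sample results once, after the loop.
combined = pd.concat(drug_merge_df, ignore_index=True)
```

Since `list.append` stores a reference rather than a copy, the practical gain is likely mostly tidiness: there is no leftover `df` name pointing at the last result after the loop ends.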

p_df = compute_and_plot_latent_scores(sample, latent_df, reactome_max, "reactome_pathway", "nes_score", "Reactome")
c_df = compute_and_plot_latent_scores(sample, latent_df, corum_max, "reactome_pathway", "nes_score", "CORUM")
pathway_merge_df.append(p_df)
corum_merge_df.append(c_df)

Member

See the memory-leak comment on the `drug_merge_df.append(df)` lines.

# In[5]:


cell_killing_df <- auc_df

Member

Consider avoiding these rename calls by giving the data frame its final name when it is created.

Member

Nice plots here!

2 participants