diff --git a/.github/instructions/config-method.instructions.md b/.github/instructions/config-method.instructions.md
new file mode 100644
index 0000000..ec8346b
--- /dev/null
+++ b/.github/instructions/config-method.instructions.md
@@ -0,0 +1,178 @@
+---
+description: "Use when writing, fixing, or reviewing config.vsh.yaml files in src/methods/ or src/control_methods/. Covers required metadata, info fields, docker engine setup, nextflow runner labels, and how to verify components."
+applyTo: "src/methods/**/config.vsh.yaml,src/control_methods/**/config.vsh.yaml"
+---
+# Method & Control Method Config Guidelines
+
+## Structure Overview
+
+### Single-step method (`comp_method.yaml`)
+
+```yaml
+__merge__: ../../api/comp_method.yaml # or comp_control_method.yaml
+name: "my_method" # snake_case, unique
+label: My Method # human-readable, used in tables
+summary: "One sentence summary." # used in overview tables
+description: | # multi-paragraph, used in docs
+  Longer description...
+references: # omit for control methods
+  doi:
+    - 10.xxxx/xxxxx
+links: # omit for control methods
+  repository: https://github.com/...
+  documentation: https://...
+info:
+  preferred_normalization: log_cp10k # or counts, log_scran_pooling
+  variants:
+    my_method_default:
+    my_method_variant:
+      some_param: value
+arguments: # only if method has extra params
+  - name: "--some_param"
+    type: integer
+    description: "..."
+    example: 100 # use example, NOT default
+    info:
+      test_default: 1 # override value used during viash test only
+resources:
+  - type: python_script # or r_script
+    path: script.py # or script.R
+engines:
+  - type: docker
+    image: openproblems/base_python:1 # see base images below
+    setup:
+      - type: python
+        packages: [package1, package2]
+runners:
+  - type: executable
+  - type: nextflow
+    directives:
+      label: [midtime, highmem, midcpu] # adjust to actual needs
+```
+
+### Two-step method (train + predict)
+
+Methods that benefit from a train/predict split use two separate components:
+
+**Train component** (`comp_method_train.yaml`):
+```yaml
+__merge__: ../../../api/comp_method_train.yaml
+name: my_method_train
+resources:
+  - type: python_script
+    path: script.py
+engines:
+  - type: docker
+    image: openproblems/base_python:1
+    setup:
+      - type: python
+        packages: [package1]
+runners:
+  - type: executable
+  - type: nextflow
+    directives:
+      label: [hightime, highmem, midcpu]
+```
+
+**Predict component** (`comp_method_predict.yaml`):
+```yaml
+__merge__: ../../../api/comp_method_predict.yaml
+name: my_method_predict
+resources:
+  - type: python_script
+    path: script.py
+engines:
+  - type: docker
+    image: openproblems/base_python:1
+    setup:
+      - type: python
+        packages: [package1]
+runners:
+  - type: executable
+  - type: nextflow
+    directives:
+      label: [midtime, midmem, midcpu]
+```
+
+## Methods vs Control Methods
+
+| Field | Method | Control Method |
+|---|---|---|
+| `__merge__` | `/src/api/comp_method.yaml` | `/src/api/comp_control_method.yaml` |
+| `references` | required | omit |
+| `links` | recommended | omit |
+| inputs | `input_train_mod1`, `input_train_mod2`, `input_test_mod1` | `input_train_mod1`, `input_train_mod2`, `input_test_mod1`, `input_test_mod2` |
+
+## Required Metadata Fields
+
+- `name`: unique, matches `[a-z][a-z0-9_]*`
+- `label`: short human-readable name
+- `summary`: one sentence
+- `description`: one or more paragraphs
+- `references.doi` (methods only): list of DOIs
+
+## info Section
+
+- `preferred_normalization`: one of `counts`, `log_cp10k`, `log_scran_pooling`
+- `variants`: each key becomes a separate benchmark entry. Override any argument value by nesting it under the variant key. Every method needs at least one variant with the same name as the method.
+
+## Arguments
+
+- Do **not** set `default` on any argument — defaults belong to the library, not the config. Use `example` to document a typical value.
+- Use `info.test_default` to override a parameter value **only during `viash test`** (not in benchmarks). This is useful to reduce epoch counts, disable slow quality checks, etc., so tests run quickly without affecting real benchmark results.
+- Argument names use `--snake_case`. Viash exposes them in the script as `par['snake_case']` (Python) or `par$snake_case` (R).
+- After adding, removing, or renaming any argument, regenerate the `## VIASH START` block in the script so the `par` dict stays in sync:
+  ```bash
+  viash config inject src/methods//config.vsh.yaml
+  ```
+
+```yaml
+arguments:
+  - name: --n_epochs
+    type: integer
+    description: "Number of training epochs."
+    example: 100
+    info:
+      test_default: 1 # 1 epoch during testing for speed
+  - name: --flow_threshold
+    type: double
+    description: "Flow error threshold. Set to 0 to skip flow quality check."
+    example: 0.4
+    info:
+      test_default: 0 # skip check during testing
+```
+
+## Base Docker Images
+
+| Image | Use for |
+|---|---|
+| `openproblems/base_python:1` | Python, CPU |
+| `openproblems/base_r:1` | R, CPU |
+| `openproblems/base_pytorch_nvidia:1` | PyTorch + NVIDIA GPU |
+| `openproblems/base_tensorflow_nvidia:1` | TensorFlow + NVIDIA GPU |
+
+## Nextflow Runner Labels
+
+Set in `runners[type=nextflow].directives.label`. Pick one from each category:
+
+| Category | Options |
+|---|---|
+| Time | `lowtime`, `midtime`, `hightime` |
+| Memory | `lowmem`, `midmem`, `highmem`, `veryhighmem` |
+| CPU | `lowcpu`, `midcpu`, `highcpu` |
+| GPU (optional) | `gpu`, `biggpu` |
+
+## Rebuilding the Docker Image
+
+After changing the `setup` section:
+```bash
+viash run src/methods//config.vsh.yaml -- ---setup cachedbuild
+```
+
+## Verification
+
+```bash
+viash test src/methods//config.vsh.yaml
+```
+
+Both test scripts must succeed (`2 out of 2`).
diff --git a/.github/instructions/config-metric.instructions.md b/.github/instructions/config-metric.instructions.md
new file mode 100644
index 0000000..c6ea585
--- /dev/null
+++ b/.github/instructions/config-metric.instructions.md
@@ -0,0 +1,93 @@
+---
+description: "Use when writing, fixing, or reviewing config.vsh.yaml files in src/metrics/. Covers required metadata, the info.metrics list structure, docker engine setup, nextflow runner labels, and how to verify components."
+applyTo: "src/metrics/**/config.vsh.yaml"
+---
+# Metric Config Guidelines
+
+## Structure Overview
+
+Metrics differ from methods: metadata (`label`, `summary`, `description`, `references`) lives inside the `info.metrics` list, not at the top level. A single component can expose multiple metric values.
+
+```yaml
+__merge__: ../../api/comp_metric.yaml
+name: "my_metric" # snake_case, unique component name
+info:
+  metrics:
+    - name: my_metric_value1 # snake_case, unique metric name
+      label: My Metric Value 1 # human-readable, used in tables
+      summary: "One sentence summary."
+      description: "Longer description."
+      references:
+        doi: 10.xxxx/xxxxx
+      min: 0
+      max: 1
+      maximize: true # true if higher = better
+    - name: my_metric_value2
+      label: My Metric Value 2
+      summary: "..."
+      description: "..."
+      references:
+        doi: 10.xxxx/xxxxx
+      min: 0
+      max: 1
+      maximize: false
+resources:
+  - type: python_script # or r_script
+    path: script.py # or script.R
+engines:
+  - type: docker
+    image: openproblems/base_python:1 # see base images below
+    setup:
+      - type: python
+        packages: [scikit-learn]
+runners:
+  - type: executable
+  - type: nextflow
+    directives:
+      label: [midtime, midmem, midcpu]
+```
+
+## Required Fields per Metric Entry
+
+Each entry in `info.metrics` must have:
+- `name`: unique metric identifier, snake_case
+- `label`: short human-readable name
+- `summary`: one sentence
+- `description`: full description
+- `references.doi`: DOI(s) for the metric
+- `min` / `max`: numeric range of possible values
+- `maximize`: `true` if higher score = better performance
+
+## Base Docker Images
+
+| Image | Use for |
+|---|---|
+| `openproblems/base_python:1` | Python, CPU |
+| `openproblems/base_r:1` | R, CPU |
+
+Metrics rarely need GPU images.
+
+## Nextflow Runner Labels
+
+Metrics are typically lightweight. Use conservative defaults:
+
+| Category | Options |
+|---|---|
+| Time | `lowtime`, `midtime`, `hightime` |
+| Memory | `lowmem`, `midmem`, `highmem` |
+| CPU | `lowcpu`, `midcpu`, `highcpu` |
+
+## Rebuilding the Docker Image
+
+After changing the `setup` section:
+```bash
+viash run src/metrics//config.vsh.yaml -- ---setup cachedbuild
+```
+
+## Verification
+
+```bash
+viash test src/metrics//config.vsh.yaml
+```
+
+Both test scripts must succeed (`2 out of 2`).
diff --git a/.github/instructions/method-scripts-python.instructions.md b/.github/instructions/method-scripts-python.instructions.md
new file mode 100644
index 0000000..5cdaf45
--- /dev/null
+++ b/.github/instructions/method-scripts-python.instructions.md
@@ -0,0 +1,97 @@
+---
+description: "Use when writing, fixing, or reviewing method/metric script.py files in src/methods/, src/metrics/, or src/control_methods/. Covers script style, API compliance, and how to verify components."
+applyTo: "src/methods/**/script.py,src/metrics/**/script.py,src/control_methods/**/script.py"
+---
+# Method & Metric Script Guidelines (Python)
+
+## Core Principle
+
+`script.py` should represent **typical bioinformatician usage** of the tool with minimal modifications. Only adapt what is strictly necessary to:
+1. Read inputs from the paths provided by `par`
+2. Pass the right data structures to the method
+3. Convert the method's output back into the expected output structures
+4. Write outputs to `par['output']`
+
+Do **not** restructure the method's native API, add abstraction layers, or rewrite the algorithm logic.
+
+## Finding API Specs
+
+Input/output file formats are defined in `src/api/`. Key files:
+- `file_train_mod1.yaml` / `file_train_mod2.yaml` — training data AnnData fields (mod1 and mod2)
+- `file_test_mod1.yaml` / `file_test_mod2.yaml` — test data AnnData fields
+- `file_prediction.yaml` — expected output format for methods (has `layers["normalized"]`)
+- `file_pretrained_model.yaml` — expected output format for train components
+- `file_score.yaml` — expected output format for metrics
+- `comp_method.yaml`, `comp_metric.yaml` — component argument specs
+
+Always check these before deciding what fields to read or write.
+
+## The `## VIASH START` / `## VIASH END` Block
+
+This block is **auto-generated** by viash from the component's `config.vsh.yaml` arguments. It is replaced at build/test time with a real CLI parser. Keep it in the script only as a local development convenience.
+
+- **Do not edit it manually** to add or remove parameters — edit `config.vsh.yaml` instead.
+- After adding, removing, or renaming an argument in the config, regenerate the block:
+  ```bash
+  viash config inject src/methods//config.vsh.yaml
+  ```
+- Argument names in the config (`--my_param`) map directly to `par['my_param']` keys.
+
+## Common Patterns
+
+**Reading inputs (single-step method):**
+```python
+input_train_mod1 = ad.read_h5ad(par['input_train_mod1'])
+input_train_mod2 = ad.read_h5ad(par['input_train_mod2'])
+input_test_mod1 = ad.read_h5ad(par['input_test_mod1'])
+```
+
+**Writing prediction output:**
+```python
+output = ad.AnnData(
+    layers={"normalized": y_pred},
+    obs=input_test_mod1.obs,
+    var=input_train_mod2.var,
+    uns={
+        "dataset_id": input_train_mod1.uns["dataset_id"],
+        "method_id": meta["name"],
+    },
+)
+output.write_h5ad(par['output'], compression="gzip")
+```
+
+**Writing metric score output:**
+```python
+output = ad.AnnData(
+    uns={
+        "dataset_id": ad_pred.uns["dataset_id"],
+        "method_id": ad_pred.uns["method_id"],
+        "metric_ids": ["my_metric"],
+        "metric_values": [score],
+    }
+)
+output.write_h5ad(par['output'], compression="gzip")
+```
+
+**Reading inputs (predict component of a two-step method):**
+```python
+input_test_mod1 = ad.read_h5ad(par['input_test_mod1'])
+# load model from par['input_model'] directory
+```
+
+## Dependency Fixes
+
+If a library has a dependency conflict (e.g., incompatible with newer `anndata`, `numpy`, etc.), prefer replacing it with an alternative that provides the same model/algorithm natively rather than pinning transitive dependencies.
+
+Update `config.vsh.yaml` to remove the broken package from the `setup` block when replacing it.
+
+## Verification
+
+After any change to a method script or config, verify with:
+```bash
+viash test src/methods//config.vsh.yaml
+# or
+viash test src/metrics//config.vsh.yaml
+```
+
+Both test scripts must succeed (`2 out of 2`).
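
Note on the Python script guidelines above: the rule that a config argument `--my_param` becomes `par['my_param']` can be checked against a hand-maintained `par` dict. The following is a minimal sketch, not part of the diff and not part of viash; `par_key` and `missing_par_keys` are hypothetical helper names, and the argument list is invented for illustration.

```python
# Hypothetical helpers for illustration; viash itself does not provide them.
def par_key(arg_name: str) -> str:
    """Map a config argument name such as '--n_epochs' to its key in `par`."""
    return arg_name.lstrip("-")


def missing_par_keys(arg_names, par):
    """Return the config arguments that have no matching key in `par`."""
    return [name for name in arg_names if par_key(name) not in par]


# Invented example: the config gained --n_epochs, but the local
# VIASH START block was not regenerated afterwards.
config_args = ["--input_train_mod1", "--output", "--n_epochs"]
par = {"input_train_mod1": "train_mod1.h5ad", "output": "output.h5ad"}

stale = missing_par_keys(config_args, par)
print(stale)
```

A non-empty result suggests the `## VIASH START` block is out of date and should be regenerated from the config rather than edited by hand.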
diff --git a/.github/instructions/method-scripts-r.instructions.md b/.github/instructions/method-scripts-r.instructions.md
new file mode 100644
index 0000000..3be2256
--- /dev/null
+++ b/.github/instructions/method-scripts-r.instructions.md
@@ -0,0 +1,91 @@
+---
+description: "Use when writing, fixing, or reviewing method/metric script.R files in src/methods/, src/metrics/, or src/control_methods/. Covers script style, API compliance, and how to verify components."
+applyTo: "src/methods/**/script.R,src/metrics/**/script.R,src/control_methods/**/script.R"
+---
+# Method & Metric Script Guidelines (R)
+
+## Core Principle
+
+`script.R` should represent **typical bioinformatician usage** of the tool with minimal modifications. Only adapt what is strictly necessary to:
+1. Read inputs from the paths provided by `par`
+2. Pass the right data structures to the method
+3. Convert the method's output back into the expected output structures
+4. Write outputs to `par$output`
+
+Do **not** restructure the method's native API, add abstraction layers, or rewrite the algorithm logic.
+
+## Finding API Specs
+
+Input/output file formats are defined in `src/api/`. Key files:
+- `file_train_mod1.yaml` / `file_train_mod2.yaml` — training data AnnData fields (mod1 and mod2)
+- `file_test_mod1.yaml` / `file_test_mod2.yaml` — test data AnnData fields
+- `file_prediction.yaml` — expected output format for methods (has `layers["normalized"]`)
+- `file_pretrained_model.yaml` — expected output format for train components
+- `file_score.yaml` — expected output format for metrics
+- `comp_method.yaml`, `comp_metric.yaml` — component argument specs
+
+Always check these before deciding what fields to read or write.
+
+## The `## VIASH START` / `## VIASH END` Block
+
+This block is **auto-generated** by viash from the component's `config.vsh.yaml` arguments. It is replaced at build/test time with a real CLI parser. Keep it in the script only as a local development convenience.
+
+- **Do not edit it manually** to add or remove parameters — edit `config.vsh.yaml` instead.
+- After adding, removing, or renaming an argument in the config, regenerate the block:
+  ```bash
+  viash config inject src/methods//config.vsh.yaml
+  ```
+- Argument names in the config (`--my_param`) map directly to `par$my_param` keys.
+
+## Common Patterns
+
+**Reading inputs (single-step method):**
+```r
+library(anndata, warn.conflicts = FALSE)
+input_train_mod1 <- read_h5ad(par$input_train_mod1)
+input_train_mod2 <- read_h5ad(par$input_train_mod2)
+input_test_mod1 <- read_h5ad(par$input_test_mod1)
+```
+
+**Writing prediction output:**
+```r
+out <- anndata::AnnData(
+  layers = list(normalized = pred),
+  shape = dim(pred),
+  uns = list(
+    dataset_id = input_train_mod1$uns[["dataset_id"]],
+    method_id = meta$name
+  )
+)
+out$write_h5ad(par$output, compression = "gzip")
+```
+
+**Writing metric score output:**
+```r
+out <- anndata::AnnData(
+  uns = list(
+    dataset_id = ad_pred$uns[["dataset_id"]],
+    method_id = ad_pred$uns[["method_id"]],
+    metric_ids = "my_metric",
+    metric_values = score
+  )
+)
+out$write_h5ad(par$output, compression = "gzip")
+```
+
+## Dependency Fixes
+
+If a library has a dependency conflict (e.g., incompatible with a newer Bioconductor version, `anndata` R package, etc.), prefer replacing it with an alternative that provides the same model/algorithm natively rather than pinning transitive dependencies.
+
+Update `config.vsh.yaml` to remove the broken package from the `setup` block when replacing it.
+
+## Verification
+
+After any change to a method script or config, verify with:
+```bash
+viash test src/methods//config.vsh.yaml
+# or
+viash test src/metrics//config.vsh.yaml
+```
+
+Both test scripts must succeed (`2 out of 2`).
diff --git a/_viash.yaml b/_viash.yaml
index 7bcc914..a0d5c19 100644
--- a/_viash.yaml
+++ b/_viash.yaml
@@ -1,4 +1,4 @@
-viash_version: 0.9.4
+viash_version: 0.9.7
 name: task_predict_modality
 organization: openproblems-bio
 
diff --git a/src/methods/cellmapper_linear/script.py b/src/methods/cellmapper_linear/script.py
index df35423..a363592 100644
--- a/src/methods/cellmapper_linear/script.py
+++ b/src/methods/cellmapper_linear/script.py
@@ -49,7 +49,7 @@
 print("Predict on test data", flush=True)
 cmap.map_obsm(key="mod2", prediction_postfix="pred")
 
-mod2_pred = csc_matrix(cmap.query.obsm["mod2_pred"])
+mod2_pred = csc_matrix(cmap.query.obsm["mod2pred"])
 
 print("Write output AnnData to file", flush=True)
 output = ad.AnnData(
diff --git a/src/methods/guanlab_dengkw_pm/script.py b/src/methods/guanlab_dengkw_pm/script.py
index b9f3cb1..d11b735 100644
--- a/src/methods/guanlab_dengkw_pm/script.py
+++ b/src/methods/guanlab_dengkw_pm/script.py
@@ -123,7 +123,7 @@
 ## Changed from csr to csc matrix as this is more supported.
 y_pred = csc_matrix(y_pred)
 
-print("Write output AnnData to file", flush=True)
+ad.settings.allow_write_nullable_strings = True
 output = ad.AnnData(
     layers = { 'normalized': y_pred },
     obs = input_test_mod1.obs[[]],
@@ -133,4 +133,8 @@
         'method_id': meta['name']
     }
 )
+
+print("Output AnnData object:", output, flush=True)
+
+print("Write output AnnData to file", flush=True)
 output.write_h5ad(par['output'], compression='gzip')
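
Note on the config guidelines added in this diff: the `name` pattern and the "one label per category" rule for Nextflow runner labels lend themselves to a quick mechanical check before running `viash test`. The sketch below is an illustration only and is not part of the diff. `check_component` is a hypothetical function, the label sets are copied from the method guidelines table (the metric table omits `veryhighmem`), and treating unknown labels as an error is an assumption.

```python
import re

# Rules copied from the guidelines; the checker itself is hypothetical.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")
REQUIRED_LABELS = {
    "time": {"lowtime", "midtime", "hightime"},
    "memory": {"lowmem", "midmem", "highmem", "veryhighmem"},
    "cpu": {"lowcpu", "midcpu", "highcpu"},
}
OPTIONAL_LABELS = {"gpu", "biggpu"}


def check_component(name, labels):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"name {name!r} does not match [a-z][a-z0-9_]*")
    for category, options in REQUIRED_LABELS.items():
        chosen = sorted(set(labels) & options)
        if len(chosen) != 1:
            problems.append(f"expected exactly one {category} label, got {chosen}")
    known = set().union(OPTIONAL_LABELS, *REQUIRED_LABELS.values())
    unknown = sorted(set(labels) - known)
    if unknown:
        problems.append(f"unknown labels: {unknown}")
    return problems


ok = check_component("my_method", ["midtime", "highmem", "midcpu"])
bad = check_component("My-Method", ["midtime", "midcpu", "fastgpu"])
print(ok)
for problem in bad:
    print(problem)
```

The second call reports three problems: an invalid name, a missing memory label, and an unrecognized label. A check like this complements `viash test` and does not replace it.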