Add distillation launchers for qwen3-30b-a3b-base and gpt-oss-20b by gagika · Pull Request #4028 · AI-Hypercomputer/maxtext

gagika · 2026-05-31T20:57:45Z

Description

One-command launchers for running distillation on TPU v7x. Each script sets the
right XLA flags, mounts a grain arrayrecord dataset via gcsfuse (ClimbMix by
default; configurable via XPK_DATASET_BUCKET / XPK_DATASET_SUBPATH),
configures distillation knobs, stages the HF tokenizer when needed, and submits
a workload via XPK.
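Here is one way the dataset overrides could compose into a Grain file pattern. This is a minimal sketch. It assumes the defaults quoted later in this review (`maxtext-dataset`, `array-record/climbmix/*.arrayrecord`) and the `grain_train_files=gs://...` form from the `grain_files_override` line in `run_distill_xpk.sh`. The helper name `build_grain_override` is hypothetical, not part of the launcher.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the Grain train-files override from the
# XPK_DATASET_BUCKET / XPK_DATASET_SUBPATH environment variables.
build_grain_override() {
  # Defaults mirror the ClimbMix values shown in the launcher diff.
  local bucket="${XPK_DATASET_BUCKET:-maxtext-dataset}"
  local subpath="${XPK_DATASET_SUBPATH:-array-record/climbmix/*.arrayrecord}"
  # Quoted output keeps the '*' literal so it is not globbed locally.
  printf '%s\n' "grain_train_files=gs://${bucket}/${subpath}"
}
```

To point a run at another dataset, override it per invocation, for example `XPK_DATASET_BUCKET=my-bucket build_grain_override`.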

Usage

```bash
# qwen3-30b-a3b-base distillation (~20% MFU)
bash scripts/distillation/distill_qwen3_30b_base.sh submit

# gpt-oss-20b distillation (~17% MFU)
bash scripts/distillation/distill_gpt_oss_20b.sh submit

# qwen3-30b at pdbs=8 with activation offload (~22% MFU)
XPK_DISTILL_CONFIG=src/maxtext/configs/post_train/distillation_qwen3_30b_base_pdbs8.yml \
XPK_YAML_GCS=gs://agagik-us/distill-configs/distillation_qwen3_30b_base_pdbs8.yml \
  bash scripts/distillation/distill_qwen3_30b_base.sh submit
```

Each launcher takes a mode argument (default submit):

submit — stage the YAML to GCS and create the xpk workload
monitor — stream logs for the last submitted workload
resume_until_done — auto-resubmit on failure until the run completes
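A launcher that takes an optional mode argument could dispatch it roughly like this. This is a sketch only. The three mode functions below are hypothetical stand-ins, and the real scripts' internals may differ.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the launcher's three modes.
submit() { echo "submit"; }
monitor() { echo "monitor"; }
resume_until_done() { echo "resume_until_done"; }

# Dispatch on the first positional argument, defaulting to "submit".
main() {
  local mode="${1:-submit}"
  case "$mode" in
    submit|monitor|resume_until_done) "$mode" ;;
    *)
      echo "usage: $0 [submit|monitor|resume_until_done]" >&2
      return 2
      ;;
  esac
}
```

A script built this way would end with `main "$@"`. An unknown mode then fails with a usage message instead of silently falling through to `submit`.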

Tests

End-to-end tests for both the gpt-oss-20b and qwen3-30b-a3b-base models.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-05-31T21:02:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


github-actions · 2026-05-31T22:34:07Z

🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions

## 📋 Review Summary

This Pull Request introduces distillation launchers and configurations for qwen3-30b-a3b-base and gpt-oss-20b models on TPU v7x. The additions are useful for standardizing distillation runs, but there are a few issues regarding redundancy and hardcoded personal paths.

🔍 General Feedback

Redundant Patch File: The file distillation-wrappers.patch appears to be a redundant diff of the entire PR and should be removed.
Hardcoded Defaults: Several scripts and configuration files contain default GCS paths and images pointing to personal buckets (agagik-us, yujiedeng-maxtext-dev). These should ideally be replaced with generic placeholders or public resources to improve maintainability and portability for other users.
Environment Management: The use of /dev/shm for TMPDIR and Hugging Face caches is a good performance optimization to avoid ephemeral storage limits, but setting it globally as TMPDIR should be done with caution.
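/dev/shm is a RAM-backed tmpfs, so anything written there competes with host memory. One way to keep the speedup while limiting its reach is to point TMPDIR at /dev/shm for a single command only. This is a minimal sketch. `run_with_shm_tmp` is a hypothetical helper, and it falls back to the current TMPDIR when /dev/shm is missing or not writable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command with TMPDIR pointed at /dev/shm,
# without changing TMPDIR for the rest of the shell session.
run_with_shm_tmp() {
  local tmp="${TMPDIR:-/tmp}"
  # Prefer the RAM-backed tmpfs only when it exists and is writable.
  if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    tmp=/dev/shm
  fi
  # A prefix assignment applies to this one command only.
  TMPDIR="$tmp" "$@"
}
```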

github-actions · 2026-06-01T17:17:07Z


github-actions

## 📋 Review Summary

This PR introduces comprehensive one-command distillation launchers for qwen3-30b-a3b-base and gpt-oss-20b on TPU v7x. The additions include performance-tuned XLA flags, optimized YAML configurations (including activation offload for higher batch sizes), and enhancements to the shared XPK submission script to handle tokenizer staging and HF caching efficiently.

🔍 General Feedback

Robustness: The shared run_distill_xpk.sh was improved to handle HF caching in /dev/shm, which is a great optimization for TPU workloads. I've suggested some minor quoting fixes to ensure these scripts handle paths with spaces or special characters reliably.
Documentation: The scripts and YAML files include helpful comments explaining specific model quirks (e.g., the distill_beta=0 requirement for gpt-oss).
Defaults: While demo defaults are provided, I recommended using more generic placeholders for buckets and images to prevent accidental use of dev resources by other users.

github-actions · 2026-06-01T17:19:29Z

+export XPK_ZONE="${XPK_ZONE:-us-central1}"
+export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
+export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
+export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/cloud-tpu-multipod-dev/maxtext_base_image:agagik-distill}"


🟢 Using a dev image as a demo default is acceptable, but it might be better to point to a more stable or public reference if available.

Suggested change

export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/cloud-tpu-multipod-dev/maxtext_base_image:agagik-distill}"

export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/cloud-tpu-multipod-dev/maxtext_base_image:agagik-distill}"

github-actions · 2026-06-01T17:19:29Z

    "$image_flag=$XPK_BASE_IMAGE" \
    --command "export PYTHONPATH=/deps/src:/app/src; \
 export BASE_OUTPUT_DIRECTORY=${OUTPUT_DIR}; \
+export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \


🟡 Wrap variables in quotes within the command string to handle paths with spaces correctly when executed in the TPU pod.

Suggested change

export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \

export HF_HOME=\"${XPK_HF_CACHE_DIR}\"; export HF_DATASETS_CACHE=\"${XPK_HF_CACHE_DIR}/datasets\"; mkdir -p \"${XPK_HF_CACHE_DIR}/datasets\"; \
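The escaped double quotes matter because of how the command string is built. The outer double-quoted `--command` string expands `${...}` locally, and the `\"` sequences survive as literal quotes for the remote shell. The sketch below reproduces that and uses `bash -c` as a stand-in for the shell that runs the command inside the TPU pod (an assumption about the pod's shell). The cache path containing a space is illustrative, not a launcher default.

```shell
#!/usr/bin/env bash
# Illustrative cache dir containing a space (an assumption, not a launcher default).
base="$(mktemp -d)"
XPK_HF_CACHE_DIR="${base}/hf cache"

# ${...} expands locally; each \" becomes a literal quote and each trailing
# backslash-newline is a line continuation, so the remote shell receives the
# path as one quoted word.
remote_cmd="export HF_HOME=\"${XPK_HF_CACHE_DIR}\"; \
export HF_DATASETS_CACHE=\"${XPK_HF_CACHE_DIR}/datasets\"; \
mkdir -p \"${XPK_HF_CACHE_DIR}/datasets\"; \
printf '%s' \"\$HF_DATASETS_CACHE\""

# bash -c stands in for the shell that executes --command inside the pod.
result="$(bash -c "$remote_cmd")"
```

This still breaks if the path itself contains `"`, `$` or a backtick. For arbitrary values, `printf '%q'` is a more robust way to escape.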

github-actions · 2026-06-01T17:19:29Z

+--xla_tpu_aggressive_opt_barrier_removal=ENABLED \
+--xla_lhs_prioritize_async_depth_over_stall=ENABLED \
+--xla_tpu_enable_ag_backward_pipelining=true \
+--xla_should_allow_loop_variant_parameter_in_chain=ENABLED \


🟢 Use `printf --` to safely handle cases where the expansion of `${XPK_LIBTPU_INIT_ARGS:-$default_libtpu_args}` might start with a hyphen, preventing it from being interpreted as a `printf` flag.

Suggested change

--xla_should_allow_loop_variant_parameter_in_chain=ENABLED \

libtpu_init_args=$(printf -- '%s' "${XPK_LIBTPU_INIT_ARGS:-$default_libtpu_args}" | tr -s '[:space:]' ' ')
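The suggestion above normalizes a multi-line flag list into a single line. The sketch below shows that step in isolation, using flag values taken from the launcher diff. `normalize_libtpu_args` is a hypothetical wrapper name. The format string `'%s'` is the first operand to `printf`, so the `--` is mainly defensive.

```shell
#!/usr/bin/env bash
# Flags written one per line for readability (values from the launcher diff).
default_libtpu_args="--xla_tpu_scoped_vmem_limit_kib=61440
--xla_tpu_aggressive_opt_barrier_removal=ENABLED
--xla_tpu_enable_ag_backward_pipelining=true"

# Collapse every run of whitespace (including newlines) into one space so the
# result fits on one line of LIBTPU_INIT_ARGS. A user override wins if set.
normalize_libtpu_args() {
  printf -- '%s' "${XPK_LIBTPU_INIT_ARGS:-$default_libtpu_args}" | tr -s '[:space:]' ' '
}
```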

github-actions · 2026-06-01T17:19:29Z

+
+# Optional: stage HF tokenizer files from GCS for models whose tokenizer isn't
+# baked into the image (e.g. gpt-oss).
+tokenizer_prelude=""


🟡 Wrap variables in quotes to ensure the command remains valid if paths contain spaces or special characters.

Suggested change

tokenizer_prelude=""

tokenizer_prelude="mkdir -p \"${XPK_TOKENIZER_LOCAL}\" && gcloud storage rsync \"${XPK_TOKENIZER_GCS}\" \"${XPK_TOKENIZER_LOCAL}\";"
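The tokenizer staging step should only run when both the GCS source and the local destination are configured. The sketch below wraps the suggested string in that guard. `build_tokenizer_prelude` is a hypothetical name, and `gs://b/tok` and `/tmp/tok` in the check are placeholders.

```shell
#!/usr/bin/env bash
# Emit the tokenizer staging snippet only when both variables are set;
# otherwise emit nothing, so the prelude is a no-op in the pod command.
build_tokenizer_prelude() {
  if [ -n "${XPK_TOKENIZER_GCS:-}" ] && [ -n "${XPK_TOKENIZER_LOCAL:-}" ]; then
    printf '%s' "mkdir -p \"${XPK_TOKENIZER_LOCAL}\" && gcloud storage rsync \"${XPK_TOKENIZER_GCS}\" \"${XPK_TOKENIZER_LOCAL}\";"
  fi
}
```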

github-actions · 2026-06-01T17:19:29Z

+export XPK_PROJECT="${XPK_PROJECT:-cloud-tpu-multipod-dev}"
+export XPK_ZONE="${XPK_ZONE:-us-central1}"
+export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
+export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"


🟢 Using a specific user bucket as a demo default is acceptable since it's explicitly labeled, but consider using a generic placeholder like `gs://YOUR-BUCKET/distillation` to encourage users to set their own environment variables.

Suggested change

export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"

export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://YOUR-BUCKET/distillation}"
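A placeholder default only helps if a forgotten override is caught before submission. The guard below is a hypothetical addition, not part of the PR. It takes variable names, reads their values through bash indirect expansion, and fails locally if any value still contains `YOUR-BUCKET`.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to continue while a variable still holds the
# YOUR-BUCKET placeholder, so the mistake surfaces before xpk is invoked.
check_no_placeholder() {
  local name value
  for name in "$@"; do
    value="${!name:-}"
    case "$value" in
      *YOUR-BUCKET*)
        echo "error: set ${name} (currently ${value})" >&2
        return 1
        ;;
    esac
  done
}
```

For example, a launcher could call `check_no_placeholder XPK_BASE_OUTPUT_DIR || exit 1` right after the defaults are exported.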

github-actions · 2026-06-01T17:19:36Z

+export XPK_ZONE="${XPK_ZONE:-us-central1}"
+export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
+export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
+export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/cloud-tpu-multipod-dev/maxtext_base_image:agagik-distill}"


🟢 Consider using a more generic or stable base image reference for the demo default.

Suggested change

export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/cloud-tpu-multipod-dev/maxtext_base_image:agagik-distill}"

export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/cloud-tpu-multipod-dev/maxtext_base_image:agagik-distill}"

github-actions · 2026-06-01T17:19:36Z

+export XPK_PROJECT="${XPK_PROJECT:-cloud-tpu-multipod-dev}"
+export XPK_ZONE="${XPK_ZONE:-us-central1}"
+export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
+export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"


🟢 Consider using a generic placeholder for the demo default.

Suggested change

export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"

export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://YOUR-BUCKET/distillation}"

github-actions · 2026-06-02T02:14:16Z


github-actions

## 📋 Review Summary

This PR introduces comprehensive distillation launchers and configurations for qwen3-30b-a3b-base and gpt-oss-20b models on TPU v7x. The updates to the underlying run_distill_xpk.sh script, including HF cache management and GCS-based asset staging, significantly improve the robustness and ease of use for distillation workloads.

🔍 General Feedback

Out-of-the-box Usability: While the scripts are well-structured, several defaults point to specific user buckets (gs://agagik-us/). Replacing these with generic placeholders or documenting them as mandatory overrides would improve the experience for the broader team.
Parallelism Consistency: The gpt-oss-20b configuration has a hardcoded ici_fsdp_parallelism that conflicts with the default cluster size in its launcher script. Using auto (-1) is preferred for better scalability.
XLA Tuning: The inclusion of tuned XLA flags for both models is a great addition, providing clear performance targets (~17-20% MFU).

github-actions · 2026-06-02T02:15:46Z

@@ -290,6 +330,11 @@ submit_workload() {
    "$image_flag=$XPK_BASE_IMAGE" \
    --command "export PYTHONPATH=/deps/src:/app/src; \


🟡 Exporting `LIBTPU_INIT_ARGS` with single quotes inside the `--command` string can be fragile if any XLA flags contain single quotes (e.g., for values with spaces). While current flags seem safe, using double quotes or a more robust escaping method would be more future-proof.

Suggested change

--command "export PYTHONPATH=/deps/src:/app/src; \

export LIBTPU_INIT_ARGS=\"${libtpu_init_args}\"; \
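Bash's `printf '%q'` is one "more robust escaping method" for this. It emits a shell-escaped form of any string, including strings that contain spaces and single quotes. The sketch below round-trips such a value through `bash -c`, which stands in for the pod's shell. `--example_flag` is a hypothetical flag used only to inject a quote. This assumes the `--command` string is run by bash, because `%q` output is bash-specific.

```shell
#!/usr/bin/env bash
# A value with a space and a single quote: the case that single-quoting via
# export LIBTPU_INIT_ARGS='${libtpu_init_args}' would mangle.
libtpu_init_args="--xla_tpu_scoped_vmem_limit_kib=61440 --example_flag='a b'"

# printf %q produces a form that bash parses back into the original word.
printf -v quoted_args '%q' "$libtpu_init_args"
remote_cmd="export LIBTPU_INIT_ARGS=${quoted_args}; printf '%s' \"\$LIBTPU_INIT_ARGS\""

# bash -c stands in for the shell that runs --command inside the pod.
roundtrip="$(bash -c "$remote_cmd")"
```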

github-actions · 2026-06-02T02:15:46Z

+export XPK_DATASET_BUCKET="${XPK_DATASET_BUCKET:-maxtext-dataset}"
+export XPK_DATASET_SUBPATH="${XPK_DATASET_SUBPATH:-array-record/climbmix/*.arrayrecord}"
+
+# Stage HF tokenizer files (not in the image for gpt-oss).


🟡 Using a specific user's bucket as a default for `XPK_YAML_GCS` will cause the `submit` mode to fail for any other user due to lack of write permissions. Consider using a more generic placeholder or documenting this as a mandatory override.

Suggested change

# Stage HF tokenizer files (not in the image for gpt-oss).

export XPK_YAML_GCS="${XPK_YAML_GCS:-gs://YOUR-BUCKET/distill-configs/distillation_gpt_oss_20b.yml}"
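An alternative to a placeholder default is to make the variable a required override with bash's `${VAR:?message}` expansion. If the variable is unset or empty, bash prints the message. In a non-interactive script it also exits, which gives the intended fail-fast behavior. `require_yaml_gcs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Require XPK_YAML_GCS instead of defaulting to someone else's bucket.
# ${VAR:?msg} aborts (non-interactive shell) when VAR is unset or empty.
require_yaml_gcs() {
  : "${XPK_YAML_GCS:?set XPK_YAML_GCS to a gs:// path you can write to}"
  printf '%s\n' "$XPK_YAML_GCS"
}
```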

github-actions · 2026-06-02T02:15:46Z

+export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/cloud-tpu-multipod-dev/maxtext_base_image:agagik-distill}"
+export XPK_PRIORITY="${XPK_PRIORITY:-high}"
+
+export XPK_USE_GCSFUSE=1


🟡 Similar to the GPT-OSS script, the default `XPK_YAML_GCS` points to a specific user's bucket, which will prevent other users from using the `submit` mode out of the box.

Suggested change

export XPK_USE_GCSFUSE=1

export XPK_YAML_GCS="${XPK_YAML_GCS:-gs://YOUR-BUCKET/distill-configs/distillation_qwen3_30b_base.yml}"

github-actions · 2026-06-02T02:15:46Z

+# distill_beta=0: decoder feature loss is broken on gpt-oss.
+export DISTILL_ALPHA="${DISTILL_ALPHA:-0.5}"
+export DISTILL_TEMPERATURE="${DISTILL_TEMPERATURE:-1.0}"
+export DISTILL_BETA="${DISTILL_BETA:-0}"


🟢 For consistency with the `qwen3` script and the default in `run_distill_xpk.sh`, consider using `61440` (60 MiB) unless `65536` (64 MiB) was specifically found to be necessary for `gpt-oss-20b`.

Suggested change

export DISTILL_BETA="${DISTILL_BETA:-0}"

export XPK_LIBTPU_INIT_ARGS="${XPK_LIBTPU_INIT_ARGS:---xla_tpu_scoped_vmem_limit_kib=61440 \
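The `_kib` suffix indicates that `--xla_tpu_scoped_vmem_limit_kib` is given in KiB. The two candidate values then divide evenly into MiB. The helper name is illustrative.

```shell
#!/usr/bin/env bash
# Convert a KiB count to MiB using integer arithmetic.
kib_to_mib() { echo $(( $1 / 1024 )); }
```

For example, `kib_to_mib 61440` prints 60 and `kib_to_mib 65536` prints 64.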

github-actions · 2026-06-02T02:16:20Z


github-actions

## 📋 Review Summary

This PR introduces well-structured distillation launchers for qwen3-30b-a3b-base and gpt-oss-20b models, significantly simplifying the setup for these workloads on TPU v7x. The inclusion of tuned XLA flags and optimized configuration files demonstrates a strong focus on performance (MFU).

🔍 General Feedback

Robustness: The shell scripts could be made more robust by consistently quoting paths and environment variables to handle potential special characters or spaces.
Consistency: A few XLA flags use true instead of the more standard ENABLED value found elsewhere in the repository; aligning these improves maintainability.
Explicit Overrides: Explicitly passing the staged tokenizer path to the training script ensures that the workload uses the intended assets regardless of the pod's working directory.
Documentation: The scripts include helpful comments and usage examples, which is great for usability.

github-actions · 2026-06-02T02:18:36Z

+# by latency_hiding_layer_scheduler.
+export XPK_LIBTPU_INIT_ARGS="${XPK_LIBTPU_INIT_ARGS:---xla_tpu_scoped_vmem_limit_kib=65536 \
+--xla_tpu_impure_enable_packed_bf16_math_ops=true \
+--xla_tpu_aggressive_opt_barrier_removal=true \


🟡 For consistency with other scripts in the repository (e.g., `distill_qwen3_30b_base.sh` and `run_distill_xpk.sh`) and the default XLA flags defined in `benchmarks/xla_flags_library.py`, consider using `ENABLED` instead of `true` for this flag.

Suggested change

--xla_tpu_aggressive_opt_barrier_removal=true \

--xla_tpu_aggressive_opt_barrier_removal=ENABLED \
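To review a flag string for the `true`/`ENABLED` inconsistency, a small filter can list every flag set to the literal `true`. This is a sketch with a hypothetical name. It only surfaces candidates. Whether a given flag accepts `ENABLED` still needs to be checked against the flags defined for libtpu/XLA.

```shell
#!/usr/bin/env bash
# Print, one per line, the names of flags whose value is literally "true".
# Input: a whitespace-separated flag string as passed to LIBTPU_INIT_ARGS.
flags_set_to_true() {
  printf '%s' "$1" | tr -s '[:space:]' '\n' | sed -n 's/^\(--[^=]*\)=true$/\1/p'
}
```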

github-actions · 2026-06-02T02:18:36Z

 export BASE_OUTPUT_DIRECTORY=${OUTPUT_DIR}; \
+export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \
+export TMPDIR=/dev/shm; export JAX_COMPILATION_CACHE_DIR=/dev/shm/jax_cache; \
+export HF_HOME=${XPK_HF_CACHE_DIR}; export HF_DATASETS_CACHE=${XPK_HF_CACHE_DIR}/datasets; mkdir -p ${XPK_HF_CACHE_DIR}/datasets; \


🟡 Quote the environment variable expansion and the directory path for robustness.

Suggested change

export HF_HOME=${XPK_HF_CACHE_DIR}; export HF_DATASETS_CACHE=${XPK_HF_CACHE_DIR}/datasets; mkdir -p ${XPK_HF_CACHE_DIR}/datasets; \

export HF_HOME='${XPK_HF_CACHE_DIR}'; export HF_DATASETS_CACHE='${XPK_HF_CACHE_DIR}/datasets'; mkdir -p '${XPK_HF_CACHE_DIR}/datasets'; \

github-actions · 2026-06-02T02:18:37Z

+export HF_HOME=${XPK_HF_CACHE_DIR}; export HF_DATASETS_CACHE=${XPK_HF_CACHE_DIR}/datasets; mkdir -p ${XPK_HF_CACHE_DIR}/datasets; \
+${yaml_prelude} \
+${tokenizer_prelude} \
 ${gcsfuse_prelude} \


🟠 When `XPK_TOKENIZER_LOCAL` is provided and staged, it should be explicitly passed to the training script as `tokenizer_path`. This ensures it's used instead of the potentially relative default value in the YAML, which might not resolve correctly depending on the working directory in the pod.

Suggested change

${gcsfuse_prelude} \

python3 -m maxtext.trainers.post_train.distillation.train_distill ${XPK_DISTILL_CONFIG} \
run_name=${XPK_RUN_NAME} \
${grain_files_override} \
${steps_override} \
${checkpoint_period_override} \
tokenizer_path=${XPK_TOKENIZER_LOCAL:-} \
distill_alpha=${DISTILL_ALPHA} \
distill_temperature=${DISTILL_TEMPERATURE} \
distill_beta=${DISTILL_BETA} \
distill_layer_indices="${DISTILL_LAYER_INDICES}"
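As written, `tokenizer_path=${XPK_TOKENIZER_LOCAL:-}` is passed even when no tokenizer was staged. In that case the trainer receives `tokenizer_path=` with an empty value, which may override the YAML default (how MaxText treats an empty override is not confirmed here). A sketch that follows the existing `steps_override` / `checkpoint_period_override` pattern instead, using a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Produce the tokenizer override only when a staged tokenizer directory is
# configured; otherwise produce nothing so the YAML's tokenizer_path stands.
build_tokenizer_override() {
  if [ -n "${XPK_TOKENIZER_LOCAL:-}" ]; then
    printf '%s' "tokenizer_path=${XPK_TOKENIZER_LOCAL}"
  fi
}
```

The command string would then carry `${tokenizer_override} \`, with `tokenizer_override="$(build_tokenizer_override)"` computed beforehand, in place of the unconditional `tokenizer_path=` line.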

github-actions · 2026-06-02T02:18:37Z

+# Optional: stage HF tokenizer files from GCS for models whose tokenizer isn't
+# baked into the image (e.g. gpt-oss).
+tokenizer_prelude=""
+if [ -n "${XPK_TOKENIZER_GCS:-}" ] && [ -n "${XPK_TOKENIZER_LOCAL:-}" ]; then


🟡 Quote the paths to handle potential spaces or special characters.

Suggested change

if [ -n "${XPK_TOKENIZER_GCS:-}" ] && [ -n "${XPK_TOKENIZER_LOCAL:-}" ]; then

tokenizer_prelude="mkdir -p '${XPK_TOKENIZER_LOCAL}' && gcloud storage rsync '${XPK_TOKENIZER_GCS}' '${XPK_TOKENIZER_LOCAL}';"

github-actions · 2026-06-02T02:18:37Z

  grain_files_override="grain_train_files=gs://${XPK_DATASET_BUCKET}/${XPK_DATASET_SUBPATH}"
 fi

+# Optional: stage the YAML from GCS instead of baking via upload_runner.


🟡 Quote the paths to handle potential spaces or special characters in the GCS path or local config path.

Suggested change

# Optional: stage the YAML from GCS instead of baking via upload_runner.

yaml_prelude="gcloud storage cp '${XPK_YAML_GCS}' '${XPK_DISTILL_CONFIG}';"
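The `yaml_prelude` staging only makes sense when `XPK_YAML_GCS` is set. Otherwise the config baked in via upload_runner should be used unchanged. The guard around it is not visible in this excerpt, so the sketch below assumes one. It keeps the suggested single-quoted form, which still breaks if a path contains a `'`. `build_yaml_prelude` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Emit the YAML staging step only when XPK_YAML_GCS is set; an empty result
# leaves the pod command unchanged.
build_yaml_prelude() {
  if [ -n "${XPK_YAML_GCS:-}" ]; then
    printf "gcloud storage cp '%s' '%s';" "${XPK_YAML_GCS}" "${XPK_DISTILL_CONFIG:-}"
  fi
}
```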

JamesDeng42

LGTM

gagika force-pushed the gagik-distill-perf branch from 94bcf37 to e202120 on May 31, 2026 22:24

gagika added the gemini-review label May 31, 2026

github-actions reviewed May 31, 2026

gagika force-pushed the gagik-distill-perf branch from e202120 to 1a00405 on June 1, 2026 17:15

gagika added gemini-review and removed gemini-review labels Jun 1, 2026

github-actions reviewed Jun 1, 2026

vlad-karp approved these changes Jun 1, 2026

gagika force-pushed the gagik-distill-perf branch from 1a00405 to bc5cc4a on June 2, 2026 02:12

gagika added gemini-review and removed gemini-review labels Jun 2, 2026

gagika force-pushed the gagik-distill-perf branch from bc5cc4a to fb8d0fb on June 2, 2026 02:15

github-actions reviewed Jun 2, 2026

gagika added gemini-review and removed gemini-review labels Jun 2, 2026

github-actions reviewed Jun 2, 2026

gagika force-pushed the gagik-distill-perf branch 2 times, most recently from c6e4983 to eba25b0 on June 2, 2026 03:55

gagika marked this pull request as ready for review June 2, 2026 03:55

gagika requested review from RissyRan, abhinavclemson, bvandermoon, gobbleturk, khatwanimohit and vipannalla as code owners June 2, 2026 03:55

gagika requested review from A9isha, NicoGrande, NuojCheng, SurbhiJainUSC, aireenmei, darisoy, dipannita08, hengtaoguo, igorts-git, jesselu-google, jiangjy1982, richjames0, shralex and suexu1025 as code owners June 2, 2026 03:55

vlad-karp approved these changes Jun 2, 2026

JamesDeng42 approved these changes Jun 2, 2026

Commit 03dafe7: Add distillation launchers for qwen3-30b-a3b-base and gpt-oss-20b

gagika force-pushed the gagik-distill-perf branch from eba25b0 to 03dafe7 on June 2, 2026 17:51

gagika added the pull ready label Jun 2, 2026
