update rope_max_timescale to 1M for qwen3-30b-a3b-base to match HF#4039
JamesDeng42 wants to merge 1 commit into
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Force-pushed from 831ade9 to 6ba34f4.
🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This Pull Request correctly updates the rope_max_timescale (and its Hugging Face counterpart rope_theta) for the qwen3-30b-a3b-base model to align with the Hugging Face configuration. The change ensures consistency between the MaxText model configuration and the checkpoint conversion utilities.
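To make the effect of this setting concrete, here is a minimal sketch of how the maximum timescale (the RoPE base, `rope_theta` on the Hugging Face side) sets the per-dimension rotation wavelengths under the standard rotary embedding formulation. The function name and the `head_dim` value of 128 are assumptions made for illustration and are not taken from this PR.

```python
import math


def rope_wavelengths(head_dim: int, max_timescale: float) -> list[float]:
    """Wavelength, in token positions, of each rotated dimension pair.

    Standard RoPE uses inverse frequencies max_timescale ** (-2i / head_dim)
    for i in 0 .. head_dim/2 - 1, so pair i has wavelength
    2 * pi * max_timescale ** (2i / head_dim).
    """
    half = head_dim // 2
    return [
        2 * math.pi * max_timescale ** (2 * i / head_dim)
        for i in range(half)
    ]


# head_dim=128 is an assumed value used only for this illustration.
HEAD_DIM = 128
old = rope_wavelengths(HEAD_DIM, 10_000_000)  # value before this PR
new = rope_wavelengths(HEAD_DIM, 1_000_000)   # value after this PR

# The slowest-rotating pair has the longest wavelength.
print(f"longest wavelength at 10M: {old[-1]:.0f} positions")
print(f"longest wavelength at 1M:  {new[-1]:.0f} positions")
```

The fastest-rotating pair is unaffected by the base, while every other pair rotates at a different rate, so loading a checkpoint with a base other than the one it was trained with changes the positional encoding it sees.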
🔍 General Feedback
- The changes are focused and follow the established patterns in the codebase for model configuration and HF mapping.
- I noticed that while the model has been added to `hf_model_configs.py` and `param_mapping.py`, it is currently missing from `src/maxtext/checkpoint_conversion/utils/hf_shape.py`. This omission will likely cause the `to_huggingface` conversion script to fail for this specific model variant. I have added an inline comment suggesting this addition.
- Overall, the PR is high quality and addresses the requirement of matching HF configurations.
…F configuration/shape mappings
Force-pushed from 6ba34f4 to 659d5b1.
    vocab_size=151936,
)


qwen3_30b_a3b_base_config = transformers.Qwen3MoeConfig(
Can you have `qwen3_30b_a3b_thinking_2507_config` inherit from `qwen3_30b_a3b_base_config` and just override `rope_theta=1000000`?
Also, @parambole, could you review the qwen3-30b base model config change to match HF?
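One way to read the suggestion above is to keep the shared keyword arguments in one place and derive the variant by overriding a single field. The sketch below uses plain dictionaries so that it runs without `transformers` installed. The helper `derive_config_kwargs`, the dictionary names, and the override value are hypothetical, and only `vocab_size=151936` and the base value of 1,000,000 come from this PR.

```python
import copy

# Shared keyword arguments for the base config. Only the two fields that
# appear in this PR are listed; the real config has many more.
BASE_KWARGS = {
    "vocab_size": 151936,
    "rope_theta": 1_000_000,
}


def derive_config_kwargs(base: dict, **overrides) -> dict:
    """Return a copy of `base` with selected fields overridden.

    Rejects keys that are absent from `base`, so a typo in an override
    name fails loudly instead of silently adding a new field.
    """
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown config fields: {sorted(unknown)}")
    derived = copy.deepcopy(base)  # leave the base untouched
    derived.update(overrides)
    return derived


# Placeholder override value for illustration; the value a given variant
# needs should be read from its own Hugging Face config.
VARIANT_KWARGS = derive_config_kwargs(BASE_KWARGS, rope_theta=10_000_000)
```

Assuming the real configs accept these fields as keyword arguments, as the `vocab_size=151936` line in the diff suggests, each dictionary could then be unpacked into the config constructor, for example `transformers.Qwen3MoeConfig(**VARIANT_KWARGS)`. This keeps the two definitions from drifting apart when a shared field changes.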
Description
This PR updates the `rope_max_timescale` for the `qwen3-30b-a3b-base` model configuration from `10,000,000` to `1,000,000`.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.