Add Windows ARM64 wheel build support with NEON optimization #1959

Merged
matthewdouglas merged 5 commits into bitsandbytes-foundation:main from pdeep854:win-arm-neon on May 29, 2026

@pdeep854
Contributor

Summary
Add Windows ARM64 wheel support and NEON-based performance optimizations.
Changes
- Added Windows ARM64 (.whl) build support
- Implemented vectorized 4-bit dequantization
- Added NEON-optimized conversions:
  - BF16 → Float32
  - Float32 → BF16
  - Float32 → FP16
- Optimized absmax computation for float32 blocks
- Optimized norm_to_lut_index to process 4 float values per iteration
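
For readers unfamiliar with the BF16 conversions listed above, here is a minimal sketch of what such a conversion computes. It is not the code from this PR: the function names are hypothetical, round-to-nearest-even is an assumed rounding mode, and the NEON path only illustrates the "4 values per iteration" idea, falling back to scalar code on other targets.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__ARM_NEON) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1  // hypothetical macro, used only in this sketch
#endif

// Scalar reference: BF16 is the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Scalar reference: narrow with round-to-nearest-even (an assumption here).
inline std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: keep the sign and force a quiet-NaN mantissa bit.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7FFF, plus 1 when the kept LSB is odd, so exact ties go to even.
    std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding) >> 16);
}

// Widen n BF16 values to float, 4 lanes at a time where NEON is available.
inline void bf16_to_float_array(const std::uint16_t* src, float* dst,
                                std::size_t n) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        uint16x4_t h = vld1_u16(src + i);              // load 4 x u16
        uint32x4_t w = vshlq_n_u32(vmovl_u16(h), 16);  // zero-extend, shift up
        vst1q_f32(dst + i, vreinterpretq_f32_u32(w));  // store as 4 x f32
    }
#endif
    for (; i < n; ++i) dst[i] = bf16_to_float(src[i]);  // scalar tail
}
```

The scalar functions double as a reference for checking any vectorized path, since both must produce bit-identical results.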
Impact
Improves ARM64 performance using NEON SIMD and enables native Windows ARM64 wheel support
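
As a rough illustration of the absmax and 4-bit dequantization items in the Changes list, the scalar sketch below shows the arithmetic a SIMD kernel would perform several lanes at a time. The function names, the packing order (high nibble first), and the caller-supplied 16-entry lookup table are assumptions made for illustration, not details confirmed by this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Largest absolute value in one block of floats (used as the block's scale).
inline float block_absmax(const float* x, std::size_t n) {
    float m = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        m = std::fmax(m, std::fabs(x[i]));
    }
    return m;
}

// Assumed layout: each byte holds two 4-bit codes, high nibble first.
// `lut` maps a code to a normalized value; `absmax` holds one scale per
// `blocksize` output elements. `n` is the number of output elements.
inline void dequantize_4bit(const std::uint8_t* packed, const float* absmax,
                            const float lut[16], std::size_t n,
                            std::size_t blocksize, float* out) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t byte = packed[i / 2];
        const std::uint8_t code = static_cast<std::uint8_t>(
            (i % 2 == 0) ? (byte >> 4) : (byte & 0x0F));
        out[i] = lut[code] * absmax[i / blocksize];
    }
}
```

Each output depends only on its own code and its block's scale, which is what makes the loop straightforward to vectorize.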

Comment thread .github/workflows/python-package.yml Outdated
# code. This ensures compatibility across Python versions, as compatibility is
# dictated by the packaged code itself, not the Python version used for packaging.
- python-version: ["3.10"]
+ python-version: ["3.12"]
Member

Is this change needed only for WoA (Windows on ARM)? Could we consider making it conditional?

Should the docs state a minimum Python version for building or running on WoA?

Contributor Author

Yes. On WoA, native Python is supported only from version 3.11 onward.

@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@matthewdouglas
Member

@pdeep854 This looks great, thanks! Can you please fix the lint issues with clang-format? Also, an update as appropriate to the installation/build instructions in the docs would be appreciated!

@matthewdouglas matthewdouglas merged commit 2177945 into bitsandbytes-foundation:main May 29, 2026
150 of 151 checks passed