Add stream-dse fused SwiGLU-prefill operator #122

Open
asyms wants to merge 2 commits into amd:devel from KULeuven-MICAS:stream-dse-fused-swiglu

asyms commented Jun 18, 2026

Adds SwiGLUPrefillStream, a fused SwiGLU-prefill operator whose single MLIR design (gate/up GEMMs + SiLU + elementwise-mul + down GEMM) is generated by stream-dse and compiled to one xclbin, instead of chaining separately-compiled sub-operators.
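
As a point of reference for what the fused design computes, here is a minimal pure-Python sketch of the SwiGLU block named above (gate/up GEMMs, SiLU, elementwise multiply, down GEMM). The function names and the list-of-lists matrix representation are illustrative assumptions and are not taken from the operator's code; the sketch says nothing about tiling, data types, or how stream-dse schedules the work.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for row in a
    ]


def silu(v):
    """SiLU(v) = v * sigmoid(v), written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def swiglu_reference(x, w_gate, w_up, w_down):
    """Unfused reference: (SiLU(x @ w_gate) * (x @ w_up)) @ w_down."""
    gate = matmul(x, w_gate)   # gate GEMM
    up = matmul(x, w_up)       # up GEMM
    hidden = [
        [silu(g) * u for g, u in zip(g_row, u_row)]  # SiLU + elementwise mul
        for g_row, u_row in zip(gate, up)
    ]
    return matmul(hidden, w_down)  # down GEMM
```

A reference of this kind is mainly useful for checking a fused result on small shapes; it is far too slow for the shapes used in the demo.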

Its per-kernel operand layouts (the tiled-strided DMA tiling) are authored on the IRON side and injected into stream-dse code generation via optimize_allocation_co(kernels=...) — so IRON owns the layouts while stream keeps kernel construction and the MLIR rewrite, instead of the layouts being hand-copied on both sides.
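
To illustrate what a tiled-strided operand layout can express, the sketch below models a layout as a list of loop bounds with one stride per loop, and enumerates the element offsets a DMA would visit. This is a hypothetical stand-in: the class name `TiledStridedSketch`, its fields, and the `tiled_row_major` helper are assumptions made for illustration, not the actual `TiledStridedLayout` in iron.common.layout, and `to_snaxc()` is not modelled here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TiledStridedSketch:
    """Hypothetical layout: one (bound, stride) pair per nested loop."""

    bounds: tuple   # loop trip counts, outermost first
    strides: tuple  # element stride of each loop, same order as bounds

    def offset(self, index):
        """Linear element offset for one multi-dimensional loop index."""
        return sum(i * s for i, s in zip(index, self.strides))

    def addresses(self):
        """All offsets in traversal order (innermost loop varies fastest)."""
        return [self.offset(idx) for idx in product(*(range(b) for b in self.bounds))]


def tiled_row_major(rows, cols, tile_rows, tile_cols):
    """Walk a row-major rows x cols matrix tile by tile."""
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("tile size must divide the matrix shape")
    return TiledStridedSketch(
        # loops: tile row, tile column, row inside tile, column inside tile
        bounds=(rows // tile_rows, cols // tile_cols, tile_rows, tile_cols),
        strides=(tile_rows * cols, tile_cols, cols, 1),
    )
```

For a 4×4 matrix with 2×2 tiles, `tiled_row_major(4, 4, 2, 2).addresses()` begins `0, 1, 4, 5`, i.e. the first tile is read before the traversal moves to the next one.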

Added

  • SwiGLUPrefillStream (iron/operators/swiglu_prefill_stream/): fused stream-dse design → one xclbin; MLIR generated at build time by stream_design.py.
  • iron.common.layout: a TiledStridedLayout type (with to_snaxc()) for handing IRON-authored operand layouts to stream-dse.
  • stream_kernels.py: injects IRON's operand layouts into codegen through the kernels= override, replacing only operand_layouts() on stream's own kernels (requires stream-dse ≥ 1.13.4).
  • requirements_stream.txt (optional dependency stream-dse>=1.13.4); the operator's test skips when stream-dse is absent.
  • Minimal demo under demos/swiglu_prefill_stream/.
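
A minimal sketch of how the optional-dependency gate could be expressed, assuming the check is done against the installed distribution metadata. The distribution name `stream-dse` and the 1.13.4 floor come from the list above; the helper names and the simple numeric version parsing are assumptions, and the operator's test may detect the dependency differently.

```python
from importlib import metadata

# Minimum stream-dse version stated in requirements_stream.txt.
MIN_STREAM_DSE = (1, 13, 4)


def parse_version(text):
    """Turn '1.13.4' (or '1.14.0rc1') into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def stream_dse_ok(minimum=MIN_STREAM_DSE):
    """True only if stream-dse is installed and at least `minimum`."""
    try:
        installed = metadata.version("stream-dse")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= minimum
```

In a pytest module, a result like this could feed `pytest.mark.skipif(not stream_dse_ok(), reason="stream-dse not installed")` so that collection succeeds on hosts without the dependency.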

Changed

  • Importing iron.operators no longer requires an NPU runtime: lazy XRT/pyxrt import and PEP 562 lazy operator exports, so the package loads (and tests collect) on hosts without XRT/pyxrt.
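
The PEP 562 mechanism mentioned above can be sketched as a module-level `__getattr__` that imports a submodule only when one of its names is first accessed. The example below builds a throwaway module so it runs anywhere; the export table and the use of standard-library targets in place of operator modules are assumptions for illustration, not the contents of iron/operators/__init__.py.

```python
import importlib
import types

# Hypothetical export table: public name -> (module path, attribute name).
# Standard-library targets stand in for operator modules in this sketch.
_LAZY_EXPORTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "dumps": ("json", "dumps"),
}


def make_lazy_package(name, exports):
    """Create a module whose listed exports are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            module_path, target = exports[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_path), target)
        setattr(pkg, attr, value)  # cache so later lookups skip __getattr__
        return value

    pkg.__getattr__ = __getattr__  # PEP 562 hook lives in the module dict
    return pkg


lazy_ops = make_lazy_package("lazy_ops_demo", _LAZY_EXPORTS)
```

In a real package the same `__getattr__` function would simply be defined at the top level of `__init__.py`; nothing heavy is imported until a caller touches one of the exported names.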

Removed

  • None.

Running the demo

Prerequisites: the XDNA driver + XRT installed (/opt/xilinx/xrt) and an npu2 device. From a fresh clone of this branch:

python3 -m venv .venv && source .venv/bin/activate
source /opt/xilinx/xrt/setup.sh            # provides pyxrt
pip install --upgrade pip
pip install -r requirements.txt            # IRON + mlir_aie/llvm-aie toolchain + torch
pip install -r requirements_stream.txt     # stream-dse>=1.13.4 (PyPI)
stream-setup-aie                           # required: installs snaxc / xdsl-aie / aie-python-extras
python demos/swiglu_prefill_stream/demo.py

This generates the fused design with stream-dse, compiles it to an xclbin, and runs it once on the NPU (≈2 ms for the 256×512×2048 shape). stream-setup-aie is required: it installs the AIE codegen packages stream-dse needs that cannot be plain PyPI dependencies.

Licensing note

The new IRON-side files — iron/common/layout.py, iron/operators/swiglu_prefill_stream/stream_kernels.py, and demos/swiglu_prefill_stream/demo.py — carry a KU Leuven (MICAS) copyright header (Apache-2.0), as they were authored by MICAS; all other touched files keep their existing AMD headers. We can discuss this further.

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.

andrej commented Jun 22, 2026

Collaborator

Hi Arne, sorry for the CI failures. If you rebase on #125 once it's merged, these should hopefully pass.

SwiGLUPrefillStream compiles the whole SwiGLU-prefill block (gate/up GEMMs +
SiLU + elementwise-mul + down GEMM) as a single fused MLIR design generated by
stream-dse, producing one xclbin instead of chaining separately-compiled
sub-operators. The design is generated at build time by stream_design.py and
compiled through IRON's normal flow.

The fused design's per-kernel operand layouts (the tiled-strided DMA tiling) are
authored on the IRON side and fed into stream-dse code generation rather than
hand-copied inside stream: iron.common.layout provides a TiledStridedLayout type,
and swiglu_prefill_stream/stream_kernels.py injects IRON's layouts through
optimize_allocation_co(kernels=...) -- the override hook added in stream-dse
1.13.4 -- keeping stream's kernel construction and replacing only
operand_layouts().

stream-dse is an optional dependency (requirements_stream.txt); the operator's
test skips when it is absent. Importing iron.operators no longer requires an NPU
runtime (lazy XRT import), so the package loads on hosts without XRT/pyxrt.
Includes a minimal k=1 demo under demos/swiglu_prefill_stream/.

asyms force-pushed the stream-dse-fused-swiglu branch from 7f54de3 to 5103ce5 on June 22, 2026 19:55
github-actions Bot commented Jun 23, 2026

Contributor

CI Test Results

9d092aa (2026_06_23_16_12_55)

IRON - CI Summary

Examples

iron/applications/llama_3.2_1b
Test Krackan Status Krackan Phoenix Status Phoenix
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] - - -

Small

iron/operators/axpy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] 149.00 352.04
test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] 188.30 376.80
test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] 159.44 887.56
test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] 203.12 - -
iron/operators/dequant
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] 145.52 394.94
test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] 158.04 383.82
test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] 170.50 298.12
test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] 206.70 342.56
test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] 186.20 337.14
test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] 174.60 496.40
test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] 172.98 - -
test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] 214.04 - -
iron/operators/elementwise_add
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] 146.22 345.96
test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] 175.32 410.54
test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] 200.82 426.16
test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] 203.54 - -
iron/operators/elementwise_mul
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] 168.98 347.06
test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] 177.96 470.22
test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] 182.62 293.72
test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] 203.06 - -
iron/operators/gelu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 188.74 336.42
test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 180.94 810.66
test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 157.66 377.94
test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 146.08 504.46
test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 157.38 403.68
test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 163.70 550.22
test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 163.02 - -
test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 203.30 - -
iron/operators/gemm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] 2228.48 - -
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] 263.92 891.64
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] 266.18 538.24
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 48644.68 82419.90
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 28451.88 24955.14
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] 7704.28 - -
test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] 2381.38 3222.12
test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] 3467.22 5832.76
test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] 1390.92 - -
iron/operators/gemv
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] 0.23 0.09
test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] 12.43 3.57
test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] 24.43 6.63
test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] 39.61 9.39
test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] 41.28 - -
test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] 12.25 3.49
test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] 23.04 6.66
test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] 40.02 11.21
test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] 41.95 - -
iron/operators/layer_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 180.66 306.02
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 163.88 305.20
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 185.78 287.80
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 194.82 395.92
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 185.64 376.00
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 191.38 383.04
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 203.82 - -
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 237.40 - -
iron/operators/mem_copy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] 142.32 303.26
test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] 206.46 - -
test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] 163.72 353.46
test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] 173.40 351.54
test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] 169.96 414.48
test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] 158.14 449.66
test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] 170.42 - -
test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] 195.72 443.04
iron/operators/mha
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] 40903.22 - -
iron/operators/relu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 174.36 420.18
test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 188.96 397.38
test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 166.64 309.94
test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 172.56 474.56
test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 162.98 470.24
test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 195.24 389.84
test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 196.96 - -
test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 277.82 - -
iron/operators/rms_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] 163.24 363.98
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] 191.72 352.88
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] 161.82 597.40
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] 173.86 415.40
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] 164.82 436.20
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] 161.04 414.12
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] 215.88 613.70
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] 218.50 497.98
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] 186.54 436.52
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] 184.66 387.70
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] 187.38 451.44
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] 237.86 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] 205.00 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] 226.04 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] 243.46 - -
iron/operators/rope
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] 170.02 418.40
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] 180.84 465.80
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] 178.54 533.62
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] 216.10 - -
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] 161.48 486.92
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] 157.62 466.60
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] 190.00 400.12
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] 202.40 - -
iron/operators/sigmoid
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 130.36 272.40
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 135.44 250.46
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 128.86 364.08
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 143.86 384.42
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 147.94 373.22
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 157.20 425.62
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 145.36 - -
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 204.46 - -
iron/operators/silu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 168.30 493.66
test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 170.32 421.52
test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 167.86 358.68
test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 168.64 - -
iron/operators/softmax
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] 165.58 419.44
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] 187.18 551.36
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] 189.76 527.98
iron/operators/swiglu_decode
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] 3713.77 11142.47
test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] 3746.93 15533.16
iron/operators/swiglu_prefill
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] 10526.80 22250.99
iron/operators/tanh
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 168.32 600.02
test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 171.50 290.72
test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 211.16 367.78
test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 161.56 427.46
test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 186.68 397.26
test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 160.96 431.70
test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 191.98 - -
test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 212.78 - -
iron/operators/transpose
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8] 156.18 1192.74
test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8] 183.50 644.58
Krackan - Small

IRON

Tested on 2026_06_23_16_12_55 at commit 9d092aa.

iron/operators/axpy
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 149.00 | 0.08 | n/a
test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 188.30 | 0.07 | n/a
test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 159.44 | 0.08 | n/a
test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] | ✅ 5/5 | 203.12 | 0.06 | n/a
iron/operators/dequant
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 145.52 | 0.04 | n/a
test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 158.04 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 170.50 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 206.70 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 186.20 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 174.60 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] | ✅ 5/5 | 172.98 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] | ✅ 5/5 | 214.04 | 0.02 | n/a
iron/operators/elementwise_add
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 146.22 | 0.09 | n/a
test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 175.32 | 0.07 | n/a
test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 200.82 | 0.07 | n/a
test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 203.54 | 0.07 | n/a
iron/operators/elementwise_mul
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 168.98 | 0.08 | n/a
test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 177.96 | 0.07 | n/a
test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 182.62 | 0.07 | n/a
test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 203.06 | 0.06 | n/a
iron/operators/gelu
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 188.74 | 0.04 | n/a
test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 180.94 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 157.66 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 146.08 | 0.06 | n/a
test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 157.38 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 163.70 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 163.02 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 203.30 | 0.04 | n/a
iron/operators/gemm
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] | ✅ 5/5 | 2228.48 | 4.30 | 1689.76
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 263.92 | 0.92 | 39.15
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 266.18 | 0.94 | 40.19
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 48644.68 | 0.52 | 353.17
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 28451.88 | 0.88 | 603.83
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 7704.28 | 3.27 | 2230.52
test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 2381.38 | 3.41 | 894.71
test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 3467.22 | 0.38 | 20.36
test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] | ✅ 5/5 | 1390.92 | 4.92 | 1518.57
iron/operators/gemv
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.23 | 0.23
test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 12.43 | 12.42
test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 24.43 | 24.41
test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 39.61 | 39.58
test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] | ✅ 5/5 | n/a | 41.28 | 41.26
test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 12.25 | 12.24
test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 23.04 | 23.03
test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 40.02 | 39.99
test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 41.95 | 41.92
iron/operators/layer_norm
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 180.66 | 0.05 | n/a
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 163.88 | 0.05 | n/a
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 185.78 | 0.05 | n/a
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 194.82 | 0.04 | n/a
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 185.64 | 0.05 | n/a
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 191.38 | 0.05 | n/a
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 203.82 | 0.04 | n/a
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 237.40 | 0.04 | n/a
iron/operators/mem_copy
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 142.32 | 0.06 | n/a
test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] | ✅ 5/5 | 206.46 | 0.04 | n/a
test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 163.72 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 173.40 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 169.96 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 158.14 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] | ✅ 5/5 | 170.42 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 195.72 | 0.04 | n/a
iron/operators/mha
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] | ✅ 5/5 | 40903.22 | 0.21 | n/a
iron/operators/relu
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 174.36 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 188.96 | 0.04 | n/a
test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 166.64 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 172.56 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 162.98 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 195.24 | 0.04 | n/a
test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 196.96 | 0.04 | n/a
test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 277.82 | 0.03 | n/a
iron/operators/rms_norm
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 163.24 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 191.72 | 0.06 | n/a
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 161.82 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 173.86 | 0.06 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 164.82 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 161.04 | 0.06 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 215.88 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 218.50 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 186.54 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 184.66 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 187.38 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] | ✅ 5/5 | 237.86 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] | ✅ 5/5 | 205.00 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] | ✅ 5/5 | 226.04 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] | ✅ 5/5 | 243.46 | 0.04 | n/a
iron/operators/rope
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 170.02 | 0.59 | n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 180.84 | 0.57 | n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 178.54 | 0.58 | n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] | ✅ 5/5 | 216.10 | 0.47 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 161.48 | 0.47 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 157.62 | 0.48 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 190.00 | 0.40 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] | ✅ 5/5 | 202.40 | 0.37 | n/a
iron/operators/sigmoid
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 130.36 | 0.06 | n/a
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 135.44 | 0.06 | n/a
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 128.86 | 0.06 | n/a
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 143.86 | 0.06 | n/a
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 147.94 | 0.06 | n/a
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 157.20 | 0.05 | n/a
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 145.36 | 0.06 | n/a
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 204.46 | 0.04 | n/a
iron/operators/silu
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 168.30 | 0.05 | n/a
test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 170.32 | 0.05 | n/a
test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 167.86 | 0.05 | n/a
test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 168.64 | 0.05 | n/a
iron/operators/softmax
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 165.58 | 0.86 | n/a
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 187.18 | 0.72 | n/a
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 189.76 | 0.72 | n/a
iron/operators/swiglu_decode
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 3713.77 | 0.00 | n/a
test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 3746.93 | 0.00 | n/a
iron/operators/swiglu_prefill
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 10526.80 | 0.21 | n/a
iron/operators/tanh
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 168.32 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 171.50 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 211.16 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 161.56 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 186.68 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 160.96 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 191.98 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 212.78 | 0.04 | n/a |

iron/operators/transpose
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8] | ✅ 5/5 | 156.18 | 3.46 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8] | ✅ 5/5 | 183.50 | 2.87 | n/a |

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.10 (+0.35%) | 0.08 (-2.97%) | 0.09 (-3.31%) | 0.07 (-2.26%) | 0.01 (+6.50%) | 174.90 (+2.28%) | 149.00 (+3.30%) | 143.90 (+3.45%) | 118.40 (-0.34%) | 21.98 (+7.91%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.10 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 171.00 (n/a) | 144.24 (n/a) | 139.10 (n/a) | 118.80 (n/a) | 20.37 (n/a) |

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (-17.03%) | 0.07 (-12.84%) | 0.07 (-12.39%) | 0.05 (-10.61%) | 0.01 (-30.44%) | 233.10 (+11.85%) | 188.30 (+13.98%) | 179.40 (+14.12%) | 170.70 (+20.55%) | 25.57 (-4.93%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 208.40 (n/a) | 165.20 (n/a) | 157.20 (n/a) | 141.60 (n/a) | 26.89 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.09 (-18.66%) | 0.08 (+3.75%) | 0.08 (+9.57%) | 0.06 (+15.64%) | 0.01 (-55.53%) | 199.00 (-13.55%) | 159.44 (-8.15%) | 151.50 (-8.73%) | 144.50 (+22.98%) | 22.38 (-52.58%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.10 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 230.20 (n/a) | 173.58 (n/a) | 166.00 (n/a) | 117.50 (n/a) | 47.20 (n/a) |

test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.08 (+1.71%) | 0.06 (+9.92%) | 0.07 (+24.35%) | 0.04 (+5.18%) | 0.02 (+1.52%) | 279.30 (-4.90%) | 203.12 (-8.94%) | 181.20 (-19.61%) | 147.40 (-1.67%) | 51.44 (-0.58%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.08 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 293.70 (n/a) | 223.06 (n/a) | 225.40 (n/a) | 149.90 (n/a) | 51.74 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.04 (+4.15%) | 0.04 (+15.01%) | 0.03 (+10.92%) | 0.03 (+13.35%) | 0.01 (+6.02%) | 172.90 (-11.79%) | 145.52 (-13.13%) | 155.60 (-9.85%) | 120.40 (-3.99%) | 23.51 (-10.38%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 196.00 (n/a) | 167.52 (n/a) | 172.60 (n/a) | 125.40 (n/a) | 26.23 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.05 (+3.85%) | 0.03 (-2.91%) | 0.03 (+2.59%) | 0.03 (+21.87%) | 0.01 (-19.70%) | 191.80 (-17.96%) | 158.04 (-0.65%) | 164.60 (-2.55%) | 106.40 (-3.71%) | 34.21 (-32.99%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 233.80 (n/a) | 159.08 (n/a) | 168.90 (n/a) | 110.50 (n/a) | 51.06 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.03 (-24.45%) | 0.03 (-13.05%) | 0.03 (-6.21%) | 0.03 (+0.49%) | 0.00 (-68.23%) | 183.90 (-0.49%) | 170.50 (+12.10%) | 173.00 (+6.59%) | 157.40 (+32.38%) | 12.06 (-57.64%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 184.80 (n/a) | 152.10 (n/a) | 162.30 (n/a) | 118.90 (n/a) | 28.47 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.04 (+5.95%) | 0.03 (-5.02%) | 0.03 (+5.03%) | 0.01 (-40.38%) | 0.01 (+72.90%) | 381.50 (+67.69%) | 206.70 (+16.89%) | 168.90 (-4.79%) | 131.10 (-5.62%) | 100.06 (+196.46%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 227.50 (n/a) | 176.84 (n/a) | 177.40 (n/a) | 138.90 (n/a) | 33.75 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.04 (-8.99%) | 0.03 (-20.98%) | 0.03 (-24.14%) | 0.02 (-28.07%) | 0.01 (+29.80%) | 243.00 (+39.02%) | 186.20 (+29.99%) | 175.60 (+31.83%) | 131.50 (+9.86%) | 43.43 (+97.50%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 174.80 (n/a) | 143.24 (n/a) | 133.20 (n/a) | 119.70 (n/a) | 21.99 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.04 (+5.29%) | 0.03 (-0.55%) | 0.03 (-5.20%) | 0.02 (-4.90%) | 0.00 (+6.19%) | 223.40 (+5.18%) | 174.60 (+0.85%) | 167.60 (+5.48%) | 142.80 (-4.99%) | 30.68 (+9.47%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 212.40 (n/a) | 173.12 (n/a) | 158.90 (n/a) | 150.30 (n/a) | 28.03 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.04 (+4.67%) | 0.03 (+6.83%) | 0.03 (+8.94%) | 0.02 (+0.72%) | 0.01 (+3.84%) | 220.70 (-0.72%) | 172.98 (-6.44%) | 179.30 (-8.24%) | 118.10 (-4.45%) | 40.50 (-4.21%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 222.30 (n/a) | 184.88 (n/a) | 195.40 (n/a) | 123.60 (n/a) | 42.28 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.03 (-4.99%) | 0.02 (-2.79%) | 0.02 (-5.38%) | 0.02 (+4.78%) | 0.00 (-17.40%) | 247.60 (-4.55%) | 214.04 (+2.32%) | 218.40 (+5.71%) | 188.40 (+5.25%) | 24.95 (-19.66%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 259.40 (n/a) | 209.18 (n/a) | 206.60 (n/a) | 179.00 (n/a) | 31.06 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.11 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.02 (n/a) | 182.00 (n/a) | 146.22 (n/a) | 143.10 (n/a) | 115.20 (n/a) | 28.97 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.09 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 264.30 (n/a) | 175.32 (n/a) | 166.10 (n/a) | 137.80 (n/a) | 51.47 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 316.40 (n/a) | 200.82 (n/a) | 186.40 (n/a) | 126.90 (n/a) | 70.10 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.09 (n/a) | 0.07 (n/a) | 0.08 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 375.60 (n/a) | 203.54 (n/a) | 160.40 (n/a) | 141.80 (n/a) | 97.45 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.10 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 233.50 (n/a) | 168.98 (n/a) | 160.60 (n/a) | 122.00 (n/a) | 45.31 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 221.30 (n/a) | 177.96 (n/a) | 182.20 (n/a) | 124.10 (n/a) | 35.59 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 227.80 (n/a) | 182.62 (n/a) | 180.80 (n/a) | 146.10 (n/a) | 30.58 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 247.70 (n/a) | 203.06 (n/a) | 196.90 (n/a) | 172.30 (n/a) | 28.42 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 223.30 (n/a) | 188.74 (n/a) | 176.00 (n/a) | 169.10 (n/a) | 24.19 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 243.70 (n/a) | 180.94 (n/a) | 174.20 (n/a) | 150.80 (n/a) | 37.64 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 183.40 (n/a) | 157.66 (n/a) | 162.90 (n/a) | 116.00 (n/a) | 24.92 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 173.30 (n/a) | 146.08 (n/a) | 140.60 (n/a) | 121.30 (n/a) | 22.33 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (n/a) | 0.05 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 202.80 (n/a) | 157.38 (n/a) | 145.90 (n/a) | 135.10 (n/a) | 27.55 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 195.90 (n/a) | 163.70 (n/a) | 161.40 (n/a) | 127.20 (n/a) | 31.00 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 215.30 (n/a) | 163.02 (n/a) | 170.80 (n/a) | 117.20 (n/a) | 36.96 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 235.40 (n/a) | 203.30 (n/a) | 195.20 (n/a) | 176.50 (n/a) | 25.16 (n/a) |

iron/operators/gemm

test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 4.96 (-0.61%) | 4.30 (+1.22%) | 4.19 (+3.57%) | 3.50 (-11.64%) | 0.63 (+48.35%) | 2684.40 (+13.18%) | 2228.48 (-0.16%) | 2246.30 (-3.45%) | 1895.60 (+0.61%) | 334.17 (+66.66%) | 1951.52 (-0.61%) | 1689.76 (+1.22%) | 1646.87 (+3.57%) | 1378.11 (-11.64%) | 249.00 (+48.35%) |
| 4d4b803 / 2026-06-22 17:54:57 | 4.99 (n/a) | 4.24 (n/a) | 4.04 (n/a) | 3.96 (n/a) | 0.43 (n/a) | 2371.90 (n/a) | 2232.02 (n/a) | 2326.50 (n/a) | 1884.20 (n/a) | 200.51 (n/a) | 1963.40 (n/a) | 1669.47 (n/a) | 1590.09 (n/a) | 1559.66 (n/a) | 167.85 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 1.42 (-8.65%) | 0.92 (-1.29%) | 0.72 (-6.05%) | 0.64 (-1.01%) | 0.33 (-12.70%) | 347.00 (+1.02%) | 263.92 (-0.32%) | 305.60 (+6.44%) | 155.70 (+9.42%) | 80.70 (-4.51%) | 60.60 (-8.65%) | 39.15 (-1.29%) | 30.88 (-6.05%) | 27.19 (-1.01%) | 14.13 (-12.70%) |
| 4d4b803 / 2026-06-22 17:54:57 | 1.55 (n/a) | 0.93 (n/a) | 0.77 (n/a) | 0.64 (n/a) | 0.38 (n/a) | 343.50 (n/a) | 264.76 (n/a) | 287.10 (n/a) | 142.30 (n/a) | 84.51 (n/a) | 66.33 (n/a) | 39.67 (n/a) | 32.87 (n/a) | 27.47 (n/a) | 16.19 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 1.41 (+42.56%) | 0.94 (+12.84%) | 1.01 (+15.06%) | 0.57 (-14.62%) | 0.36 (+182.62%) | 391.00 (+17.14%) | 266.18 (-1.47%) | 220.00 (-13.08%) | 156.70 (-29.86%) | 105.54 (+144.31%) | 60.22 (+42.56%) | 40.19 (+12.84%) | 42.90 (+15.06%) | 24.14 (-14.62%) | 15.29 (+182.62%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.99 (n/a) | 0.83 (n/a) | 0.87 (n/a) | 0.66 (n/a) | 0.13 (n/a) | 333.80 (n/a) | 270.16 (n/a) | 253.10 (n/a) | 223.40 (n/a) | 43.20 (n/a) | 42.24 (n/a) | 35.62 (n/a) | 37.28 (n/a) | 28.27 (n/a) | 5.41 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.52 (-0.67%) | 0.52 (-0.15%) | 0.52 (-0.01%) | 0.52 (-0.12%) | 0.00 (-70.77%) | 48728.60 (+0.12%) | 48644.68 (+0.15%) | 48632.40 (+0.01%) | 48598.30 (+0.67%) | 49.18 (-70.52%) | 353.51 (-0.67%) | 353.17 (-0.15%) | 353.26 (-0.01%) | 352.56 (-0.12%) | 0.36 (-70.77%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.00 (n/a) | 48670.60 (n/a) | 48570.74 (n/a) | 48627.00 (n/a) | 48274.70 (n/a) | 166.79 (n/a) | 355.88 (n/a) | 353.71 (n/a) | 353.30 (n/a) | 352.98 (n/a) | 1.22 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.89 (+0.06%) | 0.88 (+0.47%) | 0.88 (+0.44%) | 0.88 (+1.28%) | 0.00 (-50.12%) | 28568.70 (-1.26%) | 28451.88 (-0.47%) | 28474.40 (-0.44%) | 28257.00 (-0.06%) | 120.35 (-50.81%) | 607.99 (+0.06%) | 603.83 (+0.47%) | 603.34 (+0.44%) | 601.35 (+1.28%) | 2.56 (-50.12%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.89 (n/a) | 0.88 (n/a) | 0.88 (n/a) | 0.87 (n/a) | 0.01 (n/a) | 28932.90 (n/a) | 28586.36 (n/a) | 28600.70 (n/a) | 28274.00 (n/a) | 244.64 (n/a) | 607.62 (n/a) | 601.02 (n/a) | 600.68 (n/a) | 593.78 (n/a) | 5.14 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 3.34 (+2.00%) | 3.27 (+3.24%) | 3.28 (+3.85%) | 3.19 (+3.27%) | 0.06 (-12.77%) | 7899.80 (-3.17%) | 7704.28 (-3.14%) | 7670.90 (-3.71%) | 7539.10 (-1.96%) | 141.38 (-16.83%) | 2278.78 (+2.00%) | 2230.52 (+3.24%) | 2239.62 (+3.85%) | 2174.73 (+3.27%) | 40.75 (-12.77%) |
| 4d4b803 / 2026-06-22 17:54:57 | 3.27 (n/a) | 3.16 (n/a) | 3.16 (n/a) | 3.08 (n/a) | 0.07 (n/a) | 8158.20 (n/a) | 7954.42 (n/a) | 7966.10 (n/a) | 7689.80 (n/a) | 169.99 (n/a) | 2234.13 (n/a) | 2160.59 (n/a) | 2156.62 (n/a) | 2105.84 (n/a) | 46.71 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 3.68 (-15.42%) | 3.41 (-10.09%) | 3.51 (-5.16%) | 2.90 (-8.67%) | 0.33 (-30.95%) | 2779.90 (+9.50%) | 2381.38 (+10.69%) | 2295.60 (+5.44%) | 2192.40 (+18.23%) | 245.20 (-10.09%) | 964.21 (-15.42%) | 894.71 (-10.09%) | 920.88 (-5.16%) | 760.44 (-8.67%) | 85.28 (-30.95%) |
| 4d4b803 / 2026-06-22 17:54:57 | 4.35 (n/a) | 3.79 (n/a) | 3.70 (n/a) | 3.18 (n/a) | 0.47 (n/a) | 2538.80 (n/a) | 2151.30 (n/a) | 2177.10 (n/a) | 1854.30 (n/a) | 272.71 (n/a) | 1140.03 (n/a) | 995.10 (n/a) | 970.97 (n/a) | 832.66 (n/a) | 123.51 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.56 (+57.67%) | 0.38 (+19.53%) | 0.35 (+7.13%) | 0.29 (+5.55%) | 0.11 (+230.44%) | 4333.10 (-5.26%) | 3467.22 (-12.72%) | 3569.90 (-6.66%) | 2223.40 (-36.57%) | 779.52 (+87.35%) | 30.18 (+57.67%) | 20.36 (+19.53%) | 18.80 (+7.13%) | 15.49 (+5.55%) | 5.71 (+230.44%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.36 (n/a) | 0.32 (n/a) | 0.33 (n/a) | 0.27 (n/a) | 0.03 (n/a) | 4573.70 (n/a) | 3972.46 (n/a) | 3824.60 (n/a) | 3505.50 (n/a) | 416.07 (n/a) | 19.14 (n/a) | 17.04 (n/a) | 17.55 (n/a) | 14.67 (n/a) | 1.73 (n/a) |

test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 6.36 (-3.14%) | 4.92 (+7.35%) | 4.80 (+7.09%) | 3.78 (+5.57%) | 0.93 (-23.09%) | 1759.80 (-5.28%) | 1390.92 (-8.71%) | 1386.40 (-6.61%) | 1045.40 (+3.24%) | 253.58 (-25.75%) | 1965.99 (-3.14%) | 1518.57 (+7.35%) | 1482.45 (+7.09%) | 1167.84 (+5.57%) | 286.42 (-23.09%) |
| 4d4b803 / 2026-06-22 17:54:57 | 6.57 (n/a) | 4.58 (n/a) | 4.48 (n/a) | 3.58 (n/a) | 1.21 (n/a) | 1857.80 (n/a) | 1523.60 (n/a) | 1484.60 (n/a) | 1012.60 (n/a) | 341.54 (n/a) | 2029.68 (n/a) | 1414.56 (n/a) | 1384.37 (n/a) | 1106.25 (n/a) | 372.43 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.30 (+41.17%) | 0.23 (+18.83%) | 0.21 (+7.64%) | 0.19 (+25.32%) | 0.04 (+76.75%) | 0.30 (+41.17%) | 0.23 (+18.83%) | 0.21 (+7.64%) | 0.19 (+25.32%) | 0.04 (+76.75%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.21 (n/a) | 0.19 (n/a) | 0.20 (n/a) | 0.15 (n/a) | 0.02 (n/a) | 0.21 (n/a) | 0.19 (n/a) | 0.20 (n/a) | 0.15 (n/a) | 0.02 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 13.63 (+1.73%) | 12.43 (-6.06%) | 12.49 (-5.53%) | 11.23 (-14.06%) | 0.86 (+534.77%) | 13.62 (+1.73%) | 12.42 (-6.06%) | 12.48 (-5.53%) | 11.22 (-14.06%) | 0.86 (+534.76%) |
| 4d4b803 / 2026-06-22 17:54:57 | 13.40 (n/a) | 13.23 (n/a) | 13.22 (n/a) | 13.06 (n/a) | 0.14 (n/a) | 13.39 (n/a) | 13.22 (n/a) | 13.21 (n/a) | 13.05 (n/a) | 0.14 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 25.26 (-0.33%) | 24.43 (-2.05%) | 24.30 (-2.78%) | 23.84 (-2.48%) | 0.53 (+43.89%) | 25.24 (-0.33%) | 24.41 (-2.05%) | 24.29 (-2.78%) | 23.83 (-2.48%) | 0.53 (+43.89%) |
| 4d4b803 / 2026-06-22 17:54:57 | 25.34 (n/a) | 24.94 (n/a) | 25.00 (n/a) | 24.45 (n/a) | 0.37 (n/a) | 25.33 (n/a) | 24.92 (n/a) | 24.98 (n/a) | 24.43 (n/a) | 0.37 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 41.11 (-8.10%) | 39.61 (-4.96%) | 40.30 (-3.45%) | 37.32 (-4.27%) | 1.62 (-21.01%) | 41.08 (-8.10%) | 39.58 (-4.96%) | 40.27 (-3.45%) | 37.30 (-4.27%) | 1.62 (-21.01%) |
| 4d4b803 / 2026-06-22 17:54:57 | 44.73 (n/a) | 41.67 (n/a) | 41.73 (n/a) | 38.99 (n/a) | 2.06 (n/a) | 44.70 (n/a) | 41.65 (n/a) | 41.71 (n/a) | 38.96 (n/a) | 2.06 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 42.76 (-3.70%) | 41.28 (-3.38%) | 42.30 (-1.03%) | 37.78 (-6.57%) | 2.08 (+39.38%) | 42.74 (-3.70%) | 41.26 (-3.38%) | 42.27 (-1.03%) | 37.76 (-6.57%) | 2.08 (+39.38%) |
| 4d4b803 / 2026-06-22 17:54:57 | 44.41 (n/a) | 42.73 (n/a) | 42.73 (n/a) | 40.44 (n/a) | 1.49 (n/a) | 44.38 (n/a) | 42.70 (n/a) | 42.71 (n/a) | 40.41 (n/a) | 1.49 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 13.22 (-0.52%) | 12.25 (-5.88%) | 12.22 (-7.73%) | 11.30 (-7.01%) | 0.94 (+93.43%) | 13.21 (-0.52%) | 12.24 (-5.88%) | 12.21 (-7.73%) | 11.29 (-7.01%) | 0.94 (+93.43%) |
| 4d4b803 / 2026-06-22 17:54:57 | 13.29 (n/a) | 13.01 (n/a) | 13.24 (n/a) | 12.15 (n/a) | 0.48 (n/a) | 13.28 (n/a) | 13.00 (n/a) | 13.23 (n/a) | 12.14 (n/a) | 0.48 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 25.03 (+1.20%) | 23.04 (-5.44%) | 24.00 (-1.32%) | 17.99 (-25.54%) | 2.86 (+1156.33%) | 25.01 (+1.20%) | 23.03 (-5.44%) | 23.98 (-1.32%) | 17.98 (-25.54%) | 2.86 (+1156.34%) |
| 4d4b803 / 2026-06-22 17:54:57 | 24.73 (n/a) | 24.37 (n/a) | 24.32 (n/a) | 24.16 (n/a) | 0.23 (n/a) | 24.71 (n/a) | 24.35 (n/a) | 24.30 (n/a) | 24.15 (n/a) | 0.23 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 42.05 (-1.05%) | 40.02 (-0.99%) | 39.90 (+0.27%) | 38.61 (-0.69%) | 1.28 (-9.54%) | 42.02 (-1.05%) | 39.99 (-0.99%) | 39.88 (+0.27%) | 38.58 (-0.69%) | 1.28 (-9.54%) |
| 4d4b803 / 2026-06-22 17:54:57 | 42.49 (n/a) | 40.41 (n/a) | 39.80 (n/a) | 38.87 (n/a) | 1.42 (n/a) | 42.47 (n/a) | 40.39 (n/a) | 39.77 (n/a) | 38.85 (n/a) | 1.42 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 43.63 (-0.39%) | 41.95 (+0.02%) | 42.54 (-2.60%) | 38.70 (+2.47%) | 1.89 (-29.22%) | 43.61 (-0.39%) | 41.92 (+0.02%) | 42.52 (-2.60%) | 38.68 (+2.47%) | 1.89 (-29.22%) |
| 4d4b803 / 2026-06-22 17:54:57 | 43.81 (n/a) | 41.94 (n/a) | 43.68 (n/a) | 37.77 (n/a) | 2.67 (n/a) | 43.78 (n/a) | 41.91 (n/a) | 43.65 (n/a) | 37.75 (n/a) | 2.67 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 245.30 (n/a) | 180.66 (n/a) | 174.10 (n/a) | 138.90 (n/a) | 39.22 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 203.80 (n/a) | 163.88 (n/a) | 164.90 (n/a) | 132.70 (n/a) | 31.12 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 241.30 (n/a) | 185.78 (n/a) | 192.30 (n/a) | 120.70 (n/a) | 44.60 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 276.40 (n/a) | 194.82 (n/a) | 169.80 (n/a) | 139.50 (n/a) | 57.51 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 255.70 (n/a) | 185.64 (n/a) | 174.50 (n/a) | 134.10 (n/a) | 48.67 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 262.60 (n/a) | 191.38 (n/a) | 200.00 (n/a) | 137.80 (n/a) | 49.64 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 222.80 (n/a) | 203.82 (n/a) | 205.40 (n/a) | 188.50 (n/a) | 14.49 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 317.50 (n/a) | 237.40 (n/a) | 223.70 (n/a) | 167.00 (n/a) | 55.36 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (-6.63%) | 0.06 (+11.65%) | 0.06 (+17.00%) | 0.05 (+13.65%) | 0.01 (-35.79%) | 174.30 (-12.01%) | 142.32 (-12.31%) | 140.70 (-14.52%) | 122.20 (+7.10%) | 20.01 (-37.43%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 198.10 (n/a) | 162.30 (n/a) | 164.60 (n/a) | 114.10 (n/a) | 31.98 (n/a) |

test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.05 (-5.63%) | 0.04 (+27.33%) | 0.04 (+40.41%) | 0.04 (+36.90%) | 0.01 (-43.28%) | 233.50 (-26.94%) | 206.46 (-24.86%) | 217.20 (-28.79%) | 175.20 (+5.99%) | 28.27 (-55.17%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 319.60 (n/a) | 274.78 (n/a) | 305.00 (n/a) | 165.30 (n/a) | 63.06 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.07 (+11.65%) | 0.05 (+0.56%) | 0.05 (-4.09%) | 0.04 (-16.93%) | 0.01 (+99.29%) | 226.10 (+20.39%) | 163.72 (+3.12%) | 162.70 (+4.29%) | 121.60 (-10.46%) | 41.34 (+111.76%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 187.80 (n/a) | 158.76 (n/a) | 156.00 (n/a) | 135.80 (n/a) | 19.52 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa / 2026-06-23 16:07:31 | 0.06 (-14.76%) | 0.05 (-14.83%) | 0.05 (-11.17%) | 0.04 (-8.15%) | 0.01 (-35.20%) | 196.30 (+8.87%) | 173.40 (+15.83%) | 177.40 (+12.56%) | 136.40 (+17.28%) | 22.32 (-19.01%) |
| 4d4b803 / 2026-06-22 17:54:57 | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 180.30 (n/a) | 149.70 (n/a) | 157.60 (n/a) | 116.30 (n/a) | 27.56 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (-17.47%) | 0.05 (-5.29%) | 0.05 (-5.40%) | 0.04 (-7.88%) | 0.01 (-19.87%) | 203.00 (+8.56%) | 169.96 (+5.13%) | 178.40 (+5.75%) | 133.90 (+21.18%) | 32.23 (+8.71%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 187.00 (n/a) | 161.66 (n/a) | 168.70 (n/a) | 110.50 (n/a) | 29.64 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (-8.56%) | 0.05 (+3.67%) | 0.05 (+12.52%) | 0.05 (+34.93%) | 0.01 (-54.39%) | 174.10 (-25.88%) | 158.14 (-8.76%) | 167.30 (-11.15%) | 129.90 (+9.34%) | 18.90 (-61.36%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 234.90 (n/a) | 173.32 (n/a) | 188.30 (n/a) | 118.80 (n/a) | 48.92 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (+2.94%) | 0.05 (+15.04%) | 0.05 (+9.99%) | 0.04 (+11.30%) | 0.01 (-2.92%) | 209.10 (-10.14%) | 170.42 (-13.54%) | 176.20 (-9.08%) | 137.90 (-2.89%) | 29.73 (-15.96%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 232.70 (n/a) | 197.10 (n/a) | 193.80 (n/a) | 142.00 (n/a) | 35.38 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (-1.44%) | 0.04 (-11.66%) | 0.04 (-7.52%) | 0.03 (-27.27%) | 0.01 (+17.24%) | 288.20 (+37.50%) | 195.72 (+16.49%) | 190.00 (+8.14%) | 136.50 (+1.49%) | 56.13 (+75.58%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.60 (n/a) | 168.02 (n/a) | 175.70 (n/a) | 134.50 (n/a) | 31.97 (n/a) |

iron/operators/mha

test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.21 (-0.11%) | 0.21 (-0.15%) | 0.20 (-0.20%) | 0.20 (-0.15%) | 0.00 (+22.86%) | 40943.50 (+0.15%) | 40903.22 (+0.15%) | 40923.70 (+0.20%) | 40835.20 (+0.11%) | 42.94 (+23.14%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.00 (n/a) | 40882.90 (n/a) | 40842.88 (n/a) | 40843.80 (n/a) | 40791.90 (n/a) | 34.87 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.07 (+2.87%) | 0.05 (+8.18%) | 0.06 (+21.94%) | 0.04 (+1.92%) | 0.01 (+12.89%) | 218.70 (-1.88%) | 163.24 (-6.66%) | 140.60 (-17.97%) | 122.10 (-2.79%) | 41.42 (+9.84%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 222.90 (n/a) | 174.88 (n/a) | 171.40 (n/a) | 125.60 (n/a) | 37.70 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.07 (-26.49%) | 0.06 (-21.66%) | 0.06 (-20.74%) | 0.05 (-23.23%) | 0.01 (-34.98%) | 227.00 (+30.24%) | 191.72 (+27.22%) | 192.00 (+26.15%) | 166.40 (+35.95%) | 22.61 (+18.10%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.10 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 174.30 (n/a) | 150.70 (n/a) | 152.20 (n/a) | 122.40 (n/a) | 19.14 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.07 (-5.34%) | 0.05 (-12.51%) | 0.05 (-21.66%) | 0.04 (-7.04%) | 0.01 (-10.38%) | 201.80 (+7.57%) | 161.82 (+13.85%) | 161.10 (+27.65%) | 115.10 (+5.69%) | 36.53 (+2.51%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 187.60 (n/a) | 142.14 (n/a) | 126.20 (n/a) | 108.90 (n/a) | 35.64 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.08 (+12.97%) | 0.06 (+10.47%) | 0.05 (-4.69%) | 0.05 (+37.41%) | 0.01 (+5.81%) | 208.70 (-27.23%) | 173.86 (-10.84%) | 193.40 (+4.94%) | 128.10 (-11.47%) | 35.29 (-34.99%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 286.80 (n/a) | 195.00 (n/a) | 184.30 (n/a) | 144.70 (n/a) | 54.29 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.07 (+1.19%) | 0.05 (+0.35%) | 0.06 (+17.67%) | 0.03 (-15.04%) | 0.01 (+33.64%) | 239.40 (+17.70%) | 164.82 (+3.22%) | 137.30 (-14.98%) | 125.00 (-1.19%) | 50.98 (+55.97%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 203.40 (n/a) | 159.68 (n/a) | 161.50 (n/a) | 126.50 (n/a) | 32.68 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.08 (-9.23%) | 0.06 (+10.82%) | 0.07 (+10.71%) | 0.05 (+52.57%) | 0.01 (-45.88%) | 190.40 (-34.46%) | 161.04 (-15.54%) | 157.20 (-9.71%) | 131.40 (+10.23%) | 25.01 (-61.02%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.09 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 290.50 (n/a) | 190.66 (n/a) | 174.10 (n/a) | 119.20 (n/a) | 64.15 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (-12.09%) | 0.04 (-22.59%) | 0.04 (-25.78%) | 0.03 (-22.01%) | 0.01 (+3.89%) | 279.40 (+28.22%) | 215.88 (+31.75%) | 224.90 (+34.75%) | 135.90 (+13.82%) | 53.28 (+45.73%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 217.90 (n/a) | 163.86 (n/a) | 166.90 (n/a) | 119.40 (n/a) | 36.56 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (-3.84%) | 0.04 (-22.99%) | 0.04 (-25.89%) | 0.03 (-34.79%) | 0.01 (+130.74%) | 282.20 (+53.29%) | 218.50 (+34.41%) | 212.90 (+34.92%) | 156.20 (+3.99%) | 47.46 (+261.20%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 184.10 (n/a) | 162.56 (n/a) | 157.80 (n/a) | 150.20 (n/a) | 13.14 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (-16.54%) | 0.05 (-16.35%) | 0.04 (-21.08%) | 0.04 (-8.78%) | 0.01 (-15.71%) | 232.80 (+9.60%) | 186.54 (+19.13%) | 194.30 (+26.74%) | 144.60 (+19.80%) | 35.87 (+5.55%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 212.40 (n/a) | 156.58 (n/a) | 153.30 (n/a) | 120.70 (n/a) | 33.98 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (-19.71%) | 0.05 (-9.80%) | 0.05 (-9.24%) | 0.04 (-13.62%) | 0.01 (-27.33%) | 220.50 (+15.75%) | 184.66 (+10.32%) | 191.60 (+10.18%) | 154.60 (+24.58%) | 27.03 (+6.85%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 190.50 (n/a) | 167.38 (n/a) | 173.90 (n/a) | 124.10 (n/a) | 25.30 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (+31.38%) | 0.05 (+3.27%) | 0.04 (-7.31%) | 0.03 (-20.65%) | 0.01 (+411.21%) | 246.50 (+26.02%) | 187.38 (+1.24%) | 194.20 (+7.83%) | 135.10 (-23.89%) | 44.53 (+383.31%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 195.60 (n/a) | 185.08 (n/a) | 180.10 (n/a) | 177.50 (n/a) | 9.21 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.05 (-16.88%) | 0.04 (-7.71%) | 0.04 (-10.06%) | 0.03 (-9.18%) | 0.01 (-32.15%) | 320.70 (+10.09%) | 237.86 (+6.18%) | 233.20 (+11.15%) | 185.40 (+20.31%) | 50.82 (-9.86%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 291.30 (n/a) | 224.02 (n/a) | 209.80 (n/a) | 154.10 (n/a) | 56.38 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.05 (-18.35%) | 0.04 (-20.74%) | 0.04 (-26.38%) | 0.04 (-22.10%) | 0.01 (+7.03%) | 225.80 (+28.37%) | 205.00 (+27.01%) | 223.20 (+35.85%) | 168.70 (+22.42%) | 27.26 (+70.36%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 175.90 (n/a) | 161.40 (n/a) | 164.30 (n/a) | 137.80 (n/a) | 16.00 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.06 (+1.95%) | 0.04 (-13.57%) | 0.04 (-10.34%) | 0.02 (-37.71%) | 0.01 (+65.32%) | 354.00 (+60.54%) | 226.04 (+22.94%) | 205.30 (+11.52%) | 152.50 (-1.93%) | 77.41 (+176.19%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 220.50 (n/a) | 183.86 (n/a) | 184.10 (n/a) | 155.50 (n/a) | 28.03 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.05 (+4.88%) | 0.04 (-7.65%) | 0.03 (-12.16%) | 0.02 (-19.32%) | 0.01 (+36.88%) | 358.40 (+23.97%) | 243.46 (+12.60%) | 245.00 (+13.85%) | 153.20 (-4.67%) | 74.77 (+59.70%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 289.10 (n/a) | 216.22 (n/a) | 215.20 (n/a) | 160.70 (n/a) | 46.82 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.74 (+6.65%) | 0.59 (+6.68%) | 0.53 (+2.00%) | 0.51 (+15.88%) | 0.10 (-9.79%) | 193.10 (-13.68%) | 170.02 (-7.27%) | 185.70 (-1.95%) | 132.00 (-6.25%) | 26.95 (-25.67%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.70 (n/a) | 0.55 (n/a) | 0.52 (n/a) | 0.44 (n/a) | 0.11 (n/a) | 223.70 (n/a) | 183.34 (n/a) | 189.40 (n/a) | 140.80 (n/a) | 36.26 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.73 (+14.17%) | 0.57 (+4.90%) | 0.53 (-6.06%) | 0.41 (-6.80%) | 0.13 (+59.88%) | 237.20 (+7.28%) | 180.84 (-2.46%) | 186.50 (+6.45%) | 134.60 (-12.43%) | 40.71 (+46.00%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.64 (n/a) | 0.54 (n/a) | 0.56 (n/a) | 0.44 (n/a) | 0.08 (n/a) | 221.10 (n/a) | 185.40 (n/a) | 175.20 (n/a) | 153.70 (n/a) | 27.88 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.82 (+32.62%) | 0.58 (+11.82%) | 0.57 (+13.39%) | 0.40 (-9.29%) | 0.15 (+126.80%) | 244.90 (+10.22%) | 178.54 (-6.87%) | 172.20 (-11.83%) | 120.50 (-24.59%) | 45.67 (+88.33%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.62 (n/a) | 0.52 (n/a) | 0.50 (n/a) | 0.44 (n/a) | 0.07 (n/a) | 222.20 (n/a) | 191.72 (n/a) | 195.30 (n/a) | 159.80 (n/a) | 24.25 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.61 (+5.58%) | 0.47 (-3.90%) | 0.47 (+0.81%) | 0.36 (-16.45%) | 0.10 (+61.02%) | 276.80 (+19.72%) | 216.10 (+6.36%) | 209.30 (-0.81%) | 161.90 (-5.27%) | 43.86 (+84.33%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.58 (n/a) | 0.49 (n/a) | 0.47 (n/a) | 0.43 (n/a) | 0.06 (n/a) | 231.20 (n/a) | 203.18 (n/a) | 211.00 (n/a) | 170.90 (n/a) | 23.79 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.56 (-7.87%) | 0.47 (-7.87%) | 0.48 (-4.00%) | 0.37 (-0.39%) | 0.08 (-19.86%) | 198.50 (+0.35%) | 161.48 (+7.58%) | 154.70 (+4.18%) | 131.00 (+8.53%) | 27.42 (-11.30%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.61 (n/a) | 0.51 (n/a) | 0.50 (n/a) | 0.37 (n/a) | 0.10 (n/a) | 197.80 (n/a) | 150.10 (n/a) | 148.50 (n/a) | 120.70 (n/a) | 30.91 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.58 (+0.16%) | 0.48 (+0.73%) | 0.44 (+1.58%) | 0.40 (+4.49%) | 0.08 (-9.78%) | 183.00 (-4.29%) | 157.62 (-1.24%) | 166.10 (-1.54%) | 128.20 (-0.16%) | 24.62 (-12.20%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.57 (n/a) | 0.47 (n/a) | 0.44 (n/a) | 0.39 (n/a) | 0.09 (n/a) | 191.20 (n/a) | 159.60 (n/a) | 168.70 (n/a) | 128.40 (n/a) | 28.04 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.54 (+3.34%) | 0.40 (-7.82%) | 0.37 (-12.27%) | 0.33 (-8.18%) | 0.08 (+17.58%) | 222.40 (+8.91%) | 190.00 (+9.38%) | 200.20 (+13.94%) | 136.70 (-3.26%) | 33.22 (+20.55%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.52 (n/a) | 0.43 (n/a) | 0.42 (n/a) | 0.36 (n/a) | 0.07 (n/a) | 204.20 (n/a) | 173.70 (n/a) | 175.70 (n/a) | 141.30 (n/a) | 27.56 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.46 (+2.18%) | 0.37 (-4.27%) | 0.35 (+0.04%) | 0.32 (-5.37%) | 0.05 (-4.08%) | 231.80 (+5.65%) | 202.40 (+4.33%) | 208.00 (-0.05%) | 159.90 (-2.14%) | 26.32 (-2.57%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.45 (n/a) | 0.39 (n/a) | 0.35 (n/a) | 0.34 (n/a) | 0.06 (n/a) | 219.40 (n/a) | 194.00 (n/a) | 208.10 (n/a) | 163.40 (n/a) | 27.02 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 1.21 (+29.15%) | 0.86 (+5.20%) | 0.76 (-8.85%) | 0.57 (-16.79%) | 0.27 (+143.16%) | 231.10 (+20.18%) | 165.58 (+1.22%) | 173.00 (+9.70%) | 108.20 (-22.55%) | 50.29 (+119.85%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.94 (n/a) | 0.81 (n/a) | 0.83 (n/a) | 0.68 (n/a) | 0.11 (n/a) | 192.30 (n/a) | 163.58 (n/a) | 157.70 (n/a) | 139.70 (n/a) | 22.87 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.91 (-18.74%) | 0.72 (-7.45%) | 0.70 (+4.58%) | 0.54 (-13.98%) | 0.14 (-31.09%) | 241.20 (+16.24%) | 187.18 (+6.62%) | 186.10 (-4.37%) | 144.40 (+23.10%) | 36.62 (+0.35%) |
| 4d4b803 — 2026-06-22 17:54:57 | 1.12 (n/a) | 0.78 (n/a) | 0.67 (n/a) | 0.63 (n/a) | 0.20 (n/a) | 207.50 (n/a) | 175.56 (n/a) | 194.60 (n/a) | 117.30 (n/a) | 36.50 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.94 (-18.61%) | 0.72 (-15.07%) | 0.63 (-24.99%) | 0.57 (-5.22%) | 0.16 (-24.98%) | 229.60 (+5.51%) | 189.76 (+16.12%) | 207.10 (+33.27%) | 138.70 (+22.85%) | 39.96 (-3.60%) |
| 4d4b803 — 2026-06-22 17:54:57 | 1.16 (n/a) | 0.85 (n/a) | 0.84 (n/a) | 0.60 (n/a) | 0.22 (n/a) | 217.60 (n/a) | 163.42 (n/a) | 155.40 (n/a) | 112.90 (n/a) | 41.45 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.00 (+9.09%) | 0.00 (+21.74%) | 0.00 (+22.22%) | 0.00 (+25.00%) | 0.00 (-23.62%) | 3935.40 (-23.85%) | 3713.77 (-16.20%) | 3774.17 (-16.00%) | 3500.13 (-2.14%) | 202.33 (-64.71%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 5168.09 (n/a) | 4431.69 (n/a) | 4492.92 (n/a) | 3576.64 (n/a) | 573.28 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.00 (+4.55%) | 0.00 (+10.00%) | 0.00 (+10.00%) | 0.00 (+11.11%) | 0.00 (-22.54%) | 4016.44 (-11.77%) | 3746.93 (-9.72%) | 3737.35 (-9.35%) | 3559.87 (-6.49%) | 178.50 (-42.68%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4552.05 (n/a) | 4150.54 (n/a) | 4122.63 (n/a) | 3806.88 (n/a) | 311.43 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 0.25 (-9.40%) | 0.21 (-13.24%) | 0.24 (-2.23%) | 0.15 (-18.71%) | 0.05 (+54.68%) | 13600.77 (+23.02%) | 10526.80 (+19.06%) | 8686.67 (+2.26%) | 8430.52 (+10.40%) | 2652.74 (+104.37%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.27 (n/a) | 0.24 (n/a) | 0.25 (n/a) | 0.19 (n/a) | 0.03 (n/a) | 11055.95 (n/a) | 8841.86 (n/a) | 8494.64 (n/a) | 7636.48 (n/a) | 1298.01 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 4.39 (+42.06%) | 3.46 (+28.45%) | 3.58 (+25.07%) | 2.71 (+31.41%) | 0.67 (+68.36%) | 193.20 (-23.91%) | 156.18 (-21.38%) | 146.60 (-20.07%) | 119.40 (-29.60%) | 29.97 (-10.13%) |
| 4d4b803 — 2026-06-22 17:54:57 | 3.09 (n/a) | 2.69 (n/a) | 2.86 (n/a) | 2.07 (n/a) | 0.40 (n/a) | 253.90 (n/a) | 198.64 (n/a) | 183.40 (n/a) | 169.60 (n/a) | 33.34 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 16:07:31 | 3.19 (-20.99%) | 2.87 (-0.54%) | 2.91 (+6.74%) | 2.58 (+29.24%) | 0.23 (-75.07%) | 203.20 (-22.62%) | 183.50 (-7.03%) | 180.50 (-6.28%) | 164.40 (+26.56%) | 14.81 (-76.17%) |
| 4d4b803 — 2026-06-22 17:54:57 | 4.04 (n/a) | 2.89 (n/a) | 2.72 (n/a) | 2.00 (n/a) | 0.93 (n/a) | 262.60 (n/a) | 197.38 (n/a) | 192.60 (n/a) | 129.90 (n/a) | 62.14 (n/a) |

Krackan - Examples

IRON

Tested on 2026_06_23_15_37_36 at commit 9d092aa.

iron/applications/llama_3.2_1b

| Test | Checks | TTFT (mean) | TPS (mean) |
| --- | --- | --- | --- |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] | ✅ 5/5 | 2.12 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] | ✅ 5/5 | 2.16 | 4.17 |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] | ✅ 5/5 | 2.08 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] | ✅ 5/5 | 2.08 | 4.17 |

Trends:

IRON Trends

iron/applications/llama_3.2_1b

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:31:50 | 2.14 (-0.33%) | 2.12 (+0.08%) | 2.13 (+0.05%) | 2.10 (-0.19%) | 0.02 (-15.02%) |
| 4d4b803 — 2026-06-22 18:03:47 | 2.15 (n/a) | 2.12 (n/a) | 2.13 (n/a) | 2.10 (n/a) | 0.02 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:31:50 | 4.18 (-0.19%) | 4.17 (+0.21%) | 4.18 (+0.19%) | 4.16 (+0.43%) | 0.01 (-64.19%) | 2.29 (+1.11%) | 2.16 (-0.05%) | 2.13 (+0.00%) | 2.11 (-0.52%) | 0.07 (+24.30%) |
| 4d4b803 — 2026-06-22 18:03:47 | 4.19 (n/a) | 4.17 (n/a) | 4.17 (n/a) | 4.14 (n/a) | 0.02 (n/a) | 2.26 (n/a) | 2.16 (n/a) | 2.13 (n/a) | 2.12 (n/a) | 0.06 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:31:50 | 2.11 (+0.38%) | 2.08 (+0.15%) | 2.08 (+0.14%) | 2.07 (+0.24%) | 0.02 (+3.55%) |
| 4d4b803 — 2026-06-22 18:03:47 | 2.10 (n/a) | 2.08 (n/a) | 2.08 (n/a) | 2.06 (n/a) | 0.02 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:31:50 | 4.18 (+0.10%) | 4.17 (+0.24%) | 4.17 (+0.29%) | 4.17 (+0.46%) | 0.01 (-44.66%) | 2.09 (-0.33%) | 2.08 (-0.29%) | 2.08 (-0.62%) | 2.08 (+0.53%) | 0.01 (-44.08%) |
| 4d4b803 — 2026-06-22 18:03:47 | 4.18 (n/a) | 4.16 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.10 (n/a) | 2.09 (n/a) | 2.10 (n/a) | 2.06 (n/a) | 0.01 (n/a) |

Phoenix - Small

IRON

Tested on 2026_06_23_15_41_18 at commit 9d092aa.

iron/operators/axpy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 352.04 | 0.04 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 376.80 | 0.03 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 887.56 | 0.02 | n/a |

iron/operators/dequant

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 394.94 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 383.82 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 298.12 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 342.56 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 337.14 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 496.40 | 0.01 | n/a |

iron/operators/elementwise_add

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 345.96 | 0.04 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 410.54 | 0.03 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 426.16 | 0.03 | n/a |

iron/operators/elementwise_mul

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 347.06 | 0.04 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 470.22 | 0.03 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 293.72 | 0.04 | n/a |

iron/operators/gelu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 336.42 | 0.03 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 810.66 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 377.94 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 504.46 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 403.68 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 550.22 | 0.02 | n/a |

iron/operators/gemm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 891.64 | 0.35 | 14.92 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 538.24 | 0.44 | 18.96 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 82419.90 | 0.31 | 208.48 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 24955.14 | 1.01 | 688.53 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 3222.12 | 2.69 | 705.65 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 5832.76 | 0.22 | 11.95 |

iron/operators/gemv

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.09 | 0.09 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 3.57 | 3.57 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 6.63 | 6.63 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 9.39 | 9.38 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 3.49 | 3.48 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 6.66 | 6.66 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 11.21 | 11.21 |

iron/operators/layer_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 306.02 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 305.20 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 287.80 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 395.92 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 376.00 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 383.04 | 0.02 | n/a |

iron/operators/mem_copy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 303.26 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 353.46 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 351.54 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 414.48 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 449.66 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 443.04 | 0.02 | n/a |

iron/operators/relu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 420.18 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 397.38 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 309.94 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 474.56 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 470.24 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 389.84 | 0.02 | n/a |

iron/operators/rms_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 363.98 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 352.88 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 597.40 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 415.40 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 436.20 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 414.12 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 613.70 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 497.98 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 436.52 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 387.70 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 451.44 | 0.02 | n/a |

iron/operators/rope

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 418.40 | 0.25 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 465.80 | 0.23 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 533.62 | 0.22 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 486.92 | 0.16 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 466.60 | 0.17 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 400.12 | 0.20 | n/a |

iron/operators/sigmoid

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 272.40 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 250.46 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 364.08 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 384.42 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 373.22 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 425.62 | 0.02 | n/a |

iron/operators/silu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 493.66 | 0.02 | n/a |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 421.52 | 0.02 | n/a |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 358.68 | 0.02 | n/a |

iron/operators/softmax

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 419.44 | 0.34 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 551.36 | 0.25 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 527.98 | 0.26 | n/a |

iron/operators/swiglu_decode

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 11142.47 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 15533.16 | 0.00 | n/a |

iron/operators/swiglu_prefill

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 22250.99 | 0.10 | n/a |

iron/operators/tanh

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 600.02 | 0.03 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 290.72 | 0.03 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 367.78 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 427.46 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 397.26 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 431.70 | 0.02 | n/a |

iron/operators/transpose

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8] | ✅ 5/5 | 1192.74 | 0.71 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8] | ✅ 5/5 | 644.58 | 1.06 | n/a |

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.05 (-0.26%) | 0.04 (+1.94%) | 0.04 (-10.88%) | 0.02 (-15.15%) | 0.01 (-16.19%) | 618.60 (+17.85%) | 352.04 (-3.52%) | 298.40 (+12.22%) | 247.50 (+0.28%) | 150.69 (+3.42%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 524.90 (n/a) | 364.88 (n/a) | 265.90 (n/a) | 246.80 (n/a) | 145.71 (n/a) |
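
The percentages in parentheses look like relative changes against the same statistic from the previous commit (here 4d4b803). That reading is inferred from the numbers rather than stated by the report, and `pct_change` is a hypothetical helper. Because the bandwidth columns are rounded to two decimals, only the latency columns can be reproduced from the displayed values:

```python
def pct_change(current: float, previous: float) -> float:
    """Relative change of `current` against `previous`, in percent."""
    return (current - previous) / previous * 100.0


# Latency statistics (max, mean, median, min, stddev) copied from the
# first axpy trend table; the pairing with the previous commit is an assumption.
current = [618.60, 352.04, 298.40, 247.50, 150.69]   # 9d092aa
previous = [524.90, 364.88, 265.90, 246.80, 145.71]  # 4d4b803

deltas = [f"{pct_change(c, p):+.2f}%" for c, p in zip(current, previous)]
print(deltas)
```

The printed values match the latency deltas shown for 9d092aa, which supports the interpretation for these columns.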

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (-31.50%) | 0.03 (-3.88%) | 0.04 (+40.43%) | 0.02 (-10.02%) | 0.01 (-51.12%) | 588.10 (+11.13%) | 376.80 (-4.88%) | 340.30 (-28.79%) | 289.50 (+45.99%) | 121.06 (-19.74%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.06 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 529.20 (n/a) | 396.12 (n/a) | 477.90 (n/a) | 198.30 (n/a) | 150.84 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (-29.60%) | 0.02 (-39.84%) | 0.02 (-20.76%) | 0.01 (-65.74%) | 0.01 (-20.09%) | 1974.50 (+191.87%) | 887.56 (+104.89%) | 584.50 (+26.21%) | 320.30 (+42.04%) | 657.71 (+258.08%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 676.50 (n/a) | 433.18 (n/a) | 463.10 (n/a) | 225.50 (n/a) | 183.68 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.02 (+10.80%) | 0.02 (-6.82%) | 0.02 (+8.27%) | 0.01 (-35.99%) | 0.01 (+113.05%) | 688.50 (+56.19%) | 394.94 (+24.39%) | 265.40 (-7.65%) | 238.50 (-9.73%) | 200.80 (+183.02%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 440.80 (n/a) | 317.50 (n/a) | 287.40 (n/a) | 264.20 (n/a) | 70.95 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.02 (+7.97%) | 0.01 (-10.91%) | 0.02 (-17.91%) | 0.01 (-13.06%) | 0.00 (+23.87%) | 514.30 (+15.00%) | 383.82 (+16.32%) | 347.30 (+21.82%) | 242.10 (-7.38%) | 122.71 (+43.74%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 447.20 (n/a) | 329.98 (n/a) | 285.10 (n/a) | 261.40 (n/a) | 85.37 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (+42.11%) | 0.02 (+41.69%) | 0.02 (+70.62%) | 0.01 (+15.16%) | 0.01 (+26.43%) | 502.20 (-13.16%) | 298.12 (-29.53%) | 257.20 (-41.39%) | 184.20 (-29.64%) | 126.15 (-19.34%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 578.30 (n/a) | 423.02 (n/a) | 438.80 (n/a) | 261.80 (n/a) | 156.39 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.02 (+43.91%) | 0.02 (+65.44%) | 0.02 (+66.88%) | 0.01 (+280.57%) | 0.01 (+2.19%) | 540.60 (-73.72%) | 342.56 (-55.91%) | 283.00 (-40.08%) | 230.80 (-30.50%) | 123.02 (-82.96%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 2057.20 (n/a) | 777.02 (n/a) | 472.30 (n/a) | 332.10 (n/a) | 721.88 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (+32.05%) | 0.02 (+1.62%) | 0.02 (+3.63%) | 0.01 (+11.23%) | 0.01 (+37.45%) | 568.40 (-10.11%) | 337.14 (+1.11%) | 262.40 (-3.49%) | 184.90 (-24.25%) | 155.51 (-7.23%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 632.30 (n/a) | 333.44 (n/a) | 271.90 (n/a) | 244.10 (n/a) | 167.63 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.01 (-24.71%) | 0.01 (-24.78%) | 0.01 (-30.61%) | 0.01 (+1.38%) | 0.00 (-53.03%) | 593.70 (-1.36%) | 496.40 (+27.31%) | 515.00 (+44.10%) | 406.00 (+32.81%) | 72.86 (-40.35%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 601.90 (n/a) | 389.90 (n/a) | 357.40 (n/a) | 305.70 (n/a) | 122.15 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 542.00 (n/a) | 345.96 (n/a) | 259.10 (n/a) | 246.20 (n/a) | 135.84 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 507.80 (n/a) | 410.54 (n/a) | 379.80 (n/a) | 301.50 (n/a) | 88.59 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 658.70 (n/a) | 426.16 (n/a) | 359.50 (n/a) | 290.00 (n/a) | 161.17 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 487.90 (n/a) | 347.06 (n/a) | 284.20 (n/a) | 265.20 (n/a) | 102.45 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.02 (n/a) | 829.20 (n/a) | 470.22 (n/a) | 380.50 (n/a) | 225.40 (n/a) | 238.23 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.06 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 453.10 (n/a) | 293.72 (n/a) | 265.90 (n/a) | 221.90 (n/a) | 91.34 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 602.30 (n/a) | 336.42 (n/a) | 285.20 (n/a) | 186.40 (n/a) | 166.83 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2483.40 (n/a) | 810.66 (n/a) | 421.20 (n/a) | 313.10 (n/a) | 936.41 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 489.60 (n/a) | 377.94 (n/a) | 443.00 (n/a) | 203.70 (n/a) | 127.89 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 602.40 (n/a) | 504.46 (n/a) | 549.00 (n/a) | 313.80 (n/a) | 116.08 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 486.80 (n/a) | 403.68 (n/a) | 424.70 (n/a) | 293.80 (n/a) | 75.30 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 732.20 (n/a) | 550.22 (n/a) | 578.10 (n/a) | 375.40 (n/a) | 132.68 (n/a) |

iron/operators/gemm

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.60 (-9.46%) | 0.35 (-24.36%) | 0.34 (-24.79%) | 0.12 (-63.17%) | 0.20 (+62.61%) | 1820.10 (+171.54%) | 891.64 (+76.81%) | 652.20 (+32.97%) | 369.70 (+10.46%) | 607.09 (+392.11%) | 25.53 (-9.46%) | 14.92 (-24.36%) | 14.47 (-24.79%) | 5.19 (-63.17%) | 8.61 (+62.61%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.66 (n/a) | 0.46 (n/a) | 0.45 (n/a) | 0.33 (n/a) | 0.12 (n/a) | 670.30 (n/a) | 504.30 (n/a) | 490.50 (n/a) | 334.70 (n/a) | 123.36 (n/a) | 28.19 (n/a) | 19.72 (n/a) | 19.24 (n/a) | 14.08 (n/a) | 5.29 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.69 (+70.31%) | 0.44 (+23.76%) | 0.35 (-0.84%) | 0.34 (+5.50%) | 0.15 (+387.85%) | 658.90 (-5.21%) | 538.24 (-13.15%) | 626.60 (+0.85%) | 321.20 (-41.28%) | 149.64 (+179.11%) | 29.38 (+70.31%) | 18.96 (+23.76%) | 15.06 (-0.84%) | 14.32 (+5.50%) | 6.51 (+387.85%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.40 (n/a) | 0.36 (n/a) | 0.36 (n/a) | 0.32 (n/a) | 0.03 (n/a) | 695.10 (n/a) | 619.70 (n/a) | 621.30 (n/a) | 547.00 (n/a) | 53.61 (n/a) | 17.25 (n/a) | 15.32 (n/a) | 15.19 (n/a) | 13.58 (n/a) | 1.33 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.31 (-0.71%) | 0.31 (-0.49%) | 0.31 (-0.19%) | 0.30 (-0.19%) | 0.00 (-24.04%) | 83764.40 (+0.19%) | 82419.90 (+0.48%) | 82058.40 (+0.19%) | 81026.90 (+0.71%) | 1156.25 (-23.28%) | 212.03 (-0.71%) | 208.48 (-0.49%) | 209.36 (-0.19%) | 205.10 (-0.19%) | 2.92 (-24.04%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.31 (n/a) | 0.31 (n/a) | 0.31 (n/a) | 0.30 (n/a) | 0.01 (n/a) | 83603.40 (n/a) | 82023.64 (n/a) | 81903.40 (n/a) | 80454.70 (n/a) | 1507.16 (n/a) | 213.53 (n/a) | 209.51 (n/a) | 209.76 (n/a) | 205.49 (n/a) | 3.85 (n/a) |
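
For the 2048×2048×2048 GEMM above, the throughput column is consistent with 2·M·K·N floating-point operations divided by latency, provided latency is in microseconds and throughput in GFLOP/s. The report states no units, so both are assumptions, and `gemm_gflops` is a hypothetical helper. Pairing each commit's minimum latency with its maximum throughput:

```python
def gemm_gflops(m: int, k: int, n: int, latency_us: float) -> float:
    """GFLOP/s for an M x K by K x N matmul, counting 2*M*K*N operations.

    Assumes `latency_us` is in microseconds (not stated in the report).
    """
    flops = 2 * m * k * n
    seconds = latency_us * 1e-6
    return flops / seconds / 1e9


# Latency (min) values copied from the table above.
new_commit = f"{gemm_gflops(2048, 2048, 2048, 81026.90):.2f}"  # 9d092aa
old_commit = f"{gemm_gflops(2048, 2048, 2048, 80454.70):.2f}"  # 4d4b803
print(new_commit, old_commit)
```

Both results agree with the Throughput (max) cells for the respective commits, which is why the throughput deltas mirror the bandwidth deltas rather than carrying independent information.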

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 1.02 (-0.77%) | 1.01 (-1.13%) | 1.01 (-1.04%) | 0.99 (-1.68%) | 0.01 (+39.56%) | 25485.40 (+1.71%) | 24955.14 (+1.15%) | 24884.40 (+1.05%) | 24627.90 (+0.78%) | 329.81 (+42.95%) | 697.58 (-0.77%) | 688.53 (-1.13%) | 690.39 (-1.04%) | 674.11 (-1.68%) | 9.01 (+39.56%) |
| 4d4b803 — 2026-06-22 18:12:43 | 1.03 (n/a) | 1.02 (n/a) | 1.02 (n/a) | 1.00 (n/a) | 0.01 (n/a) | 25057.80 (n/a) | 24670.26 (n/a) | 24626.20 (n/a) | 24437.30 (n/a) | 230.71 (n/a) | 703.02 (n/a) | 696.43 (n/a) | 697.63 (n/a) | 685.61 (n/a) | 6.46 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 3.84 (+10.29%) | 2.69 (+31.96%) | 2.18 (+29.50%) | 2.03 (+52.35%) | 0.85 (+0.57%) | 3979.00 (-34.36%) | 3222.12 (-26.66%) | 3704.90 (-22.78%) | 2098.80 (-9.33%) | 900.63 (-34.87%) | 1007.22 (+10.29%) | 705.65 (+31.96%) | 570.57 (+29.50%) | 531.28 (+52.35%) | 222.75 (+0.57%) |
| 4d4b803 — 2026-06-22 18:12:43 | 3.48 (n/a) | 2.04 (n/a) | 1.68 (n/a) | 1.33 (n/a) | 0.84 (n/a) | 6061.80 (n/a) | 4393.58 (n/a) | 4798.00 (n/a) | 2314.70 (n/a) | 1382.73 (n/a) | 913.28 (n/a) | 534.74 (n/a) | 440.59 (n/a) | 348.73 (n/a) | 221.49 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.29 (-8.56%) | 0.22 (-3.41%) | 0.20 (-6.43%) | 0.17 (-4.47%) | 0.05 (-12.10%) | 7318.20 (+4.68%) | 5832.76 (+3.07%) | 6272.50 (+6.87%) | 4221.90 (+9.36%) | 1222.38 (+1.26%) | 15.90 (-8.56%) | 11.95 (-3.41%) | 10.70 (-6.43%) | 9.17 (-4.47%) | 2.69 (-12.10%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.32 (n/a) | 0.23 (n/a) | 0.21 (n/a) | 0.18 (n/a) | 0.06 (n/a) | 6990.80 (n/a) | 5658.88 (n/a) | 5869.50 (n/a) | 3860.40 (n/a) | 1207.19 (n/a) | 17.38 (n/a) | 12.37 (n/a) | 11.43 (n/a) | 9.60 (n/a) | 3.06 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.11 (-16.33%) | 0.09 (+0.92%) | 0.11 (+24.10%) | 0.06 (-3.60%) | 0.03 (-1.93%) | 0.11 (-16.33%) | 0.09 (+0.92%) | 0.11 (+24.10%) | 0.05 (-3.60%) | 0.03 (-1.93%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.13 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.06 (n/a) | 0.03 (n/a) | 0.13 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.06 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 3.95 (+1.15%) | 3.57 (-1.03%) | 3.47 (-7.46%) | 3.17 (-4.35%) | 0.33 (+22.37%) | 3.95 (+1.15%) | 3.57 (-1.03%) | 3.46 (-7.46%) | 3.17 (-4.35%) | 0.33 (+22.37%) |
| 4d4b803 — 2026-06-22 18:12:43 | 3.91 (n/a) | 3.61 (n/a) | 3.75 (n/a) | 3.31 (n/a) | 0.27 (n/a) | 3.91 (n/a) | 3.61 (n/a) | 3.74 (n/a) | 3.31 (n/a) | 0.27 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 7.56 (+3.96%) | 6.63 (+4.61%) | 6.53 (+3.60%) | 5.97 (+4.85%) | 0.65 (+11.87%) | 7.55 (+3.96%) | 6.63 (+4.61%) | 6.53 (+3.60%) | 5.97 (+4.85%) | 0.65 (+11.87%) |
| 4d4b803 — 2026-06-22 18:12:43 | 7.27 (n/a) | 6.34 (n/a) | 6.31 (n/a) | 5.70 (n/a) | 0.58 (n/a) | 7.27 (n/a) | 6.34 (n/a) | 6.30 (n/a) | 5.70 (n/a) | 0.58 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 11.43 (+19.93%) | 9.39 (+11.27%) | 9.71 (+13.51%) | 7.93 (+9.25%) | 1.42 (+44.59%) | 11.43 (+19.93%) | 9.38 (+11.27%) | 9.71 (+13.51%) | 7.93 (+9.25%) | 1.42 (+44.59%) |
| 4d4b803 — 2026-06-22 18:12:43 | 9.53 (n/a) | 8.44 (n/a) | 8.56 (n/a) | 7.26 (n/a) | 0.98 (n/a) | 9.53 (n/a) | 8.43 (n/a) | 8.55 (n/a) | 7.25 (n/a) | 0.98 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 3.79 (-3.44%) | 3.49 (-7.34%) | 3.48 (-7.90%) | 3.03 (-16.83%) | 0.31 (+190.96%) | 3.78 (-3.44%) | 3.48 (-7.34%) | 3.47 (-7.90%) | 3.03 (-16.83%) | 0.31 (+190.96%) |
| 4d4b803 — 2026-06-22 18:12:43 | 3.92 (n/a) | 3.76 (n/a) | 3.77 (n/a) | 3.64 (n/a) | 0.11 (n/a) | 3.92 (n/a) | 3.76 (n/a) | 3.77 (n/a) | 3.64 (n/a) | 0.11 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 7.45 (+7.73%) | 6.66 (+6.66%) | 6.78 (+11.65%) | 5.56 (-1.80%) | 0.71 (+17.57%) | 7.45 (+7.73%) | 6.66 (+6.66%) | 6.78 (+11.65%) | 5.56 (-1.80%) | 0.71 (+17.57%) |
| 4d4b803 — 2026-06-22 18:12:43 | 6.92 (n/a) | 6.25 (n/a) | 6.07 (n/a) | 5.66 (n/a) | 0.60 (n/a) | 6.91 (n/a) | 6.24 (n/a) | 6.07 (n/a) | 5.66 (n/a) | 0.60 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 13.56 (-3.91%) | 11.21 (+3.03%) | 13.00 (+11.97%) | 7.64 (+3.49%) | 2.88 (-13.17%) | 13.56 (-3.91%) | 11.21 (+3.03%) | 12.99 (+11.97%) | 7.64 (+3.49%) | 2.88 (-13.17%) |
| 4d4b803 — 2026-06-22 18:12:43 | 14.12 (n/a) | 10.88 (n/a) | 11.61 (n/a) | 7.39 (n/a) | 3.32 (n/a) | 14.11 (n/a) | 10.88 (n/a) | 11.60 (n/a) | 7.38 (n/a) | 3.32 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 437.80 (n/a) | 306.02 (n/a) | 267.60 (n/a) | 239.20 (n/a) | 82.03 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 455.00 (n/a) | 305.20 (n/a) | 274.50 (n/a) | 238.10 (n/a) | 89.00 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 426.90 (n/a) | 287.80 (n/a) | 242.90 (n/a) | 227.10 (n/a) | 83.56 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 592.40 (n/a) | 395.92 (n/a) | 350.80 (n/a) | 262.90 (n/a) | 139.75 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 589.60 (n/a) | 376.00 (n/a) | 319.20 (n/a) | 229.30 (n/a) | 139.82 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 482.20 (n/a) | 383.04 (n/a) | 452.20 (n/a) | 247.70 (n/a) | 113.35 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (+12.80%) | 0.03 (+54.66%) | 0.03 (+121.46%) | 0.02 (+24.43%) | 0.01 (+6.32%) | 475.70 (-19.63%) | 303.26 (-36.59%) | 241.60 (-54.85%) | 228.60 (-11.36%) | 105.32 (-25.84%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 591.90 (n/a) | 478.24 (n/a) | 535.10 (n/a) | 257.90 (n/a) | 142.02 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (+22.51%) | 0.03 (+34.00%) | 0.03 (+88.53%) | 0.01 (-11.66%) | 0.01 (+83.60%) | 591.60 (+13.20%) | 353.46 (-16.95%) | 247.30 (-46.97%) | 222.10 (-18.38%) | 168.97 (+67.50%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 522.60 (n/a) | 425.62 (n/a) | 466.30 (n/a) | 272.10 (n/a) | 100.88 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (+25.28%) | 0.03 (+18.23%) | 0.03 (+42.28%) | 0.01 (-15.28%) | 0.01 (+31.42%) | 595.10 (+18.03%) | 351.54 (-11.84%) | 291.70 (-29.71%) | 227.70 (-20.19%) | 143.95 (+32.51%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 504.20 (n/a) | 398.76 (n/a) | 415.00 (n/a) | 285.30 (n/a) | 108.64 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (+1.10%) | 0.02 (+21.75%) | 0.02 (+28.40%) | 0.01 (+15.92%) | 0.01 (-5.77%) | 598.70 (-13.73%) | 414.48 (-19.36%) | 407.40 (-22.12%) | 267.20 (-1.07%) | 133.10 (-14.59%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 694.00 (n/a) | 514.00 (n/a) | 523.10 (n/a) | 270.10 (n/a) | 155.85 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (-13.33%) | 0.02 (-31.55%) | 0.02 (-47.06%) | 0.01 (+0.67%) | 0.01 (-21.83%) | 554.70 (-0.68%) | 449.66 (+40.82%) | 477.50 (+88.88%) | 263.90 (+15.39%) | 110.95 (-18.95%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 558.50 (n/a) | 319.32 (n/a) | 252.80 (n/a) | 228.70 (n/a) | 136.88 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.02 (+13.60%) | 0.02 (+29.62%) | 0.02 (+25.99%) | 0.02 (+89.92%) | 0.00 (-28.67%) | 522.70 (-47.35%) | 443.04 (-28.85%) | 455.00 (-20.63%) | 335.60 (-11.96%) | 79.34 (-66.83%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 992.80 (n/a) | 622.66 (n/a) | 573.30 (n/a) | 381.20 (n/a) | 239.16 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (-11.35%) | 0.02 (+26.36%) | 0.03 (+75.74%) | 0.02 (+93.18%) | 0.01 (-41.14%) | 479.60 (-48.24%) | 363.98 (-32.13%) | 305.80 (-43.10%) | 284.00 (+12.83%) | 93.22 (-64.56%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 926.50 (n/a) | 536.26 (n/a) | 537.40 (n/a) | 251.70 (n/a) | 263.01 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (-15.66%) | 0.04 (+8.55%) | 0.04 (+19.46%) | 0.02 (+57.41%) | 0.01 (-42.40%) | 514.00 (-36.46%) | 352.88 (-19.99%) | 301.80 (-16.28%) | 276.40 (+18.58%) | 99.93 (-57.16%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 809.00 (n/a) | 441.06 (n/a) | 360.50 (n/a) | 233.10 (n/a) | 233.25 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (+8.00%) | 0.02 (+7.06%) | 0.02 (-5.15%) | 0.01 (-43.03%) | 0.01 (+46.49%) | 1358.20 (+75.52%) | 597.40 (+19.58%) | 524.20 (+5.43%) | 236.70 (-7.39%) | 451.62 (+143.84%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 773.80 (n/a) | 499.60 (n/a) | 497.20 (n/a) | 255.60 (n/a) | 185.21 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.05 (+23.96%) | 0.03 (+37.27%) | 0.02 (+12.14%) | 0.02 (+214.46%) | 0.01 (+15.77%) | 588.80 (-68.20%) | 415.40 (-42.52%) | 426.60 (-10.83%) | 211.60 (-19.33%) | 176.72 (-72.42%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1851.60 (n/a) | 722.74 (n/a) | 478.40 (n/a) | 262.30 (n/a) | 640.71 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (+21.93%) | 0.02 (+1.37%) | 0.02 (-36.43%) | 0.01 (-0.73%) | 0.01 (+45.57%) | 606.70 (+0.73%) | 436.20 (+5.71%) | 521.60 (+57.30%) | 226.00 (-18.00%) | 182.14 (+16.89%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 602.30 (n/a) | 412.64 (n/a) | 331.60 (n/a) | 275.60 (n/a) | 155.83 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.05 (+22.79%) | 0.03 (+4.83%) | 0.02 (+3.09%) | 0.02 (-7.09%) | 0.01 (+44.91%) | 598.60 (+7.62%) | 414.12 (+0.71%) | 430.00 (-2.98%) | 219.80 (-18.56%) | 150.16 (+27.95%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 556.20 (n/a) | 411.20 (n/a) | 443.20 (n/a) | 269.90 (n/a) | 117.35 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (-18.60%) | 0.02 (-24.68%) | 0.02 (-43.38%) | 0.01 (-54.39%) | 0.01 (+13.01%) | 1367.90 (+119.25%) | 613.70 (+62.59%) | 539.80 (+76.58%) | 281.60 (+22.81%) | 444.23 (+182.17%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 623.90 (n/a) | 377.46 (n/a) | 305.70 (n/a) | 229.30 (n/a) | 157.43 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (+52.24%) | 0.02 (+20.70%) | 0.02 (+7.76%) | 0.02 (+6.82%) | 0.00 (+296.73%) | 603.90 (-6.39%) | 497.98 (-14.00%) | 539.20 (-7.19%) | 346.00 (-34.31%) | 105.63 (+141.59%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 645.10 (n/a) | 579.08 (n/a) | 581.00 (n/a) | 526.70 (n/a) | 43.73 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.03 (-22.86%) | 0.02 (+0.52%) | 0.02 (+5.93%) | 0.01 (+310.91%) | 0.01 (-56.17%) | 577.30 (-75.66%) | 436.52 (-44.73%) | 472.20 (-5.60%) | 286.20 (+29.62%) | 110.93 (-87.61%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2372.10 (n/a) | 789.76 (n/a) | 500.20 (n/a) | 220.80 (n/a) | 895.19 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa — 2026-06-23 15:38:24 | 0.04 (-1.31%) | 0.03 (-20.21%) | 0.02 (-38.34%) | 0.02 (-15.26%) | 0.01 (+40.84%) | 508.20 (+17.99%) | 387.70 (+32.27%) | 431.60 (+62.19%) | 241.60 (+1.34%) | 126.29 (+61.15%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 430.70 (n/a) | 293.12 (n/a) | 266.10 (n/a) | 238.40 (n/a) | 78.37 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.02 (-31.55%) | 0.02 (-0.75%) | 0.02 (+8.93%) | 0.01 (+16.19%) | 0.00 (-56.70%) | 560.50 (-13.94%) | 451.44 (-8.22%) | 456.50 (-8.19%) | 344.90 (+46.14%) | 91.68 (-42.44%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 651.30 (n/a) | 491.86 (n/a) | 497.20 (n/a) | 236.00 (n/a) | 159.27 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.32 (-25.34%) | 0.25 (-2.89%) | 0.25 (+14.01%) | 0.17 (+32.43%) | 0.06 (-49.57%) | 586.50 (-24.49%) | 418.40 (-9.98%) | 385.60 (-12.28%) | 310.20 (+33.94%) | 113.69 (-48.51%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.42 (n/a) | 0.26 (n/a) | 0.22 (n/a) | 0.13 (n/a) | 0.12 (n/a) | 776.70 (n/a) | 464.78 (n/a) | 439.60 (n/a) | 231.60 (n/a) | 220.81 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.34 (+19.45%) | 0.23 (+28.14%) | 0.20 (+0.22%) | 0.16 (+211.30%) | 0.08 (-10.79%) | 625.60 (-67.88%) | 465.80 (-40.32%) | 501.80 (-0.22%) | 287.00 (-16.28%) | 138.43 (-79.03%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.29 (n/a) | 0.18 (n/a) | 0.20 (n/a) | 0.05 (n/a) | 0.09 (n/a) | 1947.60 (n/a) | 780.52 (n/a) | 502.90 (n/a) | 342.80 (n/a) | 660.10 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.40 (+51.26%) | 0.22 (+14.68%) | 0.18 (+1.54%) | 0.13 (-18.29%) | 0.11 (+142.49%) | 785.70 (+22.38%) | 533.62 (-1.84%) | 561.10 (-1.51%) | 243.00 (-33.88%) | 199.61 (+92.74%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.27 (n/a) | 0.19 (n/a) | 0.17 (n/a) | 0.15 (n/a) | 0.05 (n/a) | 642.00 (n/a) | 543.60 (n/a) | 569.70 (n/a) | 367.50 (n/a) | 103.56 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.20 (+4.51%) | 0.16 (-5.45%) | 0.15 (-9.43%) | 0.12 (-18.77%) | 0.03 (+47.48%) | 632.30 (+23.09%) | 486.92 (+7.61%) | 476.30 (+10.41%) | 370.70 (-4.31%) | 94.35 (+71.06%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.19 (n/a) | 0.16 (n/a) | 0.17 (n/a) | 0.14 (n/a) | 0.02 (n/a) | 513.70 (n/a) | 452.50 (n/a) | 431.40 (n/a) | 387.40 (n/a) | 55.16 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.25 (-3.75%) | 0.17 (+17.64%) | 0.16 (+19.76%) | 0.12 (+211.67%) | 0.06 (-30.36%) | 624.90 (-67.92%) | 466.60 (-38.86%) | 465.40 (-16.51%) | 299.30 (+3.92%) | 147.09 (-78.24%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.26 (n/a) | 0.15 (n/a) | 0.13 (n/a) | 0.04 (n/a) | 0.08 (n/a) | 1947.80 (n/a) | 763.12 (n/a) | 557.40 (n/a) | 288.00 (n/a) | 675.98 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.26 (+10.21%) | 0.20 (+15.00%) | 0.21 (+47.80%) | 0.11 (-15.82%) | 0.06 (+20.91%) | 669.40 (+18.79%) | 400.12 (-9.86%) | 344.80 (-32.34%) | 280.80 (-9.27%) | 159.82 (+34.99%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.24 (n/a) | 0.18 (n/a) | 0.14 (n/a) | 0.13 (n/a) | 0.05 (n/a) | 563.50 (n/a) | 443.88 (n/a) | 509.60 (n/a) | 309.50 (n/a) | 118.40 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.48 (+22.11%) | 0.34 (+41.02%) | 0.28 (+8.60%) | 0.24 (+275.69%) | 0.11 (-25.19%) | 539.30 (-73.38%) | 419.44 (-51.33%) | 461.60 (-7.92%) | 275.00 (-18.11%) | 120.49 (-83.18%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.39 (n/a) | 0.24 (n/a) | 0.26 (n/a) | 0.06 (n/a) | 0.14 (n/a) | 2025.90 (n/a) | 861.80 (n/a) | 501.30 (n/a) | 335.80 (n/a) | 716.33 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.32 (-27.23%) | 0.25 (-15.59%) | 0.26 (-8.02%) | 0.19 (-0.47%) | 0.05 (-47.18%) | 692.10 (+0.46%) | 551.36 (+13.03%) | 513.30 (+8.70%) | 412.30 (+37.39%) | 110.32 (-25.76%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.44 (n/a) | 0.29 (n/a) | 0.28 (n/a) | 0.19 (n/a) | 0.09 (n/a) | 688.90 (n/a) | 487.82 (n/a) | 472.20 (n/a) | 300.10 (n/a) | 148.59 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.33 (-34.88%) | 0.26 (-15.69%) | 0.25 (-1.81%) | 0.20 (+23.46%) | 0.05 (-64.25%) | 667.40 (-19.00%) | 527.98 (+2.49%) | 527.00 (+1.86%) | 395.30 (+53.51%) | 103.95 (-54.90%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.51 (n/a) | 0.30 (n/a) | 0.25 (n/a) | 0.16 (n/a) | 0.15 (n/a) | 824.00 (n/a) | 515.14 (n/a) | 517.40 (n/a) | 257.50 (n/a) | 230.46 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.00 (-12.50%) | 0.00 (+4.55%) | 0.00 (+66.67%) | 0.00 (+0.00%) | 0.00 (-17.38%) | 19612.94 (+10.30%) | 11142.47 (-4.97%) | 7810.44 (-43.30%) | 5561.04 (+7.43%) | 6316.42 (+19.64%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 17782.11 (n/a) | 11725.34 (n/a) | 13775.42 (n/a) | 5176.26 (n/a) | 5279.68 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.00 (+0.00%) | 0.00 (-34.69%) | 0.00 (-58.33%) | 0.00 (+0.00%) | 0.00 (-4.82%) | 19942.85 (+4.98%) | 15533.16 (+52.01%) | 16932.49 (+141.42%) | 5677.59 (-5.14%) | 5757.60 (+1.96%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 18997.36 (n/a) | 10218.74 (n/a) | 7013.58 (n/a) | 5985.05 (n/a) | 5646.66 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 0.13 (+4.21%) | 0.10 (+10.98%) | 0.09 (+8.54%) | 0.07 (-7.39%) | 0.03 (+39.94%) | 29388.70 (+8.05%) | 22250.99 (-6.86%) | 23922.96 (-7.83%) | 15682.45 (-4.09%) | 6215.88 (+41.53%) |
| 4d4b803 (2026-06-22 18:12:43) | 0.13 (n/a) | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.02 (n/a) | 27199.35 (n/a) | 23890.15 (n/a) | 25953.98 (n/a) | 16350.74 (n/a) | 4391.76 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 1.31 (+20.21%) | 0.71 (-30.79%) | 0.75 (-27.93%) | 0.22 (-77.57%) | 0.47 (+832.00%) | 2415.40 (+345.81%) | 1192.74 (+132.94%) | 696.60 (+38.74%) | 400.70 (-16.82%) | 915.76 (+3530.41%) |
| 4d4b803 (2026-06-22 18:12:43) | 1.09 (n/a) | 1.03 (n/a) | 1.04 (n/a) | 0.97 (n/a) | 0.05 (n/a) | 541.80 (n/a) | 512.04 (n/a) | 502.10 (n/a) | 481.70 (n/a) | 25.22 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9d092aa (2026-06-23 15:38:24) | 1.60 (-23.21%) | 1.06 (-13.13%) | 0.95 (-8.78%) | 0.38 (-25.64%) | 0.52 (-12.02%) | 1369.80 (+34.49%) | 644.58 (+21.66%) | 552.60 (+9.64%) | 327.30 (+30.19%) | 427.18 (+45.16%) |
| 4d4b803 (2026-06-22 18:12:43) | 2.09 (n/a) | 1.22 (n/a) | 1.04 (n/a) | 0.51 (n/a) | 0.59 (n/a) | 1018.50 (n/a) | 529.80 (n/a) | 504.00 (n/a) | 251.40 (n/a) | 294.28 (n/a) |

Phoenix - Examples

IRON

Tested on 2026_06_23_15_48_46 at commit 9d092aa.

Trends: (figure: IRON Trends)

Comment on lines +11 to +13
pytest.importorskip(
    "stream", reason="stream-dse not installed (see requirements_stream.txt)"
)

Collaborator

Could you please add requirements_stream.txt here so that this test runs in CI?
