transpose: add num_batches to batch independent transposes into one dispatch #124

Open
atassis wants to merge 8 commits into amd:devel from atassis:iron-transpose-numbatches

Conversation

atassis commented Jun 18, 2026

Contributor

GEMV and StridedCopy already take num_batches to batch B independent same-shape operations into a single dispatch; Transpose did not, forcing callers to unroll B per-head/per-batch transposes into B separate dispatches for identical kernel work (a common multi-head-attention pattern).

Added

  • num_batches on Transpose (default 1). num_batches>1 lays B contiguous (M,N) matrices back-to-back and streams them through the same ObjectFifos (one task group per batch); the core still only sees s×s sub-tiles, so the kernel is unchanged.
  • num_batches>1 test coverage (the batched path was previously untested), with a batched golden reference.
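
As a rough illustration of the batched layout described above, the sketch below transposes B row-major (M, N) matrices stored back-to-back in one flat buffer. It is a plain-Python stand-in written for this description, not the golden reference added in this PR; the function name `batched_transpose_reference` and its signature are assumptions.

```python
def batched_transpose_reference(flat, num_batches, M, N):
    """Hypothetical reference: transpose num_batches contiguous (M, N) matrices.

    `flat` holds the matrices back-to-back in row-major order; the result
    holds the corresponding (N, M) matrices back-to-back in the same order.
    """
    if len(flat) != num_batches * M * N:
        raise ValueError("buffer length must equal num_batches * M * N")
    out = []
    for b in range(num_batches):
        base = b * M * N  # start of batch b in the flat buffer
        for j in range(N):  # row index of the transposed matrix
            for i in range(M):  # column index of the transposed matrix
                out.append(flat[base + i * N + j])
    return out


# Two 2x3 matrices laid out back-to-back.
x = list(range(12))
y = batched_transpose_reference(x, num_batches=2, M=2, N=3)
print(y)  # [0, 3, 1, 4, 2, 5, 6, 9, 7, 10, 8, 11]
```

With NumPy, the same result would come from reshaping to (B, M, N), applying `transpose(0, 2, 1)`, and flattening. Each batch is handled independently, which is why batching does not require any change to the per-tile kernel.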

Changed

  • get_arg_spec prepends a batch dim only when num_batches>1; num_batches=1 is byte-identical to the previous single-transpose schedule.
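
The shape rule above can be sketched as follows. This is a hypothetical helper for illustration only: `transpose_arg_shapes` is not the operator's `get_arg_spec`, whose real signature and return type are not shown here. It only demonstrates that the batch dimension appears when num_batches>1 and is absent otherwise.

```python
def transpose_arg_shapes(M, N, num_batches=1):
    """Hypothetical helper: return (input_shape, output_shape) for a transpose.

    A leading batch dimension is prepended only when num_batches > 1, so the
    num_batches == 1 case keeps the plain 2-D shapes.
    """
    if num_batches < 1:
        raise ValueError("num_batches must be >= 1")
    in_shape, out_shape = (M, N), (N, M)
    if num_batches > 1:
        in_shape = (num_batches,) + in_shape
        out_shape = (num_batches,) + out_shape
    return in_shape, out_shape


print(transpose_arg_shapes(2048, 64))                 # ((2048, 64), (64, 2048))
print(transpose_arg_shapes(2048, 64, num_batches=2))  # ((2, 2048, 64), (2, 64, 2048))
```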

Removed

  • None.

Verified on device (NPU2): num_batches in {1, 2, 4} pass. Mirrors GEMV's existing num_batches.

andrej left a comment

Collaborator

Thank you for the contribution! This will be a useful addition. Just a couple nitpicks, then please rebase on devel and we can go ahead and merge this.

Comment thread iron/operators/transpose/design.py Outdated
Comment thread iron/operators/transpose/design.py Outdated
Comment thread iron/operators/transpose/design.py Outdated
Comment thread iron/operators/transpose/op.py Outdated
Comment thread iron/operators/transpose/reference.py Outdated
andrej commented Jun 22, 2026

Collaborator

Note the CI failures we're seeing should disappear after a rebase. :)

atassis and others added 6 commits June 22, 2026 22:05
…ispatch

GEMV and StridedCopy already take num_batches to batch B independent same-shape
operations into a single dispatch; Transpose did not, forcing callers to unroll
B per-head/per-batch transposes into B separate dispatches for identical kernel
work (a common multi-head-attention pattern).

num_batches>1 lays B contiguous (M,N) matrices back-to-back and streams them
through the same ObjectFifos (one task group per batch); the core still only
sees s*s sub-tiles, so the kernel is unchanged. num_batches=1 (default) is
byte-identical to the previous single-transpose schedule.
Adds num_batches=2 (default suite) and num_batches=4 (extensive) cases to the
transpose test, with a batched golden reference. The operator's batched path was
previously untested. Verified on device (NPU2): num_batches in {1,2,4} pass.
Co-authored-by: André Rösti <androsti@amd.com>
Co-authored-by: André Rösti <androsti@amd.com>
Co-authored-by: André Rösti <androsti@amd.com>
Co-authored-by: André Rösti <androsti@amd.com>
@atassis atassis force-pushed the iron-transpose-numbatches branch from 722c6f9 to 4aa4c91 Compare June 22, 2026 19:06
Drop the diff-relative phrasing ('original'/'unchanged') flagged in review;
the comment now describes the access-pattern layout as-is. Rationale moved to
the PR description.
atassis commented Jun 22, 2026

Contributor Author

Hi there! Thanks for the feedback; I have applied it in both PRs.
I have also opened Xilinx/mlir-aie#3178, which reduces compile-time complexity from O(n^2) to O(1) in several places, but it doesn't seem to have received any human attention yet. I'd be grateful for any help moving it forward. I have a few more generic fixes queued after that, if that one proves useful =)
BTW, I am trying to build a general-purpose, multi-precision NPU inference engine on IRON for consumer Ryzen AI / XDNA2 laptops (I might later make it work with other hardware too, though I have none to test against), and that work is how I found these fixes.
So, no promises, but I might open some more PRs in the near future!

@thomthehound

I think both of these contributions are valuable, so thank you for them!

Your [Xilinx/mlir-aie#3178] PR has had human eyes on it, but, speaking for myself, I would prefer to see the Copilot review issues there explained or resolved before commenting further.

atassis commented Jun 22, 2026

Contributor Author

@thomthehound good point, resolved them

@atassis atassis requested a review from andrej June 23, 2026 09:19
github-actions Bot commented Jun 23, 2026

Contributor

CI Test Results

b6ae95b (2026_06_23_16_04_10)

IRON - CI Summary

Examples

iron/applications/llama_3.2_1b
Test Krackan Status Krackan Phoenix Status Phoenix
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] - - -

Small

iron/operators/axpy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] 190.76 402.42
test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] 159.06 625.22
test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] 215.58 526.36
test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] 211.90 - -
iron/operators/dequant
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] 200.26 303.22
test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] 187.60 306.60
test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] 188.20 352.40
test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] 183.26 287.38
test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] 188.96 377.26
test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] 179.78 376.78
test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] 191.84 - -
test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] 203.88 - -
iron/operators/elementwise_add
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] 147.66 406.34
test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] 135.66 405.96
test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] 154.62 490.66
test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] 188.30 - -
iron/operators/elementwise_mul
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] 127.14 429.56
test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] 136.94 723.16
test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] 167.92 498.92
test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] 181.98 - -
iron/operators/gelu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 172.30 431.78
test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 166.32 466.34
test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 162.66 384.24
test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 168.50 401.10
test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 169.40 779.34
test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 216.86 481.78
test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 186.70 - -
test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 234.54 - -
iron/operators/gemm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] 2138.24 - -
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] 264.60 620.36
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] 309.54 470.86
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 48704.32 84281.54
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 28666.92 24699.94
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] 7912.54 - -
test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] 2310.78 3552.52
test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] 3731.56 5233.86
test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] 1370.12 - -
iron/operators/gemv
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] 0.20 0.09
test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] 12.84 3.73
test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] 23.69 6.08
test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] 39.99 7.96
test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] 42.51 - -
test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] 12.97 3.57
test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] 23.89 6.76
test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] 37.06 8.64
test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] 42.60 - -
iron/operators/layer_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 218.16 321.84
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 184.30 369.36
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 196.68 306.14
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 236.96 498.64
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 177.56 384.26
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 181.46 431.64
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 209.44 - -
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 222.66 - -
iron/operators/mem_copy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] 160.36 449.82
test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] 214.30 - -
test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] 178.26 405.10
test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] 164.40 379.08
test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] 167.56 788.08
test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] 180.16 462.90
test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] 178.96 - -
test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] 154.56 447.14
iron/operators/mha
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] 40900.74 - -
iron/operators/relu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 146.16 381.04
test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 161.00 295.32
test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 165.26 325.48
test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 205.74 332.98
test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 195.68 393.18
test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 169.78 541.28
test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 176.68 - -
test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 199.88 - -
iron/operators/rms_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] 165.80 386.40
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] 184.06 338.74
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] 168.06 450.62
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] 177.50 464.78
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] 170.30 406.86
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] 165.08 405.24
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] 189.40 532.08
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] 186.72 741.50
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] 178.30 842.34
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] 181.86 516.66
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] 243.76 530.48
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] 242.96 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] 190.20 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] 226.74 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] 244.92 - -
iron/operators/rope
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] 168.46 367.66
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] 227.52 318.60
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] 215.84 436.42
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] 203.46 - -
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] 178.26 371.64
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] 190.30 426.66
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] 171.40 399.58
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] 216.92 - -
iron/operators/sigmoid
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 177.54 322.76
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 167.08 420.16
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 181.78 389.06
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 201.76 412.30
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 219.50 590.60
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 192.80 465.32
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 209.00 - -
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 228.22 - -
iron/operators/silu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 152.98 334.76
test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 156.52 366.20
test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 164.40 837.50
test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 181.86 - -
iron/operators/softmax
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] 150.84 420.22
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] 145.68 391.14
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] 149.24 366.46
iron/operators/swiglu_decode
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] 3909.66 18569.88
test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] 3847.70 11372.47
iron/operators/swiglu_prefill
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] 10388.04 18550.86
iron/operators/tanh
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 146.22 681.58
test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 210.46 294.84
test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 196.12 308.96
test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 156.88 292.72
test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 152.36 777.32
test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 165.40 297.46
test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 164.84 - -
test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 207.42 - -
iron/operators/transpose
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1] 196.00 485.92
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2] 231.48 482.90
test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1] 168.72 407.26
Krackan - Small

IRON

Tested on 2026_06_23_16_04_10 at commit b6ae95b.

iron/operators/axpy
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 190.76 | 0.07 | n/a
test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 159.06 | 0.08 | n/a
test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 215.58 | 0.06 | n/a
test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] | ✅ 5/5 | 211.90 | 0.07 | n/a
iron/operators/dequant
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 200.26 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 187.60 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 188.20 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 183.26 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 188.96 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 179.78 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] | ✅ 5/5 | 191.84 | 0.03 | n/a
test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] | ✅ 5/5 | 203.88 | 0.03 | n/a
iron/operators/elementwise_add
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 147.66 | 0.08 | n/a
test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 135.66 | 0.09 | n/a
test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 154.62 | 0.08 | n/a
test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 188.30 | 0.07 | n/a
iron/operators/elementwise_mul
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 127.14 | 0.10 | n/a
test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 136.94 | 0.09 | n/a
test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 167.92 | 0.08 | n/a
test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 181.98 | 0.07 | n/a
iron/operators/gelu
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 172.30 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 166.32 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 162.66 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 168.50 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 169.40 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 216.86 | 0.04 | n/a
test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 186.70 | 0.05 | n/a
test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 234.54 | 0.04 | n/a
iron/operators/gemm
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] | ✅ 5/5 | 2138.24 | 4.43 | 1742.07
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 264.60 | 0.88 | 37.53
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 309.54 | 0.76 | 32.52
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 48704.32 | 0.52 | 352.74
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 28666.92 | 0.88 | 599.31
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 7912.54 | 3.18 | 2171.30
test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 2310.78 | 3.52 | 921.87
test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 3731.56 | 0.34 | 18.52
test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] | ✅ 5/5 | 1370.12 | 5.10 | 1575.69
iron/operators/gemv
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.20 | 0.19
test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 12.84 | 12.84
test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 23.69 | 23.68
test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 39.99 | 39.96
test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] | ✅ 5/5 | n/a | 42.51 | 42.48
test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 12.97 | 12.96
test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 23.89 | 23.88
test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 37.06 | 37.04
test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 42.60 | 42.57
iron/operators/layer_norm
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 218.16 | 0.04 | n/a
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 184.30 | 0.04 | n/a
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 196.68 | 0.04 | n/a
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 236.96 | 0.04 | n/a
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 177.56 | 0.05 | n/a
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 181.46 | 0.05 | n/a
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 209.44 | 0.04 | n/a
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 222.66 | 0.04 | n/a
iron/operators/mem_copy
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 160.36 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] | ✅ 5/5 | 214.30 | 0.04 | n/a
test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 178.26 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 164.40 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 167.56 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 180.16 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] | ✅ 5/5 | 178.96 | 0.05 | n/a
test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 154.56 | 0.05 | n/a
iron/operators/mha
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] | ✅ 5/5 | 40900.74 | 0.21 | n/a
iron/operators/relu
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 146.16 | 0.06 | n/a
test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 161.00 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 165.26 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 205.74 | 0.04 | n/a
test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 195.68 | 0.04 | n/a
test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 169.78 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 176.68 | 0.05 | n/a
test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 199.88 | 0.04 | n/a
iron/operators/rms_norm
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 165.80 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 184.06 | 0.07 | n/a
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 168.06 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 177.50 | 0.06 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 170.30 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 165.08 | 0.06 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 189.40 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 186.72 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 178.30 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 181.86 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 243.76 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] | ✅ 5/5 | 242.96 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] | ✅ 5/5 | 190.20 | 0.05 | n/a
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] | ✅ 5/5 | 226.74 | 0.04 | n/a
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] | ✅ 5/5 | 244.92 | 0.03 | n/a
iron/operators/rope
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 168.46 | 0.59 | n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 227.52 | 0.45 | n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 215.84 | 0.49 | n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] | ✅ 5/5 | 203.46 | 0.49 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 178.26 | 0.42 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 190.30 | 0.40 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 171.40 | 0.44 | n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] | ✅ 5/5 | 216.92 | 0.34 | n/a
iron/operators/sigmoid
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 177.54 | 0.05 | n/a
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 167.08 | 0.05 | n/a
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 181.78 | 0.05 | n/a
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 201.76 | 0.04 | n/a
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 219.50 | 0.04 | n/a
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 192.80 | 0.04 | n/a
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 209.00 | 0.04 | n/a
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 228.22 | 0.04 | n/a
iron/operators/silu
Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean)
test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 152.98 | 0.06 | n/a
test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 156.52 | 0.05 | n/a
test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 164.40 | 0.05 | n/a
test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 181.86 | 0.05 | n/a
iron/operators/softmax

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 150.84 | 0.88 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 145.68 | 0.91 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 149.24 | 0.89 | n/a |

iron/operators/swiglu_decode

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 3909.66 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 3847.70 | 0.00 | n/a |

iron/operators/swiglu_prefill

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 10388.04 | 0.21 | n/a |

iron/operators/tanh

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 146.22 | 0.06 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 210.46 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 196.12 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 156.88 | 0.06 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 152.36 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 165.40 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 164.84 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 207.42 | 0.04 | n/a |

iron/operators/transpose

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1] | ✅ 5/5 | 196.00 | 2.75 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2] | ✅ 5/5 | 231.48 | 4.64 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1] | ✅ 5/5 | 168.72 | 3.19 | n/a |

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.08 (-26.46%) | 0.07 (-24.25%) | 0.07 (-23.40%) | 0.05 (-28.11%) | 0.01 (-23.74%) | 237.80 (+39.06%) | 190.76 (+32.25%) | 181.60 (+30.55%) | 161.60 (+36.03%) | 29.59 (+45.29%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.10 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 171.00 (n/a) | 144.24 (n/a) | 139.10 (n/a) | 118.80 (n/a) | 20.37 (n/a) |

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.10 (+13.51%) | 0.08 (+4.23%) | 0.07 (-7.45%) | 0.07 (+14.78%) | 0.01 (+23.86%) | 181.60 (-12.86%) | 159.06 (-3.72%) | 169.80 (+8.02%) | 124.70 (-11.94%) | 25.84 (-3.93%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 208.40 (n/a) | 165.20 (n/a) | 157.20 (n/a) | 141.60 (n/a) | 26.89 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.08 (-24.37%) | 0.06 (-20.06%) | 0.06 (-18.80%) | 0.04 (-27.09%) | 0.01 (-30.54%) | 315.70 (+37.14%) | 215.58 (+24.20%) | 204.40 (+23.13%) | 155.40 (+32.26%) | 60.49 (+28.15%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.10 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 230.20 (n/a) | 173.58 (n/a) | 166.00 (n/a) | 117.50 (n/a) | 47.20 (n/a) |

test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.13 (+54.85%) | 0.07 (+16.93%) | 0.06 (+7.53%) | 0.04 (-8.56%) | 0.03 (+130.34%) | 321.20 (+9.36%) | 211.90 (-5.00%) | 209.60 (-7.01%) | 96.80 (-35.42%) | 79.60 (+53.83%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.08 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 293.70 (n/a) | 223.06 (n/a) | 225.40 (n/a) | 149.90 (n/a) | 51.74 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.03 (-24.14%) | 0.03 (-15.62%) | 0.03 (-2.83%) | 0.02 (-24.24%) | 0.01 (-12.17%) | 258.80 (+32.04%) | 200.26 (+19.54%) | 177.70 (+2.95%) | 165.30 (+31.82%) | 41.20 (+57.08%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 196.00 (n/a) | 167.52 (n/a) | 172.60 (n/a) | 125.40 (n/a) | 26.23 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (-15.16%) | 0.03 (-18.19%) | 0.03 (-3.01%) | 0.02 (+1.41%) | 0.01 (-35.00%) | 230.50 (-1.41%) | 187.60 (+17.93%) | 174.10 (+3.08%) | 130.30 (+17.92%) | 42.60 (-16.57%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 233.80 (n/a) | 159.08 (n/a) | 168.90 (n/a) | 110.50 (n/a) | 51.06 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.03 (-27.42%) | 0.03 (-20.47%) | 0.03 (-13.23%) | 0.02 (-19.48%) | 0.00 (-49.37%) | 229.50 (+24.19%) | 188.20 (+23.73%) | 187.00 (+15.22%) | 163.90 (+37.85%) | 25.44 (-10.66%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 184.80 (n/a) | 152.10 (n/a) | 162.30 (n/a) | 118.90 (n/a) | 28.47 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (+13.04%) | 0.03 (-1.72%) | 0.03 (-1.46%) | 0.02 (-3.44%) | 0.01 (+37.96%) | 235.60 (+3.56%) | 183.26 (+3.63%) | 180.00 (+1.47%) | 122.90 (-11.52%) | 41.26 (+22.26%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 227.50 (n/a) | 176.84 (n/a) | 177.40 (n/a) | 138.90 (n/a) | 33.75 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (-17.58%) | 0.03 (-23.93%) | 0.03 (-29.58%) | 0.02 (-23.02%) | 0.00 (-11.38%) | 227.10 (+29.92%) | 188.96 (+31.92%) | 189.20 (+42.04%) | 145.20 (+21.30%) | 29.81 (+35.59%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 174.80 (n/a) | 143.24 (n/a) | 133.20 (n/a) | 119.70 (n/a) | 21.99 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (+7.14%) | 0.03 (-2.69%) | 0.03 (-14.66%) | 0.02 (-1.85%) | 0.01 (+26.74%) | 216.50 (+1.93%) | 179.78 (+3.85%) | 186.20 (+17.18%) | 140.30 (-6.65%) | 34.07 (+21.56%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 212.40 (n/a) | 173.12 (n/a) | 158.90 (n/a) | 150.30 (n/a) | 28.03 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.03 (-32.54%) | 0.03 (-7.94%) | 0.03 (+4.41%) | 0.02 (+3.29%) | 0.00 (-78.03%) | 215.30 (-3.15%) | 191.84 (+3.76%) | 187.10 (-4.25%) | 183.20 (+48.22%) | 13.26 (-68.63%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 222.30 (n/a) | 184.88 (n/a) | 195.40 (n/a) | 123.60 (n/a) | 42.28 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (+20.46%) | 0.03 (+5.66%) | 0.02 (-8.20%) | 0.02 (+6.02%) | 0.01 (+91.03%) | 244.70 (-5.67%) | 203.88 (-2.53%) | 225.10 (+8.95%) | 148.60 (-16.98%) | 46.21 (+48.77%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 259.40 (n/a) | 209.18 (n/a) | 206.60 (n/a) | 179.00 (n/a) | 31.06 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.09 (n/a) | 0.08 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 173.00 (n/a) | 147.66 (n/a) | 143.50 (n/a) | 136.40 (n/a) | 15.00 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.11 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 164.00 (n/a) | 135.66 (n/a) | 136.00 (n/a) | 112.90 (n/a) | 19.54 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 191.80 (n/a) | 154.62 (n/a) | 146.00 (n/a) | 130.30 (n/a) | 24.39 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 256.80 (n/a) | 188.30 (n/a) | 177.50 (n/a) | 151.40 (n/a) | 40.46 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.12 (n/a) | 0.10 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.01 (n/a) | 138.70 (n/a) | 127.14 (n/a) | 132.60 (n/a) | 102.70 (n/a) | 14.34 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.11 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 173.60 (n/a) | 136.94 (n/a) | 131.00 (n/a) | 114.70 (n/a) | 22.78 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.09 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 221.40 (n/a) | 167.92 (n/a) | 165.50 (n/a) | 137.30 (n/a) | 34.24 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 283.20 (n/a) | 181.98 (n/a) | 168.60 (n/a) | 120.40 (n/a) | 60.60 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 216.40 (n/a) | 172.30 (n/a) | 181.00 (n/a) | 139.10 (n/a) | 31.86 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 182.80 (n/a) | 166.32 (n/a) | 176.50 (n/a) | 142.60 (n/a) | 18.92 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 210.80 (n/a) | 162.66 (n/a) | 157.70 (n/a) | 115.60 (n/a) | 39.08 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 180.80 (n/a) | 168.50 (n/a) | 176.20 (n/a) | 141.50 (n/a) | 15.95 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 207.70 (n/a) | 169.40 (n/a) | 168.20 (n/a) | 121.20 (n/a) | 32.02 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 313.40 (n/a) | 216.86 (n/a) | 215.40 (n/a) | 166.90 (n/a) | 59.30 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.08 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 226.70 (n/a) | 186.70 (n/a) | 195.80 (n/a) | 108.50 (n/a) | 48.04 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 257.00 (n/a) | 234.54 (n/a) | 231.50 (n/a) | 222.00 (n/a) | 13.35 (n/a) |

iron/operators/gemm

test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 4.92 (-1.50%) | 4.43 (+4.35%) | 4.34 (+7.36%) | 3.97 (+0.25%) | 0.41 (-3.28%) | 2366.10 (-0.24%) | 2138.24 (-4.20%) | 2167.10 (-6.85%) | 1912.90 (+1.52%) | 197.30 (-1.60%) | 1933.90 (-1.50%) | 1742.07 (+4.35%) | 1707.06 (+7.36%) | 1563.51 (+0.25%) | 162.33 (-3.28%) |
| 4d4b803 — 2026-06-22 17:54:57 | 4.99 (n/a) | 4.24 (n/a) | 4.04 (n/a) | 3.96 (n/a) | 0.43 (n/a) | 2371.90 (n/a) | 2232.02 (n/a) | 2326.50 (n/a) | 1884.20 (n/a) | 200.51 (n/a) | 1963.40 (n/a) | 1669.47 (n/a) | 1590.09 (n/a) | 1559.66 (n/a) | 167.85 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 1.22 (-21.27%) | 0.88 (-5.38%) | 0.85 (+10.39%) | 0.66 (+2.68%) | 0.23 (-39.42%) | 334.60 (-2.59%) | 264.60 (-0.06%) | 260.10 (-9.40%) | 180.70 (+26.99%) | 64.03 (-24.24%) | 52.22 (-21.27%) | 37.53 (-5.38%) | 36.28 (+10.39%) | 28.21 (+2.68%) | 9.81 (-39.42%) |
| 4d4b803 — 2026-06-22 17:54:57 | 1.55 (n/a) | 0.93 (n/a) | 0.77 (n/a) | 0.64 (n/a) | 0.38 (n/a) | 343.50 (n/a) | 264.76 (n/a) | 287.10 (n/a) | 142.30 (n/a) | 84.51 (n/a) | 66.33 (n/a) | 39.67 (n/a) | 32.87 (n/a) | 27.47 (n/a) | 16.19 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 1.21 (+21.74%) | 0.76 (-8.70%) | 0.68 (-22.65%) | 0.59 (-10.72%) | 0.25 (+98.33%) | 373.90 (+12.01%) | 309.54 (+14.58%) | 327.20 (+29.28%) | 183.50 (-17.86%) | 74.08 (+71.49%) | 51.43 (+21.74%) | 32.52 (-8.70%) | 28.84 (-22.65%) | 25.24 (-10.72%) | 10.73 (+98.33%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.99 (n/a) | 0.83 (n/a) | 0.87 (n/a) | 0.66 (n/a) | 0.13 (n/a) | 333.80 (n/a) | 270.16 (n/a) | 253.10 (n/a) | 223.40 (n/a) | 43.20 (n/a) | 42.24 (n/a) | 35.62 (n/a) | 37.28 (n/a) | 28.27 (n/a) | 5.41 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.52 (-0.67%) | 0.52 (-0.27%) | 0.52 (-0.02%) | 0.51 (-0.48%) | 0.00 (-24.88%) | 48905.00 (+0.48%) | 48704.32 (+0.28%) | 48638.50 (+0.02%) | 48599.90 (+0.67%) | 126.78 (-23.99%) | 353.50 (-0.67%) | 352.74 (-0.27%) | 353.22 (-0.02%) | 351.29 (-0.48%) | 0.92 (-24.88%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.00 (n/a) | 48670.60 (n/a) | 48570.74 (n/a) | 48627.00 (n/a) | 48274.70 (n/a) | 166.79 (n/a) | 355.88 (n/a) | 353.71 (n/a) | 353.30 (n/a) | 352.98 (n/a) | 1.22 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.89 (-0.44%) | 0.88 (-0.28%) | 0.88 (-0.16%) | 0.87 (+0.16%) | 0.01 (-22.32%) | 28887.00 (-0.16%) | 28666.92 (+0.28%) | 28646.80 (+0.16%) | 28399.20 (+0.44%) | 190.58 (-22.10%) | 604.94 (-0.44%) | 599.31 (-0.28%) | 599.71 (-0.16%) | 594.73 (+0.16%) | 3.99 (-22.32%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.89 (n/a) | 0.88 (n/a) | 0.88 (n/a) | 0.87 (n/a) | 0.01 (n/a) | 28932.90 (n/a) | 28586.36 (n/a) | 28600.70 (n/a) | 28274.00 (n/a) | 244.64 (n/a) | 607.62 (n/a) | 601.02 (n/a) | 600.68 (n/a) | 593.78 (n/a) | 5.14 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 3.21 (-2.03%) | 3.18 (+0.50%) | 3.18 (+0.64%) | 3.15 (+2.18%) | 0.02 (-67.33%) | 7983.80 (-2.14%) | 7912.54 (-0.53%) | 7915.40 (-0.64%) | 7849.50 (+2.08%) | 55.61 (-67.29%) | 2188.66 (-2.03%) | 2171.30 (+0.50%) | 2170.42 (+0.64%) | 2151.83 (+2.18%) | 15.26 (-67.33%) |
| 4d4b803 — 2026-06-22 17:54:57 | 3.27 (n/a) | 3.16 (n/a) | 3.16 (n/a) | 3.08 (n/a) | 0.07 (n/a) | 8158.20 (n/a) | 7954.42 (n/a) | 7966.10 (n/a) | 7689.80 (n/a) | 169.99 (n/a) | 2234.13 (n/a) | 2160.59 (n/a) | 2156.62 (n/a) | 2105.84 (n/a) | 46.71 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 3.67 (-15.48%) | 3.52 (-7.36%) | 3.65 (-1.32%) | 2.94 (-7.40%) | 0.32 (-31.67%) | 2741.80 (+8.00%) | 2310.78 (+7.41%) | 2206.20 (+1.34%) | 2193.90 (+18.31%) | 241.07 (-11.60%) | 963.57 (-15.48%) | 921.87 (-7.36%) | 958.18 (-1.32%) | 771.01 (-7.40%) | 84.40 (-31.67%) |
| 4d4b803 — 2026-06-22 17:54:57 | 4.35 (n/a) | 3.79 (n/a) | 3.70 (n/a) | 3.18 (n/a) | 0.47 (n/a) | 2538.80 (n/a) | 2151.30 (n/a) | 2177.10 (n/a) | 1854.30 (n/a) | 272.71 (n/a) | 1140.03 (n/a) | 995.10 (n/a) | 970.97 (n/a) | 832.66 (n/a) | 123.51 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.44 (+24.10%) | 0.34 (+8.72%) | 0.36 (+9.71%) | 0.28 (+1.46%) | 0.07 (+108.77%) | 4508.00 (-1.44%) | 3731.56 (-6.06%) | 3486.00 (-8.85%) | 2824.60 (-19.42%) | 703.27 (+69.03%) | 23.76 (+24.10%) | 18.52 (+8.72%) | 19.25 (+9.71%) | 14.89 (+1.46%) | 3.61 (+108.77%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.36 (n/a) | 0.32 (n/a) | 0.33 (n/a) | 0.27 (n/a) | 0.03 (n/a) | 4573.70 (n/a) | 3972.46 (n/a) | 3824.60 (n/a) | 3505.50 (n/a) | 416.07 (n/a) | 19.14 (n/a) | 17.04 (n/a) | 17.55 (n/a) | 14.67 (n/a) | 1.73 (n/a) |

test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 6.31 (-3.94%) | 5.10 (+11.39%) | 4.86 (+8.38%) | 3.44 (-4.02%) | 1.19 (-0.93%) | 1935.70 (+4.19%) | 1370.12 (-10.07%) | 1369.80 (-7.73%) | 1054.10 (+4.10%) | 357.77 (+4.75%) | 1949.72 (-3.94%) | 1575.69 (+11.39%) | 1500.40 (+8.38%) | 1061.76 (-4.02%) | 368.96 (-0.93%) |
| 4d4b803 — 2026-06-22 17:54:57 | 6.57 (n/a) | 4.58 (n/a) | 4.48 (n/a) | 3.58 (n/a) | 1.21 (n/a) | 1857.80 (n/a) | 1523.60 (n/a) | 1484.60 (n/a) | 1012.60 (n/a) | 341.54 (n/a) | 2029.68 (n/a) | 1414.56 (n/a) | 1384.37 (n/a) | 1106.25 (n/a) | 372.43 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.29 (+34.73%) | 0.20 (+1.02%) | 0.18 (-7.98%) | 0.14 (-7.34%) | 0.06 (+142.92%) | 0.29 (+34.73%) | 0.19 (+1.02%) | 0.18 (-7.98%) | 0.14 (-7.34%) | 0.06 (+142.92%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.21 (n/a) | 0.19 (n/a) | 0.20 (n/a) | 0.15 (n/a) | 0.02 (n/a) | 0.21 (n/a) | 0.19 (n/a) | 0.20 (n/a) | 0.15 (n/a) | 0.02 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 13.74 (+2.58%) | 12.84 (-2.92%) | 13.32 (+0.76%) | 10.64 (-18.54%) | 1.26 (+824.82%) | 13.74 (+2.58%) | 12.84 (-2.92%) | 13.32 (+0.76%) | 10.63 (-18.54%) | 1.26 (+824.82%) |
| 4d4b803 — 2026-06-22 17:54:57 | 13.40 (n/a) | 13.23 (n/a) | 13.22 (n/a) | 13.06 (n/a) | 0.14 (n/a) | 13.39 (n/a) | 13.22 (n/a) | 13.21 (n/a) | 13.05 (n/a) | 0.14 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 24.62 (-2.84%) | 23.69 (-4.99%) | 24.19 (-3.24%) | 21.12 (-13.63%) | 1.45 (+296.10%) | 24.61 (-2.84%) | 23.68 (-4.99%) | 24.17 (-3.24%) | 21.10 (-13.63%) | 1.45 (+296.10%) |
| 4d4b803 — 2026-06-22 17:54:57 | 25.34 (n/a) | 24.94 (n/a) | 25.00 (n/a) | 24.45 (n/a) | 0.37 (n/a) | 25.33 (n/a) | 24.92 (n/a) | 24.98 (n/a) | 24.43 (n/a) | 0.37 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 41.87 (-6.39%) | 39.99 (-4.04%) | 39.93 (-4.32%) | 38.02 (-2.47%) | 1.38 (-33.01%) | 41.85 (-6.39%) | 39.96 (-4.04%) | 39.91 (-4.32%) | 38.00 (-2.47%) | 1.38 (-33.01%) |
| 4d4b803 — 2026-06-22 17:54:57 | 44.73 (n/a) | 41.67 (n/a) | 41.73 (n/a) | 38.99 (n/a) | 2.06 (n/a) | 44.70 (n/a) | 41.65 (n/a) | 41.71 (n/a) | 38.96 (n/a) | 2.06 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 43.78 (-1.40%) | 42.51 (-0.51%) | 42.52 (-0.49%) | 41.13 (+1.72%) | 1.15 (-23.11%) | 43.76 (-1.40%) | 42.48 (-0.51%) | 42.50 (-0.49%) | 41.11 (+1.72%) | 1.15 (-23.11%) |
| 4d4b803 — 2026-06-22 17:54:57 | 44.41 (n/a) | 42.73 (n/a) | 42.73 (n/a) | 40.44 (n/a) | 1.49 (n/a) | 44.38 (n/a) | 42.70 (n/a) | 42.71 (n/a) | 40.41 (n/a) | 1.49 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 13.37 (+0.55%) | 12.97 (-0.29%) | 13.22 (-0.18%) | 11.93 (-1.82%) | 0.60 (+23.10%) | 13.36 (+0.55%) | 12.96 (-0.29%) | 13.21 (-0.18%) | 11.92 (-1.82%) | 0.60 (+23.10%) |
| 4d4b803 — 2026-06-22 17:54:57 | 13.29 (n/a) | 13.01 (n/a) | 13.24 (n/a) | 12.15 (n/a) | 0.48 (n/a) | 13.28 (n/a) | 13.00 (n/a) | 13.23 (n/a) | 12.14 (n/a) | 0.48 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 24.46 (-1.07%) | 23.89 (-1.94%) | 23.89 (-1.74%) | 22.99 (-4.85%) | 0.61 (+165.77%) | 24.45 (-1.07%) | 23.88 (-1.94%) | 23.88 (-1.74%) | 22.98 (-4.85%) | 0.60 (+165.77%) |
| 4d4b803 — 2026-06-22 17:54:57 | 24.73 (n/a) | 24.37 (n/a) | 24.32 (n/a) | 24.16 (n/a) | 0.23 (n/a) | 24.71 (n/a) | 24.35 (n/a) | 24.30 (n/a) | 24.15 (n/a) | 0.23 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 39.36 (-7.37%) | 37.06 (-8.29%) | 37.96 (-4.62%) | 31.32 (-19.44%) | 3.30 (+132.76%) | 39.34 (-7.37%) | 37.04 (-8.29%) | 37.94 (-4.62%) | 31.30 (-19.44%) | 3.30 (+132.76%) |
| 4d4b803 — 2026-06-22 17:54:57 | 42.49 (n/a) | 40.41 (n/a) | 39.80 (n/a) | 38.87 (n/a) | 1.42 (n/a) | 42.47 (n/a) | 40.39 (n/a) | 39.77 (n/a) | 38.85 (n/a) | 1.42 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 45.40 (+3.64%) | 42.60 (+1.57%) | 42.83 (-1.94%) | 40.05 (+6.05%) | 1.96 (-26.62%) | 45.37 (+3.64%) | 42.57 (+1.57%) | 42.80 (-1.94%) | 40.03 (+6.05%) | 1.96 (-26.62%) |
| 4d4b803 — 2026-06-22 17:54:57 | 43.81 (n/a) | 41.94 (n/a) | 43.68 (n/a) | 37.77 (n/a) | 2.67 (n/a) | 43.78 (n/a) | 41.91 (n/a) | 43.65 (n/a) | 37.75 (n/a) | 2.67 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 368.00 (n/a) | 218.16 (n/a) | 174.30 (n/a) | 156.20 (n/a) | 87.45 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 196.60 (n/a) | 184.30 (n/a) | 186.30 (n/a) | 167.30 (n/a) | 11.06 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 229.20 (n/a) | 196.68 (n/a) | 183.50 (n/a) | 174.70 (n/a) | 23.18 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 405.90 (n/a) | 236.96 (n/a) | 188.70 (n/a) | 139.50 (n/a) | 105.88 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 201.90 (n/a) | 177.56 (n/a) | 184.80 (n/a) | 143.50 (n/a) | 23.13 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 208.80 (n/a) | 181.46 (n/a) | 178.70 (n/a) | 147.50 (n/a) | 23.19 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 264.50 (n/a) | 209.44 (n/a) | 178.60 (n/a) | 167.50 (n/a) | 48.42 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 309.20 (n/a) | 222.66 (n/a) | 205.60 (n/a) | 187.40 (n/a) | 49.04 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (-3.75%) | 0.05 (+0.36%) | 0.05 (+3.18%) | 0.04 (+4.77%) | 0.01 (-13.40%) | 189.10 (-4.54%) | 160.36 (-1.20%) | 159.50 (-3.10%) | 118.60 (+3.94%) | 27.97 (-12.54%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 198.10 (n/a) | 162.30 (n/a) | 164.60 (n/a) | 114.10 (n/a) | 31.98 (n/a) |

test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (-7.45%) | 0.04 (+22.73%) | 0.04 (+30.90%) | 0.03 (+34.27%) | 0.01 (-43.88%) | 238.00 (-25.53%) | 214.30 (-22.01%) | 233.00 (-23.61%) | 178.60 (+8.05%) | 29.64 (-53.00%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 319.60 (n/a) | 274.78 (n/a) | 305.00 (n/a) | 165.30 (n/a) | 63.06 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (+16.10%) | 0.05 (-2.06%) | 0.05 (+4.33%) | 0.03 (-39.37%) | 0.02 (+153.86%) | 309.70 (+64.91%) | 178.26 (+12.28%) | 149.60 (-4.10%) | 116.90 (-13.92%) | 75.68 (+287.64%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 187.80 (n/a) | 158.76 (n/a) | 156.00 (n/a) | 135.80 (n/a) | 19.52 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (-1.63%) | 0.05 (-6.26%) | 0.05 (+1.54%) | 0.03 (-26.87%) | 0.01 (+18.90%) | 246.60 (+36.77%) | 164.40 (+9.82%) | 155.20 (-1.52%) | 118.20 (+1.63%) | 48.48 (+75.90%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 180.30 (n/a) | 149.70 (n/a) | 157.60 (n/a) | 116.30 (n/a) | 27.56 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (-27.07%) | 0.05 (-5.70%) | 0.05 (+8.09%) | 0.04 (-7.43%) | 0.01 (-53.70%) | 202.00 (+8.02%) | 167.56 (+3.65%) | 156.10 (-7.47%) | 151.60 (+37.19%) | 21.21 (-28.46%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 187.00 (n/a) | 161.66 (n/a) | 168.70 (n/a) | 110.50 (n/a) | 29.64 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (-4.69%) | 0.05 (-6.06%) | 0.04 (+1.90%) | 0.04 (+5.25%) | 0.01 (-21.34%) | 223.20 (-4.98%) | 180.16 (+3.95%) | 184.80 (-1.86%) | 124.60 (+4.88%) | 39.92 (-18.38%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 234.90 (n/a) | 173.32 (n/a) | 188.30 (n/a) | 118.80 (n/a) | 48.92 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (+0.95%) | 0.05 (+9.70%) | 0.05 (+10.26%) | 0.04 (+3.11%) | 0.01 (-6.04%) | 225.70 (-3.01%) | 178.96 (-9.20%) | 175.80 (-9.29%) | 140.70 (-0.92%) | 32.50 (-8.15%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 232.70 (n/a) | 197.10 (n/a) | 193.80 (n/a) | 142.00 (n/a) | 35.38 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (-6.18%) | 0.05 (+6.03%) | 0.05 (+13.60%) | 0.05 (+20.83%) | 0.00 (-59.14%) | 173.50 (-17.22%) | 154.56 (-8.01%) | 154.60 (-12.01%) | 143.40 (+6.62%) | 11.97 (-62.56%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.60 (n/a) | 168.02 (n/a) | 175.70 (n/a) | 134.50 (n/a) | 31.97 (n/a) |

iron/operators/mha

test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.21 (-0.19%) | 0.21 (-0.14%) | 0.21 (-0.16%) | 0.20 (-0.14%) | 0.00 (-12.29%) | 40939.70 (+0.14%) | 40900.74 (+0.14%) | 40910.20 (+0.16%) | 40869.40 (+0.19%) | 30.68 (-12.01%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.00 (n/a) | 40882.90 (n/a) | 40842.88 (n/a) | 40843.80 (n/a) | 40791.90 (n/a) | 34.87 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (-2.59%) | 0.05 (+3.29%) | 0.05 (-1.24%) | 0.04 (+18.34%) | 0.01 (-27.19%) | 188.30 (-15.52%) | 165.80 (-5.19%) | 173.60 (+1.28%) | 128.90 (+2.63%) | 23.56 (-37.52%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 222.90 (n/a) | 174.88 (n/a) | 171.40 (n/a) | 125.60 (n/a) | 37.70 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.09 (-13.01%) | 0.07 (-17.64%) | 0.06 (-22.99%) | 0.06 (-15.29%) | 0.01 (+2.35%) | 205.80 (+18.07%) | 184.06 (+22.14%) | 197.60 (+29.83%) | 140.70 (+14.95%) | 26.79 (+39.95%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.10 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 174.30 (n/a) | 150.70 (n/a) | 152.20 (n/a) | 122.40 (n/a) | 19.14 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (-15.17%) | 0.05 (-16.79%) | 0.05 (-23.22%) | 0.04 (-7.57%) | 0.01 (-28.99%) | 203.00 (+8.21%) | 168.06 (+18.24%) | 164.40 (+30.27%) | 128.40 (+17.91%) | 32.85 (-7.82%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 187.60 (n/a) | 142.14 (n/a) | 126.20 (n/a) | 108.90 (n/a) | 35.64 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (-2.22%) | 0.06 (+6.15%) | 0.06 (+5.53%) | 0.05 (+30.94%) | 0.01 (-34.36%) | 219.00 (-23.64%) | 177.50 (-8.97%) | 174.70 (-5.21%) | 148.00 (+2.28%) | 26.83 (-50.57%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 286.80 (n/a) | 195.00 (n/a) | 184.30 (n/a) | 144.70 (n/a) | 54.29 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (-7.73%) | 0.05 (-7.48%) | 0.05 (-6.75%) | 0.04 (+1.77%) | 0.01 (-26.53%) | 199.80 (-1.77%) | 170.30 (+6.65%) | 173.20 (+7.24%) | 137.10 (+8.38%) | 26.44 (-19.09%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 203.40 (n/a) | 159.68 (n/a) | 161.50 (n/a) | 126.50 (n/a) | 32.68 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.08 (-10.63%) | 0.06 (+8.87%) | 0.06 (-3.55%) | 0.05 (+50.33%) | 0.01 (-36.66%) | 193.30 (-33.46%) | 165.08 (-13.42%) | 180.50 (+3.68%) | 133.40 (+11.91%) | 29.02 (-54.76%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.09 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 290.50 (n/a) | 190.66 (n/a) | 174.10 (n/a) | 119.20 (n/a) | 64.15 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (-5.28%) | 0.05 (-12.81%) | 0.04 (-15.42%) | 0.03 (-9.61%) | 0.01 (+2.99%) | 241.10 (+10.65%) | 189.40 (+15.59%) | 197.30 (+18.21%) | 126.10 (+5.61%) | 42.54 (+16.35%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 217.90 (n/a) | 163.86 (n/a) | 166.90 (n/a) | 119.40 (n/a) | 36.56 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (+12.83%) | 0.05 (-9.51%) | 0.05 (-22.11%) | 0.04 (-18.19%) | 0.01 (+189.30%) | 225.00 (+22.22%) | 186.72 (+14.86%) | 202.60 (+28.39%) | 133.10 (-11.38%) | 41.20 (+213.58%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 184.10 (n/a) | 162.56 (n/a) | 157.80 (n/a) | 150.20 (n/a) | 13.14 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (-5.67%) | 0.05 (-11.74%) | 0.05 (-11.58%) | 0.03 (-10.43%) | 0.01 (+0.84%) | 237.10 (+11.63%) | 178.30 (+13.87%) | 173.40 (+13.11%) | 127.90 (+5.97%) | 39.21 (+15.40%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 212.40 (n/a) | 156.58 (n/a) | 153.30 (n/a) | 120.70 (n/a) | 33.98 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.08 (+3.47%) | 0.05 (-3.67%) | 0.05 (-14.64%) | 0.04 (-19.92%) | 0.02 (+59.63%) | 237.90 (+24.88%) | 181.86 (+8.65%) | 203.70 (+17.14%) | 120.00 (-3.30%) | 49.83 (+96.97%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 190.50 (n/a) | 167.38 (n/a) | 173.90 (n/a) | 124.10 (n/a) | 25.30 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.06 (+25.76%) | 0.04 (-14.47%) | 0.03 (-36.07%) | 0.02 (-42.71%) | 0.02 (+601.88%) | 341.40 (+74.54%) | 243.76 (+31.71%) | 281.70 (+56.41%) | 141.10 (-20.51%) | 87.40 (+848.69%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 195.60 (n/a) | 185.08 (n/a) | 180.10 (n/a) | 177.50 (n/a) | 9.21 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (-11.48%) | 0.04 (-8.16%) | 0.04 (-2.57%) | 0.03 (-9.52%) | 0.01 (-14.01%) | 321.90 (+10.50%) | 242.96 (+8.45%) | 215.30 (+2.62%) | 174.10 (+12.98%) | 60.56 (+7.40%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 291.30 (n/a) | 224.02 (n/a) | 209.80 (n/a) | 154.10 (n/a) | 56.38 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.07 (+13.21%) | 0.05 (-3.69%) | 0.05 (-5.44%) | 0.02 (-46.42%) | 0.02 (+238.79%) | 328.30 (+86.64%) | 190.20 (+17.84%) | 173.80 (+5.78%) | 121.70 (-11.68%) | 84.99 (+431.23%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 175.90 (n/a) | 161.40 (n/a) | 164.30 (n/a) | 137.80 (n/a) | 16.00 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.05 (-17.63%) | 0.04 (-18.21%) | 0.04 (-13.61%) | 0.03 (-26.90%) | 0.01 (-7.18%) | 301.60 (+36.78%) | 226.74 (+23.32%) | 213.10 (+15.75%) | 188.70 (+21.35%) | 44.95 (+60.36%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 220.50 (n/a) | 183.86 (n/a) | 184.10 (n/a) | 155.50 (n/a) | 28.03 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.04 (-16.20%) | 0.03 (-11.78%) | 0.03 (-9.80%) | 0.02 (-14.26%) | 0.01 (-17.67%) | 337.20 (+16.64%) | 244.92 (+13.27%) | 238.60 (+10.87%) | 191.80 (+19.35%) | 55.06 (+17.59%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 289.10 (n/a) | 216.22 (n/a) | 215.20 (n/a) | 160.70 (n/a) | 46.82 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.74 (+6.63%) | 0.59 (+6.98%) | 0.55 (+5.70%) | 0.53 (+21.38%) | 0.09 (-21.95%) | 184.30 (-17.61%) | 168.46 (-8.12%) | 179.20 (-5.39%) | 132.00 (-6.25%) | 21.96 (-39.44%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.70 (n/a) | 0.55 (n/a) | 0.52 (n/a) | 0.44 (n/a) | 0.11 (n/a) | 223.70 (n/a) | 183.34 (n/a) | 189.40 (n/a) | 140.80 (n/a) | 36.26 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.55 (-14.59%) | 0.45 (-16.38%) | 0.45 (-19.05%) | 0.30 (-31.69%) | 0.10 (+20.99%) | 323.70 (+46.40%) | 227.52 (+22.72%) | 216.40 (+23.52%) | 180.00 (+17.11%) | 57.68 (+106.89%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.64 (n/a) | 0.54 (n/a) | 0.56 (n/a) | 0.44 (n/a) | 0.08 (n/a) | 221.10 (n/a) | 185.40 (n/a) | 175.20 (n/a) | 153.70 (n/a) | 27.88 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.69 (+11.37%) | 0.49 (-5.24%) | 0.50 (-0.76%) | 0.29 (-34.79%) | 0.14 (+108.42%) | 340.70 (+53.33%) | 215.84 (+12.58%) | 196.80 (+0.77%) | 143.50 (-10.20%) | 73.84 (+204.49%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.62 (n/a) | 0.52 (n/a) | 0.50 (n/a) | 0.44 (n/a) | 0.07 (n/a) | 222.20 (n/a) | 191.72 (n/a) | 195.30 (n/a) | 159.80 (n/a) | 24.25 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.54 (-6.94%) | 0.49 (-0.73%) | 0.48 (+4.03%) | 0.43 (+1.17%) | 0.04 (-34.13%) | 228.60 (-1.12%) | 203.46 (+0.14%) | 202.80 (-3.89%) | 183.70 (+7.49%) | 16.84 (-29.21%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.58 (n/a) | 0.49 (n/a) | 0.47 (n/a) | 0.43 (n/a) | 0.06 (n/a) | 231.20 (n/a) | 203.18 (n/a) | 211.00 (n/a) | 170.90 (n/a) | 23.79 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.46 (-24.04%) | 0.42 (-17.54%) | 0.43 (-13.25%) | 0.36 (-3.53%) | 0.05 (-51.16%) | 205.00 (+3.64%) | 178.26 (+18.76%) | 171.20 (+15.29%) | 158.90 (+31.65%) | 20.72 (-32.97%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.61 (n/a) | 0.51 (n/a) | 0.50 (n/a) | 0.37 (n/a) | 0.10 (n/a) | 197.80 (n/a) | 150.10 (n/a) | 148.50 (n/a) | 120.70 (n/a) | 30.91 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.48 (-16.08%) | 0.40 (-16.60%) | 0.36 (-16.78%) | 0.33 (-14.67%) | 0.06 (-25.54%) | 224.10 (+17.21%) | 190.30 (+19.24%) | 202.70 (+20.15%) | 153.00 (+19.16%) | 29.58 (+5.50%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.57 (n/a) | 0.47 (n/a) | 0.44 (n/a) | 0.39 (n/a) | 0.09 (n/a) | 191.20 (n/a) | 159.60 (n/a) | 168.70 (n/a) | 128.40 (n/a) | 28.04 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.53 (+0.90%) | 0.44 (+1.45%) | 0.42 (-0.01%) | 0.34 (-4.70%) | 0.07 (+1.36%) | 214.30 (+4.95%) | 171.40 (-1.32%) | 175.70 (+0.00%) | 140.00 (-0.92%) | 28.87 (+4.77%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.52 (n/a) | 0.43 (n/a) | 0.42 (n/a) | 0.36 (n/a) | 0.07 (n/a) | 204.20 (n/a) | 173.70 (n/a) | 175.70 (n/a) | 141.30 (n/a) | 27.56 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.43 (-5.81%) | 0.34 (-10.96%) | 0.32 (-8.54%) | 0.32 (-5.38%) | 0.05 (-19.10%) | 231.80 (+5.65%) | 216.92 (+11.81%) | 227.50 (+9.32%) | 173.50 (+6.18%) | 24.49 (-9.35%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.45 (n/a) | 0.39 (n/a) | 0.35 (n/a) | 0.34 (n/a) | 0.06 (n/a) | 219.40 (n/a) | 194.00 (n/a) | 208.10 (n/a) | 163.40 (n/a) | 27.02 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.93 (-0.66%) | 0.88 (+7.58%) | 0.92 (+10.56%) | 0.74 (+8.89%) | 0.08 (-28.05%) | 176.60 (-8.16%) | 150.84 (-7.79%) | 142.70 (-9.51%) | 140.60 (+0.64%) | 15.18 (-33.65%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.94 (n/a) | 0.81 (n/a) | 0.83 (n/a) | 0.68 (n/a) | 0.11 (n/a) | 192.30 (n/a) | 163.58 (n/a) | 157.70 (n/a) | 139.70 (n/a) | 22.87 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.99 (-10.99%) | 0.91 (+16.35%) | 0.94 (+38.93%) | 0.77 (+21.71%) | 0.09 (-56.41%) | 170.50 (-17.83%) | 145.68 (-17.02%) | 140.10 (-28.01%) | 131.80 (+12.36%) | 15.12 (-58.57%) |
| 4d4b803 — 2026-06-22 17:54:57 | 1.12 (n/a) | 0.78 (n/a) | 0.67 (n/a) | 0.63 (n/a) | 0.20 (n/a) | 207.50 (n/a) | 175.56 (n/a) | 194.60 (n/a) | 117.30 (n/a) | 36.50 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 1.02 (-11.95%) | 0.89 (+5.64%) | 0.94 (+11.59%) | 0.70 (+16.18%) | 0.12 (-43.85%) | 187.30 (-13.92%) | 149.24 (-8.68%) | 139.20 (-10.42%) | 128.20 (+13.55%) | 23.05 (-44.40%) |
| 4d4b803 — 2026-06-22 17:54:57 | 1.16 (n/a) | 0.85 (n/a) | 0.84 (n/a) | 0.60 (n/a) | 0.22 (n/a) | 217.60 (n/a) | 163.42 (n/a) | 155.40 (n/a) | 112.90 (n/a) | 41.45 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.00 (+9.09%) | 0.00 (+13.04%) | 0.00 (+22.22%) | 0.00 (+0.00%) | 0.00 (+38.44%) | 4820.23 (-6.73%) | 3909.66 (-11.78%) | 3656.34 (-18.62%) | 3499.10 (-2.17%) | 543.92 (-5.12%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 5168.09 (n/a) | 4431.69 (n/a) | 4492.92 (n/a) | 3576.64 (n/a) | 573.28 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.00 (+4.55%) | 0.00 (+7.00%) | 0.00 (+10.00%) | 0.00 (-5.56%) | 0.00 (+58.75%) | 4779.92 (+5.01%) | 3847.70 (-7.30%) | 3652.20 (-11.41%) | 3551.82 (-6.70%) | 522.83 (+67.88%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4552.05 (n/a) | 4150.54 (n/a) | 4122.63 (n/a) | 3806.88 (n/a) | 311.43 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 0.28 (+0.36%) | 0.21 (-12.91%) | 0.18 (-27.42%) | 0.17 (-10.54%) | 0.05 (+52.52%) | 12359.61 (+11.79%) | 10388.04 (+17.49%) | 11701.55 (+37.75%) | 7610.78 (-0.34%) | 2145.53 (+65.29%) |
| 4d4b803 — 2026-06-22 17:54:57 | 0.27 (n/a) | 0.24 (n/a) | 0.25 (n/a) | 0.19 (n/a) | 0.03 (n/a) | 11055.95 (n/a) | 8841.86 (n/a) | 8494.64 (n/a) | 7636.48 (n/a) | 1298.01 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 3.54 (n/a) | 2.75 (n/a) | 2.53 (n/a) | 2.17 (n/a) | 0.54 (n/a) | 241.80 (n/a) | 196.00 (n/a) | 207.10 (n/a) | 148.10 (n/a) | 36.30 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 6.06 (n/a) | 4.64 (n/a) | 4.45 (n/a) | 3.80 (n/a) | 0.86 (n/a) | 275.90 (n/a) | 231.48 (n/a) | 235.60 (n/a) | 173.00 (n/a) | 38.08 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4d4b803 — 2026-06-22 17:54:57 | 3.09 (-11.56%) | 2.69 (-4.25%) | 2.86 (+8.05%) | 2.07 (-8.43%) | 0.40 (-19.42%) | 253.90 (+9.20%) | 198.64 (+4.00%) | 183.40 (-7.47%) | 169.60 (+13.07%) | 33.34 (+2.59%) |
| 5503a95 — 2026-05-12 00:06:19 | 3.50 (n/a) | 2.81 (n/a) | 2.65 (n/a) | 2.26 (n/a) | 0.49 (n/a) | 232.50 (n/a) | 191.00 (n/a) | 198.20 (n/a) | 150.00 (n/a) | 32.50 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:58:44 | 3.94 (n/a) | 3.19 (n/a) | 3.02 (n/a) | 2.48 (n/a) | 0.58 (n/a) | 211.10 (n/a) | 168.72 (n/a) | 173.40 (n/a) | 133.00 (n/a) | 30.72 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4d4b803 — 2026-06-22 17:54:57 | 4.04 (+2.23%) | 2.89 (-19.61%) | 2.72 (-29.77%) | 2.00 (-30.01%) | 0.93 (+99.22%) | 262.60 (+42.87%) | 197.38 (+33.17%) | 192.60 (+42.35%) | 129.90 (-2.18%) | 62.14 (+186.82%) |
| 5503a95 — 2026-05-12 00:06:19 | 3.95 (n/a) | 3.59 (n/a) | 3.88 (n/a) | 2.85 (n/a) | 0.47 (n/a) | 183.80 (n/a) | 148.22 (n/a) | 135.30 (n/a) | 132.80 (n/a) | 21.66 (n/a) |

Krackan - Examples

IRON

Tested on 2026_06_23_15_46_42 at commit b6ae95b.

iron/applications/llama_3.2_1b

| Test | Checks | TTFT (mean) | TPS (mean) |
| --- | --- | --- | --- |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] | ✅ 5/5 | 2.13 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] | ✅ 5/5 | 2.15 | 4.16 |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] | ✅ 5/5 | 2.09 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] | ✅ 5/5 | 2.08 | 4.17 |

Trends:

IRON Trends

iron/applications/llama_3.2_1b

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:40:56 | 2.14 (-0.47%) | 2.13 (+0.31%) | 2.13 (+0.23%) | 2.12 (+0.52%) | 0.01 (-53.44%) |
| 4d4b803 — 2026-06-22 18:03:47 | 2.15 (n/a) | 2.12 (n/a) | 2.13 (n/a) | 2.10 (n/a) | 0.02 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:40:56 | 4.17 (-0.33%) | 4.16 (-0.10%) | 4.16 (-0.14%) | 4.14 (+0.05%) | 0.01 (-51.52%) | 2.25 (-0.35%) | 2.15 (-0.47%) | 2.12 (-0.61%) | 2.12 (-0.19%) | 0.06 (+0.37%) |
| 4d4b803 — 2026-06-22 18:03:47 | 4.19 (n/a) | 4.17 (n/a) | 4.17 (n/a) | 4.14 (n/a) | 0.02 (n/a) | 2.26 (n/a) | 2.16 (n/a) | 2.13 (n/a) | 2.12 (n/a) | 0.06 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:40:56 | 2.11 (+0.52%) | 2.09 (+0.29%) | 2.08 (+0.14%) | 2.08 (+0.68%) | 0.01 (-3.40%) |
| 4d4b803 — 2026-06-22 18:03:47 | 2.10 (n/a) | 2.08 (n/a) | 2.08 (n/a) | 2.06 (n/a) | 0.02 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:40:56 | 4.17 (-0.10%) | 4.17 (+0.11%) | 4.17 (+0.24%) | 4.16 (+0.34%) | 0.00 (-65.02%) | 2.10 (+0.19%) | 2.08 (-0.61%) | 2.08 (-0.67%) | 2.05 (-0.87%) | 0.02 (+44.83%) |
| 4d4b803 — 2026-06-22 18:03:47 | 4.18 (n/a) | 4.16 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.10 (n/a) | 2.09 (n/a) | 2.10 (n/a) | 2.06 (n/a) | 0.01 (n/a) |

Phoenix - Small

IRON

Tested on 2026_06_23_15_34_34 at commit b6ae95b.

iron/operators/axpy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 402.42 | 0.03 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 625.22 | 0.02 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 526.36 | 0.02 | n/a |

iron/operators/dequant

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 303.22 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 306.60 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 352.40 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 287.38 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 377.26 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 376.78 | 0.01 | n/a |

iron/operators/elementwise_add

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 406.34 | 0.03 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 405.96 | 0.04 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 490.66 | 0.03 | n/a |

iron/operators/elementwise_mul

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 429.56 | 0.03 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 723.16 | 0.03 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 498.92 | 0.03 | n/a |

iron/operators/gelu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 431.78 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 466.34 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 384.24 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 401.10 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 779.34 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 481.78 | 0.02 | n/a |

iron/operators/gemm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 620.36 | 0.39 | 16.74 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 470.86 | 0.49 | 21.01 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 84281.54 | 0.30 | 203.96 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 24699.94 | 1.02 | 695.68 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 3552.52 | 2.38 | 623.85 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 5233.86 | 0.24 | 13.17 |

iron/operators/gemv

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.09 | 0.08 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 3.73 | 3.73 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 6.08 | 6.08 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 7.96 | 7.95 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 3.57 | 3.57 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 6.76 | 6.75 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 8.64 | 8.64 |

iron/operators/layer_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 321.84 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 369.36 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 306.14 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 498.64 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 384.26 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 431.64 | 0.02 | n/a |

iron/operators/mem_copy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 449.82 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 405.10 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 379.08 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 788.08 | 0.01 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 462.90 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 447.14 | 0.02 | n/a |

iron/operators/relu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 381.04 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 295.32 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 325.48 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 332.98 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 393.18 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 541.28 | 0.02 | n/a |

iron/operators/rms_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 386.40 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 338.74 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 450.62 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 464.78 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 406.86 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 405.24 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 532.08 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 741.50 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 842.34 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 516.66 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 530.48 | 0.02 | n/a |

iron/operators/rope
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]✅ 5/5367.660.31n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]✅ 5/5318.600.32n/a
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]✅ 5/5436.420.24n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]✅ 5/5371.640.21n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]✅ 5/5426.660.19n/a
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]✅ 5/5399.580.21n/a
iron/operators/sigmoid
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]✅ 5/5322.760.03n/a
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]✅ 5/5420.160.02n/a
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]✅ 5/5389.060.02n/a
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]✅ 5/5412.300.02n/a
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]✅ 5/5590.600.02n/a
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]✅ 5/5465.320.02n/a
iron/operators/silu
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]✅ 5/5334.760.03n/a
test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]✅ 5/5366.200.03n/a
test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]✅ 5/5837.500.02n/a
iron/operators/softmax
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]✅ 5/5420.220.36n/a
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]✅ 5/5391.140.37n/a
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]✅ 5/5366.460.40n/a
iron/operators/swiglu_decode
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]✅ 5/518569.880.00n/a
test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]✅ 5/511372.470.00n/a
iron/operators/swiglu_prefill
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]✅ 5/518550.860.11n/a
iron/operators/tanh
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]✅ 5/5681.580.02n/a
test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]✅ 5/5294.840.03n/a
test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]✅ 5/5308.960.03n/a
test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]✅ 5/5292.720.03n/a
test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]✅ 5/5777.320.02n/a
test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]✅ 5/5297.460.03n/a
iron/operators/transpose
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]✅ 5/5485.921.23n/a
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]✅ 5/5482.902.22n/a
test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]✅ 5/5407.261.38n/a

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (-20.81%) | 0.03 (-16.56%) | 0.03 (-41.04%) | 0.03 (+9.55%) | 0.01 (-48.52%) | 479.10 (-8.73%) | 402.42 (+10.29%) | 451.00 (+69.61%) | 311.70 (+26.30%) | 81.21 (-44.26%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 524.90 (n/a) | 364.88 (n/a) | 265.90 (n/a) | 246.80 (n/a) | 145.71 (n/a) |

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (-52.01%) | 0.02 (-39.76%) | 0.02 (-19.35%) | 0.01 (-49.44%) | 0.01 (-59.74%) | 1046.70 (+97.79%) | 625.22 (+57.84%) | 592.50 (+23.98%) | 413.20 (+108.37%) | 249.75 (+65.57%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.06 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 529.20 (n/a) | 396.12 (n/a) | 477.90 (n/a) | 198.30 (n/a) | 150.84 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (-28.46%) | 0.02 (-25.61%) | 0.02 (-16.58%) | 0.02 (+7.08%) | 0.01 (-47.72%) | 631.70 (-6.62%) | 526.36 (+21.51%) | 555.20 (+19.89%) | 315.20 (+39.78%) | 122.24 (-33.45%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 676.50 (n/a) | 433.18 (n/a) | 463.10 (n/a) | 225.50 (n/a) | 183.68 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (+11.01%) | 0.02 (+9.08%) | 0.02 (+9.54%) | 0.01 (-11.26%) | 0.00 (+50.04%) | 496.70 (+12.68%) | 303.22 (-4.50%) | 262.40 (-8.70%) | 238.00 (-9.92%) | 108.90 (+53.49%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 440.80 (n/a) | 317.50 (n/a) | 287.40 (n/a) | 264.20 (n/a) | 70.95 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (+17.57%) | 0.02 (+13.13%) | 0.02 (+20.17%) | 0.01 (-12.45%) | 0.01 (+45.94%) | 510.80 (+14.22%) | 306.60 (-7.09%) | 237.20 (-16.80%) | 222.30 (-14.96%) | 122.40 (+43.38%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 447.20 (n/a) | 329.98 (n/a) | 285.10 (n/a) | 261.40 (n/a) | 85.37 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (+5.38%) | 0.02 (+13.80%) | 0.01 (+17.76%) | 0.01 (+29.64%) | 0.00 (-13.81%) | 446.10 (-22.86%) | 352.40 (-16.69%) | 372.60 (-15.09%) | 248.40 (-5.12%) | 99.15 (-36.60%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 578.30 (n/a) | 423.02 (n/a) | 438.80 (n/a) | 261.80 (n/a) | 156.39 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (+119.94%) | 0.02 (+114.86%) | 0.02 (+87.50%) | 0.01 (+277.89%) | 0.01 (+83.80%) | 544.40 (-73.54%) | 287.38 (-63.02%) | 251.90 (-46.67%) | 151.00 (-54.53%) | 150.74 (-79.12%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 2057.20 (n/a) | 777.02 (n/a) | 472.30 (n/a) | 332.10 (n/a) | 721.88 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (-1.57%) | 0.02 (-16.31%) | 0.01 (-38.27%) | 0.01 (+33.43%) | 0.00 (-12.43%) | 473.90 (-25.05%) | 377.26 (+13.14%) | 440.50 (+62.01%) | 248.00 (+1.60%) | 107.88 (-35.64%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 632.30 (n/a) | 333.44 (n/a) | 271.90 (n/a) | 244.10 (n/a) | 167.63 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (+8.91%) | 0.01 (+4.81%) | 0.02 (+17.37%) | 0.01 (+17.47%) | 0.00 (+24.54%) | 512.40 (-14.87%) | 376.78 (-3.36%) | 304.50 (-14.80%) | 280.60 (-8.21%) | 118.10 (-3.31%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 601.90 (n/a) | 389.90 (n/a) | 357.40 (n/a) | 305.70 (n/a) | 122.15 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (n/a) | 0.03 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 562.00 (n/a) | 406.34 (n/a) | 338.60 (n/a) | 315.00 (n/a) | 114.95 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.07 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 505.60 (n/a) | 405.96 (n/a) | 454.80 (n/a) | 168.80 (n/a) | 135.48 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 596.40 (n/a) | 490.66 (n/a) | 511.90 (n/a) | 288.00 (n/a) | 122.85 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (n/a) | 0.03 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 613.10 (n/a) | 429.56 (n/a) | 345.70 (n/a) | 300.80 (n/a) | 148.60 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.02 (n/a) | 1937.90 (n/a) | 723.16 (n/a) | 480.40 (n/a) | 262.00 (n/a) | 691.43 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 646.60 (n/a) | 498.92 (n/a) | 470.90 (n/a) | 338.40 (n/a) | 117.63 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 557.90 (n/a) | 431.78 (n/a) | 471.10 (n/a) | 241.00 (n/a) | 132.28 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 797.60 (n/a) | 466.34 (n/a) | 400.10 (n/a) | 254.00 (n/a) | 227.41 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 582.60 (n/a) | 384.24 (n/a) | 292.40 (n/a) | 271.30 (n/a) | 146.65 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 571.30 (n/a) | 401.10 (n/a) | 451.20 (n/a) | 234.10 (n/a) | 144.00 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1913.00 (n/a) | 779.34 (n/a) | 594.80 (n/a) | 274.40 (n/a) | 648.19 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 591.50 (n/a) | 481.78 (n/a) | 508.90 (n/a) | 344.50 (n/a) | 95.22 (n/a) |

iron/operators/gemm

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.61 (-7.90%) | 0.39 (-15.10%) | 0.34 (-25.25%) | 0.25 (-23.96%) | 0.14 (+14.61%) | 881.60 (+31.52%) | 620.36 (+23.01%) | 656.30 (+33.80%) | 363.50 (+8.60%) | 201.93 (+63.68%) | 25.97 (-7.90%) | 16.74 (-15.10%) | 14.38 (-25.25%) | 10.70 (-23.96%) | 6.07 (+14.61%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.66 (n/a) | 0.46 (n/a) | 0.45 (n/a) | 0.33 (n/a) | 0.12 (n/a) | 670.30 (n/a) | 504.30 (n/a) | 490.50 (n/a) | 334.70 (n/a) | 123.36 (n/a) | 28.19 (n/a) | 19.72 (n/a) | 19.24 (n/a) | 14.08 (n/a) | 5.29 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.66 (+63.26%) | 0.49 (+37.11%) | 0.48 (+35.23%) | 0.35 (+9.51%) | 0.12 (+280.29%) | 634.80 (-8.68%) | 470.86 (-24.02%) | 459.40 (-26.06%) | 335.00 (-38.76%) | 114.39 (+113.37%) | 28.17 (+63.26%) | 21.01 (+37.11%) | 20.54 (+35.23%) | 14.87 (+9.51%) | 5.07 (+280.29%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.40 (n/a) | 0.36 (n/a) | 0.36 (n/a) | 0.32 (n/a) | 0.03 (n/a) | 695.10 (n/a) | 619.70 (n/a) | 621.30 (n/a) | 547.00 (n/a) | 53.61 (n/a) | 17.25 (n/a) | 15.32 (n/a) | 15.19 (n/a) | 13.58 (n/a) | 1.33 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.31 (-2.40%) | 0.30 (-2.65%) | 0.30 (-2.25%) | 0.29 (-5.29%) | 0.01 (+42.96%) | 88277.60 (+5.59%) | 84281.54 (+2.75%) | 83792.60 (+2.31%) | 82436.90 (+2.46%) | 2337.82 (+55.11%) | 208.40 (-2.40%) | 203.96 (-2.65%) | 205.03 (-2.25%) | 194.61 (-5.29%) | 5.50 (+42.96%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.31 (n/a) | 0.31 (n/a) | 0.31 (n/a) | 0.30 (n/a) | 0.01 (n/a) | 83603.40 (n/a) | 82023.64 (n/a) | 81903.40 (n/a) | 80454.70 (n/a) | 1507.16 (n/a) | 213.53 (n/a) | 209.51 (n/a) | 209.76 (n/a) | 205.49 (n/a) | 3.85 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 1.04 (+1.15%) | 1.02 (-0.11%) | 1.02 (-0.09%) | 1.00 (-0.66%) | 0.02 (+69.02%) | 25224.90 (+0.67%) | 24699.94 (+0.12%) | 24649.10 (+0.09%) | 24160.00 (-1.13%) | 387.09 (+67.79%) | 711.09 (+1.15%) | 695.68 (-0.11%) | 696.98 (-0.09%) | 681.07 (-0.66%) | 10.91 (+69.01%) |
| 4d4b803 — 2026-06-22 18:12:43 | 1.03 (n/a) | 1.02 (n/a) | 1.02 (n/a) | 1.00 (n/a) | 0.01 (n/a) | 25057.80 (n/a) | 24670.26 (n/a) | 24626.20 (n/a) | 24437.30 (n/a) | 230.71 (n/a) | 703.02 (n/a) | 696.43 (n/a) | 697.63 (n/a) | 685.61 (n/a) | 6.46 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 3.37 (-3.35%) | 2.38 (+16.66%) | 2.31 (+37.22%) | 1.75 (+31.33%) | 0.61 (-27.68%) | 4615.70 (-23.86%) | 3552.52 (-19.14%) | 3496.40 (-27.13%) | 2394.80 (+3.46%) | 817.97 (-40.84%) | 882.73 (-3.35%) | 623.85 (+16.66%) | 604.59 (+37.22%) | 457.99 (+31.33%) | 160.17 (-27.68%) |
| 4d4b803 — 2026-06-22 18:12:43 | 3.48 (n/a) | 2.04 (n/a) | 1.68 (n/a) | 1.33 (n/a) | 0.84 (n/a) | 6061.80 (n/a) | 4393.58 (n/a) | 4798.00 (n/a) | 2314.70 (n/a) | 1382.73 (n/a) | 913.28 (n/a) | 534.74 (n/a) | 440.59 (n/a) | 348.73 (n/a) | 221.49 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.32 (+0.35%) | 0.24 (+6.41%) | 0.22 (+2.77%) | 0.21 (+16.57%) | 0.05 (-15.83%) | 5996.90 (-14.22%) | 5233.86 (-7.51%) | 5711.40 (-2.69%) | 3846.90 (-0.35%) | 879.54 (-27.14%) | 17.45 (+0.35%) | 13.17 (+6.41%) | 11.75 (+2.77%) | 11.19 (+16.57%) | 2.58 (-15.83%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.32 (n/a) | 0.23 (n/a) | 0.21 (n/a) | 0.18 (n/a) | 0.06 (n/a) | 6990.80 (n/a) | 5658.88 (n/a) | 5869.50 (n/a) | 3860.40 (n/a) | 1207.19 (n/a) | 17.38 (n/a) | 12.37 (n/a) | 11.43 (n/a) | 9.60 (n/a) | 3.06 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.13 (+0.20%) | 0.09 (-3.55%) | 0.11 (+25.11%) | 0.02 (-70.25%) | 0.05 (+72.42%) | 0.13 (+0.20%) | 0.08 (-3.55%) | 0.11 (+25.11%) | 0.02 (-70.25%) | 0.05 (+72.42%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.13 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.06 (n/a) | 0.03 (n/a) | 0.13 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.06 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 3.86 (-1.20%) | 3.73 (+3.42%) | 3.72 (-0.58%) | 3.58 (+8.11%) | 0.11 (-61.18%) | 3.86 (-1.20%) | 3.73 (+3.42%) | 3.72 (-0.58%) | 3.58 (+8.11%) | 0.11 (-61.18%) |
| 4d4b803 — 2026-06-22 18:12:43 | 3.91 (n/a) | 3.61 (n/a) | 3.75 (n/a) | 3.31 (n/a) | 0.27 (n/a) | 3.91 (n/a) | 3.61 (n/a) | 3.74 (n/a) | 3.31 (n/a) | 0.27 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 6.71 (-7.76%) | 6.08 (-4.06%) | 5.81 (-7.85%) | 5.54 (-2.72%) | 0.54 (-7.48%) | 6.70 (-7.76%) | 6.08 (-4.06%) | 5.81 (-7.85%) | 5.54 (-2.72%) | 0.54 (-7.48%) |
| 4d4b803 — 2026-06-22 18:12:43 | 7.27 (n/a) | 6.34 (n/a) | 6.31 (n/a) | 5.70 (n/a) | 0.58 (n/a) | 7.27 (n/a) | 6.34 (n/a) | 6.30 (n/a) | 5.70 (n/a) | 0.58 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 9.16 (-3.90%) | 7.96 (-5.66%) | 7.86 (-8.21%) | 7.02 (-3.29%) | 0.94 (-4.60%) | 9.16 (-3.90%) | 7.95 (-5.66%) | 7.85 (-8.21%) | 7.01 (-3.29%) | 0.94 (-4.60%) |
| 4d4b803 — 2026-06-22 18:12:43 | 9.53 (n/a) | 8.44 (n/a) | 8.56 (n/a) | 7.26 (n/a) | 0.98 (n/a) | 9.53 (n/a) | 8.43 (n/a) | 8.55 (n/a) | 7.25 (n/a) | 0.98 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 3.73 (-4.80%) | 3.57 (-4.99%) | 3.68 (-2.46%) | 3.25 (-10.75%) | 0.21 (+96.93%) | 3.73 (-4.80%) | 3.57 (-4.99%) | 3.68 (-2.46%) | 3.25 (-10.75%) | 0.21 (+96.93%) |
| 4d4b803 — 2026-06-22 18:12:43 | 3.92 (n/a) | 3.76 (n/a) | 3.77 (n/a) | 3.64 (n/a) | 0.11 (n/a) | 3.92 (n/a) | 3.76 (n/a) | 3.77 (n/a) | 3.64 (n/a) | 0.11 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 7.12 (+2.95%) | 6.76 (+8.15%) | 6.93 (+14.09%) | 5.78 (+2.14%) | 0.55 (-8.23%) | 7.12 (+2.95%) | 6.75 (+8.15%) | 6.92 (+14.09%) | 5.78 (+2.14%) | 0.55 (-8.23%) |
| 4d4b803 — 2026-06-22 18:12:43 | 6.92 (n/a) | 6.25 (n/a) | 6.07 (n/a) | 5.66 (n/a) | 0.60 (n/a) | 6.91 (n/a) | 6.24 (n/a) | 6.07 (n/a) | 5.66 (n/a) | 0.60 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 10.32 (-26.87%) | 8.64 (-20.59%) | 8.41 (-27.57%) | 7.73 (+4.60%) | 1.01 (-69.72%) | 10.32 (-26.87%) | 8.64 (-20.59%) | 8.41 (-27.57%) | 7.72 (+4.60%) | 1.00 (-69.72%) |
| 4d4b803 — 2026-06-22 18:12:43 | 14.12 (n/a) | 10.88 (n/a) | 11.61 (n/a) | 7.39 (n/a) | 3.32 (n/a) | 14.11 (n/a) | 10.88 (n/a) | 11.60 (n/a) | 7.38 (n/a) | 3.32 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 464.50 (n/a) | 321.84 (n/a) | 291.40 (n/a) | 255.70 (n/a) | 84.43 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 578.10 (n/a) | 369.36 (n/a) | 306.40 (n/a) | 227.90 (n/a) | 140.44 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 479.70 (n/a) | 306.14 (n/a) | 263.00 (n/a) | 241.50 (n/a) | 99.19 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 607.00 (n/a) | 498.64 (n/a) | 530.00 (n/a) | 261.60 (n/a) | 136.71 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 530.60 (n/a) | 384.26 (n/a) | 380.60 (n/a) | 278.10 (n/a) | 107.92 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 590.60 (n/a) | 431.64 (n/a) | 485.80 (n/a) | 228.60 (n/a) | 139.03 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (+10.46%) | 0.02 (+7.88%) | 0.02 (+10.28%) | 0.01 (-6.63%) | 0.01 (+14.26%) | 633.90 (+7.10%) | 449.82 (-5.94%) | 485.20 (-9.33%) | 233.50 (-9.46%) | 147.35 (+3.75%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 591.90 (n/a) | 478.24 (n/a) | 535.10 (n/a) | 257.90 (n/a) | 142.02 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (-6.50%) | 0.02 (+8.13%) | 0.03 (+43.81%) | 0.01 (-8.73%) | 0.01 (+12.63%) | 572.70 (+9.59%) | 405.10 (-4.82%) | 324.20 (-30.47%) | 291.00 (+6.95%) | 136.41 (+35.22%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 522.60 (n/a) | 425.62 (n/a) | 466.30 (n/a) | 272.10 (n/a) | 100.88 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (+20.09%) | 0.02 (+10.21%) | 0.02 (-4.20%) | 0.02 (-0.95%) | 0.01 (+47.38%) | 509.00 (+0.95%) | 379.08 (-4.94%) | 433.20 (+4.39%) | 237.60 (-16.72%) | 130.22 (+19.87%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 504.20 (n/a) | 398.76 (n/a) | 415.00 (n/a) | 285.30 (n/a) | 108.64 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (-38.28%) | 0.01 (-31.25%) | 0.01 (-14.84%) | 0.01 (-45.91%) | 0.01 (-31.50%) | 1283.20 (+84.90%) | 788.08 (+53.32%) | 614.30 (+17.43%) | 437.70 (+62.05%) | 354.98 (+127.78%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 694.00 (n/a) | 514.00 (n/a) | 523.10 (n/a) | 270.10 (n/a) | 155.85 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (-21.15%) | 0.02 (-33.07%) | 0.02 (-48.05%) | 0.01 (-5.14%) | 0.01 (-27.18%) | 588.70 (+5.41%) | 462.90 (+44.96%) | 486.60 (+92.48%) | 290.10 (+26.85%) | 131.31 (-4.07%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 558.50 (n/a) | 319.32 (n/a) | 252.80 (n/a) | 228.70 (n/a) | 136.88 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (+44.13%) | 0.02 (+33.90%) | 0.02 (+19.25%) | 0.01 (+73.44%) | 0.01 (+27.91%) | 572.40 (-42.34%) | 447.14 (-28.19%) | 480.80 (-16.13%) | 264.50 (-30.61%) | 114.60 (-52.08%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 992.80 (n/a) | 622.66 (n/a) | 573.30 (n/a) | 381.20 (n/a) | 239.16 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (+5.53%) | 0.02 (+25.16%) | 0.02 (+37.38%) | 0.02 (+72.59%) | 0.01 (-11.47%) | 536.80 (-42.06%) | 386.40 (-27.95%) | 391.20 (-27.21%) | 238.50 (-5.24%) | 130.45 (-50.40%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 926.50 (n/a) | 536.26 (n/a) | 537.40 (n/a) | 251.70 (n/a) | 263.01 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.05 (-7.60%) | 0.04 (+15.46%) | 0.04 (+29.83%) | 0.02 (+53.46%) | 0.01 (-29.68%) | 527.20 (-34.83%) | 338.74 (-23.20%) | 277.70 (-22.97%) | 252.30 (+8.24%) | 114.92 (-50.73%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 809.00 (n/a) | 441.06 (n/a) | 360.50 (n/a) | 233.10 (n/a) | 233.25 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (+8.13%) | 0.02 (+8.04%) | 0.02 (+3.53%) | 0.01 (+34.53%) | 0.01 (+4.06%) | 575.20 (-25.67%) | 450.62 (-9.80%) | 480.20 (-3.42%) | 236.40 (-7.51%) | 132.14 (-28.65%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 773.80 (n/a) | 499.60 (n/a) | 497.20 (n/a) | 255.60 (n/a) | 185.21 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (-8.61%) | 0.02 (+9.94%) | 0.02 (-3.42%) | 0.02 (+220.48%) | 0.01 (-39.62%) | 577.70 (-68.80%) | 464.78 (-35.69%) | 495.30 (+3.53%) | 287.00 (+9.42%) | 114.05 (-82.20%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1851.60 (n/a) | 722.74 (n/a) | 478.40 (n/a) | 262.30 (n/a) | 640.71 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (-0.74%) | 0.02 (-1.78%) | 0.02 (-24.72%) | 0.02 (+15.17%) | 0.01 (-9.26%) | 523.00 (-13.17%) | 406.86 (-1.40%) | 440.50 (+32.84%) | 277.70 (+0.76%) | 118.42 (-24.01%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 602.30 (n/a) | 412.64 (n/a) | 331.60 (n/a) | 275.60 (n/a) | 155.83 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.04 (+15.75%) | 0.03 (+3.48%) | 0.02 (-7.24%) | 0.02 (+10.07%) | 0.01 (+25.04%) | 505.30 (-9.15%) | 405.24 (-1.45%) | 477.80 (+7.81%) | 233.20 (-13.60%) | 120.77 (+2.91%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 556.20 (n/a) | 411.20 (n/a) | 443.20 (n/a) | 269.90 (n/a) | 117.35 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (-43.03%) | 0.02 (-34.01%) | 0.02 (-35.06%) | 0.01 (-17.45%) | 0.00 (-57.41%) | 755.70 (+21.13%) | 532.08 (+40.96%) | 470.70 (+53.97%) | 402.40 (+75.49%) | 141.20 (-10.31%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 623.90 (n/a) | 377.46 (n/a) | 305.70 (n/a) | 229.30 (n/a) | 157.43 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (+46.42%) | 0.02 (+12.97%) | 0.02 (+14.90%) | 0.00 (-66.57%) | 0.01 (+603.08%) | 1929.70 (+199.13%) | 741.50 (+28.05%) | 505.70 (-12.96%) | 359.70 (-31.71%) | 668.70 (+1429.33%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 645.10 (n/a) | 579.08 (n/a) | 581.00 (n/a) | 526.70 (n/a) | 43.73 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.03 (-18.72%) | 0.02 (-13.46%) | 0.02 (+12.07%) | 0.00 (-5.23%) | 0.01 (-24.45%) | 2502.90 (+5.51%) | 842.34 (+6.66%) | 446.30 (-10.78%) | 271.70 (+23.05%) | 934.27 (+4.37%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2372.10 (n/a) | 789.76 (n/a) | 500.20 (n/a) | 220.80 (n/a) | 895.19 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.05 (+34.85%) | 0.02 (-30.82%) | 0.02 (-55.57%) | 0.01 (-34.46%) | 0.02 (+144.04%) | 657.10 (+52.57%) | 516.66 (+76.26%) | 599.00 (+125.10%) | 176.80 (-25.84%) | 194.59 (+148.31%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 430.70 (n/a) | 293.12 (n/a) | 266.10 (n/a) | 238.40 (n/a) | 78.37 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.02 (-38.08%) | 0.02 (-12.91%) | 0.02 (-2.84%) | 0.01 (-17.84%) | 0.00 (-51.80%) | 792.80 (+21.73%) | 530.48 (+7.85%) | 511.80 (+2.94%) | 381.20 (+61.53%) | 161.76 (+1.56%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 651.30 (n/a) | 491.86 (n/a) | 497.20 (n/a) | 236.00 (n/a) | 159.27 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.47 (+10.90%) | 0.31 (+20.26%) | 0.31 (+38.86%) | 0.16 (+26.58%) | 0.12 (-1.28%) | 613.60 (-21.00%) | 367.66 (-20.90%) | 316.60 (-27.98%) | 208.80 (-9.84%) | 160.06 (-27.51%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.42 (n/a) | 0.26 (n/a) | 0.22 (n/a) | 0.13 (n/a) | 0.12 (n/a) | 776.70 (n/a) | 464.78 (n/a) | 439.60 (n/a) | 231.60 (n/a) | 220.81 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.38 (+33.52%) | 0.32 (+76.86%) | 0.29 (+49.67%) | 0.26 (+414.07%) | 0.06 (-35.83%) | 378.90 (-80.55%) | 318.60 (-59.18%) | 336.00 (-33.19%) | 256.70 (-25.12%) | 53.52 (-91.89%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.29 (n/a) | 0.18 (n/a) | 0.20 (n/a) | 0.05 (n/a) | 0.09 (n/a) | 1947.60 (n/a) | 780.52 (n/a) | 502.90 (n/a) | 342.80 (n/a) | 660.10 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.30 (+11.95%) | 0.24 (+27.45%) | 0.25 (+43.16%) | 0.15 (-1.77%) | 0.06 (+30.54%) | 653.60 (+1.81%) | 436.42 (-19.72%) | 398.00 (-30.14%) | 328.30 (-10.67%) | 131.08 (+26.57%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.27 (n/a) | 0.19 (n/a) | 0.17 (n/a) | 0.15 (n/a) | 0.05 (n/a) | 642.00 (n/a) | 543.60 (n/a) | 569.70 (n/a) | 367.50 (n/a) | 103.56 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.24 (+27.80%) | 0.21 (+25.59%) | 0.23 (+34.27%) | 0.14 (-0.13%) | 0.04 (+121.90%) | 514.30 (+0.12%) | 371.64 (-17.87%) | 321.30 (-25.52%) | 303.10 (-21.76%) | 91.74 (+66.33%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.19 (n/a) | 0.16 (n/a) | 0.17 (n/a) | 0.14 (n/a) | 0.02 (n/a) | 513.70 (n/a) | 452.50 (n/a) | 431.40 (n/a) | 387.40 (n/a) | 55.16 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.31 (+21.36%) | 0.19 (+32.47%) | 0.14 (+8.62%) | 0.13 (+255.18%) | 0.08 (-3.47%) | 548.40 (-71.85%) | 426.66 (-44.09%) | 513.20 (-7.93%) | 237.30 (-17.60%) | 144.19 (-78.67%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.26 (n/a) | 0.15 (n/a) | 0.13 (n/a) | 0.04 (n/a) | 0.08 (n/a) | 1947.80 (n/a) | 763.12 (n/a) | 557.40 (n/a) | 288.00 (n/a) | 675.98 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.30 (+27.69%) | 0.21 (+17.59%) | 0.16 (+13.95%) | 0.14 (+5.79%) | 0.08 (+61.25%) | 532.60 (-5.48%) | 399.58 (-9.98%) | 447.20 (-12.24%) | 242.40 (-21.68%) | 143.19 (+20.94%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.24 (n/a) | 0.18 (n/a) | 0.14 (n/a) | 0.13 (n/a) | 0.05 (n/a) | 563.50 (n/a) | 443.88 (n/a) | 509.60 (n/a) | 309.50 (n/a) | 118.40 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.56 (+44.44%) | 0.36 (+50.96%) | 0.27 (+3.90%) | 0.22 (+234.49%) | 0.16 (+11.01%) | 605.70 (-70.10%) | 420.22 (-51.24%) | 482.50 (-3.75%) | 232.50 (-30.76%) | 163.92 (-77.12%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.39 (n/a) | 0.24 (n/a) | 0.26 (n/a) | 0.06 (n/a) | 0.14 (n/a) | 2025.90 (n/a) | 861.80 (n/a) | 501.30 (n/a) | 335.80 (n/a) | 716.33 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.50 (+14.50%) | 0.37 (+28.05%) | 0.42 (+52.64%) | 0.23 (+20.97%) | 0.13 (+32.44%) | 569.40 (-17.35%) | 391.14 (-19.82%) | 309.30 (-34.50%) | 262.10 (-12.66%) | 147.04 (-1.05%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.44 (n/a) | 0.29 (n/a) | 0.28 (n/a) | 0.19 (n/a) | 0.09 (n/a) | 688.90 (n/a) | 487.82 (n/a) | 472.20 (n/a) | 300.10 (n/a) | 148.59 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.55 (+8.77%) | 0.40 (+31.76%) | 0.47 (+84.63%) | 0.24 (+48.79%) | 0.14 (-4.15%) | 553.80 (-32.79%) | 366.46 (-28.86%) | 280.30 (-45.83%) | 236.70 (-8.08%) | 142.85 (-38.01%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.51 (n/a) | 0.30 (n/a) | 0.25 (n/a) | 0.16 (n/a) | 0.15 (n/a) | 824.00 (n/a) | 515.14 (n/a) | 517.40 (n/a) | 257.50 (n/a) | 230.46 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.00 (-62.50%) | 0.00 (-50.00%) | 0.00 (-33.33%) | 0.00 (+0.00%) | 0.00 (-82.18%) | 21039.33 (+18.32%) | 18569.88 (+58.37%) | 19211.09 (+39.46%) | 14176.83 (+173.88%) | 2845.63 (-46.10%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 17782.11 (n/a) | 11725.34 (n/a) | 13775.42 (n/a) | 5176.26 (n/a) | 5279.68 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.00 (-7.14%) | 0.00 (-14.29%) | 0.00 (-41.67%) | 0.00 (+0.00%) | 0.00 (-12.97%) | 18741.55 (-1.35%) | 11372.47 (+11.29%) | 11734.88 (+67.32%) | 6356.35 (+6.20%) | 5143.09 (-8.92%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 18997.36 (n/a) | 10218.74 (n/a) | 7013.58 (n/a) | 5985.05 (n/a) | 5646.66 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 0.13 (+2.10%) | 0.11 (+26.09%) | 0.11 (+38.86%) | 0.10 (+27.11%) | 0.01 (-29.56%) | 21403.02 (-21.31%) | 18550.86 (-22.35%) | 18685.66 (-28.00%) | 16014.65 (-2.06%) | 2417.42 (-44.96%) |
| 4d4b803 — 2026-06-22 18:12:43 | 0.13 (n/a) | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.02 (n/a) | 27199.35 (n/a) | 23890.15 (n/a) | 25953.98 (n/a) | 16350.74 (n/a) | 4391.76 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 1.85 (n/a) | 1.23 (n/a) | 0.99 (n/a) | 0.73 (n/a) | 0.51 (n/a) | 713.90 (n/a) | 485.92 (n/a) | 531.50 (n/a) | 283.20 (n/a) | 187.08 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 2.55 (n/a) | 2.22 (n/a) | 2.23 (n/a) | 1.66 (n/a) | 0.34 (n/a) | 630.20 (n/a) | 482.90 (n/a) | 469.70 (n/a) | 412.00 (n/a) | 86.16 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4d4b803 — 2026-06-22 18:12:43 | 1.09 (-32.64%) | 1.03 (-5.64%) | 1.04 (-5.85%) | 0.97 (+500.21%) | 0.05 (-91.52%) | 541.80 (-83.34%) | 512.04 (-48.01%) | 502.10 (+6.22%) | 481.70 (+48.44%) | 25.22 (-98.01%) |
| 5503a95 — 2026-05-11 23:50:48 | 1.62 (n/a) | 1.09 (n/a) | 1.11 (n/a) | 0.16 (n/a) | 0.59 (n/a) | 3251.70 (n/a) | 984.96 (n/a) | 472.70 (n/a) | 324.50 (n/a) | 1270.69 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b6ae95b — 2026-06-23 15:31:40 | 1.70 (n/a) | 1.38 (n/a) | 1.60 (n/a) | 0.88 (n/a) | 0.37 (n/a) | 595.60 (n/a) | 407.26 (n/a) | 328.60 (n/a) | 307.70 (n/a) | 127.48 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4d4b803 — 2026-06-22 18:12:43 | 2.09 (+42.62%) | 1.22 (+14.06%) | 1.04 (+8.33%) | 0.51 (-40.85%) | 0.59 (+147.17%) | 1018.50 (+69.07%) | 529.80 (+4.78%) | 504.00 (-7.69%) | 251.40 (-29.87%) | 294.28 (+206.40%) |
| 5503a95 — 2026-05-11 23:50:48 | 1.46 (n/a) | 1.07 (n/a) | 0.96 (n/a) | 0.87 (n/a) | 0.24 (n/a) | 602.40 (n/a) | 505.64 (n/a) | 546.00 (n/a) | 358.50 (n/a) | 96.04 (n/a) |

Phoenix - Examples

IRON

Tested on 2026_06_23_15_45_05 at commit b6ae95b.

Trends: [plot: IRON Trends]
