| Loop id | Source Location | Source Function | Level | Max Thread Time / Walltime icx_2 (%) | Exclusive Coverage icx_2 (%) | Inclusive Coverage icx_2 (%) | Max Exclusive Time Over Threads icx_2 (s) | Max Inclusive Time Over Threads icx_2 (s) | Exclusive Time w.r.t. Wall Time icx_2 (s) | Inclusive Time w.r.t. Wall Time icx_2 (s) | Nb Threads icx_2 | GFLOPS icx_2 | Vectorization Ratio (%) | Vector Length Use (%) | Speedup If No Scalar Integer | Speedup If FP Vectorized | Speedup If Fully Vectorized | Speedup If Perfect Load Balancing icx_2 | Stride 0 | Stride 1 | Stride n | Stride Unknown | Stride Indirect | Array Access Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 442 | libggml-cpu.so - mmq.cpp:1573-1597 [...] | ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::{lambda(int, int)#2}::operator()(int, int) const::{lambda()#1}::operator()() const | Single | 35.27 | 23.67 | 23.67 | 12.89 | 12.89 | 4.50 | 4.50 | 183 | 79.92 | NA | NA | NA | NA | NA | 2.79 | NA | NA | NA | NA | NA | 0.00 |
| 2344 | libggml-cpu.so - vec.h:491-497 | ggml_compute_forward_flash_attn_ext | Innermost | 0.52 | 0.12 | 0.12 | 0.19 | 0.19 | 0.02 | 0.02 | 34 | 1071.50 | NA | NA | NA | NA | NA | 1.5 | NA | NA | NA | NA | NA | 0.00 |
| 648 | libggml-cpu.so - mmq.cpp:303-1392 [...] | void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int,... | Innermost | 0.14 | 0.09 | 0.09 | 0.05 | 0.05 | 0.02 | 0.02 | 163 | 0.00 | 97.1 | 54.17 | 1 | 1 | 1.33 | 2.55 | 18 | 0 | 0 | 1 | 8 | 68.52 |
| 546 | libggml-cpu.so - mmq.cpp:520-2194 [...] | ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::{lambda(int, int)#3}::operator()(int, int) const::{lambda()#1}::operator()() const | Outermost | 0.11 | 0.08 | 0.09 | 0.04 | 0.04 | 0.01 | 0.02 | 150 | 333.04 | NA | NA | NA | NA | NA | 2.18 | NA | NA | NA | NA | NA | 0.00 |
| 1177 | libggml-cpu.so - vec.cpp:311-316 | ggml_vec_dot_f16 | Single | 0.22 | 0.04 | 0.04 | 0.08 | 0.08 | 0.01 | 0.01 | 32 | 1258.71 | NA | NA | NA | NA | NA | 1.74 | NA | NA | NA | NA | NA | 0.00 |
| 2001 | libggml-cpu.so - ops.cpp:6220-6245 [...] | ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) | Innermost | 0.11 | 0.03 | 0.03 | 0.04 | 0.04 | 0.01 | 0.01 | 192 | 407.77 | 15.56 | 9.17 | 1 | 1.74 | 5.33 | 6.68 | 1 | 1 | 0 | 0 | 0 | 100.00 |
| 2336 | libggml-cpu.so - ops.cpp:8759-8927 [...] | ggml_compute_forward_flash_attn_ext | InBetween | 0.16 | 0.02 | 0.14 | 0.06 | 0.20 | 0.00 | 0.03 | 36 | 3376.38 | NA | NA | NA | NA | NA | 2.56 | NA | NA | NA | NA | NA | 0.00 |
| 124 | libggml-cpu.so - ggml-cpu.c:533-2891 [...] | ggml_graph_compute_thread | Single | 0.07 | 0.02 | 0.02 | 0.03 | 0.03 | 0.00 | 0.00 | 111 | 13.44 | 0 | 9.82 | 1 | 1 | 14.62 | 3.38 | NA | NA | NA | NA | NA | 0.00 |
| 1964 | libggml-cpu.so - ops.cpp:6365-6484 [...] | ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) | InBetween | 0.05 | 0.02 | 0.02 | 0.02 | 0.03 | 0.00 | 0.00 | 155 | 28.84 | 0 | 9.53 | 3.68 | 2.6 | 23.62 | 5.59 | NA | NA | NA | NA | NA | 0.00 |
| 3143 | libggml-cpu.so - quants.c:298-355 [...] | quantize_row_q8_0 | Single | 0.71 | 0.01 | 0.01 | 0.26 | 0.26 | 0.00 | 0.00 | 1 | 1022.75 | 60.7 | 29.66 | 1 | 1.34 | 2.74 | 1 | 0 | 2 | 0 | 0 | 0 | 100.00 |
| 2335 | libggml-cpu.so - ops.cpp:8885-8886 [...] | ggml_compute_forward_flash_attn_ext | Innermost | 0.07 | 0.01 | 0.01 | 0.03 | 0.03 | 0.00 | 0.00 | 25 | 0.00 | 0 | 6.25 | 1.33 | 1 | 16 | 2.5 | 0 | 1 | 0 | 0 | 1 | 50.00 |
| 1968 | libggml-cpu.so - ops.cpp:6446-6457 | ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) | Innermost | 0.07 | 0.01 | 0.01 | 0.02 | 0.02 | 0.00 | 0.00 | 33 | 162.53 | 0 | 6.25 | 1.13 | 1.69 | 16 | 3.59 | 0 | 0 | 0 | 2 | 0 | 50.00 |
| 1184 | libggml-cpu.so - vec.h:1084-1116 [...] | ggml_vec_swiglu_f32 | Single | 0.52 | 0.01 | 0.01 | 0.19 | 0.19 | 0.00 | 0.00 | 3 | 5407.92 | 100 | 100 | 1.02 | 1 | 1 | 3 | 0 | 3 | 0 | 0 | 0 | 100.00 |
| 1725 | libggml-cpu.so - ops.cpp:4325-4326 | ggml_compute_forward_rms_norm | Innermost | 0.42 | 0.00 | 0.00 | 0.16 | 0.16 | 0.00 | 0.00 | 2 | 343.24 | 0 | 7.81 | 1 | 1.98 | 13.02 | 2 | 0 | 1 | 0 | 0 | 0 | 100.00 |
| 848 | libggml-cpu.so - binary-ops.cpp:18-32 [...] | ggml_compute_forward_mul | Innermost | 0.42 | 0.00 | 0.00 | 0.16 | 0.16 | 0.00 | 0.00 | 7 | 170.71 | 0 | 6.25 | 1 | 1.5 | 16 | 7 | 0 | 3 | 0 | 0 | 0 | 100.00 |
| 1963 | exec - sampling.cpp:125-126 | common_sampler::set_logits(llama_context*, int) | Single | 0.29 | 0.00 | 0.00 | 0.11 | 0.11 | 0.00 | 0.00 | 1 | 0.00 | 33.33 | 8.33 | 2 | 1 | 10.67 | 1 | 0 | 2 | 0 | 0 | 0 | 100.00 |
| 712 | libggml-cpu.so - binary-ops.cpp:10-32 [...] | ggml_compute_forward_add_non_quantized | Innermost | 0.26 | 0.00 | 0.00 | 0.09 | 0.09 | 0.00 | 0.00 | 7 | 261.67 | 0 | 6.25 | 1 | 1.5 | 16 | 7 | 0 | 3 | 0 | 0 | 0 | 100.00 |
| 1969 | libllama.so - stl_heap.h:139-262 [...] | llama_token_data_array_partial_sort_inplace(llama_token_data_array*, int) | Outermost | 0.14 | 0.00 | 0.00 | 0.05 | 0.05 | 0.00 | 0.00 | 1 | 0.00 | 0 | 8.88 | 2.33 | 1 | 13.27 | 1 | NA | NA | NA | NA | NA | 0.00 |