| Loop id | Source Location | Source Function | Level | Max Thread Time / Walltime orig_0 (%) | Exclusive Coverage orig_0 (%) | Inclusive Coverage orig_0 (%) | Max Exclusive Time Over Threads orig_0 (s) | Max Inclusive Time Over Threads orig_0 (s) | Exclusive Time w.r.t. Wall Time orig_0 (s) | Inclusive Time w.r.t. Wall Time orig_0 (s) | Nb Threads orig_0 | GFLOPS orig_0 | Vectorization Ratio (%) | Vector Length Use (%) | Speedup If No Scalar Integer | Speedup If FP Vectorized | Speedup If Fully Vectorized | Speedup If Perfect Load Balancing orig_0 | Stride 0 | Stride 1 | Stride n | Stride Unknown | Stride Indirect | Array Access Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 451 | libggml-cpu.so - mmq.cpp:1573-1597 [...] | ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::{lambda(int, int)#2}::operator()(int, int) const::{lambda()#1}::operator()() const | Single | 34.93 | 23.83 | 23.83 | 11.24 | 11.24 | 4.21 | 4.21 | 183 | 85.42 | NA | NA | NA | NA | NA | 2.59 | NA | NA | NA | NA | NA | 0.00 |
| 638 | libggml-cpu.so - mmq.cpp:303-1392 [...] | void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int,... | Innermost | 0.19 | 0.12 | 0.12 | 0.06 | 0.06 | 0.02 | 0.02 | 175 | 0.00 | 90.91 | 38.76 | 1.33 | 1 | 1.45 | 2.58 | 19 | 0 | 0 | 5 | 0 | 89.58 |
| 2188 | libggml-cpu.so - vec.h:491-497 | ggml_compute_forward_flash_attn_ext | Innermost | 0.53 | 0.12 | 0.12 | 0.17 | 0.17 | 0.02 | 0.02 | 37 | 1153.98 | NA | NA | NA | NA | NA | 1.59 | NA | NA | NA | NA | NA | 0.00 |
| 1119 | libggml-cpu.so - vec.cpp:311-316 | ggml_vec_dot_f16 | Single | 0.25 | 0.05 | 0.05 | 0.08 | 0.08 | 0.01 | 0.01 | 32 | 1214.80 | NA | NA | NA | NA | NA | 1.7 | NA | NA | NA | NA | NA | 0.00 |
| 1836 | libggml-cpu.so - ops.cpp:6220-6245 [...] | ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) | Innermost | 0.11 | 0.04 | 0.04 | 0.04 | 0.04 | 0.01 | 0.01 | 192 | 365.83 | 15.56 | 9.17 | 1 | 1.74 | 5.33 | 5.42 | 1 | 1 | 0 | 0 | 0 | 100.00 |
| 2180 | libggml-cpu.so - ops.cpp:8759-8927 [...] | ggml_compute_forward_flash_attn_ext | InBetween | 0.17 | 0.03 | 0.14 | 0.05 | 0.21 | 0.00 | 0.03 | 35 | 3075.14 | NA | NA | NA | NA | NA | 2.24 | NA | NA | NA | NA | NA | 0.00 |
| 122 | libggml-cpu.so - ggml-cpu.c:533-2891 [...] | ggml_graph_compute_thread | Single | 0.08 | 0.02 | 0.02 | 0.03 | 0.03 | 0.00 | 0.00 | 99 | 4.50 | 0 | 9.82 | 1 | 1 | 14.64 | 3.11 | NA | NA | NA | NA | NA | 0.00 |
| 1807 | libggml-cpu.so - ops.cpp:6365-6484 [...] | ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) | InBetween | 0.06 | 0.02 | 0.02 | 0.02 | 0.03 | 0.00 | 0.00 | 165 | 26.67 | 0 | 9.75 | 4.05 | 2.43 | 20.84 | 5.08 | NA | NA | NA | NA | NA | 0.00 |
| 403 | libggml-cpu.so - traits.cpp:13-17 [...] | ggml_cpu_extra_compute_forward | Single | 0.06 | 0.01 | 0.01 | 0.02 | 0.02 | 0.00 | 0.00 | 59 | 0.00 | 0 | 12.5 | 1 | 1 | 8 | 3.06 | 0 | 1 | 0 | 1.5 | 0 | 76.04 |
| 2924 | libggml-cpu.so - quants.c:298-355 [...] | quantize_row_q8_0 | Single | 1.03 | 0.01 | 0.01 | 0.33 | 0.33 | 0.00 | 0.00 | 1 | 801.66 | 60.7 | 29.66 | 1 | 1.34 | 2.74 | 1 | 0 | 2 | 0 | 0 | 0 | 100.00 |
| 2179 | libggml-cpu.so - ops.cpp:8885-8886 [...] | ggml_compute_forward_flash_attn_ext | Innermost | 0.06 | 0.01 | 0.01 | 0.02 | 0.02 | 0.00 | 0.00 | 26 | 6.67 | 0 | 6.25 | 1.33 | 1 | 16 | 2.17 | 0 | 1 | 0 | 0 | 1 | 50.00 |
| 1126 | libggml-cpu.so - vec.h:1084-1116 [...] | ggml_vec_swiglu_f32 | Single | 0.61 | 0.01 | 0.01 | 0.19 | 0.19 | 0.00 | 0.00 | 1 | 5009.17 | 100 | 100 | 1.02 | 1 | 1 | 1 | 0 | 3 | 0 | 0 | 0 | 100.00 |
| 812 | libggml-cpu.so - binary-ops.cpp:18-32 [...] | ggml_compute_forward_mul | Innermost | 0.48 | 0.00 | 0.00 | 0.16 | 0.16 | 0.00 | 0.00 | 7 | 183.12 | 0 | 6.25 | 1 | 1.5 | 16 | 6.78 | 0 | 3 | 0 | 0 | 0 | 100.00 |
| 1577 | libggml-cpu.so - ops.cpp:4325-4326 | ggml_compute_forward_rms_norm | Innermost | 0.31 | 0.00 | 0.00 | 0.10 | 0.10 | 0.00 | 0.00 | 3 | 462.98 | 0 | 7.81 | 1 | 1.98 | 13.02 | 2.86 | 0 | 1 | 0 | 0 | 0 | 100.00 |
| 1936 | exec - sampling.cpp:125-126 | common_sampler::set_logits(llama_context*, int) | Single | 0.22 | 0.00 | 0.00 | 0.07 | 0.07 | 0.00 | 0.00 | 1 | 0.00 | 0 | 6.25 | 3 | 1 | 16 | 1 | 0 | 2 | 0 | 0 | 0 | 100.00 |
| 698 | libggml-cpu.so - binary-ops.cpp:10-32 [...] | ggml_compute_forward_add_non_quantized | Innermost | 0.19 | 0.00 | 0.00 | 0.06 | 0.06 | 0.00 | 0.00 | 7 | 452.22 | 0 | 6.25 | 1 | 1.5 | 16 | 7 | 0 | 3 | 0 | 0 | 0 | 100.00 |
| 1889 | libllama.so - stl_heap.h:139-262 [...] | llama_token_data_array_partial_sort_inplace(llama_token_data_array*, int) | Outermost | 0.17 | 0.00 | 0.00 | 0.05 | 0.05 | 0.00 | 0.00 | 1 | 0.00 | 0 | 8.17 | 2.33 | 1 | 14.32 | 1 | NA | NA | NA | NA | NA | 0.00 |
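
To rank loops from a table like the one above, the rows can be loaded into dictionaries keyed by column header. The following is a minimal sketch, not part of the profiler. The helper names (`parse_markdown_table`, `to_float`, `top_loops`) are hypothetical. `SAMPLE` is a three-column excerpt of four rows copied from the table. The sketch assumes that no cell contains a `|` character and treats `NA` as a missing value.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited Markdown table into a list of dicts keyed by header.

    Assumption: no cell contains a literal '|' character.
    """
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row holds the column names
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        rows.append(dict(zip(header, cells)))
    return rows


def to_float(cell):
    """Convert a cell to float; return None for 'NA' or other non-numeric text."""
    try:
        return float(cell)
    except ValueError:
        return None


def top_loops(rows, column, n=3):
    """Return the loop ids with the n largest numeric values in `column`."""
    scored = [(to_float(r[column]), r["Loop id"]) for r in rows]
    scored = [(value, loop_id) for value, loop_id in scored if value is not None]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [loop_id for _, loop_id in scored[:n]]


# Excerpt of the table above: three columns, four rows.
SAMPLE = """
| Loop id | Exclusive Coverage orig_0 (%) | Speedup If Fully Vectorized |
|---|---|---|
| 451 | 23.83 | NA |
| 1836 | 0.04 | 5.33 |
| 1807 | 0.02 | 20.84 |
| 1126 | 0.01 | 1 |
"""

rows = parse_markdown_table(SAMPLE)
by_coverage = top_loops(rows, "Exclusive Coverage orig_0 (%)", n=1)
by_vector_gain = top_loops(rows, "Speedup If Fully Vectorized", n=2)
print(by_coverage, by_vector_gain)
```

Sorting by coverage and by a speedup column separately keeps the two questions apart: which loop takes the most time, and which loop has the largest reported headroom. In this excerpt loop 451 leads on coverage, while its speedup column is `NA` and so it is excluded from the second ranking.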