| Loop id | Source Location | Source Function | Level | Exclusive Coverage gcc_0 (%) | Inclusive Coverage gcc_0 (%) | Max Exclusive Time Over Threads gcc_0 (s) | Max Inclusive Time Over Threads gcc_0 (s) | Exclusive Time w.r.t. Wall Time gcc_0 (s) | Inclusive Time w.r.t. Wall Time gcc_0 (s) | Nb Threads gcc_0 | Vectorization Ratio (%) | Vector Length Use (%) | Speedup If No Scalar Integer | Speedup If FP Vectorized | Speedup If Fully Vectorized | Speedup If Perfect Load Balancing gcc_0 | Stride 0 | Stride 1 | Stride n | Stride Unknown | Stride Indirect | Array Access Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2041 | libggml-cpu.so - quants.c:979-1000 [...] | ggml_vec_dot_q8_0_q8_0 | Innermost | 10.60 | 10.60 | 3.49 | 3.49 | 2.72 | 2.72 | 72 | NA | NA | NA | NA | NA | 3.12 | NA | NA | NA | NA | NA | 0.00 |
| 756 | libggml-cpu.so - vec.cpp:385-387 [...] | ggml_vec_swiglu_f32 | Single | 0.05 | 0.05 | 0.39 | 0.39 | 0.01 | 0.01 | 4 | 80 | 97.66 | 1 | 1 | 1.03 | 3.85 | 0 | 0 | 0 | 3 | 0 | 50.00 |
| 3483 | libllama.so - stl_algo.h:1594-1595 [...] | llama_token_data_array_partial_sort_inplace(llama_token_data_array*, int) | Innermost | 0.05 | 0.05 | 0.40 | 0.40 | 0.01 | 0.01 | 1 | 0 | 30.36 | 2.17 | 1 | 3.06 | 1 | 1.5 | 0.5 | 0.5 | 0 | 0 | 93.75 |