Loops
▶repack.cpp: 153 - 159.24 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2569 | 2.39 | 2.13 | 54.24 | 0 | 0 | 2960 | 2.11 | 2.01 | 53.05 | 0 | 0 | 2000 | 1.92 | 1.84 | 51.95 | 0 | 0 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2569) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2960) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2000) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
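
The header value of 159.24 % is above 100 % because it appears to be the per-run Cov (%) values added together: 54.24 + 53.05 + 51.95 = 159.24. The same holds for quants.c: 2150, where 10.86 + 12.16 + 12.77 = 35.79. This reading is inferred from the numbers; the report does not state it. The per-run values are themselves rounded, so a recomputed sum can differ from a header in the last digit. A minimal C sketch of that summation follows. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: adds the Cov (%) values reported for one source loop
 * across all runs. Assumption: the section header figure is this plain sum,
 * inferred from the table values rather than documented by the tool. */
double summed_coverage(const double *cov_per_run, size_t n_runs)
{
    double total = 0.0;
    for (size_t i = 0; i < n_runs; ++i)
        total += cov_per_run[i];
    return total;
}
```

For repack.cpp: 153, passing {54.24, 53.05, 51.95} with n_runs = 3 gives about 159.24. Because the runs are summed, the header is best read as a ranking across runs, not as a share of one run's time.
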
▶quants.c: 682 - 64.30 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 3839 | 4.53 | 3.91 | 64.30 | 93.02 | 21.37 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3839) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Inefficient Vectorization | |
| | | | | | | Presence of special instructions executing on a single port | 1 |
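
For quants.c: 682, only the icx_9 build (loop 3839) was analyzed. It reports indirect access both as a data-access issue and as a vectorization roadblock. Indirect access means a load address depends on data, as in `table[index[i]]`, so the vectorizer needs gather instructions or scalar loads. The sketch below is not the ggml code. It assumes a hypothetical 4-bit lookup table whose entries follow `scale * (k - 8)`. Under that assumption the lookup can be replaced by arithmetic, which removes the data-dependent load. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Indirect form: each output reads lut[code], an index known only at run
 * time, so a vectorized loop needs gathers or per-element scalar loads. */
void decode_lut(const uint8_t *codes, const float *lut, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = lut[codes[i] & 0x0F];
}

/* Direct form, valid only under the assumption lut[k] == scale * (k - 8):
 * the value is computed arithmetically, so no data-dependent load remains. */
void decode_affine(const uint8_t *codes, float scale, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = scale * (float)((int)(codes[i] & 0x0F) - 8);
}
```

If the table does not follow such a formula, this rewrite does not apply and the lookup stays indirect.
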
▶quants.c: 2150 - 35.79 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2544 | 0.46 | 0.43 | 10.86 | 97.09 | 43.87 | 2936 | 0.44 | 0.46 | 12.16 | 96.81 | 44.08 | 1970 | 0.45 | 0.45 | 12.77 | 97.09 | 43.87 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2544) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2936) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1970) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | |||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
| Presence of indirect access | 1 | Presence of indirect access | 1 | Presence of indirect access | 1 | ||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | |||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
| Presence of indirect access | 1 | Presence of indirect access | 1 | Presence of indirect access | 1 | ||||||||||||||||||
| Inefficient Vectorization | Inefficient Vectorization | Inefficient Vectorization | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
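
All three analyzed builds flag constant non-unit stride access at quants.c: 2150. A constant stride greater than one arises, for instance, when one field of an interleaved record is read on its own. The hypothetical pair below contrasts an interleaved layout, which reads with stride 2, with a split layout, which reads with unit stride. The report does not show which layout quants.c: 2150 actually uses.

```c
#include <stddef.h>

/* Interleaved layout xy = {x0, y0, x1, y1, ...}: only the x fields are
 * summed, so loads use a stride of 2 floats and the y fields fetched
 * alongside them in each cache line go unused. */
float sum_x_interleaved(const float *xy, size_t n_pairs)
{
    float s = 0.0f;
    for (size_t i = 0; i < n_pairs; ++i)
        s += xy[2 * i];
    return s;
}

/* Split layout: x values are contiguous, so the same sum uses
 * unit-stride loads. */
float sum_x_split(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}
```
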
▶quants.c: 716 - 7.54 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 499 | 0.50 | 0.46 | 7.54 | 46.92 | 13.2 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 499) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Presence of a large number of scalar integer instructions | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of constant non-unit stride data access | 1 |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | | | | More than 20% of the loads are accessing the stack | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of constant non-unit stride data access | 1 |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Inefficient Vectorization | |
| | | | | | | Presence of special instructions executing on a single port | 1 |
▶<unknown>: 0 - 1.98 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2485 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1023 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2125 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4301 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2640 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2395 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2262 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3825 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1126 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2155 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2256 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1922 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2842 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2504 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2448 | 0.02 | 0.00 | 0.01 | 0 | 0 | 4341 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2638 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2294 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2346 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4117 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2741 | 0.00 | 0.00 | 0.00 | 0 | 0 | 444 | 0.09 | 0.00 | 0.04 | 0 | 0 | 2349 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4118 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 765 | 0.03 | 0.00 | 0.02 | 0 | 0 | 278 | 0.01 | 0.00 | 0.06 | 0 | 0 | 1139 | 0.01 | 0.00 | 0.02 | 0 | 0 | 4130 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2586 | 0.02 | 0.00 | 0.06 | 0 | 0 | 114 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2017 | 0.02 | 0.00 | 0.06 | 0 | 0 | 1714 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2527 | 0.04 | 0.00 | 0.02 | 0 | 0 | 124 | 0.02 | 0.00 | 0.09 | 0 | 0 | 423 | 0.02 | 0.00 | 0.01 | 0 | 0 | 1561 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 358 | 0.01 | 0.00 | 0.01 | 0 | 0 | 2954 | 0.00 | 0.00 | 0.00 | 0 | 0 | 353 | 0.05 | 0.00 | 0.04 | 0 | 0 | 648 | 0.02 | 0.00 | 0.07 | 0 | 0 |
| 1704 | 0.01 | 0.00 | 0.00 | 0 | 0 | 108 | 0.00 | 0.00 | 0.00 | 0 | 0 | 56 | 0.01 | 0.00 | 0.01 | 0 | 0 | 877 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1709 | 0.02 | 0.00 | 0.08 | 0 | 0 | 873 | 0.04 | 0.00 | 0.03 | 0 | 0 | 1297 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3505 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1456 | 0.02 | 0.00 | 0.03 | 0 | 0 | 2099 | 0.02 | 0.00 | 0.06 | 0 | 0 | 1301 | 0.02 | 0.00 | 0.06 | 0 | 0 | 1747 | 0.05 | 0.00 | 0.02 | 0 | 0 |
| 87 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2980 | 0.01 | 0.00 | 0.06 | 0 | 0 | 1951 | 0.05 | 0.00 | 0.03 | 0 | 0 | 114 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2562 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1412 | 0.00 | 0.00 | 0.00 | 0 | 0 | 195 | 0.26 | 0.00 | 0.09 | 0 | 0 | 2574 | 0.04 | 0.00 | 0.01 | 0 | 0 |
| 82 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1418 | 0.02 | 0.00 | 0.01 | 0 | 0 | 227 | 0.01 | 0.00 | 0.02 | 0 | 0 | 530 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 1195 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1707 | 0.01 | 0.00 | 0.04 | 0 | 0 | 0 | 0.01 | 0.00 | 0.00 | 0 | 0 | 108 | 0.02 | 0.00 | 0.07 | 0 | 0 |
| 484 | 0.03 | 0.00 | 0.01 | 0 | 0 | 2111 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1076 | 0.00 | 0.00 | 0.00 | 0 | 0 | 117 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 99 | 0.01 | 0.00 | 0.05 | 0 | 0 | 558 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1992 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2572 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 398 | 0.05 | 0.00 | 0.03 | 0 | 0 | 401 | 0.01 | 0.00 | 0.01 | 0 | 0 | 4 | 0.01 | 0.00 | 0.01 | 0 | 0 | 2986 | 0.03 | 0.00 | 0.08 | 0 | 0 |
| 233 | 0.21 | 0.00 | 0.06 | 0 | 0 | 2918 | 0.03 | 0.00 | 0.02 | 0 | 0 | 993 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2975 | 0.01 | 0.00 | 0.03 | 0 | 0 |
| 274 | 0.01 | 0.00 | 0.02 | 0 | 0 | 102 | 0.01 | 0.00 | 0.00 | 0 | 0 | 71 | 0.02 | 0.00 | 0.06 | 0 | 0 | 119 | 0.01 | 0.00 | 0.06 | 0 | 0 |
| 5 | 0.01 | 0.00 | 0.02 | 0 | 0 | 2094 | 0.01 | 0.00 | 0.01 | 0 | 0 | 994 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3433 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 6 | 0.00 | 0.00 | 0.01 | 0 | 0 | 55 | 0.01 | 0.00 | 0.00 | 0 | 0 | 649 | 0.03 | 0.00 | 0.01 | 0 | 0 | ||||||
| 257 | 0.29 | 0.00 | 0.09 | 0 | 0 | 319 | 0.01 | 0.00 | 0.01 | 0 | 0 | 3 | 0.03 | 0.00 | 0.04 | 0 | 0 | ||||||
| 490 | 0.00 | 0.00 | 0.00 | 0 | 0 | 695 | 0.02 | 0.00 | 0.01 | 0 | 0 | 3446 | 0.02 | 0.00 | 0.03 | 0 | 0 | ||||||
| 27 | 0.00 | 0.00 | 0.00 | 0 | 0 | 57 | 0.00 | 0.00 | 0.00 | 0 | 0 | 112 | 0.01 | 0.00 | 0.01 | 0 | 0 | ||||||
| 368 | 0.00 | 0.00 | 0.00 | 0 | 0 | 431 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3429 | 0.01 | 0.00 | 0.01 | 0 | 0 | ||||||
| 1947 | 0.00 | 0.00 | 0.00 | 0 | 0 | 707 | 0.00 | 0.00 | 0.00 | 0 | 0 | 945 | 0.02 | 0.00 | 0.01 | 0 | 0 | ||||||
| 1948 | 0.00 | 0.00 | 0.00 | 0 | 0 | ||||||||||||||||||
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶repack.cpp: 125 - 1.78 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2593 | 0.04 | 0.02 | 0.57 | 99.51 | 83.19 | 2990 | 0.04 | 0.02 | 0.57 | 99.48 | 85.43 | 2024 | 0.04 | 0.02 | 0.64 | 99.51 | 83.19 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2593) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2990) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2024) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| More than 20% of the loads are accessing the stack | 1 | More than 20% of the loads are accessing the stack | 1 | More than 20% of the loads are accessing the stack | 1 | ||||||||||||||||||
| Inefficient Vectorization | Inefficient Vectorization | Inefficient Vectorization | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Use of masked instructions | 1 | Use of masked instructions | 1 | Use of masked instructions | 1 | ||||||||||||||||||
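
repack.cpp: 125 is almost fully vectorized (about 99.5 % vector ratio), yet every analyzed build reports masked instructions. One common source of masks is a store that happens only under a condition. The compiler may not write elements that the source leaves untouched, so a vectorized version may use a masked store. The hypothetical pair below shows that pattern next to an alternative that writes every element, which permits a compare-and-blend followed by a normal store. This report cannot tell whether that is the cause at repack.cpp: 125, because masks also appear when loop remainders are vectorized.

```c
#include <stddef.h>

/* Conditional store: y[i] is written only where x[i] > 0, so a vectorized
 * version must leave the other elements untouched (e.g. via a masked store). */
void copy_positive_masked(const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (x[i] > 0.0f)
            y[i] = x[i];
}

/* Unconditional store: every y[i] is written, with x[i] or with its old
 * value, so a vectorizer can use a compare-and-blend plus a plain store. */
void copy_positive_blend(const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = (x[i] > 0.0f) ? x[i] : y[i];
}
```

The second form is equivalent only when no other thread writes `y` concurrently, because it stores to elements the first form never touches.
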
▶vec.cpp: 331 - 1.02 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 1733 | 0.15 | 0.06 | 1.02 | 5.88 | 8.82 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1733) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of indirect access | 1 |
▶vec.h: 491 - 0.95 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1723 | 0.04 | 0.01 | 0.31 | 100 | 75 | 2109 | 0.04 | 0.01 | 0.29 | 100 | 75 | 1312 | 0.05 | 0.01 | 0.36 | 100 | 75 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1723) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2109) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1312) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶vec.h: 508 - 0.92 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 3443 | 0.15 | 0.06 | 0.92 | 85.05 | 20.79 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3443) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Inefficient Vectorization | |
| | | | | | | Presence of special instructions executing on a single port | 1 |
▶vec.cpp: 311 - 0.69 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 761 | 0.04 | 0.01 | 0.21 | 100 | 66.67 | 866 | 0.03 | 0.01 | 0.24 | 100 | 66.67 | 689 | 0.04 | 0.01 | 0.24 | 100 | 66.67 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 761) | Sum on 1 analyzed binary loop (libggml-cpu.so - 866) | Sum on 1 analyzed binary loop (libggml-cpu.so - 689) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | |||||||||||||||||||||
| Low iteration count | | Low iteration count | 1 | Low iteration count | |
| Control Flow Issues | Control Flow Issues | Control Flow Issues | ||||||||||||||||||||||
| Low iteration count | | Low iteration count | 1 | Low iteration count | |
▶ops.cpp: 6220 - 0.31 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1462 | 0.03 | 0.01 | 0.17 | 6.67 | 7.5 | 1734 | 0.02 | 0.01 | 0.14 | 15.56 | 9.17 | ||||||||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1462) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1734) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | ||||||||||||||||||||||
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | ||||||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||||||
| Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 0 | ||||||||||||||||||||
| Control Flow Issues | Control Flow Issues | ||||||||||||||||||||||
| Presence of calls | 1 | Presence of calls | 1 | ||||||||||||||||||||
| Data Access Issues | Data Access Issues | ||||||||||||||||||||||
| More than 20% of the loads are accessing the stack | 1 | More than 20% of the loads are accessing the stack | 1 | ||||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | ||||||||||||||||||||||
| Presence of calls | 1 | Presence of calls | 1 | ||||||||||||||||||||
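
At ops.cpp: 6220, both analyzed builds list calls inside the loop as a vectorization roadblock. An out-of-line call in every iteration generally blocks vectorization unless a vector variant of the callee is available. If the callee is a small function in the same translation unit, declaring it `static inline` lets the compiler inline it and see a loop body without calls. The helper below is hypothetical; the report does not name the function called at ops.cpp: 6220.

```c
#include <stddef.h>

/* Hypothetical small helper. Because it is static inline and visible in
 * this translation unit, the compiler can inline it into the loop below,
 * leaving no call in the loop body. */
static inline float clamp_unit(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Clamps every element of x to [0, 1] in place. */
void clamp_all(float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        x[i] = clamp_unit(x[i]);
}
```
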
▶ops.cpp: 6210 - 0.22 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | 1142 | 0.03 | 0.01 | 0.22 | 5.56 | 7.29 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1142) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | Loop Computation Issues | |
| | | | | Presence of expensive FP instructions | 1 |
| | | | | Presence of a large number of scalar integer instructions | 1 |
| | | | | Control Flow Issues | |
| | | | | Presence of calls | 1 |
| | | | | Presence of more than 4 paths | 1 |
| | | | | Data Access Issues | |
| | | | | Presence of special instructions executing on a single port | 1 |
| | | | | More than 20% of the loads are accessing the stack | 1 |
| | | | | Vectorization Roadblocks | |
| | | | | Presence of calls | 1 |
| | | | | Presence of more than 4 paths | 1 |
| | | | | Inefficient Vectorization | |
| | | | | Presence of special instructions executing on a single port | 1 |
| | | | | Use of masked instructions | 1 |
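
At ops.cpp: 6210, the aocc_9 build (loop 1142) lists "more than 4 paths" alongside calls as vectorization roadblocks. One way extra paths arise is a branch on a value that never changes inside the loop. Loop unswitching moves that test outside the loop, producing one simple loop per case. The sketch below applies it by hand to a hypothetical `mode` flag. The report does not show which branches exist at ops.cpp: 6210, and compilers may already unswitch loops on their own.

```c
#include <stddef.h>

/* Hypothetical modes, for illustration only. */
enum op_mode { OP_ADD, OP_MUL, OP_MAX };

/* Loop-invariant branch inside the body: every iteration carries the
 * paths for all three modes. */
void apply_branchy(float *y, const float *x, size_t n, enum op_mode mode)
{
    for (size_t i = 0; i < n; ++i) {
        if (mode == OP_ADD)
            y[i] += x[i];
        else if (mode == OP_MUL)
            y[i] *= x[i];
        else
            y[i] = y[i] > x[i] ? y[i] : x[i];
    }
}

/* Unswitched: the mode is tested once, and each loop body handles a single
 * case (the max case is a select a vectorizer can map to a max or blend). */
void apply_unswitched(float *y, const float *x, size_t n, enum op_mode mode)
{
    switch (mode) {
    case OP_ADD:
        for (size_t i = 0; i < n; ++i) y[i] += x[i];
        break;
    case OP_MUL:
        for (size_t i = 0; i < n; ++i) y[i] *= x[i];
        break;
    default:
        for (size_t i = 0; i < n; ++i) y[i] = y[i] > x[i] ? y[i] : x[i];
        break;
    }
}
```
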
▶ggml-cpu.c: 533 - 0.11 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_9 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 135 | 0.03 | 0.01 | 0.11 | 0 | 9.82 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 135) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Control Flow Issues | |
| | | | | | | Presence of calls | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | More than 20% of the loads are accessing the stack | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of calls | 1 |
| | | | | | | Presence of more than 4 paths | 1 |

