Loops
▶sgemm.cpp: 138 - 170.49 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2364 | 9.02 | 6.92 | 64.39 | 0 | 0 | 2688 | 9.11 | 6.96 | 64.20 | 0 | 0 | 1814 | 5.77 | 4.51 | 41.90 | 0 | 0 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2364) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2688) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1814) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
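The percentage after each source location appears to be the sum of that loop's Cov (%) values across the runs. This is why sgemm.cpp:138 is listed at 170.49 %, which exceeds 100 %. The rule is inferred from the tables in this report, not taken from MAQAO documentation. The minimal Python check below assumes that rule. `headline_coverage` is a hypothetical helper, and `None` stands for a run with no data for the loop.

```python
def headline_coverage(cov_per_run):
    """Sum per-run Cov (%) values, skipping runs that have no data (None).

    Assumption: the report's per-loop percentage is this sum, rounded to 2 decimals.
    """
    return round(sum(c for c in cov_per_run if c is not None), 2)


# Run order: orig_default, icx_default, aocc_9, icx_1
sgemm_138 = headline_coverage([64.39, 64.20, 41.90, None])  # sgemm.cpp:138
quants_298 = headline_coverage([0.88, 0.81, 0.97, None])    # quants.c:298
quants_1066 = headline_coverage([None, None, None, 83.87])  # quants.c:1066, icx_1 only

print(sgemm_138, quants_298, quants_1066)
```

Some entries differ from this sum by 0.01. For example, vec.h:491 sums to 1.77 but is listed as 1.78. This is presumably because the displayed Cov values are already rounded.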
▶quants.c: 1066 - 83.87 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 4223 | 18.18 | 17.45 | 83.87 | 88.89 | 20.14 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 4223) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | Loop Computation Issues |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | Data Access Issues |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | Vectorization Roadblocks |
| | | | | | | Presence of indirect access | 1 |
| | | | Inefficient Vectorization |
| | | | | | | Presence of special instructions executing on a single port | 1 |
▶quants.c: 298 - 2.66 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2527 | 0.17 | 0.09 | 0.88 | 58.66 | 28.88 | 2918 | 0.16 | 0.09 | 0.81 | 59.66 | 29.26 | 1951 | 0.22 | 0.10 | 0.97 | 60.7 | 29.66 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2527) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2918) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1951) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | |||||||||||||||||||||
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | ||||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | ||||||||||||||||||
| Control Flow Issues | Control Flow Issues | Control Flow Issues | |||||||||||||||||||||
| Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | ||||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | |||||||||||||||||||||
| Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | ||||||||||||||||||
| Inefficient Vectorization | Inefficient Vectorization | Inefficient Vectorization | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
▶vec.h: 491 - 1.78 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1723 | 0.13 | 0.06 | 0.60 | 100 | 75 | 2109 | 0.14 | 0.06 | 0.57 | 100 | 75 | 1312 | 0.14 | 0.06 | 0.60 | 100 | 75 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1723) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2109) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1312) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶<unknown>: 0 - 1.58 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2749 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1023 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2157 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2177 | 0.02 | 0.00 | 0.00 | 0 | 0 |
| 2485 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1189 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2262 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4520 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1282 | 0.02 | 0.00 | 0.00 | 0 | 0 | 1118 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2128 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4516 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2842 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2395 | 0.02 | 0.00 | 0.00 | 0 | 0 | 2263 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1905 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2640 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2294 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1058 | 0.02 | 0.00 | 0.00 | 0 | 0 | 2175 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 332 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2504 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2448 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2115 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2513 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2298 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2131 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4518 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2741 | 0.02 | 0.00 | 0.00 | 0 | 0 | 1947 | 0.05 | 0.00 | 0.00 | 0 | 0 | 2349 | 0.02 | 0.00 | 0.00 | 0 | 0 | 4540 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2738 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1948 | 0.00 | 0.00 | 0.00 | 0 | 0 | 298 | 0.01 | 0.00 | 0.00 | 0 | 0 | 746 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 358 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1949 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1139 | 0.03 | 0.00 | 0.04 | 0 | 0 | 3862 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 90 | 0.01 | 0.00 | 0.02 | 0 | 0 | 372 | 0.00 | 0.00 | 0.00 | 0 | 0 | 423 | 0.04 | 0.00 | 0.04 | 0 | 0 | 3863 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 398 | 0.07 | 0.01 | 0.08 | 0 | 0 | 154 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1297 | 0.01 | 0.00 | 0.01 | 0 | 0 | 339 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 85 | 0.01 | 0.00 | 0.01 | 0 | 0 | 444 | 0.09 | 0.01 | 0.09 | 0 | 0 | 1301 | 0.03 | 0.01 | 0.09 | 0 | 0 | 3861 | 0.05 | 0.00 | 0.00 | 0 | 0 |
| 484 | 0.06 | 0.00 | 0.04 | 0 | 0 | 114 | 0.02 | 0.00 | 0.04 | 0 | 0 | 59 | 0.01 | 0.00 | 0.02 | 0 | 0 | 759 | 0.11 | 0.01 | 0.05 | 0 | 0 |
| 5 | 0.03 | 0.00 | 0.04 | 0 | 0 | 124 | 0.03 | 0.01 | 0.08 | 0 | 0 | 1076 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3825 | 0.03 | 0.01 | 0.03 | 0 | 0 |
| 1456 | 0.03 | 0.01 | 0.05 | 0 | 0 | 2099 | 0.03 | 0.01 | 0.07 | 0 | 0 | 1412 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3812 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 87 | 0.01 | 0.00 | 0.01 | 0 | 0 | 6 | 0.02 | 0.01 | 0.05 | 0 | 0 | 1 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1023 | 0.05 | 0.00 | 0.02 | 0 | 0 |
| 1196 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1418 | 0.04 | 0.00 | 0.03 | 0 | 0 | 71 | 0.02 | 0.00 | 0.04 | 0 | 0 | 126 | 0.02 | 0.00 | 0.02 | 0 | 0 |
| 1195 | 0.05 | 0.00 | 0.03 | 0 | 0 | 1707 | 0.03 | 0.01 | 0.06 | 0 | 0 | 4 | 0.03 | 0.00 | 0.04 | 0 | 0 | 2710 | 0.03 | 0.00 | 0.01 | 0 | 0 |
| 1704 | 0.01 | 0.00 | 0.01 | 0 | 0 | 2111 | 0.00 | 0.00 | 0.00 | 0 | 0 | 62 | 0.01 | 0.00 | 0.03 | 0 | 0 | 2708 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 1369 | 0.01 | 0.00 | 0.00 | 0 | 0 | 401 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1074 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3808 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 1730 | 0.01 | 0.00 | 0.00 | 0 | 0 | 108 | 0.01 | 0.00 | 0.00 | 0 | 0 | 993 | 0.02 | 0.00 | 0.00 | 0 | 0 | 127 | 0.03 | 0.01 | 0.03 | 0 | 0 |
| 99 | 0.03 | 0.01 | 0.05 | 0 | 0 | 1412 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1317 | 0.01 | 0.00 | 0.00 | 0 | 0 | 119 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 1874 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2094 | 0.02 | 0.00 | 0.01 | 0 | 0 | 994 | 0.02 | 0.00 | 0.01 | 0 | 0 | 654 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 317 | 0.00 | 0.00 | 0.00 | 0 | 0 | 558 | 0.07 | 0.00 | 0.03 | 0 | 0 | 319 | 0.02 | 0.00 | 0.01 | 0 | 0 | 3 | 0.06 | 0.01 | 0.05 | 0 | 0 |
| 822 | 0.00 | 0.00 | 0.00 | 0 | 0 | 113 | 0.02 | 0.00 | 0.03 | 0 | 0 | 57 | 0.00 | 0.00 | 0.00 | 0 | 0 | 144 | 0.03 | 0.01 | 0.04 | 0 | 0 |
| 121 | 0.01 | 0.00 | 0.00 | 0 | 0 | ||||||||||||||||||
| 3118 | 0.02 | 0.01 | 0.02 | 0 | 0 | ||||||||||||||||||
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶vec.h: 508 - 1.09 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 3822 | 0.34 | 0.23 | 1.09 | 85.05 | 20.79 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3822) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | Loop Computation Issues |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | Data Access Issues |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | Vectorization Roadblocks |
| | | | | | | Presence of indirect access | 1 |
| | | | Inefficient Vectorization |
| | | | | | | Presence of special instructions executing on a single port | 1 |
▶vec.cpp: 311 - 1.07 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 761 | 0.08 | 0.04 | 0.35 | 100 | 66.67 | 866 | 0.08 | 0.04 | 0.37 | 100 | 66.67 | 689 | 0.09 | 0.04 | 0.36 | 100 | 66.67 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 761) | Sum on 1 analyzed binary loop (libggml-cpu.so - 866) | Sum on 1 analyzed binary loop (libggml-cpu.so - 689) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | |||||||||||||||||||||
| Low iteration count | | Low iteration count | 1 | Low iteration count | | | |
| Control Flow Issues | Control Flow Issues | Control Flow Issues | |||||||||||||||||||||
| Low iteration count | | Low iteration count | 1 | Low iteration count | | | |
▶vec.h: 1084 - 1.07 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 765 | 0.33 | 0.04 | 0.38 | 98 | 98.13 | 873 | 0.28 | 0.04 | 0.36 | 100 | 100 | 695 | 0.26 | 0.04 | 0.33 | 98 | 98.13 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 765) | Sum on 1 analyzed binary loop (libggml-cpu.so - 873) | Sum on 1 analyzed binary loop (libggml-cpu.so - 695) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | |||||||||||||||||||||
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | ||||||||||||||||||
| Control Flow Issues | Control Flow Issues | Control Flow Issues | |||||||||||||||||||||
| Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | ||||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | |||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 0 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
| More than 10% of the vector loads instructions are unaligned | 1 | More than 10% of the vector loads instructions are unaligned | 1 | More than 10% of the vector loads instructions are unaligned | 1 | ||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 0 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | |||||||||||||||||||||
| Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | ||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 0 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
| Inefficient Vectorization | Inefficient Vectorization | Inefficient Vectorization | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 0 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Use of masked instructions | 1 | Use of masked instructions | 1 | Use of masked instructions | 1 | ||||||||||||||||||
▶vec.cpp: 331 - 1.03 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 1754 | 0.34 | 0.21 | 1.03 | 5.88 | 8.82 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1754) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | Loop Computation Issues |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | Data Access Issues |
| | | | | | | Presence of indirect access | 1 |
| | | | Vectorization Roadblocks |
| | | | | | | Presence of indirect access | 1 |
▶ops.cpp: 6220 - 0.56 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1462 | 0.06 | 0.03 | 0.27 | 6.67 | 7.5 | 1734 | 0.05 | 0.02 | 0.18 | 15.56 | 9.17 | | | | | | | 3142 | 0.05 | 0.02 | 0.11 | 64.91 | 16.67 |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1462) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1734) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3142) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | | Loop Computation Issues |
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | | | Presence of expensive FP instructions | 1 |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 0 | | | Presence of a large number of scalar integer instructions | 0 |
| Control Flow Issues | Control Flow Issues | | Control Flow Issues |
| Presence of calls | 1 | Presence of calls | 1 | | | Presence of calls | 1 |
| Data Access Issues | Data Access Issues | | Data Access Issues |
| Presence of constant non-unit stride data access | 0 | Presence of constant non-unit stride data access | 0 | | | Presence of constant non-unit stride data access | 1 |
| Presence of special instructions executing on a single port | 0 | Presence of special instructions executing on a single port | 0 | | | Presence of special instructions executing on a single port | 1 |
| More than 20% of the loads are accessing the stack | 1 | More than 20% of the loads are accessing the stack | 1 | | | More than 20% of the loads are accessing the stack | 1 |
| Vectorization Roadblocks | Vectorization Roadblocks | | Vectorization Roadblocks |
| Presence of calls | 1 | Presence of calls | 1 | | | Presence of calls | 1 |
| Presence of constant non-unit stride data access | 0 | Presence of constant non-unit stride data access | 0 | | | Presence of constant non-unit stride data access | 1 |
| Inefficient Vectorization | Inefficient Vectorization | | Inefficient Vectorization |
| Presence of special instructions executing on a single port | | Presence of special instructions executing on a single port | | | | Presence of special instructions executing on a single port | 1 |
▶ops.cpp: 6210 - 0.34 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | 1142 | 0.06 | 0.04 | 0.34 | 5.56 | 7.29 | | | | | | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1142) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | Loop Computation Issues | |
| | | | | Presence of expensive FP instructions | 1 | | |
| | | | | Presence of a large number of scalar integer instructions | 1 | | |
| | | Control Flow Issues | |
| | | | | Presence of calls | 1 | | |
| | | | | Presence of more than 4 paths | 1 | | |
| | | Data Access Issues | |
| | | | | Presence of special instructions executing on a single port | 1 | | |
| | | | | More than 20% of the loads are accessing the stack | 1 | | |
| | | Vectorization Roadblocks | |
| | | | | Presence of calls | 1 | | |
| | | | | Presence of more than 4 paths | 1 | | |
| | | Inefficient Vectorization | |
| | | | | Presence of special instructions executing on a single port | 1 | | |
| | | | | Use of masked instructions | 1 | | |
▶sgemm.cpp: 205 - 0.24 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | 1812 | 0.05 | 0.03 | 0.24 | 71.43 | 17.86 | | | | | | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1812) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | Loop Computation Issues | |
| | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| | | Data Access Issues | |
| | | | | Presence of constant non-unit stride data access | 1 | | |
| | | | | Presence of special instructions executing on a single port | 1 | | |
| | | Vectorization Roadblocks | |
| | | | | Presence of constant non-unit stride data access | 1 | | |
| | | Inefficient Vectorization | |
| | | | | Presence of special instructions executing on a single port | 1 | | |
▶ggml-quants.c: 216 - 0.18 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 729 | 0.09 | 0.04 | 0.18 | 95 | 22.5 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-base.so - 729) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Presence of a large number of scalar integer instructions | 1 |
| | | | | | | Control Flow Issues | |
| | | | | | | Presence of calls | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | More than 10% of the vector load instructions are unaligned | 1 |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | | | | More than 20% of the loads are accessing the stack | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of calls | 1 |
| | | | | | | Inefficient Vectorization | |
| | | | | | | Presence of special instructions executing on a single port | 1 |
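
The first diagnostic for this loop reports that fewer than 10% of its FP add/sub/mul operations use FMA. The following is a hedged illustration, not code from `quants.c`. A multiply-accumulate written as a single `acc += x[i] * y[i]` expression is the pattern a compiler may contract into one fused multiply-add. That requires a target with FMA enabled (for example `-mfma` or `-march=native`) and permitted FP contraction (for example `-ffp-contract=fast`). The function name `dot_f32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not taken from quants.c.
   Each iteration multiplies and adds in one expression. When FMA is
   available and FP contraction is allowed, the compiler may emit a fused
   multiply-add here instead of a separate multiply and add. */
float dot_f32(const float *x, const float *y, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}
```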
▶vec.h: 1190 - 0.16 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 1768 | 0.26 | 0.03 | 0.16 | 98.03 | 24.75 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1768) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Presence of expensive FP instructions | 1 |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Control Flow Issues | |
| | | | | | | Presence of 2 to 4 paths | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | More than 10% of the vector load instructions are unaligned | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of 2 to 4 paths | 1 |
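
The vec.h loop is flagged for having 2 to 4 execution paths, and this is also listed as a vectorization roadblock. One common way to remove a data-dependent branch is to express it as a select. A compiler can if-convert a select into a vector compare-and-blend or a max. The sketch below uses a ReLU-style clamp purely as a hypothetical example; `relu_branchy` and `relu_select` are illustrative names, not functions from `vec.h`. Whether either version is actually vectorized depends on the compiler, flags, and target.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: the if/else in the loop body creates two control-flow paths. */
void relu_branchy(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0.0f) {
            dst[i] = src[i];
        } else {
            dst[i] = 0.0f;
        }
    }
}

/* Select form: the same result as a conditional expression, which a
   compiler can lower to a compare plus blend (or a max) without a branch. */
void relu_select(float *dst, const float *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        float v = src[i];
        dst[i] = v > 0.0f ? v : 0.0f;
    }
}
```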
▶binary-ops.cpp: 10 - 0.12 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | 353 | 0.11 | 0.01 | 0.12 | 0 | 6.25 | | | | | | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 353) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | Loop Computation Issues | |
| | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
▶ggml-cpu.c: 1193 - 0.10 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 115 | 0.05 | 0.02 | 0.10 | 0 | 12.5 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 115) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Control Flow Issues | |
| | | | | | | Presence of calls | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | More than 20% of the loads are accessing the stack | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of calls | 1 |
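
For the ggml-cpu.c loop, the only vectorization roadblock reported is the presence of calls. Compilers generally cannot vectorize a loop that calls a function they cannot see, such as one invoked through a function pointer. A `static inline` helper defined in the same translation unit, by contrast, can be inlined so that the loop body contains only arithmetic. The sketch below is hypothetical: `apply_via_pointer`, `apply_inlined`, and `scale2` are illustrative names, not ggml functions. The inlining and vectorization it describes are typical compiler behavior, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*unary_fn)(float);

/* Small helper visible to the compiler, so it is eligible for inlining. */
static inline float scale2(float v) { return 2.0f * v; }

/* Indirect call per element: the callee is opaque at compile time,
   so this loop usually cannot be vectorized. */
void apply_via_pointer(float *dst, const float *src, size_t n, unary_fn f) {
    assert(f != NULL);
    for (size_t i = 0; i < n; i++) dst[i] = f(src[i]);
}

/* Direct call to an inlinable helper: after inlining, the loop body is a
   plain multiply that the compiler can vectorize. */
void apply_inlined(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = scale2(src[i]);
}
```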
▶ops.cpp: 8885 - 0.10 %
| Run orig_default | Run icx_default | Run aocc_9 | Run icx_1 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1709 | 0.04 | 0.01 | 0.10 | 0 | 6.25 | ||||||||||||||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1709) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Data Access Issues | |||||||||||||||||||||||
| Presence of indirect access | 1 | ||||||||||||||||||||||
| More than 20% of the loads are accessing the stack | 1 | ||||||||||||||||||||||
| Vectorization Roadblocks | |||||||||||||||||||||||
| Presence of indirect access | 1 | ||||||||||||||||||||||
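
The ops.cpp loop is flagged for indirect access, meaning some load addresses are computed from values that were themselves loaded from memory (as in `src[idx[i]]`). Such loads cannot be served by contiguous vector loads and need gather instructions or scalar loads instead. The hypothetical sketch below contrasts the indirect pattern with a unit-stride one. `gather_f32` and `copy_f32` are illustrative names, not ggml functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indirect access: the address of each load depends on idx[i], which is
   itself loaded from memory, so vectorizing this loop requires gathers.
   Precondition: every idx[i] is a valid index into src. */
void gather_f32(float *dst, const float *src, const int32_t *idx, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL && idx != NULL));
    for (size_t i = 0; i < n; i++) dst[i] = src[idx[i]];
}

/* Unit-stride access: consecutive addresses, so contiguous vector loads apply. */
void copy_f32(float *dst, const float *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
}
```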

