Loops
▶sgemm.cpp: 138 - 196.18 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2364 | 8.41 | 6.13 | 65.31 | 0 | 0 | 2688 | 8.35 | 6.15 | 65.27 | 0 | 0 | 2437 | 8.36 | 6.13 | 65.60 | 0 | 0 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2364) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2688) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2437) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶quants.c: 1066 - 76.49 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 3755 | 9.12 | 8.05 | 76.49 | 88.89 | 20.14 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3755) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Inefficient Vectorization | |
| | | | | | | Presence of special instructions executing on a single port | 1 |
▶vec.h: 508 - 1.49 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 3409 | 0.31 | 0.16 | 1.49 | 4.17 | 7.03 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3409) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Presence of a large number of scalar integer instructions | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of indirect access | 1 |
▶<unknown>: 0 - 1.34 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2749 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2395 | 0.02 | 0.00 | 0.00 | 0 | 0 | 2659 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4074 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2640 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2155 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2756 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4879 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2738 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2153 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2660 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4393 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2450 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2298 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2650 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4210 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2638 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1947 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2759 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4391 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2741 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1948 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2511 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4413 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 765 | 0.14 | 0.00 | 0.05 | 0 | 0 | 813 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2861 | 0.02 | 0.00 | 0.00 | 0 | 0 | 3782 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 484 | 0.01 | 0.00 | 0.00 | 0 | 0 | 815 | 0.00 | 0.00 | 0.00 | 0 | 0 | 99 | 0.02 | 0.01 | 0.06 | 0 | 0 | 3781 | 0.02 | 0.00 | 0.00 | 0 | 0 |
| 90 | 0.01 | 0.00 | 0.02 | 0 | 0 | 709 | 0.00 | 0.00 | 0.00 | 0 | 0 | 357 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1 | 0.02 | 0.00 | 0.02 | 0 | 0 |
| 398 | 0.07 | 0.00 | 0.03 | 0 | 0 | 444 | 0.08 | 0.00 | 0.03 | 0 | 0 | 1530 | 0.01 | 0.00 | 0.02 | 0 | 0 | 633 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 85 | 0.01 | 0.00 | 0.01 | 0 | 0 | 114 | 0.02 | 0.00 | 0.03 | 0 | 0 | 90 | 0.01 | 0.00 | 0.03 | 0 | 0 | 2819 | 0.02 | 0.01 | 0.07 | 0 | 0 |
| 914 | 0.00 | 0.00 | 0.00 | 0 | 0 | 124 | 0.02 | 0.01 | 0.07 | 0 | 0 | 1803 | 0.01 | 0.00 | 0.01 | 0 | 0 | 64 | 0.03 | 0.01 | 0.09 | 0 | 0 |
| 1709 | 0.02 | 0.00 | 0.04 | 0 | 0 | 873 | 0.14 | 0.01 | 0.06 | 0 | 0 | 397 | 0.06 | 0.00 | 0.02 | 0 | 0 | 690 | 0.07 | 0.00 | 0.02 | 0 | 0 |
| 1456 | 0.01 | 0.00 | 0.02 | 0 | 0 | 2099 | 0.02 | 0.00 | 0.03 | 0 | 0 | 85 | 0.01 | 0.00 | 0.01 | 0 | 0 | 75 | 0.01 | 0.00 | 0.02 | 0 | 0 |
| 87 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2111 | 0.01 | 0.00 | 0.01 | 0 | 0 | 5 | 0.02 | 0.00 | 0.01 | 0 | 0 | 2795 | 0.02 | 0.00 | 0.02 | 0 | 0 |
| 1704 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1418 | 0.03 | 0.00 | 0.01 | 0 | 0 | 87 | 0.01 | 0.00 | 0.00 | 0 | 0 | 93 | 0.02 | 0.01 | 0.05 | 0 | 0 |
| 1196 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1707 | 0.01 | 0.00 | 0.02 | 0 | 0 | 1261 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2465 | 0.02 | 0.00 | 0.01 | 0 | 0 |
| 1195 | 0.04 | 0.00 | 0.01 | 0 | 0 | 1412 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1262 | 0.00 | 0.00 | 0.00 | 0 | 0 | 68 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 358 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1120 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1773 | 0.01 | 0.00 | 0.01 | 0 | 0 | 3630 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 99 | 0.02 | 0.01 | 0.05 | 0 | 0 | 558 | 0.02 | 0.00 | 0.01 | 0 | 0 | 1782 | 0.02 | 0.00 | 0.03 | 0 | 0 | 2463 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1730 | 0.01 | 0.00 | 0.01 | 0 | 0 | 108 | 0.01 | 0.00 | 0.00 | 0 | 0 | 483 | 0.04 | 0.00 | 0.01 | 0 | 0 | 3400 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 5 | 0.02 | 0.00 | 0.01 | 0 | 0 | 2094 | 0.01 | 0.00 | 0.00 | 0 | 0 | 819 | 0.14 | 0.01 | 0.06 | 0 | 0 | 3404 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 838 | 0.01 | 0.00 | 0.00 | 0 | 0 | 401 | 0.00 | 0.00 | 0.01 | 0 | 0 | 738 | 0.00 | 0.00 | 0.00 | 0 | 0 | 76 | 0.02 | 0.00 | 0.04 | 0 | 0 |
| 810 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1734 | 0.03 | 0.01 | 0.09 | 0 | 0 | 1622 | 0.11 | 0.00 | 0.03 | 0 | 0 | ||||||
| 6 | 0.01 | 0.00 | 0.01 | 0 | 0 | 922 | 0.03 | 0.00 | 0.01 | 0 | 0 | ||||||||||||
| 113 | 0.01 | 0.00 | 0.01 | 0 | 0 | 3410 | 0.01 | 0.00 | 0.01 | 0 | 0 | ||||||||||||
| 70 | 0.01 | 0.00 | 0.00 | 0 | 0 | ||||||||||||||||||
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶quants.c: 298 - 1.06 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2527 | 0.08 | 0.03 | 0.36 | 58.66 | 28.88 | 2918 | 0.07 | 0.03 | 0.35 | 59.66 | 29.26 | 2600 | 0.07 | 0.03 | 0.35 | 60.7 | 29.66 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2527) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2918) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2600) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | |||||||||||||||||||||
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | ||||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | ||||||||||||||||||
| Control Flow Issues | Control Flow Issues | Control Flow Issues | |||||||||||||||||||||
| Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | ||||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | |||||||||||||||||||||
| Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | 1 | ||||||||||||||||||
| Inefficient Vectorization | Inefficient Vectorization | Inefficient Vectorization | |||||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
▶vec.h: 491 - 0.61 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1723 | 0.06 | 0.02 | 0.20 | 100 | 75 | 2109 | 0.06 | 0.02 | 0.21 | 100 | 75 | 1796 | 0.07 | 0.02 | 0.20 | 100 | 75 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1723) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2109) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1796) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶vec.cpp: 331 - 0.57 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 1610 | 0.19 | 0.06 | 0.57 | 5.88 | 8.82 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1610) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of indirect access | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of indirect access | 1 |
▶vec.cpp: 311 - 0.41 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 761 | 0.04 | 0.01 | 0.12 | 100 | 66.67 | 866 | 0.04 | 0.01 | 0.12 | 100 | 66.67 | 811 | 0.05 | 0.02 | 0.16 | 100 | 66.67 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 761) | Sum on 1 analyzed binary loop (libggml-cpu.so - 866) | Sum on 1 analyzed binary loop (libggml-cpu.so - 811) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | |||||||||||||||||||||
| Low iteration count | Low iteration count | 1 | Low iteration count | ||||||||||||||||||||
| Control Flow Issues | Control Flow Issues | Control Flow Issues | |||||||||||||||||||||
| Low iteration count | Low iteration count | 1 | Low iteration count | ||||||||||||||||||||
▶ggml-quants.c: 203 - 0.36 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 694 | 0.07 | 0.04 | 0.36 | 62.15 | 16.95 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-base.so - 694) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | |
| | | | | | | Presence of expensive FP instructions | 1 |
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | | | Presence of a large number of scalar integer instructions | 1 |
| | | | | | | Control Flow Issues | |
| | | | | | | Presence of calls | 1 |
| | | | | | | Presence of 2 to 4 paths | 1 |
| | | | | | | Data Access Issues | |
| | | | | | | Presence of special instructions executing on a single port | 1 |
| | | | | | | More than 20% of the loads are accessing the stack | 1 |
| | | | | | | Vectorization Roadblocks | |
| | | | | | | Presence of calls | 1 |
| | | | | | | Presence of 2 to 4 paths | 1 |
| | | | | | | Inefficient Vectorization | |
| | | | | | | Presence of special instructions executing on a single port | 1 |
▶ops.cpp: 6220 - 0.29 %
| Run orig_default | Run icx_default | Run aocc_3 | Run icx_5 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1462 | 0.05 | 0.01 | 0.15 | 6.67 | 7.5 | | | | | | | 1536 | 0.04 | 0.01 | 0.14 | 0 | 6.25 | | | | | | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1462) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1536) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | | | | Loop Computation Issues | | | |
| Presence of expensive FP instructions | 1 | | | Presence of expensive FP instructions | 1 | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | | Presence of a large number of scalar integer instructions | 1 | | |
| Control Flow Issues | | | | Control Flow Issues | | | |
| Presence of calls | 1 | | | Presence of calls | 1 | | |
| Data Access Issues | | | | Data Access Issues | | | |
| More than 20% of the loads are accessing the stack | 1 | | | More than 20% of the loads are accessing the stack | 1 | | |
| Vectorization Roadblocks | | | | Vectorization Roadblocks | | | |
| Presence of calls | 1 | | | Presence of calls | 1 | | |

