Loops
mmq.cpp: 1140 - 38.89 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 567 | 0.85 | 0.34 | 4.78 | 100 | 100 | 2332.33 |
| orig_default | 566 | 0.91 | 0.51 | 7.24 | 100 | 100 | 387.79 |
| aocc_default | 704 | 0.57 | 0.24 | 3.90 | 100 | 100 | 512.43 |
| aocc_default | 705 | 0.63 | 0.23 | 3.77 | 100 | 100 | 3054.79 |
| icx_5 | 497 | 0.81 | 0.34 | 4.81 | 100 | 100 | 2307.19 |
| icx_5 | 496 | 1.06 | 0.46 | 6.46 | 100 | 100 | 384.65 |
| aocc_6 | 653 | 0.67 | 0.24 | 4.01 | 100 | 100 | 2965.08 |
| aocc_6 | 652 | 0.68 | 0.23 | 3.91 | 100 | 100 | 540.52 |

| Category | Analysis | orig_default | aocc_default | icx_5 | aocc_6 |
|---|---|---|---|---|---|
| Sum on 2 analyzed binary loops (libggml-cpu.so) | ASM Loop IDs | 567, 566 | 704, 705 | 497, 496 | 653, 652 |
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 | | 1 | |
| Loop Computation Issues | Low iteration count | 0 | | 1 | |
| Control Flow Issues | Low iteration count | | | 1 | |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | | 1 |
| Data Access Issues | Presence of indirect access | 1 | 1 | | 1 |
| Vectorization Roadblocks | Presence of more than 4 paths | 0 | 0 | 1 | 0 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 0 | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 | 1 | 0 | 1 |
vec.h: 491 - 4.19 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 2263 | 0.19 | 0.08 | 1.14 | 0 | 0 | 179.6 |
| aocc_default | 2437 | 0.10 | 0.06 | 0.93 | 0 | 0 | 190.28 |
| icx_5 | 2208 | 0.16 | 0.08 | 1.15 | 0 | 0 | 203.89 |
| aocc_6 | 2015 | 0.17 | 0.06 | 0.97 | 0 | 0 | 204.95 |

| Category | Analysis | orig_default | aocc_default | icx_5 | aocc_6 |
|---|---|---|---|---|---|
| Sum on 1 analyzed binary loop (libggml-cpu.so) | ASM Loop ID | 2263 | 2437 | 2208 | 2015 |
| Control Flow Issues | Presence of 2 to 4 paths | | 1 | | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | | 1 | | 1 |
vec.cpp: 311 - 2.72 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 1140 | 0.07 | 0.03 | 0.41 | 0 | 0 | 274.68 |
| aocc_default | 1232 | 0.14 | 0.06 | 0.91 | 0 | 0 | 204.17 |
| icx_5 | 1146 | 0.07 | 0.03 | 0.42 | 0 | 0 | 188.63 |
| aocc_6 | 1247 | 0.13 | 0.06 | 0.98 | 0 | 0 | 196.21 |

| Category | Analysis | orig_default | aocc_default | icx_5 | aocc_6 |
|---|---|---|---|---|---|
| Sum on 1 analyzed binary loop (libggml-cpu.so) | ASM Loop ID | 1140 | 1232 | 1146 | 1247 |
| Control Flow Issues | Presence of 2 to 4 paths | 1 | | 1 | |
| Vectorization Roadblocks | Presence of 2 to 4 paths | 1 | | 1 | |
mmq.cpp: 303 - 1.79 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 659 | 0.06 | 0.03 | 0.39 | 90.91 | 38.76 | 0 |
| aocc_default | 384 | 0.05 | 0.03 | 0.44 | 90.91 | 38.76 | 0 |
| icx_5 | 594 | 0.06 | 0.03 | 0.44 | 90.91 | 38.76 | 0 |
| aocc_6 | 326 | 0.07 | 0.03 | 0.52 | 90.91 | 38.76 | 0 |

| Category | Analysis | orig_default | aocc_default | icx_5 | aocc_6 |
|---|---|---|---|---|---|
| Sum on 1 analyzed binary loop (libggml-cpu.so) | ASM Loop ID | 659 | 384 | 594 | 326 |
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 | 1 | 1 |
| Data Access Issues | More than 20% of the loads are accessing the stack | 1 | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 | 1 | 1 |
ops.cpp: 8885 - 1.28 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 2254 | 0.09 | 0.02 | 0.27 | 0 | 6.25 | 0.34 |
| aocc_default | 2419 | 0.09 | 0.02 | 0.36 | 0 | 6.25 | 0.24 |
| icx_5 | 2199 | 0.09 | 0.02 | 0.27 | 0 | 6.25 | 0.34 |
| aocc_6 | 1997 | 0.10 | 0.02 | 0.37 | 0 | 6.25 | 0.07 |

| Category | Analysis | orig_default | aocc_default | icx_5 | aocc_6 |
|---|---|---|---|---|---|
| Sum on 1 analyzed binary loop (libggml-cpu.so) | ASM Loop ID | 2254 | 2419 | 2199 | 1997 |
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 | 1 | 1 | 1 |
| Data Access Issues | Presence of indirect access | 1 | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 | 1 | 1 | 1 |
quants.c: 298 - 0.70 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 3016 | 1.17 | 0.01 | 0.18 | 59.66 | 29.26 | 456.09 |
| aocc_default | 3307 | 1.04 | 0.01 | 0.18 | 58.33 | 28.75 | 511.87 |
| icx_5 | 2984 | 1.09 | 0.01 | 0.16 | 60.7 | 29.66 | 489.16 |
| aocc_6 | 2730 | 1.03 | 0.01 | 0.18 | 60.7 | 29.66 | 508.19 |

| Category | Analysis | orig_default | aocc_default | icx_5 | aocc_6 |
|---|---|---|---|---|---|
| Sum on 1 analyzed binary loop (libggml-cpu.so) | ASM Loop ID | 3016 | 3307 | 2984 | 2730 |
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 | 1 | 1 |
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | 1 | 0 | 0 |
| Control Flow Issues | Presence of 2 to 4 paths | 1 | 0 | 1 | 1 |
| Control Flow Issues | Presence of more than 4 paths | 0 | 1 | 0 | 0 |
| Data Access Issues | More than 10% of the vector loads instructions are unaligned | 1 | 1 | 1 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | 1 | 0 | 1 | 1 |
| Vectorization Roadblocks | Presence of more than 4 paths | 0 | 1 | 0 | 0 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 | 1 | 1 |
vec.h: 1084 - 0.46 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 1147 | 0.06 | 0.01 | 0.11 | 100 | 100 | 2677.48 |
| aocc_default | 1236 | 0.06 | 0.01 | 0.11 | 98 | 98.13 | 3028.94 |
| icx_5 | 1153 | 0.07 | 0.01 | 0.10 | 100 | 100 | 2832.04 |
| aocc_6 | 1257 | 0.07 | 0.01 | 0.13 | 98 | 98.13 | 2706.91 |

All four runs: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

