Loops
mmq.cpp: 303 - 4.35 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 659 | 0.05 | 0.03 | 0.98 | 90.91 | 38.76 | 0 |
| aocc_default | 384 | 0.07 | 0.03 | 1.21 | 90.91 | 38.76 | 0 |
| icx_8 | 585 | 0.06 | 0.03 | 1.08 | 100 | 41.39 | 0 |
| aocc_3 | 386 | 0.07 | 0.03 | 1.08 | 90.91 | 38.76 | 0 |

Sum on 1 analyzed binary loop per run (libggml-cpu.so): orig_default 659, aocc_default 384, icx_8 585, aocc_3 386.

| Category | Analysis | Count (orig_default) | Count (aocc_default) | Count (icx_8) | Count (aocc_3) |
|---|---|---|---|---|---|
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 | 1 | 1 |
| Data Access Issues | More than 20% of the loads are accessing the stack | 1 | 1 | 0 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 | 1 | 1 |

vec.h: 491 - 1.85 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 2263 | 0.04 | 0.01 | 0.49 | 0 | 0 | 247.14 |
| aocc_default | 2437 | 0.04 | 0.01 | 0.38 | 0 | 0 | 212.22 |
| icx_8 | 2094 | 0.04 | 0.02 | 0.54 | 0 | 0 | 194.84 |
| aocc_3 | 2442 | 0.04 | 0.01 | 0.44 | 0 | 0 | 94.95 |

Sum on 1 analyzed binary loop per run (libggml-cpu.so): orig_default 2263, aocc_default 2437, icx_8 2094, aocc_3 2442.

Empty cells are runs for which the report lists no count.

| Category | Analysis | Count (orig_default) | Count (aocc_default) | Count (icx_8) | Count (aocc_3) |
|---|---|---|---|---|---|
| Control Flow Issues | Presence of 2 to 4 paths | | 1 | | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | | 1 | | 1 |

binary-ops.cpp: 18 - 1.60 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 833 | 0.03 | 0.01 | 0.35 | 0 | 6.25 | 29.28 |
| aocc_default | 914 | 0.04 | 0.01 | 0.39 | 0 | 6.25 | 26.75 |
| icx_8 | 785 | 0.04 | 0.01 | 0.49 | 0 | 6.25 | 21.21 |
| aocc_3 | 913 | 0.03 | 0.01 | 0.37 | 0 | 6.25 | 24.44 |

Sum on 1 analyzed binary loop per run (libggml-cpu.so): orig_default 833, aocc_default 914, icx_8 785, aocc_3 913.

Empty cells are runs for which the report lists no count.

| Category | Analysis | Count (orig_default) | Count (aocc_default) | Count (icx_8) | Count (aocc_3) |
|---|---|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 | 1 |
| Loop Computation Issues | Low iteration count | 0 | 0 | 1 | 0 |
| Control Flow Issues | Low iteration count | | | 1 | |

vec.h: 1084 - 1.13 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 1147 | 0.03 | 0.01 | 0.24 | 100 | 100 | 1544.62 |
| aocc_default | 1236 | 0.03 | 0.01 | 0.33 | 98 | 98.13 | 1149.33 |
| icx_8 | 1117 | 0.03 | 0.01 | 0.27 | 100 | 100 | 1343.85 |
| aocc_3 | 1299 | 0.03 | 0.01 | 0.29 | 98 | 98.13 | 1265.17 |

Sum on 1 analyzed binary loop per run (libggml-cpu.so): orig_default 1147, aocc_default 1236, icx_8 1117, aocc_3 1299.

Empty cells are runs for which the report lists no count.

| Category | Analysis | Count (orig_default) | Count (aocc_default) | Count (icx_8) | Count (aocc_3) |
|---|---|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 | 1 | 1 |
| Control Flow Issues | Presence of 2 to 4 paths | 1 | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | | 1 | | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | | 1 | | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | 1 | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 0 | 1 | 0 | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 0 | 1 | 0 | 1 |
| Inefficient Vectorization | Use of masked instructions | 1 | 1 | 1 | 1 |

vec.cpp: 311 - 1.13 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 1140 | 0.02 | 0.01 | 0.28 | 0 | 0 | 451.48 |
| aocc_default | 1232 | 0.05 | 0.01 | 0.26 | 0 | 0 | 376.98 |
| icx_8 | 1110 | 0.03 | 0.01 | 0.23 | 0 | 0 | 473.82 |
| aocc_3 | 1289 | 0.03 | 0.01 | 0.36 | 0 | 0 | 270.93 |

Sum on 1 analyzed binary loop per run (libggml-cpu.so): orig_default 1140, aocc_default 1232, icx_8 1110, aocc_3 1289.

Empty cells are runs for which the report lists no count.

| Category | Analysis | Count (orig_default) | Count (aocc_default) | Count (icx_8) | Count (aocc_3) |
|---|---|---|---|---|---|
| Control Flow Issues | Presence of 2 to 4 paths | 1 | | 1 | |
| Vectorization Roadblocks | Presence of 2 to 4 paths | 1 | | 1 | |

ops.cpp: 4325 - 1.05 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 1610 | 0.03 | 0.01 | 0.28 | 0 | 7.81 | 70.26 |
| aocc_default | 1758 | 0.03 | 0.01 | 0.28 | 0 | 7.81 | 68.46 |
| icx_8 | 1591 | 0.02 | 0.01 | 0.27 | 0 | 7.81 | 71.5 |
| aocc_3 | 1820 | 0.03 | 0.01 | 0.23 | 100 | 31.25 | 78.87 |

Sum on 1 analyzed binary loop per run (libggml-cpu.so): orig_default 1610, aocc_default 1758, icx_8 1591, aocc_3 1820.

| Category | Analysis | Count (orig_default) | Count (aocc_default) | Count (icx_8) | Count (aocc_3) |
|---|---|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 | 1 |

ops.cpp: 6220 - 1.00 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 1888 | 0.03 | 0.01 | 0.23 | 15.56 | 9.17 | 398.71 |
| aocc_default | 2113 | 0.03 | 0.01 | 0.38 | 1.96 | 6.62 | 337.83 |
| icx_8 | 1829 | 0.03 | 0.01 | 0.22 | 19.15 | 9.84 | 410.7 |
| aocc_3 | 2136 | 0.02 | 0.00 | 0.17 | 0 | 6.25 | 153.55 |

Sum on 1 analyzed binary loop (libggml-cpu.so): aocc_default 2113, icx_8 1829. No Optimizer analysis found for any assembly loop in orig_default or aocc_3. More loops can be analyzed using option --optimizer-loop-count.

| Category | Analysis | Count (aocc_default) | Count (icx_8) |
|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 |
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 |
| Control Flow Issues | Presence of calls | 1 | 1 |
| Data Access Issues | More than 20% of the loads are accessing the stack | 1 | 1 |
| Vectorization Roadblocks | Presence of calls | 1 | 1 |

quants.c: 298 - 0.91 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 3016 | 0.59 | 0.01 | 0.23 | 59.66 | 29.26 | 441.17 |
| aocc_default | 3307 | 0.52 | 0.01 | 0.21 | 58.33 | 28.75 | 500.48 |
| icx_8 | 2807 | 0.63 | 0.01 | 0.24 | 60.7 | 29.66 | 412.58 |
| aocc_3 | 3325 | 0.58 | 0.01 | 0.23 | 60.7 | 29.66 | 442.42 |

Sum on 1 analyzed binary loop (libggml-cpu.so): icx_8 2807, aocc_3 3325. No Optimizer analysis found for any assembly loop in orig_default or aocc_default. More loops can be analyzed using option --optimizer-loop-count.

| Category | Analysis | Count (icx_8) | Count (aocc_3) |
|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 |
| Control Flow Issues | Presence of 2 to 4 paths | 1 | 1 |
| Data Access Issues | More than 10% of the vector loads instructions are unaligned | 1 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | 1 | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 |

ggml-cpu.c: 3204 - 0.82 %

Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 6 | 0.02 | 0.01 | 0.22 | 100 | 100 | 0 |
| aocc_default | 5 | 0.03 | 0.01 | 0.23 | 100 | 100 | 1.27 |
| icx_8 | 6 | 0.02 | 0.00 | 0.16 | 100 | 100 | 0 |
| aocc_3 | 5 | 0.03 | 0.01 | 0.21 | 100 | 100 | 0 |

Sum on 1 analyzed binary loop (libggml-cpu.so): aocc_3 5. No Optimizer analysis found for any assembly loop in orig_default, aocc_default, or icx_8. More loops can be analyzed using option --optimizer-loop-count.

| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||||||

