Loops
▶repack.cpp: 125 - 127.26 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2594 | 0.04 | 0.02 | 0.61 | 99.51 | 83.19 | 2985 | 1.64 | 1.40 | 40.82 | 99.19 | 85.93 | 2218 | 0.05 | 0.02 | 0.58 | 99.51 | 83.19 | ||||||
| 2591 | 1.64 | 1.41 | 41.58 | 99.23 | 80.75 | 2990 | 0.04 | 0.02 | 0.65 | 99.48 | 85.43 | 2215 | 2.10 | 1.63 | 43.02 | 99.23 | 80.75 | ||||||
| Sum on 2 analyzed binary loops (libggml-cpu.so - 2594, libggml-cpu.so - 2591) | Sum on 2 analyzed binary loops (libggml-cpu.so - 2985, libggml-cpu.so - 2990) | Sum on 2 analyzed binary loops (libggml-cpu.so - 2218, libggml-cpu.so - 2215) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Data Access Issues | | Data Access Issues | | Data Access Issues | | | | ||||||||||||||||
| More than 10% of the vector loads instructions are unaligned | 1 | More than 10% of the vector loads instructions are unaligned | 0 | More than 10% of the vector loads instructions are unaligned | 1 | ||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| More than 20% of the loads are accessing the stack | 1 | More than 20% of the loads are accessing the stack | 1 | More than 20% of the loads are accessing the stack | 1 | ||||||||||||||||||
| Inefficient Vectorization | | Inefficient Vectorization | | Inefficient Vectorization | | | | ||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Use of masked instructions | 1 | Use of masked instructions | 1 | Use of masked instructions | 1 | ||||||||||||||||||
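The header figure for this source line (127.26 %) exceeds 100 % because it appears to add up the Cov (%) column across all runs. This reading is inferred from the numbers in the table, not from the profiler's documentation. A minimal Python sketch reproducing it under that assumption:

```python
# Cov (%) of each binary loop attributed to repack.cpp:125, per run,
# copied from the table above (icx_8 reports no loop for this line).
coverage_by_run = {
    "orig_default": [0.61, 41.58],  # ASM loops 2594, 2591
    "icx_default": [40.82, 0.65],   # ASM loops 2985, 2990
    "aocc_7": [0.58, 43.02],        # ASM loops 2218, 2215
    "icx_8": [],
}

# Assumption: a run's share is the sum of its loops' coverage, and the
# header figure is the sum of those shares over all runs.
per_run = {run: round(sum(vals), 2) for run, vals in coverage_by_run.items()}
header_pct = round(sum(per_run.values()), 2)

for run, pct in per_run.items():
    print(f"{run}: {pct:.2f} %")
print(f"header: {header_pct:.2f} %")
```

The same rule matches the other headers, e.g. quants.c: 2150 gives 12.78 + 13.35 + 11.38 = 37.51 %.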
▶quants.c: 682 - 63.80 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 3699 | 5.42 | 4.89 | 63.80 | 93.02 | 21.37 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3699) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | | ||||||||||||||||
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| | | | | | | Data Access Issues | | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
| | | | | | | Vectorization Roadblocks | | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
| | | | | | | Inefficient Vectorization | | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
▶quants.c: 2150 - 37.51 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2545 | 0.48 | 0.43 | 12.78 | 97.09 | 43.87 | 2936 | 0.49 | 0.46 | 13.35 | 96.81 | 44.08 | 2166 | 0.49 | 0.43 | 11.38 | 97.09 | 43.87 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2545) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2936) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2166) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Data Access Issues | | Data Access Issues | | Data Access Issues | | | | ||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
| Presence of indirect access | 1 | Presence of indirect access | 1 | Presence of indirect access | 1 | ||||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | | | | ||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
| Presence of indirect access | 1 | Presence of indirect access | 1 | Presence of indirect access | 1 | ||||||||||||||||||
| Inefficient Vectorization | | Inefficient Vectorization | | Inefficient Vectorization | | | | ||||||||||||||||
| Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | Presence of special instructions executing on a single port | 1 | ||||||||||||||||||
▶repack.cpp: 153 - 12.79 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2584 | 0.28 | 0.15 | 4.37 | 0 | 0 | 2975 | 0.25 | 0.15 | 4.28 | 0 | 0 | 2208 | 0.28 | 0.16 | 4.14 | 0 | 0 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2584) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2975) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2208) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶quants.c: 716 - 11.40 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 494 | 0.90 | 0.87 | 11.40 | 46.92 | 13.2 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 494) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | | ||||||||||||||||
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| | | | | | | Presence of a large number of scalar integer instructions | 1 | ||||||||||||||||
| | | | | | | Data Access Issues | | ||||||||||||||||
| | | | | | | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
| | | | | | | More than 20% of the loads are accessing the stack | 1 | ||||||||||||||||
| | | | | | | Vectorization Roadblocks | | ||||||||||||||||
| | | | | | | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
| | | | | | | Inefficient Vectorization | | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
▶<unknown>: 0 - 2.54 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2485 | 0.02 | 0.00 | 0.01 | 0 | 0 | 2395 | 0.02 | 0.00 | 0.01 | 0 | 0 | 2426 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4113 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1282 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2155 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2417 | 0.01 | 0.00 | 0.00 | 0 | 0 | 441 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1126 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2189 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2197 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4116 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2741 | 0.02 | 0.00 | 0.01 | 0 | 0 | 2298 | 0.02 | 0.00 | 0.01 | 0 | 0 | 2330 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4128 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2738 | 0.00 | 0.00 | 0.00 | 0 | 0 | 444 | 0.06 | 0.00 | 0.08 | 0 | 0 | 2517 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4122 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 0 | 0.01 | 0.00 | 0.00 | 0 | 0 | 106 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2414 | 0.00 | 0.00 | 0.00 | 0 | 0 | 867 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 99 | 0.01 | 0.00 | 0.07 | 0 | 0 | 124 | 0.01 | 0.00 | 0.09 | 0 | 0 | 315 | 0.01 | 0.00 | 0.01 | 0 | 0 | 3515 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 485 | 0.02 | 0.00 | 0.02 | 0 | 0 | 1412 | 0.00 | 0.00 | 0.00 | 0 | 0 | 766 | 0.09 | 0.00 | 0.08 | 0 | 0 | 3514 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2563 | 0.08 | 0.00 | 0.03 | 0 | 0 | 1120 | 0.00 | 0.00 | 0.00 | 0 | 0 | 201 | 0.22 | 0.00 | 0.07 | 0 | 0 | 2855 | 0.04 | 0.01 | 0.08 | 0 | 0 |
| 399 | 0.08 | 0.00 | 0.09 | 0 | 0 | 873 | 0.05 | 0.00 | 0.07 | 0 | 0 | 1271 | 0.03 | 0.00 | 0.05 | 0 | 0 | 105 | 0.02 | 0.01 | 0.07 | 0 | 0 |
| 83 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2099 | 0.01 | 0.00 | 0.07 | 0 | 0 | 353 | 0.08 | 0.00 | 0.08 | 0 | 0 | 114 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 85 | 0.00 | 0.00 | 0.00 | 0 | 0 | 6 | 0.01 | 0.00 | 0.04 | 0 | 0 | 63 | 0.01 | 0.00 | 0.02 | 0 | 0 | 631 | 0.06 | 0.00 | 0.02 | 0 | 0 |
| 1731 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2980 | 0.02 | 0.00 | 0.09 | 0 | 0 | 433 | 0.02 | 0.00 | 0.02 | 0 | 0 | 2844 | 0.02 | 0.00 | 0.03 | 0 | 0 |
| 1197 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2094 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1094 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1635 | 0.13 | 0.00 | 0.06 | 0 | 0 |
| 1710 | 0.02 | 0.00 | 0.09 | 0 | 0 | 1418 | 0.02 | 0.00 | 0.02 | 0 | 0 | 5 | 0.01 | 0.00 | 0.03 | 0 | 0 | 116 | 0.01 | 0.00 | 0.04 | 0 | 0 |
| 1457 | 0.02 | 0.00 | 0.04 | 0 | 0 | 1707 | 0.02 | 0.00 | 0.05 | 0 | 0 | 1095 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3292 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| 359 | 0.01 | 0.00 | 0.01 | 0 | 0 | 401 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1466 | 0.00 | 0.00 | 0.00 | 0 | 0 | 109 | 0.01 | 0.00 | 0.03 | 0 | 0 |
| 1705 | 0.01 | 0.00 | 0.01 | 0 | 0 | 102 | 0.01 | 0.00 | 0.02 | 0 | 0 | 1440 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2442 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1196 | 0.04 | 0.00 | 0.03 | 0 | 0 | 558 | 0.03 | 0.00 | 0.03 | 0 | 0 | 2186 | 0.15 | 0.00 | 0.05 | 0 | 0 | 2689 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 5 | 0.01 | 0.00 | 0.02 | 0 | 0 | 2111 | 0.01 | 0.00 | 0.01 | 0 | 0 | 78 | 0.02 | 0.00 | 0.07 | 0 | 0 | 2444 | 0.02 | 0.00 | 0.01 | 0 | 0 |
| 82 | 0.01 | 0.00 | 0.03 | 0 | 0 | 257 | 0.27 | 0.00 | 0.09 | 0 | 0 | 2211 | 0.01 | 0.00 | 0.08 | 0 | 0 | 3296 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 766 | 0.07 | 0.00 | 0.08 | 0 | 0 | 2954 | 0.11 | 0.00 | 0.04 | 0 | 0 | 444 | 0.01 | 0.00 | 0.00 | 0 | 0 | 111 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 233 | 0.17 | 0.00 | 0.06 | 0 | 0 | 1947 | 0.01 | 0.00 | 0.00 | 0 | 0 | 525 | 0.00 | 0.00 | 0.00 | 0 | 0 | ||||||
| 2587 | 0.02 | 0.00 | 0.09 | 0 | 0 | 815 | 0.01 | 0.00 | 0.00 | 0 | 0 | 132 | 0.03 | 0.01 | 0.09 | 0 | 0 | ||||||
| 313 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3 | 0.03 | 0.00 | 0.04 | 0 | 0 | ||||||||||||
| 3309 | 0.01 | 0.00 | 0.02 | 0 | 0 | ||||||||||||||||||
| 893 | 0.03 | 0.00 | 0.01 | 0 | 0 | ||||||||||||||||||
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶vec.h: 491 - 1.49 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1724 | 0.05 | 0.02 | 0.49 | 100 | 75 | 2109 | 0.05 | 0.02 | 0.52 | 100 | 75 | 1459 | 0.04 | 0.02 | 0.48 | 100 | 75 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1724) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2109) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1459) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
▶vec.cpp: 311 - 0.98 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 762 | 0.04 | 0.01 | 0.34 | 100 | 66.67 | 866 | 0.04 | 0.01 | 0.36 | 100 | 66.67 | 758 | 0.04 | 0.01 | 0.28 | 100 | 66.67 | ||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 762) | Sum on 1 analyzed binary loop (libggml-cpu.so - 866) | Sum on 1 analyzed binary loop (libggml-cpu.so - 758) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | | | | ||||||||||||||||
| Low iteration count | | Low iteration count | 1 | Low iteration count | | | | ||||||||||||||||
| Control Flow Issues | | Control Flow Issues | | Control Flow Issues | | | | ||||||||||||||||
| Low iteration count | | Low iteration count | 1 | Low iteration count | | | | ||||||||||||||||
▶vec.cpp: 331 - 0.94 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 1621 | 0.20 | 0.07 | 0.94 | 5.88 | 8.82 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1621) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | | ||||||||||||||||
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| | | | | | | Data Access Issues | | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
| | | | | | | Vectorization Roadblocks | | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
▶vec.h: 508 - 0.91 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 3306 | 0.18 | 0.07 | 0.91 | 85.05 | 20.79 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 3306) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | | ||||||||||||||||
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| | | | | | | Data Access Issues | | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
| | | | | | | Vectorization Roadblocks | | ||||||||||||||||
| | | | | | | Presence of indirect access | 1 | ||||||||||||||||
| | | | | | | Inefficient Vectorization | | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
▶ops.cpp: 6220 - 0.53 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1463 | 0.04 | 0.01 | 0.28 | 6.67 | 7.5 | 1734 | 0.03 | 0.01 | 0.25 | 15.56 | 9.17 | ||||||||||||
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1463) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1734) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | | Loop Computation Issues | | | | | | ||||||||||||||||
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | ||||||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||||||
| Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 0 | ||||||||||||||||||||
| Control Flow Issues | | Control Flow Issues | | | | | | ||||||||||||||||
| Presence of calls | 1 | Presence of calls | 1 | ||||||||||||||||||||
| Data Access Issues | | Data Access Issues | | | | | | ||||||||||||||||
| More than 20% of the loads are accessing the stack | 1 | More than 20% of the loads are accessing the stack | 1 | ||||||||||||||||||||
| Vectorization Roadblocks | | Vectorization Roadblocks | | | | | | ||||||||||||||||
| Presence of calls | 1 | Presence of calls | 1 | ||||||||||||||||||||
▶ops.cpp: 6210 - 0.34 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | 1274 | 0.04 | 0.01 | 0.34 | 5.56 | 7.29 | ||||||
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1274) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | Loop Computation Issues | | | | ||||||||||||||||
| | | | | Presence of expensive FP instructions | 1 | | | ||||||||||||||||
| | | | | Presence of a large number of scalar integer instructions | 1 | | | ||||||||||||||||
| | | | | Control Flow Issues | | | | ||||||||||||||||
| | | | | Presence of calls | 1 | | | ||||||||||||||||
| | | | | Presence of more than 4 paths | 1 | | | ||||||||||||||||
| | | | | Data Access Issues | | | | ||||||||||||||||
| | | | | Presence of special instructions executing on a single port | 1 | | | ||||||||||||||||
| | | | | More than 20% of the loads are accessing the stack | 1 | | | ||||||||||||||||
| | | | | Vectorization Roadblocks | | | | ||||||||||||||||
| | | | | Presence of calls | 1 | | | ||||||||||||||||
| | | | | Presence of more than 4 paths | 1 | | | ||||||||||||||||
| | | | | Inefficient Vectorization | | | | ||||||||||||||||
| | | | | Presence of special instructions executing on a single port | 1 | | | ||||||||||||||||
| | | | | Use of masked instructions | 1 | | | ||||||||||||||||
▶ggml-quants.c: 216 - 0.12 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | | | | | | | 638 | 0.04 | 0.01 | 0.12 | 95 | 22.5 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-base.so - 638) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | | | Loop Computation Issues | | ||||||||||||||||
| | | | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| | | | | | | Presence of a large number of scalar integer instructions | 1 | ||||||||||||||||
| | | | | | | Control Flow Issues | | ||||||||||||||||
| | | | | | | Presence of calls | 1 | ||||||||||||||||
| | | | | | | Data Access Issues | | ||||||||||||||||
| | | | | | | More than 10% of the vector loads instructions are unaligned | 1 | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
| | | | | | | More than 20% of the loads are accessing the stack | 1 | ||||||||||||||||
| | | | | | | Vectorization Roadblocks | | ||||||||||||||||
| | | | | | | Presence of calls | 1 | ||||||||||||||||
| | | | | | | Inefficient Vectorization | | ||||||||||||||||
| | | | | | | Presence of special instructions executing on a single port | 1 | ||||||||||||||||
▶ops.cpp: 8885 - 0.11 %
| Run orig_default | Run icx_default | Run aocc_7 | Run icx_8 | ||||||||||||||||||||
| Loop Source Regions | Loop Source Regions | Loop Source Regions | Loop Source Regions | ||||||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | 1445 | 0.03 | 0.00 | 0.11 | 0 | 6.25 | ||||||
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 1445) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| | | | | Data Access Issues | | | | ||||||||||||||||
| | | | | Presence of indirect access | 1 | | | ||||||||||||||||
| | | | | More than 20% of the loads are accessing the stack | 1 | | | ||||||||||||||||
| | | | | Vectorization Roadblocks | | | | ||||||||||||||||
| | | | | Presence of indirect access | 1 | | | ||||||||||||||||

