| Run orig_default | Run aocc_default | Run gcc_default | Run aocc_7 | Run icx_3 | Run gcc_3 |
Loop Source Regions (orig_default):
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 346-346
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 355-355
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-347
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 353-355

Loop Source Regions (aocc_default):
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 346-346
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 355-355
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-347
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 353-355

Loop Source Regions (gcc_default):
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 354-354
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avx2intrin.h: 79-79
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avx2intrin.h: 86-86
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avx2intrin.h: 1046-1046
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-298
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 304-304
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 319-321
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/xmmintrin.h: 184-184
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/xmmintrin.h: 240-240
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/xmmintrin.h: 793-793
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/pmmintrin.h: 71-71
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 186-186
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 296-296
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 320-320
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 475-475
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 526-526
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 905-905
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 935-935
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 1066-1066
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 1329-1329
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 1472-1472

Loop Source Regions (aocc_7):
- (none listed)

Loop Source Regions (icx_3):
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 346-346
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 355-355
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-347
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 353-355

Loop Source Regions (gcc_3):
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 354-354
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avx2intrin.h: 79-79
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avx2intrin.h: 86-86
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avx2intrin.h: 1046-1046
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-298
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 304-304
- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-950-2189/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 319-321
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/xmmintrin.h: 184-184
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/xmmintrin.h: 240-240
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/xmmintrin.h: 793-793
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/pmmintrin.h: 71-71
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 186-186
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 296-296
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 320-320
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 475-475
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 526-526
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 905-905
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 935-935
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 1066-1066
- /cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h: 1329-1329

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orig_default | 2855 | 0.08 | 0.02 | 0.03 | 60.7 | 29.66 | 0.43 |
| aocc_default | 2527 | 0.08 | 0.02 | 0.03 | 58.66 | 28.88 | 0.46 |
| gcc_default | 1982 | 0.07 | 0.02 | 0.03 | 66.67 | 31.25 | 0.46 |
| aocc_7 | | | | | | | |
| icx_3 | 5745 | 0.09 | 0.02 | 0.03 | 60.7 | 29.66 | 0.5 |
| gcc_3 | 1943 | 0.08 | 0.02 | 0.03 | 59.65 | 29.28 | 0 |

| Run | Optimizer analysis scope |
| --- | --- |
| orig_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2855) |
| aocc_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2527) |
| gcc_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 1982) |
| aocc_7 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| icx_3 | Sum on 1 analyzed binary loop (libggml-cpu.so - 5745) |
| gcc_3 | Sum on 1 analyzed binary loop (libggml-cpu.so - 1943) |

| Category | Analysis (count per run) | orig_default | aocc_default | gcc_default | aocc_7 | icx_3 | gcc_3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 | 1 | | 1 | 1 |
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | 1 | 0 | | 0 | 0 |
| Control Flow Issues | Presence of 2 to 4 paths | 1 | 1 | 0 | | 1 | 0 |
| Control Flow Issues | Presence of more than 4 paths | 0 | 0 | 1 | | 0 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 | 1 | | 1 | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | 1 | 1 | 0 | | 1 | 0 |
| Vectorization Roadblocks | Presence of more than 4 paths | 0 | 0 | 1 | | 0 | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 | 1 | | 1 | 1 |
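
The per-run loop metrics reported above can be compared programmatically. The following is a minimal sketch, not part of the report tooling: the names `runs` and `rank_by` are hypothetical, the values are copied from the metrics row of this report, and aocc_7 is left out because no analyzed loop is listed for it.

```python
# Per-run loop metrics copied from the report's metrics row.
# aocc_7 is omitted: the report lists no analyzed loop for that run.
# `runs` and `rank_by` are illustrative names, not part of any tool.
runs = {
    "orig_default": {"loop_id": 2855, "vect_ratio": 60.7, "vec_len_use": 29.66, "gflops": 0.43},
    "aocc_default": {"loop_id": 2527, "vect_ratio": 58.66, "vec_len_use": 28.88, "gflops": 0.46},
    "gcc_default": {"loop_id": 1982, "vect_ratio": 66.67, "vec_len_use": 31.25, "gflops": 0.46},
    "icx_3": {"loop_id": 5745, "vect_ratio": 60.7, "vec_len_use": 29.66, "gflops": 0.5},
    "gcc_3": {"loop_id": 1943, "vect_ratio": 59.65, "vec_len_use": 29.28, "gflops": 0.0},
}


def rank_by(metric):
    """Return run names sorted from highest to lowest value of `metric`."""
    return sorted(runs, key=lambda name: runs[name][metric], reverse=True)


if __name__ == "__main__":
    for metric in ("vect_ratio", "vec_len_use", "gflops"):
        ordered = rank_by(metric)
        print(metric, [(name, runs[name][metric]) for name in ordered])
```

Ranking by `vect_ratio` puts gcc_default first and aocc_default last, while ranking by `gflops` puts icx_3 first. With a coverage of 0.03% per loop, these differences describe this one loop only and say little about whole-application time.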