
| | Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 |
| --- | --- | --- | --- | --- |
| Loop Source Regions | | | - /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 1089-1089<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 1095-1112 | - /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51<br>- /scratch/users/amazouz/Tools/aarch64/compilers/install/gcc-14.2.0_Ubuntu-22.04/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h: 324-324<br>- /scratch/users/amazouz/Tools/aarch64/compilers/install/gcc-14.2.0_Ubuntu-22.04/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h: 4095-4095<br>- /scratch/users/amazouz/Tools/aarch64/compilers/install/gcc-14.2.0_Ubuntu-22.04/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h: 10238-10238<br>- /scratch/users/amazouz/Tools/aarch64/compilers/install/gcc-14.2.0_Ubuntu-22.04/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h: 12569-12569<br>- /scratch/users/amazouz/Tools/aarch64/compilers/install/gcc-14.2.0_Ubuntu-22.04/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h: 26920-26920<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 1089-1089<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 1106-1110 |
| ASM Loop ID | | | 2416 | 2103 |
| Max Time Over Threads (s) | | | 0.61 | 2.92 |
| Time w.r.t. Wall Time (s) | | | 0.18 | 2.71 |
| Cov (%) | | | 1.07 | 11.32 |
| Vect. Ratio (%) | | | 53.85 | 64.52 |
| Vector Length Use (%) | | | 63.46 | 70.16 |
| Optimizer Analysis | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 2416) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2103) |
| Analysis | Count | Count | Count | Count |
| Loop Computation Issues | | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | | | 0 | 1 |
| Presence of a large number of scalar integer instructions | | | 1 | 1 |
| Data Access Issues | | | | |
| Presence of constant non-unit stride data access | | | | 1 |
| Vectorization Roadblocks | | | | |
| Presence of constant non-unit stride data access | | | | 1 |


| | Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 |
| --- | --- | --- | --- | --- |
| Loop Source Regions | - /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 979-1002 | - /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 979-979<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 986-1000 | | |
| ASM Loop ID | 2221 | 2041 | | |
| Max Time Over Threads (s) | 0.87 | 3.49 | | |
| Time w.r.t. Wall Time (s) | 0.24 | 2.72 | | |
| Cov (%) | 1.22 | 10.60 | | |
| Vect. Ratio (%) | 0 | 0 | | |
| Vector Length Use (%) | 0 | 0 | | |
| Optimizer Analysis | Sum on 1 analyzed binary loop (libggml-cpu.so - 2221) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2041) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Count | Count | Count |

| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 2746 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1650 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3254 | 0.03 | 0.00 | 0.00 | 0 | 0 | 1897 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 3006 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3375 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3088 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1710 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2580 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3825 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3406 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4232 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2573 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3828 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3251 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3544 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 1264 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4130 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2888 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4202 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2882 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3962 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2881 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4053 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2583 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3432 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3109 | 0.03 | 0.00 | 0.00 | 0 | 0 | 4204 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2884 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3983 | 0.00 | 0.00 | 0.00 | 0 | 0 | 2507 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3906 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2764 | 0.02 | 0.00 | 0.00 | 0 | 0 | 4132 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3107 | 0.02 | 0.00 | 0.00 | 0 | 0 | 4610 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2763 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4159 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3322 | 0.05 | 0.00 | 0.00 | 0 | 0 | 4033 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2885 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2022 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1553 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3294 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 1410 | 0.00 | 0.00 | 0.00 | 0 | 0 | 419 | 0.04 | 0.00 | 0.01 | 0 | 0 | 1550 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3841 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 881 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1525 | 0.02 | 0.00 | 0.01 | 0 | 0 | 980 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4413 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 888 | 0.05 | 0.00 | 0.00 | 0 | 0 | 1523 | 0.01 | 0.00 | 0.00 | 0 | 0 | 296 | 0.00 | 0.00 | 0.00 | 0 | 0 | 339 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 1720 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1516 | 0.01 | 0.00 | 0.00 | 0 | 0 | 53 | 0.01 | 0.00 | 0.00 | 0 | 0 | 845 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 54 | 0.00 | 0.00 | 0.00 | 0 | 0 | 70 | 0.01 | 0.00 | 0.00 | 0 | 0 | 72 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4156 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2200 | 0.00 | 0.00 | 0.00 | 0 | 0 | 493 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1593 | 0.00 | 0.00 | 0.00 | 0 | 0 | 35 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 415 | 0.01 | 0.00 | 0.00 | 0 | 0 | 785 | 0.00 | 0.00 | 0.00 | 0 | 0 | 439 | 0.04 | 0.00 | 0.00 | 0 | 0 | 37 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 53 | 0.00 | 0.00 | 0.00 | 0 | 0 | 62 | 0.01 | 0.00 | 0.01 | 0 | 0 | 1910 | 0.01 | 0.00 | 0.00 | 0 | 0 | 67 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 179 | 0.00 | 0.00 | 0.00 | 0 | 0 | 58 | 0.01 | 0.00 | 0.00 | 0 | 0 | 54 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1612 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1491 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2042 | 0.01 | 0.00 | 0.00 | 0 | 0 | 2401 | 0.00 | 0.00 | 0.00 | 0 | 0 | 820 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1148 | 0.01 | 0.00 | 0.00 | 0 | 0 | 918 | 0.08 | 0.00 | 0.01 | 0 | 0 | 60 | 0.01 | 0.00 | 0.02 | 0 | 0 |
| 1147 | 0.00 | 0.00 | 0.00 | 0 | 0 | | 90 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 754 | 0.03 | 0.00 | 0.01 | 0 | 0 | | 74 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 56 | 0.01 | 0.00 | 0.02 | 0 | 0 | | 2086 | 0.02 | 0.00 | 0.02 | 0 | 0 |
| 775 | 0.01 | 0.00 | 0.00 | 0 | 0 | | 431 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 382 | 0.00 | 0.00 | 0.00 | 0 | 0 | | 794 | 0.12 | 0.00 | 0.02 | 0 | 0 |
| 57 | 0.00 | 0.00 | 0.00 | 0 | 0 | | 788 | 0.02 | 0.00 | 0.02 | 0 | 0 |
| 1192 | 0.00 | 0.00 | 0.00 | 0 | 0 | | 95 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 326 | 0.00 | 0.00 | 0.00 | 0 | 0 | | 1617 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 3894 | 0.25 | 0.01 | 0.03 | 0 | 0 | | 238 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 897 | 0.00 | 0.00 | 0.00 | 0 | 0 | | 66 | 0.01 | 0.00 | 0.01 | 0 | 0 |
| | | 1622 | 0.02 | 0.00 | 0.01 | 0 | 0 |
| | | 830 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| | | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count |

| | Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 |
| --- | --- | --- | --- | --- |
| Loop Source Regions | | - /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1009-1023<br>- /scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1031-1034 | | |
| ASM Loop ID | | 756 | | |
| Max Time Over Threads (s) | | 0.39 | | |
| Time w.r.t. Wall Time (s) | | 0.01 | | |
| Cov (%) | | 0.05 | | |
| Vect. Ratio (%) | | 80 | | |
| Vector Length Use (%) | | 97.66 | | |
| Optimizer Analysis | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 756) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Count | Count | Count |
| Loop Computation Issues | | | | |
| Presence of expensive FP instructions | | 1 | | |
| Data Access Issues | | | | |
| Presence of constant non-unit stride data access | | 1 | | |
| Vectorization Roadblocks | | | | |
| Presence of constant non-unit stride data access | | 1 | | |


| | Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 |
| --- | --- | --- | --- | --- |
| Loop Source Regions | | - /scratch/users/amazouz/Tools/aarch64/compilers/install/gcc-14.2.0_Ubuntu-22.04/include/c++/14.2.0/bits/stl_algo.h: 1594-1595<br>- /scratch/users/amazouz/Tools/aarch64/compilers/install/gcc-14.2.0_Ubuntu-22.04/include/c++/14.2.0/bits/stl_heap.h: 262-267 | | |
| ASM Loop ID | | 3483 | | |
| Max Time Over Threads (s) | | 0.40 | | |
| Time w.r.t. Wall Time (s) | | 0.01 | | |
| Cov (%) | | 0.05 | | |
| Vect. Ratio (%) | | 0 | | |
| Vector Length Use (%) | | 30.36 | | |
| Optimizer Analysis | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libllama.so - 3483) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Count | Count | Count |
| Loop Computation Issues | | | | |
| Presence of a large number of scalar integer instructions | | 1 | | |
| Control Flow Issues | | | | |
| Presence of 2 to 4 paths | | 1 | | |
| Data Access Issues | | | | |
| Presence of constant non-unit stride data access | | 1 | | |
| Vectorization Roadblocks | | | | |
| Presence of 2 to 4 paths | | 1 | | |
| Presence of constant non-unit stride data access | | 1 | | |