| Run | Loop Source Regions |
|---|---|
| orig_default | `/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/build/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 96-96<br>`/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/build/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 111-142<br>`/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/build/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 151-258 |
| gcc_default | `/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 898-898<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 2943-2943<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 6853-6853<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 7154-7154<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 9945-9945<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 10568-10568<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 10805-10805<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 12531-12531<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 15594-15594<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 16191-16191<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 17374-17374<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 17537-17537<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 22069-22069<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 22134-22134<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 24797-24797<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 24809-24809<br>`/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/gcc/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 96-96<br>`/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/gcc/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 262-262 |
| armclang_3 | `/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/armclang_3/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 96-96<br>`/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/armclang_3/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 111-142<br>`/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/armclang_3/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 151-258 |
| gcc_5 | `/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/gcc_5/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 96-96<br>`/home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-47-249.ec2.internal/176-138-2040/llama.cpp/build/gcc_5/_deps/kleidiai_download-src/kai/ukernels/matmul/pack/kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c`: 262-262<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 898-898<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 2943-2943<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 6853-6853<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 7154-7154<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 9945-9945<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 10568-10568<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 10805-10805<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 12531-12531<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 15594-15594<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 16191-16191<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 17374-17374<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 17537-17537<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 22069-22069<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 22134-22134<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 24797-24797<br>`/opt/arm/gcc-14.2.0_AmazonLinux-2023/lib/gcc/aarch64-linux-gnu/14.2.0/include/arm_neon.h`: 24809-24809 |
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2317 | 0.23 | 0.01 | 0.18 | 77.23 | 96.44 |
| gcc_default | 2158 | 0.18 | 0.01 | 0.16 | 76.21 | 96.75 |
| armclang_3 | 2496 | 0.21 | 0.01 | 0.18 | 77.23 | 96.44 |
| gcc_5 | 2066 | 0.20 | 0.01 | 0.18 | 76.21 | 96.75 |
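To compare the four builds on this loop, one simple option is to divide the baseline's "Max Time Over Threads" by each run's value. The sketch below does that, assuming orig_default is the baseline; the dictionary and function names are illustrative only and are not part of QaaS. The report rounds times to two decimals, so the resulting ratios are approximate.

```python
# Max Time Over Threads (s) for the loop, copied from the table above.
max_time_s = {
    "orig_default": 0.23,
    "gcc_default": 0.18,
    "armclang_3": 0.21,
    "gcc_5": 0.20,
}


def relative_speedup(times, baseline):
    """Return baseline_time / run_time for every run (values > 1 mean faster than baseline)."""
    base = times[baseline]
    return {run: base / t for run, t in times.items()}


speedups = relative_speedup(max_time_s, "orig_default")
for run, s in speedups.items():
    print(f"{run:>13}: {s:.2f}x")
```

Keep in mind that this loop has a coverage of at most 0.18% and contributes about 0.01 s relative to wall time in every run, so per-loop differences of this size have little effect on overall application time.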
| Analysis | orig_default count (sum on 1 analyzed binary loop, libggml-cpu.so loop 2317) | gcc_default count (sum on 1 analyzed binary loop, libggml-cpu.so loop 2158) | armclang_3 count (sum on 1 analyzed binary loop, libggml-cpu.so loop 2496) | gcc_5 count (sum on 1 analyzed binary loop, libggml-cpu.so loop 2066) |
|---|---|---|---|---|
| Loop Computation Issues | | | | |
| Presence of expensive FP instructions | 1 | 1 | 1 | 1 |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 | 1 |
| Data Access Issues | | | | |
| Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 |
| Vectorization Roadblocks | | | | |
| Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 |
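"Presence of constant non-unit stride data access" is reported under both Data Access Issues and Vectorization Roadblocks. The toy sketch below shows what the term means. It lists the element offsets a simple loop touches and labels them by the gap between consecutive accesses. The 4 x 8 row-major block and both helper functions are hypothetical illustrations. They do not reflect how QaaS detects strides, and they are not the actual access pattern of the kleidiai packing kernel.

```python
def access_offsets(start, count, stride):
    """Element offsets read by a loop of the form:
    for i in range(count): use(src[start + i * stride])"""
    return [start + i * stride for i in range(count)]


def classify_stride(offsets):
    """Toy classifier: label an access sequence by the gap between consecutive offsets."""
    if len(offsets) < 2:
        return "undetermined"
    gaps = {b - a for a, b in zip(offsets, offsets[1:])}
    if len(gaps) != 1:
        return "irregular"
    (gap,) = gaps
    if gap == 1:
        return "unit stride"
    return f"constant non-unit stride ({gap})"


# Hypothetical 4 x 8 row-major block of floats (4 rows, 8 columns).
ROWS, COLS = 4, 8

row_walk = access_offsets(start=0, count=COLS, stride=1)        # along one row
column_walk = access_offsets(start=0, count=ROWS, stride=COLS)  # down one column

print(classify_stride(row_walk))
print(classify_stride(column_walk))
```

Walking down a column touches elements that are 8 apart. Their stride is constant, but they are not contiguous, so a single contiguous vector load cannot fetch them. For strides of 2 to 4 elements, AArch64 Neon provides de-interleaving structure loads (LD2/LD3/LD4). Larger strides generally require separate loads or extra shuffles, which explains why the analysis lists this pattern as a vectorization roadblock.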