Run Neoverse V1 GCC Ofast Manual Unroll + SoA | Run Neoverse V1 ACFL Ofast Manual Unroll + SoA |
Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 74-87 | Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 74-87 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
2 | 1.70 | 1.31 | 60.50 | 0 | 23.61 | 438.83 | 8 | 1.39 | 1.35 | 66.79 | 11.76 | 25.74 | 382.71 |
Sum on 1 analyzed binary loop (kmeans-gcc-Ofast - 2) | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 8) |
Analysis | Count | Analysis | Count |
Loop Computation Issues | | Loop Computation Issues | |
Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | |
Run Neoverse V1 GCC Ofast Manual Unroll + SoA | Run Neoverse V1 ACFL Ofast Manual Unroll + SoA |
Loop Source Regions | | Loop Source Regions | |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| | | | | | | 38 | 0.21 | 0.16 | 7.89 | 0 | 0 | 27.07 |
No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Analysis | Count | Analysis | Count |
Run Neoverse V1 GCC Ofast Manual Unroll + SoA | Run Neoverse V1 ACFL Ofast Manual Unroll + SoA |
Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 113-116 | Loop Source Regions | |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
16 | 0.19 | 0.12 | 5.42 | 0 | 21.59 | 28.82 | | | | | | | |
Sum on 1 analyzed binary loop (kmeans-gcc-Ofast - 16) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Analysis | Count | Analysis | Count |
Loop Computation Issues | | | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
Presence of a large number of scalar integer instructions | 1 | | |
Data Access Issues | | | |
Presence of indirect access | 1 | | |
Vectorization Roadblocks | | | |
Presence of indirect access | 1 | | |