Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 71-76 | 8 | 14.99 | 12.69 | 66.84 | 12.5 | 26.56 | 1.87 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 8)
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 74-87 | 8 | 12.64 | 11.22 | 66.47 | 11.76 | 25.74 | 2.09 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 8)

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131 | 8 | 11.81 | 10.19 | 62.81 | 35.71 | 38.39 | 2.28 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 8)
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Analysis: Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | Count
Loop Computation Issues |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-160 | 39 | 1.71 | 1.24 | 7.62 | 0 | 20.83 | 2.03 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 39)
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Analysis: Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | Count
Loop Computation Issues |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1
Presence of a large number of scalar integer instructions | 1
Data Access Issues |
Presence of indirect access | 1
Vectorization Roadblocks |
Presence of indirect access | 1

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 112-116 | 38 | 1.50 | 1.11 | 6.59 | 0 | 20.83 | 3.27 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 38)

Analysis: Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | Count
Loop Computation Issues |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1
Presence of a large number of scalar integer instructions | 1
Data Access Issues |
Presence of indirect access | 1
Vectorization Roadblocks |
Presence of indirect access | 1

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 93-97 | 39 | 1.77 | 1.21 | 6.39 | 0 | 20.83 | 1.31 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 39)
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Analysis: Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | Count
Loop Computation Issues |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1
Presence of a large number of scalar integer instructions | 1
Data Access Issues |
Presence of indirect access | 1
Vectorization Roadblocks |
Presence of indirect access | 1

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 140-144 | 38 | 1.75 | 1.18 | 6.09 | 0 | 20.83 | 1.53 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 38)
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Analysis: Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | Count
Loop Computation Issues |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1
Presence of a large number of scalar integer instructions | 1
Data Access Issues |
Presence of indirect access | 1
Vectorization Roadblocks |
Presence of indirect access | 1

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 115-117 | 7 | 0.13 | 0.05 | 0.27 | 0 | 18.75 | 1.65 | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 7)
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Analysis: Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | Count
Control Flow Issues |
Vectorization Roadblocks |
Presence of more than 4 paths | 1

Run | Loop Source Regions | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | Optimizer Analysis
Neoverse V1 ACFL Ofast Base (250 iterations, 64 threads) | | | | | | | | | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast SoA (250 iterations, 64 threads) | | 16 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll (250 iterations, 64 threads) | | 16 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Neoverse V1 ACFL Ofast Manual Unroll + SoA (250 iterations, 64 threads) | | 9 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.