Loop Source Regions - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131

Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
Run Skylake ICPX Ofast AoS (base) | | | | | | | |
Run Skylake ICPX Ofast SoA | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll | 24 | 21.74 | 22.58 | 80.25 | 80 | 42.08 | 391.09 |
Run Skylake ICPX Ofast Manual Unroll + SoA | 24 | 21.94 | 22.37 | 79.60 | 80 | 42.08 | 391.9 |

Run | Optimizer analysis |
Run Skylake ICPX Ofast AoS (base) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 24) |
Run Skylake ICPX Ofast Manual Unroll + SoA | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 24) |

Analysis (count) | Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll | Run Skylake ICPX Ofast Manual Unroll + SoA |
Loop Computation Issues | | | | |
Presence of a large number of scalar integer instructions | | | 1 | 1 |
Low iteration count | | | 1 | 1 |
Control Flow Issues | | | | |
Low iteration count | | | 1 | 1 |
Data Access Issues | | | | |
Presence of special instructions executing on a single port | | | 1 | 1 |
Inefficient Vectorization | | | | |
Presence of special instructions executing on a single port | | | 1 | 1 |
Use of masked instructions | | | 1 | 1 |

Loop Source Regions - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 71-76

Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
Run Skylake ICPX Ofast AoS (base) | | | | | | | |
Run Skylake ICPX Ofast SoA | 28 | 88.95 | 89.38 | 90.46 | 19.15 | 16.49 | 84.26 |
Run Skylake ICPX Ofast SoA | 27 | 2.94 | 2.79 | 2.83 | 0 | 11.61 | 171.47 |
Run Skylake ICPX Ofast Manual Unroll | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll + SoA | | | | | | | |

Run | Optimizer analysis |
Run Skylake ICPX Ofast AoS (base) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast SoA | Sum on 2 analyzed binary loops (kmeans-icpx-Ofast - 28, kmeans-icpx-Ofast - 27) |
Run Skylake ICPX Ofast Manual Unroll | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll + SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

Analysis (count) | Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll | Run Skylake ICPX Ofast Manual Unroll + SoA |

Loop Source Regions - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 116-122

Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
Run Skylake ICPX Ofast AoS (base) | 26 | 102.54 | 82.88 | 79.19 | 57.89 | 18.86 | 78.44 |
Run Skylake ICPX Ofast SoA | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll + SoA | | | | | | | |

Run | Optimizer analysis |
Run Skylake ICPX Ofast AoS (base) | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 26) |
Run Skylake ICPX Ofast SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll + SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

Analysis (count) | Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll | Run Skylake ICPX Ofast Manual Unroll + SoA |
Loop Computation Issues | | | | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | |
Control Flow Issues | | | | |
Presence of more than 4 paths | 1 | | | |
Data Access Issues | | | | |
Presence of special instructions executing on a single port | 1 | | | |
Vectorization Roadblocks | | | | |
Presence of more than 4 paths | 1 | | | |
Inefficient Vectorization | | | | |
Presence of special instructions executing on a single port | 1 | | | |

Loop Source Regions - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-161

Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
Run Skylake ICPX Ofast AoS (base) | | | | | | | |
Run Skylake ICPX Ofast SoA | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll | 21 | 4.70 | 2.96 | 10.51 | 0 | 11.61 | 9.62 |
Run Skylake ICPX Ofast Manual Unroll + SoA | 21 | 4.59 | 3.12 | 11.11 | 0 | 11.61 | 9.63 |

Run | Optimizer analysis |
Run Skylake ICPX Ofast AoS (base) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21) |
Run Skylake ICPX Ofast Manual Unroll + SoA | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21) |

Analysis (count) | Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll | Run Skylake ICPX Ofast Manual Unroll + SoA |
Loop Computation Issues | | | | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | | | 1 | 1 |
Presence of a large number of scalar integer instructions | | | 1 | 1 |
Data Access Issues | | | | |
Presence of indirect access | | | 1 | 1 |
Vectorization Roadblocks | | | | |
Presence of indirect access | | | 1 | 1 |

Loop Source Regions - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 93-98

Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
Run Skylake ICPX Ofast AoS (base) | | | | | | | |
Run Skylake ICPX Ofast SoA | 25 | 5.03 | 2.93 | 2.97 | 0 | 12.5 | 9.8 |
Run Skylake ICPX Ofast Manual Unroll | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll + SoA | | | | | | | |

Run | Optimizer analysis |
Run Skylake ICPX Ofast AoS (base) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast SoA | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 25) |
Run Skylake ICPX Ofast Manual Unroll | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll + SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

Analysis (count) | Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll | Run Skylake ICPX Ofast Manual Unroll + SoA |
Loop Computation Issues | | | | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | | 1 | | |
Data Access Issues | | | | |
Presence of indirect access | | 1 | | |
Vectorization Roadblocks | | | | |
Presence of indirect access | | 1 | | |

Loop Source Regions - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 140-145

Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
Run Skylake ICPX Ofast AoS (base) | 21 | 3.94 | 2.13 | 2.03 | 0 | 11.61 | 10.17 |
Run Skylake ICPX Ofast SoA | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll + SoA | | | | | | | |

Run | Optimizer analysis |
Run Skylake ICPX Ofast AoS (base) | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21) |
Run Skylake ICPX Ofast SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll + SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

Analysis (count) | Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll | Run Skylake ICPX Ofast Manual Unroll + SoA |
Loop Computation Issues | | | | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | |
Presence of a large number of scalar integer instructions | 1 | | | |
Data Access Issues | | | | |
Presence of indirect access | 1 | | | |
Vectorization Roadblocks | | | | |
Presence of indirect access | 1 | | | |

Loop Source Regions - (none listed for any run)

Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
Run Skylake ICPX Ofast AoS (base) | | | | | | | |
Run Skylake ICPX Ofast SoA | 23 | 0.01 | 0.00 | 0.00 | 0 | 0 | 0 |
Run Skylake ICPX Ofast Manual Unroll | | | | | | | |
Run Skylake ICPX Ofast Manual Unroll + SoA | 22 | 0.00 | 0.00 | 0.00 | 0 | 0 | NA |

Run | Optimizer analysis |
Run Skylake ICPX Ofast AoS (base) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Run Skylake ICPX Ofast Manual Unroll + SoA | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

Analysis (count) | Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll | Run Skylake ICPX Ofast Manual Unroll + SoA |