OV - Compare Loops

MAQAO

Loops

main.cpp: 118 - 159.85 % (Cov summed over the four runs)

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)							Run Skylake ICPX Ofast SoA							Run Skylake ICPX Ofast Manual Unroll							Run Skylake ICPX Ofast Manual Unroll + SoA
Loop Source Regions							Loop Source Regions							Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131						Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131
														24	21.74	22.58	80.25	80	42.08	391.09	24	21.94	22.37	79.60	80	42.08	391.9

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 24)							Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 24)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
														Loop Computation Issues							Loop Computation Issues
														Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1
														Low iteration count						1	Low iteration count						1
														Control Flow Issues							Control Flow Issues
														Low iteration count						1	Low iteration count						1
														Data Access Issues							Data Access Issues
														Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1
														Inefficient Vectorization							Inefficient Vectorization
														Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1
														Use of masked instructions						1	Use of masked instructions						1

main.cpp: 71 - 93.29 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)							Run Skylake ICPX Ofast SoA							Run Skylake ICPX Ofast Manual Unroll							Run Skylake ICPX Ofast Manual Unroll + SoA
Loop Source Regions							Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 71-76						Loop Source Regions							Loop Source Regions
							28	88.95	89.38	90.46	19.15	16.49	84.26
							27	2.94	2.79	2.83	0	11.61	171.47

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 2 analyzed binary loops (kmeans-icpx-Ofast - 28, kmeans-icpx-Ofast - 27)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count

main.cpp: 116 - 79.19 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)							Run Skylake ICPX Ofast SoA							Run Skylake ICPX Ofast Manual Unroll							Run Skylake ICPX Ofast Manual Unroll + SoA
Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 116-122						Loop Source Regions							Loop Source Regions							Loop Source Regions
26	102.54	82.88	79.19	57.89	18.86	78.44

Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 26)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
Control Flow Issues
Presence of more than 4 paths						1
Data Access Issues
Presence of special instructions executing on a single port						1
Vectorization Roadblocks
Presence of more than 4 paths						1
Inefficient Vectorization
Presence of special instructions executing on a single port						1

main.cpp: 156 - 21.62 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)							Run Skylake ICPX Ofast SoA							Run Skylake ICPX Ofast Manual Unroll							Run Skylake ICPX Ofast Manual Unroll + SoA
Loop Source Regions							Loop Source Regions							Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-161						Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-161
														21	4.70	2.96	10.51	0	11.61	9.62	21	4.59	3.12	11.11	0	11.61	9.63

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21)							Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
														Loop Computation Issues							Loop Computation Issues
														Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
														Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1
														Data Access Issues							Data Access Issues
														Presence of indirect access						1	Presence of indirect access						1
														Vectorization Roadblocks							Vectorization Roadblocks
														Presence of indirect access						1	Presence of indirect access						1

main.cpp: 93 - 2.97 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)							Run Skylake ICPX Ofast SoA							Run Skylake ICPX Ofast Manual Unroll							Run Skylake ICPX Ofast Manual Unroll + SoA
Loop Source Regions							Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 93-98						Loop Source Regions							Loop Source Regions
							25	5.03	2.93	2.97	0	12.5	9.8

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 25)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
							Loop Computation Issues
							Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
							Data Access Issues
							Presence of indirect access						1
							Vectorization Roadblocks
							Presence of indirect access						1

main.cpp: 140 - 2.03 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)							Run Skylake ICPX Ofast SoA							Run Skylake ICPX Ofast Manual Unroll							Run Skylake ICPX Ofast Manual Unroll + SoA
Loop Source Regions	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 140-145						Loop Source Regions							Loop Source Regions							Loop Source Regions
21	3.94	2.13	2.03	0	11.61	10.17

Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
Presence of a large number of scalar integer instructions						1
Data Access Issues
Presence of indirect access						1
Vectorization Roadblocks
Presence of indirect access						1

<unknown>: 0 - 0.00 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)							Run Skylake ICPX Ofast SoA							Run Skylake ICPX Ofast Manual Unroll							Run Skylake ICPX Ofast Manual Unroll + SoA
Loop Source Regions							Loop Source Regions							Loop Source Regions							Loop Source Regions
							23	0.01	0.00	0.00	0	0	0								22	0.00	0.00	0.00	0	0	NA

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
