OV - Compare Loops

Loops

▶sort.f90: 94 - 8.17 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run gcc_default							Run icx_9							Run gcc_1
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 94-108 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 118-119						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 94-110 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 118-122						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 94-108 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 118-119						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 94-110 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/Modules/sort.f90: 118-122
11793	0.66	0.66	1.77	19.05	13.1	31.33	11441	0.05	0.04	0.16	26.32	14.47	18.01	18274	0.67	0.67	1.80	10.53	11.84	30.92	12464	0.05	0.04	0.16	17.65	13.24	16.97
							11437	0.58	0.57	2.15	26.32	14.47	35.08								12460	0.57	0.56	2.12	17.65	13.24	35.75

Sum on 1 analyzed binary loop (exec - 11793)							Sum on 2 analyzed binary loops (exec - 11441, exec - 11437)							Sum on 1 analyzed binary loop (exec - 18274)							Sum on 2 analyzed binary loops (exec - 12464, exec - 12460)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Loop Computation Issues							Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1
Control Flow Issues							Control Flow Issues							Control Flow Issues							Control Flow Issues
Presence of more than 4 paths						1	Presence of more than 4 paths						1	Presence of more than 4 paths						1	Presence of more than 4 paths						1
Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
Presence of more than 4 paths						1	Presence of more than 4 paths						1	Presence of more than 4 paths						1	Presence of more than 4 paths						1
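
In the gcc_default and gcc_1 runs, the source loop at sort.f90: 94 is reported as two binary loops, so their values have to be added up before comparing against the single-loop runs. The following is a minimal sketch in Python, with the Cov (%) values copied from the table above. The dictionary layout and the helper name `total_coverage` are illustrative assumptions, not something the report tool provides.

```python
# Coverage (%) of each binary loop attributed to sort.f90: 94, per run.
# Values are copied from the table above; the data layout is an assumption.
coverage_by_run = {
    "orig_default": {11793: 1.77},
    "gcc_default": {11441: 0.16, 11437: 2.15},
    "icx_9": {18274: 1.80},
    "gcc_1": {12464: 0.16, 12460: 2.12},
}


def total_coverage(loops):
    """Sum the coverage of all binary loops of one run, rounded to 2 decimals."""
    return round(sum(loops.values()), 2)


# Hypothetical per-run totals used for a like-for-like comparison.
totals = {run: total_coverage(loops) for run, loops in coverage_by_run.items()}

for run, total in totals.items():
    print(f"{run}: {total:.2f} %")
```

The same per-run summation could be applied to the time columns. Only the per-loop figures come from the report; the totals are derived here.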

▶fft_helper_subroutines.f90: 499 - 2.59 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run gcc_default							Run icx_9							Run gcc_1
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_helper_subroutines.f90: 499-499						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_helper_subroutines.f90: 499-499						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_helper_subroutines.f90: 499-499						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_helper_subroutines.f90: 499-499
19559	0.31	0.22	0.59	100	25	0	18206	0.28	0.18	0.68	100	25	0	32206	0.28	0.22	0.59	100	25	0	20418	0.29	0.19	0.73	100	50	0

Sum on 1 analyzed binary loop (exec - 19559)							Sum on 1 analyzed binary loop (exec - 18206)							Sum on 1 analyzed binary loop (exec - 32206)							Sum on 1 analyzed binary loop (exec - 20418)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Data Access Issues							Data Access Issues							Data Access Issues							Data Access Issues
Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access
Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access

▶fft_scatter_2d.f90: 129 - 1.30 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run gcc_default							Run icx_9							Run gcc_1
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_scatter_2d.f90: 129-129						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_scatter_2d.f90: 129-129						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_scatter_2d.f90: 129-129						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/FFTXlib/src/fft_scatter_2d.f90: 129-129
19104	0.38	0.18	0.49	100	25	0	17813	0.14	0.06	0.24	100	25	0	31297	0.24	0.12	0.31	100	25	0	20012	0.11	0.07	0.26	100	50	0

Sum on 1 analyzed binary loop (exec - 19104)							Sum on 1 analyzed binary loop (exec - 17813)							Sum on 1 analyzed binary loop (exec - 31297)							Sum on 1 analyzed binary loop (exec - 20012)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Data Access Issues							Data Access Issues							Data Access Issues							Data Access Issues
Presence of constant non-unit stride data access							Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access							Presence of constant non-unit stride data access
Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
Presence of constant non-unit stride data access							Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access							Presence of constant non-unit stride data access

▶thread_util.f90: 29 - 0.90 %

ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default		Run gcc_default							Run icx_9		Run gcc_1
Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/UtilXlib/thread_util.f90: 29-29						Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/UtilXlib/thread_util.f90: 29-29
		19086	0.16	0.12	0.46	100	25	0			21399	0.14	0.11	0.43	100	50	0

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 19086)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 21399)
Analysis	Count	Analysis						Count	Analysis	Count	Analysis						Count
		Data Access Issues									Data Access Issues
		More than 10% of the vector loads instructions are unaligned						1			More than 10% of the vector loads instructions are unaligned						1

▶<unknown>: 0 - 0.75 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run gcc_default							Run icx_9							Run gcc_1
Loop Source Regions							Loop Source Regions							Loop Source Regions							Loop Source Regions
7839	0.06	0.02	0.04	0	0	27.46	8832	0.05	0.02	0.07	0	0	0	32269	0.04	0.00	0.00	0	0	0	15824	0.04	0.01	0.03	0	0	0
7849	0.06	0.02	0.05	0	0	28.07	6661	0.05	0.01	0.05	0	0	495.62	31290	0.05	0.00	0.00	0	0	0	20356	0.04	0.00	0.00	0	0	0
19097	0.05	0.00	0.00	0	0	0	13661	0.03	0.00	0.00	0	0	0	24795	0.04	0.00	0.00	0	0	0	5398	0.03	0.00	0.01	0	0	54.43
7375	0.04	0.00	0.00	0	0	127.39	14263	0.03	0.00	0.00	0	0	84.24	31300	0.04	0.00	0.00	0	0	0	19988	0.06	0.01	0.02	0	0	0
8882	0.05	0.01	0.02	0	0	933.86	17789	0.03	0.00	0.00	0	0	0	8563	0.04	0.00	0.00	0	0	25.36	3742	0.04	0.00	0.01	0	0	1556.03
9153	0.05	0.00	0.00	0	0	19.41	17797	0.04	0.00	0.01	0	0	0	31277	0.05	0.00	0.00	0	0	0	20005	0.07	0.01	0.04	0	0	0
8886	0.06	0.00	0.01	0	0	915.3	8285	0.03	0.00	0.00	0	0	25.27	31285	0.04	0.00	0.00	0	0	0	15819	0.03	0.00	0.00	0	0	80.03
15112	0.06	0.02	0.06	0	0	421.95	4682	0.03	0.00	0.00	0	0	226.59	5984	0.04	0.00	0.00	0	0	1239.35	20014	0.04	0.00	0.00	0	0	0
3963	0.06	0.01	0.02	0	0	1086.74	18144	0.04	0.00	0.00	0	0	0	13575	0.04	0.00	0.00	0	0	51.64	15145	0.03	0.00	0.00	0	0	0
19603	0.04	0.00	0.00	0	0	0	14920	0.03	0.00	0.00	0	0	109.51	24880	0.05	0.01	0.04	0	0	1476.7	19975	0.04	0.00	0.00	0	0	0
15122	0.04	0.00	0.00	0	0	0	17815	0.04	0.00	0.00	0	0	0	39195	0.04	0.00	0.00	0	0	0	7260	0.04	0.00	0.01	0	0	186.27
9756	0.04	0.00	0.00	0	0	0	4943	0.03	0.00	0.00	0	0	235.86	13203	0.04	0.00	0.00	0	0	1204.56	15887	0.05	0.01	0.04	0	0	1956.27
							3394	0.05	0.01	0.05	0	0	1349.62	13210	0.04	0.00	0.00	0	0	1258.72	9646	0.04	0.01	0.04	0	0	0
							17808	0.05	0.00	0.01	0	0	0	11210	0.04	0.00	0.00	0	0	121.03
							14270	0.04	0.02	0.06	0	0	0

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count

▶vloc_psi.f90: 475 - 0.66 %

ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default		Run gcc_default							Run icx_9		Run gcc_1
Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/vloc_psi.f90: 475-475						Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/vloc_psi.f90: 475-475
		8284	0.19	0.12	0.45	75	21.88	932.79			9066	0.11	0.05	0.21	100	50	589.8

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 8284)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 9066)
Analysis	Count	Analysis						Count	Analysis	Count	Analysis						Count
		Loop Computation Issues									Loop Computation Issues
		Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1			Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
		Data Access Issues									Data Access Issues
		Presence of constant non-unit stride data access						0			Presence of constant non-unit stride data access						1
		More than 10% of the vector loads instructions are unaligned						1			More than 10% of the vector loads instructions are unaligned						1
		Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1
		Vectorization Roadblocks									Vectorization Roadblocks
		Presence of constant non-unit stride data access									Presence of constant non-unit stride data access						1
		Inefficient Vectorization									Inefficient Vectorization
		Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1

▶h_psi.f90: 140 - 0.47 %

ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default		Run gcc_default							Run icx_9							Run gcc_1
Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/h_psi.f90: 140-140						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/h_psi.f90: 140-140						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/h_psi.f90: 140-140
		7089	0.08	0.05	0.17	72.73	21.59	97.45	11811	0.09	0.06	0.16	92.86	40.18	23.56	7768	0.06	0.04	0.13	100	50	38.47

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 7089)							Sum on 1 analyzed binary loop (exec - 11811)							Sum on 1 analyzed binary loop (exec - 7768)
Analysis	Count	Analysis						Count	Analysis						Count	Analysis						Count
		Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
		Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
		Data Access Issues							Data Access Issues							Data Access Issues
		Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						1
		Presence of indirect access						0	Presence of indirect access						1	Presence of indirect access						0
		More than 10% of the vector loads instructions are unaligned						1	More than 10% of the vector loads instructions are unaligned						1	More than 10% of the vector loads instructions are unaligned						1
		Presence of expensive instructions: scatter/gather						0	Presence of expensive instructions: scatter/gather						1	Presence of expensive instructions: scatter/gather						0
		Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1
		Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
		Presence of constant non-unit stride data access							Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						1
		Presence of indirect access							Presence of indirect access						1	Presence of indirect access						0
		Inefficient Vectorization							Inefficient Vectorization							Inefficient Vectorization
		Presence of expensive instructions: scatter/gather						0	Presence of expensive instructions: scatter/gather						1	Presence of expensive instructions: scatter/gather						0
		Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1

▶init_us_2_acc.f90: 153 - 0.35 %

ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default		Run gcc_default							Run icx_9		Run gcc_1
Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/upflib/init_us_2_acc.f90: 153-153						Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/upflib/init_us_2_acc.f90: 153-153
		14265	0.06	0.04	0.14	80	22.5	1150.95			15818	0.08	0.05	0.21	100	50	281.68

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 14265)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 15818)
Analysis	Count	Analysis						Count	Analysis	Count	Analysis						Count
		Loop Computation Issues									Loop Computation Issues
		Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1			Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA
		Data Access Issues									Data Access Issues
		Presence of constant non-unit stride data access						0			Presence of constant non-unit stride data access						1
		More than 10% of the vector loads instructions are unaligned						1			More than 10% of the vector loads instructions are unaligned						1
		Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1
		Vectorization Roadblocks									Vectorization Roadblocks
		Presence of constant non-unit stride data access									Presence of constant non-unit stride data access						1
		Inefficient Vectorization									Inefficient Vectorization
		Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1

▶usnldiag.f90: 102 - 0.32 %

ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default		Run gcc_default							Run icx_9		Run gcc_1
Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/usnldiag.f90: 102-107						Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/usnldiag.f90: 103-107
		8064	0.07	0.06	0.21	87.1	23.39	1395.56			8798	0.05	0.03	0.11	100	50	2553.12

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 8064)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 8798)
Analysis	Count	Analysis						Count	Analysis	Count	Analysis						Count
		Loop Computation Issues									Loop Computation Issues
		Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1			Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA
		Data Access Issues									Data Access Issues
		Presence of constant non-unit stride data access						0			Presence of constant non-unit stride data access						1
		More than 10% of the vector loads instructions are unaligned						1			More than 10% of the vector loads instructions are unaligned						1
		Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1
		Vectorization Roadblocks									Vectorization Roadblocks
		Presence of constant non-unit stride data access									Presence of constant non-unit stride data access						1
		Inefficient Vectorization									Inefficient Vectorization
		Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1

▶vloc_psi.f90: 474 - 0.31 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s
Run orig_default							Run gcc_default		Run icx_9							Run gcc_1
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/vloc_psi.f90: 474-475						Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/PW/src/vloc_psi.f90: 474-475						Loop Source Regions
9152	0.11	0.06	0.17	60	20	455.24			13573	0.10	0.05	0.14	100	42.86	552.37

Sum on 1 analyzed binary loop (exec - 9152)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 13573)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis	Count	Analysis						Count	Analysis	Count
Loop Computation Issues									Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1			Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
Data Access Issues									Data Access Issues
More than 10% of the vector loads instructions are unaligned						1			More than 10% of the vector loads instructions are unaligned						1
Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1
Inefficient Vectorization									Inefficient Vectorization
Presence of special instructions executing on a single port						1			Presence of special instructions executing on a single port						1

▶qvan2.f90: 143 - 0.17 %

ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	GFLOP/s
Run orig_default		Run gcc_default							Run icx_9		Run gcc_1
Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/upflib/qvan2.f90: 143-158						Loop Source Regions		Loop Source Regions
		14332	0.06	0.04	0.17	68.9	19.82	1523.9

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 14332)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis	Count	Analysis						Count	Analysis	Count	Analysis	Count
		Loop Computation Issues
		Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
		Data Access Issues
		Presence of constant non-unit stride data access						1
		Presence of indirect access						1
		More than 10% of the vector loads instructions are unaligned						1
		Presence of special instructions executing on a single port						1
		More than 20% of the loads are accessing the stack						1
		Vectorization Roadblocks
		Presence of constant non-unit stride data access						1
		Presence of indirect access						1
		Inefficient Vectorization
		Presence of special instructions executing on a single port						1

▶init_us_2_acc.f90: 150 - 0.17 %

ASM Loop ID	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s
Run orig_default		Run gcc_default		Run icx_9							Run gcc_1
Loop Source Regions		Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/upflib/init_us_2_acc.f90: 150-154						Loop Source Regions
				24783	0.07	0.06	0.17	94.44	42.36	244.72

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (exec - 24783)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis	Count	Analysis	Count	Analysis						Count	Analysis	Count
				Data Access Issues
				Presence of indirect access						1
				More than 10% of the vector loads instructions are unaligned						1
				Presence of expensive instructions: scatter/gather						1
				Presence of special instructions executing on a single port						1
				Vectorization Roadblocks
				Presence of indirect access						1
				Inefficient Vectorization
				Presence of expensive instructions: scatter/gather						1
				Presence of special instructions executing on a single port						1

▶qvan2.f90: 141 - 0.13 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	GFLOP/s
Run orig_default							Run gcc_default		Run icx_9		Run gcc_1
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix02.benchmarkcenter.megware.com/177-218-1582/qe/build/qe/upflib/qvan2.f90: 141-160						Loop Source Regions		Loop Source Regions		Loop Source Regions
15203	0.07	0.05	0.13	70.42	20.95	1178.46

Sum on 1 analyzed binary loop (exec - 15203)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis	Count	Analysis	Count	Analysis	Count
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
Data Access Issues
Presence of constant non-unit stride data access						1
More than 10% of the vector loads instructions are unaligned						1
Presence of special instructions executing on a single port						1
Vectorization Roadblocks
Presence of constant non-unit stride data access						1
Inefficient Vectorization
Presence of special instructions executing on a single port						1
