MAQAO ONE View (OV) - Compare Index

Further help is available on the MAQAO website.


Global Metrics

Metric		r0	r1	r2
Total Time (s)		284.20	277.60	374.98
Profiled Time (s)		282.15	276.57	374.74
Time in analyzed loops (%)		85.0	90.5	93.4
Time in analyzed innermost loops (%)		68.1	77.5	85.8
Time in user code (%)		87.7	92.6	95.2
Compilation Options Score (%)		75.0	75.0	100
Perfect Flow Complexity		1.02	1.02	1.03
Array Access Efficiency (%)		55.1	57.2	Not Available
GFLOPS		32.712	51.828	0.0
Perfect OpenMP + MPI + Pthread		1.00	1.00	1.00
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		1.00	1.00	1.00
No Scalar Integer	Potential Speedup	1.08	1.05	1.17
No Scalar Integer	Nb Loops to get 80%	12	9	14
FP Vectorised	Potential Speedup	1.08	1.06	1.61
FP Vectorised	Nb Loops to get 80%	8	10	13
Fully Vectorised	Potential Speedup	1.24	1.15	1.67
Fully Vectorised	Nb Loops to get 80%	26	21	15
Only FP Arithmetic	Potential Speedup	1.27	1.14	1.75
Only FP Arithmetic	Nb Loops to get 80%	26	18	15
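The "Potential Speedup" and "Nb Loops to get 80%" rows can be read with a simple Amdahl-style model: if optimizing loop i would save s_i seconds out of a total run time T, then fixing the loops with the largest savings first gives a cumulated speedup of T / (T - sum of the s_i fixed so far). The minimal sketch below assumes this model and interprets "80%" as 80% of the total time that could be saved. It is an illustration, not MAQAO's actual implementation, and the per-loop savings are hypothetical numbers.

```python
def cumulated_speedups(total_time, savings):
    """Speedup after optimizing the top-k loops, for k = 1..len(savings).

    Assumption: each loop's saving (in seconds) is independent and is
    simply subtracted from the total run time (Amdahl-style model).
    """
    ordered = sorted(savings, reverse=True)  # largest gains first
    speedups = []
    saved = 0.0
    for s in ordered:
        saved += s
        speedups.append(total_time / (total_time - saved))
    return speedups


def loops_to_reach(total_time, savings, fraction=0.8):
    """Smallest number of loops whose cumulated saving reaches `fraction`
    of the total possible saving (one possible reading of 'Nb Loops to get 80%')."""
    ordered = sorted(savings, reverse=True)
    target = fraction * sum(ordered)
    saved = 0.0
    for k, s in enumerate(ordered, start=1):
        saved += s
        if saved >= target:
            return k
    return len(ordered)


# Hypothetical per-loop savings (seconds) for a 100 s run.
savings = [6.0, 3.0, 0.5, 0.3, 0.2]
print([round(x, 3) for x in cumulated_speedups(100.0, savings)])  # → [1.064, 1.099, 1.105, 1.109, 1.111]
print(loops_to_reach(100.0, savings))  # → 2
```

Under this reading, a small "Nb Loops to get 80%" means the potential gain is concentrated in a few hot loops, as in the hypothetical data above where two loops account for 90% of the savings.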

[Charts: Cumulated Speedup If No Scalar Integer; Cumulated Speedup If FP Vectorized; Cumulated Speedup If Fully Vectorized; Cumulated Speedup If Only FP Arithmetic]

Loop Based Profiles

[Charts: Innermost / Single Loops; Inbetween Loops; Outermost Loops; Cumulated Coverage With All Loops]

Innermost Loop Based Profiles

[Charts: Coverage; Count]

Application Categorization

[Charts: Time; Coverage]

Compilation Options

Source Object	Issue
libgromacs_mpi.so.7 (r0)	-march=x86-64 is used but it should be replaced by a more architecture-specific option or -march=native. Reported for each of the 33 source objects below.
	pairlist_simd_2xmm.h, threaded_force_buffer.cpp, pme_gather.cpp, listed_forces.cpp, partition.cpp, manage_threading.cpp, kernel_prune.cpp, pairs.cpp, pairlist.cpp, update.cpp, md_support.cpp, redistribute.cpp, mdatoms.cpp, lincs.cpp, pbc.cpp, atomdata.cpp, localtopology.cpp, vector.tcc, pme_solve.cpp, pme_spread.cpp, calc_verletbuf.cpp, vec.h, computemultibodycutoffs.cpp, bonded.cpp, settle.cpp, sim_util.cpp, grid.cpp, mshift.cpp, arrayref.h, domdec_constraints.cpp, kernel_ElecEw_VdwLJCombLB_VF.cpp, kernel_outer.h, pme_grid.cpp

Source Object	Issue
libgromacs_mpi.so.7 (r1)	-march=x86-64 is used but it should be replaced by a more architecture-specific option or -march=native. Reported for each of the 34 source objects below.
	pairlist_simd_4xm.h, threaded_force_buffer.cpp, pme_gather.cpp, kernel_outer.h, manage_threading.cpp, kernel_prune.cpp, reversetopology.cpp, listoflists.h, vcm.cpp, pairs.cpp, pairlist.cpp, update.cpp, md_support.cpp, redistribute.cpp, mdatoms.cpp, lincs.cpp, pbc.cpp, atomdata.cpp, localtopology.cpp, vector.tcc, pme_solve.cpp, pme_spread.cpp, calc_verletbuf.cpp, computemultibodycutoffs.cpp, bonded.cpp, partition.cpp, sim_util.cpp, grid.cpp, mshift.cpp, arrayref.h, domdec_constraints.cpp, vec.h, settle.cpp, pme_grid.cpp
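The issue above is raised when the generic `-march=x86-64` baseline appears among the compilation flags. A minimal sketch of such a check follows, assuming flag strings in the format shown under Compilation Options in the Experiment Summaries. The function names and the `GENERIC_TARGETS` set are hypothetical, and this is not how MAQAO implements the test. The sketch keeps the last `-march=` value, since GCC lets a later option override an earlier one.

```python
import shlex

# Assumption: only the generic x86-64 baseline is treated as "not specific enough".
GENERIC_TARGETS = {"x86-64"}


def march_value(flags):
    """Return the value of the last -march= option in a flag string, or None."""
    value = None
    for tok in shlex.split(flags):
        if tok.startswith("-march="):
            value = tok.split("=", 1)[1]  # a later -march overrides an earlier one
    return value


def generic_march(flags):
    """True if the effective -march is a generic baseline such as x86-64."""
    return march_value(flags) in GENERIC_TARGETS


# Excerpts of the r0 and r2 flags from the Experiment Summaries.
r0 = "-mavx512f -mfma -mavx512vl -mavx512dq -mavx512bw -mtune=generic -march=x86-64 -g -O2"
r2 = "-march=armv8.2-a+sve -msve-vector-bits=256 -mlittle-endian -mabi=lp64 -g -O3"
print(generic_march(r0), generic_march(r2))  # → True False
```

In r0 and r1, the explicit `-mavx512f ...` and `-mavx2 -mfma` options still enable those SIMD extensions, so the flagged `-march=x86-64` most likely affects only the remaining ISA extensions, while `-mtune=generic` keeps instruction scheduling generic.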

Path Count Profiles

[Charts: Coverage; Count]

Low Iteration Count Profiles

[Charts: Coverage; Count]

Experiment Summaries

	r0	r1	r2
Application	/home/eoseret/GROMACS/install/gplusplus/bin/gmx_mpi	/ccc/work/cont001/ocre/oserete/gromacs-2022.4-install-gcc-ompi/bin/gmx_mpi	/home/eoseret/GROMACS/build/gcc_2/bin/gmx_mpi
Timestamp	2023-07-28 12:01:12	2023-08-08 09:43:00	2023-08-08 09:21:48
Experiment Type	MPI;	same as r0	same as r0
Machine	skylake	inti6224	ip-172-31-47-199
Architecture	x86_64	same as r0	aarch64
Micro Architecture	SKYLAKE	ZEN_V3	ARM_NEOVERSE_V1
Model Name	Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz	AMD EPYC 7763 64-Core Processor	Not Available
Cache Size	36608 KB	512 KB	Not Available
Number of Cores	26	64	Not Available
Maximal Frequency	2.1 GHz	2.45 GHz	0 GHz
OS Version	Linux 6.4.1-arch2-1 #1 SMP PREEMPT_DYNAMIC Tue, 04 Jul 2023 08:39:40 +0000	Linux 4.18.0-305.88.1.el8_4.x86_64 #1 SMP Thu Apr 6 10:22:46 EDT 2023	Linux 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:08 UTC 2023
Architecture used during static analysis	x86_64	same as r0	aarch64
Micro Architecture used during static analysis	SKYLAKE	ZEN_V3	ARM_NEOVERSE_V1
Compilation Options	libgromacs_mpi.so.7: GNU C++17 13.1.1 20230429 -mavx512f -mfma -mavx512vl -mavx512dq -mavx512bw -mtune=generic -march=x86-64 -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp	libgromacs_mpi.so.7: GNU C++17 12.2.0 -mavx2 -mfma -mtune=generic -march=x86-64 -g -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fexceptions	libgromacs_mpi.so.7: GNU C++17 11.1.0 -march=armv8.2-a+sve -msve-vector-bits=256 -mlittle-endian -mabi=lp64 -g -O3 -O3 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fasynchronous-unwind-tables -fstack-protector-strong -fstack-clash-protection
Number of processes observed	1	same as r0	same as r0
Number of threads observed	1	same as r0	same as r0
Frequency Driver	intel_cpufreq	acpi-cpufreq	NA
Frequency Governor	schedutil	performance	NA
Huge Pages	always	same as r0	madvise
Hyperthreading	off	on	same as r0
Number of sockets	2	same as r0	1
Number of cores per socket	26	64	same as r1
MAQAO version	2.17.7	same as r0	2.17.8
MAQAO build	bf11934ec971510c7f500e010d8ca2474fd787ed::20230726-123240	Build information not available	same as r1
Comments	GROMACS 2022.4 compiled with g++ 13.1.1 running on Skylake with 1 OMP thread, 2000 steps	GROMACS compiled with gcc 12.2.0 + OpenMPI, Zen 3, OV1, 2000 steps, single core	GNU g++ 12.2.0 (SIMD=SVE), AWS G3 (Neoverse V1), 2000 steps, single core
