OV - gmx_mpi - Global


Total Time (s)		15.59
Profiled Time (s)		13.41
Time in analyzed loops (%)		38.3
Time in analyzed innermost loops (%)		31.1
Time in user code (%)		47.4
Compilation Options Score (%)		100
Array Access Efficiency (%)		53.0
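
The time shares above are easier to compare once converted to seconds. The sketch below does that under one loud assumption: that each "Time in ..." percentage is a share of the Profiled Time (13.41 s), not of the Total Time. The report does not state the base, so treat the results as rough estimates. The two score-style metrics (Compilation Options Score, Array Access Efficiency) are left out because they do not look like time shares. The function name is illustrative only.

```python
# Values copied from the Global table above.
PROFILED_TIME_S = 13.41

# ASSUMPTION: each percentage is a share of the profiled time.
# The report does not say so explicitly.
shares_percent = {
    "analyzed loops": 38.3,
    "analyzed innermost loops": 31.1,
    "user code": 47.4,
}


def share_to_seconds(percent, base_seconds=PROFILED_TIME_S):
    """Convert a percentage of the base time into seconds."""
    return base_seconds * percent / 100.0


seconds = {name: share_to_seconds(p) for name, p in shares_percent.items()}

for name, value in seconds.items():
    print(f"Time in {name}: about {value:.2f} s")
```

If the percentages turn out to be relative to the Total Time instead, passing `base_seconds=15.59` gives the corresponding figures.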

Potential Speedups
Perfect Flow Complexity		1.03
Perfect OpenMP + MPI + Pthread		1.17
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		1.53
No Scalar Integer	Potential Speedup	1.03
No Scalar Integer	Nb Loops to get 80%	13
FP Vectorised	Potential Speedup	1.02
FP Vectorised	Nb Loops to get 80%	10
Fully Vectorised	Potential Speedup	1.10
Fully Vectorised	Nb Loops to get 80%	30
FP Arithmetic Only	Potential Speedup	1.09
FP Arithmetic Only	Nb Loops to get 80%	26
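
To read the potential speedups as time rather than ratios, the following sketch converts each factor into a projected run time and a saving. It assumes the usual definition, speedup = time before / time after, and applies it to the Profiled Time of 13.41 s; both choices are assumptions, not something the report states. Each estimate is treated on its own; the sketch does not attempt to combine them.

```python
# Values copied from the Potential Speedups table above.
PROFILED_TIME_S = 13.41

speedups = {
    "Perfect Flow Complexity": 1.03,
    "Perfect OpenMP + MPI + Pthread": 1.17,
    "Perfect OpenMP + MPI + Pthread + Perfect Load Distribution": 1.53,
    "No Scalar Integer": 1.03,
    "FP Vectorised": 1.02,
    "Fully Vectorised": 1.10,
    "FP Arithmetic Only": 1.09,
}


def projected_time(base_seconds, speedup):
    """ASSUMPTION: speedup = time_before / time_after."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return base_seconds / speedup


for name, factor in speedups.items():
    after = projected_time(PROFILED_TIME_S, factor)
    saved = PROFILED_TIME_S - after
    print(f"{name}: {after:.2f} s projected, {saved:.2f} s saved")
```

Under these assumptions, the largest estimate (perfect parallel runtime plus perfect load distribution) is the only one that removes more than a third of the profiled time; the vectorisation-related estimates are all 10% or less.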

Source Object	Issue
libgromacs_mpi.so.9.0.0
○fft5d.cpp
○threaded_force_buffer.cpp
○pme_pp.cpp
○pme_gather.cpp
○listed_forces.cpp
○simd_prune_kernel.cpp
○partition.cpp
○pairs.cpp
○vec.h
○update.cpp
○md_support.cpp
○pme.cpp
○mdatoms.cpp
○lincs.cpp
○calc_verletbuf.h
○pme_redistribute.cpp
○md.cpp
○fft.cpp
○constr.cpp
○domdec_specatomcomm.cpp
○pme_grid.cpp
○localtopology.cpp
○pme_solve.cpp
○pme_spread.cpp
○calc_verletbuf.cpp
○simd_kernel.h
○fft_mkl.cpp
○bonded.cpp
○domdec.cpp
○sim_util.cpp
○grid.cpp
○settle.cpp
○arrayref.h
○domdec_constraints.cpp
○pairlist.cpp
○pme_only.cpp
○atomdata.cpp
[vdso]
○	-g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)

Application	../../install_gcc/bin/gmx_mpi
Timestamp	2024-08-05 19:31:18	Universal Timestamp	1722879078
Number of processes observed	192	Number of threads observed	192
Experiment Type	MPI; OpenMP;
Machine	ins01.benchmarkcenter.megware.com
Model Name	AMD EPYC 9654 96-Core Processor
Architecture	x86_64	Micro Architecture	ZEN_V4
Cache Size	1024 KB	Number of Cores	96
OS Version	Linux 5.14.0-427.18.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 28 06:27:02 EDT 2024
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	ZEN_V4
Frequency Driver	acpi-cpufreq	Frequency Governor	performance
Huge Pages	always	Hyperthreading	on
Number of sockets	2	Number of cores per socket	96
Compilation Options	[vdso]: N/A; libgromacs_mpi.so.9.0.0: GNU C++17 13.2.0 -march=skylake-avx512 -g -O3 -std=c++17 -fno-omit-frame-pointer -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp
Comments	GROMACS 2024.2 compiled with GCC 13.2, running on two 96-core AMD Zen 4 processors with 192 MPI ranks (no OpenMP threading). Pinning is controlled by GROMACS.
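
The configuration fields above imply one MPI rank per physical core. A small sanity check of that arithmetic, using only numbers from this report, could look like this. The variable names are illustrative.

```python
# Values copied from the configuration table above.
sockets = 2
cores_per_socket = 96
mpi_ranks = 192  # "Number of processes observed"

physical_cores = sockets * cores_per_socket
ranks_per_core = mpi_ranks / physical_cores

print(f"Physical cores: {physical_cores}")
print(f"MPI ranks per physical core: {ranks_per_core:.2f}")
```

With hyperthreading reported as on, the node exposes more hardware threads than physical cores, so the run leaves the extra SMT threads unused if each rank is pinned to its own core.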

Dataset
Run Command	<executable> mdrun -s ion_channel.tpr -nsteps 10000 -pin on -deffnm gcc
MPI Command	mpirun -genv I_MPI_FABRICS=shm -n <number_processes>
Number Processes	192
Number Nodes	1
Number Processes per Node	192
Filter	Not Used
Profile Start	Not Used
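
The Run Command and MPI Command fields are templates with `<executable>` and `<number_processes>` placeholders. The sketch below fills them in with the Application path and process count from this report and joins them, on the assumption that the launch line is simply the MPI command followed by the run command. The helper name `build_launch_argv` is hypothetical, and the command MAQAO actually executed may include wrapper arguments not shown here.

```python
import shlex

# Templates copied from the Dataset table above.
MPI_COMMAND = "mpirun -genv I_MPI_FABRICS=shm -n <number_processes>"
RUN_COMMAND = (
    "<executable> mdrun -s ion_channel.tpr -nsteps 10000 -pin on -deffnm gcc"
)


def build_launch_argv(executable, number_processes):
    """Fill in both placeholders and return the command as an argv list.

    ASSUMPTION: the full launch line is the MPI command followed by the
    run command.
    """
    mpi_part = MPI_COMMAND.replace("<number_processes>", str(number_processes))
    run_part = RUN_COMMAND.replace("<executable>", shlex.quote(executable))
    return shlex.split(mpi_part) + shlex.split(run_part)


# Application path and process count as listed in this report.
argv = build_launch_argv("../../install_gcc/bin/gmx_mpi", 192)
print(shlex.join(argv))
```

Returning an argv list rather than a single string keeps the result usable with `subprocess.run(argv)` without going through a shell.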