Compared Reports
r0: OV1_ZEN4_10K_MPI_scal
r1: OV1_ZEN4_10K_MPI_scal_AMD_libm
Global Metrics
Metric | r0 | r1
Total Time (s) | 15.65 | 15.96
Profiled Time (s) | 13.47 | 13.61
Time in analyzed loops (%) | 39.7 | 40.0
Time in analyzed innermost loops (%) | 33.3 | 33.5
Time in user code (%) | 48.3 | 48.8
Compilation Options Score (%) | 100 | 100
Array Access Efficiency (%) | 50.3 | 50.3
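To make the r0/r1 gap concrete, the short script below computes the relative change of the two time metrics. It only restates the table: r1 (the run whose name suggests it was built against AMD libm) is about 2% slower in total time and about 1% slower in profiled time. Because the two runs were made on different hosts (see the experiment summaries), a difference of this size may be within run-to-run variation. The names `GLOBAL_METRICS` and `relative_change_pct` are illustrative, not part of MAQAO.

```python
# Time metrics copied from the Global Metrics table: name -> (r0, r1).
GLOBAL_METRICS = {
    "Total Time (s)": (15.65, 15.96),
    "Profiled Time (s)": (13.47, 13.61),
}


def relative_change_pct(before, after):
    """Relative change of `after` with respect to `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (r0, r1) in GLOBAL_METRICS.items():
        change = relative_change_pct(r0, r1)
        print(f"{name:<20} r0={r0:6.2f}  r1={r1:6.2f}  change={change:+.2f}%")
```

A positive change means r1 took longer than r0.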
Potential Speedups
Metric | r0 | r1
Perfect Flow Complexity | 1.02 | 1.02
Perfect OpenMP + MPI + Pthread | 1.18 | 1.17
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.52 | 1.51
No Scalar Integer: Potential Speedup | 1.02 | 1.02
No Scalar Integer: Nb Loops to get 80% | 12 | 12
FP Vectorised: Potential Speedup | 1.03 | 1.03
FP Vectorised: Nb Loops to get 80% | 11 | 11
Fully Vectorised: Potential Speedup | 1.11 | 1.11
Fully Vectorised: Nb Loops to get 80% | 27 | 27
Only FP Arithmetic: Potential Speedup | 1.08 | 1.08
Only FP Arithmetic: Nb Loops to get 80% | 26 | 26
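This report does not spell out how the potential speedups are derived. As a reading aid only, the sketch below assumes they follow Amdahl's law. Under that assumption a whole-program speedup S corresponds to removing a fraction 1 - 1/S of the runtime: 1.18 for perfect OpenMP + MPI + Pthread would mean roughly 15% of the time is lost to parallel inefficiency, and 1.52 with perfect load distribution roughly 34%. The helper `loops_to_reach` encodes one plausible reading of "Nb Loops to get 80%" (rank loops by the time they would save and count how many are needed to recover 80% of the total saving); it is a hypothetical illustration, not MAQAO's documented method, and the per-loop numbers in the usage are made up.

```python
def time_fraction_removed(speedup):
    """Fraction of the runtime eliminated by a whole-program speedup S: 1 - 1/S."""
    return 1.0 - 1.0 / speedup


def amdahl_speedup(loops):
    """Whole-program speedup (Amdahl's law).

    `loops` is a list of (f, s) pairs: loop covering fraction f of the runtime,
    accelerated locally by factor s. Time outside the loops is unchanged.
    """
    new_time = 1.0 - sum(f * (1.0 - 1.0 / s) for f, s in loops)
    return 1.0 / new_time


def loops_to_reach(loops, target=0.8):
    """Hypothetical reading of 'Nb Loops to get 80%'.

    Counts loops, taken in order of decreasing time saved, until their
    cumulated saving reaches `target` of the total saving of all loops.
    """
    savings = sorted((f * (1.0 - 1.0 / s) for f, s in loops), reverse=True)
    goal = target * sum(savings)
    accumulated = 0.0
    for count, saving in enumerate(savings, start=1):
        accumulated += saving
        if accumulated >= goal:
            return count
    return len(savings)


if __name__ == "__main__":
    print(f"Perfect OpenMP + MPI + Pthread (r0): {time_fraction_removed(1.18):.1%} of runtime")
    # Made-up loops: (fraction of runtime, local speedup if fully vectorised).
    example = [(0.10, 2.0), (0.05, 4.0), (0.03, 1.5), (0.02, 8.0)]
    print(f"speedup={amdahl_speedup(example):.3f}, loops for 80%={loops_to_reach(example)}")
```

Ranking loops by time saved rather than by coverage reflects the idea that a hot loop with little optimisation headroom contributes less than a cooler loop with a large local speedup.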
[Charts: Cumulated Speedup If No Scalar Integer / If FP Vectorized / If Fully Vectorized / If Only FP Arithmetic]
Loop Based Profiles
[Charts: Innermost / Single Loops; Inbetween Loops; Outermost Loops; Cumulated Coverage With All Loops]
Innermost Loop Based Profiles
[Chart: coverage and loop count]
Application Categorization
[Chart: time and coverage per category]
Compilation Options
Source objects:
libgromacs_mpi.so.9.0.0: fft5d.cpp, threaded_force_buffer.cpp, pme_pp.cpp, pme_gather.cpp, listed_forces.cpp, simd_prune_kernel.cpp, partition.cpp, reversetopology.cpp, pairs.cpp, pairlist.cpp, update.cpp, md_support.cpp, pme.cpp, mdatoms.cpp, lincs.cpp, pbc.cpp, domdec.cpp, pme_redistribute.cpp, md.cpp, domdec_specatomcomm.cpp, pme_grid.cpp, localtopology.cpp, pme_solve.cpp, pme_spread.cpp, calc_verletbuf.cpp, simd_kernel.h, constraintrange.cpp, inmemoryserializer.cpp, sim_util.cpp, grid.cpp, atomdata.cpp, bonded.cpp, domdec_constraints.cpp, settle.cpp, calcvir.cpp, pme_only.cpp
[vdso]: (unnamed)
Issue: -g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are -O2/-O3 and -march=(target).
Source objects:
libgromacs_mpi.so.9.0.0: fft5d.cpp, threaded_force_buffer.cpp, pme_pp.cpp, pme_gather.cpp, calcvir.cpp, simd_prune_kernel.cpp, partition.cpp, reversetopology.cpp, pairs.cpp, pairlist.cpp, update.cpp, md_support.cpp, pme.cpp, mdatoms.cpp, lincs.cpp, pbc.cpp, domdec.cpp, pme_redistribute.cpp, md.cpp, domdec_specatomcomm.cpp, atomdata.cpp, localtopology.cpp, pme_solve.cpp, pme_spread.cpp, calc_verletbuf.cpp, simd_kernel.h, bonded.cpp, inmemoryserializer.cpp, sim_util.cpp, grid.cpp, settle.cpp, domdec_constraints.cpp, pme_only.cpp, constraintrange.cpp, pme_grid.cpp
[vdso]: (unnamed)
Issue: -g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are -O2/-O3 and -march=(target).
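The compilation issue concerns missing debug information. After rebuilding with -g, a quick independent check is to look for the DWARF `.debug_info` section in the shared library (`readelf -S` lists the same section headers). The sketch below does this with the Python standard library only. Its assumptions: it handles 64-bit little-endian ELF files only, it works per file rather than per function (the granularity MAQAO reports), it ignores ELF extended section numbering, and it will not see debug info that was split into a separate file (for example with -gsplit-dwarf or a stripped library with a separate debug file). `elf_section_names` and `has_debug_info` are hypothetical helper names.

```python
import struct
import sys


def elf_section_names(path):
    """Return the section names of a 64-bit little-endian ELF file.

    Extended section numbering (more than 0xff00 sections) is not handled.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError(f"{path}: not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError(f"{path}: only 64-bit little-endian ELF is handled")
    (shoff,) = struct.unpack_from("<Q", data, 0x28)  # e_shoff
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    # Section header of the name string table; sh_offset and sh_size start at +0x18.
    strtab_hdr = shoff + shstrndx * shentsize
    str_off, str_size = struct.unpack_from("<QQ", data, strtab_hdr + 0x18)
    strtab = data[str_off:str_off + str_size]
    names = set()
    for i in range(shnum):
        (name_off,) = struct.unpack_from("<I", data, shoff + i * shentsize)  # sh_name
        end = strtab.find(b"\0", name_off)
        if end < 0:
            end = len(strtab)
        name = strtab[name_off:end].decode("ascii", errors="replace")
        if name:  # section 0 is the unnamed null section
            names.add(name)
    return names


def has_debug_info(path):
    """True if the file carries DWARF debug information in a .debug_info section."""
    return ".debug_info" in elf_section_names(path)


if __name__ == "__main__":
    # Usage: python check_debug.py path/to/libgromacs_mpi.so.9.0.0
    for lib in sys.argv[1:]:
        print(lib, "has .debug_info" if has_debug_info(lib) else "has no .debug_info")
```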
Path Count Profiles
[Chart: coverage and count]
Low Iteration Count Profiles
[Chart: coverage and count]
Experiment Summaries
Field | r0 | r1
Experiment Name | (empty) | (empty)
Application | ../../install_MPI/bin/gmx_mpi | ../../install_MPI_AMD_libm/bin/gmx_mpi
Timestamp | 2024-08-02 16:05:08 | 2024-08-02 20:46:34
Experiment Type | MPI | same as r0
Machine | ins01.benchmarkcenter.megware.com | gmz11.benchmarkcenter.megware.com
Architecture | x86_64 | same as r0
Micro Architecture | ZEN_V4 | same as r0
Model Name | AMD EPYC 9654 96-Core Processor | same as r0
Cache Size | 1024 KB | same as r0
Number of Cores | 96 | same as r0
Maximal Frequency | 3.707812 GHz | same as r0
OS Version | Linux 5.14.0-427.18.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 28 06:27:02 EDT 2024 |
GROMACS 2024.2, compiled with AOCC 4.1, running on two 96-core AMD Zen 4 processors with 1 to 192 MPI ranks and no OpenMP threads (strong scaling). Pinning is controlled by GROMACS.
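The runs belong to a strong-scaling study (fixed problem size, 1 to 192 MPI ranks), while this report compares only two individual runs. For readers who collect wall times across rank counts, a minimal sketch of the usual strong-scaling metrics: speedup S(p) = T(p0) / T(p) and parallel efficiency E(p) = S(p) * p0 / p, relative to the smallest rank count measured, p0. The function name and the timings in the usage are placeholders, not measurements from this report.

```python
def strong_scaling(timings):
    """Strong-scaling speedup and efficiency.

    `timings` maps a rank count p to the wall time T(p) in seconds for a fixed
    problem size. Returns {p: (speedup, efficiency)} relative to the smallest p.
    """
    p0 = min(timings)
    t0 = timings[p0]
    result = {}
    for p in sorted(timings):
        speedup = t0 / timings[p]
        efficiency = speedup * p0 / p  # 1.0 means ideal linear scaling
        result[p] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Placeholder wall times, NOT taken from this report.
    placeholder = {1: 1000.0, 24: 48.0, 96: 14.0, 192: 8.5}
    for p, (s, e) in strong_scaling(placeholder).items():
        print(f"{p:4d} ranks: speedup {s:7.2f}, efficiency {e:6.1%}")
```

Using the smallest measured rank count as the baseline keeps the formulas valid when a single-rank run is too slow to include.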