Compared Reports
r0: libm
r1: libalm
Global Metrics
Metric | r0 | r1
Total Time (s) | 834.18 | 832.93
Profiled Time (s) | 833.60 | 832.29
Time in analyzed loops (%) | 91.6 | 91.5
Time in analyzed innermost loops (%) | 77.4 | 77.1
Time in user code (%) | 93.1 | 93.2
Compilation Options Score (%) | 100 | 100
Array Access Efficiency (%) | 48.1 | 48.1
Potential Speedups
Perfect Flow Complexity | 1.01 | 1.01
Perfect OpenMP + MPI + Pthread | 1.00 | 1.00
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.00 | 1.00
No Scalar Integer: Potential Speedup | 1.04 | 1.04
No Scalar Integer: Nb Loops to get 80% | 7 | 6
FP Vectorised: Potential Speedup | 1.05 | 1.05
FP Vectorised: Nb Loops to get 80% | 10 | 10
Fully Vectorised: Potential Speedup | 1.21 | 1.21
Fully Vectorised: Nb Loops to get 80% | 22 | 22
Only FP Arithmetic: Potential Speedup | 1.20 | 1.20
Only FP Arithmetic: Nb Loops to get 80% | 18 | 18
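As a rough reading aid for the Global Metrics figures, the sketch below computes how much total time r1 saves relative to r0, what share of each run was profiled, and the run time that would result if the reported "Fully Vectorised" potential speedup were realised in full. The helper names `relative_gain_percent` and `projected_time` are illustrative and not part of MAQAO; the numbers are copied from the report.

```python
# Values copied from the Global Metrics figures (r0 = libm, r1 = libalm).
total_time_s = {"r0": 834.18, "r1": 832.93}
profiled_time_s = {"r0": 833.60, "r1": 832.29}
fully_vectorised_speedup = {"r0": 1.21, "r1": 1.21}


def relative_gain_percent(baseline: float, candidate: float) -> float:
    """Percentage of the baseline time saved by the candidate run."""
    return 100.0 * (baseline - candidate) / baseline


def projected_time(total: float, potential_speedup: float) -> float:
    """Run time if the reported potential speedup were fully realised."""
    return total / potential_speedup


gain = relative_gain_percent(total_time_s["r0"], total_time_s["r1"])
print(f"r1 saves {gain:.2f}% of r0's total time")

for run in ("r0", "r1"):
    coverage = 100.0 * profiled_time_s[run] / total_time_s[run]
    best = projected_time(total_time_s[run], fully_vectorised_speedup[run])
    print(f"{run}: profiled {coverage:.2f}% of total time, "
          f"projected {best:.1f} s if fully vectorised")
```

The potential speedups are upper bounds reported by the tool, so the projected times should be read as best cases rather than predictions.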
[Charts: Cumulated Speedup If No Scalar Integer; Cumulated Speedup If FP Vectorized; Cumulated Speedup If Fully Vectorized; Cumulated Speedup If Only FP Arithmetic]
Loop Based Profiles
[Charts: Innermost / Single Loops; Inbetween Loops; Outermost Loops; Cumulated Coverage With All Loops]
Innermost Loop Based Profiles
[Charts: Coverage; Count]
Application Categorization
[Charts: Time; Coverage]
Compilation Options
Source Object | Issue
libgromacs_mpi.so.9.0.0
  fft5d.cpp
  threaded_force_buffer.cpp
  pme_pp.cpp
  pme_gather.cpp
  listed_forces.cpp
  simd_prune_kernel.cpp
  partition.cpp
  reversetopology.cpp
  pairs.cpp
  pairlist.cpp
  update.cpp
  md_support.cpp
  pme.cpp
  mdatoms.cpp
  lincs.cpp
  pbc.cpp
  domdec.cpp
  pme_redistribute.cpp
  md.cpp
  domdec_specatomcomm.cpp
  pme_grid.cpp
  localtopology.cpp
  pme_solve.cpp
  pme_spread.cpp
  calc_verletbuf.cpp
  simd_kernel.h
  constraintrange.cpp
  inmemoryserializer.cpp
  sim_util.cpp
  grid.cpp
  atomdata.cpp
  bonded.cpp
  domdec_constraints.cpp
  settle.cpp
  calcvir.cpp
  pme_only.cpp
[vdso]
  (unnamed) | -g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
Source Object | Issue
libgromacs_mpi.so.9.0.0
  fft5d.cpp
  threaded_force_buffer.cpp
  pme_pp.cpp
  pme_gather.cpp
  calcvir.cpp
  simd_prune_kernel.cpp
  partition.cpp
  reversetopology.cpp
  pairs.cpp
  pairlist.cpp
  update.cpp
  md_support.cpp
  pme.cpp
  mdatoms.cpp
  lincs.cpp
  pbc.cpp
  domdec.cpp
  pme_redistribute.cpp
  md.cpp
  domdec_specatomcomm.cpp
  atomdata.cpp
  localtopology.cpp
  pme_solve.cpp
  pme_spread.cpp
  calc_verletbuf.cpp
  simd_kernel.h
  bonded.cpp
  inmemoryserializer.cpp
  sim_util.cpp
  grid.cpp
  settle.cpp
  domdec_constraints.cpp
  pme_only.cpp
  constraintrange.cpp
  pme_grid.cpp
[vdso]
  (unnamed) | -g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
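The flag recommendation above can be checked mechanically against a compile command. The following is a minimal sketch, assuming the command is available as a single string; the function name `missing_recommended_flags` and the example command line (including `clang++` and `-march=znver4`) are illustrative assumptions, not taken from the report or from MAQAO.

```python
import re
import shlex


def missing_recommended_flags(compile_command: str) -> list[str]:
    """Return the recommended flag groups that are absent from a compile command."""
    tokens = shlex.split(compile_command)
    missing = []
    # Debug info: -g, -g1..-g3 or -ggdb (note that -g0 disables debug info).
    if not any(re.fullmatch(r"-g(gdb)?[1-3]?", t) for t in tokens):
        missing.append("-g")
    # Optimisation level: the report recommends -O2 or -O3.
    if not any(t in ("-O2", "-O3") for t in tokens):
        missing.append("-O2/-O3")
    # Target architecture: any -march=<target> value counts.
    if not any(t.startswith("-march=") for t in tokens):
        missing.append("-march=<target>")
    return missing


# Hypothetical command line, for illustration only.
print(missing_recommended_flags("clang++ -O3 -march=znver4 -c pme.cpp"))
```

A check like this only inspects the command line; it cannot tell whether functions inserted by the compiler itself carry debug information, which is the case the report mentions.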
Path Count Profiles
[Charts: Coverage; Count]
Low Iteration Count Profiles
[Charts: Coverage; Count]
Experiment Summaries
Field | r0 | r1
Experiment Name | |
Application | ../../install_MPI/bin/gmx_mpi | ../../install_MPI_AMD_libm/bin/gmx_mpi
Timestamp | 2024-08-02 16:05:08 | 2024-08-02 20:46:34
Experiment Type | MPI; | same as r0
Machine | ins01.benchmarkcenter.megware.com | gmz11.benchmarkcenter.megware.com
Architecture | x86_64 | same as r0
Micro Architecture | ZEN_V4 | same as r0
Model Name | AMD EPYC 9654 96-Core Processor | same as r0
Cache Size | 1024 KB | same as r0
Number of Cores | 96 | same as r0
Maximal Frequency | 3.707812 GHz | same as r0
OS Version | Linux 5.14.0-427.18.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 28 06:27:02 EDT 2024 |
GROMACS 2024.2 compiled with AOCC 4.1, running on two 96-core AMD Zen 4 processors with 1 to 192 MPI ranks (no OpenMP) [strong scaling]. Pinning is controlled by GROMACS.
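For a strong-scaling study such as the one described, speedup and parallel efficiency are usually derived from the wall-clock time at each rank count, relative to the smallest rank count measured. The sketch below shows that arithmetic; the function name `strong_scaling` is an assumption, and the timings in `example_times` are hypothetical placeholders that do not come from this report.

```python
def strong_scaling(times_by_ranks: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map each rank count to (speedup, parallel efficiency).

    Both values are relative to the smallest rank count in the input:
    speedup = T(base) / T(p), efficiency = speedup * base / p.
    """
    base_ranks = min(times_by_ranks)
    base_time = times_by_ranks[base_ranks]
    result = {}
    for ranks in sorted(times_by_ranks):
        speedup = base_time / times_by_ranks[ranks]
        efficiency = speedup * base_ranks / ranks
        result[ranks] = (speedup, efficiency)
    return result


# Hypothetical wall-clock times in seconds, NOT taken from the report.
example_times = {1: 1000.0, 2: 520.0, 4: 270.0}

for ranks, (speedup, efficiency) in strong_scaling(example_times).items():
    print(f"{ranks:>3} ranks: speedup {speedup:.2f}, efficiency {efficiency:.1%}")
```

An efficiency close to 1.0 means that adding ranks shortens the run almost proportionally; values well below 1.0 point to communication or load-balance overhead.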