Compared Reports
r0: 1x1
r1: 2x1
r2: 4x1
r3: 8x1
r4: 16x1
r5: 32x1
r6: 64x1
r7: 128x1
r8: 192x1
Global Metrics

| Metric | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 |
|---|---|---|---|---|---|---|---|---|---|
| Total Time (s) | 832.93 | 454.42 | 252.32 | 155.87 | 78.70 | 52.86 | 36.05 | 19.24 | 15.96 |
| Profiled Time (s) | 832.29 | 453.06 | 249.44 | 154.42 | 77.26 | 51.37 | 34.46 | 17.40 | 13.61 |
| Time in analyzed loops (%) | 91.5 | 89.3 | 88.0 | 85.4 | 76.7 | 57.6 | 52.7 | 45.8 | 40.0 |
| Time in analyzed innermost loops (%) | 77.1 | 75.8 | 75.5 | 73.2 | 65.7 | 49.6 | 45.9 | 38.5 | 33.5 |
| Time in user code (%) | 93.2 | 90.8 | 89.7 | 87.4 | 79.6 | 60.6 | 56.6 | 52.8 | 48.8 |
| Compilation Options Score (%) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Array Access Efficiency (%) | 48.1 | 49.0 | 49.1 | 49.2 | 49.1 | 48.7 | 48.1 | 50.5 | 50.3 |
| Scalability - Gap | 1.00 | 1.09 | 1.21 | 1.50 | 1.51 | 2.03 | 2.77 | 2.96 | 3.68 |
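
The Total Time figures above can be turned into strong-scaling speedup and parallel efficiency with a few lines of arithmetic. The sketch below is not part of the MAQAO output. It assumes that the report labels (1x1 up to 192x1) give the MPI rank count as the first number, and it uses r0 as the single-rank baseline. The names `ranks`, `total_time_s` and `strong_scaling` are illustrative only. For this data set, the inverse of the efficiency, rounded to two decimals, matches the reported Scalability - Gap values; that is an observation on these numbers, not a statement of how MAQAO defines the metric.

```python
# Rank counts taken from the report labels r0..r8 (assumed: first number = MPI ranks).
ranks = [1, 2, 4, 8, 16, 32, 64, 128, 192]
# Total Time (s) per report, copied from the Global Metrics section.
total_time_s = [832.93, 454.42, 252.32, 155.87, 78.70, 52.86, 36.05, 19.24, 15.96]
# Scalability - Gap per report, copied from the Global Metrics section.
reported_gap = [1.00, 1.09, 1.21, 1.50, 1.51, 2.03, 2.77, 2.96, 3.68]


def strong_scaling(ranks, times):
    """Return (ranks, speedup, efficiency, gap) tuples relative to the first run."""
    base = times[0]
    rows = []
    for n, t in zip(ranks, times):
        speedup = base / t          # how much faster than the 1-rank run
        efficiency = speedup / n    # fraction of ideal linear speedup
        gap = n * t / base          # inverse of the efficiency
        rows.append((n, speedup, efficiency, gap))
    return rows


rows = strong_scaling(ranks, total_time_s)
for n, speedup, efficiency, gap in rows:
    print(f"{n:>4} ranks  speedup {speedup:6.2f}  "
          f"efficiency {efficiency:6.1%}  gap {gap:5.2f}")
```

With these inputs the 192-rank run comes out at a speedup of about 52 and an efficiency of about 27%.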
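Potential Speedups

| Metric | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 |
|---|---|---|---|---|---|---|---|---|---|
| Perfect Flow Complexity | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.02 |
| Perfect OpenMP + MPI + Pthread | 1.00 | 1.02 | 1.02 | 1.04 | 1.11 | 1.08 | 1.11 | 1.14 | 1.17 |
| Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.00 | 1.02 | 1.04 | 1.05 | 1.14 | 1.47 | 1.54 | 1.50 | 1.51 |
| No Scalar Integer: Potential Speedup | 1.04 | 1.04 | 1.04 | 1.04 | 1.04 | 1.03 | 1.02 | 1.02 | 1.02 |
| No Scalar Integer: Nb Loops to get 80% | 6 | 10 | 11 | 11 | 12 | 11 | 11 | 12 | 12 |
| FP Vectorised: Potential Speedup | 1.05 | 1.05 | 1.04 | 1.05 | 1.04 | 1.03 | 1.02 | 1.03 | 1.03 |
| FP Vectorised: Nb Loops to get 80% | 10 | 11 | 12 | 12 | 13 | 12 | 11 | 11 | 11 |
| Fully Vectorised: Potential Speedup | 1.21 | 1.22 | 1.21 | 1.21 | 1.18 | 1.12 | 1.11 | 1.13 | 1.11 |
| Fully Vectorised: Nb Loops to get 80% | 22 | 28 | 29 | 29 | 30 | 28 | 26 | 25 | 27 |
| Only FP Arithmetic: Potential Speedup | 1.20 | 1.20 | 1.19 | 1.18 | 1.16 | 1.11 | 1.10 | 1.11 | 1.08 |
| Only FP Arithmetic: Nb Loops to get 80% | 18 | 23 | 25 | 26 | 26 | 23 | 23 | 24 | 26 |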
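
To get a feel for what a potential speedup figure would mean in wall-clock terms, it can be applied to the Total Time of the same report. The sketch below is a reading aid, not MAQAO output. It assumes that each potential speedup applies to the whole run, so that the projected time is simply the total time divided by the speedup; the report itself does not state this. The helper name `projected_time` is hypothetical. The values used are the Fully Vectorised potential speedups and the Total Time values for r0 and r8.

```python
# Total Time (s) for the 1-rank (r0) and 192-rank (r8) reports.
total_time_s = {"r0": 832.93, "r8": 15.96}
# "Fully Vectorised" potential speedup for the same two reports.
fully_vectorised = {"r0": 1.21, "r8": 1.11}


def projected_time(time_s, speedup):
    """Projected run time, assuming the speedup applies to the whole run."""
    return time_s / speedup


for report, time_s in total_time_s.items():
    new_time = projected_time(time_s, fully_vectorised[report])
    saved = time_s - new_time
    print(f"{report}: {time_s:.2f} s -> {new_time:.2f} s "
          f"(about {saved:.2f} s saved)")
```

Under that assumption, the absolute gain from full vectorisation shrinks sharply at high rank counts, simply because the run is already much shorter.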
[Chart omitted: Scalability Speedup]
[Charts omitted: Cumulated Speedup If No Scalar Integer; Cumulated Speedup If FP Vectorized; Cumulated Speedup If Fully Vectorized; Cumulated Speedup If Only FP Arithmetic]
[Charts omitted: Loop Based Profiles (Innermost / Single Loops, Inbetween Loops, Outermost Loops, Cumulated Coverage With All Loops)]
[Charts omitted: Innermost Loop Based Profiles (Coverage, Count)]
[Charts omitted: Application Categorization (Time, Coverage)]
Compilation Options
Source Object: libgromacs_mpi.so.9.0.0
  fft5d.cpp
  threaded_force_buffer.cpp
  pme_pp.cpp
  pme_gather.cpp
  calcvir.cpp
  simd_prune_kernel.cpp
  partition.cpp
  reversetopology.cpp
  pairs.cpp
  pairlist.cpp
  update.cpp
  md_support.cpp
  pme.cpp
  mdatoms.cpp
  lincs.cpp
  pbc.cpp
  domdec.cpp
  pme_redistribute.cpp
  md.cpp
  domdec_specatomcomm.cpp
  atomdata.cpp
  localtopology.cpp
  pme_solve.cpp
  pme_spread.cpp
  calc_verletbuf.cpp
  simd_kernel.h
  bonded.cpp
  inmemoryserializer.cpp
  sim_util.cpp
  grid.cpp
  settle.cpp
  domdec_constraints.cpp
  pme_only.cpp
  constraintrange.cpp
  pme_grid.cpp
Source Object: [vdso]
  (one unnamed entry)
Issue: -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
[Charts omitted: Path Count Profiles (Coverage, Count)]
[Charts omitted: Low Iteration Count Profiles (Coverage, Count)]
Experiment Summaries

| Field | r0 | r1 to r8 |
|---|---|---|
| Experiment Name | | |
| Application | ../../install_MPI_AMD_libm/bin/gmx_mpi | same as r0 |
| Timestamp | 2024-08-02 20:46:34 | same as r0 |
| Experiment Type | MPI; | same as r0 |
| Machine | gmz11.benchmarkcenter.megware.com | same as r0 |
| Architecture | x86_64 | same as r0 |
| Micro Architecture | ZEN_V4 | same as r0 |
| Model Name | AMD EPYC 9654 96-Core Processor | same as r0 |
| Cache Size | 1024 KB | same as r0 |
| Number of Cores | 96 | same as r0 |
| Maximal Frequency | 3.707812 GHz | same as r0 |
| OS Version | Linux 5.14.0-427.18.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 28 06:27:02 EDT 2024 | |

GROMACS 2024.2 compiled with AOCC 4.1, running on two 96-core AMD Zen 4 processors with 1 to 192 MPI ranks and no OpenMP threading (strong scaling). Pinning is controlled by GROMACS.