

Global Metrics

Metric | r0 | r1 | r2
Total Time (s) | 71.59 | 56.97 | 50.31
Profiled Time (s) | 69.15 | 48.48 | 46.33
Time in analyzed loops (%) | 59.4 | 80.8 | 89.0
Time in analyzed innermost loops (%) | 42.1 | 64.4 | 79.0
Time in user code (%) | 60.8 | 82.4 | 91.3
Compilation Options Score (%) | 75.0 | 75.0 | 100
Perfect Flow Complexity | 1.01 | 1.05 | 1.02
Array Access Efficiency (%) | 56.2 | 58.3 | Not Available
GFLOPS | 648.57 | 21.26 E3 | 0.0
Perfect OpenMP + MPI + Pthread | 1.23 | 1.04 | 1.01
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.56 | 1.17 | 1.12
No Scalar Integer: Potential Speedup | 1.08 | 1.06 | 1.18
No Scalar Integer: Nb Loops to get 80% | 12 | 10 | 17
FP Vectorised: Potential Speedup | 1.07 | 1.07 | 1.59
FP Vectorised: Nb Loops to get 80% | 8 | 10 | 20
Fully Vectorised: Potential Speedup | 1.29 | 1.39 | 1.67
Fully Vectorised: Nb Loops to get 80% | 24 | 21 | 23
Only FP Arithmetic: Potential Speedup | 1.26 | 1.26 | 1.76
Only FP Arithmetic: Nb Loops to get 80% | 28 | 21 | 23
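The ratios in this table can be turned into seconds with simple arithmetic. The sketch below is illustrative only: it assumes that the "Perfect ..." and "Potential Speedup" factors are whole-application ratios T_before / T_after applied to Total Time (the report does not state its formula, and MAQAO may base them on profiled time instead). The helper names `speedup` and `time_saved` are hypothetical.

```python
# Total Time (s) per run, copied from the Global Metrics table.
TOTAL_TIME_S = {"r0": 71.59, "r1": 56.97, "r2": 50.31}


def speedup(baseline_s: float, other_s: float) -> float:
    """Wall-clock speedup of `other` relative to `baseline` (>1 means faster)."""
    return baseline_s / other_s


def time_saved(total_s: float, potential_speedup: float) -> float:
    """Seconds saved if `potential_speedup` were reached on the whole run.

    Assumption: the factor is a ratio T_before / T_after on total time.
    """
    return total_s * (1.0 - 1.0 / potential_speedup)


if __name__ == "__main__":
    for run in ("r1", "r2"):
        s = speedup(TOTAL_TIME_S["r0"], TOTAL_TIME_S[run])
        print(f"r0 -> {run}: {s:.2f}x")
    # "Perfect OpenMP + MPI + Pthread" factor reported for r0.
    lost = time_saved(TOTAL_TIME_S["r0"], 1.23)
    print(f"r0 time attributed to parallel overheads: ~{lost:.1f} s")
```

The three runs use different machines, architectures and compilers (see Experiment Summaries), so the r0 -> r1 and r0 -> r2 ratios reflect combined hardware and toolchain changes, not a code change alone.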

[Charts: Cumulated Speedup If No Scalar Integer; Cumulated Speedup If FP Vectorized; Cumulated Speedup If Fully Vectorized; Cumulated Speedup If Only FP Arithmetic]
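The "Potential Speedup" rows, the "Nb Loops to get 80%" rows and the cumulated-speedup curves can be read as an Amdahl-style model: optimize loops in decreasing order of the time they would save, and track the whole-application speedup. The sketch below assumes exactly that model, and assumes "80%" means 80% of the total time that could be saved. This is an interpretation, not MAQAO's documented formula, and the function names and sample loop data are hypothetical.

```python
from typing import List, Tuple

# Each loop: (time in seconds, local speedup if e.g. fully vectorised).
Loop = Tuple[float, float]


def _sorted_gains(loops: List[Loop]) -> List[float]:
    """Seconds each loop would save, largest first."""
    return sorted((t * (1.0 - 1.0 / s) for t, s in loops), reverse=True)


def cumulated_speedups(total_s: float, loops: List[Loop]) -> List[float]:
    """Whole-application speedup after optimizing the k most profitable
    loops, for k = 1..len(loops). Time outside those loops is unchanged."""
    curve, saved = [], 0.0
    for gain in _sorted_gains(loops):
        saved += gain
        curve.append(total_s / (total_s - saved))
    return curve


def loops_for_fraction(loops: List[Loop], fraction: float = 0.8) -> int:
    """Smallest number of loops whose combined savings reach `fraction`
    of the total achievable savings (assumed meaning of 'Nb Loops to get 80%')."""
    gains = _sorted_gains(loops)
    target = fraction * sum(gains)
    saved = 0.0
    for k, gain in enumerate(gains, start=1):
        saved += gain
        if saved >= target:
            return k
    return len(gains)


if __name__ == "__main__":
    # Hypothetical profile (not taken from this report): a 100 s run, three loops.
    loops = [(40.0, 2.0), (20.0, 4.0), (10.0, 1.0)]
    print([round(s, 3) for s in cumulated_speedups(100.0, loops)])
    print(loops_for_fraction(loops))
```

Under this model, a small "Nb Loops" value with a large potential speedup means that the gain is concentrated in a few hot loops, which makes them good optimization targets.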

Loop Based Profiles

[Charts: Innermost / Single Loops; Inbetween Loops; Outermost Loops; Cumulated Coverage With All Loops]

Innermost Loop Based Profiles

[Charts: Coverage; Count]

Application Categorization

[Charts: Time; Coverage]

Compilation Options

Source Object | Issue
gmx_mpi | -g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags: -O2/-O3, -march=(target).
libgromacs_mpi.so.7 | -march=x86-64 is used but should be replaced by a more architecture-specific option or -march=native. Affected source files: fft5d.cpp, pairlist_simd_2xmm.h, threaded_force_buffer.cpp, pme_gather.cpp, listed_forces.cpp, kernel_outer.h, kernel_ElecEw_VdwLJCombLB_VF.cpp, kernel_prune.cpp, pairs.cpp, pairlist.cpp, update.cpp, md_support.cpp, pme.cpp, kernel_common.cpp, mdatoms.cpp, lincs.cpp, pbc.cpp, constr.cpp, atomdata.cpp, localtopology.cpp, kerneldispatch.cpp, pme_solve.cpp, pme_spread.cpp, calc_verletbuf.cpp, fft_fftw3.cpp, settle.cpp, bonded.cpp, vector.tcc, sim_util.cpp, grid.cpp, vec.h, arrayref.h, domdec_constraints.cpp, partition.cpp, manage_threading.cpp, pme_grid.cpp.
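The compilation-option findings above can be approximated by scanning a recorded compiler command line. The sketch below is a hypothetical, simplified re-implementation, not MAQAO's actual check. It assumes that only `-march=x86-64` counts as "generic" (the `GENERIC_MARCH` set is an assumption), and it treats only `-O2` and `-O3` as acceptable optimization levels, so options such as `-Ofast` would be flagged too.

```python
import shlex
from typing import List

# Assumption: -march values that name only a baseline ISA, not a specific CPU.
GENERIC_MARCH = {"x86-64"}


def flag_issues(compile_line: str) -> List[str]:
    """Flag compiler-option issues similar to those in the Compilation
    Options table (hypothetical sketch, not MAQAO's implementation)."""
    tokens = shlex.split(compile_line)
    issues = []
    if "-g" not in tokens:
        issues.append("-g is missing")
    if not any(t in ("-O2", "-O3") for t in tokens):
        issues.append("neither -O2 nor -O3 is used")
    for t in tokens:
        if t.startswith("-march="):
            target = t.split("=", 1)[1]
            if target in GENERIC_MARCH:
                issues.append(f"-march={target} is generic; use a specific target or -march=native")
    return issues


if __name__ == "__main__":
    # libgromacs_mpi.so.7 options recorded for r0 in the Experiment Summaries.
    r0 = ("GNU C++17 13.1.1 20230429 -mavx512f -mfma -mavx512vl -mavx512dq -mavx512bw "
          "-mtune=generic -march=x86-64 -g -O2 -std=c++17 -fno-omit-frame-pointer "
          "-fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp")
    for issue in flag_issues(r0):
        print(issue)
```

With GCC, -march=native selects the instruction set of the machine doing the compilation. The resulting binary may therefore not run on a different CPU, which matters when building on a login node that differs from the compute nodes.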

Path Count Profiles

[Charts: Coverage; Count]

Low Iteration Count Profiles

[Charts: Coverage; Count]

Experiment Summaries

Field | r0 | r1 | r2
Application
  r0: /home/eoseret/GROMACS/install/gplusplus/bin/gmx_mpi
  r1: /ccc/work/cont001/ocre/oserete/gromacs-2022.4-install-gcc-ompi/bin/gmx_mpi
  r2: /home/eoseret/GROMACS/build/gcc_2/bin/gmx_mpi
Timestamp | 2023-07-28 11:50:56 | 2023-08-08 09:56:51 | 2023-08-08 09:18:53
Experiment Type | MPI; OpenMP; | same as r0 | same as r0
Machine | skylake | inti6206 | ip-172-31-47-199
Architecture | x86_64 | same as r0 | aarch64
Micro Architecture | SKYLAKE | ZEN_V3 | ARM_NEOVERSE_V1
Model Name | Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz | AMD EPYC 7763 64-Core Processor | (not shown)
Cache Size | 36608 KB | 512 KB | (not shown)
Number of Cores | 26 | 64 | (not shown)
Maximal Frequency | 2.1 GHz | 2.45 GHz | 0 GHz
OS Version
  r0: Linux 6.4.1-arch2-1 #1 SMP PREEMPT_DYNAMIC Tue, 04 Jul 2023 08:39:40 +0000
  r1: Linux 4.18.0-305.88.1.el8_4.x86_64 #1 SMP Thu Apr 6 10:22:46 EDT 2023
  r2: Linux 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:08 UTC 2023
Architecture used during static analysis | x86_64 | same as r0 | aarch64
Micro Architecture used during static analysis | SKYLAKE | ZEN_V3 | ARM_NEOVERSE_V1
Compilation Options
  r0: libgromacs_mpi.so.7: GNU C++17 13.1.1 20230429 -mavx512f -mfma -mavx512vl -mavx512dq -mavx512bw -mtune=generic -march=x86-64 -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp
      gmx_mpi: N/A
  r1: libgromacs_mpi.so.7: GNU C++17 12.2.0 -mavx2 -mfma -mtune=generic -march=x86-64 -g -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fexceptions
  r2: libgromacs_mpi.so.7: GNU C++17 11.1.0 -march=armv8.2-a+sve -msve-vector-bits=256 -mlittle-endian -mabi=lp64 -g -O3 -O3 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fasynchronous-unwind-tables -fstack-protector-strong -fstack-clash-protection
Number of processes observed | 1 | same as r0 | same as r0
Number of threads observed | 52 | same as r0 | same as r0
Frequency Driver | intel_cpufreq | acpi-cpufreq | NA
Frequency Governor | schedutil | performance | NA
Huge Pages | always | same as r0 | madvise
Hyperthreading | off | on | same as r0
Number of sockets | 2 | same as r0 | 1
Number of cores per socket | 26 | 64 | same as r1
MAQAO version | 2.17.7 | same as r0 | 2.17.8
MAQAO build | bf11934ec971510c7f500e010d8ca2474fd787ed::20230726-123240 | Build information not available | same as r1
Comments
  r0: GROMACS 2022.4 compiled with g++ 13.1.1 running on Skylake with 52 OMP threads, 10000 steps
  r1: GROMACS compiled with gcc 12.2.0 + OpenMPI, Zen 3, OV1, 10000 steps, 52 cores
  r2: GNU g++ 12.2.0 (SIMD=SVE), AWS G3 (Neoverse V1), 10000 steps, 52 cores