Help is available by hovering the cursor over any symbol or by checking the MAQAO website.

Metric | r0 | r1 | r2 |
---|---|---|---|
Total Time (s) | 71.59 | 56.97 | 50.31 |
Profiled Time (s) | 69.15 | 48.48 | 46.33 |
Time in analyzed loops (%) | 59.4 | 80.8 | 89.0 |
Time in analyzed innermost loops (%) | 42.1 | 64.4 | 79.0 |
Time in user code (%) | 60.8 | 82.4 | 91.3 |
Compilation Options Score (%) | 75.0 | 75.0 | 100 |
Perfect Flow Complexity | 1.01 | 1.05 | 1.02 |
Array Access Efficiency (%) | 56.2 | 58.3 | Not Available |
GFLOPS | 648.572 | 1.26E3 | 0.0 |
Perfect OpenMP + MPI + Pthread | 1.23 | 1.04 | 1.01 |
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.56 | 1.17 | 1.12 |
No Scalar Integer: Potential Speedup | 1.08 | 1.06 | 1.18 |
No Scalar Integer: Nb Loops to get 80% | 12 | 10 | 17 |
FP Vectorised: Potential Speedup | 1.07 | 1.07 | 1.59 |
FP Vectorised: Nb Loops to get 80% | 8 | 10 | 20 |
Fully Vectorised: Potential Speedup | 1.29 | 1.39 | 1.67 |
Fully Vectorised: Nb Loops to get 80% | 24 | 21 | 23 |
Only FP Arithmetic: Potential Speedup | 1.26 | 1.26 | 1.76 |
Only FP Arithmetic: Nb Loops to get 80% | 28 | 21 | 23 |
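
The timing figures in the metrics table can be compared with a little arithmetic. The following is a minimal Python sketch, not part of MAQAO: the function names `speedup_vs_baseline` and `projected_time` are hypothetical, and the projection assumes that a "Potential Speedup" value can be applied directly to Total Time, which the report itself does not state. The three runs were measured on different machines, so the ratios describe platform differences rather than the effect of a code change.

```python
# Total Time (s) per run, copied from the metrics table.
total_time_s = {"r0": 71.59, "r1": 56.97, "r2": 50.31}
# "Fully Vectorised" potential speedup per run, copied from the same table.
fully_vectorised_speedup = {"r0": 1.29, "r1": 1.39, "r2": 1.67}


def speedup_vs_baseline(times, baseline="r0"):
    """Return baseline_time / run_time for each run (values > 1 mean faster)."""
    base = times[baseline]
    return {run: base / t for run, t in times.items()}


def projected_time(times, potential_speedup):
    """Estimate run time if the potential speedup were fully realised.

    Assumption (not confirmed by the report): the potential speedup
    applies to the total time of the run.
    """
    return {run: times[run] / potential_speedup[run] for run in times}


speedups = speedup_vs_baseline(total_time_s)
projected = projected_time(total_time_s, fully_vectorised_speedup)

for run in total_time_s:
    print(f"{run}: {speedups[run]:.2f}x vs r0, "
          f"projected {projected[run]:.2f} s if fully vectorised")
```

With the table values, r1 comes out about 1.26x and r2 about 1.42x faster than r0 in total time.
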

Binary | Source File | Issue |
---|---|---|
gmx_mpi | | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
libgromacs_mpi.so.7 | fft5d.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pairlist_simd_2xmm.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | threaded_force_buffer.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pme_gather.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | listed_forces.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | kernel_outer.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | kernel_ElecEw_VdwLJCombLB_VF.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | kernel_prune.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pairs.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pairlist.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | update.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | md_support.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pme.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | kernel_common.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | mdatoms.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | lincs.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pbc.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | constr.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | atomdata.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | localtopology.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | kerneldispatch.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pme_solve.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pme_spread.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | calc_verletbuf.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | fft_fftw3.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | settle.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | bonded.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | vector.tcc | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | sim_util.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | grid.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | vec.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | arrayref.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | domdec_constraints.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | partition.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | manage_threading.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
libgromacs_mpi.so.7 | pme_grid.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
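
The compilation issues listed here come down to a few flag checks: a generic `-march` value, missing `-g`, and missing `-O2`/`-O3`. The Python sketch below illustrates checks of that kind on the option strings this report gives for r0 and r2. It is not MAQAO's implementation: the function name `check_flags`, the set `GENERIC_MARCH`, and the wording of the messages are all assumptions made for illustration.

```python
import shlex

# Assumption: -march values treated as "not architecture specific".
GENERIC_MARCH = {"x86-64", "generic"}
# Assumption: flags accepted as "debug information enabled".
DEBUG_FLAGS = {"-g", "-g1", "-g2", "-g3", "-ggdb"}


def check_flags(options):
    """Return a list of issue strings for one compiler option string."""
    flags = shlex.split(options)
    issues = []

    # Collect every -march=<value>; the last one is used for the check.
    march = [f.split("=", 1)[1] for f in flags if f.startswith("-march=")]
    if not march:
        issues.append("-march is missing")
    elif march[-1] in GENERIC_MARCH:
        issues.append(f"-march={march[-1]} is generic; prefer a specific "
                      "target or -march=native")

    if not DEBUG_FLAGS.intersection(flags):
        issues.append("-g is missing")
    if not {"-O2", "-O3"}.intersection(flags):
        issues.append("-O2/-O3 is missing")
    return issues


# Option strings copied from the Compilation Options row of this report.
r0_options = ("-mavx512f -mfma -mavx512vl -mavx512dq -mavx512bw -mtune=generic "
              "-march=x86-64 -g -O2 -std=c++17 -fno-omit-frame-pointer "
              "-fcf-protection=none -fPIC -fexcess-precision=fast "
              "-funroll-all-loops -fopenmp")
r2_options = ("-march=armv8.2-a+sve -msve-vector-bits=256 -mlittle-endian "
              "-mabi=lp64 -g -O3 -O3 -std=c++17 -fno-omit-frame-pointer "
              "-fcf-protection=none -fPIC -fexcess-precision=fast "
              "-funroll-all-loops -fopenmp -fasynchronous-unwind-tables "
              "-fstack-protector-strong -fstack-clash-protection")

r0_issues = check_flags(r0_options)
r2_issues = check_flags(r2_options)
print("r0:", r0_issues)
print("r2:", r2_issues)
```

Under these assumptions the r0 options raise only the generic `-march` issue, while the r2 options raise none.
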

Property | r0 | r1 | r2 |
---|---|---|---|
Application | /home/eoseret/GROMACS/install/gplusplus/bin/gmx_mpi | /ccc/work/cont001/ocre/oserete/gromacs-2022.4-install-gcc-ompi/bin/gmx_mpi | /home/eoseret/GROMACS/build/gcc_2/bin/gmx_mpi |
Timestamp | 2023-07-28 11:50:56 | 2023-08-08 09:56:51 | 2023-08-08 09:18:53 |
Experiment Type | MPI; OpenMP; | same as r0 | same as r0 |
Machine | skylake | inti6206 | ip-172-31-47-199 |
Architecture | x86_64 | same as r0 | aarch64 |
Micro Architecture | SKYLAKE | ZEN_V3 | ARM_NEOVERSE_V1 |
Model Name | Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz | AMD EPYC 7763 64-Core Processor | |
Cache Size | 36608 KB | 512 KB | |
Number of Cores | 26 | 64 | |
Maximal Frequency | 2.1 GHz | 2.45 GHz | 0 GHz |
OS Version | Linux 6.4.1-arch2-1 #1 SMP PREEMPT_DYNAMIC Tue, 04 Jul 2023 08:39:40 +0000 | Linux 4.18.0-305.88.1.el8_4.x86_64 #1 SMP Thu Apr 6 10:22:46 EDT 2023 | Linux 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:08 UTC 2023 |
Architecture used during static analysis | x86_64 | same as r0 | aarch64 |
Micro Architecture used during static analysis | SKYLAKE | ZEN_V3 | ARM_NEOVERSE_V1 |
Compilation Options | libgromacs_mpi.so.7: GNU C++17 13.1.1 20230429 -mavx512f -mfma -mavx512vl -mavx512dq -mavx512bw -mtune=generic -march=x86-64 -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp; gmx_mpi: N/A | libgromacs_mpi.so.7: GNU C++17 12.2.0 -mavx2 -mfma -mtune=generic -march=x86-64 -g -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fexceptions | libgromacs_mpi.so.7: GNU C++17 11.1.0 -march=armv8.2-a+sve -msve-vector-bits=256 -mlittle-endian -mabi=lp64 -g -O3 -O3 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fasynchronous-unwind-tables -fstack-protector-strong -fstack-clash-protection |
Number of processes observed | 1 | same as r0 | same as r0 |
Number of threads observed | 52 | same as r0 | same as r0 |
Frequency Driver | intel_cpufreq | acpi-cpufreq | NA |
Frequency Governor | schedutil | performance | NA |
Huge Pages | always | same as r0 | madvise |
Hyperthreading | off | on | same as r0 |
Number of sockets | 2 | same as r0 | 1 |
Number of cores per socket | 26 | 64 | same as r1 |
MAQAO version | 2.17.7 | same as r0 | 2.17.8 |
MAQAO build | bf11934ec971510c7f500e010d8ca2474fd787ed::20230726-123240 | Build information not available | same as r1 |
Comments | GROMACS 2022.4 compiled with g++ 13.1.1 running on Skylake with 52 OMP threads, 10000 steps | GROMACS compiled with gcc 12.2.0 + OpenMPI, Zen 3, OV1, 10000 steps, 52 cores | GNU g++ 12.2.0 (SIMD=SVE), AWS G3 (Neoverse V1), 10000 steps, 52 cores |