Help is available by hovering the cursor over any symbol or by consulting the MAQAO website.
Metric | | r0 | r1 | r2 |
---|---|---|---|---|
Total Time (s) | | 284.20 | 277.60 | 374.98 |
Profiled Time (s) | | 282.15 | 276.57 | 374.74 |
Time in analyzed loops (%) | | 85.0 | 90.5 | 93.4 |
Time in analyzed innermost loops (%) | | 68.1 | 77.5 | 85.8 |
Time in user code (%) | | 87.7 | 92.6 | 95.2 |
Compilation Options Score (%) | | 75.0 | 75.0 | 100 |
Perfect Flow Complexity | | 1.02 | 1.02 | 1.03 |
Array Access Efficiency (%) | | 55.1 | 57.2 | Not Available |
GFLOPS | | 32.712 | 51.828 | 0.0 |
Perfect OpenMP + MPI + Pthread | | 1.00 | 1.00 | 1.00 |
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | | 1.00 | 1.00 | 1.00 |
No Scalar Integer | Potential Speedup | 1.08 | 1.05 | 1.17 |
No Scalar Integer | Nb Loops to get 80% | 12 | 9 | 14 |
FP Vectorised | Potential Speedup | 1.08 | 1.06 | 1.61 |
FP Vectorised | Nb Loops to get 80% | 8 | 10 | 13 |
Fully Vectorised | Potential Speedup | 1.24 | 1.15 | 1.67 |
Fully Vectorised | Nb Loops to get 80% | 26 | 21 | 15 |
Only FP Arithmetic | Potential Speedup | 1.27 | 1.14 | 1.75 |
Only FP Arithmetic | Nb Loops to get 80% | 26 | 18 | 15 |

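
The timing rows can be compared directly. The Python sketch below is not part of the MAQAO output; it derives each run's wall time relative to r0 and the share of total time covered by the profile, using only the Total Time and Profiled Time values from the table. The function and variable names are illustrative assumptions.

```python
# Values copied from the "Total Time (s)" and "Profiled Time (s)" rows.
total_time = {"r0": 284.20, "r1": 277.60, "r2": 374.98}
profiled_time = {"r0": 282.15, "r1": 276.57, "r2": 374.74}


def relative_to(baseline, times):
    """Return each run's time divided by the baseline run's time."""
    return {run: t / times[baseline] for run, t in times.items()}


def profiled_share(total, profiled):
    """Return the percentage of total time that was profiled, per run."""
    return {run: 100.0 * profiled[run] / total[run] for run in total}


if __name__ == "__main__":
    for run, ratio in relative_to("r0", total_time).items():
        print(f"{run}: {ratio:.3f}x the r0 wall time")
    for run, share in profiled_share(total_time, profiled_time).items():
        print(f"{run}: {share:.1f}% of total time profiled")
```

The three runs were taken on different machines, so these ratios describe the runs as measured and are not a like-for-like hardware comparison.
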
Source Object | Issue |
---|---|
libgromacs_mpi.so.7 | |
pairlist_simd_2xmm.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
threaded_force_buffer.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
pme_gather.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
listed_forces.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
partition.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
manage_threading.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
kernel_prune.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
pairs.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
pairlist.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
update.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
md_support.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
redistribute.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
mdatoms.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
lincs.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
pbc.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
atomdata.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
localtopology.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
vector.tcc | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
pme_solve.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
pme_spread.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
calc_verletbuf.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
vec.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
computemultibodycutoffs.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
bonded.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
settle.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
sim_util.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
grid.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
mshift.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
arrayref.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
domdec_constraints.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
kernel_ElecEw_VdwLJCombLB_VF.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
kernel_outer.h | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |
pme_grid.cpp | -march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native. |

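
Every source file listed above is flagged for the same generic `-march=x86-64` setting. As a rough illustration of that kind of check (not MAQAO's actual implementation), the Python sketch below scans a compilation option string for its `-march=` value and reports whether it is the generic baseline. The function name `find_generic_march` and the set of values treated as generic are assumptions made for this example, and the two option strings are abbreviated from the r0 and r2 Compilation Options entries in this report.

```python
import shlex

# Which -march values count as "generic" is an assumption for illustration.
GENERIC_MARCH = {"x86-64"}


def find_generic_march(options):
    """Return the effective -march value if it is generic, else None.

    Assumes the last -march on the command line is the one in effect.
    """
    values = [
        token[len("-march="):]
        for token in shlex.split(options)
        if token.startswith("-march=")
    ]
    if values and values[-1] in GENERIC_MARCH:
        return values[-1]
    return None


# Abbreviated from the Compilation Options entries of r0 and r2.
r0_options = "-mavx512f -mfma -mtune=generic -march=x86-64 -g -O2 -std=c++17"
r2_options = "-march=armv8.2-a+sve -msve-vector-bits=256 -g -O3 -std=c++17"

if __name__ == "__main__":
    for name, opts in (("r0", r0_options), ("r2", r2_options)):
        hit = find_generic_march(opts)
        if hit:
            print(f"{name}: -march={hit} is generic; consider -march=native")
        else:
            print(f"{name}: no generic -march value found")
```
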
Property | r0 | r1 | r2 |
---|---|---|---|
Application | /home/eoseret/GROMACS/install/gplusplus/bin/gmx_mpi | /ccc/work/cont001/ocre/oserete/gromacs-2022.4-install-gcc-ompi/bin/gmx_mpi | /home/eoseret/GROMACS/build/gcc_2/bin/gmx_mpi |
Timestamp | 2023-07-28 12:01:12 | 2023-08-08 09:43:00 | 2023-08-08 09:21:48 |
Experiment Type | MPI; | same as r0 | same as r0 |
Machine | skylake | inti6224 | ip-172-31-47-199 |
Architecture | x86_64 | same as r0 | aarch64 |
Micro Architecture | SKYLAKE | ZEN_V3 | ARM_NEOVERSE_V1 |
Model Name | Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz | AMD EPYC 7763 64-Core Processor | |
Cache Size | 36608 KB | 512 KB | |
Number of Cores | 26 | 64 | |
Maximal Frequency | 2.1 GHz | 2.45 GHz | 0 GHz |
OS Version | Linux 6.4.1-arch2-1 #1 SMP PREEMPT_DYNAMIC Tue, 04 Jul 2023 08:39:40 +0000 | Linux 4.18.0-305.88.1.el8_4.x86_64 #1 SMP Thu Apr 6 10:22:46 EDT 2023 | Linux 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:08 UTC 2023 |
Architecture used during static analysis | x86_64 | same as r0 | aarch64 |
Micro Architecture used during static analysis | SKYLAKE | ZEN_V3 | ARM_NEOVERSE_V1 |
Compilation Options | libgromacs_mpi.so.7: GNU C++17 13.1.1 20230429 -mavx512f -mfma -mavx512vl -mavx512dq -mavx512bw -mtune=generic -march=x86-64 -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp | libgromacs_mpi.so.7: GNU C++17 12.2.0 -mavx2 -mfma -mtune=generic -march=x86-64 -g -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fexceptions | libgromacs_mpi.so.7: GNU C++17 11.1.0 -march=armv8.2-a+sve -msve-vector-bits=256 -mlittle-endian -mabi=lp64 -g -O3 -O3 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fasynchronous-unwind-tables -fstack-protector-strong -fstack-clash-protection |
Number of processes observed | 1 | same as r0 | same as r0 |
Number of threads observed | 1 | same as r0 | same as r0 |
Frequency Driver | intel_cpufreq | acpi-cpufreq | NA |
Frequency Governor | schedutil | performance | NA |
Huge Pages | always | same as r0 | madvise |
Hyperthreading | off | on | same as r0 |
Number of sockets | 2 | same as r0 | 1 |
Number of cores per socket | 26 | 64 | same as r1 |
MAQAO version | 2.17.7 | same as r0 | 2.17.8 |
MAQAO build | bf11934ec971510c7f500e010d8ca2474fd787ed::20230726-123240 | Build information not available | same as r1 |
Comments | GROMACS 2022.4 compiled with g++ 13.1.1 running on Skylake with 1 OMP thread, 2000 steps | GROMACS compiled with gcc 12.2.0 + OpenMPI, Zen 3, OV1, 2000 steps, single core | GNU g++ 12.2.0 (SIMD=SVE), AWS G3 (Neoverse V1), 2000 steps, single core |
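
Several cells in the configuration table read `same as r0` or `same as r1` instead of repeating a value. When post-processing the report, those references can be expanded. The Python sketch below shows one way to do so; the helper name `resolve_row` and the list-based row representation are assumptions for this example, and the sample rows are copied from the table.

```python
import re

RUNS = ["r0", "r1", "r2"]
_SAME_AS = re.compile(r"^same as (r\d+)$")


def resolve_row(cells):
    """Expand 'same as rN' cells into the value of the run they point to."""
    resolved = list(cells)
    for i, cell in enumerate(cells):
        seen = {i}
        match = _SAME_AS.match(cell.strip())
        while match:
            j = RUNS.index(match.group(1))
            if j in seen:
                raise ValueError(f"circular reference in row: {cells}")
            seen.add(j)
            resolved[i] = cells[j]
            # Follow chains such as r2 -> r1 -> r0.
            match = _SAME_AS.match(cells[j].strip())
    return resolved


if __name__ == "__main__":
    rows = {
        "Hyperthreading": ["off", "on", "same as r0"],
        "Number of sockets": ["2", "same as r0", "1"],
        "Number of cores per socket": ["26", "64", "same as r1"],
    }
    for name, cells in rows.items():
        print(name, resolve_row(cells))
```
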