Help is available by hovering the cursor over any symbol or by consulting the MAQAO website.
Metric | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
---|---|---|---|---|---|---|---|---
Total Time (s) | 1.11E3 | 565.94 | 287.22 | 144.32 | 75.64 | 41.74 | 31.32 | 28.18
Profiled Time (s) | 1.11E3 | 563.76 | 284.83 | 142.32 | 73.62 | 39.52 | 28.96 | 25.76
GFLOPS | 6.454 | 12.644 | 24.905 | 49.556 | 94.544 | 171.339 | 228.389 | 253.857
Time in analyzed loops (%) | 98.8 | 98.4 | 97.8 | 97.3 | 95.7 | 92.5 | 82.6 | 80.7
Time in analyzed innermost loops (%) | 38.6 | 38.8 | 39.4 | 38.7 | 39.0 | 39.5 | 43.7 | 46.0
Time in user code (%) | 98.8 | 98.5 | 97.8 | 97.3 | 95.7 | 92.6 | 82.6 | 80.8
Compilation Options Score (%) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
Array Access Efficiency (%) | 71.5 | 71.6 | 71.8 | 71.7 | 72.1 | 73.0 | 77.3 | 78.9
Scalability - Gap | 1.00 | 1.02 | 1.04 | 1.04 | 1.09 | 1.20 | 1.81 | 2.44
Potential Speedups | | | | | | | | |
Perfect Flow Complexity | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
Perfect OpenMP + MPI + Pthread | 1.00 | 1.01 | 1.02 | 1.01 | 1.03 | 1.05 | 1.06 | 1.06
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.01 | 1.02 | 1.02 | 1.03 | 1.05 | 1.08 | 1.27 | 1.36
No Scalar Integer: Potential Speedup | 1.28 | 1.28 | 1.27 | 1.27 | 1.26 | 1.24 | 1.16 | 1.14
No Scalar Integer: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
FP Vectorised: Potential Speedup | 1.96 | 1.94 | 1.93 | 1.91 | 1.86 | 1.76 | 1.47 | 1.41
FP Vectorised: Nb Loops to get 80% | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
Fully Vectorised: Potential Speedup | 4.46 | 4.35 | 4.25 | 4.17 | 3.88 | 3.40 | 2.38 | 2.13
Fully Vectorised: Nb Loops to get 80% | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3
Only FP Arithmetic: Potential Speedup | 1.58 | 1.58 | 1.56 | 1.57 | 1.55 | 1.52 | 1.42 | 1.40
Only FP Arithmetic: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2
OpenMP perfectly balanced: Potential Speedup | 1.00 | 1.00 | 1.01 | 1.02 | 1.03 | 1.05 | 1.10 | 1.12
OpenMP perfectly balanced: Nb Loops to get 80% | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 3
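
The "Scalability - Gap" row can be cross-checked against the "Total Time (s)" row. The sketch below assumes, without confirmation from the report itself, that the gap is the ideal speedup relative to r0 (the ratio of thread counts) divided by the measured speedup (the ratio of total times). The thread counts are copied from the "Number of threads observed" row of the configuration table, and r0's time is taken as 1110.0 s because the report only gives the rounded value 1.11E3. The function and variable names are illustrative and not part of MAQAO.

```python
# Values copied from the report; r0's total time is the rounded figure 1.11E3.
total_time_s = [1110.0, 565.94, 287.22, 144.32, 75.64, 41.74, 31.32, 28.18]
threads = [2, 4, 8, 16, 32, 64, 128, 192]


def scaling_rows(times, thread_counts):
    """Return (run, threads, speedup, efficiency, gap) tuples relative to run r0.

    Assumption: gap = ideal speedup / measured speedup, where the ideal
    speedup is the thread-count ratio with respect to r0.
    """
    base_time, base_threads = times[0], thread_counts[0]
    rows = []
    for run, (t, n) in enumerate(zip(times, thread_counts)):
        speedup = base_time / t      # measured speedup over r0
        ideal = n / base_threads     # speedup under perfect scaling
        efficiency = speedup / ideal # parallel efficiency in [0, 1]
        gap = ideal / speedup        # reciprocal of the efficiency
        rows.append((f"r{run}", n, speedup, efficiency, gap))
    return rows


rows = scaling_rows(total_time_s, threads)
for name, n, speedup, eff, gap in rows:
    print(f"{name}: threads={n:3d} speedup={speedup:6.2f} "
          f"efficiency={eff:5.1%} gap={gap:4.2f}")
```

Under this assumption the computed gaps agree with the reported row to within rounding, which suggests that parallel efficiency stays above 90% up to r4 (32 threads) and falls to roughly 41% at r7 (192 threads).
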
Property | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
---|---|---|---|---|---|---|---|---|
Experiment Name | | | | | | | | |
Application | /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/run/binaries/aocc_10/exec | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Timestamp | 2024-02-23 11:55:24 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Experiment Type | MPI | MPI; OpenMP | same as r1 | same as r1 | same as r1 | same as r1 | same as r1 | same as r1 |
Machine | gmz16.benchmarkcenter.megware.com | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture | ZEN_V4 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Model Name | AMD EPYC 9654 96-Core Processor | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Cache Size | 1024 KB | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of Cores | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Maximal Frequency | 3.707812 GHz | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
OS Version | Linux 5.14.0-362.13.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 21 07:12:43 EST 2023 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture used during static analysis | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture used during static analysis | ZEN_V4 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Compilation Options | libkripke.so: AMD clang version 16.0.3 (CLANG: AOCC_4.1.0-Build#270 2023_07_10) /cluster/comp/aocc/4.1.0/bin/clang-16 --driver-mode=g++ -D kripke_EXPORTS -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/src -I include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/include -I tpl/raja/include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/cub -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/rocPRIM/rocprim/include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/camp/include -O3 -O3 -march=znver4 -mprefer-vector-width=512 -flto=full -g -grecord-command-line -fno-omit-frame-pointer -fcf-protection=none -nopie -Wall -Wextra -O3 -D NDEBUG -fPIC -fopenmp=libomp -std=c++14 -MD -MT CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o -MF CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o.d -o CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o -c /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/src/Kripke/Kernel/SweepSubdomain.cpp -I /cluster/intel/oneapi/2024.0.0/mpi/2021.11/include libRAJA.so: N/A | libkripke.so: AMD clang version 16.0.3 (CLANG: AOCC_4.1.0-Build#270 2023_07_10) /cluster/comp/aocc/4.1.0/bin/clang-16 --driver-mode=g++ -D kripke_EXPORTS -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/src -I include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/include -I tpl/raja/include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/cub -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/rocPRIM/rocprim/include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/camp/include -O3 -O3 -march=znver4 -mprefer-vector-width=512 -flto=full -g -grecord-command-line -fno-omit-frame-pointer -fcf-protection=none -nopie -Wall -Wextra -O3 -D NDEBUG -fPIC -fopenmp=libomp -std=c++14 -MD -MT CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o -MF CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o.d -o CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o -c /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/src/Kripke/Kernel/SweepSubdomain.cpp -I /cluster/intel/oneapi/2024.0.0/mpi/2021.11/include + [vdso]: N/A | same as r1 | same as r1 | same as r1 | libkripke.so: AMD clang version 16.0.3 (CLANG: AOCC_4.1.0-Build#270 2023_07_10) /cluster/comp/aocc/4.1.0/bin/clang-16 --driver-mode=g++ -D kripke_EXPORTS -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/src -I include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/include -I tpl/raja/include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/cub -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/rocPRIM/rocprim/include -I /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/tpl/raja/tpl/camp/include -O3 -O3 -march=znver4 -mprefer-vector-width=512 -flto=full -g -grecord-command-line -fno-omit-frame-pointer -fcf-protection=none -nopie -Wall -Wextra -O3 -D NDEBUG -fPIC -fopenmp=libomp -std=c++14 -MD -MT CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o -MF CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o.d -o CMakeFiles/kripke.dir/src/Kripke/Kernel/SweepSubdomain.cpp.o -c /beegfs/hackathon/users/eoseret/qaas_runs/170-850-6313/intel/Kripke/build/Kripke/src/Kripke/Kernel/SweepSubdomain.cpp -I /cluster/intel/oneapi/2024.0.0/mpi/2021.11/include | same as r5 | same as r1 |
Number of processes observed | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of threads observed | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 192 |
Frequency Driver | acpi-cpufreq | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Frequency Governor | performance | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Huge Pages | always | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Hyperthreading | on | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of sockets | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of cores per socket | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO version | 2.19.1 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO build | e26c8ffcefb997f114892e36591c060f98f53e6a::20240206-190005 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Comments | Execution on the Megware (https://www.megware.com/en/) benchmarking cluster | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |