- r0: OMP1 - option thread_filter-threshold (1%) discards 7 threads, cumulating 0.12 seconds CPU time.
- r1: OMP2 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.11 seconds CPU time.
- r2: OMP4 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.13 seconds CPU time.
- r3: OMP8 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.12 seconds CPU time.
- r4: OMP16 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.13 seconds CPU time.
- r5: OMP24 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.12 seconds CPU time.
Metric | r0 | r1 | r2 | r3 | r4 | r5 |
---|---|---|---|---|---|---|
Total Time (s) | 748.15 | 423.00 | 249.76 | 164.57 | 113.53 | 91.74 |
Max (Thread Active Time) (s) | 727.42 | 410.15 | 241.97 | 158.17 | 108.24 | 86.43 |
Average Active Time (s) | 726.56 | 397.74 | 224.29 | 140.94 | 87.72 | 67.75 |
Activity Ratio (%) | 97.1 | 94.2 | 90.1 | 86.1 | 78.0 | 74.7 |
Average number of active threads | 7.769 | 15.044 | 28.737 | 54.812 | 98.900 | 141.787 |
Affinity Stability (%) | 15.1 | 14.0 | 22.5 | 33.1 | 47.2 | 42.5 |
GFLOPS | 82.850 | 146.385 | 247.933 | 375.890 | 545.237 | 674.333 |
Time in analyzed loops (%) | 2.05 | 1.36 | 0.91 | 0.58 | 0.43 | 0.35 |
Time in analyzed innermost loops (%) | 2.04 | 1.35 | 0.90 | 0.58 | 0.43 | 0.35 |
Time in user code (%) | 99.4 | 95.9 | 90.8 | 86.1 | 73.2 | 68.4 |
Compilation Options Score (%) | 100 | 100 | 100 | 100 | 100 | 100 |
Array Access Efficiency (%) | 55.3 | 52.9 | 52.6 | 50.5 | 50.4 | 50.1 |
Potential Speedups | | | | | | |
Perfect Flow Complexity | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Perfect OpenMP + MPI + Pthread | 1.00 | 1.00 | 1.00 | 1.00 | 1.01 | 1.01 |
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.00 | 1.07 | 1.19 | 1.30 | 1.68 | 1.86 |
Scalability - Gap | 1.00 | 1.13 | 1.34 | 1.76 | 2.43 | 2.94 |
No Scalar Integer: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
No Scalar Integer: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 |
FP Vectorised: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
FP Vectorised: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 |
Fully Vectorised: Potential Speedup | 1.01 | 1.01 | 1.01 | 1.00 | 1.00 | 1.00 |
Fully Vectorised: Nb Loops to get 80% | 3 | 3 | 3 | 3 | 3 | 3 |
Only FP Arithmetic: Potential Speedup | 1.02 | 1.01 | 1.01 | 1.00 | 1.00 | 1.00 |
Only FP Arithmetic: Nb Loops to get 80% | 3 | 3 | 3 | 3 | 3 | 3 |
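
The "Scalability - Gap" row can be reproduced from the "Total Time (s)" row alone. The sketch below assumes the gap is the measured time divided by the ideal time obtained by scaling r0 linearly with the OpenMP thread count (read from the run labels OMP1 to OMP24). That reading matches the reported values to two decimals, but it is an inference from the numbers in this report, not a definition taken from the MAQAO documentation. The function name `scaling_summary` is illustrative.

```python
# Values copied from the "Total Time (s)" row of the report (r0..r5).
total_time_s = [748.15, 423.00, 249.76, 164.57, 113.53, 91.74]
# OpenMP threads per MPI process, taken from the run labels OMP1..OMP24.
omp_threads = [1, 2, 4, 8, 16, 24]
run_labels = ["r0", "r1", "r2", "r3", "r4", "r5"]


def scaling_summary(times, threads):
    """Return (speedup, efficiency, gap) per run, relative to the first run.

    Assumption: ideal time = baseline time / (threads / baseline threads),
    and gap = measured time / ideal time = 1 / efficiency.
    """
    base_time, base_threads = times[0], threads[0]
    rows = []
    for measured, n in zip(times, threads):
        scale = n / base_threads          # how much more parallelism than r0
        speedup = base_time / measured    # measured speedup over r0
        efficiency = speedup / scale      # fraction of the ideal speedup reached
        gap = 1.0 / efficiency            # measured time over ideal time
        rows.append((speedup, efficiency, gap))
    return rows


summary = scaling_summary(total_time_s, omp_threads)
for label, (speedup, efficiency, gap) in zip(run_labels, summary):
    print(f"{label}: speedup {speedup:.2f}, efficiency {efficiency:.1%}, gap {gap:.2f}")
```

Under this reading, a gap of 2.94 for r5 means the 24-thread run takes almost three times longer than perfect linear scaling from r0 would predict.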
Source Object | Issue |
xhpl | |
HPL_lmul.c | |
HPL_rand.c | |
HPL_dlaswp03N.c | |
HPL_bcast.c | |
HPL_dlacpy.c | |
HPL_dlaswp04N.c | |
HPL_1ring.c | |
HPL_setran.c | |
HPL_ladd.c | |
HPL_pdlange.c | |
HPL_pdgesv0.c | |
HPL_dlaswp02N.c | |
HPL_dlaswp01N.c | |
Setting | r0 | r1 | r2 | r3 | r4 | r5 |
---|---|---|---|---|---|---|
Experiment Name | | | | | | |
Application | ./hpl-2.3/bin/Linux_Intel64_Zen5_AOCL/xhpl | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Timestamp | 2025-06-23 10:53:13 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Experiment Type | MPI | MPI; OpenMP | same as r1 | same as r1 | same as r1 | same as r1 |
Machine | gmz12.benchmarkcenter.megware.com | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture | ZEN_V5 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Model Name | AMD EPYC 9655 96-Core Processor | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Cache Size | 1024 KB | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of Cores | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Maximal Frequency | 4.509375 GHz | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
OS Version | Linux 5.14.0-503.31.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Mar 13 06:50:51 EDT 2025 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture used during static analysis | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture used during static analysis | ZEN_V5 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Compilation Options (xhpl) | AMD clang version 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24) /cluster/comp/aocc/5.0.0/bin/clang-17 -o HPL_rand.o -c -D Add__ -D F77_INTEGER=int -D StringSunStyle -D HPL_DETAILED_TIMING -D HPL_PROGRESS_REPORT -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include/Linux_Intel64_Zen5_AOCL -I /cluster/libs/aocl/5.0.0/aocc/include -fopenmp -O3 -ffast-math -g -grecord-command-line -march=znver5 -mprefer-vector-width=512 -Wall ../HPL_rand.c -I /cluster/hpcx/2.22/ompi-aocc/include -I /cluster/hpcx/2.22/ompi-aocc/include/openmpi -I /cluster/hpcx/2.22/ompi-aocc/include/openmpi/opal/mca/event/libevent2022/libevent -I /cluster/hpcx/2.22/ompi-aocc/include/openmpi/opal/mca/event/libevent2022/libevent/include | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of processes observed | 8 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of threads observed | 8 | 16 | 32 | 64 | 128 | 192 |
Frequency Driver | acpi-cpufreq | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Frequency Governor | ondemand | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Huge Pages | always | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Hyperthreading | on | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of sockets | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of cores per socket | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO version | 2025.1.0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO build | b107544c0173fc3785aa7d997ff783dc12b975d2::20250527-133805 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Comments | HPL benchmark compiled with AMD AOCC/AOCL 5.0. Matrix order: 100K, block size 384. Run on AMD Zen 5 with 8 NUMA nodes and 24 cores per NUMA node | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
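
The "Average number of active threads" metric can be cross-checked against the thread counts listed above. The sketch below assumes it equals the number of threads observed multiplied by the ratio of "Average Active Time (s)" to "Total Time (s)". This relation is inferred from the figures in this report (it agrees with them to within 0.01) and is not taken from the MAQAO documentation. The helper name `estimated_active_threads` is illustrative.

```python
# Rows copied from the report, one value per run (r0..r5).
threads_observed = [8, 16, 32, 64, 128, 192]
avg_active_time_s = [726.56, 397.74, 224.29, 140.94, 87.72, 67.75]
total_time_s = [748.15, 423.00, 249.76, 164.57, 113.53, 91.74]
reported_active_threads = [7.769, 15.044, 28.737, 54.812, 98.900, 141.787]


def estimated_active_threads(n_threads, active_s, total_s):
    """Assumed relation: threads scaled by the fraction of the run they are active."""
    return n_threads * active_s / total_s


estimates = [
    estimated_active_threads(n, active, total)
    for n, active, total in zip(threads_observed, avg_active_time_s, total_time_s)
]

for run, (estimate, reported) in enumerate(zip(estimates, reported_active_threads)):
    print(f"r{run}: estimated {estimate:.3f}, reported {reported:.3f}")
```

Read this way, the r5 run keeps roughly 142 of its 192 threads busy on average, which is consistent with the falling "Activity Ratio (%)" as the OpenMP thread count grows.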