| Metric | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | |
|---|---|---|---|---|---|---|---|---|---|
| Total Time (s) | 209.44 | 131.78 | 90.89 | 69.66 | 61.47 | 58.83 | 58.00 | 63.66 | |
| Max (Thread Active Time) (s) | 134.53 | 69.16 | 35.25 | 18.21 | 10.62 | 7.70 | 7.17 | 8.61 | |
| Average Active Time (s) | 134.53 | 69.11 | 35.12 | 18.18 | 10.58 | 7.68 | 7.14 | 8.48 | |
| Activity Ratio (%) | 64.2 | 59.7 | 52.5 | 43.2 | 33.4 | 28.6 | 27.0 | 27.2 | |
| Average number of active threads | 0.642 | 1.049 | 1.546 | 2.088 | 2.754 | 4.175 | 7.881 | 17.050 | |
| Affinity Stability (%) | 64.4 | 59.9 | 52.5 | 43.2 | 33.5 | 28.7 | 27.1 | 27.5 | |
| Time in analyzed loops (%) | 27.0 | 27.8 | 27.8 | 29.1 | 25.4 | 17.8 | 9.42 | 4.30 | |
| Time in analyzed innermost loops (%) | 16.1 | 16.8 | 16.7 | 17.7 | 15.4 | 10.8 | 5.61 | 2.57 | |
| Time in user code (%) | 34.3 | 35.1 | 35.4 | 36.3 | 31.9 | 22.5 | 12.0 | 5.53 | |
| Compilation Options Score (%) | 100 | 100 | 100 | 100 | 100 | 100.0 | 100.0 | 99.9 | |
| Array Access Efficiency (%) | 95.3 | 95.7 | 95.6 | 96.0 | 96.0 | 95.8 | 96.0 | 96.0 | |
| Potential Speedups | | | | | | | | | |
| Perfect Flow Complexity | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.00 | 1.00 | |
| Perfect OpenMP + MPI + Pthread | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.01 | |
| Scalability - Gap | 1.00 | 1.26 | 1.74 | 2.66 | 4.70 | 8.99 | 17.72 | 38.91 | |
| No Scalar Integer: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| No Scalar Integer: Nb Loops to get 80% | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | |
| FP Vectorised: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| FP Vectorised: Nb Loops to get 80% | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
| Fully Vectorised: Potential Speedup | 1.13 | 1.14 | 1.13 | 1.15 | 1.13 | 1.09 | 1.04 | 1.02 | |
| Fully Vectorised: Nb Loops to get 80% | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | |
| Only FP Arithmetic: Potential Speedup | 1.03 | 1.03 | 1.03 | 1.03 | 1.03 | 1.02 | 1.01 | 1.00 | |
| Only FP Arithmetic: Nb Loops to get 80% | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | |
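
The scaling figures in the metrics table can be cross-checked with a little arithmetic. The sketch below is not taken from the MAQAO documentation: it assumes that "Scalability - Gap" behaves like `threads * total_time / total_time(r0)` (the inverse of parallel efficiency) and that "Average number of active threads" behaves like `threads * average_active_time / total_time`. Both relations are inferred from the numbers in this report only. Thread counts come from the "Number of threads observed" row of the run configuration table; the function name `derive_scaling` and the dictionary keys are illustrative.

```python
# Values copied from this report (runs r0..r7).
runs = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"]
threads = [1, 2, 4, 8, 16, 32, 64, 128]  # "Number of threads observed"
total_time = [209.44, 131.78, 90.89, 69.66, 61.47, 58.83, 58.00, 63.66]
avg_active_time = [134.53, 69.11, 35.12, 18.18, 10.58, 7.68, 7.14, 8.48]
reported_gap = [1.00, 1.26, 1.74, 2.66, 4.70, 8.99, 17.72, 38.91]
reported_active_threads = [0.642, 1.049, 1.546, 2.088, 2.754, 4.175, 7.881, 17.050]


def derive_scaling(runs, threads, total_time, avg_active_time):
    """Derive speedup, efficiency and two assumed MAQAO-like metrics per run."""
    t_ref = total_time[0]  # r0 (single thread) is the reference run
    rows = []
    for run, n, t, active in zip(runs, threads, total_time, avg_active_time):
        speedup = t_ref / t
        rows.append({
            "run": run,
            "threads": n,
            "total_time": t,
            "speedup": speedup,
            "efficiency": speedup / n,
            # Assumed relation: gap = n * T_n / T_1 = 1 / efficiency.
            "gap": n * t / t_ref,
            # Assumed relation: total thread-active time over wall time.
            "active_threads": n * active / t,
        })
    return rows


rows = derive_scaling(runs, threads, total_time, avg_active_time)

print(f"{'run':<4}{'thr':>5}{'speedup':>9}{'eff':>8}{'gap':>8}{'rep.gap':>9}")
for row, gap in zip(rows, reported_gap):
    print(f"{row['run']:<4}{row['threads']:>5}{row['speedup']:>9.2f}"
          f"{row['efficiency']:>8.3f}{row['gap']:>8.2f}{gap:>9.2f}")

# Run with the shortest wall-clock time.
best = min(rows, key=lambda r: r["total_time"])
print("shortest total time:", best["run"])
```

Under these assumptions the derived gap agrees with the reported "Scalability - Gap" row to within rounding, and the derived active-thread count agrees with the reported row to within the rounding of the tabulated active times. The shortest total time is reached at r6 (64 threads, 58.00 s); r7 (128 threads) is slower at 63.66 s.
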
| Source Object | Issue |
|---|---|
| [vdso] | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
| [vdso] | -O2, -O3 or -Ofast is missing. |
| [vdso] | -march=(target) is missing. |
| libfinite_elements.so: MapBase.h | |
| libfinite_elements.so: element_U.tpp | |
| libfinite_elements.so: TensorMap.h | |
| libfinite_elements.so: TensorDeviceDefault.h | |
| libfinite_elements.so: AssignEvaluator.h | |
| libfinite_elements.so: GeneralProduct.h | |
| libfinite_elements.so: generic_elements.hpp | |
| libfinite_elements.so: stl_vector.h | |
| libfinite_elements.so: DenseStorage.h | |
| libfinite_elements.so: material_brick.hpp | |
| libfinite_elements.so: InverseImpl.h | |
| libfinite_elements.so: Memory.h | |
| libfinite_elements.so: PlainObjectBase.h | |
| libdofs.so: MapBase.h | |
| libdofs.so: stl_vector.h | |
| libdofs.so: dof_list.cpp | |
| libdofs.so: stl_iterator.h | |
| libdofs.so: dof.cpp | |
| multithreading_assembly_perf_test: finite_elements.hpp | |
| multithreading_assembly_perf_test: basic_string.tcc | |
| multithreading_assembly_perf_test: enumerable_thread_specific.h | |
| multithreading_assembly_perf_test: assembler.hpp | |
| multithreading_assembly_perf_test: parallel_for.h | |
| libnon_linear_solvers.so: DenseStorage.h | |
| | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 |
|---|---|---|---|---|---|---|---|---|
| Application | ./multithreading_assembly_perf_test | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Timestamp | 2025-05-20 11:59:19 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Experiment Type | MPI | MPI; OpenMP | same as r1 | same as r1 | same as r1 | same as r1 | same as r1 | same as r1 |
| Machine | be-seq028 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Architecture | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Micro Architecture | ZEN_V4 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Model Name | AMD EPYC 9534 64-Core Processor | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Cache Size | 1024 KB | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Number of Cores | 64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Maximal Frequency | 3.718066 GHz | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| OS Version | Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Architecture used during static analysis | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Micro Architecture used during static analysis | ZEN_V4 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Compilation Options | libdofs.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; libfinite_elements.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; libnon_linear_solvers.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; multithreading_assembly_perf_test: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops | same as r0 | same as r0 | same as r0 | same as r0 | + [vdso]: N/A; libdofs.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; libfinite_elements.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; libnon_linear_solvers.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; multithreading_assembly_perf_test: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops | same as r5 | same as r5 |
| Number of processes observed | 1 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Number of threads observed | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
| Frequency Driver | acpi-cpufreq | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Frequency Governor | performance | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Huge Pages | always | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Hyperthreading | off | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Number of sockets | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Number of cores per socket | 64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| MAQAO version | 2.21.4 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| MAQAO build | 07eb2902ade069371c0df3e2f8cceca5d41c0371::20250519-154801 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Comments | - | - | - | - | - | - | - | - |