Help is available by hovering the cursor over any symbol in the interactive report or by consulting the MAQAO website.
Global Metric | Value |
---|---|
Total Time (s) | 65.15 |
Max (Thread Active Time) (s) | 8.58 |
Average Active Time (s) | 8.54 |
Activity Ratio (%) | 26.9 |
Average number of active threads | 16.775 |
Affinity Stability (%) | 27.0 |
Time in analyzed loops (%) | 4.34 |
Time in analyzed innermost loops (%) | 2.61 |
Time in user code (%) | 5.54 |
Compilation Options Score (%) | 99.9 |
Array Access Efficiency (%) | 96.2 |

Potential Speedups | Potential Speedup | Nb Loops to get 80% |
---|---|---|
Perfect Flow Complexity | 1.00 | |
Perfect OpenMP + MPI + Pthread | 1.00 | |
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.00 | |
No Scalar Integer | 1.00 | 2 |
FP Vectorised | 1.00 | 2 |
Fully Vectorised | 1.02 | 1 |
FP Arithmetic Only | 1.00 | 5 |
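The figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not MAQAO's documented definition of these metrics. It assumes that "Average number of active threads" is the observed thread count scaled by the ratio of average active time to total time, and that "Time in analyzed loops (%)" is a share of total run time. The function names are hypothetical. The second function is plain Amdahl's law, used to bound what any loop-level optimization could gain.

```python
# Values copied from the global metrics and the configuration summary of this report.
TOTAL_TIME_S = 65.15
AVG_ACTIVE_TIME_S = 8.54
THREADS_OBSERVED = 128
LOOP_TIME_FRACTION = 4.34 / 100  # "Time in analyzed loops (%)"


def estimated_active_threads(threads: int, avg_active_s: float, total_s: float) -> float:
    """Assumed relation: thread count weighted by the share of wall time spent active."""
    return threads * avg_active_s / total_s


def amdahl_speedup(fraction: float, local_speedup: float = float("inf")) -> float:
    """Amdahl's law: overall speedup when `fraction` of the run becomes `local_speedup` times faster.

    With the default (infinite) local speedup this is the upper bound 1 / (1 - fraction).
    """
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    threads = estimated_active_threads(THREADS_OBSERVED, AVG_ACTIVE_TIME_S, TOTAL_TIME_S)
    bound = amdahl_speedup(LOOP_TIME_FRACTION)
    print(f"estimated active threads: {threads:.2f}")
    print(f"upper bound on speedup from analyzed loops: {bound:.3f}")
```

Under these assumptions the estimate comes out at about 16.78 threads, close to the reported 16.775, and the Amdahl bound is about 1.045. That is consistent with every reported potential speedup staying at or below 1.02: with only 4.34% of the time spent in analyzed loops, loop-level tuning alone cannot change the total much.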
Source Object | Issue |
---|---|
▼[vdso] | |
▼ | |
○ | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
○ | -O2, -O3 or -Ofast is missing. |
○ | -march=(target) is missing. |
▼libfinite_elements.so | |
○InverseImpl.h | |
○element_U.tpp | |
○TensorMap.h | |
○GeneralProduct.h | |
○generic_elements.hpp | |
○stl_vector.h | |
○AssignEvaluator.h | |
○TensorDeviceDefault.h | |
○MapBase.h | |
○PlainObjectBase.h | |
▼libdofs.so | |
○dof_list.cpp | |
○dof.cpp | |
○MapBase.h | |
○stl_vector.h | |
▼multithreading_assembly_perf_test | |
○enumerable_thread_specific.h | |
○finite_elements.hpp | |
○assembler.hpp | |
▼libnon_linear_solvers.so | |
▼ | |
○ | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
○ | -O2, -O3 or -Ofast is missing. |
○ | -march=(target) is missing. |
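The three issues reported for `[vdso]` and `libnon_linear_solvers.so` correspond to three flag families: debug information, an optimization level, and a target architecture. As a rough illustration, the same check can be run on a flag string before profiling. This is a sketch only: `missing_recommended_flags` is a hypothetical helper, and it does not reproduce how MAQAO itself inspects binaries.

```python
import shlex


def missing_recommended_flags(options: str) -> list[str]:
    """Return the recommended flag families absent from a compiler option string."""
    tokens = shlex.split(options)
    missing = []
    # Loose match: any token starting with -g (for example -g, -g3, -ggdb) counts as debug info.
    if not any(tok.startswith("-g") for tok in tokens):
        missing.append("-g")
    if not any(tok in ("-O2", "-O3", "-Ofast") for tok in tokens):
        missing.append("-O2/-O3/-Ofast")
    if not any(tok.startswith("-march=") for tok in tokens):
        missing.append("-march=(target)")
    return missing


if __name__ == "__main__":
    # Option string as reported for libdofs.so in this report.
    libdofs = ("-march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer "
               "-fopenmp -funroll-loops -fPIC")
    print(missing_recommended_flags(libdofs))
    # A module with no recorded options ("N/A") fails all three checks.
    print(missing_recommended_flags("N/A"))
```

The modules built with `-march=znver4 -g3 -O3` pass all three checks, which matches the report listing no flag issues under them.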
Property | Value |
---|---|
Application | ./multithreading_assembly_perf_test |
Timestamp | 2025-05-20 10:57:29 |
Universal Timestamp | 1747731449 |
Number of processes observed | 1 |
Number of threads observed | 128 |
Experiment Type | MPI; OpenMP |
Machine | be-seq033 |
Model Name | AMD EPYC 9534 64-Core Processor |
Architecture | x86_64 |
Micro Architecture | ZEN_V4 |
Cache Size | 1024 KB |
Number of Cores | 64 |
OS Version | Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023 |
Architecture used during static analysis | x86_64 |
Micro Architecture used during static analysis | ZEN_V4 |
Frequency Driver | acpi-cpufreq |
Frequency Governor | performance |
Huge Pages | always |
Hyperthreading | off |
Number of sockets | 2 |
Number of cores per socket | 64 |
Compilation Options: [vdso] | N/A |
Compilation Options: libdofs.so | GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
Compilation Options: libfinite_elements.so | GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
Compilation Options: libnon_linear_solvers.so | N/A |
Compilation Options: multithreading_assembly_perf_test | GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops |
Dataset | |
Run Command | <executable> --max_threads <OMP_NUM_THREADS> --ncut 200 --method ColMutexes --storage SparseCOO |
MPI Command | mpirun -n <number_processes> --map-by slot:PE=<OMP_NUM_THREADS> --bind-to core |
Number Processes | 1 |
Number Nodes | 1 |
Filter | Not Used |
Profile Start | Not Used |
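The Run Command and MPI Command entries are templates with `<...>` placeholders. The sketch below shows one way to expand them into a concrete command line. It rests on two assumptions: that the MPI command is simply prefixed to the run command, and that `OMP_NUM_THREADS` was 128. The report records only that 128 threads were observed, not the value used. `expand` is a hypothetical helper, not part of MAQAO.

```python
def expand(template: str, values: dict[str, str]) -> str:
    """Replace each <name> placeholder in the template with its value."""
    for name, value in values.items():
        template = template.replace(f"<{name}>", value)
    return template


# Templates copied verbatim from the report.
RUN_COMMAND = ("<executable> --max_threads <OMP_NUM_THREADS> --ncut 200 "
               "--method ColMutexes --storage SparseCOO")
MPI_COMMAND = "mpirun -n <number_processes> --map-by slot:PE=<OMP_NUM_THREADS> --bind-to core"

values = {
    "executable": "./multithreading_assembly_perf_test",  # Application entry
    "number_processes": "1",                              # Number Processes entry
    "OMP_NUM_THREADS": "128",                             # assumption, see lead-in
}

# Assumed composition: MPI launcher first, then the application command.
full_command = f"{expand(MPI_COMMAND, values)} {expand(RUN_COMMAND, values)}"
print(full_command)
```

Keeping the templates separate from the values makes it easy to rerun the same experiment with a different thread count by changing a single entry.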