Help on these metrics is available on the MAQAO website.

Metric | Value |
---|---|
Total Time (s) | 126.84 |
Max (Thread Active Time) (s) | 11.86 |
Average Active Time (s) | 4.70 |
Activity Ratio (%) | 3.73 |
Average number of active threads | 4.747 |
Affinity Stability (%) | 4.40 |
Time in analyzed loops (%) | 29.4 |
Time in analyzed innermost loops (%) | 22.0 |
Time in user code (%) | 45.2 |
Compilation Options Score (%) | 100 |
Array Access Efficiency (%) | 82.0 |

Potential Speedups | Speedup | Nb Loops to get 80% |
---|---|---|
Perfect Flow Complexity | 1.00 | |
Perfect OpenMP + MPI + Pthread | 1.00 | |
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 2.55 | |
No Scalar Integer | 1.05 | 8 |
FP Vectorised | 1.02 | 4 |
Fully Vectorised | 1.12 | 12 |
FP Arithmetic Only | 1.14 | 11 |

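
Several of the global metrics can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that the report does not state: that Average Active Time is the mean active time per observed thread (128 threads, per the experiment summary), and that a potential speedup S projects a run time of Total Time / S. Under those assumptions, the summed thread active time divided by Total Time reproduces the reported average number of active threads to within rounding, and the 2.55 load-distribution speedup would correspond to a run of roughly 50 s.

```python
# Values copied from the global metrics tables of this report.
TOTAL_TIME_S = 126.84               # Total Time (s)
AVG_ACTIVE_TIME_S = 4.70            # Average Active Time (s)
THREADS_OBSERVED = 128              # Number of threads observed
REPORTED_AVG_ACTIVE_THREADS = 4.747 # Average number of active threads
LOAD_DISTRIBUTION_SPEEDUP = 2.55    # Perfect OpenMP + MPI + Pthread + Perfect Load Distribution

# Assumption: Average Active Time is a per-thread mean, so the summed active
# time of all threads divided by wall time gives how many threads were busy
# on average.
estimated_active_threads = AVG_ACTIVE_TIME_S * THREADS_OBSERVED / TOTAL_TIME_S

# Fraction of the available thread-time that was actually spent active.
busy_fraction = estimated_active_threads / THREADS_OBSERVED


def projected_time(total_s: float, speedup: float) -> float:
    """Assumption: a potential speedup S means projected time = total / S."""
    return total_s / speedup


load_balanced_s = projected_time(TOTAL_TIME_S, LOAD_DISTRIBUTION_SPEEDUP)

print(f"estimated active threads: {estimated_active_threads:.3f} "
      f"(reported {REPORTED_AVG_ACTIVE_THREADS})")
print(f"busy fraction of {THREADS_OBSERVED} threads: {busy_fraction:.2%}")
print(f"projected time with perfect load distribution: {load_balanced_s:.1f} s")
```

The busy fraction computed this way (about 3.71%) is close to, but not exactly, the reported Activity Ratio of 3.73%, so the tool's own definition may differ slightly from this approximation.
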
Source Object | Issue |
---|---|
▼libassembly.so | |
○Kokkos_OpenMP_Parallel_Scan.hpp | |
○finite_elements.hpp | |
○Kokkos_OpenMP_Parallel_For.hpp | |
▼libfinite_elements.so | |
○PacketMath.h | |
○MapBase.h | |
○material_brick.hpp | |
○GeneralMatrixMatrix.h | |
○GeneralMatrixVector.h | |
○GeneralProduct.h | |
○generic_elements.hpp | |
○GemmKernel.h | |
○stl_vector.h | |
○GeneralBlockPanelKernel.h | |
○Matrix.h | |
○element_U.tpp | |
○TensorDeviceDefault.h | |
▼libamat.so | |
○behavior_base.hpp | |
○behavior_integrator_direct.hpp | |
○TensorMap.h | |
○behavior_base.cpp | |
○GeneralMatrixVector.h | |
○integration_point_data_view.cpp | |
○elastic_behavior.cpp | |
○TensorExecutor.h | |
○material_context.cpp | |
○ProductEvaluators.h | |
▼libdofs.so | |
○dof_list.cpp | |
○stl_vector.h | |
○MapBase.h | |
○stl_iterator.h | |
○dof.cpp | |
▼multithreading_assembly_perf_test | |
○std_function.h | |
▼libboundary_conditions.so | |
○GemmKernel.h |

Field | Value |
---|---|
Experiment Name | direct assembly 128 threads |
Application | ./multithreading_assembly_perf_test |
Timestamp | 2025-07-30 12:07:29 |
Universal Timestamp | 1753870049 |
Number of processes observed | 1 |
Number of threads observed | 128 |
Experiment Type | OpenMP |
Machine | be-par054 |
Model Name | AMD EPYC 9534 64-Core Processor |
Architecture | x86_64 |
Micro Architecture | ZEN_V4 |
Cache Size | 1024 KB |
Number of Cores | 64 |
OS Version | Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023 |
Architecture used during static analysis | x86_64 |
Micro Architecture used during static analysis | ZEN_V4 |
Frequency Driver | acpi-cpufreq |
Frequency Governor | performance |
Huge Pages | always |
Hyperthreading | off |
Number of sockets | 2 |
Number of cores per socket | 64 |
Compilation Options | libamat.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; libassembly.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fPIC -fopenmp; libboundary_conditions.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; libdofs.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; libfinite_elements.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC; multithreading_assembly_perf_test: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fopenmp |

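
The Compilation Options entry packs six builds into one string, which makes differences hard to spot by eye. The minimal sketch below splits that string per binary and compares the flag sets. The helper name `flags_by_binary` is hypothetical, and the parsing assumes every entry begins with `<binary>: GNU C++20 13.2.0`, as it does in this report. On these values it shows that every shared library uses the same nine flags (listed in different orders), and the executable uses the same set without `-fPIC`.

```python
import re

# "Compilation Options" value copied from the experiment summary.
COMPILATION_OPTIONS = (
    "libamat.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC "
    "libassembly.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fPIC -fopenmp "
    "libboundary_conditions.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC "
    "libdofs.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC "
    "libfinite_elements.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC "
    "multithreading_assembly_perf_test: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fopenmp"
)


def flags_by_binary(text: str) -> dict[str, frozenset[str]]:
    """Hypothetical helper: map each binary name to its set of compiler flags."""
    # Splitting on a pattern with one capture group yields
    # ['', name1, flags1, name2, flags2, ...].
    parts = re.split(r"(\S+): GNU C\+\+20 13\.2\.0", text)
    return {name: frozenset(flags.split())
            for name, flags in zip(parts[1::2], parts[2::2])}


flags = flags_by_binary(COMPILATION_OPTIONS)
# Flags used by every binary in the report.
common = frozenset.intersection(*flags.values())

print("shared flags:", " ".join(sorted(common)))
for name, binary_flags in flags.items():
    extra = sorted(binary_flags - common)
    print(f"{name}: extra flags {extra if extra else 'none'}")
```

Comparing sets rather than strings deliberately ignores flag order, which is the only other difference between the library entries.
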

Field | Value |
---|---|
Comments | |
Dataset | |
Run Command | `<executable> --method direct --ncut 280 --max_threads=128 --min_threads=128` |
Number Processes | 1 |
Number Nodes | 1 |
Filter | Not Used |
Profile Start | Not Used |
Maximal Path Number | 4 |