Metric | Value |
---|---|
Total Time (s) | 403.20 |
Max (Thread Active Time) (s) | 290.91 |
Average Active Time (s) | 290.91 |
Activity Ratio (%) | 72.1 |
Average number of active threads | 0.721 |
Affinity Stability (%) | 72.1 |
Time in analyzed loops (%) | 48.9 |
Time in analyzed innermost loops (%) | 35.5 |
Time in user code (%) | 68.7 |
Compilation Options Score (%) | 100 |
Array Access Efficiency (%) | 82.8 |

Potential Speedups | Potential Speedup | Nb Loops to get 80% |
---|---|---|
Perfect Flow Complexity | 1.00 | |
Perfect OpenMP + MPI + Pthread | 1.00 | |
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.00 | |
No Scalar Integer | 1.08 | 8 |
FP Vectorised | 1.02 | 3 |
Fully Vectorised | 1.20 | 13 |
FP Arithmetic Only | 1.26 | 12 |
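
The potential speedups above can be turned into rough projected run times by dividing the measured total time by each speedup factor. The sketch below does that in Python. The helper name `projected_time` is illustrative and not part of MAQAO, and the calculation assumes that each factor applies to the whole 403.20 s run and that the scenarios are independent what-if estimates, not cumulative gains.

```python
# Hypothetical helper: not a MAQAO API, only the definition speedup = old_time / new_time.
def projected_time(total_time_s: float, speedup: float) -> float:
    """Return the run time expected if the given speedup were fully achieved."""
    return total_time_s / speedup


# Values copied from the report tables above.
TOTAL_TIME_S = 403.20
SPEEDUPS = {
    "No Scalar Integer": 1.08,
    "FP Vectorised": 1.02,
    "Fully Vectorised": 1.20,
    "FP Arithmetic Only": 1.26,
}

# Assumption: each scenario is evaluated on its own against the full total time.
projections = {name: projected_time(TOTAL_TIME_S, s) for name, s in SPEEDUPS.items()}

for name, t in projections.items():
    print(f"{name}: {t:.1f} s projected, {TOTAL_TIME_S - t:.1f} s saved")
```

Read this way, the factors give an upper bound on what each class of optimisation could save, which may help decide where tuning effort is worth spending.
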
Source Object | Issue |
---|---|
▼libassembly.so | |
○Kokkos_OpenMP_Parallel_Scan.hpp | |
○finite_elements.hpp | |
○Kokkos_OpenMP_Parallel_For.hpp | |
▼libfinite_elements.so | |
○PacketMath.h | |
○MapBase.h | |
○material_brick.hpp | |
○GeneralMatrixMatrix.h | |
○GeneralMatrixVector.h | |
○GeneralProduct.h | |
○generic_elements.hpp | |
○GemmKernel.h | |
○stl_vector.h | |
○GeneralBlockPanelKernel.h | |
○Matrix.h | |
○element_U.tpp | |
○TensorDeviceDefault.h | |
▼libamat.so | |
○behavior_base.hpp | |
○behavior_integrator_direct.hpp | |
○TensorMap.h | |
○behavior_base.cpp | |
○GeneralMatrixVector.h | |
○integration_point_data_view.cpp | |
○elastic_behavior.cpp | |
○TensorExecutor.h | |
○material_context.cpp | |
○ProductEvaluators.h | |
▼libdofs.so | |
○dof_list.cpp | |
○stl_vector.h | |
○MapBase.h | |
○stl_iterator.h | |
○dof.cpp | |
▼multithreading_assembly_perf_test | |
○std_function.h | |
○basic_string.tcc | |
▼libboundary_conditions.so | |
○GemmKernel.h |

Parameter | Value |
---|---|
Experiment Name | direct assembly sequential |
Application | ./multithreading_assembly_perf_test |
Timestamp | 2025-07-30 12:18:57 |
Universal Timestamp | 1753870737 |
Number of processes observed | 1 |
Number of threads observed | 1 |
Experiment Type | Sequential |
Machine | be-par054 |
Model Name | AMD EPYC 9534 64-Core Processor |
Architecture | x86_64 |
Micro Architecture | ZEN_V4 |
Cache Size | 1024 KB |
Number of Cores | 64 |
OS Version | Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023 |
Architecture used during static analysis | x86_64 |
Micro Architecture used during static analysis | ZEN_V4 |
Frequency Driver | acpi-cpufreq |
Frequency Governor | performance |
Huge Pages | always |
Hyperthreading | off |
Number of sockets | 2 |
Number of cores per socket | 64 |
Compilation Options (libamat.so) | GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
Compilation Options (libassembly.so) | GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fPIC -fopenmp |
Compilation Options (libboundary_conditions.so) | GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
Compilation Options (libdofs.so) | GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
Compilation Options (libfinite_elements.so) | GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
Compilation Options (multithreading_assembly_perf_test) | GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fopenmp |
Comments | |
Dataset | |
Run Command | <executable> --method direct --ncut 280 --max_threads=1 --min_threads=1 |
Number Processes | 1 |
Number Nodes | 1 |
Filter | Not Used |
Profile Start | Not Used |
Maximal Path Number | 4 |