Help on each metric is available on the MAQAO website.

| Metric | Value |
|---|---|
| Total Time (s) | 123.05 |
| Max (Thread Active Time) (s) | 10.04 |
| Average Active Time (s) | 5.31 |
| Activity Ratio (%) | 4.36 |
| Average number of active threads | 2.763 |
| Affinity Stability (%) | 4.52 |
| Time in analyzed loops (%) | 41.3 |
| Time in analyzed innermost loops (%) | 31.3 |
| Time in user code (%) | 60.5 |
| Compilation Options Score (%) | 100 |
| Array Access Efficiency (%) | 82.4 |

| Potential Speedups | Potential Speedup | Nb Loops to get 80% |
|---|---|---|
| Perfect Flow Complexity | 1.00 | |
| Perfect OpenMP + MPI + Pthread | 1.00 | |
| Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.90 | |
| No Scalar Integer | 1.06 | 9 |
| FP Vectorised | 1.02 | 3 |
| Fully Vectorised | 1.15 | 14 |
| FP Arithmetic Only | 1.21 | 11 |

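
As a reading aid, the short Python sketch below cross-checks the thread activity figures reported in the global metrics. The formulas are assumptions made for illustration, not MAQAO's documented definitions: it supposes that the activity ratio is roughly the average active time divided by the total time, and roughly the average number of active threads divided by the 64 threads observed in the experiment summary. The helper `percent` and the constant names are hypothetical.

```python
# Values copied from the report tables; nothing here is measured by this script.
TOTAL_TIME_S = 123.05        # Total Time (s)
AVG_ACTIVE_TIME_S = 5.31     # Average Active Time (s)
MAX_ACTIVE_TIME_S = 10.04    # Max (Thread Active Time) (s)
AVG_ACTIVE_THREADS = 2.763   # Average number of active threads
THREADS_OBSERVED = 64        # Number of threads observed (experiment summary)


def percent(part: float, whole: float) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Assumed definition 1: share of the run during which an average thread is active.
ratio_from_time = percent(AVG_ACTIVE_TIME_S, TOTAL_TIME_S)

# Assumed definition 2: share of the observed threads that are active on average.
ratio_from_threads = percent(AVG_ACTIVE_THREADS, THREADS_OBSERVED)

# Share of the run during which the busiest thread is active.
busiest_thread_share = percent(MAX_ACTIVE_TIME_S, TOTAL_TIME_S)

print(f"activity ratio from times:   {ratio_from_time:.2f} %")
print(f"activity ratio from threads: {ratio_from_threads:.2f} %")
print(f"busiest thread active share: {busiest_thread_share:.2f} %")
```

Under these assumptions both ratios come out near 4.3 %, close to the reported Activity Ratio of 4.36 %. Either way, the figures indicate that the 64 threads are idle for most of the run.
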
| Source Object | Issue |
|---|---|
| ▼libassembly.so | |
| ○Kokkos_OpenMP_Parallel_Scan.hpp | |
| ○finite_elements.hpp | |
| ○Kokkos_OpenMP_Parallel_For.hpp | |
| ▼libfinite_elements.so | |
| ○PacketMath.h | |
| ○MapBase.h | |
| ○material_brick.hpp | |
| ○GeneralMatrixMatrix.h | |
| ○GeneralMatrixVector.h | |
| ○GeneralProduct.h | |
| ○generic_elements.hpp | |
| ○GemmKernel.h | |
| ○stl_vector.h | |
| ○GeneralBlockPanelKernel.h | |
| ○Matrix.h | |
| ○element_U.tpp | |
| ○TensorDeviceDefault.h | |
| ▼libamat.so | |
| ○behavior_base.hpp | |
| ○behavior_integrator_direct.hpp | |
| ○TensorMap.h | |
| ○behavior_base.cpp | |
| ○GeneralMatrixVector.h | |
| ○integration_point_data_view.cpp | |
| ○ProductEvaluators.h | |
| ○TensorExecutor.h | |
| ○material_context.cpp | |
| ○elastic_behavior.cpp | |
| ▼libdofs.so | |
| ○dof_list.cpp | |
| ○stl_vector.h | |
| ○MapBase.h | |
| ○stl_iterator.h | |
| ○dof.cpp | |
| ▼multithreading_assembly_perf_test | |
| ○std_function.h | |
| ○basic_string.tcc | |
| ▼libboundary_conditions.so | |
| ○GemmKernel.h | |

| Property | Value |
|---|---|
| Experiment Name | direct assembly 64 threads |
| Application | ./multithreading_assembly_perf_test |
| Timestamp | 2025-07-30 12:13:15 |
| Universal Timestamp | 1753870395 |
| Number of processes observed | 1 |
| Number of threads observed | 64 |
| Experiment Type | OpenMP |
| Machine | be-par054 |
| Model Name | AMD EPYC 9534 64-Core Processor |
| Architecture | x86_64 |
| Micro Architecture | ZEN_V4 |
| Cache Size | 1024 KB |
| Number of Cores | 64 |
| OS Version | Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023 |
| Architecture used during static analysis | x86_64 |
| Micro Architecture used during static analysis | ZEN_V4 |
| Frequency Driver | acpi-cpufreq |
| Frequency Governor | performance |
| Huge Pages | always |
| Hyperthreading | off |
| Number of sockets | 2 |
| Number of cores per socket | 64 |
| Comments | |

| Compilation Options (per binary) | Compiler | Flags |
|---|---|---|
| libamat.so | GNU C++20 13.2.0 | -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
| libassembly.so | GNU C++20 13.2.0 | -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fPIC -fopenmp |
| libboundary_conditions.so | GNU C++20 13.2.0 | -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
| libdofs.so | GNU C++20 13.2.0 | -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
| libfinite_elements.so | GNU C++20 13.2.0 | -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC |
| multithreading_assembly_perf_test | GNU C++20 13.2.0 | -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fopenmp |


| Run Parameter | Value |
|---|---|
| Dataset | |
| Run Command | `<executable> --method direct --ncut 280 --max_threads=64 --min_threads=64` |
| Number Processes | 1 |
| Number Nodes | 1 |
| Filter | Not Used |
| Profile Start | Not Used |
| Maximal Path Number | 4 |