OV - direct assembly sequential

multithreading_assembly_perf_test - 2025-07-30 12:18:57 - MAQAO 2025.1.0

Global Metrics

Total Time (s)		403.20
Max (Thread Active Time) (s)		290.91
Average Active Time (s)		290.91
Activity Ratio (%)		72.1
Average number of active threads		0.721
Affinity Stability (%)		72.1
Time in analyzed loops (%)		48.9
Time in analyzed innermost loops (%)		35.5
Time in user code (%)		68.7
Compilation Options Score (%)		100
Array Access Efficiency (%)		82.8

Potential Speedups
Perfect Flow Complexity		1.00
Perfect OpenMP + MPI + Pthread		1.00
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		1.00
No Scalar Integer	Potential Speedup	1.08
No Scalar Integer	Nb Loops to get 80%	8
FP Vectorised	Potential Speedup	1.02
FP Vectorised	Nb Loops to get 80%	3
Fully Vectorised	Potential Speedup	1.20
Fully Vectorised	Nb Loops to get 80%	13
FP Arithmetic Only	Potential Speedup	1.26
FP Arithmetic Only	Nb Loops to get 80%	12
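
The figures above can be cross-checked with a little arithmetic. The sketch below is not part of the MAQAO output and rests on two assumptions: that the Activity Ratio is the thread active time divided by the total time, and that each potential speedup applies to the whole run time. The function names are hypothetical. Under the first assumption the ratio comes out at about 72.15 %, close to the reported 72.1 %; the small difference presumably comes from the displayed values being rounded.

```python
# Values copied from the Global Metrics table above.
TOTAL_TIME_S = 403.20
MAX_THREAD_ACTIVE_TIME_S = 290.91

# Potential speedups reported for each independent what-if scenario.
POTENTIAL_SPEEDUPS = {
    "No Scalar Integer": 1.08,
    "FP Vectorised": 1.02,
    "Fully Vectorised": 1.20,
    "FP Arithmetic Only": 1.26,
}


def activity_ratio_percent(active_time_s, total_time_s):
    """Share of the total time during which the thread was active.

    Assumed definition, not taken from the MAQAO documentation.
    """
    return 100.0 * active_time_s / total_time_s


def projected_time_s(total_time_s, speedup):
    """Run time if the speedup applied to the whole run (assumption)."""
    return total_time_s / speedup


if __name__ == "__main__":
    ratio = activity_ratio_percent(MAX_THREAD_ACTIVE_TIME_S, TOTAL_TIME_S)
    print(f"Activity ratio: {ratio:.2f} %")
    # The scenarios are alternatives, so each one is projected separately
    # rather than multiplied together.
    for scenario, speedup in POTENTIAL_SPEEDUPS.items():
        projected = projected_time_s(TOTAL_TIME_S, speedup)
        print(f"{scenario}: about {projected:.1f} s")
```

Read this way, even the most favourable scenario (FP Arithmetic Only, 1.26) would bring the run to roughly 320 s, so the loop-level optimisations listed here offer limited headroom for this sequential run.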

[Charts not included in this text export: CQA Potential Speedups Summary, Average Active Threads Count, Loop Based Profile, Innermost Loop Based Profile, Application Categorization]

Compilation Options

Source Object	Issue
libassembly.so	–
  Kokkos_OpenMP_Parallel_Scan.hpp
  finite_elements.hpp
  Kokkos_OpenMP_Parallel_For.hpp
libfinite_elements.so	–
  PacketMath.h
  MapBase.h
  material_brick.hpp
  GeneralMatrixMatrix.h
  GeneralMatrixVector.h
  GeneralProduct.h
  generic_elements.hpp
  GemmKernel.h
  stl_vector.h
  GeneralBlockPanelKernel.h
  Matrix.h
  element_U.tpp
  TensorDeviceDefault.h
libamat.so	–
  behavior_base.hpp
  behavior_integrator_direct.hpp
  TensorMap.h
  behavior_base.cpp
  GeneralMatrixVector.h
  integration_point_data_view.cpp
  elastic_behavior.cpp
  TensorExecutor.h
  material_context.cpp
  ProductEvaluators.h
libdofs.so	–
  dof_list.cpp
  stl_vector.h
  MapBase.h
  stl_iterator.h
  dof.cpp
multithreading_assembly_perf_test	–
  std_function.h
  basic_string.tcc
libboundary_conditions.so	–
  GemmKernel.h

[Charts not included in this text export: Loop Path Count Profile, Cumulated Speedup If No Scalar Integer, Cumulated Speedup If FP Vectorized, Cumulated Speedup If Fully Vectorized, Cumulated Speedup If FP Arithmetic Only]

Experiment Summary

Experiment Name	direct assembly sequential
Application	./multithreading_assembly_perf_test
Timestamp	2025-07-30 12:18:57	Universal Timestamp	1753870737
Number of processes observed	1	Number of threads observed	1
Experiment Type	Sequential
Machine	be-par054
Model Name	AMD EPYC 9534 64-Core Processor
Architecture	x86_64	Micro Architecture	ZEN_V4
Cache Size	1024 KB	Number of Cores	64
OS Version	Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	ZEN_V4
Frequency Driver	acpi-cpufreq	Frequency Governor	performance
Huge Pages	always	Hyperthreading	off
Number of sockets	2	Number of cores per socket	64
Compilation Options
	libamat.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libassembly.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fPIC -fopenmp
	libboundary_conditions.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libdofs.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libfinite_elements.so: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	multithreading_assembly_perf_test: GNU C++20 13.2.0 -march=znver4 -mprefer-vector-width=256 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -funroll-loops -fopenmp
Comments

Configuration Summary

Dataset
Run Command	<executable> --method direct --ncut 280 --max_threads=1 --min_threads=1
Number Processes	1
Number Nodes	1
Filter	Not Used
Profile Start	Not Used
Maximal Path Number	4
