OV - multithreading_assembly_perf_test


Total Time (s)		65.15
Max (Thread Active Time) (s)		8.58
Average Active Time (s)		8.54
Activity Ratio (%)		26.9
Average number of active threads		16.775
Affinity Stability (%)		27.0
Time in analyzed loops (%)		4.34
Time in analyzed innermost loops (%)		2.61
Time in user code (%)		5.54
Compilation Options Score (%)		99.9
Array Access Efficiency (%)		96.2

Potential Speedups
Perfect Flow Complexity		1.00
Perfect OpenMP + MPI + Pthread		1.00
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		1.00
Scenario	Potential Speedup	Nb Loops to get 80% of Speedup
No Scalar Integer	1.00	2
FP Vectorised	1.00	2
Fully Vectorised	1.02	1
FP Arithmetic Only	1.00	5

Source Object	Issue
[vdso]
  (unnamed source)
    ○	-g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are -O2/-O3 and -march=(target).
    ○	-O2, -O3 or -Ofast is missing.
    ○	-march=(target) is missing.
libfinite_elements.so
  ○ InverseImpl.h
  ○ element_U.tpp
  ○ TensorMap.h
  ○ GeneralProduct.h
  ○ generic_elements.hpp
  ○ stl_vector.h
  ○ AssignEvaluator.h
  ○ TensorDeviceDefault.h
  ○ MapBase.h
  ○ PlainObjectBase.h
libdofs.so
  ○ dof_list.cpp
  ○ dof.cpp
  ○ MapBase.h
  ○ stl_vector.h
multithreading_assembly_perf_test
  ○ enumerable_thread_specific.h
  ○ finite_elements.hpp
  ○ assembler.hpp
libnon_linear_solvers.so
  (unnamed source)
    ○	-g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are -O2/-O3 and -march=(target).
    ○	-O2, -O3 or -Ofast is missing.
    ○	-march=(target) is missing.

Application	./multithreading_assembly_perf_test
Timestamp	2025-05-20 10:57:29	Universal Timestamp	1747731449
Number of processes observed	1	Number of threads observed	128
Experiment Type	MPI; OpenMP;
Machine	be-seq033
Model Name	AMD EPYC 9534 64-Core Processor
Architecture	x86_64	Micro Architecture	ZEN_V4
Cache Size	1024 KB	Number of Cores	64
OS Version	Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	ZEN_V4
Frequency Driver	acpi-cpufreq	Frequency Governor	performance
Huge Pages	always	Hyperthreading	off
Number of sockets	2	Number of cores per socket	64
Compilation Options
	[vdso]: N/A
	libdofs.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libfinite_elements.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libnon_linear_solvers.so: N/A
	multithreading_assembly_perf_test: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops

Dataset
Run Command	<executable> --max_threads <OMP_NUM_THREADS> --ncut 200 --method ColMutexes --storage SparseCOO
MPI Command	mpirun -n <number_processes> --map-by slot:PE=<OMP_NUM_THREADS> --bind-to core
Number Processes	1
Number Nodes	1
Filter	Not Used
Profile Start	Not Used