OV - Compare Index

Compared Reports

r0: tbb_1

Global Metrics

Metric		r0
Total Time (s)		213.19
Max (Thread Active Time) (s)		193.88
Average Active Time (s)		193.88
Activity Ratio (%)		90.9
Average number of active threads		0.909
Affinity Stability (%)		99.9
Time in analyzed loops (%)		34.5
Time in analyzed innermost loops (%)		19.2
Time in user code (%)		48.1
Compilation Options Score (%)		100
Array Access Efficiency (%)		87.2
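
The timing rows above appear to be related by simple arithmetic. The sketch below assumes, rather than takes from MAQAO's documentation, that Activity Ratio is Average Active Time divided by Total Time, and that with a single observed thread the average number of active threads is the same fraction. Variable names are illustrative.

```python
# Values copied from the Global Metrics table for run r0.
total_time_s = 213.19
average_active_time_s = 193.88

# Assumption: activity ratio = average active time / total time.
activity_ratio = average_active_time_s / total_time_s

activity_ratio_pct = round(100 * activity_ratio, 1)   # compare with "Activity Ratio (%)"
avg_active_threads = round(activity_ratio, 3)         # compare with "Average number of active threads"
inactive_time_s = round(total_time_s - average_active_time_s, 2)

print(f"activity ratio: {activity_ratio_pct} %")
print(f"average active threads: {avg_active_threads}")
print(f"time with no active thread: {inactive_time_s} s")
```

Under that assumption the computed values match the reported 90.9 % and 0.909, and about 19.31 s of the run is spent with the only thread inactive.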

Potential Speedups
Perfect Flow Complexity		1.02
Perfect OpenMP + MPI + Pthread		1.00
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		1.00
No Scalar Integer	Potential Speedup	1.03
No Scalar Integer	Nb Loops to get 80%	8
FP Vectorised	Potential Speedup	1.00
FP Vectorised	Nb Loops to get 80%	3
Fully Vectorised	Potential Speedup	1.24
Fully Vectorised	Nb Loops to get 80%	17
Only FP Arithmetic	Potential Speedup	1.08
Only FP Arithmetic	Nb Loops to get 80%	19
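
To make these factors easier to compare, they can be converted into wall-clock seconds. This is a minimal sketch that assumes each potential speedup applies to the whole-run Total Time of 213.19 s; the report itself does not state the baseline, and the helper `seconds_saved` is hypothetical. The factors are printed to two decimals, so the results are approximate.

```python
# Total Time (s) from the Global Metrics table for run r0.
total_time_s = 213.19

# Potential speedup factors copied from the table above.
potential_speedups = {
    "Perfect Flow Complexity": 1.02,
    "No Scalar Integer": 1.03,
    "FP Vectorised": 1.00,
    "Fully Vectorised": 1.24,
    "Only FP Arithmetic": 1.08,
}

def seconds_saved(total_s: float, speedup: float) -> float:
    """Time saved if the whole run became `speedup` times faster (assumed baseline)."""
    return total_s - total_s / speedup

savings = {
    name: round(seconds_saved(total_time_s, factor), 2)
    for name, factor in potential_speedups.items()
}

# Print the scenarios from largest to smallest estimated saving.
for name, saved in sorted(savings.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: about {saved} s")
```

Under this assumption, full vectorisation is the only scenario worth more than a few percent, roughly 41 s of the 213 s run, and per the table reaching 80% of that gain would involve 17 loops.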

[Charts: Cumulated Speedup If No Scalar Integer; Cumulated Speedup If FP Vectorized; Cumulated Speedup If Fully Vectorized; Cumulated Speedup If Only FP Arithmetic]

[Charts: Loop Based Profiles (Innermost / Single Loops, Inbetween Loops, Outermost Loops); Cumulated Coverage With All Loops]

[Charts: Innermost Loop Based Profiles (Coverage, Count)]

[Charts: Application Categorization (Time, Coverage)]

Compilation Options

Source Object	Issue
libasolve_test_helpers.so
  structured_grid.cpp	○
libfinite_elements.so
  generic_elements.tpp	○
  MapBase.h	○
  material_brick.hpp	○
  TensorMap.h	○
  AssignEvaluator.h	○
  PlainObjectBase.h	○
  std_function.h	○
  GeneralProduct.h	○
  generic_elements.hpp	○
  singleton.hpp	○
  stl_vector.h	○
  element_U.tpp	○
  InverseImpl.h	○
  Memory.h	○
  TensorDeviceDefault.h	○
libmesh.so
  mesh.cpp	○
  mesh_sets.cpp	○
libdofs.so
  dof_list.cpp	○
  stl_vector.h	○
  dof_type_recorder.cpp	○
  MapBase.h	○
  stl_iterator.h	○
  dof.cpp	○
multithreading_assembly_perf_test
  basic_string.tcc	○
  enumerable_thread_specific.h	○
  parallel_for.h	○
  sparse_matrix_utilities.hpp	○
  partitioner.h	○
  finite_elements.hpp	○
  assembler.hpp	○
  shared_ptr_base.h	○
  multithreading_assembly_perf_test.cpp	○
  stl_iterator.h	○
  sparse_matrix.hpp	○
libfe_space.so
  object_factory.hpp	○
  hashtable.h	○
  stl_tree.h	○
  fespace.cpp	○
  partitions.cpp	○
libnon_linear_solvers.so
  DenseStorage.h	○

[Charts: Path Count Profiles (Coverage, Count)]

[Charts: Low Iteration Count Profiles (Coverage, Count)]

[Chart: Average Number of Active Threads (Run 1 - tbb_1)]

Experiment Summaries

	r0
Application	./multithreading_assembly_perf_test
Timestamp	2025-05-20 12:28:58
Experiment Type	MPI;
Machine	be-seq028
Architecture	x86_64
Micro Architecture	ZEN_V4
Model Name	AMD EPYC 9534 64-Core Processor
Cache Size	1024 KB
Number of Cores	64
Maximal Frequency	3.718066 GHz
OS Version	Linux 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Wed Apr 5 13:35:01 EDT 2023
Architecture used during static analysis	x86_64
Micro Architecture used during static analysis	ZEN_V4
Compilation Options	libasolve_test_helpers.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libdofs.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libfe_space.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libfinite_elements.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libmesh.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	libnon_linear_solvers.so: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops -fPIC
	multithreading_assembly_perf_test: GNU C++20 13.2.0 -march=znver4 -g3 -O3 -std=c++20 -fno-omit-frame-pointer -fopenmp -funroll-loops
Number of processes observed	1
Number of threads observed	1
Frequency Driver	acpi-cpufreq
Frequency Governor	performance
Huge Pages	always
Hyperthreading	off
Number of sockets	2
Number of cores per socket	64
MAQAO version	2.21.2
MAQAO build	605a656e415499abc9db62b5ea50b183e5295485::20250228-110016
Comments	-
