OV - exec - Global

Total Time (s)		44.69
Max (Thread Active Time) (s)		27.53
Average Active Time (s)		27.21
Activity Ratio (%)		97.6
Average number of active threads		116.901
Affinity Stability (%)		99.0
GFLOPS		16.875
Time in analyzed loops (%)		15.2
Time in analyzed innermost loops (%)		15.0
Time in user code (%)		15.4
Compilation Options Score (%)		87.4
Array Access Efficiency (%)		83.3
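
As a sanity check on the figures above, the average number of active threads can be approximately reproduced from the other values in this report. The sketch below assumes the metric equals the summed per-thread active time divided by wall-clock time. That definition is an assumption, not MAQAO's documented formula. The thread count (192) comes from the run details of this same report, and the function name is illustrative.

```python
def average_active_threads(avg_active_time_s, n_threads, total_time_s):
    """Assumed definition: summed active time over all threads / wall time."""
    return avg_active_time_s * n_threads / total_time_s

# Values copied from this report:
# Average Active Time (s), Number of threads observed, Total Time (s).
estimate = average_active_threads(27.21, 192, 44.69)
print(f"estimated average active threads: {estimate:.3f}")  # report lists 116.901
```

Under this assumption the estimate agrees with the reported value to within rounding of the inputs.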

Potential Speedups
Perfect Flow Complexity		1.00
Perfect OpenMP/MPI/Pthread/TBB		2.15
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution		6.52
No Scalar Integer	Potential Speedup	1.00
No Scalar Integer	Nb Loops to get 80%	2
FP Vectorised	Potential Speedup	1.00
FP Vectorised	Nb Loops to get 80%	3
Fully Vectorised	Potential Speedup	1.03
Fully Vectorised	Nb Loops to get 80%	1
FP Arithmetic Only	Potential Speedup	1.15
FP Arithmetic Only	Nb Loops to get 80%	1
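
To put the speedup factors above in absolute terms, the sketch below converts a few of them into projected run times. It assumes each factor applies to the total wall-clock time of 44.69 s. The report does not state the baseline explicitly, so treat the results as rough estimates. All names in the code are illustrative.

```python
TOTAL_TIME_S = 44.69  # "Total Time (s)" from this report

# Speedup factors copied from the "Potential Speedups" table.
speedups = {
    "Perfect OpenMP/MPI/Pthread/TBB": 2.15,
    "Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution": 6.52,
    "FP Arithmetic Only": 1.15,
}

def projected_time(total_s, speedup):
    """Speedup = old time / new time, so new time = old time / speedup."""
    return total_s / speedup

# Assumption: every factor is relative to the full run time.
projected = {name: projected_time(TOTAL_TIME_S, s) for name, s in speedups.items()}
for name, t in projected.items():
    print(f"{name}: {t:.2f} s")
```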

Source Object	Issue
libllama.so
	llama-vocab.cpp	-O3 or -Ofast is missing.
libggml-cpu.so
	binary-ops.cpp	-O3 or -Ofast is missing.
	amx.cpp	-O3 or -Ofast is missing.
	ops.cpp	-O3 or -Ofast is missing.
	vec.cpp	-O3 or -Ofast is missing.
	mmq.cpp	-O3 or -Ofast is missing.
	common.h	-O3 or -Ofast is missing.
	ggml-cpu.c	-O3 or -Ofast is missing.
	quants.c	-O3 or -Ofast is missing.
libggml-base.so
	(no source file reported)	-g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
	(no source file reported)	-O2, -O3 or -Ofast is missing.
	(no source file reported)	-march=(target) is missing.
exec
	sampling.cpp	-O3 or -Ofast is missing.
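
The "-O3 or -Ofast is missing" message can be illustrated with a simple check on a flag string. This is a minimal sketch of the idea, not MAQAO's actual detection logic. The flag string is the one this report lists for exec under Compilation Options, and the helper name is hypothetical.

```python
def missing_opt_level(flags: str) -> bool:
    """Return True if neither -O3 nor -Ofast appears among the flags."""
    tokens = flags.split()
    return not any(t in ("-O3", "-Ofast") for t in tokens)

# Flags recorded for exec in this report.
exec_flags = (
    "-march=graniterapids -g -O2 -funroll-loops -ffast-math "
    "-fno-omit-frame-pointer -fcf-protection=none "
    "-fno-finite-math-only -fPIC"
)

if missing_opt_level(exec_flags):
    print("-O3 or -Ofast is missing.")
```

The build uses -O2, so the check flags it, consistent with the entries in the issue table.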

Application	/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/binaries/gcc_6/exec
Timestamp	2025-09-30 20:08:08	Universal Timestamp	1759255688
Number of processes observed	1	Number of threads observed	192
Experiment Type	MPI; OpenMP;
Machine	isix06.benchmarkcenter.megware.com
Model Name	Intel(R) Xeon(R) 6972P
Architecture	x86_64	Micro Architecture	GRANITE_RAPIDS
Cache Size	491520 KB	Number of Cores	96
OS Version	Linux 5.14.0-570.39.1.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep 4 05:08:52 EDT 2025
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	GRANITE_RAPIDS
Frequency Driver	intel_pstate	Frequency Governor	performance
Huge Pages	always	Hyperthreading	on
Number of sockets	2	Number of cores per socket	96
Compilation Options	exec: GNU C++17 14.2.0 -march=graniterapids -g -O2 -funroll-loops -ffast-math -fno-omit-frame-pointer -fcf-protection=none -fno-finite-math-only -fPIC
	libggml-base.so: N/A
	libggml-cpu.so: GNU C++17 14.2.0 -march=graniterapids -g -O2 -std=gnu++17 -funroll-loops -ffast-math -fno-omit-frame-pointer -fcf-protection=none -fno-finite-math-only -fPIC -fopenmp
	libllama.so: GNU C++17 14.2.0 -march=graniterapids -g -O2 -funroll-loops -ffast-math -fno-omit-frame-pointer -fcf-protection=none -fno-finite-math-only -fPIC

Dataset
Run Command	<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p "what is a LLM?" --seed 0
MPI Command	mpirun -n <number_processes>
Number Processes	1
Number Nodes	1
Filter	Not Used
Profile Start	Not Used
Profile Stop	Not Used