OV - exec - Global

1 thread covering less than 1% of profiled time (= Max (Thread Active Time)) was discarded, accounting for 0.03 seconds of CPU time. The threshold below which a thread is discarded can be adjusted with the thread-filter-threshold option.

Total Time (s)		40.13
Max (Thread Active Time) (s)		25.63
Average Active Time (s)		10.56
Activity Ratio (%)		26.6
Average number of active threads		18.945
Affinity Stability (%)		0
Time in analyzed loops (%)		10.9
Time in analyzed innermost loops (%)		10.8
Time in user code (%)		11.0
Compilation Options Score (%)		74.6
Array Access Efficiency (%)		71.6
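
As a sanity check on the metrics above, the reported average number of active threads can be approximately reproduced from the other values. The sketch assumes the figure is the summed active time of all 72 observed threads divided by the total time. That relation is inferred from the numbers in this report, not taken from MAQAO documentation, and the function name is hypothetical.

```python
# Values copied from this report.
total_time_s = 40.13                  # Total Time (s)
avg_active_time_s = 10.56             # Average Active Time (s)
threads_observed = 72                 # Number of threads observed
reported_avg_active_threads = 18.945  # Average number of active threads


def estimate_avg_active_threads(avg_active_s, n_threads, total_s):
    """Assumed relation: summed per-thread active time divided by total time."""
    return avg_active_s * n_threads / total_s


estimate = estimate_avg_active_threads(
    avg_active_time_s, threads_observed, total_time_s
)
print(f"estimated: {estimate:.3f}, reported: {reported_avg_active_threads}")
```

The small difference between the two values is consistent with the inputs being rounded to two decimals. Either way, on average only about 19 of the 72 threads are active at a time.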

Potential Speedups
Perfect Flow Complexity		1.00
Perfect OpenMP + MPI + Pthread		3.27
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		13.8
No Scalar Integer	Potential Speedup	1.00
No Scalar Integer	Nb Loops to get 80%	1
FP Vectorised	Potential Speedup	1.00
FP Vectorised	Nb Loops to get 80%	1
Fully Vectorised	Potential Speedup	1.00
Fully Vectorised	Nb Loops to get 80%	1
FP Arithmetic Only	Potential Speedup	1.00
FP Arithmetic Only	Nb Loops to get 80%	2
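
To put the potential speedups in terms of time, the sketch below converts each one into a projected run time. It assumes each speedup applies to the whole-run Total Time of 40.13 s, which this report does not state explicitly. The helper name is hypothetical.

```python
# Values copied from this report.
total_time_s = 40.13
potential_speedups = {
    "Perfect Flow Complexity": 1.00,
    "Perfect OpenMP + MPI + Pthread": 3.27,
    "Perfect OpenMP + MPI + Pthread + Perfect Load Distribution": 13.8,
}


def projected_time(total_s, speedup):
    """Projected run time if the given whole-run speedup were fully achieved."""
    return total_s / speedup


for name, speedup in potential_speedups.items():
    print(f"{name}: {projected_time(total_time_s, speedup):.2f} s")
```

Under that assumption, nearly all of the potential gain comes from parallel runtime overhead and load distribution. The vectorisation and arithmetic scenarios are all reported at 1.00.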

Source Object	Issue
libllama.so
	llama-sampling.cpp
		-O3 or -Ofast is missing.
		-mcpu=native is missing.
		-funroll-loops is missing.
exec
	sampling.cpp
		-O3 or -Ofast is missing.
		-mcpu=native is missing.
		-funroll-loops is missing.
libggml-base.so
	(no source file name reported)
		-g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are: -O2/-O3, -march=(target).
		-O2, -O3 or -Ofast is missing.
		-mcpu=native is missing.
libggml-cpu.so
	vec.cpp
		-funroll-loops is missing.
	quants.c
		-funroll-loops is missing.
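
The kind of check behind these issues can be illustrated with a minimal sketch that scans a compiler option string for the three recommended flag groups. This is not MAQAO's implementation: the function name is hypothetical, and treating any explicit `-mcpu=` value as acceptable is an assumption. The report's own verdicts may also rely on per-file options rather than the per-module summary, so results can differ for other modules. The input is the libggml-cpu.so option string from the Compilation Options row of this report.

```python
def missing_flags(options: str) -> list[str]:
    """Return the recommended flag groups that are absent from an option string."""
    flags = options.split()
    missing = []
    if not any(f in ("-O3", "-Ofast") for f in flags):
        missing.append("-O3 or -Ofast")
    # Assumption: any explicit -mcpu= target counts as target-specific tuning.
    if not any(f.startswith("-mcpu=") for f in flags):
        missing.append("-mcpu=native")
    if "-funroll-loops" not in flags:
        missing.append("-funroll-loops")
    return missing


# Option string reported for libggml-cpu.so (compiler name and version omitted).
ggml_cpu_options = (
    "-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+sve2-sm4+norng+dotprod+i8mm+sve+nosme "
    "-mlittle-endian -mabi=lp64 -g -O3 -O3 -std=gnu11 -fno-omit-frame-pointer "
    "-fcf-protection=none -fPIC -fopenmp"
)

print(missing_flags(ggml_cpu_options))
```

For this module the sketch flags only `-funroll-loops`, which matches the issues listed for vec.cpp and quants.c.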

Application	/scratch/users/amazouz/QAAS/service/Llama.cpp/ortce-gh/175-931-3387/llama.cpp/run/base_runs/defaults/gcc/exec
Timestamp	2025-10-01 03:16:54	Universal Timestamp	1759313814
Number of processes observed	1	Number of threads observed	72
Experiment Type	MPI; OpenMP;
Machine	ortce-gh
Architecture	aarch64	Micro Architecture	ARM_NEOVERSE_V2
OS Version	Linux 6.8.0-84-generic-64k #84-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 5 15:19:10 UTC 2025
Architecture used during static analysis	aarch64	Micro Architecture used during static analysis	ARM_NEOVERSE_V2
Frequency Driver	cppc_cpufreq	Frequency Governor	ondemand
Huge Pages	always	Hyperthreading	off
Number of sockets	1	Number of cores per socket	72
Compilation Options
	exec:
		GNU C++17 14.2.0 -mlittle-endian -mabi=lp64 -g -O3 -O3 -fno-omit-frame-pointer -fcf-protection=none -fPIC
		GNU C17 14.2.0 -mlittle-endian -mabi=lp64 -g -g -g -O2 -O2 -O2 -fbuilding-libgcc -fno-stack-protector -fPIC
	libggml-base.so:
		N/A
	libggml-cpu.so:
		GNU C11 14.2.0 -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+sve2-sm4+norng+dotprod+i8mm+sve+nosme -mlittle-endian -mabi=lp64 -g -O3 -O3 -std=gnu11 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fopenmp
	libllama.so:
		GNU C++17 14.2.0 -mlittle-endian -mabi=lp64 -g -O3 -O3 -fno-omit-frame-pointer -fcf-protection=none -fPIC
		GNU C17 14.2.0 -mlittle-endian -mabi=lp64 -g -g -g -O2 -O2 -O2 -fbuilding-libgcc -fno-stack-protector -fPIC

Dataset
Run Command	<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 72 -n 512 -p "what is a LLM?" --seed 0
MPI Command	mpirun -n <number_processes> --bind-to none --report-bindings
Number Processes	1
Number Nodes	1
Filter	Not Used
Profile Start	Not Used