OV - Compare Loops

Loops

mmq.cpp: 1138 - 24.99 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	GFLOP/s
Run orig_default							Run aocc_default		Run icx_1		Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1138-1151 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 2035-2040 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 2060-2074 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 2093-2101 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 2146-2152 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 2488-2488 /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_algobase.h: 235-235						Loop Source Regions		Loop Source Regions		Loop Source Regions
565	0.02	0.00	0.26	0	0	515.64
577	0.51	0.42	24.73	0	0	667.28

Sum on 2 analyzed binary loops (libggml-cpu.so - 565, libggml-cpu.so - 577)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis	Count	Analysis	Count	Analysis	Count
Control Flow Issues
Vectorization Roadblocks
Presence of more than 4 paths						1

mmq.cpp: 303 - 7.42 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 303-330 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 651-663 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 816-818 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 825-825 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1390-1392						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 303-330 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 651-663 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 816-818 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 825-825 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1390-1392						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 303-330 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 651-663 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 816-818 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 825-825 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1390-1392						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 303-330 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 651-663 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 816-818 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 825-825 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1390-1392
659	0.07	0.03	1.74	90.91	38.76	0	384	0.07	0.03	1.95	90.91	38.76	0	673	0.06	0.03	1.96	90.91	38.76	0	386	0.06	0.03	1.77	90.91	38.76	0

Sum on 1 analyzed binary loop (libggml-cpu.so - 659)							Sum on 1 analyzed binary loop (libggml-cpu.so - 384)							Sum on 1 analyzed binary loop (libggml-cpu.so - 673)							Sum on 1 analyzed binary loop (libggml-cpu.so - 386)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Loop Computation Issues							Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1	Presence of a large number of scalar integer instructions						1
Data Access Issues							Data Access Issues							Data Access Issues							Data Access Issues
Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1
Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1
More than 20% of the loads are accessing the stack						1	More than 20% of the loads are accessing the stack						1	More than 20% of the loads are accessing the stack						1	More than 20% of the loads are accessing the stack						1
Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						1
Inefficient Vectorization							Inefficient Vectorization							Inefficient Vectorization							Inefficient Vectorization
Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1

<unknown>: 0 - 1.73 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions							Loop Source Regions							Loop Source Regions							Loop Source Regions
2392	0.00	0.00	0.00	0	0	0	2640	0.01	0.00	0.01	0	0	0	1073	0.01	0.00	0.00	0	0	0	2917	0.00	0.00	0.00	0	0	0
2134	0.00	0.00	0.00	0	0	0	2628	0.01	0.00	0.00	0	0	0	2416	0.02	0.00	0.01	0	0	0	1774	0.00	0.00	0.00	0	0	0
2289	0.01	0.00	0.01	0	0	0	2450	0.01	0.00	0.00	0	0	0	2409	0.01	0.00	0.00	0	0	0	2792	0.01	0.00	0.01	0	0	0
2395	0.01	0.00	0.01	0	0	0	2638	0.02	0.00	0.01	0	0	0	2407	0.01	0.00	0.00	0	0	0	2904	0.00	0.00	0.00	0	0	0
2155	0.02	0.00	0.01	0	0	0	2741	0.02	0.00	0.01	0	0	0	2519	0.01	0.00	0.01	0	0	0	3014	0.01	0.00	0.01	0	0	0
2294	0.02	0.00	0.01	0	0	0	816	0.01	0.00	0.09	0	0	100.83	2633	0.01	0.00	0.00	0	0	0	2906	0.00	0.00	0.00	0	0	0
2298	0.00	0.00	0.00	0	0	0	695	0.00	0.00	0.00	0	0	NA	2412	0.01	0.00	0.01	0	0	0	2802	0.02	0.00	0.02	0	0	0
1955	0.01	0.00	0.11	100	50	4.45	107	0.00	0.00	0.00	0	0	NA	1968	0.01	0.00	0.08	0	0	24.21	2804	0.01	0.00	0.01	0	0	0
818	0.00	0.00	0.00	0	0	0	108	0.00	0.00	0.01	0	0	0	194	0.01	0.00	0.00	0	0	0	702	0.00	0.00	0.00	0	0	0
575	0.00	0.00	0.00	0	0	293.1	2105	0.01	0.00	0.03	0	0	790.63	847	0.01	0.00	0.00	0	0	0	697	0.00	0.00	0.00	0	0	2074.15
576	0.00	0.00	0.00	0	0	0	377	0.01	0.00	0.00	0	0	0	743	0.01	0.00	0.00	0	0	0	2032	0.01	0.00	0.00	0	0	0
2250	0.01	0.00	0.01	0	0	114.49	2441	0.00	0.00	0.01	0	0	102.56	2599	0.00	0.00	0.00	0	0	NA	815	0.01	0.00	0.07	0	0	121.7
146	0.01	0.00	0.01	0	0	48.85	2419	0.01	0.00	0.05	0	0	0.67	2028	0.01	0.00	0.03	0	0	437.62	1821	0.01	0.00	0.04	0	0	149
1605	0.01	0.00	0.01	0	0	549.57	2410	0.01	0.00	0.01	0	0	226.09	1762	0.01	0.00	0.02	0	0	319.24	107	0.00	0.00	0.00	0	0	NA
2265	0.01	0.00	0.03	0	0	2.04	705	0.02	0.00	0.10	0	0	3024.65	738	0.01	0.00	0.09	0	0	95.53	703	0.00	0.00	0.00	0	0	NA
719	0.01	0.00	0.08	0	0	101.49	1759	0.01	0.00	0.04	0	0	173.56	2457	0.02	0.00	0.06	0	0	1.03	2136	0.01	0.00	0.08	0	0	266.11
1863	0.00	0.00	0.02	0	0	717.49	838	0.01	0.00	0.00	0	0	0	2453	0.01	0.00	0.03	0	0	20.95	108	0.01	0.00	0.00	0	0	0
570	0.00	0.00	0.00	0	0	NA								1202	0.01	0.00	0.07	0	0	1087.38	379	0.01	0.00	0.00	0	0	0
2254	0.01	0.00	0.04	0	0	1.53								1924	0.00	0.00	0.01	0	0	0	2424	0.00	0.00	0.01	0	0	3.09
														146	0.00	0.00	0.01	0	0	0	2415	0.01	0.00	0.02	0	0	0
														582	0.00	0.00	0.00	0	0	NA	706	0.01	0.00	0.09	0	0	532.26
														570	0.00	0.00	0.00	0	0	NA	707	0.01	0.00	0.10	0	0	2953.2
														2468	0.01	0.00	0.04	0	0	12.18	2128	0.01	0.00	0.04	0	0	349.02

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count

binary-ops.cpp: 18 - 1.19 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
833	0.03	0.01	0.30	0	6.25	27.77	914	0.02	0.00	0.29	0	6.25	30.48	874	0.02	0.00	0.28	0	6.25	30.05	913	0.02	0.01	0.31	0	6.25	27.78

Sum on 1 analyzed binary loop (libggml-cpu.so - 833)							Sum on 1 analyzed binary loop (libggml-cpu.so - 914)							Sum on 1 analyzed binary loop (libggml-cpu.so - 874)							Sum on 1 analyzed binary loop (libggml-cpu.so - 913)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Loop Computation Issues							Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1

vec.h: 491 - 1.16 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 491-497						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 491-497						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 491-497						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 491-497
2263	0.02	0.01	0.30	0	0	297.83	2437	0.02	0.00	0.27	0	0	361.43	2465	0.03	0.01	0.36	0	0	238.74	2442	0.02	0.00	0.22	0	0	346.42

Sum on 1 analyzed binary loop (libggml-cpu.so - 2263)							Sum on 1 analyzed binary loop (libggml-cpu.so - 2437)							Sum on 1 analyzed binary loop (libggml-cpu.so - 2465)							Sum on 1 analyzed binary loop (libggml-cpu.so - 2442)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Control Flow Issues							Control Flow Issues							Control Flow Issues							Control Flow Issues
Presence of 2 to 4 paths							Presence of 2 to 4 paths						1	Presence of 2 to 4 paths							Presence of 2 to 4 paths						1
Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
Presence of 2 to 4 paths							Presence of 2 to 4 paths						1	Presence of 2 to 4 paths							Presence of 2 to 4 paths						1

vec.h: 1084 - 1.02 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 372-373 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1084-1101 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1113-1116						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 372-373 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1084-1101 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1113-1116						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 372-373 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1084-1101 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1113-1116						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 372-373 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1084-1101 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1115-1115
1147	0.03	0.00	0.28	100	100	950.06	1236	0.02	0.00	0.27	98	98.13	1036.86	1209	0.02	0.00	0.21	100	100	1231.26	1299	0.02	0.00	0.26	98	98.13	1061.15

Sum on 1 analyzed binary loop (libggml-cpu.so - 1147)							Sum on 1 analyzed binary loop (libggml-cpu.so - 1236)							Sum on 1 analyzed binary loop (libggml-cpu.so - 1209)							Sum on 1 analyzed binary loop (libggml-cpu.so - 1299)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
Loop Computation Issues							Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
Presence of expensive FP instructions						1	Presence of expensive FP instructions						1	Presence of expensive FP instructions						1	Presence of expensive FP instructions						1
Control Flow Issues							Control Flow Issues							Control Flow Issues							Control Flow Issues
Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1
Data Access Issues							Data Access Issues							Data Access Issues							Data Access Issues
Presence of constant non-unit stride data access							Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access							Presence of constant non-unit stride data access						1
Presence of special instructions executing on a single port							Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port							Presence of special instructions executing on a single port						1
Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1
Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						1	Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						1
Inefficient Vectorization							Inefficient Vectorization							Inefficient Vectorization							Inefficient Vectorization
Presence of special instructions executing on a single port						0	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						0	Presence of special instructions executing on a single port						1
Use of masked instructions						1	Use of masked instructions						1	Use of masked instructions						1	Use of masked instructions						1

mmq.cpp: 1140 - 0.92 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1140-1142 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1150-1151						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1140-1142 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1150-1151						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1140-1142 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 1150-1151						Loop Source Regions
566	0.03	0.00	0.28	100	100	288.11	704	0.01	0.00	0.11	100	100	456.3	575	0.02	0.00	0.22	100	100	326.03
567	0.02	0.00	0.15	100	100	2057.96								576	0.02	0.00	0.16	100	100	1996.31

Sum on 1 analyzed binary loop (libggml-cpu.so - 566)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 2 analyzed binary loops (libggml-cpu.so - 575, libggml-cpu.so - 576)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis	Count
Loop Computation Issues														Loop Computation Issues
Presence of a large number of scalar integer instructions						1								Presence of a large number of scalar integer instructions						1
Low iteration count						0								Low iteration count						1
Control Flow Issues														Control Flow Issues
Low iteration count														Low iteration count						1
Data Access Issues														Data Access Issues
Presence of constant non-unit stride data access						1								Presence of constant non-unit stride data access						1
Presence of indirect access						1								Presence of indirect access						1
Vectorization Roadblocks														Vectorization Roadblocks
Presence of constant non-unit stride data access						1								Presence of constant non-unit stride data access						1
Presence of indirect access						1								Presence of indirect access						1

quants.c: 298 - 0.80 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-347 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 353-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 346-346 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 355-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-347 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 353-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 346-346 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 355-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-347 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 353-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 346-346 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 355-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 298-347 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c: 353-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 346-346 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 355-355 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
3016	0.32	0.00	0.20	59.66	29.26	401.73	3307	0.35	0.00	0.23	58.33	28.75	376.53	3262	0.31	0.00	0.20	60.7	29.66	413.09	3325	0.25	0.00	0.17	60.7	29.66	505.47

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 1 analyzed binary loop (libggml-cpu.so - 3307)							Sum on 1 analyzed binary loop (libggml-cpu.so - 3262)							Sum on 1 analyzed binary loop (libggml-cpu.so - 3325)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
							Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
							Presence of expensive FP instructions						1	Presence of expensive FP instructions						1	Presence of expensive FP instructions						1
							Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						0	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						0
							Control Flow Issues							Control Flow Issues							Control Flow Issues
							Presence of 2 to 4 paths						0	Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1
							Presence of more than 4 paths						1	Presence of more than 4 paths						0	Presence of more than 4 paths						0
							Data Access Issues							Data Access Issues							Data Access Issues
							More than 10% of the vector loads instructions are unaligned						1	More than 10% of the vector loads instructions are unaligned						1	More than 10% of the vector loads instructions are unaligned						1
							Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1
							Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
							Presence of 2 to 4 paths						0	Presence of 2 to 4 paths						1	Presence of 2 to 4 paths						1
							Presence of more than 4 paths						1	Presence of more than 4 paths						0	Presence of more than 4 paths						0
							Inefficient Vectorization							Inefficient Vectorization							Inefficient Vectorization
							Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1	Presence of special instructions executing on a single port						1

ops.cpp: 4325 - 0.79 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
1610	0.02	0.00	0.22	0	7.81	67.75	1758	0.02	0.00	0.23	0	7.81	71.3	1767	0.01	0.00	0.18	0	7.81	80.75	1820	0.01	0.00	0.17	100	31.25	37.76

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 1 analyzed binary loop (libggml-cpu.so - 1758)							Sum on 1 analyzed binary loop (libggml-cpu.so - 1767)							Sum on 1 analyzed binary loop (libggml-cpu.so - 1820)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count
							Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
							Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1

▶ops.cpp: 6220 - 0.78 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6222 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6229-6230 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6220 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6229-6230 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6222 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6229-6230 /beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245						Loop Source Regions
1888	0.02	0.00	0.26	15.56	9.17	267.89	2113	0.03	0.00	0.29	1.96	6.62	353.98	2061	0.02	0.00	0.23	15.56	9.17	296.43

Sum on 1 analyzed binary loop (libggml-cpu.so - 1888)							Sum on 1 analyzed binary loop (libggml-cpu.so - 2113)							Sum on 1 analyzed binary loop (libggml-cpu.so - 2061)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis						Count	Analysis	Count
Loop Computation Issues							Loop Computation Issues							Loop Computation Issues
Presence of expensive FP instructions						1	Presence of expensive FP instructions						1	Presence of expensive FP instructions						1
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1	Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA						1
Control Flow Issues							Control Flow Issues							Control Flow Issues
Presence of calls						1	Presence of calls						1	Presence of calls						1
Data Access Issues							Data Access Issues							Data Access Issues
Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						1
More than 20% of the loads are accessing the stack						1	More than 20% of the loads are accessing the stack						1	More than 20% of the loads are accessing the stack						1
Vectorization Roadblocks							Vectorization Roadblocks							Vectorization Roadblocks
Presence of calls						1	Presence of calls						1	Presence of calls						1
Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						0	Presence of constant non-unit stride data access						1

▶ggml-cpu.c: 3204 - 0.64 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1							Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3204-3207						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3204-3207						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3204-3207						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3204-3207
6	0.02	0.00	0.16	100	100	0	5	0.01	0.00	0.14	100	100	0	6	0.01	0.00	0.14	100	100	0	5	0.02	0.00	0.20	100	100	0

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 1 analyzed binary loop (libggml-cpu.so - 5)
Analysis						Count	Analysis						Count	Analysis						Count	Analysis						Count

▶vec.cpp: 311 - 0.47 %

ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default							Run aocc_default							Run icx_1		Run aocc_3
Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 311-316						Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 311-316						Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 311-316
1140	0.02	0.00	0.15	0	0	714.44	1232	0.02	0.00	0.17	0	0	501.13			1289	0.01	0.00	0.16	0	0	508.49

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							Sum on 1 analyzed binary loop (libggml-cpu.so - 1232)							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Analysis						Count	Analysis						Count	Analysis	Count	Analysis						Count

▶mmq.cpp: 2068 - 0.33 %

ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s	ASM Loop ID	GFLOP/s	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run orig_default		Run aocc_default							Run icx_1		Run aocc_3
Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 2068-2078						Loop Source Regions		Loop Source Regions	/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-003-2415/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp: 2068-2078
		710	0.02	0.00	0.13	0	0	33.56			712	0.03	0.00	0.19	0	0	37.67

No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.							No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.		Sum on 1 analyzed binary loop (libggml-cpu.so - 712)
Analysis	Count	Analysis						Count	Analysis	Count	Analysis						Count
