MAQAO

Loops Index

34 loops have been discarded from the report because their ratio ((Max Inclusive Time Over Threads * 100) / Max Thread Active Time) is lower than the threshold set by object_coverage_threshold (0.1%). Together they represent about 0.03% of the application. To include them, change the value of object_coverage_threshold in the experiment directory configuration file, then rerun the command with the additional parameter --force-static-analysis.
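
The discard rule in the notice above can be sketched as a small check. This is only an illustration of the stated formula, not MAQAO's implementation: the function names and the sample timings are hypothetical, and the strict "lower than" comparison is taken from the wording of the notice.

```python
def coverage_ratio(max_inclusive_time_s: float, max_thread_active_time_s: float) -> float:
    """Ratio in percent, as written in the notice:
    (Max Inclusive Time Over Threads * 100) / Max Thread Active Time."""
    return max_inclusive_time_s * 100.0 / max_thread_active_time_s


def is_discarded(max_inclusive_time_s: float,
                 max_thread_active_time_s: float,
                 threshold_pct: float = 0.1) -> bool:
    """True when the loop's ratio is lower than the threshold.
    The default mirrors the 0.1% value reported for object_coverage_threshold."""
    return coverage_ratio(max_inclusive_time_s, max_thread_active_time_s) < threshold_pct


# Hypothetical timings: 36.0 s is an assumed Max Thread Active Time,
# not a value taken from this report.
assumed_active_time_s = 36.0
for loop_time_s in (0.02, 0.05):
    ratio = coverage_ratio(loop_time_s, assumed_active_time_s)
    verdict = "discarded" if is_discarded(loop_time_s, assumed_active_time_s) else "kept"
    print(f"{loop_time_s:.2f} s -> {ratio:.3f}% ({verdict})")
```

Lowering the threshold argument (the analogue of editing object_coverage_threshold) keeps more loops in the report, at the cost of a longer table.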

Loop id	Source Location	Source Function	Level	Max Thread Time / Walltime aocc_0 (%)	Exclusive Coverage aocc_0 (%)	Inclusive Coverage aocc_0 (%)	Max Exclusive Time Over Threads aocc_0 (s)	Max Inclusive Time Over Threads aocc_0 (s)	Exclusive Time w.r.t. Wall Time aocc_0 (s)	Inclusive Time w.r.t. Wall Time aocc_0 (s)	Nb Threads aocc_0	GFLOPS aocc_0	Vectorization Ratio (%)	Vector Length Use (%)	Speedup If No Scalar Integer	Speedup If FP Vectorized	Speedup If Fully Vectorized	Speedup If Perfect Load Balancing aocc_0	Stride 0	Stride 1	Stride n	Stride Unknown	Stride Indirect	Array Access Efficiency
531	libggml-cpu.so - mmq.cpp:1570-1597 [...]	ggml_backend_amx_mul_mat(ggml_compute_params const, ggml_tensor)::$_1::operator()(int, int) const::{lambda()#1}::operator()() const	Single	37.58	24.24	24.24	13.59	13.59	4.56	4.56	183	78.81	NA	NA	NA	NA	NA	2.88	NA	NA	NA	NA	NA	0.00
384	libggml-cpu.so - mmq.cpp:303-1392 [...]	void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int,...	Innermost	0.14	0.11	0.11	0.05	0.05	0.02	0.02	183	0.00	90.91	38.76	1.47	1	1.41	2.39	23	0	0	9	0	85.94
1232	libggml-cpu.so - vec.cpp:311-316	ggml_vec_dot_f16	Single	0.41	0.10	0.10	0.15	0.15	0.02	0.02	33	1168.45	NA	NA	NA	NA	NA	1.42	NA	NA	NA	NA	NA	0.00
2437	libggml-cpu.so - vec.h:491-497	ggml_compute_forward_flash_attn_ext	Innermost	0.51	0.08	0.08	0.18	0.18	0.02	0.02	35	881.41	NA	NA	NA	NA	NA	2.14	NA	NA	NA	NA	NA	0.00
696	libggml-cpu.so - mmq.cpp:520-2488 [...]	ggml_backend_amx_mul_mat(ggml_compute_params const, ggml_tensor)::$_2::operator()(int, int) const::{lambda()#1}::operator()() const	InBetween	0.11	0.08	0.09	0.04	0.05	0.01	0.02	150	320.56	NA	NA	NA	NA	NA	2.14	NA	NA	NA	NA	NA	0.00
2421	libggml-cpu.so - ops.cpp:8759-8881 [...]	ggml_compute_forward_flash_attn_ext	InBetween	0.19	0.04	0.12	0.07	0.21	0.01	0.02	39	1508.62	20.85	14.01	2.87	1.57	1.68	2.08	NA	NA	NA	NA	NA	0.00
2113	libggml-cpu.so - ops.cpp:6220-6245 [...]	ggml_compute_forward_rope_f32(ggml_compute_params const, ggml_tensor, bool)	Innermost	0.11	0.04	0.04	0.04	0.04	0.01	0.01	192	577.43	1.96	6.62	1.01	1.14	4.06	6.05	1	2	0	0	0	100.00
108	libggml-cpu.so - ggml-cpu.c:533-2891 [...]	ggml_graph_compute_thread	Single	0.06	0.02	0.02	0.02	0.02	0.00	0.00	113	16.82	0	9.58	1	1	14.77	2.54	NA	NA	NA	NA	NA	0.00
2102	libggml-cpu.so - ops.cpp:6365-6484 [...]	ggml_compute_forward_rope_f32(ggml_compute_params const, ggml_tensor, bool)	InBetween	0.06	0.01	0.02	0.02	0.04	0.00	0.00	164	48.82	0	9.9	3.16	3.56	28.06	6.63	NA	NA	NA	NA	NA	0.00
2419	libggml-cpu.so - ops.cpp:8885-8886 [...]	ggml_compute_forward_flash_attn_ext	Innermost	0.07	0.01	0.01	0.02	0.02	0.00	0.00	28	0.00	0	6.25	1.33	1	16	2.09	0	2	0	0	1	66.67
3307	libggml-cpu.so - quants.c:298-355 [...]	quantize_row_q8_0	Single	0.84	0.01	0.01	0.30	0.30	0.00	0.00	1	877.69	58.33	28.75	1	1.38	2.83	1	NA	NA	NA	NA	NA	0.00
914	libggml-cpu.so - binary-ops.cpp:18-32 [...]	ggml_compute_forward_mul	Innermost	0.57	0.01	0.01	0.20	0.20	0.00	0.00	6	134.05	0	6.25	1	1.06	16	6	0	3	0	0	0	100.00
2105	libggml-cpu.so - ops.cpp:6446-6456 [...]	ggml_compute_forward_rope_f32(ggml_compute_params const, ggml_tensor, bool)	Innermost	0.06	0.00	0.00	0.02	0.02	0.00	0.00	32	390.18	72.73	19.13	1	1.43	4.56	3.76	1	2	0	1	0	87.50
1758	libggml-cpu.so - ops.cpp:4325-4326	ggml_compute_forward_rms_norm	Innermost	0.41	0.00	0.00	0.15	0.15	0.00	0.00	6	358.82	0	7.81	1	1.98	13.02	6	0	1	0	0	0	100.00
1236	libggml-cpu.so - vec.h:1084-1116 [...]	ggml_vec_swiglu_f32	Single	0.35	0.00	0.00	0.13	0.13	0.00	0.00	1	7871.81	98	98.13	1.02	1	1	1	0.5	0	0	3	0	56.25
816	libggml-cpu.so - binary-ops.cpp:10-32 [...]	ggml_compute_forward_add_non_quantized	Innermost	0.26	0.00	0.00	0.09	0.09	0.00	0.00	2	274.30	0	6.25	1	1.06	16	2	0	3	0	0	0	100.00
2992	exec - sampling.cpp:125-126 [...]	common_sampler::set_logits(llama_context*, int)	Single	0.21	0.00	0.00	0.07	0.07	0.00	0.00	1	0.00	0	6.25	3	1	16	1	1	1	2	0	0	87.50
2207	libllama.so - stl_heap.h:139-262 [...]	llama_token_data_array_partial_sort_inplace(llama_token_data_array*, int)	Outermost	0.18	0.00	0.00	0.06	0.06	0.00	0.00	1	0.00	0	8.48	2.33	1	13.95	1	NA	NA	NA	NA	NA	0.00
2638	libllama.so - hashtable.h:1840-1843 [...]	std::_Hashtable<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pai...	Single	0.06	0.00	0.00	0.02	0.02	0.00	0.00	1	0.00	0	11.33	1	1	2.46	1	NA	NA	NA	NA	NA	0.00