| start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams | aocc_4: requested parallelism | aocc_4: walltime sum (s) | aocc_4: nb instances | aocc_4: any sync average per thread time (s) | aocc_4: any wait average per thread time (s) | aocc_4: parallelism overhead (%) | aocc_4: local speedup if perfectly balanced | aocc_4: global speedup if perfectly balanced |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| libggml-cpu.so:0x30185 | ggml_graph_compute | ggml-cpu.c:3148 | 0 | 0 | runtime | parallel | 192 | 17.953 | 513 | 13.147 | 13.077 | 73.2 | 3.735 | 3.278 |
| libggml-cpu.so:0x57d9f | ggml_backend_amx_convert_weight(ggml_tensor*, void const*, u... | mmq.cpp:2337 | 0 | 0 | runtime | parallel | 192 | 0.169 | 225 | 0.0671 | 0.0671 | 39.7 | 1.657 | 1.004 |