| start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams |
| --- | --- | --- | ---: | ---: | --- | --- |
| libggml-cpu.so:0x25efc | ggml_graph_compute.A | ggml-cpu.c:682 | 0 | 0 | runtime | parallel |

| run | requested parallelism | walltime sum (s) | number of instances | any sync average per thread time (s) | any wait average per thread time (s) | parallelism overhead (%) | local speedup if perfectly balanced | global speedup if perfectly balanced |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1x8 | 8 | 41.807 | 513 | 3.142 | 3.071 | 7.52 | 1.081 | 1.081 |
| 1x64 | 64 | 59.284 | 513 | 22.621 | 22.546 | 38.2 | 1.617 | 1.612 |
| 1x96 | 96 | 59.833 | 513 | 21.984 | 21.904 | 36.7 | 1.581 | 1.576 |
| 1x128 | 128 | 59.616 | 513 | 22.642 | 22.566 | 38.0 | 1.612 | 1.607 |
| 1x160 | 160 | 60.379 | 513 | 23.067 | 22.992 | 38.2 | 1.618 | 1.613 |
| 1x192 | 192 | 60.400 | 513 | 22.742 | 22.658 | 37.7 | 1.604 | 1.599 |
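Two of the derived columns for `ggml_graph_compute.A` can be recomputed from the raw ones. For every run, "parallelism overhead (%)" matches 100 × sync / walltime, and "local speedup if perfectly balanced" matches walltime / (walltime − sync). Here walltime is "walltime sum (s)" and sync is "any sync average per thread time (s)". These relations are inferred from the numbers and are not taken from the profiler's documentation.

The Python sketch below checks that assumption. The names `ROWS`, `overhead_pct` and `local_speedup` are hypothetical. It does not reproduce the global-speedup column, because that column appears to depend on data the table does not include.

```python
# Hypothetical consistency check for the ggml_graph_compute.A profile table.
# Assumed relations (inferred from the published numbers, not from tool docs):
#   parallelism overhead (%)  = 100 * sync / walltime
#   local speedup (balanced)  = walltime / (walltime - sync)

# run -> (walltime sum (s), any sync average per thread time (s))
ROWS = {
    "1x8": (41.807, 3.142),
    "1x64": (59.284, 22.621),
    "1x96": (59.833, 21.984),
    "1x128": (59.616, 22.642),
    "1x160": (60.379, 23.067),
    "1x192": (60.400, 22.742),
}


def overhead_pct(walltime: float, sync: float) -> float:
    """Share of the region's walltime spent in synchronization, in percent."""
    return 100.0 * sync / walltime


def local_speedup(walltime: float, sync: float) -> float:
    """Speedup of the region if all synchronization time were eliminated."""
    return walltime / (walltime - sync)


if __name__ == "__main__":
    for run, (wall, sync) in ROWS.items():
        print(f"{run:>6}  overhead {overhead_pct(wall, sync):5.1f} %  "
              f"local speedup {local_speedup(wall, sync):.3f}")
```

Every recomputed value rounds to the figure printed in the table. Past 64 threads, roughly 37 to 38 % of the region's walltime is spent synchronizing, and adding threads leaves it unchanged.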