| | | | | | | requested parallelism | walltime sum (s) | nb instances | any sync average per thread time (s) | any wait average per thread time (s) | parallelism overhead (%) | local speedup if perfectly balanced | global speedup if perfectly balanced |
start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 | 1x1 | 1x2 | 1x4 | 1x8 | 1x16 | 1x32 | 1x48 | 1x96 |
exec:0x401690 | main | main.c:139 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 96 | 1.08 E3 | 544.166 | 270.287 | 135.905 | 67.789 | 34.064 | 26.619 | 16.259 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 0.0 | 2.108 | 0.772 | 0.763 | 0.381 | 0.272 | 1.699 | 0.756 | 0.0 | 2.108 | 0.772 | 0.763 | 0.381 | 0.271 | 1.699 | 0.755 | 0 | 0.39 | 0.29 | 0.56 | 0.56 | 0.80 | 6.38 | 4.65 | 1.000 | 1.004 | 1.003 | 1.006 | 1.006 | 1.008 | 1.068 | 1.049 | 1.000 | 1.004 | 1.003 | 1.006 | 1.006 | 1.008 | 1.066 | 1.046 |
exec:0x4015cd | main | main.c:97 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 96 | 9.60 E-6 | 423 E-6 | 382 E-6 | 610 E-6 | 1.04 E-3 | 2.41 E-3 | 3.00 E-3 | 5.21 E-3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 217 E-6 | 112 E-6 | 136 E-6 | 136 E-6 | 212 E-6 | 199 E-6 | 652 E-6 | 0.0 | 217 E-6 | 112 E-6 | 136 E-6 | 136 E-6 | 211 E-6 | 198 E-6 | 651 E-6 | 0 | 51.4 | 29.4 | 22.4 | 13.1 | 8.76 | 6.61 | 12.5 | 1.000 | 2.059 | 1.417 | 1.288 | 1.150 | 1.096 | 1.071 | 1.143 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
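The observed strong-scaling speedup of each region can be derived from its walltime sum columns. The following is a minimal Python sketch, assuming that "walltime sum" is the region's elapsed time summed over all of its instances (the roughly halving values for main.c:139 as parallelism doubles are consistent with this reading). The names `THREADS`, `WALLTIME_SUM` and `scaling` are hypothetical, introduced only for illustration; the 1x1 value for main.c:139 is reported to three significant digits, so derived speedups carry the same precision.

```python
# Walltime sums (s) copied from the table, one entry per requested
# parallelism (1x1 ... 1x96). Assumption: "walltime sum" is the region's
# elapsed time summed over its instances, so T(1)/T(p) is an observed
# strong-scaling speedup. All names here are illustrative, not tool output.
THREADS = [1, 2, 4, 8, 16, 32, 48, 96]

WALLTIME_SUM = {
    "main.c:139": [1.08e3, 544.166, 270.287, 135.905, 67.789, 34.064, 26.619, 16.259],
    "main.c:97": [9.60e-6, 423e-6, 382e-6, 610e-6, 1.04e-3, 2.41e-3, 3.00e-3, 5.21e-3],
}


def scaling(times, threads):
    """Return (threads, speedup, efficiency) tuples relative to the 1x1 run."""
    t1 = times[0]
    rows = []
    for p, tp in zip(threads, times):
        speedup = t1 / tp            # observed speedup versus the 1x1 run
        rows.append((p, speedup, speedup / p))  # efficiency = speedup / threads
    return rows


if __name__ == "__main__":
    for region, times in WALLTIME_SUM.items():
        print(region)
        for p, s, e in scaling(times, THREADS):
            print(f"  1x{p:<3} speedup {s:8.3f}  efficiency {e:6.3f}")
```

Under this reading, main.c:139 reaches roughly 66x at 1x96 (about 69% efficiency), whereas every multi-threaded run of main.c:97 is slower than its 1x1 run, which matches the high parallelism overhead reported for that region.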