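start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams | requested parallelism [1x1] | requested parallelism [1x2] | requested parallelism [1x4] | requested parallelism [1x8] | requested parallelism [1x16] | requested parallelism [1x32] | requested parallelism [1x48] | requested parallelism [1x96] | walltime sum (s) [1x1] | walltime sum (s) [1x2] | walltime sum (s) [1x4] | walltime sum (s) [1x8] | walltime sum (s) [1x16] | walltime sum (s) [1x32] | walltime sum (s) [1x48] | walltime sum (s) [1x96] | nb instances [1x1] | nb instances [1x2] | nb instances [1x4] | nb instances [1x8] | nb instances [1x16] | nb instances [1x32] | nb instances [1x48] | nb instances [1x96] | any sync average per thread time (s) [1x1] | any sync average per thread time (s) [1x2] | any sync average per thread time (s) [1x4] | any sync average per thread time (s) [1x8] | any sync average per thread time (s) [1x16] | any sync average per thread time (s) [1x32] | any sync average per thread time (s) [1x48] | any sync average per thread time (s) [1x96] | any wait average per thread time (s) [1x1] | any wait average per thread time (s) [1x2] | any wait average per thread time (s) [1x4] | any wait average per thread time (s) [1x8] | any wait average per thread time (s) [1x16] | any wait average per thread time (s) [1x32] | any wait average per thread time (s) [1x48] | any wait average per thread time (s) [1x96] | parallelism overhead (%) [1x1] | parallelism overhead (%) [1x2] | parallelism overhead (%) [1x4] | parallelism overhead (%) [1x8] | parallelism overhead (%) [1x16] | parallelism overhead (%) [1x32] | parallelism overhead (%) [1x48] | parallelism overhead (%) [1x96] | local speedup if perfectly balanced [1x1] | local speedup if perfectly balanced [1x2] | local speedup if perfectly balanced [1x4] | local speedup if perfectly balanced [1x8] | local speedup if perfectly balanced [1x16] | local speedup if perfectly balanced [1x32] | local speedup if perfectly balanced [1x48] | local speedup if perfectly balanced [1x96] | global speedup if perfectly balanced [1x1] | global speedup if perfectly balanced [1x2] | global speedup if perfectly balanced [1x4] | global speedup if perfectly balanced [1x8] | global speedup if perfectly balanced [1x16] | global speedup if perfectly balanced [1x32] | global speedup if perfectly balanced [1x48] | global speedup if perfectly balanced [1x96] |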
exec:0x401690 | main | main.c:139 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 96 | 1.08 E3 | 540.119 | 270.238 | 135.208 | 67.769 | 34.064 | 26.513 | 15.690 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 4.98 E3 | 0.0 | 0.931 | 0.578 | 0.414 | 0.390 | 0.284 | 1.610 | 0.480 | 0.0 | 0.930 | 0.577 | 0.414 | 0.390 | 0.284 | 1.609 | 0.480 | 0 | 0.17 | 0.21 | 0.31 | 0.58 | 0.83 | 6.07 | 3.06 | 1.000 | 1.002 | 1.002 | 1.003 | 1.006 | 1.008 | 1.065 | 1.032 | 1.000 | 1.002 | 1.002 | 1.003 | 1.006 | 1.008 | 1.062 | 1.030 |
exec:0x4015cd | main | main.c:97 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 96 | 15.2 E-6 | 303 E-6 | 294 E-6 | 535 E-6 | 1.59 E-3 | 2.26 E-3 | 5.86 E-3 | 6.76 E-3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 109 E-6 | 70.2 E-6 | 96.6 E-6 | 190 E-6 | 143 E-6 | 2.92 E-3 | 1.01 E-3 | 0.0 | 109 E-6 | 70.0 E-6 | 96.5 E-6 | 190 E-6 | 143 E-6 | 2.92 E-3 | 1.01 E-3 | 0 | 36.1 | 23.9 | 18.0 | 12.0 | 6.32 | 49.8 | 15.0 | 1.000 | 1.564 | 1.314 | 1.220 | 1.136 | 1.067 | 1.991 | 1.176 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |