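start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams | requested parallelism, 2x1 | requested parallelism, 2x2 | requested parallelism, 2x4 | requested parallelism, 2x8 | requested parallelism, 2x16 | requested parallelism, 2x32 | requested parallelism, 2x48 | walltime sum (s), 2x1 | walltime sum (s), 2x2 | walltime sum (s), 2x4 | walltime sum (s), 2x8 | walltime sum (s), 2x16 | walltime sum (s), 2x32 | walltime sum (s), 2x48 | nb instances, 2x1 | nb instances, 2x2 | nb instances, 2x4 | nb instances, 2x8 | nb instances, 2x16 | nb instances, 2x32 | nb instances, 2x48 | any sync average per thread time (s), 2x1 | any sync average per thread time (s), 2x2 | any sync average per thread time (s), 2x4 | any sync average per thread time (s), 2x8 | any sync average per thread time (s), 2x16 | any sync average per thread time (s), 2x32 | any sync average per thread time (s), 2x48 | any wait average per thread time (s), 2x1 | any wait average per thread time (s), 2x2 | any wait average per thread time (s), 2x4 | any wait average per thread time (s), 2x8 | any wait average per thread time (s), 2x16 | any wait average per thread time (s), 2x32 | any wait average per thread time (s), 2x48 | parallelism overhead (%), 2x1 | parallelism overhead (%), 2x2 | parallelism overhead (%), 2x4 | parallelism overhead (%), 2x8 | parallelism overhead (%), 2x16 | parallelism overhead (%), 2x32 | parallelism overhead (%), 2x48 | local speedup if perfectly balanced, 2x1 | local speedup if perfectly balanced, 2x2 | local speedup if perfectly balanced, 2x4 | local speedup if perfectly balanced, 2x8 | local speedup if perfectly balanced, 2x16 | local speedup if perfectly balanced, 2x32 | local speedup if perfectly balanced, 2x48 | global speedup if perfectly balanced, 2x1 | global speedup if perfectly balanced, 2x2 | global speedup if perfectly balanced, 2x4 | global speedup if perfectly balanced, 2x8 | global speedup if perfectly balanced, 2x16 | global speedup if perfectly balanced, 2x32 | global speedup if perfectly balanced, 2x48 |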
exec:0x40851d | main | miniqmc.cpp:411 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 38.556 | 39.393 | 39.333 | 40.623 | 48.269 | 69.230 | 99.937 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 15.5 E-6 | 0.274 | 0.589 | 0.596 | 1.092 | 1.011 | 2.438 | 3.91 E-6 | 0.274 | 0.589 | 0.596 | 1.092 | 1.011 | 2.438 | 0.00 | 0.69 | 1.49 | 1.47 | 2.26 | 1.46 | 2.43 | 1.000 | 1.007 | 1.015 | 1.015 | 1.023 | 1.015 | 1.025 | 1.000 | 1.007 | 1.014 | 1.014 | 1.021 | 1.014 | 1.023 |
exec:0x408194 | main | miniqmc.cpp:378 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 2.354 | 2.574 | 2.632 | 2.788 | 3.216 | 4.763 | 6.311 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 0.121 | 63.5 E-3 | 89.9 E-3 | 0.122 | 0.276 | 0.342 | 0.0 | 0.121 | 63.5 E-3 | 89.9 E-3 | 0.122 | 0.276 | 0.342 | 0 | 4.64 | 2.40 | 3.22 | 3.80 | 5.80 | 5.42 | 1.000 | 1.049 | 1.025 | 1.033 | 1.040 | 1.062 | 1.057 | 1.000 | 1.003 | 1.001 | 1.002 | 1.002 | 1.004 | 1.003 |
exec:0x43b97e | miniqmcreference::einspline_spo_ref<double>::set(int, int, i... | BsplineAllocator.hpp:171 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 0.318 | 0.181 | 90.6 E-3 | 46.2 E-3 | 30.7 E-3 | 16.1 E-3 | 14.8 E-3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 10.8 E-3 | 3.66 E-3 | 2.16 E-3 | 6.84 E-3 | 919 E-6 | 690 E-6 | 0.0 | 10.8 E-3 | 3.66 E-3 | 2.16 E-3 | 6.84 E-3 | 919 E-6 | 690 E-6 | 0 | 5.97 | 4.04 | 4.68 | 19.4 | 5.71 | 4.66 | 1.000 | 1.063 | 1.042 | 1.049 | 1.241 | 1.061 | 1.049 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
exec:0x408569 | main | miniqmc.cpp:482 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 3.83 E-3 | 8.59 E-3 | 17.3 E-3 | 33.3 E-3 | 66.0 E-3 | 0.137 | 0.212 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 851 E-6 | 780 E-6 | 5.58 E-3 | 8.16 E-3 | 24.3 E-3 | 29.9 E-3 | 0.0 | 851 E-6 | 780 E-6 | 5.58 E-3 | 8.16 E-3 | 24.3 E-3 | 29.9 E-3 | 0 | 9.94 | 4.48 | 16.7 | 12.4 | 17.7 | 14.1 | 1.000 | 1.110 | 1.047 | 1.201 | 1.142 | 1.215 | 1.164 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
exec:0x445d27 | qmcplusplus::DelayedUpdate<double, double>::updateInvMat(qmc... | OpenMP.h:43 | 1 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 465 E-6 | 564 E-6 | 682 E-6 | 640 E-6 | 891 E-6 | 1.27 E-3 | 1.70 E-3 | 485 | 482 | 484 | 484 | 484 | 484 | 484 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
exec:0x444e2d | qmcplusplus::DiracMatrix<double, double>::invert_transpose(q... | OpenMP.h:43 | 1 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 8.08 E-6 | 8.78 E-6 | 10.7 E-6 | 10.0 E-6 | 10.8 E-6 | 15.0 E-6 | 21.9 E-6 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
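
The tabulated "local speedup if perfectly balanced" values appear consistent, to within rounding, with 1 / (1 - overhead), where overhead is the "parallelism overhead (%)" column expressed as a fraction. This relation is inferred from the numbers in the table and is an assumption, not a documented formula of the tool that produced the report. The Python sketch below recomputes a few entries under that assumption; the helper name `local_speedup` is hypothetical.

```python
def local_speedup(overhead_percent: float) -> float:
    """Speedup of a region if its overhead share of time were removed.

    Assumed relation (inferred from the table, not from tool documentation):
        speedup = 1 / (1 - overhead_percent / 100)
    """
    return 1.0 / (1.0 - overhead_percent / 100.0)


# (source location, run, parallelism overhead %, tabulated local speedup),
# copied from the table rows above.
samples = [
    ("miniqmc.cpp:411", "2x2", 0.69, 1.007),
    ("miniqmc.cpp:378", "2x2", 4.64, 1.049),
    ("BsplineAllocator.hpp:171", "2x16", 19.4, 1.241),
    ("miniqmc.cpp:482", "2x32", 17.7, 1.215),
    ("OpenMP.h:43", "2x48", 0.0, 1.000),
]

for location, run, overhead, tabulated in samples:
    computed = local_speedup(overhead)
    print(f"{location} {run}: computed {computed:.3f}, tabulated {tabulated:.3f}")
```

Because the overhead column is itself rounded, a recomputed value can differ from the tabulated one in the last digit (for example, miniqmc.cpp:482 at 2x8, where 16.7 % gives about 1.200 against a tabulated 1.201).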