| | | | | | | requested parallelism | | | | | | | walltime sum (s) | | | | | | | nb instances | | | | | | | any sync average per thread time (s) | | | | | | | any wait average per thread time (s) | | | | | | | parallelism overhead (%) | | | | | | | local speedup if perfectly balanced | | | | | | | global speedup if perfectly balanced | | | | | | |
start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x56 |
exec:0x40795b | main | miniqmc.cpp:411 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 56 | 55.840 | 56.599 | 56.817 | 55.824 | 58.556 | 69.523 | 108.725 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 7.13 E-6 | 0.216 | 0.715 | 0.747 | 1.151 | 1.496 | 1.861 | 1.64 E-6 | 0.216 | 0.715 | 0.747 | 1.151 | 1.496 | 1.861 | 0.00 | 0.38 | 1.26 | 1.34 | 1.97 | 2.15 | 1.71 | 1.000 | 1.004 | 1.013 | 1.014 | 1.020 | 1.022 | 1.017 | 1.000 | 1.004 | 1.012 | 1.012 | 1.018 | 1.019 | 1.015 |
exec:0x4075cc | main | miniqmc.cpp:378 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 56 | 4.318 | 4.397 | 4.691 | 4.902 | 5.454 | 7.158 | 11.221 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 38.2 E-3 | 0.176 | 0.220 | 0.179 | 0.173 | 1.223 | 0.0 | 38.2 E-3 | 0.176 | 0.220 | 0.179 | 0.173 | 1.223 | 0 | 0.86 | 3.72 | 4.49 | 3.28 | 2.42 | 10.9 | 1.000 | 1.009 | 1.039 | 1.047 | 1.034 | 1.025 | 1.122 | 1.000 | 1.001 | 1.003 | 1.004 | 1.003 | 1.002 | 1.010 |
exec:0x433b34 | miniqmcreference::einspline_spo_ref<double>::set(int, int, i... | BsplineAllocator.hpp:171 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 56 | 0.883 | 0.495 | 0.261 | 0.148 | 0.103 | 0.105 | 96.9 E-3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 9.47 E-3 | 3.39 E-3 | 3.14 E-3 | 1.61 E-3 | 4.70 E-3 | 13.5 E-3 | 0.0 | 9.47 E-3 | 3.39 E-3 | 3.14 E-3 | 1.61 E-3 | 4.69 E-3 | 13.5 E-3 | 0 | 1.91 | 1.30 | 2.12 | 1.57 | 4.54 | 13.9 | 1.000 | 1.020 | 1.013 | 1.022 | 1.016 | 1.048 | 1.161 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
exec:0x40799c | main | miniqmc.cpp:482 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 56 | 57.8 E-3 | 0.122 | 0.239 | 0.519 | 1.200 | 2.935 | 6.228 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 14.5 E-3 | 32.0 E-3 | 92.0 E-3 | 0.212 | 0.422 | 1.020 | 0.0 | 14.5 E-3 | 32.0 E-3 | 92.0 E-3 | 0.212 | 0.422 | 1.020 | 0 | 11.9 | 13.4 | 17.8 | 17.6 | 14.5 | 16.5 | 1.000 | 1.135 | 1.154 | 1.216 | 1.214 | 1.170 | 1.198 | 1.000 | 1.000 | 1.001 | 1.001 | 1.003 | 1.005 | 1.008 |
exec:0x43e64c | qmcplusplus::DelayedUpdate<double, double>::updateInvMat(qmc... | OpenMP.h:43 | 1 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 56 | 781 E-6 | 761 E-6 | 786 E-6 | 710 E-6 | 820 E-6 | 1.11 E-3 | 1.90 E-3 | 485 | 482 | 484 | 484 | 484 | 484 | 484 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
exec:0x43d49d | qmcplusplus::DiracMatrix<double, double>::invert_transpose(q... | OpenMP.h:43 | 1 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 32 | 56 | 16.1 E-6 | 12.7 E-6 | 14.5 E-6 | 12.9 E-6 | 12.9 E-6 | 13.5 E-6 | 27.4 E-6 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
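
The derived columns appear to follow directly from the measured ones. In the rows above, "parallelism overhead (%)" matches the sync time divided by the walltime sum, and "local speedup if perfectly balanced" matches the walltime sum divided by the walltime sum minus the sync time. These relations are inferred from the table values, not taken from the profiler's documentation, and the helper names below are hypothetical. A minimal sketch that checks them against the miniqmc.cpp:411 row:

```python
# Recompute two derived columns from the measured ones.
# Assumption: both formulas are inferred from the table values, not from
# the profiler's documentation; the function names are hypothetical.

def parallelism_overhead_pct(walltime_s, sync_s):
    """Share of the region's walltime spent in synchronization, in percent."""
    return 100.0 * sync_s / walltime_s


def local_speedup_if_balanced(walltime_s, sync_s):
    """Speedup of the region if its synchronization time were removed."""
    return walltime_s / (walltime_s - sync_s)


# (walltime sum, any sync average per thread time), in seconds,
# copied from the miniqmc.cpp:411 row for three configurations.
samples = {
    "2x2": (56.599, 0.216),
    "2x4": (56.817, 0.715),
    "2x56": (108.725, 1.861),
}

results = {
    cfg: (
        round(parallelism_overhead_pct(w, s), 2),
        round(local_speedup_if_balanced(w, s), 3),
    )
    for cfg, (w, s) in samples.items()
}

for cfg, (overhead, speedup) in results.items():
    print(f"{cfg}: overhead {overhead:.2f} %, local speedup {speedup:.3f}")
```

The recomputed values agree with the table to the printed precision (0.38 % and 1.004 for 2x2, 1.26 % and 1.013 for 2x4, 1.71 % and 1.017 for 2x56). The "global speedup" column is not reproduced here because it presumably depends on the total application time, which the table does not list.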