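Parallel regions:

| start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams |
|---|---|---|---|---|---|---|
| exec:0x4092be | `main` | miniqmc.cpp:411 | 0 | 0 | runtime | parallel |
| exec:0x408efc | `main` | miniqmc.cpp:378 | 0 | 0 | runtime | parallel |
| exec:0x44c840 | `miniqmcreference::einspline_spo_ref<double>::set(int, int, i...` | BsplineAllocator.hpp:171 | 0 | 0 | runtime | parallel |
| exec:0x409306 | `main` | miniqmc.cpp:482 | 0 | 0 | runtime | parallel |
| exec:0x45983c | `qmcplusplus::DelayedUpdate<double, double>::updateInvMat(qmc...` | OpenMP.h:43 | 1 | 0 | runtime | parallel |
| exec:0x4585cc | `qmcplusplus::DiracMatrix<double, double>::invert_transpose(q...` | OpenMP.h:43 | 1 | 0 | runtime | parallel |

Requested parallelism (identical for every region):

| | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|
| requested parallelism | 1 | 2 | 4 | 8 | 16 | 32 | 48 |

walltime sum (s):

| start addr | source location | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|---|
| exec:0x4092be | miniqmc.cpp:411 | 38.033 | 38.307 | 37.615 | 38.431 | 40.560 | 50.695 | 83.624 |
| exec:0x408efc | miniqmc.cpp:378 | 2.046 | 2.054 | 2.147 | 2.305 | 2.384 | 2.830 | 7.302 |
| exec:0x44c840 | BsplineAllocator.hpp:171 | 0.112 | 56.1 E-3 | 29.2 E-3 | 15.3 E-3 | 8.95 E-3 | 9.96 E-3 | 6.28 E-3 |
| exec:0x409306 | miniqmc.cpp:482 | 3.81 E-3 | 8.03 E-3 | 15.3 E-3 | 30.2 E-3 | 65.3 E-3 | 0.129 | 0.240 |
| exec:0x45983c | OpenMP.h:43 | 652 E-6 | 741 E-6 | 947 E-6 | 744 E-6 | 843 E-6 | 1.04 E-3 | 1.27 E-3 |
| exec:0x4585cc | OpenMP.h:43 | 13.7 E-6 | 13.0 E-6 | 12.5 E-6 | 12.1 E-6 | 11.8 E-6 | 13.8 E-6 | 23.6 E-6 |

nb instances:

| start addr | source location | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|---|
| exec:0x4092be | miniqmc.cpp:411 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 |
| exec:0x408efc | miniqmc.cpp:378 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| exec:0x44c840 | BsplineAllocator.hpp:171 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| exec:0x409306 | miniqmc.cpp:482 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| exec:0x45983c | OpenMP.h:43 | 485 | 482 | 484 | 484 | 484 | 484 | 484 |
| exec:0x4585cc | OpenMP.h:43 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 | 2.00 |

any sync average per thread time (s):

| start addr | source location | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|---|
| exec:0x4092be | miniqmc.cpp:411 | 6.97 E-6 | 0.434 | 0.460 | 0.824 | 1.187 | 1.475 | 10.609 |
| exec:0x408efc | miniqmc.cpp:378 | 0.0 | 22.2 E-3 | 0.116 | 0.162 | 0.164 | 0.159 | 1.393 |
| exec:0x44c840 | BsplineAllocator.hpp:171 | 0.0 | 608 E-6 | 661 E-6 | 402 E-6 | 353 E-6 | 3.94 E-3 | 413 E-6 |
| exec:0x409306 | miniqmc.cpp:482 | 0.0 | 1.29 E-3 | 1.19 E-3 | 5.94 E-3 | 13.5 E-3 | 18.4 E-3 | 20.5 E-3 |
| exec:0x45983c | OpenMP.h:43 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| exec:0x4585cc | OpenMP.h:43 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

any wait average per thread time (s):

| start addr | source location | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|---|
| exec:0x4092be | miniqmc.cpp:411 | 1.71 E-6 | 0.434 | 0.460 | 0.824 | 1.187 | 1.475 | 10.609 |
| exec:0x408efc | miniqmc.cpp:378 | 0.0 | 22.2 E-3 | 0.116 | 0.162 | 0.164 | 0.159 | 1.393 |
| exec:0x44c840 | BsplineAllocator.hpp:171 | 0.0 | 608 E-6 | 661 E-6 | 401 E-6 | 353 E-6 | 3.94 E-3 | 413 E-6 |
| exec:0x409306 | miniqmc.cpp:482 | 0.0 | 1.29 E-3 | 1.19 E-3 | 5.94 E-3 | 13.5 E-3 | 18.4 E-3 | 20.5 E-3 |
| exec:0x45983c | OpenMP.h:43 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| exec:0x4585cc | OpenMP.h:43 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

parallelism overhead (%):

| start addr | source location | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|---|
| exec:0x4092be | miniqmc.cpp:411 | 0.00 | 1.13 | 1.22 | 2.14 | 2.93 | 2.91 | 12.7 |
| exec:0x408efc | miniqmc.cpp:378 | 0 | 1.10 | 5.42 | 7.02 | 6.82 | 5.60 | 18.1 |
| exec:0x44c840 | BsplineAllocator.hpp:171 | 0 | 1.08 | 2.27 | 2.62 | 3.90 | 30.6 | 6.58 |
| exec:0x409306 | miniqmc.cpp:482 | 0 | 16.1 | 7.83 | 19.6 | 20.7 | 14.3 | 8.56 |
| exec:0x45983c | OpenMP.h:43 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| exec:0x4585cc | OpenMP.h:43 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

local speedup if perfectly balanced:

| start addr | source location | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|---|
| exec:0x4092be | miniqmc.cpp:411 | 1.000 | 1.011 | 1.012 | 1.022 | 1.030 | 1.030 | 1.145 |
| exec:0x408efc | miniqmc.cpp:378 | 1.000 | 1.011 | 1.057 | 1.076 | 1.073 | 1.059 | 1.221 |
| exec:0x44c840 | BsplineAllocator.hpp:171 | 1.000 | 1.011 | 1.023 | 1.027 | 1.041 | 1.442 | 1.070 |
| exec:0x409306 | miniqmc.cpp:482 | 1.000 | 1.192 | 1.085 | 1.244 | 1.261 | 1.167 | 1.094 |
| exec:0x45983c | OpenMP.h:43 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| exec:0x4585cc | OpenMP.h:43 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |

global speedup if perfectly balanced:

| start addr | source location | 2x1 | 2x2 | 2x4 | 2x8 | 2x16 | 2x32 | 2x48 |
|---|---|---|---|---|---|---|---|---|
| exec:0x4092be | miniqmc.cpp:411 | 1.000 | 1.011 | 1.012 | 1.021 | 1.028 | 1.028 | 1.131 |
| exec:0x408efc | miniqmc.cpp:378 | 1.000 | 1.001 | 1.003 | 1.004 | 1.004 | 1.003 | 1.015 |
| exec:0x44c840 | BsplineAllocator.hpp:171 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| exec:0x409306 | miniqmc.cpp:482 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| exec:0x45983c | OpenMP.h:43 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| exec:0x4585cc | OpenMP.h:43 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |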
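The `local speedup if perfectly balanced` column appears to follow from the `parallelism overhead (%)` column as 1 / (1 - overhead/100). For example, miniqmc.cpp:411 at 2x48 has 12.7 % overhead, and 1 / (1 - 0.127) ≈ 1.145, which matches the reported value. This relation is inferred from the numbers in the table, not taken from the profiler's documentation, so treat it as an assumption. The minimal Python sketch below copies both columns for the four regions with non-zero overhead and checks the assumed formula against every configuration. The names `OVERHEAD_PCT`, `LOCAL_SPEEDUP` and `speedup_from_overhead` are hypothetical.

```python
"""Check an assumed relation between two columns of the profile table.

Assumption (inferred from the data, not from the profiler's documentation):
    local_speedup = 1 / (1 - overhead_pct / 100)
"""

CONFIGS = ["2x1", "2x2", "2x4", "2x8", "2x16", "2x32", "2x48"]

# "parallelism overhead (%)" per region, one value per configuration in CONFIGS.
OVERHEAD_PCT = {
    "exec:0x4092be": [0.00, 1.13, 1.22, 2.14, 2.93, 2.91, 12.7],
    "exec:0x408efc": [0.0, 1.10, 5.42, 7.02, 6.82, 5.60, 18.1],
    "exec:0x44c840": [0.0, 1.08, 2.27, 2.62, 3.90, 30.6, 6.58],
    "exec:0x409306": [0.0, 16.1, 7.83, 19.6, 20.7, 14.3, 8.56],
}

# "local speedup if perfectly balanced" per region, same column order.
LOCAL_SPEEDUP = {
    "exec:0x4092be": [1.000, 1.011, 1.012, 1.022, 1.030, 1.030, 1.145],
    "exec:0x408efc": [1.000, 1.011, 1.057, 1.076, 1.073, 1.059, 1.221],
    "exec:0x44c840": [1.000, 1.011, 1.023, 1.027, 1.041, 1.442, 1.070],
    "exec:0x409306": [1.000, 1.192, 1.085, 1.244, 1.261, 1.167, 1.094],
}


def speedup_from_overhead(overhead_pct: float) -> float:
    """Speedup predicted (under the assumed formula) if the overhead share vanished."""
    return 1.0 / (1.0 - overhead_pct / 100.0)


def max_relative_error() -> float:
    """Largest |predicted - reported| / reported over all regions and configurations."""
    worst = 0.0
    for addr, overheads in OVERHEAD_PCT.items():
        for ov, reported in zip(overheads, LOCAL_SPEEDUP[addr]):
            worst = max(worst, abs(speedup_from_overhead(ov) - reported) / reported)
    return worst


if __name__ == "__main__":
    for addr, overheads in OVERHEAD_PCT.items():
        predicted = [round(speedup_from_overhead(ov), 3) for ov in overheads]
        print(addr, dict(zip(CONFIGS, predicted)))
    print(f"max relative error: {max_relative_error():.2e}")
```

On these values the largest relative deviation stays below 0.1 % (BsplineAllocator.hpp:171 at 2x32). That is consistent with the overhead column being rounded to three significant digits.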