start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams |
libqmckl.so.0.0.0:0x15132 | qmckl_compute_ao_vgl_hpc_gaussian | qmckl_ao.c:3279 | 0 | 0 | runtime | parallel |
libqmckl.so.0.0.0:0x1c8cc | qmckl_compute_ao_value_hpc_gaussian | qmckl_ao.c:2781 | 0 | 0 | runtime | parallel |

function name | run | requested parallelism | walltime sum (s) | nb instances | any sync average per thread time (s) | any wait average per thread time (s) | parallelism overhead (%) | local speedup if perfectly balanced | global speedup if perfectly balanced |
qmckl_compute_ao_vgl_hpc_gaussian | run_0 | 1 | 74.607 | 101 | 274 E-6 | 40.8 E-6 | 0.00 | 1.000 | 1.000 |
qmckl_compute_ao_vgl_hpc_gaussian | run_1 | 2 | 37.701 | 101 | 42.9 E-3 | 42.7 E-3 | 0.11 | 1.001 | 1.001 |
qmckl_compute_ao_vgl_hpc_gaussian | run_2 | 4 | 19.150 | 101 | 0.497 | 0.497 | 2.60 | 1.027 | 1.017 |
qmckl_compute_ao_vgl_hpc_gaussian | run_3 | 8 | 11.734 | 101 | 2.015 | 2.015 | 17.2 | 1.207 | 1.126 |
qmckl_compute_ao_vgl_hpc_gaussian | run_4 | 16 | 5.660 | 101 | 0.601 | 0.601 | 10.6 | 1.119 | 1.066 |
qmckl_compute_ao_vgl_hpc_gaussian | run_5 | 26 | 4.365 | 101 | 1.047 | 1.047 | 24.0 | 1.316 | 1.161 |
qmckl_compute_ao_vgl_hpc_gaussian | run_6 | 52 | 3.916 | 101 | 0.880 | 0.880 | 22.5 | 1.290 | 1.153 |
qmckl_compute_ao_value_hpc_gaussian | run_0 | 1 | 35.464 | 100 | 104 E-6 | 16.4 E-6 | 0.00 | 1.000 | 1.000 |
qmckl_compute_ao_value_hpc_gaussian | run_1 | 2 | 17.800 | 100 | 557 E-6 | 492 E-6 | 0.00 | 1.000 | 1.000 |
qmckl_compute_ao_value_hpc_gaussian | run_2 | 4 | 8.861 | 100 | 571 E-6 | 519 E-6 | 0.01 | 1.000 | 1.000 |
qmckl_compute_ao_value_hpc_gaussian | run_3 | 8 | 4.513 | 100 | 909 E-6 | 860 E-6 | 0.02 | 1.000 | 1.000 |
qmckl_compute_ao_value_hpc_gaussian | run_4 | 16 | 2.275 | 100 | 1.27 E-3 | 1.22 E-3 | 0.06 | 1.001 | 1.000 |
qmckl_compute_ao_value_hpc_gaussian | run_5 | 26 | 1.478 | 100 | 1.44 E-3 | 1.38 E-3 | 0.10 | 1.001 | 1.000 |
qmckl_compute_ao_value_hpc_gaussian | run_6 | 52 | 0.965 | 100 | 104 E-6 | 16.4 E-6 | 5.82 | 1.062 | 1.009 |
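
The derived columns can be cross-checked against the raw timings. The sketch below is an assumption inferred from the numbers themselves, not taken from the profiler's documentation: the reported parallelism overhead appears to match the average sync time divided by the walltime sum, and the local speedup if perfectly balanced appears to match the walltime divided by the walltime minus the sync time. The function names are hypothetical, and the input values are copied from the qmckl_compute_ao_vgl_hpc_gaussian entries. The observed speedup relative to the 1-thread run is added for comparison; the global speedup column depends on the total application time, which is not in the table, so it is not reproduced.

```python
# Timings copied from the qmckl_compute_ao_vgl_hpc_gaussian entries (run_0 .. run_6).
threads = [1, 2, 4, 8, 16, 26, 52]
walltime = [74.607, 37.701, 19.150, 11.734, 5.660, 4.365, 3.916]  # walltime sum (s)
sync = [274e-6, 42.9e-3, 0.497, 2.015, 0.601, 1.047, 0.880]       # any sync average per thread time (s)


def overhead_percent(wall, sync_time):
    """Assumed relation: share of the region walltime spent in synchronization."""
    return 100.0 * sync_time / wall


def balanced_speedup(wall, sync_time):
    """Assumed relation: gain if the synchronization time were removed entirely."""
    return wall / (wall - sync_time)


def observed_speedup(walls):
    """Speedup of each run relative to the first (1-thread) run."""
    return [walls[0] / w for w in walls]


if __name__ == "__main__":
    for n, w, s, sp in zip(threads, walltime, sync, observed_speedup(walltime)):
        print(
            f"{n:>2} threads: overhead {overhead_percent(w, s):5.2f} %, "
            f"balanced speedup {balanced_speedup(w, s):.3f}, "
            f"observed speedup {sp:5.2f}, efficiency {sp / n:.2f}"
        )
```

With these inputs the two assumed relations agree with the reported overhead and local-speedup values to within rounding of the printed inputs. The same check can be run on the qmckl_compute_ao_value_hpc_gaussian timings by swapping in its walltime and sync values.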