miniqmc git branch: OMP_offload
miniqmc git commit: 34c39aa17b79f2e7e5c41ff1896cb0847b88715a
number of ranks : 1, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
OpenMP threads = 52
Number of walkers per rank = 52
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
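The SPO coefficients size reported in the header can be checked with a little arithmetic. The sketch below is a hypothetical check, not part of miniqmc. It shows that 1572864000 bytes is exactly 1500 × 2^20. That suggests the "MB" in that line denotes binary mebibytes (MiB) rather than decimal megabytes.

```python
# Hypothetical sanity check of the "SPO coefficients size" header line.
spo_bytes = 1_572_864_000

mib = spo_bytes / 2**20   # binary megabytes (mebibytes, 1 MiB = 1048576 bytes)
mb = spo_bytes / 10**6    # decimal megabytes (1 MB = 1000000 bytes)

print(f"{mib:.1f} MiB, {mb:.3f} MB")
```

Only the binary interpretation gives the round 1500 printed in the log.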
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer                                       Inclusive_time  Exclusive_time   Calls  Time_per_call
Setup                                               0.0504          0.0504       1    0.050403909
ParticleSet:::update                                0.0000          0.0000       1    0.000007803
Total                                             198.2891         17.7271       1  198.289072703
  Diffusion                                        99.1012          0.1702       5   19.820239544
    Complete Updates                                0.6366          0.0001       5    0.127316558
      DeterminantRef::update                        0.6365          0.6365      10    0.063651006
    Current Gradient                                5.8368          0.1103   30720    0.000189998
      DeterminantRef::ratio                         5.6780          5.6780   30720    0.000184831
      OneBodyJastrowRef                             0.0274          0.0274   30720    0.000000892
      TwoBodyJastrowRef                             0.0210          0.0210   30720    0.000000684
    Kinetic Energy                                  0.9840          0.9828       5    0.196799993
      OneBodyJastrowRef                             0.0008          0.0008       5    0.000150409
      TwoBodyJastrowRef                             0.0004          0.0004       5    0.000082190
    New Gradient                                   29.9644          0.1311   30720    0.000975405
      DeterminantRef::ratio                         0.4485          0.4485   30720    0.000014600
      DeterminantRef::spovgl                       26.0315          1.3601   30720    0.000847379
        Single-Particle Orbitals                   24.6714         24.6714   30720    0.000803106
      OneBodyJastrowRef                             0.3838          0.3838   30720    0.000012492
      TwoBodyJastrowRef                             2.9696          2.9696   30720    0.000096667
    ParticleSet:::acceptMove                       13.1821          0.0479   15371    0.000857597
      DTAAOMPTarget::update_e_e                    12.9881         12.9881   15371    0.000844972
      DTABOMPTarget::update_ion_e                   0.1462          0.1462   15371    0.000009509
    ParticleSet:::computeNewPosDT                   3.0931          0.0545   30720    0.000100686
      DTAAOMPTarget::move_e_e                       2.7898          2.7898   30720    0.000090814
      DTABOMPTarget::move_ion_e                     0.2487          0.2487   30720    0.000008096
    ParticleSet:::donePbyP                          0.0000          0.0000       5    0.000002918
    Update                                         45.2340          0.0412   15371    0.002942816
      DeterminantRef::update                       41.8797         41.8797   15371    0.002724589
      OneBodyJastrowRef                             0.0086          0.0086   15371    0.000000557
      TwoBodyJastrowRef                             3.3047          3.3047   15371    0.000214993
  Initialization                                   11.7751          4.6636       1   11.775113315
    DeterminantRef::inverse                         2.1724          2.1724       2    1.086224912
    DeterminantRef::spovgl                          4.3202          0.1989       2    2.160097934
      Single-Particle Orbitals                      4.1213          4.1213    6144    0.000670790
    OneBodyJastrowRef                               0.0223          0.0223       1    0.022333579
    ParticleSet:::update                            0.4312          0.1113       2    0.215604593
      DTAAOMPTarget::evaluate_e_e                   0.2997          0.2997       1    0.299728475
      DTABOMPTarget::evaluate_ion_e                 0.0202          0.0004       1    0.020208187
        DTABOMPTarget::offload_ion_e                0.0198          0.0198       1    0.019763373
    TwoBodyJastrowRef                               0.1653          0.1653       1    0.165301417
  Pseudopotential                                  69.6857          0.1675       5   13.937133733
    DeterminantRef::spoval                         58.8402          2.3166   10215    0.005760173
      Single-Particle Orbitals                     56.5236         56.5236  122580    0.000461116
    OneBodyJastrowRef                               0.0931          0.0931   10215    0.000009115
    ParticleSet:::update                            8.4194          0.0327   10215    0.000824223
      DTABOMPTarget::evaluate_e_virtual             7.6853          0.0135   10215    0.000752358
        DTABOMPTarget::offload_e_virtual            7.6719          7.6719   10215    0.000751040
      DTABOMPTarget::evaluate_ion_virtual           0.7014          0.0136   10215    0.000068663
        DTABOMPTarget::offload_ion_virtual          0.6878          0.6878   10215    0.000067328
    TwoBodyJastrowRef                               2.1654          2.1654   10215    0.000211984
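In a stack timer profile, a timer's inclusive time equals its exclusive time plus the inclusive times of the timers nested inside it. Time_per_call is the inclusive time divided by Calls. The sketch below is a hypothetical check, not a miniqmc or MAQAO tool. It verifies both relations for the Total and Diffusion entries, using values copied from the profile above. Agreement holds only up to the four-decimal rounding of the printout, hence the tolerance.

```python
# Hypothetical consistency check for the stack timer profile above.
# Rule: inclusive = exclusive + sum(children's inclusive); per-call = inclusive / calls.

def check_node(inclusive, exclusive, child_inclusive, tol=1e-3):
    """True if inclusive matches exclusive + sum(children) within tol seconds."""
    return abs(inclusive - (exclusive + sum(child_inclusive))) <= tol

# Total: exclusive 17.7271 s; children are Diffusion, Initialization, Pseudopotential
total_ok = check_node(198.2891, 17.7271, [99.1012, 11.7751, 69.6857])

# Diffusion: exclusive 0.1702 s; children listed in printed order
diffusion_children = [
    0.6366,   # Complete Updates
    5.8368,   # Current Gradient
    0.9840,   # Kinetic Energy
    29.9644,  # New Gradient
    13.1821,  # ParticleSet:::acceptMove
    3.0931,   # ParticleSet:::computeNewPosDT
    0.0000,   # ParticleSet:::donePbyP
    45.2340,  # Update
]
diffusion_ok = check_node(99.1012, 0.1702, diffusion_children)

# Diffusion ran 5 times; compare with Time_per_call = 19.820239544
diffusion_per_call = 99.1012 / 5

print(total_ok, diffusion_ok, round(diffusion_per_call, 4))  # prints: True True 19.8202
```

The same check applies to any other entry, such as Pseudopotential or Initialization. A large mismatch would point to time spent in code that no timer covers.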
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 6.08216e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.21696e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 2.81684e+07
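The three throughput lines follow directly from the run parameters in the header and from the timer profile. The sketch below recomputes them. It assumes a single rank, as the header reports, so N_walkers is the 52 walkers per rank. Note that the pseudopotential figure uses N_elec^2 rather than N_elec^3. Last-digit differences from the printed values are expected because the times are copied at four-decimal precision.

```python
# Recompute the throughput lines from values printed earlier in this log.
n_walkers = 52        # "Number of walkers per rank" x 1 rank
n_elec = 6144         # "Number of electrons"
t_total = 198.2891    # Total, inclusive time (s)
t_diffusion = 99.1012 # Diffusion, inclusive time (s)
t_pseudo = 69.6857    # Pseudopotential, inclusive time (s)

total_tp = n_walkers * n_elec**3 / t_total
diffusion_tp = n_walkers * n_elec**3 / t_diffusion
pseudo_tp = n_walkers * n_elec**2 / t_pseudo   # N_elec^2, as in the printed formula

for label, value in [("Total", total_tp), ("Diffusion", diffusion_tp),
                     ("Pseudopotential", pseudo_tp)]:
    print(f"{label:16s} {value:.5e}")
```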
* Info: Dumping samples (host skylake, process 2863407)
* Info: Dumping source info for callchain nodes (host skylake, process 2863407)
* Info: Building/writing metadata (host skylake)
* Info: Finished collect step (host skylake, process 2863407)
Your experiment path is /home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0
To display your profiling results:
###################################################################################################################################################
# LEVEL     | REPORT       | COMMAND                                                                                                              #
###################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0     #
# Functions | Per-node     | maqao lprof -df -dn xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Functions | Per-process  | maqao lprof -df -dp xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Functions | Per-thread   | maqao lprof -df -dt xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Loops     | Cluster-wide | maqao lprof -dl xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0     #
# Loops     | Per-node     | maqao lprof -dl -dn xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Loops     | Per-process  | maqao lprof -dl -dp xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Loops     | Per-thread   | maqao lprof -dl -dt xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
###################################################################################################################################################