miniqmc git branch: OMP_offload
miniqmc git commit: 34c39aa17b79f2e7e5c41ff1896cb0847b88715a
number of ranks : 1, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
OpenMP threads = 52
Number of walkers per rank = 52
SPO coefficients size = 1572864000 bytes (1500 MB)
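The "1500 MB" label equals 1572864000 / 2^20 exactly, so it appears to count binary megabytes (MiB); in decimal megabytes the same buffer is about 1573 MB. A minimal Python check of that conversion, where the helper names `to_mib` and `to_mb` are illustrative and not part of miniqmc:

```python
# Convert the logged "SPO coefficients size" between byte units.
SPO_BYTES = 1572864000  # value copied from the log line above


def to_mib(n_bytes: int) -> float:
    """Bytes -> mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


def to_mb(n_bytes: int) -> float:
    """Bytes -> decimal megabytes (1 MB = 10**6 bytes)."""
    return n_bytes / 10**6


print(f"{to_mib(SPO_BYTES):.1f} MiB")
print(f"{to_mb(SPO_BYTES):.3f} MB")
```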
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer                                         Inclusive_time  Exclusive_time   Calls  Time_per_call
Setup                                                 0.0510          0.0510       1    0.051035645
ParticleSet:::update                                  0.0000          0.0000       1    0.000007983
Total                                               189.0101          3.5041       1  189.010072927
  Diffusion                                         103.1838          0.1842       5   20.636763694
    Complete Updates                                  0.7786          0.0001       5    0.155725454
      DeterminantRef::update                          0.7786          0.7786      10    0.077857620
    Current Gradient                                  5.9472          0.0979   30720    0.000193593
      DeterminantRef::ratio                           5.8071          5.8071   30720    0.000189035
      OneBodyJastrowRef                               0.0227          0.0227   30720    0.000000738
      TwoBodyJastrowRef                               0.0195          0.0195   30720    0.000000634
    Kinetic Energy                                    1.0035          1.0025       5    0.200690110
      OneBodyJastrowRef                               0.0006          0.0006       5    0.000126852
      TwoBodyJastrowRef                               0.0004          0.0004       5    0.000072756
    New Gradient                                     28.4427          0.1244   30720    0.000925870
      DeterminantRef::ratio                           0.4435          0.4435   30720    0.000014435
      DeterminantRef::spovgl                         25.1043          1.2195   30720    0.000817197
        Single-Particle Orbitals                     23.8848         23.8848   30720    0.000777499
      OneBodyJastrowRef                               0.3369          0.3369   30720    0.000010967
      TwoBodyJastrowRef                               2.4337          2.4337   30720    0.000079221
    ParticleSet:::acceptMove                         14.4576          0.0474   15371    0.000940573
      DTAAOMPTarget::update_e_e                      14.2770         14.2770   15371    0.000928830
      DTABOMPTarget::update_ion_e                     0.1331          0.1331   15371    0.000008659
    ParticleSet:::computeNewPosDT                     3.2083          0.0553   30720    0.000104437
      DTAAOMPTarget::move_e_e                         2.9111          2.9111   30720    0.000094762
      DTABOMPTarget::move_ion_e                       0.2419          0.2419   30720    0.000007874
    ParticleSet:::donePbyP                            0.0000          0.0000       5    0.000003093
    Update                                           49.1618          0.0389   15371    0.003198346
      DeterminantRef::update                         46.5697         46.5697   15371    0.003029714
      OneBodyJastrowRef                               0.0089          0.0089   15371    0.000000576
      TwoBodyJastrowRef                               2.5443          2.5443   15371    0.000165523
  Initialization                                      9.2946          2.0994       1    9.294563904
    DeterminantRef::inverse                           2.4640          2.4640       2    1.232009293
    DeterminantRef::spovgl                            4.1346          0.1932       2    2.067278149
      Single-Particle Orbitals                        3.9413          3.9413    6144    0.000641491
    OneBodyJastrowRef                                 0.0205          0.0205       1    0.020478534
    ParticleSet:::update                              0.3890          0.0710       2    0.194477142
      DTAAOMPTarget::evaluate_e_e                     0.3029          0.3029       1    0.302939119
      DTABOMPTarget::evaluate_ion_e                   0.0150          0.0003       1    0.014981242
        DTABOMPTarget::offload_ion_e                  0.0147          0.0147       1    0.014722986
    TwoBodyJastrowRef                                 0.1872          0.1872       1    0.187200390
  Pseudopotential                                    73.0276          0.1484       5   14.605520151
    DeterminantRef::spoval                           61.3432          2.4833   10215    0.006005211
      Single-Particle Orbitals                       58.8599         58.8599  122580    0.000480175
    OneBodyJastrowRef                                 0.0820          0.0820   10215    0.000008031
    ParticleSet:::update                              9.2287          0.0296   10215    0.000903442
      DTABOMPTarget::evaluate_e_virtual               8.4282          0.0143   10215    0.000825084
        DTABOMPTarget::offload_e_virtual              8.4140          8.4140   10215    0.000823686
      DTABOMPTarget::evaluate_ion_virtual             0.7708          0.0139   10215    0.000075456
        DTABOMPTarget::offload_ion_virtual            0.7569          0.7569   10215    0.000074098
    TwoBodyJastrowRef                                 2.2252          2.2252   10215    0.000217841
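A few relations in the stack timer profile above can be checked by hand. The sketch below assumes that a timer's exclusive time is its inclusive time minus the inclusive times of its direct children, and that Time_per_call is inclusive time divided by calls; it applies both to the Diffusion subtree. It also compares the 30720 call count with Iterations x Number of electrons, and the 15371 acceptMove calls with AcceptanceRatio = 0.5. All variable and function names are illustrative, not taken from miniqmc.

```python
# Sanity checks on the stack timer profile; all numbers are copied from the log.
# Assumption: exclusive = inclusive - sum(inclusive of direct children),
# and Time_per_call = inclusive / calls.

ITERATIONS = 5
N_ELECTRONS = 6144

diffusion_inclusive = 103.1838
diffusion_calls = 5
# Inclusive times (s) of the timers nested directly under "Diffusion".
diffusion_children = {
    "Complete Updates": 0.7786,
    "Current Gradient": 5.9472,
    "Kinetic Energy": 1.0035,
    "New Gradient": 28.4427,
    "ParticleSet:::acceptMove": 14.4576,
    "ParticleSet:::computeNewPosDT": 3.2083,
    "ParticleSet:::donePbyP": 0.0000,
    "Update": 49.1618,
}


def exclusive_time(inclusive: float, children: dict) -> float:
    """Time spent in a timer itself, excluding its nested timers."""
    return inclusive - sum(children.values())


def time_per_call(inclusive: float, calls: int) -> float:
    """Average inclusive time per call."""
    return inclusive / calls


excl = exclusive_time(diffusion_inclusive, diffusion_children)
per_call = time_per_call(diffusion_inclusive, diffusion_calls)
# 5 x 6144, compared with the 30720 calls shown for e.g. "Current Gradient".
proposed_moves = ITERATIONS * N_ELECTRONS
# acceptMove calls relative to proposed moves, compared with AcceptanceRatio.
accept_fraction = 15371 / proposed_moves

print(f"Diffusion exclusive: {excl:.4f} s (log: 0.1842 s)")
print(f"Diffusion per call:  {per_call:.4f} s (log: 20.636763694 s)")
print(f"Proposed moves: {proposed_moves} (log: 30720 calls)")
print(f"Accepted fraction: {accept_fraction:.4f} (AcceptanceRatio = 0.5)")
```

The exclusive time comes out as 0.1841 s rather than 0.1842 s because the inputs are the inclusive times rounded to four decimals.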
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 6.38075e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.16881e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 2.68793e+07
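The three throughput lines state their formulas inline. The Python sketch below recomputes them from the run configuration (one rank with 52 walkers, 6144 electrons) and the inclusive Total, Diffusion and Pseudopotential times from the stack timer profile; `throughput` is an illustrative helper, not a miniqmc function. Note the exponent: N_elec^3 for Total and Diffusion, N_elec^2 for Pseudopotential.

```python
# Recompute the throughput figures from the configuration and timer values.
# Assumption: the times are the inclusive timer values printed in the profile.

N_WALKERS = 52   # "Number of walkers per rank" (single rank in this run)
N_ELEC = 6144    # "Number of electrons"

TOTAL_TIME = 189.0101
DIFFUSION_TIME = 103.1838
PSEUDOPOTENTIAL_TIME = 73.0276


def throughput(n_walkers: int, n_elec: int, power: int, seconds: float) -> float:
    """N_walkers * N_elec**power / seconds, as in the throughput lines."""
    return n_walkers * n_elec**power / seconds


total = throughput(N_WALKERS, N_ELEC, 3, TOTAL_TIME)
diffusion = throughput(N_WALKERS, N_ELEC, 3, DIFFUSION_TIME)
pseudo = throughput(N_WALKERS, N_ELEC, 2, PSEUDOPOTENTIAL_TIME)

for label, value in [("Total", total), ("Diffusion", diffusion),
                     ("Pseudopotential", pseudo)]:
    print(f"{label:16s} {value:.5e}")
```

With the four-decimal times from the profile, the results round to the same six significant digits as the logged values.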
* Info: Dumping samples (host skylake, process 2109675)
* Info: Dumping source info for callchain nodes (host skylake, process 2109675)
* Info: Building/writing metadata (host skylake)
* Info: Finished collect step (host skylake, process 2109675)
Your experiment path is /home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0
To display your profiling results:
###################################################################################################################################################
# LEVEL     | REPORT       | COMMAND                                                                                                              #
###################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0     #
# Functions | Per-node     | maqao lprof -df -dn xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Functions | Per-process  | maqao lprof -df -dp xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Functions | Per-thread   | maqao lprof -df -dt xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Loops     | Cluster-wide | maqao lprof -dl xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0     #
# Loops     | Per-node     | maqao lprof -dl -dn xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Loops     | Per-process  | maqao lprof -dl -dp xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
# Loops     | Per-thread   | maqao lprof -dl -dt xp=/home/kcamus/miniapp_intel/miniqmc/runs/miniqmc_422_zmmhigh_o52_prompt/tools/lprof_npsu_run_0 #
###################################################################################################################################################