OV - exec - Global

exec - 2024-01-22 16:18:35 - MAQAO 2.19.0


Global Metrics

Total Time (s)		28.52
Profiled Time (s)		27.32
GFLOPS		142.996
Time in analyzed loops (%)		56.6
Time in analyzed innermost loops (%)		41.1
Time in user code (%)		67.0
Compilation Options Score (%)		100
Array Access Efficiency (%)		78.7
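
As a rough cross-check of the figures above, the percentages can be converted into seconds. The sketch below assumes the three "Time in ..." percentages are shares of the profiled time, which this report does not state explicitly; the variable names are illustrative and not part of MAQAO.

```python
# Values copied from the Global Metrics table above.
total_time_s = 28.52
profiled_time_s = 27.32
pct_of_profiled = {
    "analyzed loops": 56.6,
    "analyzed innermost loops": 41.1,
    "user code": 67.0,
}

# Fraction of the wall-clock run that the profiler actually covered.
profiled_coverage = profiled_time_s / total_time_s

# ASSUMPTION: each percentage is a share of the profiled time.
seconds_by_category = {
    name: profiled_time_s * pct / 100.0 for name, pct in pct_of_profiled.items()
}

print(f"profiled coverage: {profiled_coverage:.1%}")
for name, seconds in seconds_by_category.items():
    print(f"time in {name}: {seconds:.2f} s")
```

If the percentages are instead relative to total time, replace `profiled_time_s` with `total_time_s` in the dictionary comprehension.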

Potential Speedups
Metric	Potential Speedup	Nb Loops to get 80%
Iterations Count	1.00	
Perfect Flow Complexity	1.00	
Perfect OpenMP + MPI + Pthread	1.08	
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution	1.09	
No Scalar Integer	1.26	14
FP Vectorised	1.17	30
Fully Vectorised	1.82	41
FP Arithmetic Only	1.37	27
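
To make the speedup factors concrete, the sketch below converts each one into a projected run time. It assumes each potential speedup applies to the whole application's total time and that the scenarios are alternatives rather than cumulative gains; neither assumption is confirmed by this report. The helper `projected_time` is hypothetical.

```python
total_time_s = 28.52  # "Total Time (s)" from Global Metrics

# Scenario -> (potential speedup, number of loops to reach 80% of it).
scenarios = {
    "No Scalar Integer": (1.26, 14),
    "FP Vectorised": (1.17, 30),
    "Fully Vectorised": (1.82, 41),
    "FP Arithmetic Only": (1.37, 27),
}

def projected_time(total_s: float, speedup: float) -> float:
    """Run time if the whole application sped up by `speedup` (assumption)."""
    return total_s / speedup

# Rank scenarios by potential speedup, largest first.
ranked = sorted(scenarios.items(), key=lambda item: item[1][0], reverse=True)
for name, (speedup, loops_80) in ranked:
    new_time = projected_time(total_time_s, speedup)
    saved = total_time_s - new_time
    print(f"{name}: {new_time:.2f} s (saves {saved:.2f} s, {loops_80} loops for 80%)")
```

Under these assumptions, full vectorisation offers the largest gain but also requires touching the most loops (41) to capture 80% of it, whereas removing scalar integer work needs only 14.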

CQA Potential Speedups Summary

Loop Based Profile

Innermost Loop Based Profile

Application Categorization

Compilation Options

Source Object	Issue
exec
○optwf_sr_more.f90
○hpsi.f90
○matinv.f90
○distances.f90
○jastrow4e.f90
○optci.f90
○gammai.f90
○jastrowe.f90
○multideterminant.f90
○metrop_mov1_slat.f90
○hpsie.f90
○deriv_nonlpsi.f90
○get_norbterm.f90
○optwf_handle_wf.f90
○basis_fns.f90
○xoroshiro256starstar.c
○nonloc.f90
○detsav.f90
○determinante.f90
○acuest.f90
○random.f90
○multiply_slmi_mderiv.f90
○splfit.f90
○deriv_nonloc.f90
○jastrow4.f90
○jassav.f90
○bxmatrices.f90
○optwf_sr.f90
○multideterminante.f90
○deriv_jastrow4.f90
○optorb.f90
○readps_gauss.f90
○pot_local.f90
○determinant.f90
○optjas.f90
○determinante_psit.f90
○nonlpsi.f90
○orbitals.f90
○slm.f90
○scale_dist.f90

Loop Iteration Count Profile

Loop Path Count Profile

Cumulated Speedup If No Scalar Integer

Cumulated Speedup If FP Vectorized

Cumulated Speedup If Fully Vectorized

Cumulated Speedup If FP Arithmetic Only

Experiment Summary

Application	/home/kcamus/qaas/qaas_runs/170-593-0710/uvsq/champ/run/binaries/icc_1/exec
Timestamp	2024-01-22 16:18:35	Universal Timestamp	1705936715
Number of processes observed	52	Number of threads observed	52
Experiment Type	MPI; OpenMP;
Machine	skylake
Model Name	Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz
Architecture	x86_64	Micro Architecture	SKYLAKE
Cache Size	36608 KB	Number of Cores	26
OS Version	Linux 6.5.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 10 Oct 2023 21:10:21 +0000
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	SKYLAKE
Frequency Driver	intel_cpufreq	Frequency Governor	performance
Huge Pages	always	Hyperthreading	off
Number of sockets	2	Number of cores per socket	26
Compilation Options	exec: Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.8.0 Build 20221119_000000 -I/home/kcamus/qaas/qaas_runs/170-593-0710/uvsq/champ/build/champ/src/vmc -I/home/kcamus/qaas/qaas_runs/170-593-0710/uvsq/champ/build/icc_1/src/module -I/home/kcamus/qaas/qaas_runs/170-593-0710/uvsq/champ/build/icc_1/src/parser -I/opt/intel/oneapi.old/mpi/2021.8.0//include -I/opt/intel/oneapi.old/mpi/2021.8.0/include -DTARGET_ARCHITECTURE=\"avx512\" -DVECTORIZATION=\"avx512\" -O3 -xSKYLAKE-AVX512 -g -fno-omit-frame-pointer -no-pie -module src/vmc -fPIC -implicitnone -finline -ip -align array64byte -fma -ftz -fomit-frame-pointer -fpp -mcmodel=small -shared-intel -dyncom=grid3d_data,orbital_num_spl,orbital_num_lag,orbital_num_spl2,grid3d_data -D_MPI_ -DCLUSTER -xSKYLAKE-AVX512 -g -fno-omit-frame-pointer -no-pie -c -o src/vmc/CMakeFiles/shared_objects.dir/basis_fns.f90.o
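
The recorded command line in the Compilation Options row repeats several flags and contains both `-fno-omit-frame-pointer` and `-fomit-frame-pointer`. The sketch below shows one way to detect such repeats and contradictions in a flag string. It uses an excerpt of the flags (include paths and `-dyncom` omitted for brevity), and the `-fno-X` versus `-fX` pairing rule is a simplifying assumption, not a compiler-defined check.

```python
import shlex
from collections import Counter

# Excerpt of the flags from the "Compilation Options" row above.
flags_line = (
    "-O3 -xSKYLAKE-AVX512 -g -fno-omit-frame-pointer -no-pie -module src/vmc "
    "-fPIC -implicitnone -finline -ip -align array64byte -fma -ftz "
    "-fomit-frame-pointer -fpp -mcmodel=small -shared-intel "
    "-D_MPI_ -DCLUSTER -xSKYLAKE-AVX512 -g -fno-omit-frame-pointer -no-pie"
)

# Keep only option tokens; arguments such as "src/vmc" are dropped.
tokens = [t for t in shlex.split(flags_line) if t.startswith("-")]
counts = Counter(tokens)

# Flags that appear more than once on the command line.
repeated = sorted(flag for flag, n in counts.items() if n > 1)

# ASSUMPTION: "-fno-X" next to "-fX" is treated as a contradiction; how the
# compiler resolves it is not stated in this report.
contradictory = sorted(
    (flag, "-f" + flag[len("-fno-"):])
    for flag in counts
    if flag.startswith("-fno-") and "-f" + flag[len("-fno-"):] in counts
)

print("repeated:", repeated)
print("contradictory:", contradictory)
```

Repeats are usually harmless, but the frame-pointer pair is worth confirming against the build system, since it affects how reliably a profiler can unwind call stacks.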

Configuration Summary

Dataset
Run Command	<executable> -i vmc_optimization_500.inp
MPI Command	mpirun -np 52
Number Processes	1
Number Nodes	1
Filter	{type = number ; value = 1 ; }
Profile Start	{unit = none ; value = 0 ; }
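
The launch configuration can be checked against the machine description in the Experiment Summary. The sketch below compares the rank count in the MPI command with the number of physical cores; all values are copied from this report, and the variable names are illustrative.

```python
import re

# Values copied from the Experiment Summary and Configuration Summary.
sockets = 2
cores_per_socket = 26
hyperthreading = False
mpi_command = "mpirun -np 52"

# Extract the rank count from the "-np" option.
match = re.search(r"-np\s+(\d+)", mpi_command)
if match is None:
    raise ValueError("no -np option found in the MPI command")
ranks = int(match.group(1))

physical_cores = sockets * cores_per_socket
# With hyperthreading off, each physical core exposes one hardware thread.
hardware_threads = physical_cores * (2 if hyperthreading else 1)

ranks_per_core = ranks / physical_cores
print(f"{ranks} ranks on {physical_cores} cores "
      f"({hardware_threads} hardware threads): {ranks_per_core:.2f} ranks per core")
```

One rank per physical core is consistent with the 52 processes and 52 threads observed during the run.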
