* [MAQAO] Info: Detected 3 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 1 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 67.4862
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 3.9739
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4815647661
-- (4) Integer space for factors (estimated) = 63960178
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167568
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 0
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 1.833D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Total space in MBytes, IC factorization (INFOG(17)): 55459
Total space in MBytes, OOC factorization (INFOG(27)): 16461
Elapsed time in analysis driver= 79.3801
Analysis time by clock_gettime(): 79.378 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 1 and #OMP = 2
Elapsed time in save structure driver= 0.0002
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 1 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 1
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4815647661
INFOG(4) Integer space for factors (estim.)= 63960178
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167568
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
Statistics on the scaling phase
Elapsed time for scaling = 2.5370
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Effective size of S (based on INFO(39))= 3155107895
Redistrib: total data local/sent = 0 0
Elapsed time to reformat/distribute matrix = 4.9139
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 230000
Size of async. emission buffer (bytes).. = 566623
Small emission buffer (bytes) .......... = 20
** Memory allocated, total in Mbytes (INFOG(19)): 55544
** Memory effectively used, total in Mbytes (INFOG(22)): 49742
Flops under L0 layer = 2.293D+12
Elapsed time under L0 = 57.2887
Elapsed time for factorization = 340.7959
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.091D+09
------ (3) Operations in node elimination = 1.836D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4829216999
INFOG (10) Integer space for factors = 64006664
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 36969
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 0
Elapsed time in factorization driver = 348.3132
Factorization time by clock_gettime(): 348.3015 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 1 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_0 #
###############################################################################################################################################################
* [MAQAO] Info: Detected 2 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 2 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 67.5848
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 3.9733
A root of estimated size 8025 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4844627931
-- (4) Integer space for factors (estimated) = 63985906
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167568
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 2
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 1.850D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 32901
Total space in MBytes, IC factorization (INFOG(17)): 60006
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 9398
Total space in MBytes, OOC factorization (INFOG(27)): 18127
Elapsed time in analysis driver= 80.3424
Analysis time by clock_gettime(): 80.340 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 2 and #OMP = 2
Elapsed time in save structure driver= 0.0004
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 2 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 2
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4844627931
INFOG(4) Integer space for factors (estim.)= 63985906
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167568
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 2.5150
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Average Effective size of S (based on INFO(39))= 1799576084
Elapsed time to reformat/distribute matrix = 5.6747
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 19754932
Size of async. emission buffer (bytes).. = 79217269
Small emission buffer (bytes) .......... = 248
** Memory allocated, max in Mbytes (INFOG(18)): 32982
** Memory allocated, total in Mbytes (INFOG(19)): 59462
** Memory effectively used, max in Mbytes (INFOG(21)): 29095
** Memory effectively used, total in Mbytes (INFOG(22)): 52781
Flops under L0 layer (avg/max across MPI) = 1.261D+12 1.631D+12
Elapsed time under L0 (avg/max across MPI) = 29.7072 33.3788
Elapsed time to process root node = 2.8730
Elapsed time for factorization = 200.9397
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.091D+09
------ (3) Operations in node elimination = 1.853D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4858197269
INFOG (10) Integer space for factors = 64032420
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 36969
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 0
Elapsed time in factorization driver = 209.1844
Factorization time by clock_gettime(): 209.1774 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 2 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_1 #
###############################################################################################################################################################
* [MAQAO] Info: Detected 4 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 4 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 67.0084
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 3.9099
A root of estimated size 8025 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4844627931
-- (4) Integer space for factors (estimated) = 64090936
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167568
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 5
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 1.850D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 18053
Total space in MBytes, IC factorization (INFOG(17)): 63555
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 6329
Total space in MBytes, OOC factorization (INFOG(27)): 21334
Elapsed time in analysis driver= 79.5888
Analysis time by clock_gettime(): 79.586 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 4 and #OMP = 2
Elapsed time in save structure driver= 0.0004
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 4 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 4
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4844627931
INFOG(4) Integer space for factors (estim.)= 64090936
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167568
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Statistics on the scaling phase
Elapsed time for scaling = 2.4664
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Average Effective size of S (based on INFO(39))= 989863809
Elapsed time to reformat/distribute matrix = 4.7076
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 18243492
Size of async. emission buffer (bytes).. = 73156393
Small emission buffer (bytes) .......... = 644
** Memory allocated, max in Mbytes (INFOG(18)): 18133
** Memory allocated, total in Mbytes (INFOG(19)): 62836
** Memory effectively used, max in Mbytes (INFOG(21)): 15653
** Memory effectively used, total in Mbytes (INFOG(22)): 54075
Flops under L0 layer (avg/max across MPI) = 7.762D+11 1.837D+12
Elapsed time under L0 (avg/max across MPI) = 15.5937 24.6805
Elapsed time to process root node = 1.7156
Elapsed time for factorization = 120.8970
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.093D+09
------ (3) Operations in node elimination = 1.853D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4858197269
INFOG (10) Integer space for factors = 64113599
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 36969
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 0
Elapsed time in factorization driver = 128.1125
Factorization time by clock_gettime(): 128.1470 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 4 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_2 #
###############################################################################################################################################################
* [MAQAO] Info: Detected 8 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 8 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 67.8264
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 4.0085
A root of estimated size 8025 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4844627931
-- (4) Integer space for factors (estimated) = 64272958
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167568
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 13
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 1.850D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 10024
Total space in MBytes, IC factorization (INFOG(17)): 67420
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 4383
Total space in MBytes, OOC factorization (INFOG(27)): 28623
Elapsed time in analysis driver= 80.6930
Analysis time by clock_gettime(): 80.691 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 8 and #OMP = 2
Elapsed time in save structure driver= 0.0005
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 8 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 8
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4844627931
INFOG(4) Integer space for factors (estim.)= 64272958
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167568
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
Statistics on the scaling phase
Elapsed time for scaling = 2.4750
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Average Effective size of S (based on INFO(39))= 558163519
Elapsed time to reformat/distribute matrix = 4.4978
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 15786264
Size of async. emission buffer (bytes).. = 63302908
Small emission buffer (bytes) .......... = 1956
** Memory allocated, max in Mbytes (INFOG(18)): 10102
** Memory allocated, total in Mbytes (INFOG(19)): 67397
** Memory effectively used, max in Mbytes (INFOG(21)): 8435
** Memory effectively used, total in Mbytes (INFOG(22)): 56600
Flops under L0 layer (avg/max across MPI) = 3.367D+11 7.570D+11
Elapsed time under L0 (avg/max across MPI) = 7.1324 10.9684
Elapsed time to process root node = 0.9314
Elapsed time for factorization = 79.7806
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.104D+09
------ (3) Operations in node elimination = 1.853D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4858197269
INFOG (10) Integer space for factors = 64260751
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 36969
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 1
Elapsed time in factorization driver = 86.8366
Factorization time by clock_gettime(): 86.8380 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 8 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_3 #
###############################################################################################################################################################
* [MAQAO] Info: Detected 16 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 16 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 68.4124
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 4.0355
A root of estimated size 8025 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4843250499
-- (4) Integer space for factors (estimated) = 64481250
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167569
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 24
Number of split nodes = 1
RINFOG(1) Operations during elimination (estim)= 1.850D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 6915
Total space in MBytes, IC factorization (INFOG(17)): 74795
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 4251
Total space in MBytes, OOC factorization (INFOG(27)): 37833
Elapsed time in analysis driver= 81.3388
Analysis time by clock_gettime(): 81.336 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 16 and #OMP = 2
Elapsed time in save structure driver= 0.0006
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 16 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 16
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4843250499
INFOG(4) Integer space for factors (estim.)= 64481250
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167569
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 2.5909
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Average Effective size of S (based on INFO(39))= 324547318
Elapsed time to reformat/distribute matrix = 4.2481
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 18928808
Size of async. emission buffer (bytes).. = 75904514
Small emission buffer (bytes) .......... = 6400
** Memory allocated, max in Mbytes (INFOG(18)): 6914
** Memory allocated, total in Mbytes (INFOG(19)): 74711
** Memory effectively used, max in Mbytes (INFOG(21)): 5596
** Memory effectively used, total in Mbytes (INFOG(22)): 62769
Flops under L0 layer (avg/max across MPI) = 1.675D+11 3.816D+11
Elapsed time under L0 (avg/max across MPI) = 3.6183 5.7675
Elapsed time to process root node = 0.6185
Elapsed time for factorization = 36.6164
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.166D+09
------ (3) Operations in node elimination = 1.853D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4856855549
INFOG (10) Integer space for factors = 64441504
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 36969
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 3
Elapsed time in factorization driver = 43.5249
Factorization time by clock_gettime(): 43.5394 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 16 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_4 #
###############################################################################################################################################################
* [MAQAO] Info: Detected 32 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 32 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 74.0810
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 4.7484
A root of estimated size 8025 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4843056339
-- (4) Integer space for factors (estimated) = 65395880
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167569
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 48
Number of split nodes = 1
RINFOG(1) Operations during elimination (estim)= 1.850D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 3369
Total space in MBytes, IC factorization (INFOG(17)): 79801
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 1920
Total space in MBytes, OOC factorization (INFOG(27)): 46114
Elapsed time in analysis driver= 89.3745
Analysis time by clock_gettime(): 89.372 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 32 and #OMP = 2
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Elapsed time in save structure driver= 0.0010
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 32 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 32
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4843056339
INFOG(4) Integer space for factors (estim.)= 65395880
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167569
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 2.5492
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Average Effective size of S (based on INFO(39))= 173582296
Elapsed time to reformat/distribute matrix = 4.8220
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 10992152
Size of async. emission buffer (bytes).. = 44078523
Small emission buffer (bytes) .......... = 23008
** Memory allocated, max in Mbytes (INFOG(18)): 3368
** Memory allocated, total in Mbytes (INFOG(19)): 79596
** Memory effectively used, max in Mbytes (INFOG(21)): 2810
** Memory effectively used, total in Mbytes (INFOG(22)): 64528
Flops under L0 layer (avg/max across MPI) = 7.242D+10 1.419D+11
Elapsed time under L0 (avg/max across MPI) = 1.8836 2.5743
Elapsed time to process root node = 0.3839
Elapsed time for factorization = 21.5600
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.162D+09
------ (3) Operations in node elimination = 1.853D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4856855549
INFOG (10) Integer space for factors = 65112396
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 36570
Number of 2x2 pivots in type 2 nodes = 399
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 25
Elapsed time in factorization driver = 28.9887
Factorization time by clock_gettime(): 29.0392 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 32 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_5 #
###############################################################################################################################################################
* [MAQAO] Info: Detected 64 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 64 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 72.4452
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 4.5922
A root of estimated size 8025 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4840692933
-- (4) Integer space for factors (estimated) = 66894318
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167572
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 99
Number of split nodes = 4
RINFOG(1) Operations during elimination (estim)= 1.850D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 1796
Total space in MBytes, IC factorization (INFOG(17)): 87588
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 1141
Total space in MBytes, OOC factorization (INFOG(27)): 55497
Elapsed time in analysis driver= 87.3259
Analysis time by clock_gettime(): 87.324 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 64 and #OMP = 2
Elapsed time in save structure driver= 0.0016
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 64 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 64
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4840692933
INFOG(4) Integer space for factors (estim.)= 66894318
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167572
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 2.5771
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Average Effective size of S (based on INFO(39))= 95906173
Elapsed time to reformat/distribute matrix = 5.1690
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 8604236
Size of async. emission buffer (bytes).. = 34502984
Small emission buffer (bytes) .......... = 87004
** Memory allocated, max in Mbytes (INFOG(18)): 1796
** Memory allocated, total in Mbytes (INFOG(19)): 87647
** Memory effectively used, max in Mbytes (INFOG(21)): 1625
** Memory effectively used, total in Mbytes (INFOG(22)): 69846
Flops under L0 layer (avg/max across MPI) = 2.789D+10 4.937D+10
Elapsed time under L0 (avg/max across MPI) = 1.2162 1.5036
Elapsed time to process root node = 0.3068
Elapsed time for factorization = 14.1837
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.300D+09
------ (3) Operations in node elimination = 1.853D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4854951552
INFOG (10) Integer space for factors = 66210064
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 36072
Number of 2x2 pivots in type 2 nodes = 897
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 70
Elapsed time in factorization driver = 22.0056
Factorization time by clock_gettime(): 22.0614 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 64 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_6 #
###############################################################################################################################################################
* [MAQAO] Info: Detected 86 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 5412840 208814389
executing #MPI = 86 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 5412840
Average density of rows/columns = 75
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 72.4608
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 4.6108
A root of estimated size 8025 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 4678450993
-- (3) Real space for factors (estimated) = 4840272379
-- (4) Integer space for factors (estimated) = 67920698
-- (5) Maximum frontal size (estimated) = 15351
-- (6) Number of nodes in the tree = 167573
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 146
Number of split nodes = 5
RINFOG(1) Operations during elimination (estim)= 1.850D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 1452
Total space in MBytes, IC factorization (INFOG(17)): 90522
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 870
Total space in MBytes, OOC factorization (INFOG(27)): 58511
Elapsed time in analysis driver= 87.4382
Analysis time by clock_gettime(): 87.437 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 5412840 208814389
executing #MPI = 86 and #OMP = 2
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Elapsed time in save structure driver= 0.0018
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 5412840 208814389
executing #MPI = 86 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 86
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 4840272379
INFOG(4) Integer space for factors (estim.)= 67920698
Maximum frontal size (estimated) = 15351
Number of nodes in the tree = 167573
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 2.5295
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.12D+00
Average Effective size of S (based on INFO(39))= 72435587
Elapsed time to reformat/distribute matrix = 5.5234
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 6125820
Size of async. emission buffer (bytes).. = 24564538
Small emission buffer (bytes) .......... = 155000
** Memory allocated, max in Mbytes (INFOG(18)): 1452
** Memory allocated, total in Mbytes (INFOG(19)): 90531
** Memory effectively used, max in Mbytes (INFOG(21)): 1187
** Memory effectively used, total in Mbytes (INFOG(22)): 72935
Flops under L0 layer (avg/max across MPI) = 1.851D+10 3.062D+10
Elapsed time under L0 (avg/max across MPI) = 1.1443 1.4275
Elapsed time to process root node = 0.3189
Elapsed time for factorization = 12.6573
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.330D+09
------ (3) Operations in node elimination = 1.853D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 4854653714
INFOG (10) Integer space for factors = 66966837
INFOG (11) Maximum front size = 15351
INFOG (29) Number of entries in factors = 4691271697
INFOG (12) Number of negative pivots = 73938
INFOG (13) Number of delayed pivots = 23243
Number of 2x2 pivots in type 1 nodes = 35586
Number of 2x2 pivots in type 2 nodes = 1383
RINFOG(19) Smallest pivot WITH perturbed pivots = 9.314D-07
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 9.314D-07
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 124
Elapsed time in factorization driver = 20.8020
Factorization time by clock_gettime(): 20.8869 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 86 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7
To display your profiling results:
###############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
###############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_perf009_allowextra_scala_kptr_probe/tools/lprof_run_7 #
###############################################################################################################################################################