* [MAQAO] Info: Detected 1 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 1 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 10.8956
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.7139
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2847298516
-- (4) Integer space for factors (estimated) = 18711200
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32938
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 0
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 2.969D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Total space in MBytes, IC factorization (INFOG(17)): 38143
Total space in MBytes, OOC factorization (INFOG(27)): 15285
Elapsed time in analysis driver= 12.6333
Analysis time by clock_gettime(): 12.633 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 1 and #OMP = 2
Elapsed time in save structure driver= 0.0003
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 1 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 1
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2847298516
INFOG(4) Integer space for factors (estim.)= 18711200
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32938
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
[0m
Statistics on the scaling phase
Elapsed time for scaling = 0.3718
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Effective size of S (based on INFO(39))= 2423196984
Redistrib: total data local/sent = 0 0
Elapsed time to reformat/distribute matrix = 0.5650
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 237684
Size of async. emission buffer (bytes).. = 953103
Small emission buffer (bytes) .......... = 20
** Memory allocated, total in Mbytes (INFOG(19)): 38143
** Memory effectively used, total in Mbytes (INFOG(22)): 33670
Flops under L0 layer = 6.532D+12
Elapsed time under L0 = 79.8843
Elapsed time for factorization = 457.8346
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.029D+09
------ (3) Operations in node elimination = 2.969D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2847298516
INFOG (10) Integer space for factors = 18711200
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 0
Elapsed time in factorization driver = 458.7831
Factorization time by clock_gettime(): 458.7669 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 1 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_0 #
##############################################################################################################################################################
* [MAQAO] Info: Detected 2 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 2 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 10.9041
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.6740
A root of estimated size 15552 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2956137604
-- (4) Integer space for factors (estimated) = 18766632
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32938
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 3
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 3.095D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 24644
Total space in MBytes, IC factorization (INFOG(17)): 47842
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 13311
Total space in MBytes, OOC factorization (INFOG(27)): 26573
Elapsed time in analysis driver= 12.8006
Analysis time by clock_gettime(): 12.800 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 2 and #OMP = 2
Elapsed time in save structure driver= 0.0004
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 2 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 2
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2956137604
INFOG(4) Integer space for factors (estim.)= 18766632
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32938
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
[0m ** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Statistics on the scaling phase
Elapsed time for scaling = 0.3770
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Average Effective size of S (based on INFO(39))= 1803297043
Elapsed time to reformat/distribute matrix = 0.6958
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 45298052
Size of async. emission buffer (bytes).. = 181645177
Small emission buffer (bytes) .......... = 268
** Memory allocated, max in Mbytes (INFOG(18)): 24644
** Memory allocated, total in Mbytes (INFOG(19)): 47842
** Memory effectively used, max in Mbytes (INFOG(21)): 20038
** Memory effectively used, total in Mbytes (INFOG(22)): 39430
Flops under L0 layer (avg/max across MPI) = 3.118D+12 3.256D+12
Elapsed time under L0 (avg/max across MPI) = 39.5419 39.6272
Elapsed time to process root node = 19.7823
Elapsed time for factorization = 271.6036
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.029D+09
------ (3) Operations in node elimination = 3.095D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2956137604
INFOG (10) Integer space for factors = 18766674
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 0
Elapsed time in factorization driver = 272.6925
Factorization time by clock_gettime(): 272.6832 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 2 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_1 #
##############################################################################################################################################################
* [MAQAO] Info: Detected 4 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 4 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 10.9444
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.7065
A root of estimated size 15552 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2956117177
-- (4) Integer space for factors (estimated) = 19240562
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32938
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 14
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 3.095D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 16923
Total space in MBytes, IC factorization (INFOG(17)): 56731
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 11460
Total space in MBytes, OOC factorization (INFOG(27)): 37573
Elapsed time in analysis driver= 12.8551
Analysis time by clock_gettime(): 12.855 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 4 and #OMP = 2
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Elapsed time in save structure driver= 0.0004
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 4 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 4
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2956117177
INFOG(4) Integer space for factors (estim.)= 19240562
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32938
ICNTL(23) Memory allowed (value on host) = 0
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
[0m ** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Statistics on the scaling phase
Elapsed time for scaling = 0.3764
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Average Effective size of S (based on INFO(39))= 1188349366
Elapsed time to reformat/distribute matrix = 0.6536
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 35507528
Size of async. emission buffer (bytes).. = 142385190
Small emission buffer (bytes) .......... = 824
** Memory allocated, max in Mbytes (INFOG(18)): 16923
** Memory allocated, total in Mbytes (INFOG(19)): 56731
** Memory effectively used, max in Mbytes (INFOG(21)): 12047
** Memory effectively used, total in Mbytes (INFOG(22)): 42821
Flops under L0 layer (avg/max across MPI) = 1.388D+12 1.727D+12
Elapsed time under L0 (avg/max across MPI) = 17.5882 20.2529
Elapsed time to process root node = 11.1221
Elapsed time for factorization = 201.3619
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.041D+09
------ (3) Operations in node elimination = 3.095D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2956137604
INFOG (10) Integer space for factors = 19065019
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 5
Elapsed time in factorization driver = 202.4044
Factorization time by clock_gettime(): 202.4059 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 4 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_2 #
##############################################################################################################################################################
* [MAQAO] Info: Detected 8 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 8 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 10.9808
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.6637
A root of estimated size 15552 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2956109325
-- (4) Integer space for factors (estimated) = 19689138
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32938
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 29
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 3.095D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 9843
Total space in MBytes, IC factorization (INFOG(17)): 68601
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 7641
Total space in MBytes, OOC factorization (INFOG(27)): 50191
Elapsed time in analysis driver= 12.8786
Analysis time by clock_gettime(): 12.878 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 8 and #OMP = 2
Elapsed time in save structure driver= 0.0004
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 8 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
****** FACTORIZATION STEP ********
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
[0m GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 8
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2956109325
INFOG(4) Integer space for factors (estim.)= 19689138
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32938
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Statistics on the scaling phase
Elapsed time for scaling = 0.3760
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Average Effective size of S (based on INFO(39))= 791026422
Elapsed time to reformat/distribute matrix = 0.6616
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 35507620
Size of async. emission buffer (bytes).. = 142385558
Small emission buffer (bytes) .......... = 2276
** Memory allocated, max in Mbytes (INFOG(18)): 9843
** Memory allocated, total in Mbytes (INFOG(19)): 68601
** Memory effectively used, max in Mbytes (INFOG(21)): 7679
** Memory effectively used, total in Mbytes (INFOG(22)): 46114
Flops under L0 layer (avg/max across MPI) = 4.964D+11 6.872D+11
Elapsed time under L0 (avg/max across MPI) = 6.8657 9.0860
Elapsed time to process root node = 5.7246
Elapsed time for factorization = 134.8950
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.083D+09
------ (3) Operations in node elimination = 3.095D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2956137604
INFOG (10) Integer space for factors = 19369177
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 8
Elapsed time in factorization driver = 135.9591
Factorization time by clock_gettime(): 135.9549 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 8 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_3 #
##############################################################################################################################################################
* [MAQAO] Info: Detected 16 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 16 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 10.8016
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.6506
A root of estimated size 15552 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2956055557
-- (4) Integer space for factors (estimated) = 20051299
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32938
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 40
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 3.095D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 7859
Total space in MBytes, IC factorization (INFOG(17)): 88088
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 6105
Total space in MBytes, OOC factorization (INFOG(27)): 70537
Elapsed time in analysis driver= 12.6498
Analysis time by clock_gettime(): 12.650 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 16 and #OMP = 2
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Elapsed time in save structure driver= 0.0004
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 16 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 16
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2956055557
INFOG(4) Integer space for factors (estim.)= 20051299
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32938
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
[0m
Statistics on the scaling phase
Elapsed time for scaling = 0.3826
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Average Effective size of S (based on INFO(39))= 565086851
Elapsed time to reformat/distribute matrix = 0.6990
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 38100880
Size of async. emission buffer (bytes).. = 152784528
Small emission buffer (bytes) .......... = 6720
** Memory allocated, max in Mbytes (INFOG(18)): 7859
** Memory allocated, total in Mbytes (INFOG(19)): 88085
** Memory effectively used, max in Mbytes (INFOG(21)): 5820
** Memory effectively used, total in Mbytes (INFOG(22)): 54876
Flops under L0 layer (avg/max across MPI) = 9.602D+10 2.176D+11
Elapsed time under L0 (avg/max across MPI) = 1.7716 3.3199
Elapsed time to process root node = 3.4851
Elapsed time for factorization = 77.3482
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.057D+09
------ (3) Operations in node elimination = 3.095D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2956137604
INFOG (10) Integer space for factors = 19606438
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 5
Elapsed time in factorization driver = 78.4481
Factorization time by clock_gettime(): 78.4461 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 16 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_4 #
##############################################################################################################################################################
* [MAQAO] Info: Detected 32 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 32 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 11.4456
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.7508
A root of estimated size 15552 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2955751451
-- (4) Integer space for factors (estimated) = 21361546
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32938
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 63
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 3.095D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 4161
Total space in MBytes, IC factorization (INFOG(17)): 87423
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 3238
Total space in MBytes, OOC factorization (INFOG(27)): 69573
Elapsed time in analysis driver= 13.5665
Analysis time by clock_gettime(): 13.567 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 32 and #OMP = 2
Elapsed time in save structure driver= 0.0009
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 32 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
[0m
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 32
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2955751451
INFOG(4) Integer space for factors (estim.)= 21361546
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32938
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 0.3856
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Average Effective size of S (based on INFO(39))= 278943341
Elapsed time to reformat/distribute matrix = 0.9229
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 23402444
Size of async. emission buffer (bytes).. = 93843803
Small emission buffer (bytes) .......... = 23308
** Memory allocated, max in Mbytes (INFOG(18)): 4161
** Memory allocated, total in Mbytes (INFOG(19)): 87422
** Memory effectively used, max in Mbytes (INFOG(21)): 3059
** Memory effectively used, total in Mbytes (INFOG(22)): 50604
Flops under L0 layer (avg/max across MPI) = 3.597D+10 8.375D+10
Elapsed time under L0 (avg/max across MPI) = 0.8491 1.6734
Elapsed time to process root node = 1.8836
Elapsed time for factorization = 42.2892
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.043D+09
------ (3) Operations in node elimination = 3.095D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2956137604
INFOG (10) Integer space for factors = 20285876
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 22
Elapsed time in factorization driver = 43.6227
Factorization time by clock_gettime(): 43.6373 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 32 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_5 #
##############################################################################################################################################################
* [MAQAO] Info: Detected 64 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 64 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 11.4669
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.7408
A root of estimated size 15552 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2954117075
-- (4) Integer space for factors (estimated) = 23344626
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32939
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 118
Number of split nodes = 1
RINFOG(1) Operations during elimination (estim)= 3.095D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 2098
Total space in MBytes, IC factorization (INFOG(17)): 83609
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 1591
Total space in MBytes, OOC factorization (INFOG(27)): 67852
Elapsed time in analysis driver= 13.5813
Analysis time by clock_gettime(): 13.582 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 64 and #OMP = 2
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Elapsed time in save structure driver= 0.0021
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 64 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
[0m ** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 64
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2954117075
INFOG(4) Integer space for factors (estim.)= 23344626
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32939
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 0.3823
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Average Effective size of S (based on INFO(39))= 132146676
Elapsed time to reformat/distribute matrix = 1.2748
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 14688536
Size of async. emission buffer (bytes).. = 58901028
Small emission buffer (bytes) .......... = 87384
** Memory allocated, max in Mbytes (INFOG(18)): 2099
** Memory allocated, total in Mbytes (INFOG(19)): 83607
** Memory effectively used, max in Mbytes (INFOG(21)): 1656
** Memory effectively used, total in Mbytes (INFOG(22)): 50021
Flops under L0 layer (avg/max across MPI) = 8.669D+09 2.264D+10
Elapsed time under L0 (avg/max across MPI) = 0.3758 0.7608
Elapsed time to process root node = 1.3107
Elapsed time for factorization = 25.4747
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.220D+09
------ (3) Operations in node elimination = 3.095D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2954855606
INFOG (10) Integer space for factors = 21442316
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 72
Elapsed time in factorization driver = 27.1645
Factorization time by clock_gettime(): 27.1772 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 64 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_6 #
##############################################################################################################################################################
* [MAQAO] Info: Detected 86 Lprof instances in igk-0805.
If this is incorrect, rerun with number-processes-per-node=X
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 1 1391349 32961525
executing #MPI = 86 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
=================================================
MUMPS compiled with option -Dmetis
MUMPS compiled with option -Dpord
MUMPS compiled with option -Dptscotch
MUMPS compiled with option -Dscotch
=================================================
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
Processing a graph of size: 1391349
Average density of rows/columns = 45
Ordering based on METIS
ELAPSED TIME SPENT IN METIS reordering = 11.5079
SYMBOLIC based on column counts
ELAPSED TIME IN symbolic factorization = 0.7713
A root of estimated size 15552 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 2796208095
-- (3) Real space for factors (estimated) = 2953779780
-- (4) Integer space for factors (estimated) = 24430175
-- (5) Maximum frontal size (estimated) = 25827
-- (6) Number of nodes in the tree = 32939
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL (6) Maximum transversal option = 0
ICNTL (7) Pivot order option = 7
ICNTL(12) Ordering symmetric indef. matrices = 1
ICNTL(13) Parallelism/splitting of root node = 0
ICNTL(14) Percentage of memory relaxation = 30
ICNTL(15) Analysis by block effectively used = 0
ICNTL(18) Distributed input matrix (on if >0) = 0
ICNTL(32) Forward elimination during facto. = 0
ICNTL(35) BLR activation = 0
ICNTL(48) Tree based multithreading (effective)= 1
ICNTL(58) Symbolic factorization option = 2
Number of level 2 nodes = 150
Number of split nodes = 1
RINFOG(1) Operations during elimination (estim)= 3.095D+13
MEMORY ESTIMATIONS ...
Estimations with standard Full-Rank (FR) factorization:
Maximum estim. space in Mbytes, IC facto. (INFOG(16)): 1743
Total space in MBytes, IC factorization (INFOG(17)): 81296
Maximum estim. space in Mbytes, OOC facto. (INFOG(26)): 1083
Total space in MBytes, OOC factorization (INFOG(27)): 67238
Elapsed time in analysis driver= 13.7017
Analysis time by clock_gettime(): 13.703 s
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 7 1391349 32961525
executing #MPI = 86 and #OMP = 2
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
Elapsed time in save structure driver= 0.0020
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -71
On return from DMUMPS, INFOG(2)= 0
PRE FACTO START LPROF----------------------
Entering DMUMPS 5.8.2 from C interface with JOB, N, NNZ = 2 1391349 32961525
executing #MPI = 86 and #OMP = 2
Advanced settings:
KEEP(370) Static mapping = 1
KEEP(371) Advanced optimizations = 0
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
* [MAQAO] Info: STARTING COUNTERS (igk-0805)
[0m ** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
** ERROR RETURN ** FROM DMUMPS INFO(1)= -71
** INFO(2)= 0
PRE FACTO START LPROF----------------------
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
Number of working processes = 86
ICNTL(22) Out-of-core option = 0
ICNTL(35) BLR activation (eff. choice) = 0
ICNTL(37) BLR CB compression (eff. choice) = 0
ICNTL(49) Compact workarray S (end facto.) = 0
ICNTL(56) Effective value during facto. = 0
ICNTL(14) Memory relaxation = 30
INFOG(3) Real space for factors (estimated)= 2953779780
INFOG(4) Integer space for factors (estim.)= 24430175
Maximum frontal size (estimated) = 25827
Number of nodes in the tree = 32939
ICNTL(23) Memory allowed (value on host) = 0
Sum over all procs = 0
Memory provided by user, sum of LWK_USER = 0
Effective threshold for pivoting, CNTL(1) = 0.1000D-01
Statistics on the scaling phase
Elapsed time for scaling = 0.3837
Max difference from 1 after scaling the entries for ONE-NORM (option 7/8) = 0.28D+00
Average Effective size of S (based on INFO(39))= 95041933
Elapsed time to reformat/distribute matrix = 1.5711
Allocated buffers
------------------
Size of reception buffer in bytes ...... = 10422912
Size of async. emission buffer (bytes).. = 41795869
Small emission buffer (bytes) .......... = 155080
** Memory allocated, max in Mbytes (INFOG(18)): 1743
** Memory allocated, total in Mbytes (INFOG(19)): 81347
** Memory effectively used, max in Mbytes (INFOG(21)): 1525
** Memory effectively used, total in Mbytes (INFOG(22)): 49938
Flops under L0 layer (avg/max across MPI) = 5.201D+09 1.655D+10
Elapsed time under L0 (avg/max across MPI) = 0.3382 0.7346
Elapsed time to process root node = 1.3005
Elapsed time for factorization = 22.5034
Leaving factorization with ...
RINFOG (2) Operations in node assembly = 9.218D+09
------ (3) Operations in node elimination = 3.095D+13
ICNTL (8) Scaling effectively used = 7
INFOG (9) Real space for factors = 2954855606
INFOG (10) Integer space for factors = 22109070
INFOG (11) Maximum front size = 25827
INFOG (29) Number of entries in factors = 2796208095
INFOG (12) Number of negative pivots = 0
INFOG (13) Number of delayed pivots = 0
Number of 2x2 pivots in type 1 nodes = 0
Number of 2x2 pivots in type 2 nodes = 0
RINFOG(19) Smallest pivot WITH perturbed pivots = 1.403D-03
RINFOG(20) Smallest pivot WITHOUT perturbed pivots = 1.403D-03
RINFOG(21) Largest pivot in absolute value = 1.000D+00
INFOG (24) Effective value of ICNTL(12) = 1
INFOG (14) Number of memory compress = 117
Elapsed time in factorization driver = 24.5012
Factorization time by clock_gettime(): 24.5171 s
Entering DMUMPS 5.8.2 from C interface with JOB = -2
executing #MPI = 86 and #OMP = 2
Your experiment path is /home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7
To display your profiling results:
##############################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##############################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/mlkaps_org/kevin/matrices/test_m1-86_o2_serena_allowextra_scala_kptr_probe/tools/lprof_run_7 #
##############################################################################################################################################################