* [MAQAO] Info: Detected 1 Lprof instances in krakenngpu01.cluster.
If this is incorrect, rerun with number-processes-per-node=X
Using branch :
Version date : Mon, 18 Nov 2024 11:40:50 +0100
Commit : b58af1ea20
MPI processes : 1
Computation #1/1
Compilation info : mpif90 -g -Mpreprocess -O3 -fastsse -Munroll -byteswapio -tp=px -acc=gpu -Minfo=accel -Minline -I/softs/local_pgi/phdf5/1.8.20_pgi204_zen/include -DHAS_PMETIS -I/softs/local_pgi/parmetis/403_r64_pgi201_px/include
Compilation wrapper info : nvfortran -I/home/logiciels/nvidia/hpc_sdk/Linux_x86_64/24.1/comm_libs/12.3/hpcx/hpcx-2.17.1/ompi/include -I/home/logiciels/nvidia/hpc_sdk/Linux_x86_64/24.1/comm_libs/12.3/hpcx/hpcx-2.17.1/ompi/lib -L/home/logiciels/nvidia/hpc_sdk/Linux_x86_64/24.1/comm_libs/12.3/hpcx/hpcx-2.17.1/ompi/lib -rpath /home/logiciels/nvidia/hpc_sdk/Linux_x86_64/24.1/comm_libs/12.3/hpcx/hpcx-2.17.1/ompi/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
Compilation user : neto
Compilation date : 2025-03-06 13:53:10
Compilation MPI version : mpirun (Open MPI) 4.1.7a1
AVBP version : 7.15.0
Reading input file version : 7.15.0
----> Reading run parameters : .//run.params
----> Using NATURAL reordering
>>>>> WARNING
>>>>> Use of cached el2part disabled
----> command.dat file API is enabled
----> GPU DIRECT ENABLED
----> Using TTGC
with UNCLOSED boundary terms
----> Using colin_species viscosity model
>>>>> WARNING
>>>>> Temporals are not computed!
>>>>> WARNING
>>>>> Specifier 'transport = computed' is deprecated since version 7.14. Please use 'transport = simplified' instead (simplified transport model).
----> Reading mesh : .//../MESH/config5_uniform_true.mesh.h5
Meshfile signature: fd2a16d9e6abb6308ccb0c380607da34
----> Initialize the solution writers (1 writers)
>>>>> WARNING
>>>>> No instantaneous solution storage required: the calculation of additional variables is deactivated.
Checking TFLES table parameters...
Local optimal thickening is applied with 5.00 cells in the flame front.
----> Reading boundary conditions in asciibound file : .//../MESH/config5_uniform_true.asciiBound.key
 ______________________________________________________
| Boundary patches (no reordering)                     |
|______________________________________________________|
| Patch number   Patch name         Boundary condition |
| ------------   ----------         ------------------ |
|            1   plenum_back        OUTLET_RELAX_P_3D  |
|            2   plenum_left        OUTLET_RELAX_P_3D  |
|            3   plenum_top         OUTLET_RELAX_P_3D  |
|            4   plenum_right       OUTLET_RELAX_P_3D  |
|            5   plenum_bottom      OUTLET_RELAX_P_3D  |
|            6   plenum_fr          WALL_NOSLIP_ADIAB  |
|            7   chamber_bottom     WALL_NOSLIP_ADIAB  |
|            8   chamber_left       WALL_NOSLIP_ADIAB  |
|            9   chamber_top        WALL_NOSLIP_ADIAB  |
|           10   chamber_right      WALL_NOSLIP_ADIAB  |
|           11   chamber_fr         WALL_NOSLIP_ADIAB  |
|           12   grid11_fr          WALL_NOSLIP_ADIAB  |
|           13   grid11_ba          WALL_NOSLIP_ADIAB  |
|           14   grid12_fr          WALL_NOSLIP_ADIAB  |
|           15   grid12_ba          WALL_NOSLIP_ADIAB  |
|           16   grid13_fr          WALL_NOSLIP_ADIAB  |
|           17   grid13_ba          WALL_NOSLIP_ADIAB  |
|           18   grid14_fr          WALL_NOSLIP_ADIAB  |
|           19   grid14_ba          WALL_NOSLIP_ADIAB  |
|           20   grid15_fr          WALL_NOSLIP_ADIAB  |
|           21   grid15_ba          WALL_NOSLIP_ADIAB  |
|           22   obtsacle_fr        WALL_NOSLIP_ADIAB  |
|           23   obtsacle_ba        WALL_NOSLIP_ADIAB  |
|______________________________________________________|
 ______________________________________________________
| Info on initial grid                                 |
|______________________________________________________|
| number of dimensions              : 3                |
| number of nodes                   : 3543424          |
| number of cells                   : 20083754         |
|   - tetrahedra                    : 20083754         |
| number of cell per group          : 1000000          |
| number of boundary nodes          : 243230           |
| number of periodic nodes          : 0                |
| number of axi-periodic nodes      : 0                |
|______________________________________________________|
| After partitioning                                   |
|______________________________________________________|
| number of nodes                   : 3543424          |
| extra nodes due to partitioning   : 0 [+ 0.00‰]      |
|______________________________________________________|
 ______________________________________________________
| Partitioning Quality                                 |
|______________________________________________________|
| Maximum number of neighbors       : 0.00             |
| Average number of neighbors       : 0.00             |
| Maximum number of exchange nodes  : 0.00             |
| Average number of exchange nodes  : 0.00             |
|______________________________________________________|
----> Reading initial solution : .//../MESH/init.h5
----> Reading took 1.687s
 ______________________________________________________
| Info on chemistry                                    |
|______________________________________________________|
| Kinetic scheme                    : C3H8_F2          |
|                                                      |
| Chemical reaction #1                                 |
| Preexponential / fthick [SI]      : 2.38433847E+09   |
| Activation temperature [K]        : 2.08840191E+04   |
|                                                      |
| Chemical reaction #2                                 |
| Preexponential / fthick [SI]      : 4.50000000E+07   |
| Activation temperature [K]        : 1.00645875E+04   |
|______________________________________________________|
 ______________________________________________________
| Info on initial solution                             |
|______________________________________________________|
| number of Navier-Stokes equations : 5                |
| number of species                 : 6                |
| number of reactions               : 2                |
| number of tpf equations           : 0                |
| number of fictive species         : 0                |
| initial iteration                 : 93387            |
| initial time                      : 9.00005232E-03   |
|______________________________________________________|
----> Reading solutbound : .//../MESH/perso.solutBound.h5
- Using 6.X format
----> Reading took 0.010s
----> Initialising metrics
----> Total volume of the mesh [m3] : 1.35006148E+01
----> Smallest cell volume [m3] : 1.48598105E-12
----> Found cached wall distance computation. Checking: ./ywall.h5
> Signatures match
----> Reading cached wall distance computation: ./ywall.h5
----> Reading took 0.172s
----> Boundary MPIs: 1
----> End pre-processing.
________________________________________________________________________________________________________
***** GPU memory (used/total): 22707 MB / 24051 MB | Cell per group: 1000000
----> Starts the temporal loop.
***** GPU memory (used/total): 23004 MB / 24051 MB | Cell per group: 1000000
----> End computation.
________________________________________________________________________________________________________
 ______________________________________________________________________________
| 1 MPI tasks with GPU     Elapsed real time [s]      [s.cores]      [h.cores] |
|______________________________________________________________________________|
| AVBP                  :                1195.84     1.1958E+03     3.3218E-01 |
| Temporal loop         :                 309.07     3.0907E+02     8.5854E-02 |
| Per simulated second  :             3.4250E+07     3.4250E+07     9.5139E+03 |
| Per iteration         :                 3.0907     3.0907E+00                |
|------------------------------------------------------------------------------|
| RCT [s.mpi/node/it]   :         8.72244536E-07                               |
|______________________________________________________________________________|
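The derived rows of this summary can be recomputed from raw values printed elsewhere in the log. The sketch below is a minimal illustration, not AVBP's implementation: the helper `cost_metrics` is hypothetical, and the RCT formula (temporal-loop time x MPI processes / (mesh nodes x iterations)) is an assumption inferred from its unit [s.mpi/node/it]. With the rounded 309.07 s it reproduces the logged 8.72244536E-07 to about five significant digits.

```python
# Sketch: recompute the derived rows of the AVBP runtime summary from raw
# values printed in this log. The RCT definition is an ASSUMPTION inferred
# from its unit [s.mpi/node/it]; "node" is read as a mesh node.

def cost_metrics(loop_time_s, n_mpi, n_nodes, n_iterations, simulated_time_s):
    """Return derived cost metrics for one run (hypothetical helper)."""
    core_seconds = loop_time_s * n_mpi  # [s.cores]
    return {
        "per_iteration_s": loop_time_s / n_iterations,
        "per_simulated_second_s": loop_time_s / simulated_time_s,
        "per_simulated_second_h_cores": core_seconds / simulated_time_s / 3600.0,
        "rct_s_mpi_per_node_per_it": core_seconds / (n_nodes * n_iterations),
    }

if __name__ == "__main__":
    m = cost_metrics(
        loop_time_s=309.07,               # "Temporal loop" elapsed time [s]
        n_mpi=1,                          # "MPI processes : 1"
        n_nodes=3543424,                  # "number of nodes" (mesh nodes)
        n_iterations=100,                 # "Simulated iterations"
        simulated_time_s=9.02399946e-06,  # "Simulated physical time" [s]
    )
    for name, value in m.items():
        print(f"{name:32s} {value:.4E}")
```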
----> Initial physical time : 9.00005232E-03
Initial iteration : 93387
Initial timestep : 9.02395769E-08
----> Final physical time : 9.00907632E-03
Final iteration : 93487
Final timestep : 9.02440465E-08
----> Simulated physical time : 9.02399946E-06
Simulated iterations : 100
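The run summary can be checked for internal consistency: the simulated physical time is the final minus the initial physical time, and dividing it by the iteration count gives a mean timestep that should sit close to the initial and final timesteps. A short check using only values copied from this log:

```python
# Sketch: consistency check of the run-summary block (values copied from the log).
initial_time = 9.00005232e-03   # "Initial physical time" [s]
final_time = 9.00907632e-03     # "Final physical time" [s]
initial_iteration, final_iteration = 93387, 93487
initial_dt, final_dt = 9.02395769e-08, 9.02440465e-08  # [s]

simulated_time = final_time - initial_time
iterations = final_iteration - initial_iteration
mean_dt = simulated_time / iterations

print(f"simulated time : {simulated_time:.8E} s")
print(f"iterations     : {iterations}")
print(f"mean timestep  : {mean_dt:.8E} s")
# If the timestep drifts monotonically, its mean lies between the endpoints.
print("mean dt within [initial, final]:", initial_dt <= mean_dt <= final_dt)
```

The small difference from the logged 9.02399946E-06 comes from the physical times being printed with only nine significant digits.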
________________________________________________________________________________________________________
TIMERS
________________________________________________________________________________________________________
Prints the relevant timers and breaks down their percentages relative to reference timers.
> The 'Total slave simulation' time corresponds to the 1st level, and is measured by slave_timer (sum of pre temporal loop, temporal loop and post temporal loop).
> The 'Computation' time corresponds to the time integration loops, and is measured by rungekutta_timer.
> Levels are depicted using [X.Y.Z...] lists. The number of entries in the list corresponds to the level.
> References to the upper level are made to compute the contribution of each sub-level to its parent level.
> The times displayed are those of the master processor.
> For each timer, the minimum, mean and maximum values over all processors are also shown in the 3 right-hand columns.
> A json file 'timers.json' containing all the data is also available in the temporal output directory.
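The level scheme described above can be reproduced from the timer IDs alone: a timer's parent is its ID with the last component removed, and each percentage column is the timer's time divided by a reference time. A minimal sketch, assuming the timers are held in a plain ID-to-seconds mapping (the layout of 'timers.json' is not shown in this log, so no parsing is attempted); the values are copied from the tables below and the helper names are hypothetical.

```python
# Sketch: recompute the "relative to" percentage columns from timer IDs.
# The dict layout is an ASSUMPTION, not the format of timers.json.
timers = {
    "0": 1.1958e03,           # Total slave simulation
    "0.1": 8.8673e02,         # Pre temporal loop
    "0.2": 3.0907e02,         # Temporal loop
    "0.2.1": 3.0841e02,       # Computation
    "0.2.1.1": 7.4225e01,     # Convective scheme
    "0.2.1.2": 8.9274e01,     # Diffusion operator
    "0.2.1.10": 1.5014e01,    # Combustion
    "0.2.1.10.1": 6.2687e00,  # Chemical source terms calculation
}

def parent(timer_id):
    """Upper-level timer: drop the last component of the [X.Y.Z] id."""
    return timer_id.rsplit(".", 1)[0] if "." in timer_id else None

def level(timer_id):
    """Level = number of entries in the id list ([0] is the 1st level)."""
    return timer_id.count(".") + 1

def percentages(timer_id, total="0", computation="0.2.1"):
    """Percentages relative to the total, to Computation, and to the parent."""
    t = timers[timer_id]
    pct = {"tot. slave": 100.0 * t / timers[total]}
    if timer_id.startswith(computation + "."):
        pct["computation"] = 100.0 * t / timers[computation]
    up = parent(timer_id)
    if up is not None:
        pct["upper level"] = 100.0 * t / timers[up]
    return pct

for tid in timers:
    cols = ", ".join(f"{k}: {v:6.2f}%" for k, v in percentages(tid).items())
    print(f"[{tid}] (level {level(tid)}) {cols}")
```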
----- 1st level timers
time [s] | relative to [ min [s] mean [s] max [s] ]
| tot. slave [%] [ ]
> [0] Total slave simulation : 1.1958E+03 | 100.00% [ 1.1958E+03 1.1958E+03 1.1958E+03 ]
----- 2nd level timers
time [s] | relative to [ min [s] mean [s] max [s] ]
| tot. slave [%] [ ]
> > [0.1] Pre temporal loop : 8.8673E+02 | 74.15% [ 8.8673E+02 8.8673E+02 8.8673E+02 ]
> > [0.2] Temporal loop : 3.0907E+02 | 25.85% [ 3.0907E+02 3.0907E+02 3.0907E+02 ]
> > [0.2a] Temporal loop without IO : 3.0907E+02 | 25.85% [ 3.0907E+02 3.0907E+02 3.0907E+02 ]
> > [0.3] Post temporal loop : 3.7884E-02 | 0.00% [ 3.7884E-02 3.7884E-02 3.7884E-02 ]
> > [0.4] Point to Point communications : 1.4293E-01 | 0.01% [ 1.4293E-01 1.4293E-01 1.4293E-01 ]
----- 3rd level timers
time [s] | relative to | relative to [ min [s] mean [s] max [s] ]
| tot. slave [%] | upper level [%] [ ]
> > [0.1] Pre temporal loop : 8.8673E+02 | 74.15% [ 8.8673E+02 8.8673E+02 8.8673E+02 ]
> > > [0.1.1] Build online postprocessing objects : 0.0000E+00 | 0.00% | 0.00% [ 0.0000E+00 0.0000E+00 0.0000E+00 ]
> > [0.2] Temporal loop : 3.0907E+02 | 25.85% [ 3.0907E+02 3.0907E+02 3.0907E+02 ]
> > > [0.2.1] Computation : 3.0841E+02 | 25.79% | 99.79% [ 3.0841E+02 3.0841E+02 3.0841E+02 ]
> > > [0.2.2] Temporal post-processing : 0.0000E+00 | 0.00% | 0.00% [ 0.0000E+00 0.0000E+00 0.0000E+00 ]
> > > [0.2.3] Instantaneous solution post-processing : 0.0000E+00 | 0.00% | 0.00% [ 0.0000E+00 0.0000E+00 0.0000E+00 ]
> > > [0.2.4] Average solution post-processing : 0.0000E+00 | 0.00% | 0.00% [ 0.0000E+00 0.0000E+00 0.0000E+00 ]
> > > [0.2.5] Online post-processing compute and storage : 0.0000E+00 | 0.00% | 0.00% [ 0.0000E+00 0.0000E+00 0.0000E+00 ]
----- 4th level timers: focus on Computation level (rungekutta_timer)
time [s] | relative to | relative to | relative to [ min [s] mean [s] max [s] ]
| tot. slave [%] | computation [%]| upper level [%][ ]
> > > [0.2.1] Computation : 3.0841E+02 | 25.79% | 100.00% [ 3.0841E+02 3.0841E+02 3.0841E+02 ]
> > > > [0.2.1.1] Convective scheme : 7.4225E+01 | 6.21% | 24.07% | 24.07% [ 7.4225E+01 7.4225E+01 7.4225E+01 ]
> > > > [0.2.1.2] Diffusion operator : 8.9274E+01 | 7.47% | 28.95% | 28.95% [ 8.9274E+01 8.9274E+01 8.9274E+01 ]
> > > > [0.2.1.4] Time-step calculation : 4.8618E+00 | 0.41% | 1.58% | 1.58% [ 4.8618E+00 4.8618E+00 4.8618E+00 ]
> > > > [0.2.1.5] Transport calculation : 8.2592E-01 | 0.07% | 0.27% | 0.27% [ 8.2592E-01 8.2592E-01 8.2592E-01 ]
> > > > [0.2.1.6] Thermo calculation : 1.4672E+00 | 0.12% | 0.48% | 0.48% [ 1.4672E+00 1.4672E+00 1.4672E+00 ]
> > > > [0.2.1.7] Gradient calculation : 2.9224E+01 | 2.44% | 9.48% | 9.48% [ 2.9224E+01 2.9224E+01 2.9224E+01 ]
> > > > [0.2.1.8] Boundary : 3.6475E+00 | 0.31% | 1.18% | 1.18% [ 3.6475E+00 3.6475E+00 3.6475E+00 ]
> > > > [0.2.1.9] Turbulent viscosity model : 4.7548E+00 | 0.40% | 1.54% | 1.54% [ 4.7548E+00 4.7548E+00 4.7548E+00 ]
> > > > [0.2.1.10] Combustion (source term + TFLES + efcy + efcy I0) : 1.5014E+01 | 1.26% | 4.87% | 4.87% [ 1.5014E+01 1.5014E+01 1.5014E+01 ]
> > > > > [0.2.1.10.1] Chemical source terms calculation : 6.2687E+00 | 0.52% | 2.03% | 41.75% [ 6.2687E+00 6.2687E+00 6.2687E+00 ]
> > > > > [0.2.1.10.2] TFLES model calculation : 5.2677E+00 | 0.44% | 1.71% | 35.08% [ 5.2677E+00 5.2677E+00 5.2677E+00 ]
> > > > > [0.2.1.10.3] Efficiency function calculation : 3.4779E+00 | 0.29% | 1.13% | 23.16% [ 3.4779E+00 3.4779E+00 3.4779E+00 ]
> > > > > [0.2.1.10.4] Efficiency I0 function calculation : 0.0000E+00 | 0.00% | 0.00% | 0.00% [ 0.0000E+00 0.0000E+00 0.0000E+00 ]
> > > > [0.2.1.11] Artificial viscosity : 3.2180E+01 | 2.69% | 10.43% | 10.43% [ 3.2180E+01 3.2180E+01 3.2180E+01 ]
> > > > [0.2.1.17] Source terms : 0.0000E+00 | 0.00% | 0.00% | 0.00% [ 0.0000E+00 0.0000E+00 0.0000E+00 ]
----> End of AVBP session
----> Found 4 warning messages for this computation, check your output file!
***** Memory usage (system): Max: 17819.121 MB (rank:0) Min: 17819.121 MB (rank:0) Ave: 17819.121 MB Std: 0.000 MB
***** Maximum memory (mod_alloc) : 2129081512 B ( 2.030450E+03 MB)
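The byte count reported by mod_alloc matches the printed MB value only if MB is read as binary megabytes (1 MB = 2^20 B). A quick check with the logged number:

```python
# Sketch: convert the mod_alloc byte count to the MB value printed in the log.
max_alloc_bytes = 2129081512

mib = max_alloc_bytes / 2**20   # binary megabytes (MiB)
mb = max_alloc_bytes / 10**6    # decimal megabytes, for comparison

print(f"{mib:.6E} MiB")         # same digits as the logged 2.030450E+03 MB
print(f"{mb:.6E} MB (decimal)")
```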
Your experiment path is /home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0
To display your profiling results:
#####################################################################################################################################
# LEVEL     | REPORT       | COMMAND                                                                                                #
#####################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0     #
# Functions | Per-node     | maqao lprof -df -dn xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0 #
# Functions | Per-process  | maqao lprof -df -dp xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0 #
# Functions | Per-thread   | maqao lprof -df -dt xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0 #
# Loops     | Cluster-wide | maqao lprof -dl xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0     #
# Loops     | Per-node     | maqao lprof -dl -dn xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0 #
# Loops     | Per-process  | maqao lprof -dl -dp xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0 #
# Loops     | Per-thread   | maqao lprof -dl -dt xp=/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0 #
#####################################################################################################################################
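The display commands follow a regular pattern: `-df` or `-dl` selects functions or loops, and an optional `-dn`, `-dp` or `-dt` selects the per-node, per-process or per-thread view. A small sketch that assembles these command lines, using only the flags shown in the table; the helper name `lprof_command` is hypothetical and the command is only printed, not executed.

```python
import shlex

# Flags taken from the lprof display table printed above.
LEVEL_FLAGS = {"functions": "-df", "loops": "-dl"}
REPORT_FLAGS = {"cluster-wide": None, "per-node": "-dn",
                "per-process": "-dp", "per-thread": "-dt"}

def lprof_command(level, report, experiment_path):
    """Build the argument list for one 'maqao lprof' display command."""
    cmd = ["maqao", "lprof", LEVEL_FLAGS[level]]
    if REPORT_FLAGS[report] is not None:
        cmd.append(REPORT_FLAGS[report])
    cmd.append(f"xp={experiment_path}")
    return cmd

if __name__ == "__main__":
    xp = "/home/exter/neto/LEFEX_20M/RUN/maqao_2025-06-12_15-19-48/tools/lprof_npsu_run_0"
    cmd = lprof_command("loops", "per-process", xp)
    print(shlex.join(cmd))
    # To launch it (requires MAQAO on PATH), pass cmd to subprocess.run(cmd, check=True).
```

Returning an argument list rather than a single string avoids shell quoting issues if the experiment path ever contains spaces.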