* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 743963)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 743968)
   _  __       _         _
  | |/ /      (_)       | |
  | ' /  _ __  _  _ __  | | __ ___
  |  <  | '__|| || '_ \ | |/ // _ \
  | . \ | |   | || |_) ||   <|  __/
  |_|\_\|_|   |_|| .__/ |_|\_\\___|
                 | |
                 |_|        Version 1.2.4
LLNL-CODE-775068
Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC
Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license
This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.
Author: Adam J. Kunen
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 1 threads on rank 0
0-> 0
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements  Megabytes
--------------       ------------  ---------
data/sigs                15728640    120.000
dx                             16      0.000
dy                             16      0.000
dz                             16      0.000
ell                          2400      0.018
ell_plus                     2400      0.018
i_plane                  25165824    192.000
j_plane                  25165824    192.000
k_plane                  25165824    192.000
mixelem_to_fraction          4352      0.033
phi                     104857600    800.000
phi_out                 104857600    800.000
psi                     402653184   3072.000
quadrature/w                   96      0.001
quadrature/xcos                96      0.001
quadrature/ycos                96      0.001
quadrature/zcos                96      0.001
rhs                     402653184   3072.000
sigt_zonal                4194304     32.000
volume                       4096      0.031
--------             ------------  ---------
TOTAL                  1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                    Count       Seconds
----------------  ------------  ------------
Generate                     1       0.03519
LPlusTimes                  10      26.30353
LTimes                      10      25.30808
Population                  10      11.24723
Scattering                  10     921.94314
Solve                        1    1009.30628
Source                      10       0.01539
SweepSolver                 10      23.63157
SweepSubdomain             160      20.69514
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.035189,26.303530,25.308079,11.247230,921.943144,1009.306284,0.015389,23.631571,20.695143
Figures of Merit
================
Throughput: 3.989405e+06 [unknowns/(second/iteration)]
Grind time : 2.506639e-07 [(seconds/iteration)/unknowns]
Sweep efficiency : 87.57413 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 743963)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 743963) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 743968)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 743968) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
#######################################################################################################################################################################################################
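Note (not part of the tool output): the TIMER_NAMES and TIMER_DATA lines above are the machine-readable form of the Timers table, and the Figures of Merit can be recomputed from them. The sketch below is only an illustration. It assumes that the "second" in the throughput unit is the Solve timer and that the iteration count is the logged "Number iterations: 10"; under those assumptions it appears to reproduce the logged values for the 1-thread run. `parse_timers` is a hypothetical helper name, not a Kripke or MAQAO function.

```python
# Sketch only: both lines are copied verbatim from the 1-thread run above.
NAMES_LINE = ("TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,"
              "Solve,Source,SweepSolver,SweepSubdomain")
DATA_LINE = ("TIMER_DATA:0.035189,26.303530,25.308079,11.247230,921.943144,"
             "1009.306284,0.015389,23.631571,20.695143")


def parse_timers(names_line, data_line):
    """Pair each timer name with its value in seconds (hypothetical helper)."""
    names = names_line.split(":", 1)[1].split(",")
    values = [float(v) for v in data_line.split(":", 1)[1].split(",")]
    if len(names) != len(values):
        raise ValueError("TIMER_NAMES and TIMER_DATA have different lengths")
    return dict(zip(names, values))


timers = parse_timers(NAMES_LINE, DATA_LINE)

unknowns = 402653184   # "Number of unknowns" in the log
iterations = 10        # "Number iterations" in the log

# Assumption: throughput uses the Solve timer as its time base.
throughput = unknowns / (timers["Solve"] / iterations)
grind_time = 1.0 / throughput
# Formula as printed in the log: 100.0 * SweepSubdomain time / SweepSolver time
sweep_efficiency = 100.0 * timers["SweepSubdomain"] / timers["SweepSolver"]

print(f"Throughput:       {throughput:.6e} [unknowns/(second/iteration)]")
print(f"Grind time:       {grind_time:.6e} [(seconds/iteration)/unknowns]")
print(f"Sweep efficiency: {sweep_efficiency:.5f}")
```

The same two lines appear in every run, so the helper can be pointed at the TIMER_* lines of any of the other thread counts.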
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744118)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744123)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 2 threads on rank 0
0-> 0 1-> 48
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements  Megabytes
--------------       ------------  ---------
data/sigs                15728640    120.000
dx                             16      0.000
dy                             16      0.000
dz                             16      0.000
ell                          2400      0.018
ell_plus                     2400      0.018
i_plane                  25165824    192.000
j_plane                  25165824    192.000
k_plane                  25165824    192.000
mixelem_to_fraction          4352      0.033
phi                     104857600    800.000
phi_out                 104857600    800.000
psi                     402653184   3072.000
quadrature/w                   96      0.001
quadrature/xcos                96      0.001
quadrature/ycos                96      0.001
quadrature/zcos                96      0.001
rhs                     402653184   3072.000
sigt_zonal                4194304     32.000
volume                       4096      0.031
--------             ------------  ---------
TOTAL                  1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                    Count       Seconds
----------------  ------------  ------------
Generate                     1       0.02087
LPlusTimes                  10      15.30236
LTimes                      10      13.47482
Population                  10       4.24667
Scattering                  10     459.39480
Solve                        1     511.24172
Source                      10       0.00940
SweepSolver                 10      17.94569
SweepSubdomain             160      10.58768
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020873,15.302358,13.474820,4.246668,459.394798,511.241718,0.009404,17.945686,10.587680
Figures of Merit
================
Throughput: 7.875984e+06 [unknowns/(second/iteration)]
Grind time : 1.269683e-07 [(seconds/iteration)/unknowns]
Sweep efficiency : 58.99847 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744123)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744123) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744118)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744118) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
#######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744229)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744234)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 4 threads on rank 0
0-> 0 1-> 24 2-> 48 3->120
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements  Megabytes
--------------       ------------  ---------
data/sigs                15728640    120.000
dx                             16      0.000
dy                             16      0.000
dz                             16      0.000
ell                          2400      0.018
ell_plus                     2400      0.018
i_plane                  25165824    192.000
j_plane                  25165824    192.000
k_plane                  25165824    192.000
mixelem_to_fraction          4352      0.033
phi                     104857600    800.000
phi_out                 104857600    800.000
psi                     402653184   3072.000
quadrature/w                   96      0.001
quadrature/xcos                96      0.001
quadrature/ycos                96      0.001
quadrature/zcos                96      0.001
rhs                     402653184   3072.000
sigt_zonal                4194304     32.000
volume                       4096      0.031
--------             ------------  ---------
TOTAL                  1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                    Count       Seconds
----------------  ------------  ------------
Generate                     1       0.02109
LPlusTimes                  10       8.77221
LTimes                      10       8.57832
Population                  10       3.32610
Scattering                  10     232.16859
Solve                        1     260.04760
Source                      10       0.00542
SweepSolver                 10       6.33482
SweepSubdomain             160       5.88000
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.021085,8.772207,8.578325,3.326104,232.168589,260.047600,0.005418,6.334818,5.880002
Figures of Merit
================
Throughput: 1.548383e+07 [unknowns/(second/iteration)]
Grind time : 6.458352e-08 [(seconds/iteration)/unknowns]
Sweep efficiency : 92.82038 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744229)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744229) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744234)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744234) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
#######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744332)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744337)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 8 threads on rank 0
0-> 0 1-> 12 2-> 24 3-> 36 4-> 48 5-> 60 6->120 7->132
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements  Megabytes
--------------       ------------  ---------
data/sigs                15728640    120.000
dx                             16      0.000
dy                             16      0.000
dz                             16      0.000
ell                          2400      0.018
ell_plus                     2400      0.018
i_plane                  25165824    192.000
j_plane                  25165824    192.000
k_plane                  25165824    192.000
mixelem_to_fraction          4352      0.033
phi                     104857600    800.000
phi_out                 104857600    800.000
psi                     402653184   3072.000
quadrature/w                   96      0.001
quadrature/xcos                96      0.001
quadrature/ycos                96      0.001
quadrature/zcos                96      0.001
rhs                     402653184   3072.000
sigt_zonal                4194304     32.000
volume                       4096      0.031
--------             ------------  ---------
TOTAL                  1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                    Count       Seconds
----------------  ------------  ------------
Generate                     1       0.02075
LPlusTimes                  10       4.57218
LTimes                      10       4.47847
Population                  10       1.24629
Scattering                  10     115.19935
Solve                        1     130.79177
Source                      10       0.00326
SweepSolver                 10       4.42803
SweepSubdomain             160       2.99588
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020746,4.572178,4.478471,1.246289,115.199349,130.791771,0.003265,4.428028,2.995882
Figures of Merit
================
Throughput: 3.078582e+07 [unknowns/(second/iteration)]
Grind time : 3.248249e-08 [(seconds/iteration)/unknowns]
Sweep efficiency : 67.65725 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744332)
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744337)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744332) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744337) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
#######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744447)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744452)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 16 threads on rank 0
0-> 0 1-> 6 2-> 12 3-> 18 4-> 24 5-> 30 6-> 36 7-> 42
8-> 48 9-> 54 10-> 60 11-> 66 12->120 13->126 14->132 15->138
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements  Megabytes
--------------       ------------  ---------
data/sigs                15728640    120.000
dx                             16      0.000
dy                             16      0.000
dz                             16      0.000
ell                          2400      0.018
ell_plus                     2400      0.018
i_plane                  25165824    192.000
j_plane                  25165824    192.000
k_plane                  25165824    192.000
mixelem_to_fraction          4352      0.033
phi                     104857600    800.000
phi_out                 104857600    800.000
psi                     402653184   3072.000
quadrature/w                   96      0.001
quadrature/xcos                96      0.001
quadrature/ycos                96      0.001
quadrature/zcos                96      0.001
rhs                     402653184   3072.000
sigt_zonal                4194304     32.000
volume                       4096      0.031
--------             ------------  ---------
TOTAL                  1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer Count Seconds
---------------- ------------ ------------
Generate 1 0.02016
LPlusTimes 10 2.43457
LTimes 10 2.97713
Population 10 0.70377
Scattering 10 57.66566
Solve 1 67.06154
Source 10 0.00207
SweepSolver 10 2.48738
SweepSubdomain 160 1.56057
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020156,2.434567,2.977134,0.703774,57.665657,67.061541,0.002066,2.487385,1.560568
Figures of Merit
================
Throughput: 6.004234e+07 [unknowns/(second/iteration)]
Grind time : 1.665491e-08 [(seconds/iteration)/unknowns]
Sweep efficiency : 62.73930 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744452)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744452) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744447)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744447) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
#######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744583)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744588)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 32 threads on rank 0
0-> 0 1-> 3 2-> 6 3-> 9 4-> 12 5-> 15 6-> 18 7-> 21
8-> 24 9-> 27 10-> 30 11-> 33 12-> 36 13-> 39 14-> 42 15-> 45
16-> 48 17-> 51 18-> 54 19-> 57 20-> 60 21-> 63 22-> 66 23-> 69
24->120 25->123 26->126 27->129 28->132 29->135 30->138 31->141
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space: Procs: Subdomains (local/global):
--------------------- ---------- --------------------------
(P) Energy: 1 2 / 2
(Q) Direction: 1 8 / 8
(R) Space: 2 1 / 2
(Rx,Ry,Rz) R in XYZ: 2x1x1 1x1x1 / 2x1x1
(PQR) TOTAL: 2 16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable Num Elements Megabytes
-------------- ------------ ---------
data/sigs 15728640 120.000
dx 16 0.000
dy 16 0.000
dz 16 0.000
ell 2400 0.018
ell_plus 2400 0.018
i_plane 25165824 192.000
j_plane 25165824 192.000
k_plane 25165824 192.000
mixelem_to_fraction 4352 0.033
phi 104857600 800.000
phi_out 104857600 800.000
psi 402653184 3072.000
quadrature/w 96 0.001
quadrature/xcos 96 0.001
quadrature/ycos 96 0.001
quadrature/zcos 96 0.001
rhs 402653184 3072.000
sigt_zonal 4194304 32.000
volume 4096 0.031
-------- ------------ ---------
TOTAL 1110455664 8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer Count Seconds
---------------- ------------ ------------
Generate 1 0.02080
LPlusTimes 10 1.96904
LTimes 10 2.34191
Population 10 0.32580
Scattering 10 28.97500
Solve 1 36.89074
Source 10 0.00122
SweepSolver 10 2.53398
SweepSubdomain 160 0.82647
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020798,1.969044,2.341909,0.325795,28.975001,36.890740,0.001220,2.533978,0.826467
Figures of Merit
================
Throughput: 1.091475e+08 [unknowns/(second/iteration)]
Grind time : 9.161914e-09 [(seconds/iteration)/unknowns]
Sweep efficiency : 32.61542 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744583)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744583) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744588)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744588) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
#######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744763)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 744768)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 64 threads on rank 0
0-> 0 1->193 2-> 3 3->196 4-> 6 5->199 6-> 9 7->202
8-> 12 9->205 10-> 15 11->208 12-> 18 13->211 14-> 21 15->214
16-> 24 17->217 18-> 27 19->220 20-> 30 21->223 22-> 33 23->226
24-> 36 25->229 26-> 39 27->232 28-> 42 29->235 30-> 45 31->238
32-> 48 33->241 34-> 51 35->244 36-> 54 37->247 38-> 57 39->250
40-> 60 41->253 42-> 63 43->256 44-> 66 45->259 46-> 69 47->262
48->120 49->313 50->123 51->316 52->126 53->319 54->129 55->322
56->132 57->325 58->135 59->328 60->138 61->331 62->141 63->334
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space: Procs: Subdomains (local/global):
--------------------- ---------- --------------------------
(P) Energy: 1 2 / 2
(Q) Direction: 1 8 / 8
(R) Space: 2 1 / 2
(Rx,Ry,Rz) R in XYZ: 2x1x1 1x1x1 / 2x1x1
(PQR) TOTAL: 2 16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable Num Elements Megabytes
-------------- ------------ ---------
data/sigs 15728640 120.000
dx 16 0.000
dy 16 0.000
dz 16 0.000
ell 2400 0.018
ell_plus 2400 0.018
i_plane 25165824 192.000
j_plane 25165824 192.000
k_plane 25165824 192.000
mixelem_to_fraction 4352 0.033
phi 104857600 800.000
phi_out 104857600 800.000
psi 402653184 3072.000
quadrature/w 96 0.001
quadrature/xcos 96 0.001
quadrature/ycos 96 0.001
quadrature/zcos 96 0.001
rhs 402653184 3072.000
sigt_zonal 4194304 32.000
volume 4096 0.031
-------- ------------ ---------
TOTAL 1110455664 8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer Count Seconds
---------------- ------------ ------------
Generate 1 0.02102
LPlusTimes 10 3.98268
LTimes 10 3.73604
Population 10 0.09198
Scattering 10 14.75223
Solve 1 28.10089
Source 10 0.00089
SweepSolver 10 4.78789
SweepSubdomain 160 0.62857
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.021020,3.982683,3.736037,0.091981,14.752230,28.100891,0.000895,4.787888,0.628565
Figures of Merit
================
Throughput: 1.432884e+08 [unknowns/(second/iteration)]
Grind time : 6.978932e-09 [(seconds/iteration)/unknowns]
Sweep efficiency : 13.12824 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744763)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744763) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 744768)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 744768) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
#######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 745089)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 745094)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O2 -march=znver4 -flto -funroll-loops -ffast-math -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=g++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 96 threads on rank 0
0-> 0 1-> 1 2-> 2 3-> 3 4-> 4 5-> 5 6-> 6 7-> 7
8-> 8 9-> 9 10-> 10 11-> 11 12-> 12 13-> 13 14-> 14 15-> 15
16-> 16 17-> 17 18-> 18 19-> 19 20-> 20 21-> 21 22-> 22 23-> 23
24-> 24 25-> 25 26-> 26 27-> 27 28-> 28 29-> 29 30-> 30 31-> 31
32-> 32 33-> 33 34-> 34 35-> 35 36-> 36 37-> 37 38-> 38 39-> 39
40-> 40 41-> 41 42-> 42 43-> 43 44-> 44 45-> 45 46-> 46 47-> 47
48-> 48 49-> 49 50-> 50 51-> 51 52-> 52 53-> 53 54-> 54 55-> 55
56-> 56 57-> 57 58-> 58 59-> 59 60-> 60 61-> 61 62-> 62 63-> 63
64-> 64 65-> 65 66-> 66 67-> 67 68-> 68 69-> 69 70-> 70 71-> 71
72->120 73->121 74->122 75->123 76->124 77->125 78->126 79->127
80->128 81->129 82->130 83->131 84->132 85->133 86->134 87->135
88->136 89->137 90->138 91->139 92->140 93->141 94->142 95->143
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space: Procs: Subdomains (local/global):
--------------------- ---------- --------------------------
(P) Energy: 1 2 / 2
(Q) Direction: 1 8 / 8
(R) Space: 2 1 / 2
(Rx,Ry,Rz) R in XYZ: 2x1x1 1x1x1 / 2x1x1
(PQR) TOTAL: 2 16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable Num Elements Megabytes
-------------- ------------ ---------
data/sigs 15728640 120.000
dx 16 0.000
dy 16 0.000
dz 16 0.000
ell 2400 0.018
ell_plus 2400 0.018
i_plane 25165824 192.000
j_plane 25165824 192.000
k_plane 25165824 192.000
mixelem_to_fraction 4352 0.033
phi 104857600 800.000
phi_out 104857600 800.000
psi 402653184 3072.000
quadrature/w 96 0.001
quadrature/xcos 96 0.001
quadrature/ycos 96 0.001
quadrature/zcos 96 0.001
rhs 402653184 3072.000
sigt_zonal 4194304 32.000
volume 4096 0.031
-------- ------------ ---------
TOTAL 1110455664 8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer Count Seconds
---------------- ------------ ------------
Generate 1 0.02021
LPlusTimes 10 2.61444
LTimes 10 3.19529
Population 10 0.10587
Scattering 10 11.09906
Solve 1 23.56151
Source 10 0.00123
SweepSolver 10 5.72667
SweepSubdomain 160 0.51379
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020210,2.614436,3.195291,0.105868,11.099064,23.561509,0.001228,5.726666,0.513791
Figures of Merit
================
Throughput: 1.708945e+08 [unknowns/(second/iteration)]
Grind time : 5.851564e-09 [(seconds/iteration)/unknowns]
Sweep efficiency : 8.97190 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 745094)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 745094) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 745089)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 745089) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7
To display your profiling results:
#######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/gcc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
#######################################################################################################################################################################################################