* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9469)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9474)
_ __ _ _
| |/ / (_) | |
| ' / _ __ _ _ __ | | __ ___
| < | '__|| || '_ \ | |/ // _ \
| . \ | | | || |_) || <| __/
|_|\_\|_| |_|| .__/ |_|\_\\___|
| |
|_| Version 1.2.4
LLNL-CODE-775068
Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC
Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license
This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.
Author: Adam J. Kunen
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 1 threads on rank 0
0-> 0
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
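The subdomain counts in the decomposition table are consistent with simple products. Local subdomains per task are group sets x direction sets x zone sets (2 x 8 x 1 = 16). The global count multiplies the spatial factor by the 2 x 1 x 1 MPI layout (32). The sketch below assumes this reading of the table. `subdomain_counts` is a hypothetical helper, not part of Kripke.

```python
def subdomain_counts(group_sets, dir_sets, zone_sets, spatial_tasks):
    """Return (local, global) subdomain counts for a P x Q x R decomposition.

    Assumption: energy (P) and direction (Q) are not split across ranks
    (Procs = 1 in the table), so only the spatial part (R) grows globally.
    """
    zx, zy, zz = zone_sets
    tx, ty, tz = spatial_tasks
    local_r = zx * zy * zz              # spatial subdomains per task
    global_r = local_r * tx * ty * tz   # spatial subdomains over all tasks
    return group_sets * dir_sets * local_r, group_sets * dir_sets * global_r


# Per-task options of this run: 2 group sets, 8 direction sets,
# 1x1x1 zone sets, and a 2x1x1 spatial MPI decomposition.
local, total = subdomain_counts(2, 8, (1, 1, 1), (2, 1, 1))
print(f"{local} / {total}")

# The per-task sets also cover the full problem:
# 8 x 12 = 96 directions and 2 x 512 = 1024 groups.
assert 8 * 12 == 96 and 2 * 512 == 1024
```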
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable        Num Elements  Megabytes
--------------------  ------------  ---------
data/sigs                 15728640    120.000
dx                              16      0.000
dy                              16      0.000
dz                              16      0.000
ell                           2400      0.018
ell_plus                      2400      0.018
i_plane                   25165824    192.000
j_plane                   25165824    192.000
k_plane                   25165824    192.000
mixelem_to_fraction           4352      0.033
phi                      104857600    800.000
phi_out                  104857600    800.000
psi                      402653184   3072.000
quadrature/w                    96      0.001
quadrature/xcos                 96      0.001
quadrature/ycos                 96      0.001
quadrature/zcos                 96      0.001
rhs                      402653184   3072.000
sigt_zonal                 4194304     32.000
volume                        4096      0.031
--------------------  ------------  ---------
TOTAL                   1110455664   8472.104
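The Megabytes column matches 8-byte (double-precision) elements with 1 MB = 2^20 bytes. The element counts of the large arrays also factor into the problem dimensions from the Input Parameters section. The sketch below recomputes a few rows under three assumptions:

- Legendre order 4 gives (4+1)^2 = 25 moments.
- The problem has 96 directions, 1024 groups and 4096 zones.
- The labels in the comments are interpretations, not taken from the Kripke source.

```python
# Assumptions: fields are 8-byte doubles and "Megabytes" means 2**20 bytes.
BYTES_PER_ELEMENT = 8
MIB = 2 ** 20

# Problem dimensions from the Input Parameters section
directions = 96                      # Dummy S2 quadrature with 96 points
groups = 1024
zones = 16 * 16 * 16                 # 4096 zones
legendre_order = 4
moments = (legendre_order + 1) ** 2  # 25 angular moments for order 4


def megabytes(num_elements):
    """Size in MiB of num_elements double-precision values."""
    return num_elements * BYTES_PER_ELEMENT / MIB


# Assumed factorisations of a few rows of the memory table
fields = {
    "psi": directions * groups * zones,        # angular flux
    "phi": moments * groups * zones,           # flux moments
    "ell": moments * directions,               # moments x directions operator
    "sigt_zonal": groups * zones,              # total cross-section per zone and group
    "i_plane": directions * groups * 16 * 16,  # one 16 x 16 face of the mesh
}

for name, n in fields.items():
    print(f"{name:12s} {n:12d} {megabytes(n):10.3f}")
print(f"{'TOTAL':12s} {1110455664:12d} {megabytes(1110455664):10.3f}")
```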
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
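The `change` column is consistent with the relative change in particle count between successive iterations, (p_i - p_{i-1}) / p_i. The count before iteration 0 is taken as 0, so iteration 0 reports 1.0. This formula is inferred from the logged values, not from the Kripke source. The sketch below reproduces the column to within the rounding of the printed counts.

```python
# Particle counts printed for iterations 0-9 of the steady-state solve
counts = [1.197998e9, 1.801368e9, 2.102278e9, 2.251810e9, 2.325888e9,
          2.362467e9, 2.380471e9, 2.389305e9, 2.393627e9, 2.395735e9]


def relative_changes(values):
    """Relative change (p_i - p_{i-1}) / p_i, with the count before iteration 0 taken as 0."""
    changes = []
    previous = 0.0
    for p in values:
        changes.append((p - previous) / p)
        previous = p
    return changes


for i, c in enumerate(relative_changes(counts)):
    print(f"iter {i}: change={c:.6e}")
```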
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.02985
LPlusTimes                 10     17.01816
LTimes                     10     25.68544
Population                 10      1.67824
Scattering                 10   1025.60752
Solve                       1   1107.14039
Source                     10      0.04474
SweepSolver                10     36.24844
SweepSubdomain            160     18.97942
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.029850,17.018161,25.685436,1.678242,1025.607521,1107.140387,0.044739,36.248443,18.979417
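The `TIMER_NAMES` and `TIMER_DATA` lines carry the same timers as the table in a comma-separated form that is easy to post-process. The sketch below pairs them into one dictionary per run. `parse_timer_blocks` is a hypothetical helper, and it assumes each `TIMER_DATA` line follows its `TIMER_NAMES` line.

```python
def parse_timer_blocks(log_text):
    """Return one {timer: seconds} dict per TIMER_NAMES/TIMER_DATA pair in the log."""
    runs, names = [], None
    for line in log_text.splitlines():
        if line.startswith("TIMER_NAMES:"):
            names = line.split(":", 1)[1].split(",")
        elif line.startswith("TIMER_DATA:") and names is not None:
            values = [float(x) for x in line.split(":", 1)[1].split(",")]
            runs.append(dict(zip(names, values)))
            names = None  # wait for the next TIMER_NAMES line
    return runs


log = (
    "TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain\n"
    "TIMER_DATA:0.029850,17.018161,25.685436,1.678242,1025.607521,1107.140387,0.044739,36.248443,18.979417\n"
)

runs = parse_timer_blocks(log)
solve = runs[0]["Solve"]
print(f"Scattering share of Solve: {100 * runs[0]['Scattering'] / solve:.1f}%")
```

Feeding it the whole concatenated log would return one dictionary per run, in run order.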
Figures of Merit
================
Throughput: 3.636876e+06 [unknowns/(second/iteration)]
Grind time : 2.749613e-07 [(seconds/iteration)/unknowns]
Sweep efficiency : 52.35926 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
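The figures of merit can be recomputed from the timers:

- Throughput equals (unknowns x iterations) / `Solve` time.
- Grind time is the reciprocal of throughput.
- Sweep efficiency is the ratio given in brackets.

The first two relations are inferred from this run's numbers rather than from the Kripke source. `figures_of_merit` is a hypothetical helper.

```python
def figures_of_merit(unknowns, iterations, solve_s, sweep_subdomain_s, sweep_solver_s):
    """Recompute the three figures of merit (throughput formula inferred from the log)."""
    throughput = unknowns * iterations / solve_s  # unknowns / (second/iteration)
    grind_time = 1.0 / throughput                 # (seconds/iteration) / unknowns
    sweep_eff = 100.0 * sweep_subdomain_s / sweep_solver_s
    return throughput, grind_time, sweep_eff


# Values from the 1-thread run: 402653184 unknowns, 10 iterations, timers from TIMER_DATA
tp, grind, eff = figures_of_merit(402653184, 10, 1107.140387, 18.979417, 36.248443)
print(f"Throughput: {tp:e}")
print(f"Grind time : {grind:e}")
print(f"Sweep efficiency : {eff:.5f}")
```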
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9469)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9469) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9474)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9474) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0 #
########################################################################################################################################################################################################
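When several experiment paths are involved, the eight display commands in the table can be generated rather than copied. The sketch below only reproduces the flag combinations printed in the table (`-df`/`-dl` for the level, none/`-dn`/`-dp`/`-dt` for the report). It builds strings and does not invoke MAQAO. `lprof_commands` is a hypothetical helper.

```python
from itertools import product


def lprof_commands(xp):
    """Return (level, report, command) tuples matching the table for one experiment path."""
    levels = {"Functions": "-df", "Loops": "-dl"}
    reports = {"Cluster-wide": "", "Per-node": "-dn", "Per-process": "-dp", "Per-thread": "-dt"}
    commands = []
    for (level, level_flag), (report, report_flag) in product(levels.items(), reports.items()):
        flags = " ".join(f for f in (level_flag, report_flag) if f)
        commands.append((level, report, f"maqao lprof {flags} xp={xp}"))
    return commands


xp = ("/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/"
      "compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0")
for level, report, cmd in lprof_commands(xp):
    print(f"{level:9s} | {report:12s} | {cmd}")
```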
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9638)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9643)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 2 threads on rank 0
0-> 0 1-> 16
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable        Num Elements  Megabytes
--------------------  ------------  ---------
data/sigs                 15728640    120.000
dx                              16      0.000
dy                              16      0.000
dz                              16      0.000
ell                           2400      0.018
ell_plus                      2400      0.018
i_plane                   25165824    192.000
j_plane                   25165824    192.000
k_plane                   25165824    192.000
mixelem_to_fraction           4352      0.033
phi                      104857600    800.000
phi_out                  104857600    800.000
psi                      402653184   3072.000
quadrature/w                    96      0.001
quadrature/xcos                 96      0.001
quadrature/ycos                 96      0.001
quadrature/zcos                 96      0.001
rhs                      402653184   3072.000
sigt_zonal                 4194304     32.000
volume                        4096      0.031
--------------------  ------------  ---------
TOTAL                   1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.02002
LPlusTimes                 10     10.28881
LTimes                     10     14.09989
Population                 10      0.84334
Scattering                 10    515.73824
Solve                       1    564.03207
Source                     10      0.02224
SweepSolver                10     22.20197
SweepSubdomain            160      9.56857
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020022,10.288809,14.099891,0.843335,515.738238,564.032071,0.022243,22.201971,9.568570
Figures of Merit
================
Throughput: 7.138835e+06 [unknowns/(second/iteration)]
Grind time : 1.400789e-07 [(seconds/iteration)/unknowns]
Sweep efficiency : 43.09784 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
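Comparing this 2-thread run with the 1-thread run timer by timer shows uneven scaling:

- Scattering, about 92.6% of Solve at 1 thread, runs roughly 1.99x faster.
- SweepSolver improves only about 1.63x.

The sketch below uses a subset of timers copied from the two TIMER_DATA lines. `per_timer_speedup` is a hypothetical helper.

```python
one_thread = {"LPlusTimes": 17.018161, "LTimes": 25.685436, "Scattering": 1025.607521,
              "Solve": 1107.140387, "SweepSolver": 36.248443}
two_threads = {"LPlusTimes": 10.288809, "LTimes": 14.099891, "Scattering": 515.738238,
               "Solve": 564.032071, "SweepSolver": 22.201971}


def per_timer_speedup(baseline, candidate):
    """Ratio baseline/candidate for each timer present in both runs (>1 means faster)."""
    return {name: baseline[name] / candidate[name] for name in baseline if name in candidate}


# Print timers from best to worst scaling
for name, s in sorted(per_timer_speedup(one_thread, two_threads).items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {s:5.2f}x")
```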
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9643)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9643) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9638)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9638) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9757)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9762)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 4 threads on rank 0
0-> 0 1-> 8 2-> 16 3->120
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable        Num Elements  Megabytes
--------------------  ------------  ---------
data/sigs                 15728640    120.000
dx                              16      0.000
dy                              16      0.000
dz                              16      0.000
ell                           2400      0.018
ell_plus                      2400      0.018
i_plane                   25165824    192.000
j_plane                   25165824    192.000
k_plane                   25165824    192.000
mixelem_to_fraction           4352      0.033
phi                      104857600    800.000
phi_out                  104857600    800.000
psi                      402653184   3072.000
quadrature/w                    96      0.001
quadrature/xcos                 96      0.001
quadrature/ycos                 96      0.001
quadrature/zcos                 96      0.001
rhs                      402653184   3072.000
sigt_zonal                 4194304     32.000
volume                        4096      0.031
--------------------  ------------  ---------
TOTAL                   1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.01927
LPlusTimes                 10      6.20350
LTimes                     10      9.02198
Population                 10      3.00649
Scattering                 10    260.31054
Solve                       1    285.30179
Source                     10      0.01237
SweepSolver                10      5.92256
SweepSubdomain            160      5.47562
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.019270,6.203496,9.021980,3.006486,260.310541,285.301786,0.012374,5.922562,5.475624
Figures of Merit
================
Throughput: 1.411324e+07 [unknowns/(second/iteration)]
Grind time : 7.085546e-08 [(seconds/iteration)/unknowns]
Sweep efficiency : 92.45365 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9762)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9762) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9757)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9757) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9858)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9863)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 8 threads on rank 0
0-> 0 1-> 28 2-> 8 3-> 36 4-> 16 5-> 44 6->120 7->132
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable        Num Elements  Megabytes
--------------------  ------------  ---------
data/sigs                 15728640    120.000
dx                              16      0.000
dy                              16      0.000
dz                              16      0.000
ell                           2400      0.018
ell_plus                      2400      0.018
i_plane                   25165824    192.000
j_plane                   25165824    192.000
k_plane                   25165824    192.000
mixelem_to_fraction           4352      0.033
phi                      104857600    800.000
phi_out                  104857600    800.000
psi                      402653184   3072.000
quadrature/w                    96      0.001
quadrature/xcos                 96      0.001
quadrature/ycos                 96      0.001
quadrature/zcos                 96      0.001
rhs                      402653184   3072.000
sigt_zonal                 4194304     32.000
volume                        4096      0.031
--------------------  ------------  ---------
TOTAL                   1110455664   8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.01960
LPlusTimes                 10      3.21406
LTimes                     10      4.63567
Population                 10      0.21309
Scattering                 10    128.72984
Solve                       1    142.41169
Source                     10      0.00637
SweepSolver                10      4.83026
SweepSubdomain            160      2.81619
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.019598,3.214059,4.635671,0.213094,128.729837,142.411694,0.006365,4.830262,2.816186
Figures of Merit
================
Throughput: 2.827388e+07 [unknowns/(second/iteration)]
Grind time : 3.536833e-08 [(seconds/iteration)/unknowns]
Sweep efficiency : 58.30296 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9858)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9858) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9863)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9863) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9968)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9973)
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 16 threads on rank 0
0-> 0 1-> 6 2-> 28 3-> 50 4-> 8 5-> 14 6-> 36 7-> 58
8-> 16 9-> 22 10-> 44 11-> 66 12->120 13->126 14->132 15->138
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:   Procs:      Subdomains (local/global):
---------------------  ----------  --------------------------
(P) Energy:            1           2 / 2
(Q) Direction:         1           8 / 8
(R) Space:             2           1 / 2
(Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
(PQR) TOTAL:           2           16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable        Num Elements  Megabytes
--------------------  ------------  ---------
data/sigs                 15728640    120.000
dx                              16      0.000
dy                              16      0.000
dz                              16      0.000
ell                           2400      0.018
ell_plus                      2400      0.018
i_plane                   25165824    192.000
j_plane                   25165824    192.000
k_plane                   25165824    192.000
mixelem_to_fraction           4352      0.033
phi                      104857600    800.000
phi_out                  104857600    800.000
psi                      402653184   3072.000
quadrature/w                    96      0.001
quadrature/xcos                 96      0.001
quadrature/ycos                 96      0.001
quadrature/zcos                 96      0.001
rhs                      402653184   3072.000
sigt_zonal                 4194304     32.000
volume                        4096      0.031
--------------------  ------------  ---------
TOTAL                   1110455664   8472.104
Generation Complete!
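The field sizes in the memory breakdown follow from the input parameters. The sketch below re-derives a few rows under assumptions inferred from the printed counts, not taken from Kripke's source: every element is an 8-byte double, "Megabytes" means MiB (2^20 bytes), Legendre order L gives (L+1)^2 flux moments, and data/sigs is laid out as material x Legendre order x group x group. Note that the psi count (zones x groups x directions) equals the reported "Number of unknowns".

```python
# Sketch: recompute selected rows of the "Memory breakdown of Field variables"
# table from the printed input parameters.
# Assumptions (inferred, not from Kripke's source): 8-byte elements,
# "Megabytes" = MiB, (order + 1)**2 moments, sigs = mat x legendre x g x g.

ZONES = 16 * 16 * 16      # "Zones: 16 x 16 x 16 (4096 total)"
GROUPS = 1024             # "Groups: 1024"
DIRECTIONS = 96           # "Quadrature Set: Dummy S2 with 96 points"
LEGENDRE_ORDER = 4        # "Legendre Order: 4"
NUM_MATERIALS = 3         # three entries in sigt=[...] and sigs=[...]
BYTES_PER_ELEMENT = 8     # assumed double precision


def num_moments(order: int) -> int:
    """Number of flux moments for a Legendre order (assumed (L+1)^2)."""
    return (order + 1) ** 2


def megabytes(num_elements: int) -> float:
    """Convert an element count to MiB, assuming 8-byte elements."""
    return num_elements * BYTES_PER_ELEMENT / 2**20


expected_counts = {
    "psi": ZONES * GROUPS * DIRECTIONS,                   # angular flux
    "phi": ZONES * GROUPS * num_moments(LEGENDRE_ORDER),  # flux moments
    "ell": num_moments(LEGENDRE_ORDER) * DIRECTIONS,
    "sigt_zonal": ZONES * GROUPS,
    "data/sigs": NUM_MATERIALS * (LEGENDRE_ORDER + 1) * GROUPS * GROUPS,
}

if __name__ == "__main__":
    for name, count in expected_counts.items():
        print(f"{name:<12} {count:>12} {megabytes(count):>10.3f}")
```

Each derived count and size matches the corresponding row of the table, which supports (but does not prove) the assumed layouts.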
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
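The "change" column is consistent with the relative change in particle count between successive iterations, change_i = (count_i - count_(i-1)) / count_i, where iteration 0 reports 1 because the previous count is taken as 0. This formula is inferred from the printed numbers rather than from Kripke's source; the sketch below checks it against this run's values.

```python
# Sketch: reproduce the "change" column of the Steady State Solve output.
# Assumption (inferred from the printed numbers, not from Kripke's source):
# change_i = |count_i - count_{i-1}| / count_i, with count_{-1} = 0.

particle_counts = [
    1.197998e09, 1.801368e09, 2.102278e09, 2.251810e09, 2.325888e09,
    2.362467e09, 2.380471e09, 2.389305e09, 2.393627e09, 2.395735e09,
]


def relative_changes(counts):
    """Relative change of each iterate with respect to the previous one."""
    changes = []
    previous = 0.0
    for count in counts:
        changes.append(abs(count - previous) / count)
        previous = count
    return changes


if __name__ == "__main__":
    for i, change in enumerate(relative_changes(particle_counts)):
        print(f"iter {i}: change={change:.6e}")
```

Because the particle counts are printed with only 7 significant digits, the recomputed values agree with the logged ones to roughly 3 to 4 digits, not exactly.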
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.02015
LPlusTimes                 10      2.32961
LTimes                     10      2.86083
Population                 10      0.47778
Scattering                 10     64.92937
Solve                       1     73.71058
Source                     10      0.00342
SweepSolver                10      2.34543
SweepSubdomain            160      1.44526
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020146,2.329613,2.860828,0.477778,64.929371,73.710582,0.003419,2.345435,1.445257
Figures of Merit
================
Throughput: 5.462624e+07 [unknowns/(second/iteration)]
Grind time : 1.830622e-08 [(seconds/iteration)/unknowns]
Sweep efficiency : 61.62002 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
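The three Figures of Merit can be recomputed from the TIMER_NAMES/TIMER_DATA lines. This is a minimal sketch assuming throughput = unknowns / (Solve time / iterations) and grind time = 1 / throughput, which reproduces the printed values for this 16-thread run; sweep efficiency uses the formula printed in brackets. The helper names parse_timers and figures_of_merit are hypothetical.

```python
# Sketch: recompute Kripke's Figures of Merit from TIMER_NAMES / TIMER_DATA.
# Assumption: "Solve" is the total solve time used for throughput; this
# matches the printed numbers but is not confirmed from Kripke's source.


def parse_timers(names_line: str, data_line: str) -> dict:
    """Pair the comma-separated TIMER_NAMES with the TIMER_DATA values."""
    names = names_line.split(":", 1)[1].split(",")
    values = [float(v) for v in data_line.split(":", 1)[1].split(",")]
    return dict(zip(names, values))


def figures_of_merit(timers: dict, unknowns: int, iterations: int) -> dict:
    seconds_per_iteration = timers["Solve"] / iterations
    throughput = unknowns / seconds_per_iteration
    return {
        "throughput": throughput,        # unknowns/(second/iteration)
        "grind_time": 1.0 / throughput,  # (seconds/iteration)/unknowns
        # 100.0 * SweepSubdomain time / SweepSolver time
        "sweep_efficiency": 100.0 * timers["SweepSubdomain"] / timers["SweepSolver"],
    }


NAMES = ("TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,"
         "Solve,Source,SweepSolver,SweepSubdomain")
DATA = ("TIMER_DATA:0.020146,2.329613,2.860828,0.477778,64.929371,"
        "73.710582,0.003419,2.345435,1.445257")

if __name__ == "__main__":
    fom = figures_of_merit(parse_timers(NAMES, DATA), unknowns=402653184, iterations=10)
    for key, value in fom.items():
        print(f"{key}: {value:.6e}")
```

The same function applies unchanged to the TIMER_DATA lines of the 32-, 64- and 96-thread runs.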
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9973)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9973) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9968)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9968) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10110)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10115)
Kripke Version 1.2.4
LLNL-CODE-775068
Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC
Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license
This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.
Author: Adam J. Kunen
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 32 threads on rank 0
0-> 0 1-> 3 2-> 6 3-> 25 4-> 28 5-> 31 6-> 50 7-> 53
8-> 8 9-> 11 10-> 14 11-> 33 12-> 36 13-> 39 14-> 58 15-> 61
16-> 16 17-> 19 18-> 22 19-> 41 20-> 44 21-> 47 22-> 66 23-> 69
24->120 25->123 26->126 27->129 28->132 29->135 30->138 31->141
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:  Procs:     Subdomains (local/global):
--------------------- ---------- --------------------------
(P) Energy:           1          2 / 2
(Q) Direction:        1          8 / 8
(R) Space:            2          1 / 2
(Rx,Ry,Rz) R in XYZ:  2x1x1      1x1x1 / 2x1x1
(PQR) TOTAL:          2          16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements   Megabytes
--------------       ------------   ---------
data/sigs                15728640     120.000
dx                             16       0.000
dy                             16       0.000
dz                             16       0.000
ell                          2400       0.018
ell_plus                     2400       0.018
i_plane                  25165824     192.000
j_plane                  25165824     192.000
k_plane                  25165824     192.000
mixelem_to_fraction          4352       0.033
phi                     104857600     800.000
phi_out                 104857600     800.000
psi                     402653184    3072.000
quadrature/w                   96       0.001
quadrature/xcos                96       0.001
quadrature/ycos                96       0.001
quadrature/zcos                96       0.001
rhs                     402653184    3072.000
sigt_zonal                4194304      32.000
volume                       4096       0.031
--------             ------------   ---------
TOTAL                  1110455664    8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.02008
LPlusTimes                 10      1.95204
LTimes                     10      2.38741
Population                 10      0.15799
Scattering                 10     32.69943
Solve                       1     39.76690
Source                     10      0.00187
SweepSolver                10      1.82181
SweepSubdomain            160      0.77328
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.020076,1.952037,2.387410,0.157993,32.699428,39.766902,0.001870,1.821806,0.773277
Figures of Merit
================
Throughput: 1.012533e+08 [unknowns/(second/iteration)]
Grind time : 9.876217e-09 [(seconds/iteration)/unknowns]
Sweep efficiency : 42.44560 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10110)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10110) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10115)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10115) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10292)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10297)
Kripke Version 1.2.4
LLNL-CODE-775068
Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC
Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license
This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.
Author: Adam J. Kunen
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 64 threads on rank 0
0-> 0 1->193 2-> 3 3->196 4-> 6 5->199 6-> 25 7->218
8-> 28 9->221 10-> 31 11->240 12-> 50 13->243 14-> 53 15->246
16-> 8 17->201 18-> 11 19->204 20-> 14 21->207 22-> 33 23->226
24-> 36 25->229 26-> 39 27->248 28-> 58 29->251 30-> 61 31->254
32-> 16 33->209 34-> 19 35->212 36-> 22 37->215 38-> 41 39->234
40-> 44 41->237 42-> 47 43->256 44-> 66 45->259 46-> 69 47->262
48->120 49->313 50->123 51->316 52->126 53->319 54->129 55->322
56->132 57->325 58->135 59->328 60->138 61->331 62->141 63->334
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:  Procs:     Subdomains (local/global):
--------------------- ---------- --------------------------
(P) Energy:           1          2 / 2
(Q) Direction:        1          8 / 8
(R) Space:            2          1 / 2
(Rx,Ry,Rz) R in XYZ:  2x1x1      1x1x1 / 2x1x1
(PQR) TOTAL:          2          16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements   Megabytes
--------------       ------------   ---------
data/sigs                15728640     120.000
dx                             16       0.000
dy                             16       0.000
dz                             16       0.000
ell                          2400       0.018
ell_plus                     2400       0.018
i_plane                  25165824     192.000
j_plane                  25165824     192.000
k_plane                  25165824     192.000
mixelem_to_fraction          4352       0.033
phi                     104857600     800.000
phi_out                 104857600     800.000
psi                     402653184    3072.000
quadrature/w                   96       0.001
quadrature/xcos                96       0.001
quadrature/ycos                96       0.001
quadrature/zcos                96       0.001
rhs                     402653184    3072.000
sigt_zonal                4194304      32.000
volume                       4096       0.031
--------             ------------   ---------
TOTAL                  1110455664    8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.01959
LPlusTimes                 10      2.24294
LTimes                     10      3.50936
Population                 10      0.08517
Scattering                 10     16.91393
Solve                       1     29.32880
Source                     10      0.00119
SweepSolver                10      5.80821
SweepSubdomain            160      0.58777
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.019594,2.242945,3.509356,0.085166,16.913929,29.328797,0.001191,5.808214,0.587772
Figures of Merit
================
Throughput: 1.372894e+08 [unknowns/(second/iteration)]
Grind time : 7.283886e-09 [(seconds/iteration)/unknowns]
Sweep efficiency : 10.11966 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10297)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10297) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10292)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10292) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10621)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10627)
Kripke Version 1.2.4
LLNL-CODE-775068
Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC
Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license
This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.
Author: Adam J. Kunen
Compilation Options:
Architecture: OpenMP
Compiler: /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
Compiler Flags: "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++ -Wall -Wextra "
Linker Flags: " "
CHAI Enabled: No
CUDA Enabled: No
MPI Enabled: Yes
OpenMP Enabled: Yes
Caliper Enabled: No
OpenMP Thread->Core mapping for 96 threads on rank 0
0-> 0 1-> 1 2-> 2 3-> 3 4-> 4 5-> 5 6-> 6 7-> 7
8-> 24 9-> 25 10-> 26 11-> 27 12-> 28 13-> 29 14-> 30 15-> 31
16-> 48 17-> 49 18-> 50 19-> 51 20-> 52 21-> 53 22-> 54 23-> 55
24-> 8 25-> 9 26-> 10 27-> 11 28-> 12 29-> 13 30-> 14 31-> 15
32-> 32 33-> 33 34-> 34 35-> 35 36-> 36 37-> 37 38-> 38 39-> 39
40-> 56 41-> 57 42-> 58 43-> 59 44-> 60 45-> 61 46-> 62 47-> 63
48-> 16 49-> 17 50-> 18 51-> 19 52-> 20 53-> 21 54-> 22 55-> 23
56-> 40 57-> 41 58-> 42 59-> 43 60-> 44 61-> 45 62-> 46 63-> 47
64-> 64 65-> 65 66-> 66 67-> 67 68-> 68 69-> 69 70-> 70 71-> 71
72->120 73->121 74->122 75->123 76->124 77->125 78->126 79->127
80->128 81->129 82->130 83->131 84->132 85->133 86->134 87->135
88->136 89->137 90->138 91->139 92->140 93->141 94->142 95->143
Input Parameters
================
Problem Size:
Zones: 16 x 16 x 16 (4096 total)
Groups: 1024
Legendre Order: 4
Quadrature Set: Dummy S2 with 96 points
Physical Properties:
Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]
Solver Options:
Number iterations: 10
MPI Decomposition Options:
Total MPI tasks: 2
Spatial decomp: 2 x 1 x 1 MPI tasks
Block solve method: Sweep
Per-Task Options:
DirSets/Directions: 8 sets, 12 directions/set
GroupSet/Groups: 2 sets, 512 groups/set
Zone Sets: 1 x 1 x 1
Architecture: OpenMP
Data Layout: DGZ
Generating Problem
==================
Decomposition Space:  Procs:     Subdomains (local/global):
--------------------- ---------- --------------------------
(P) Energy:           1          2 / 2
(Q) Direction:        1          8 / 8
(R) Space:            2          1 / 2
(Rx,Ry,Rz) R in XYZ:  2x1x1      1x1x1 / 2x1x1
(PQR) TOTAL:          2          16 / 32
Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]
Memory breakdown of Field variables:
Field Variable       Num Elements   Megabytes
--------------       ------------   ---------
data/sigs                15728640     120.000
dx                             16       0.000
dy                             16       0.000
dz                             16       0.000
ell                          2400       0.018
ell_plus                     2400       0.018
i_plane                  25165824     192.000
j_plane                  25165824     192.000
k_plane                  25165824     192.000
mixelem_to_fraction          4352       0.033
phi                     104857600     800.000
phi_out                 104857600     800.000
psi                     402653184    3072.000
quadrature/w                   96       0.001
quadrature/xcos                96       0.001
quadrature/ycos                96       0.001
quadrature/zcos                96       0.001
rhs                     402653184    3072.000
sigt_zonal                4194304      32.000
volume                       4096       0.031
--------             ------------   ---------
TOTAL                  1110455664    8472.104
Generation Complete!
Steady State Solve
==================
iter 0: particle count=1.197998e+09, change=1.000000e+00
iter 1: particle count=1.801368e+09, change=3.349511e-01
iter 2: particle count=2.102278e+09, change=1.431351e-01
iter 3: particle count=2.251810e+09, change=6.640521e-02
iter 4: particle count=2.325888e+09, change=3.184924e-02
iter 5: particle count=2.362467e+09, change=1.548355e-02
iter 6: particle count=2.380471e+09, change=7.563193e-03
iter 7: particle count=2.389305e+09, change=3.697158e-03
iter 8: particle count=2.393627e+09, change=1.805479e-03
iter 9: particle count=2.395735e+09, change=8.801810e-04
Solver terminated
Timers
======
Timer                   Count      Seconds
---------------- ------------ ------------
Generate                    1      0.01932
LPlusTimes                 10      2.24593
LTimes                     10      2.67905
Population                 10      0.05873
Scattering                 10     12.88735
Solve                       1     26.16114
Source                     10      0.00083
SweepSolver                10      7.48809
SweepSubdomain            160      0.53343
TIMER_NAMES:Generate,LPlusTimes,LTimes,Population,Scattering,Solve,Source,SweepSolver,SweepSubdomain
TIMER_DATA:0.019320,2.245928,2.679050,0.058726,12.887350,26.161144,0.000831,7.488086,0.533435
Figures of Merit
================
Throughput: 1.539127e+08 [unknowns/(second/iteration)]
Grind time : 6.497190e-09 [(seconds/iteration)/unknowns]
Sweep efficiency : 7.12378 [100.0 * SweepSubdomain time / SweepSolver time]
Number of unknowns: 402653184
END
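Taken together, the four runs in this log form a strong-scaling series: the same problem on 2 MPI ranks with 16, 32, 64 and 96 OpenMP threads per rank. The sketch below tabulates speedup and parallel efficiency of the Solve timer relative to the 16-thread run. The choice of baseline is an assumption (the log does not define one), and the timer values are copied from the four Timers tables.

```python
# Sketch: strong-scaling summary of the four Kripke runs in this log.
# Values are copied from each run's "Solve" and "Scattering" timers.
# Assumption: speedup/efficiency are measured against the 16-thread run.

runs = {
    # threads per rank: (Solve seconds, Scattering seconds)
    16: (73.710582, 64.929371),
    32: (39.766902, 32.699428),
    64: (29.328797, 16.913929),
    96: (26.161144, 12.887350),
}


def scaling_table(data: dict) -> list:
    """Return (threads, solve, speedup, efficiency, scattering share) rows."""
    base_threads = min(data)
    base_solve = data[base_threads][0]
    rows = []
    for threads in sorted(data):
        solve, scattering = data[threads]
        speedup = base_solve / solve
        efficiency = speedup / (threads / base_threads)
        rows.append((threads, solve, speedup, efficiency, scattering / solve))
    return rows


if __name__ == "__main__":
    print("threads  solve[s]  speedup  efficiency  scattering share")
    for threads, solve, speedup, eff, share in scaling_table(runs):
        print(f"{threads:7d} {solve:9.3f} {speedup:8.2f} {eff:10.2f} {share:16.1%}")
```

With these numbers the speedup over 16 threads is about 1.85x at 32 threads, 2.51x at 64 and 2.82x at 96. Scattering time keeps shrinking (64.9 s to 12.9 s), while SweepSolver time rises from 2.35 s (16 threads) to 7.49 s (96 threads) and the reported sweep efficiency falls from 61.6 to 7.1, so the sweep accounts for a growing share of the remaining solve time.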
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10621)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10621) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10627)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10627) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7 #
########################################################################################################################################################################################################