* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 19129)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
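The memory lines above follow directly from the array size and the element width. The minimal Python sketch below reproduces them. It assumes STREAM's three arrays (a, b, c) of N elements each, with MiB and GiB taken as powers of 1024. The helper name footprint is hypothetical.

```python
# Sketch: reproduce STREAM's memory-footprint lines from the array size.
# Assumption: three arrays (a, b, c) of N elements each, binary MiB/GiB units.
N = 860_160_000          # "Array size" from the log
BYTES_PER_WORD = 8       # "8 bytes per array element"
NUM_ARRAYS = 3           # STREAM allocates a, b and c

def footprint(n: int, bytes_per_word: int = BYTES_PER_WORD):
    """Return (per-array MiB, per-array GiB, total MiB, total GiB)."""
    per_array = n * bytes_per_word
    total = NUM_ARRAYS * per_array
    mib = 1024.0 ** 2
    gib = 1024.0 ** 3
    return per_array / mib, per_array / gib, total / mib, total / gib

per_mib, per_gib, tot_mib, tot_gib = footprint(N)
print(f"Memory per array = {per_mib:.1f} MiB (= {per_gib:.1f} GiB).")
print(f"Total memory required = {tot_mib:.1f} MiB (= {tot_gib:.1f} GiB).")
```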
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 507839 microseconds.
(= 507839 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
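The timer guideline reduces to one calculation: divide the estimated duration of one test by the clock granularity, and the result should be at least 20 ticks. The minimal sketch below assumes integer microsecond inputs. STREAM itself truncates a double-precision estimate. The helper names clock_ticks and timer_ok are hypothetical.

```python
# Sketch of STREAM's timer sanity check: estimated test time divided by the
# clock granularity should give at least 20 ticks (threshold quoted in the log).
MIN_TICKS = 20

def clock_ticks(estimate_us: int, granularity_us: int = 1) -> int:
    """Number of clock ticks one test is expected to span."""
    return estimate_us // granularity_us

def timer_ok(estimate_us: int, granularity_us: int = 1) -> bool:
    """True if the test is long enough relative to the timer resolution."""
    return clock_ticks(estimate_us, granularity_us) >= MIN_TICKS

# Per-test estimates printed for the 1- and 112-thread runs in this log
for threads, est in [(1, 507839), (112, 29622)]:
    print(threads, clock_ticks(est), timer_ok(est))
```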
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           12405.0     1.110495     1.109440     1.110877
Scale:          12453.0     1.106691     1.105160     1.108153
Add:            15573.5     1.326685     1.325576     1.327270
Triad:          15566.9     1.327069     1.326140     1.327535
-------------------------------------------------------------
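Best Rate MB/s is the number of bytes a kernel moves divided by its best time, with MB meaning 10^6 bytes. The sketch below assumes the per-kernel traffic that STREAM counts: 2 arrays for Copy and Scale, and 3 for Add and Triad. It uses the Min time column of the 1-thread run. The names best_time and best_rate_mb_s are illustrative only.

```python
# Sketch: derive "Best Rate MB/s" from the best (minimum) kernel time.
# Assumption: Copy/Scale touch 2 arrays per element, Add/Triad touch 3.
N = 860_160_000
WORD = 8
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def best_time(samples):
    """Minimum time, ignoring the first (warm-up) iteration as the log states."""
    if len(samples) < 2:
        raise ValueError("need at least two timing samples")
    return min(samples[1:])

def best_rate_mb_s(kernel: str, min_time_s: float, n: int = N) -> float:
    """Bytes moved by one kernel pass divided by its best time, in 10**6 B/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD * n
    return 1.0e-6 * bytes_moved / min_time_s

# Min times of the 1-thread run above
min_times = {"Copy": 1.109440, "Scale": 1.105160, "Add": 1.325576, "Triad": 1.326140}
for k, t in min_times.items():
    print(f"{k:6s} {best_rate_mb_s(k, t):10.1f}")
```

Small differences in the last digit are possible because the printed times are already rounded to microseconds.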
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
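The validation line compares the arrays against values obtained by replaying the same four kernels on scalars. The code below is a simplified pure-Python re-creation of that idea, not STREAM's own routine. It assumes STREAM's initial values (a = 1, b = 2, c = 0), scalar = 3, and the extra a = 2a pass made during the timer check. It also uses a small hypothetical array length so it runs quickly.

```python
# Sketch of STREAM-style validation: expected values come from replaying the
# kernels on scalars; arrays pass if the relative average error is <= 1e-13.
NTIMES = 100          # "Each kernel will be executed 100 times"
SCALAR = 3.0
EPSILON = 1.0e-13     # threshold for 8-byte elements, as in the log

def expected_values(ntimes=NTIMES):
    a, b, c = 1.0, 2.0, 0.0
    a = 2.0 * a                   # pass made during the timer-granularity check
    for _ in range(ntimes):
        c = a                     # Copy
        b = SCALAR * c            # Scale
        c = a + b                 # Add
        a = b + SCALAR * c        # Triad
    return a, b, c

def run_kernels(n, ntimes=NTIMES):
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    a = [2.0 * x for x in a]
    for _ in range(ntimes):
        c = a[:]
        b = [SCALAR * x for x in c]
        c = [x + y for x, y in zip(a, b)]
        a = [x + SCALAR * y for x, y in zip(b, c)]
    return a, b, c

def validates(arrays, expected, eps=EPSILON):
    """Check each array's average absolute error relative to its expected value."""
    for arr, ref in zip(arrays, expected):
        avg_err = sum(abs(x - ref) for x in arr) / len(arr)
        if abs(avg_err / ref) > eps:
            return False
    return True

print(validates(run_kernels(1000), expected_values()))
```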
* Info: Process finished (host o401, process 19129)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0 #
########################################################################################################################################################################################################
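Each run writes to its own lprof_npsu_run_<k> directory, and the table lists the same eight display commands for every run. The sketch below regenerates them for any run index, assuming the directories are numbered 0 to 7 as in this log. It only builds and prints the command lines shown in the table and does not invoke MAQAO. The helper name lprof_commands is hypothetical.

```python
# Sketch: rebuild the lprof display commands for one run directory.
# Only the flags listed in the log's table are used; MAQAO is not executed.
import shlex

BASE = ("/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/"
        "oneview_runs/compilers/icx_14/oneview_results_scal/tools")
LEVELS = {"Functions": "-df", "Loops": "-dl"}
REPORTS = {"Cluster-wide": [], "Per-node": ["-dn"],
           "Per-process": ["-dp"], "Per-thread": ["-dt"]}

def lprof_commands(run: int):
    """Return (level, report, argv) triples in the same order as the table."""
    xp = f"{BASE}/lprof_npsu_run_{run}"
    cmds = []
    for level, flag in LEVELS.items():
        for report, extra in REPORTS.items():
            argv = ["maqao", "lprof", flag, *extra, f"xp={xp}"]
            cmds.append((level, report, argv))
    return cmds

for level, report, argv in lprof_commands(0):
    print(f"{level:9s} | {report:12s} | {shlex.join(argv)}")
```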
* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 19472)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 247635 microseconds.
(= 247635 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           24791.8     0.555340     0.555125     0.555423
Scale:          24928.1     0.552842     0.552091     0.553257
Add:            31131.9     0.663422     0.663108     0.664081
Triad:          31099.7     0.664171     0.663796     0.665541
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host o401, process 19472)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 19781)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 124086 microseconds.
(= 124086 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           43803.6     0.314404     0.314188     0.314514
Scale:          44356.0     0.310370     0.310275     0.310449
Add:            58841.2     0.351077     0.350840     0.351293
Triad:          58627.0     0.352319     0.352122     0.352830
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host o401, process 19781)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 20096)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 61951 microseconds.
(= 61951 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           87390.2     0.157621     0.157484     0.157692
Scale:          88554.8     0.155548     0.155413     0.155654
Add:           117553.7     0.175733     0.175612     0.175942
Triad:         117176.7     0.176291     0.176177     0.176550
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host o401, process 20096)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 20420)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Number of Threads counted = 16
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 37042 microseconds.
(= 37042 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          172731.6     0.079749     0.079676     0.079787
Scale:         174392.8     0.078972     0.078917     0.079004
Add:           229210.5     0.090152     0.090065     0.090194
Triad:         228877.6     0.090249     0.090196     0.090327
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host o401, process 20420)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 20764)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 31038 microseconds.
(= 31038 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          318452.5     0.043329     0.043217     0.043913
Scale:         321217.4     0.042998     0.042845     0.043980
Add:           407304.9     0.050820     0.050684     0.051470
Triad:         405903.4     0.050968     0.050859     0.051554
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host o401, process 20764)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 21153)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 64
Number of Threads counted = 64
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 29479 microseconds.
(= 29479 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          448876.7     0.030729     0.030660     0.030734
Scale:         449683.4     0.030760     0.030605     0.030977
Add:           496843.3     0.041696     0.041550     0.041712
Triad:         496222.3     0.041657     0.041602     0.041704
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host o401, process 21153)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-low-ppn' engine for node o401
* Info: Process launched (host o401, process 21642)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 112
Number of Threads counted = 112
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 29622 microseconds.
(= 29622 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          463323.5     0.029909     0.029704     0.029907
Scale:         462405.0     0.029908     0.029763     0.029962
Add:           486802.7     0.042497     0.042407     0.042546
Triad:         487147.3     0.042461     0.042377     0.042512
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host o401, process 21642)
Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7 #
########################################################################################################################################################################################################
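Collecting the Triad Best Rate from each run gives the thread-scaling picture for this sweep. The small sketch below computes speedup and parallel efficiency relative to the 1-thread run. It uses only the values printed above and takes GB/s as 1000 MB/s.

```python
# Sketch: thread-scaling summary from the Triad "Best Rate MB/s" values above.
threads = [1, 2, 4, 8, 16, 32, 64, 112]
triad_mb_s = [15566.9, 31099.7, 58627.0, 117176.7,
              228877.6, 405903.4, 496222.3, 487147.3]

base = triad_mb_s[0]
for t, bw in zip(threads, triad_mb_s):
    speedup = bw / base            # relative to the 1-thread run
    efficiency = speedup / t       # 1.0 would be perfect linear scaling
    print(f"{t:4d} threads: {bw / 1000:7.1f} GB/s  "
          f"speedup {speedup:5.2f}  efficiency {efficiency:4.2f}")

peak_bw, peak_threads = max(zip(triad_mb_s, threads))
print(f"peak Triad bandwidth: {peak_bw:.1f} MB/s at {peak_threads} threads")
```

With these numbers, bandwidth roughly doubles with each doubling of the thread count up to 16 threads. It peaks at 64 threads and dips slightly at 112, which is consistent with memory-bandwidth saturation.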