* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6570)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 398017 microseconds.
(= 398017 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           24556.9     0.561044     0.560436     0.562432
Scale:          24374.1     0.565099     0.564638     0.566748
Add:            21307.3     0.969510     0.968862     0.970908
Triad:          21332.1     0.968443     0.967734     0.970362
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6570)
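
The memory figures and the "Best Rate MB/s" column above can be cross-checked from the array size and the minimum times. The following is a minimal sketch, assuming STREAM's usual byte-counting convention (Copy and Scale touch two arrays per element, Add and Triad touch three) and MB = 10^6 bytes. The function and constant names are hypothetical and are not part of STREAM or MAQAO.

```python
# Sketch: recompute STREAM's reported figures from values printed in the log.
# Assumption: STREAM counts 2 arrays for Copy/Scale and 3 for Add/Triad,
# and reports rates in MB/s with 1 MB = 1e6 bytes.

ARRAY_ELEMENTS = 860_160_000   # "Array size" from the log
BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."

# Number of arrays each kernel reads or writes per element (assumed convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def memory_per_array_mib(elements=ARRAY_ELEMENTS, elem_bytes=BYTES_PER_ELEMENT):
    """Size of one array in MiB (2**20 bytes)."""
    return elements * elem_bytes / 2**20


def best_rate_mb_s(kernel, min_time_s,
                   elements=ARRAY_ELEMENTS, elem_bytes=BYTES_PER_ELEMENT):
    """Bandwidth in MB/s derived from the kernel's minimum time in seconds."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * elem_bytes * elements
    return bytes_moved / min_time_s / 1e6


# Minimum times of the 1-thread run, copied from the table above.
min_times_1_thread = {
    "Copy": 0.560436,
    "Scale": 0.564638,
    "Add": 0.968862,
    "Triad": 0.967734,
}

print(f"Memory per array = {memory_per_array_mib():.1f} MiB")
print(f"Total memory     = {3 * memory_per_array_mib():.1f} MiB")
for kernel, t_min in min_times_1_thread.items():
    print(f"{kernel:6s} {best_rate_mb_s(kernel, t_min):12.1f} MB/s")
```

Because the log prints times with six decimals, the recomputed rates can differ from the reported ones in the last digit, and the discrepancy grows for the short runs at high thread counts.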
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
#####################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6649)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 198987 microseconds.
(= 198987 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           50695.3     0.271878     0.271476     0.274315
Scale:          50274.4     0.273947     0.273749     0.275223
Add:            43847.0     0.471054     0.470815     0.471904
Triad:          43905.4     0.470539     0.470189     0.471637
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6649)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
#####################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6717)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 99206 microseconds.
(= 99206 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           98356.7     0.152490     0.139925     0.153822
Scale:         100801.8     0.153567     0.136531     0.154524
Add:            87302.6     0.241355     0.236463     0.242316
Triad:          87137.2     0.241137     0.236912     0.242346
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6717)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
#####################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6787)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 49523 microseconds.
(= 49523 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          184184.0     0.077286     0.074722     0.077533
Scale:         197142.7     0.077375     0.069810     0.077673
Add:           171591.7     0.123845     0.120308     0.124316
Triad:         171631.5     0.123843     0.120280     0.124527
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6787)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
#####################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6840)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Number of Threads counted = 16
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 25373 microseconds.
(= 25373 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          395147.7     0.035052     0.034829     0.038790
Scale:         394084.8     0.035164     0.034923     0.039180
Add:           344592.9     0.060084     0.059908     0.061236
Triad:         344506.5     0.060062     0.059923     0.061386
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6840)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
#####################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6928)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14379 microseconds.
(= 14379 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          760031.1     0.018979     0.018108     0.020222
Scale:         745860.2     0.019161     0.018452     0.020496
Add:           653182.6     0.032596     0.031605     0.033208
Triad:         652193.7     0.032587     0.031653     0.033143
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6928)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
#####################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 7027)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 10379 microseconds.
(= 10379 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          945030.6     0.014783     0.014563     0.015597
Scale:         939738.2     0.014860     0.014645     0.015532
Add:           867678.9     0.024465     0.023792     0.025085
Triad:         868627.6     0.024434     0.023766     0.025167
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 7027)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
#####################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 7159)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 96
Number of Threads counted = 96
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8213 microseconds.
(= 8213 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:         1463785.0     0.010153     0.009402     0.011155
Scale:        1502495.1     0.010120     0.009160     0.011241
Add:          1390144.5     0.015846     0.014850     0.043723
Triad:        1366149.3     0.015753     0.015111     0.017295
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 7159)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7
To display your profiling results:
#####################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
#####################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
#####################################################################################################################################################################################################
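
The eight runs above differ only in thread count, so their Triad rates can be condensed into a scaling summary. The following is a minimal sketch: the rates are copied from the "Best Rate MB/s" column of each run, while the "speedup" and "efficiency" metrics (both relative to the 1-thread run) and all names in the code are assumptions added for illustration, not values reported by STREAM or MAQAO.

```python
# Sketch: bandwidth scaling of the Triad kernel across the eight runs.
# Keys are thread counts, values are Triad "Best Rate MB/s" from the log.
TRIAD_BEST_RATE_MB_S = {
    1: 21332.1,
    2: 43905.4,
    4: 87137.2,
    8: 171631.5,
    16: 344506.5,
    32: 652193.7,
    48: 868627.6,
    96: 1366149.3,
}


def scaling_table(rates):
    """Return (threads, speedup, efficiency) rows relative to the smallest run.

    speedup    = rate(threads) / rate(baseline)
    efficiency = speedup / (threads / baseline_threads)
    """
    base_threads = min(rates)
    base_rate = rates[base_threads]
    rows = []
    for threads in sorted(rates):
        speedup = rates[threads] / base_rate
        efficiency = speedup / (threads / base_threads)
        rows.append((threads, speedup, efficiency))
    return rows


print(f"{'Threads':>7s} {'Speedup':>9s} {'Efficiency':>11s}")
for threads, speedup, efficiency in scaling_table(TRIAD_BEST_RATE_MB_S):
    print(f"{threads:7d} {speedup:9.2f} {efficiency:11.1%}")
```

An efficiency near 100% means the bandwidth grows in proportion to the thread count. A drop at higher thread counts would be consistent with the memory subsystem approaching saturation, although the log alone does not establish the cause.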