options
********************************************************************************
MAQAO 2025.1.2 - ad4b42c12cfbc289a7a711f3ded92abe2eb90c0a::20250917-142411 || 2025/09/17
/beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao oneview -R1 -WS -c=/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/config.json --with-FLOPS object-coverage-threshold=0.1 lprof_params=btm=fp --replace xp=/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241 -of=html 
CPY:  [true] /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/binaries/aocc_4/exec --> /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec
CPY:  [true] /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/build/llama.cpp/../aocc_4/bin/libggml-base.so --> /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/libs/libggml-base.so
CPY:  [true] /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/build/llama.cpp/../aocc_4/bin/libggml-blas.so --> /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/libs/libggml-blas.so
CPY:  [true] /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/build/llama.cpp/../aocc_4/bin/libggml-cpu.so --> /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/libs/libggml-cpu.so
CPY:  [true] /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/build/llama.cpp/../aocc_4/bin/libggml.so --> /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/libs/libggml.so
CPY:  [true] /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/build/llama.cpp/../aocc_4/bin/libllama.so --> /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/libs/libllama.so
CMD:  OMP_NUM_THREADS=6  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_0" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=6  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
CMD:  OMP_NUM_THREADS=72  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_1" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=72  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
CMD:  OMP_NUM_THREADS=96  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_2" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=96  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
CMD:  OMP_NUM_THREADS=120  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_3" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=120  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
CMD:  OMP_NUM_THREADS=128  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_4" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=128  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
CMD:  OMP_NUM_THREADS=144  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_5" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=144  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
CMD:  OMP_NUM_THREADS=168  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_6" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=168  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
CMD:  OMP_NUM_THREADS=192  I_MPI_PIN_ORDER=bunch  OMP_DISPLAY_AFFINITY=TRUE  OMP_PROC_BIND=spread  OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'  OMP_DISPLAY_ENV=TRUE  I_MPI_PIN_DOMAIN=auto  I_MPI_DEBUG=4  OMP_PLACES=threads   /beegfs/hackathon/users/eoseret/MAQAO_ad4b42/bin/maqao lprof _caller=oneview btm=fp --xp="/beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/tools/lprof_npsu_run_7" --mpi-command="mpirun -n 1  " --collect-CPU-time-intervals -p=SSE_AVX_FLOP  --collect-topology tpp=192  -ldi=libggml-base.so,libggml-blas.so,libggml-cpu.so,libggml.so,libllama.so  -- /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_results_1759256241/binaries/exec -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t 192 -n 512 -p \"what is a LLM?\" --seed 0
In run 1x6, 32 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.040456910879584% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
90 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.1332964663743% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
In run 1x72, 37 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.027979803009657% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
23 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.0048926430899883% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
In run 1x96, 35 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.037012105909525% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
8 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.0021956333657726% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
In run 1x120, 37 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.040869760588975% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
11 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.0016288673359668% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
In run 1x128, 38 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.03670471382793% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
14 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.0024884552403819% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
In run 1x144, 30 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.029588419434731% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
13 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.0021463102602865% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
In run 1x168, 35 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.04624492469884% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
7 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.0010510210267967% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
In run 1x192, 33 loops were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.042284650946386% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
8 functions were discarded from static analysis because their coverage
is lower than the object_coverage_threshold value (0.1%).
That represents 0.0011545297456906% of the execution time. To include them, change the value
in the experiment directory configuration file, then rerun the command with the additional parameter
--force-static-analysis
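
The per-run discard messages above follow a regular pattern, so they can be tabulated to check how much execution time falls below the threshold in each run. The following is a minimal sketch, not part of MAQAO: the function name summarize_discards and the regular expressions are assumptions based only on the wording of the messages in this log, and a message without an "In run ..." prefix is assumed to belong to the most recently named run.

```python
import re

# Assumed message shapes, taken from the log text above:
#   "In run 1x6, 32 loops were discarded ..."  (run label present)
#   "90 functions were discarded ..."          (run label inherited)
DISCARD_RE = re.compile(
    r"^(?:In run (?P<run>\S+), )?(?P<count>\d+) (?P<kind>loops|functions) were discarded"
)
PERCENT_RE = re.compile(r"^That represents (?P<pct>[0-9.]+)% of the execution time")


def summarize_discards(lines):
    """Return (run, kind, count, percent) tuples parsed from log lines."""
    results = []
    run = None       # label of the most recently named run
    pending = None   # discard message still waiting for its percentage line
    for raw in lines:
        line = raw.strip()
        m = DISCARD_RE.match(line)
        if m:
            if m.group("run"):
                run = m.group("run")
            pending = (run, m.group("kind"), int(m.group("count")))
            continue
        m = PERCENT_RE.match(line)
        if m and pending:
            results.append(pending + (float(m.group("pct")),))
            pending = None
    return results
```

Passing the lines of this log to summarize_discards yields one tuple per message, for example ("1x6", "loops", 32, 0.040456910879584) for the first one.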