| Run | Configuration |
|---|---|
| Run 1x6 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 6<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |
| Run 1x72 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 72<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |
| Run 1x96 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 96<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |
| Run 1x120 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 120<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |
| Run 1x128 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 128<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |
| Run 1x144 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 144<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |
| Run 1x168 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 168<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |
| Run 1x192 | Number processes: 1<br>Number nodes: 1<br>Run Command: `<executable> -m meta-llama-3.1-8b-instruct-Q8_0.gguf -no-cnv -t <OMP_NUM_THREADS> -n 512 -p "what is a LLM?" --seed 0`<br>MPI Command: `mpirun -n <number_processes>`<br>Dataset:<br>Run Directory: /beegfs/hackathon/users/eoseret/qaas_runs_test/175-924-9259/intel/llama.cpp/run/oneview_runs/multicore/aocc_4/oneview_run_1759256241<br>OMP_NUM_THREADS: 192<br>I_MPI_PIN_ORDER: bunch<br>OMP_DISPLAY_AFFINITY: TRUE<br>OMP_PROC_BIND: spread<br>OMP_AFFINITY_FORMAT: 'OMP: pid %P tid %i thread %n bound to OS proc set {%A}'<br>OMP_DISPLAY_ENV: TRUE<br>I_MPI_PIN_DOMAIN: auto<br>I_MPI_DEBUG: 4<br>OMP_PLACES: threads |

| Loop id | Source Location | Source Function | Level | Max Thread Time / Walltime 1x6 (%) | Max Thread Time / Walltime 1x72 (%) | Max Thread Time / Walltime 1x96 (%) | Max Thread Time / Walltime 1x120 (%) | Max Thread Time / Walltime 1x128 (%) | Max Thread Time / Walltime 1x144 (%) | Max Thread Time / Walltime 1x168 (%) | Max Thread Time / Walltime 1x192 (%) | Exclusive Coverage 1x6 (%) | Exclusive Coverage 1x72 (%) | Exclusive Coverage 1x96 (%) | Exclusive Coverage 1x120 (%) | Exclusive Coverage 1x128 (%) | Exclusive Coverage 1x144 (%) | Exclusive Coverage 1x168 (%) | Exclusive Coverage 1x192 (%) | Inclusive Coverage 1x6 (%) | Inclusive Coverage 1x72 (%) | Inclusive Coverage 1x96 (%) | Inclusive Coverage 1x120 (%) | Inclusive Coverage 1x128 (%) | Inclusive Coverage 1x144 (%) | Inclusive Coverage 1x168 (%) | Inclusive Coverage 1x192 (%) | Max Exclusive Time Over Threads 1x6 (s) | Max Exclusive Time Over Threads 1x72 (s) | Max Exclusive Time Over Threads 1x96 (s) | Max Exclusive Time Over Threads 1x120 (s) | Max Exclusive Time Over Threads 1x128 (s) | Max Exclusive Time Over Threads 1x144 (s) | Max Exclusive Time Over Threads 1x168 (s) | Max Exclusive Time Over Threads 1x192 (s) | Max Inclusive Time Over Threads 1x6 (s) | Max Inclusive Time Over Threads 1x72 (s) | Max Inclusive Time Over Threads 1x96 (s) | Max Inclusive Time Over Threads 1x120 (s) | Max Inclusive Time Over Threads 1x128 (s) | Max Inclusive Time Over Threads 1x144 (s) | Max Inclusive Time Over Threads 1x168 (s) | Max Inclusive Time Over Threads 1x192 (s) | Exclusive Time w.r.t. Wall Time 1x6 (s) | Exclusive Time w.r.t. Wall Time 1x72 (s) | Exclusive Time w.r.t. Wall Time 1x96 (s) | Exclusive Time w.r.t. Wall Time 1x120 (s) | Exclusive Time w.r.t. Wall Time 1x128 (s) | Exclusive Time w.r.t. Wall Time 1x144 (s) | Exclusive Time w.r.t. Wall Time 1x168 (s) | Exclusive Time w.r.t. Wall Time 1x192 (s) | Inclusive Time w.r.t. Wall Time 1x6 (s) | Inclusive Time w.r.t. Wall Time 1x72 (s) | Inclusive Time w.r.t. Wall Time 1x96 (s) | Inclusive Time w.r.t. Wall Time 1x120 (s) | Inclusive Time w.r.t. Wall Time 1x128 (s) | Inclusive Time w.r.t. Wall Time 1x144 (s) | Inclusive Time w.r.t. Wall Time 1x168 (s) | Inclusive Time w.r.t. Wall Time 1x192 (s) | Nb Threads 1x6 | Nb Threads 1x72 | Nb Threads 1x96 | Nb Threads 1x120 | Nb Threads 1x128 | Nb Threads 1x144 | Nb Threads 1x168 | Nb Threads 1x192 | GFLOPS 1x6 | GFLOPS 1x72 | GFLOPS 1x96 | GFLOPS 1x120 | GFLOPS 1x128 | GFLOPS 1x144 | GFLOPS 1x168 | GFLOPS 1x192 | Vectorization Ratio (%) | Vector Length Use (%) | Speedup If No Scalar Integer | Speedup If FP Vectorized | Speedup If Fully Vectorized | Speedup If Perfect Load Balancing 1x6 | Speedup If Perfect Load Balancing 1x72 | Speedup If Perfect Load Balancing 1x96 | Speedup If Perfect Load Balancing 1x120 | Speedup If Perfect Load Balancing 1x128 | Speedup If Perfect Load Balancing 1x144 | Speedup If Perfect Load Balancing 1x168 | Speedup If Perfect Load Balancing 1x192 | Stride 0 | Stride 1 | Stride n | Stride Unknown | Stride Indirect | Array Access Efficiency | (1x6) Efficiency | (1x6) Potential Speed-Up (%) | (1x72) Efficiency | (1x72) Potential Speed-Up (%) | (1x96) Efficiency | (1x96) Potential Speed-Up (%) | (1x120) Efficiency | (1x120) Potential Speed-Up (%) | (1x128) Efficiency | (1x128) Potential Speed-Up (%) | (1x144) Efficiency | (1x144) Potential Speed-Up (%) | (1x168) Efficiency | (1x168) Potential Speed-Up (%) | (1x192) Efficiency | (1x192) Potential Speed-Up (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 548 | libggml-cpu.so - mmq.cpp:1570-1597 [...] | ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::$_1::operator()(int, int) const::{lambda()#1}::operator()() const | Single | 65.72 | 29.05 | 29.34 | 26.43 | 25.45 | 26.45 | 35.22 | 37.92 | 74.49 | 35.41 | 31.98 | 30.78 | 30.66 | 27.87 | 25.16 | 24.91 | 74.49 | 35.41 | 31.98 | 30.78 | 30.66 | 27.87 | 25.16 | 24.91 | 48.87 | 8.85 | 8.83 | 7.88 | 7.52 | 7.66 | 12.06 | 13.26 | 48.87 | 8.85 | 8.83 | 7.88 | 7.52 | 7.66 | 12.06 | 13.26 | 42.92 | 4.88 | 4.40 | 4.04 | 3.94 | 3.80 | 4.43 | 4.56 | 42.92 | 4.88 | 4.40 | 4.04 | 3.94 | 3.80 | 4.43 | 4.56 | 6 | 72 | 96 | 118 | 126 | 144 | 167 | 183 | 8.36 | 73.58 | 81.54 | 88.39 | 90.51 | 94.48 | 80.61 | 78.86 | NA | NA | NA | NA | NA | 1.14 | 1.83 | 2.02 | 1.94 | 1.9 | 2.03 | 2.74 | 2.81 | NA | NA | NA | NA | NA | 0.00 | 1 | 0 | 0.73 | 9.46 | 0.61 | 12.49 | 0.53 | 14.42 | 0.51 | 15.01 | 0.47 | 14.77 | 0.35 | 16.46 | 0.29 | 17.58 |
| 2304 | libggml-cpu.so - vec.h:491-497 | ggml_compute_forward_flash_attn_ext | Innermost | 1.24 | 0.61 | 0.48 | 0.49 | 0.47 | 0.43 | 0.66 | 0.43 | 0.93 | 0.33 | 0.20 | 0.15 | 0.14 | 0.12 | 0.11 | 0.08 | 0.93 | 0.33 | 0.20 | 0.15 | 0.14 | 0.12 | 0.11 | 0.08 | 0.92 | 0.19 | 0.15 | 0.15 | 0.14 | 0.13 | 0.23 | 0.15 | 0.92 | 0.19 | 0.15 | 0.15 | 0.14 | 0.13 | 0.23 | 0.15 | 0.54 | 0.05 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.01 | 0.54 | 0.05 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.01 | 6 | 33 | 33 | 32 | 35 | 32 | 32 | 32 | 23.88 | 336.74 | 540.18 | 620.65 | 683.81 | 701.05 | 819.23 | 947.09 | NA | NA | NA | NA | NA | 1.73 | 1.9 | 1.81 | 1.93 | 2.21 | 1.69 | 2.26 | 1.72 | NA | NA | NA | NA | NA | 0.00 | 1 | 0 | 0.99 | 0 | 1.21 | 0 | 1.32 | 0 | 1.44 | 0 | 1.34 | 0 | 1 | 0 | 1.14 | 0 |
| 1232 | libggml-cpu.so - vec.cpp:311-316 | ggml_vec_dot_f16 | Single | 0.75 | 0.59 | 0.47 | 0.49 | 0.51 | 0.59 | 0.45 | 0.37 | 0.71 | 0.35 | 0.21 | 0.17 | 0.16 | 0.13 | 0.12 | 0.09 | 0.71 | 0.35 | 0.21 | 0.17 | 0.16 | 0.13 | 0.12 | 0.09 | 0.56 | 0.18 | 0.14 | 0.15 | 0.15 | 0.17 | 0.16 | 0.13 | 0.56 | 0.18 | 0.14 | 0.15 | 0.15 | 0.17 | 0.16 | 0.13 | 0.41 | 0.05 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.41 | 0.05 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 6 | 36 | 33 | 33 | 35 | 32 | 34 | 33 | 28.85 | 537.25 | 858.10 | 1157.03 | 1316.84 | 1405.40 | 1077.92 | 1350.70 | NA | NA | NA | NA | NA | 1.36 | 1.89 | 1.7 | 1.78 | 2.07 | 2.18 | 1.5 | 1.41 | NA | NA | NA | NA | NA | 0.00 | 1 | 0 | 0.71 | 0.1 | 0.89 | 0.02 | 0.9 | 0.02 | 0.96 | 0.01 | 0.97 | 0 | 0.69 | 0.04 | 0.79 | 0.02 |
| 713 | libggml-cpu.so - mmq.cpp:520-2488 [...] | ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::$_2::operator()(int, int) const::{lambda()#1}::operator()() const | InBetween | 0.32 | 0.20 | 0.17 | 0.15 | 0.15 | 0.16 | 0.15 | 0.13 | 0.32 | 0.17 | 0.14 | 0.13 | 0.15 | 0.12 | 0.09 | 0.08 | 0.34 | 0.19 | 0.15 | 0.15 | 0.17 | 0.14 | 0.11 | 0.10 | 0.24 | 0.06 | 0.05 | 0.05 | 0.05 | 0.04 | 0.05 | 0.04 | 0.25 | 0.06 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.19 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.20 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 6 | 64 | 90 | 112 | 128 | 128 | 150 | 149 | 27.54 | 199.84 | 258.90 | 271.69 | 221.41 | 249.82 | 287.88 | 315.16 | NA | NA | NA | NA | NA | 1.3 | 2.25 | 2.45 | 2.48 | 2.43 | 2.45 | 2.72 | 2.3 | NA | NA | NA | NA | NA | 0.00 | 1 | 0 | 0.65 | 0.06 | 0.6 | 0.06 | 0.54 | 0.06 | 0.46 | 0.08 | 0.47 | 0.06 | 0.4 | 0.06 | 0.38 | 0.05 |
| 399 | libggml-cpu.so - mmq.cpp:303-1392 [...] | void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int,... | Innermost | 0.36 | 0.21 | 0.17 | 0.22 | 0.22 | 0.19 | 0.16 | 0.13 | 0.31 | 0.26 | 0.20 | 0.20 | 0.22 | 0.16 | 0.11 | 0.10 | 0.31 | 0.26 | 0.20 | 0.20 | 0.22 | 0.16 | 0.11 | 0.10 | 0.27 | 0.06 | 0.05 | 0.06 | 0.06 | 0.05 | 0.05 | 0.05 | 0.27 | 0.06 | 0.05 | 0.06 | 0.06 | 0.05 | 0.05 | 0.05 | 0.18 | 0.04 | 0.03 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 | 0.18 | 0.04 | 0.03 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 | 6 | 68 | 90 | 109 | 128 | 127 | 144 | 172 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 90.91 | 38.76 | 1.47 | 1 | 1.41 | 1.47 | 1.75 | 1.68 | 2.24 | 2.33 | 2.2 | 2.41 | 2.24 | 23 | 0 | 0 | 9 | 0 | 85.94 | 1 | 0 | 0.43 | 0.15 | 0.4 | 0.12 | 0.34 | 0.13 | 0.3 | 0.15 | 0.34 | 0.11 | 0.33 | 0.08 | 0.31 | 0.07 |
| 2294 | libggml-cpu.so - ops.cpp:8759-8881 [...] | ggml_compute_forward_flash_attn_ext | InBetween | 0.26 | 0.25 | 0.25 | 0.18 | 0.24 | 0.22 | 0.19 | 0.17 | 0.28 | 0.11 | 0.09 | 0.06 | 0.08 | 0.07 | 0.04 | 0.03 | 1.22 | 0.44 | 0.29 | 0.22 | 0.21 | 0.19 | 0.15 | 0.12 | 0.20 | 0.08 | 0.08 | 0.06 | 0.07 | 0.06 | 0.06 | 0.06 | 1.13 | 0.25 | 0.19 | 0.17 | 0.17 | 0.18 | 0.26 | 0.20 | 0.16 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.70 | 0.06 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 6 | 33 | 34 | 33 | 33 | 36 | 33 | 34 | 104.28 | 523.92 | 705.25 | 1004.58 | 863.98 | 947.25 | 1686.84 | 1690.50 | 18.59 | 12.3 | 2.43 | 1.6 | 9.79 | 1.21 | 2.31 | 2.29 | 1.89 | 1.84 | 1.83 | 1.93 | 1.72 | NA | NA | NA | NA | NA | 0.00 | 1 | 0 | 0.91 | 0.01 | 0.87 | 0.01 | 1.01 | -0 | 0.77 | 0.02 | 0.75 | 0.02 | 0.87 | 0.01 | 0.81 | 0.01 |
| 3150 | libggml-cpu.so - quants.c:298-355 [...] | quantize_row_q8_0 | Single | 0.32 | 0.64 | 0.86 | 0.79 | 0.86 | 0.78 | 0.83 | 0.94 | 0.07 | 0.02 | 0.02 | 0.02 | 0.02 | 0.01 | 0.01 | 0.01 | 0.07 | 0.02 | 0.02 | 0.02 | 0.02 | 0.01 | 0.01 | 0.01 | 0.24 | 0.20 | 0.26 | 0.23 | 0.25 | 0.22 | 0.28 | 0.33 | 0.24 | 0.20 | 0.26 | 0.23 | 0.25 | 0.22 | 0.28 | 0.33 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 34.78 | 515.34 | 516.93 | 712.19 | 696.81 | 888.69 | 816.24 | 807.90 | 60.7 | 29.66 | 1 | 1.31 | 2.68 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 0 | 0 | 0 | 100.00 | 1 | 0 | 1.2 | -0 | 0.9 | 0 | 1 | 0 | 0.92 | 0 | 1.04 | -0 | 0.82 | 0 | 0.71 | 0 |
| 1240 | libggml-cpu.so - vec.h:1084-1115 [...] | ggml_vec_swiglu_f32 | Single | 0.26 | 0.62 | 0.70 | 0.52 | 0.88 | 0.57 | 0.53 | 0.53 | 0.06 | 0.02 | 0.02 | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.06 | 0.02 | 0.02 | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.19 | 0.19 | 0.21 | 0.16 | 0.26 | 0.16 | 0.18 | 0.19 | 0.19 | 0.19 | 0.21 | 0.16 | 0.26 | 0.16 | 0.18 | 0.19 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4 | 2 | 3 | 3 | 7 | 7 | 7 | 7 | 160.72 | 1943.15 | 2262.88 | 3794.56 | 2384.88 | 4318.78 | 4474.13 | 5169.97 | 98 | 98.13 | 1 | 1 | 1 | 3.9 | 2 | 2.93 | 2.91 | 7 | 7 | 6.81 | 6.82 | 0.5 | 0 | 0 | 3 | 0 | 56.25 | 1 | 0 | 1.02 | -0 | 0.9 | 0 | 1.21 | -0 | 0.75 | 0 | 1.18 | -0 | 1.05 | -0 | 1.02 | -0 |
| 914 | libggml-cpu.so - binary-ops.cpp:18-32 [...] | ggml_compute_forward_mul | Innermost | 0.22 | 0.51 | 0.55 | 0.55 | 0.64 | 0.64 | 0.44 | 0.44 | 0.05 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.05 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.17 | 0.16 | 0.17 | 0.16 | 0.19 | 0.19 | 0.15 | 0.15 | 0.17 | 0.16 | 0.17 | 0.16 | 0.19 | 0.19 | 0.15 | 0.15 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3 | 7 | 3 | 7 | 7 | 7 | 7 | 7 | 4.80 | 62.83 | 82.42 | 92.64 | 81.37 | 98.99 | 133.16 | 157.09 | 0 | 6.25 | 1 | 1.06 | 16 | 3 | 6.78 | 2.91 | 7 | 6.65 | 7 | 6.56 | 7 | 0 | 3 | 0 | 0 | 0 | 100.00 | 1 | 0 | 1.03 | -0 | 0.97 | 0 | 1 | 0 | 0.82 | 0 | 0.89 | 0 | 1.02 | -0 | 1.06 | -0 |
| 826 | libggml-cpu.so - binary-ops.cpp:10-32 [...] | ggml_compute_forward_add_non_quantized | Innermost | 0.12 | 0.15 | 0.25 | 0.22 | 0.27 | 0.14 | 0.28 | 0.17 | 0.03 | 0.01 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.03 | 0.01 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.09 | 0.04 | 0.07 | 0.06 | 0.08 | 0.04 | 0.09 | 0.06 | 0.09 | 0.04 | 0.07 | 0.06 | 0.08 | 0.04 | 0.09 | 0.06 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4 | 2 | 2 | 1 | 3 | 2 | 1 | 1 | 8.30 | 168.14 | 156.09 | 265.35 | 204.04 | 459.23 | 253.08 | 457.92 | 0 | 6.25 | 1 | 1.06 | 16 | 3.6 | 1.8 | 1.87 | 1 | 2.67 | 1.78 | 1 | 1 | 0 | 3 | 0 | 0 | 0 | 100.00 | 1 | 0 | 2 | 0 | 1.25 | -0 | 1.53 | -0 | 1.1 | -0 | 2.21 | -0 | 1.04 | -0 | 1.65 | -0 |
| 3069 | exec - sampling.cpp:125-126 [...] | common_sampler::set_logits(llama_context*, int) | Single | 0.09 | 0.25 | 0.18 | 0.13 | 0.25 | 0.29 | 0.19 | 0.31 | 0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.07 | 0.08 | 0.05 | 0.04 | 0.07 | 0.09 | 0.06 | 0.11 | 0.07 | 0.08 | 0.05 | 0.04 | 0.07 | 0.09 | 0.06 | 0.11 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 6.25 | 3 | 1 | 16 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 | 0 | 0 | 80.00 | 1 | 0 | 0.93 | 0 | 1.27 | -0 | 1.74 | -0 | 0.93 | 0 | 0.82 | 0 | 1.07 | -0 | 0.63 | 0 |
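
The per-run Efficiency values for loop 548 appear consistent with a standard strong-scaling definition: speedup relative to the 1x6 run divided by the ideal speedup, using the thread count from the run name rather than the "Nb Threads" column. The report does not state this formula, so the Python sketch below is an assumption checked only against this one row; `parallel_efficiency` and `efficiencies` are hypothetical helper names, and the inputs are the rounded "Exclusive Time w.r.t. Wall Time" values from the table.

```python
# Exclusive time w.r.t. wall time (s) of loop 548, copied from the table above.
# Keys are the thread counts taken from the run names (1x6 -> 6, 1x72 -> 72, ...).
loop_548_time_s = {
    6: 42.92, 72: 4.88, 96: 4.40, 120: 4.04,
    128: 3.94, 144: 3.80, 168: 4.43, 192: 4.56,
}


def parallel_efficiency(t_ref, t_run, n_ref, n_run):
    """Assumed formula: measured speedup over the reference run / ideal speedup."""
    speedup = t_ref / t_run      # how much faster than the reference run
    ideal = n_run / n_ref        # speedup expected under perfect scaling
    return speedup / ideal


def efficiencies(times, ref_threads=6):
    """Efficiency of every run relative to the run with `ref_threads` threads."""
    t_ref = times[ref_threads]
    return {
        n: round(parallel_efficiency(t_ref, t, ref_threads, n), 2)
        for n, t in times.items()
    }


if __name__ == "__main__":
    for n, eff in efficiencies(loop_548_time_s).items():
        print(f"1x{n}: efficiency {eff:.2f}")
```

Under this assumption the computed values match the Efficiency columns of the loop 548 row to two decimals. Other rows may not reproduce as cleanly, because their timings are rounded to a few hundredths of a second.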