
Loops Index

33 loops have been discarded from the report because their ratio ((Max Inclusive Time Over Threads * 100) / Max Thread Active Time) is lower than the threshold set by object_coverage_threshold (0.1%). Together they represent about 0.04% of the application. To include them, change the value of object_coverage_threshold in the experiment directory configuration file, then rerun the command with the additional parameter --force-static-analysis.
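The discard rule in the notice above can be sketched in a few lines. This is only an illustration of the stated formula: the function names and the sample timings are hypothetical and are not taken from the profiler or from this report.

```python
def coverage_ratio(max_inclusive_time_s: float, max_thread_active_time_s: float) -> float:
    """Ratio in percent, following the formula quoted in the notice:
    (Max Inclusive Time Over Threads * 100) / Max Thread Active Time."""
    return (max_inclusive_time_s * 100.0) / max_thread_active_time_s


def is_discarded(max_inclusive_time_s: float,
                 max_thread_active_time_s: float,
                 threshold_pct: float = 0.1) -> bool:
    """A loop is dropped when its ratio is strictly lower than the threshold.
    The 0.1 default mirrors the object_coverage_threshold value in the notice."""
    return coverage_ratio(max_inclusive_time_s, max_thread_active_time_s) < threshold_pct


# Hypothetical timings, chosen only to show both outcomes.
print(is_discarded(0.02, 35.0))  # ratio is about 0.057%, below 0.1%
print(is_discarded(0.31, 35.0))  # ratio is about 0.886%, above 0.1%
```

Under these assumptions, raising or lowering threshold_pct plays the role that editing object_coverage_threshold plays in the configuration file.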

Each per-run field lists eight values, one per run, in the order 1x6, 1x72, 1x96, 1x120, 1x128, 1x144, 1x168, 1x192.

The last field holds the remaining columns with their values run together, in this order: Nb Threads (per run), GFLOPS (per run), Vectorization Ratio (%), Vector Length Use (%), Speedup If No Scalar Integer, Speedup If FP Vectorized, Speedup If Fully Vectorized, Speedup If Perfect Load Balancing (per run), Stride 0, Stride 1, Stride n, Stride Unknown, Stride Indirect, Array Access Efficiency, then Efficiency and Potential Speed-Up (%) for each run.

Loop id | Source Location | Source Function | Level | Max Thread Time / Walltime (%) | Exclusive Coverage (%) | Inclusive Coverage (%) | Max Exclusive Time Over Threads (s) | Max Inclusive Time Over Threads (s) | Exclusive Time w.r.t. Wall Time (s) | Inclusive Time w.r.t. Wall Time (s) | Remaining columns
548 | libggml-cpu.so - mmq.cpp:1570-1597 [...] | ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::$_1::operator()(int, int) const::{lambda()#1}::operator()() const | Single | 34.20 34.18 30.62 35.11 32.50 32.28 35.61 37.39 | 22.71 23.32 24.81 24.61 22.95 24.35 24.16 24.89 | 22.71 23.32 24.81 24.61 22.95 24.35 24.16 24.89 | 12.11 11.97 10.53 12.36 11.25 11.23 12.58 13.34 | 12.11 11.97 10.53 12.36 11.25 11.23 12.58 13.34 | 4.05 4.06 4.20 4.41 3.92 4.20 4.25 4.55 | 4.05 4.06 4.20 4.41 3.92 4.20 4.25 4.55 | 18318318318318318318318388.7688.5285.4981.4491.6885.4784.4978.88NANANANANA2.942.872.442.722.792.592.862.83NANANANANA0.001010.060.960.920.922.021.0300.960.910.951.160.892.77
713 | libggml-cpu.so - mmq.cpp:520-2488 [...] | ggml_backend_amx_mul_mat(ggml_compute_params const*, ggml_tensor*)::$_2::operator()(int, int) const::{lambda()#1}::operator()() const | InBetween | 0.13 0.13 0.16 0.13 0.13 0.13 0.11 0.11 | 0.10 0.09 0.10 0.09 0.10 0.09 0.08 0.09 | 0.11 0.10 0.11 0.10 0.11 0.10 0.09 0.09 | 0.04 0.05 0.05 0.04 0.05 0.05 0.04 0.04 | 0.05 0.05 0.06 0.05 0.05 0.05 0.05 0.05 | 0.02 0.02 0.02 0.02 0.02 0.02 0.01 0.02 | 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 | 150150148149148149149149283.33300.14280.83301.08272.32305.97333.06313.19NANANANANA2.132.222.52.212.032.32.162.02NANANANANA0.00101.0500.9901.0600.9801.101.1701.090
1232 | libggml-cpu.so - vec.cpp:311-316 | ggml_vec_dot_f16 | Single | 0.30 0.41 0.33 0.47 0.33 0.46 0.62 0.64 | 0.06 0.09 0.07 0.12 0.07 0.10 0.13 0.12 | 0.06 0.09 0.07 0.12 0.07 0.10 0.13 0.12 | 0.11 0.14 0.12 0.17 0.12 0.16 0.22 0.23 | 0.11 0.14 0.12 0.17 0.12 0.16 0.22 0.23 | 0.01 0.02 0.01 0.02 0.01 0.02 0.02 0.02 | 0.01 0.02 0.01 0.02 0.01 0.02 0.02 0.02 | 32323232323332322190.041541.041843.411058.522007.161309.84947.92965.18NANANANANA1.571.651.571.331.631.671.631.72NANANANANA0.00100.770.020.920.010.550.050.9600.690.030.50.060.510.06
2304 | libggml-cpu.so - vec.h:491-497 | ggml_compute_forward_flash_attn_ext | Innermost | 0.27 0.37 0.31 0.41 0.36 0.42 0.58 0.52 | 0.06 0.08 0.07 0.10 0.06 0.09 0.11 0.11 | 0.06 0.08 0.07 0.10 0.06 0.09 0.11 0.11 | 0.10 0.13 0.10 0.14 0.13 0.15 0.20 0.19 | 0.10 0.13 0.10 0.14 0.13 0.15 0.20 0.19 | 0.01 0.01 0.01 0.02 0.01 0.02 0.02 0.02 | 0.01 0.01 0.01 0.02 0.01 0.02 0.02 0.02 | 3233323332323332720.81812.54942.54879.23810.42868.95846.71830.17NANANANANA1.611.661.611.42.011.511.871.49NANANANANA0.00100.740.020.920.010.560.040.9600.630.040.530.050.480.06
2015 | libggml-cpu.so - ops.cpp:6220-6245 [...] | ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) | Innermost | 0.10 0.09 0.16 0.11 0.13 0.10 0.10 0.17 | 0.04 0.04 0.04 0.04 0.05 0.04 0.05 0.05 | 0.04 0.04 0.04 0.04 0.05 0.04 0.05 0.05 | 0.04 0.03 0.05 0.04 0.04 0.04 0.04 0.06 | 0.04 0.03 0.05 0.04 0.04 0.04 0.04 0.06 | 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 | 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 | 192192192192192192192192482.69559.49523.88540.25486.45533.24475.37413.0206.2511.1244.654.527.765.955.94.834.456.6612000100.00101.1501.08-01.140101.06-00.9700.850.01
2294 | libggml-cpu.so - ops.cpp:8759-8881 [...] | ggml_compute_forward_flash_attn_ext | InBetween | 0.17 0.20 0.17 0.17 0.22 0.20 0.24 0.14 | 0.04 0.04 0.04 0.03 0.04 0.04 0.04 0.03 | 0.09 0.11 0.10 0.13 0.10 0.13 0.15 0.15 | 0.06 0.07 0.06 0.06 0.07 0.07 0.09 0.05 | 0.15 0.18 0.14 0.19 0.16 0.20 0.28 0.21 | 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 | 0.02 0.02 0.02 0.02 0.02 0.02 0.03 0.03 | 34323533323232331302.841654.931643.701872.041381.671862.231687.781936.6118.5912.32.431.69.791.721.931.851.792.021.942.191.53NANANANANA0.00101.03-01.06-01.09-01.01-01.04-00.9701.12-0
399 | libggml-cpu.so - mmq.cpp:303-1392 [...] | void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void*, block_q8_0 const*, int,... | Innermost | 0.68 0.19 0.16 0.17 0.17 0.16 0.13 0.14 | 0.04 0.08 0.08 0.08 0.11 0.09 0.09 0.11 | 0.04 0.08 0.08 0.08 0.11 0.09 0.09 0.11 | 0.24 0.06 0.06 0.06 0.06 0.06 0.05 0.05 | 0.24 0.06 0.06 0.06 0.06 0.06 0.05 0.05 | 0.01 0.01 0.01 0.01 0.02 0.01 0.02 0.02 | 0.01 0.01 0.01 0.01 0.02 0.01 0.02 0.02 | 669901121281251461720.000.000.000.000.000.000.000.0090.9138.761.4711.411.231.681.922.42.182.452.292.2623009085.94100.440.050.460.040.420.050.340.070.420.050.420.050.310.08
117 | libggml-cpu.so - ggml-cpu.c:533-2891 [...] | ggml_graph_compute_thread | Single | 0.08 0.07 0.07 0.06 0.07 0.07 0.06 0.07 | 0.03 0.02 0.03 0.03 0.03 0.03 0.03 0.03 | 0.03 0.02 0.03 0.03 0.03 0.03 0.03 0.03 | 0.03 0.02 0.03 0.02 0.02 0.03 0.02 0.03 | 0.03 0.02 0.03 0.02 0.02 0.03 0.02 0.03 | 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01 | 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01 | 11811111711911911311511911.567.408.4618.744.4612.7713.777.0609.581114.774.123.472.952.73.073.162.613.08NANANANANA0.00101.09-00.8800.9900.900.980100.910
3150 | libggml-cpu.so - quants.c:298-355 [...] | quantize_row_q8_0 | Single | 0.86 0.81 0.71 0.94 0.72 0.65 0.79 0.88 | 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 | 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 | 0.31 0.28 0.25 0.33 0.25 0.22 0.28 0.31 | 0.31 0.28 0.25 0.33 0.25 0.22 0.28 0.31 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 11111111857.46927.611078.10803.621056.651181.65951.08845.6760.729.6611.312.681111111102000100.00101.08-01.26-00.9401.24-01.38-01.11-00.990
2292 | libggml-cpu.so - ops.cpp:8885-8886 [...] | ggml_compute_forward_flash_attn_ext | Innermost | 0.06 0.06 0.06 0.07 0.06 0.09 0.10 0.06 | 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 | 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 | 0.02 0.02 0.02 0.03 0.02 0.03 0.04 0.02 | 0.02 0.02 0.02 0.03 0.02 0.03 0.04 0.02 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 27212431272825250.000.000.820.000.800.000.000.0006.251.331162.122.212.091.992.31.912.462.270200166.67101.36-01.12-00.6601.1-00.590.010.7301.18-0
914 | libggml-cpu.so - binary-ops.cpp:18-32 [...] | ggml_compute_forward_mul | Innermost | 0.65 0.39 0.52 0.67 0.64 0.68 0.51 0.39 | 0.01 0.00 0.01 0.01 0.01 0.01 0.01 0.00 | 0.01 0.00 0.01 0.01 0.01 0.01 0.01 0.00 | 0.23 0.14 0.18 0.24 0.22 0.23 0.18 0.14 | 0.23 0.14 0.18 0.24 0.22 0.23 0.18 0.14 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 77777777101.64178.15134.40103.06109.55103.33135.12173.7806.2511.06166.85777777703000100.00101.76-01.32-01.01-01.08-01.02-01.33-01.71-0
1240 | libggml-cpu.so - vec.h:1084-1115 [...] | ggml_vec_swiglu_f32 | Single | 0.45 0.47 0.36 0.50 0.51 0.43 0.35 0.43 | 0.01 0.01 0.00 0.01 0.01 0.00 0.00 0.00 | 0.01 0.01 0.00 0.01 0.01 0.00 0.00 0.00 | 0.16 0.17 0.13 0.18 0.17 0.15 0.13 0.16 | 0.16 0.17 0.13 0.18 0.17 0.15 0.13 0.16 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 777777775668.055557.407240.475574.805577.506530.587849.386134.259898.131116.596.66.4877776.780.5003056.25100.9801.28-00.9900.9801.15-01.38-01.08-0
3069 | exec - sampling.cpp:125-126 [...] | common_sampler::set_logits(llama_context*, int) | Single | 0.25 0.27 0.29 0.26 0.20 0.30 0.16 0.22 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.09 0.10 0.10 0.09 0.07 0.10 0.06 0.08 | 0.09 0.10 0.10 0.09 0.07 0.10 0.06 0.08 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 111111110.000.000.000.000.000.000.000.0006.253116111111111180080.00100.9600.9101.01-01.3-00.8701.67-01.15-0
826 | libggml-cpu.so - binary-ops.cpp:10-32 [...] | ggml_compute_forward_add_non_quantized | Innermost | 0.14 0.24 0.22 0.27 0.29 0.19 0.27 0.32 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.05 0.09 0.07 0.09 0.10 0.06 0.09 0.12 | 0.05 0.09 0.07 0.09 0.10 0.06 0.09 0.12 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 11111111539.03320.53363.98287.69273.01421.53288.88238.7206.2511.06161111111103000100.00100.5900.6800.5300.5100.7800.5400.440