Loops Index

44 loops have been discarded from the report because their ratio ((Max Inclusive Time Over Threads * 100) / Max Thread Active Time) is below the threshold set by object_coverage_threshold (0.1%). Together, they account for about 0.12% of the application. To include them, change the value of object_coverage_threshold in the configuration file of the experiment directory, then rerun the command with the additional parameter --force-static-analysis.
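The filtering rule above can be sketched in a few lines. This is a minimal illustration, not MAQAO's implementation. The `LoopRecord` type, the function names and the sample timings are hypothetical. The sketch also assumes that a single Max Thread Active Time value serves as the denominator, because the report does not say how that value is chosen across the six runs.

```python
from dataclasses import dataclass


@dataclass
class LoopRecord:
    """Hypothetical per-loop record; field names are illustrative only."""
    loop_id: int
    max_inclusive_time_s: float  # corresponds to "Max Inclusive Time Over Threads"


def coverage_ratio(loop: LoopRecord, max_thread_active_time_s: float) -> float:
    """(Max Inclusive Time Over Threads * 100) / Max Thread Active Time, in percent."""
    if max_thread_active_time_s <= 0:
        raise ValueError("max thread active time must be positive")
    return loop.max_inclusive_time_s * 100.0 / max_thread_active_time_s


def split_by_threshold(loops, max_thread_active_time_s, threshold_pct=0.1):
    """Split loops into (kept, discarded). A loop is discarded when its ratio is
    strictly lower than threshold_pct (assumed to mirror object_coverage_threshold)."""
    kept, discarded = [], []
    for loop in loops:
        ratio = coverage_ratio(loop, max_thread_active_time_s)
        (discarded if ratio < threshold_pct else kept).append(loop)
    return kept, discarded


if __name__ == "__main__":
    # Made-up timings for illustration; they are not taken from the table below.
    loops = [LoopRecord(1, 30.0), LoopRecord(2, 0.03), LoopRecord(3, 0.5)]
    kept, discarded = split_by_threshold(loops, max_thread_active_time_s=50.0)
    print([l.loop_id for l in kept])       # [1, 3]
    print([l.loop_id for l in discarded])  # [2]
```

Lowering `threshold_pct` in this sketch plays the same role as lowering object_coverage_threshold before rerunning with --force-static-analysis: more low-coverage loops end up in the kept list.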

Loop id | Source Location | Source Function | Level | Max Thread Time / Walltime 1x8 (%) | Max Thread Time / Walltime 1x64 (%) | Max Thread Time / Walltime 1x96 (%) | Max Thread Time / Walltime 1x128 (%) | Max Thread Time / Walltime 1x160 (%) | Max Thread Time / Walltime 1x192 (%) | Exclusive Coverage 1x8 (%) | Exclusive Coverage 1x64 (%) | Exclusive Coverage 1x96 (%) | Exclusive Coverage 1x128 (%) | Exclusive Coverage 1x160 (%) | Exclusive Coverage 1x192 (%) | Inclusive Coverage 1x8 (%) | Inclusive Coverage 1x64 (%) | Inclusive Coverage 1x96 (%) | Inclusive Coverage 1x128 (%) | Inclusive Coverage 1x160 (%) | Inclusive Coverage 1x192 (%) | Max Exclusive Time Over Threads 1x8 (s) | Max Exclusive Time Over Threads 1x64 (s) | Max Exclusive Time Over Threads 1x96 (s) | Max Exclusive Time Over Threads 1x128 (s) | Max Exclusive Time Over Threads 1x160 (s) | Max Exclusive Time Over Threads 1x192 (s) | Max Inclusive Time Over Threads 1x8 (s) | Max Inclusive Time Over Threads 1x64 (s) | Max Inclusive Time Over Threads 1x96 (s) | Max Inclusive Time Over Threads 1x128 (s) | Max Inclusive Time Over Threads 1x160 (s) | Max Inclusive Time Over Threads 1x192 (s) | Exclusive Time w.r.t. Wall Time 1x8 (s) | Exclusive Time w.r.t. Wall Time 1x64 (s) | Exclusive Time w.r.t. Wall Time 1x96 (s) | Exclusive Time w.r.t. Wall Time 1x128 (s) | Exclusive Time w.r.t. Wall Time 1x160 (s) | Exclusive Time w.r.t. Wall Time 1x192 (s) | Inclusive Time w.r.t. Wall Time 1x8 (s) | Inclusive Time w.r.t. Wall Time 1x64 (s) | Inclusive Time w.r.t. Wall Time 1x96 (s) | Inclusive Time w.r.t. Wall Time 1x128 (s) | Inclusive Time w.r.t. Wall Time 1x160 (s) | Inclusive Time w.r.t. Wall Time 1x192 (s) | Nb Threads 1x8 | Nb Threads 1x64 | Nb Threads 1x96 | Nb Threads 1x128 | Nb Threads 1x160 | Nb Threads 1x192 | GFLOPS 1x8 | GFLOPS 1x64 | GFLOPS 1x96 | GFLOPS 1x128 | GFLOPS 1x160 | GFLOPS 1x192 | Vectorization Ratio (%) | Vector Length Use (%) | Speedup If No Scalar Integer | Speedup If FP Vectorized | Speedup If Fully Vectorized | Speedup If Perfect Load Balancing 1x8 | Speedup If Perfect Load Balancing 1x64 | Speedup If Perfect Load Balancing 1x96 | Speedup If Perfect Load Balancing 1x128 | Speedup If Perfect Load Balancing 1x160 | Speedup If Perfect Load Balancing 1x192 | Stride 0 | Stride 1 | Stride n | Stride Unknown | Stride Indirect | Array Access Efficiency | (1x8) Efficiency | (1x8) Potential Speed-Up (%) | (1x64) Efficiency | (1x64) Potential Speed-Up (%) | (1x96) Efficiency | (1x96) Potential Speed-Up (%) | (1x128) Efficiency | (1x128) Potential Speed-Up (%) | (1x160) Efficiency | (1x160) Potential Speed-Up (%) | (1x192) Efficiency | (1x192) Potential Speed-Up (%)
5766libggml-cpu.so - quants.c:108-1042 [...]ggml_vec_dot_q8_0_q8_0Single75.0481.7383.3083.0082.3682.7586.8661.3062.2461.1460.9961.3886.8661.3062.2461.1460.9961.3837.9051.3756.7056.5256.7057.3037.9051.3756.7056.5256.7057.3036.7033.5937.3436.6537.0237.5236.7033.5937.3436.6537.0237.528649612816019253.3460.0554.2155.2854.7353.94NANANANANA1.031.541.531.551.551.54NANANANANA0.00100.1452.930.0857.140.0657.320.0557.960.0458.88
1779libggml-cpu.so - vec.cpp:311-316ggml_vec_dot_f16Single0.510.300.240.260.260.270.380.100.050.050.030.040.380.100.050.050.030.040.260.190.160.180.180.190.260.190.160.180.180.190.160.050.030.030.020.020.160.050.030.030.020.028323332343335.89189.26409.77557.26719.52701.5410066.67111.571.631.781.851.41.831.5102000100.00100.370.060.440.030.310.040.380.020.310.02
5300libggml-cpu.so - sgemm.cpp:138-1044 [...]_ZN12_GLOBAL__N_115tinyBLAS_Q0_AVXI10block_q8_0S1_fE7gemm4xNILi4EEEvllll.AInnermost0.320.190.180.180.190.200.280.110.110.110.110.110.280.110.110.110.110.110.160.120.120.120.130.140.160.120.120.120.130.140.120.060.070.070.070.070.120.060.070.070.070.0786495128156189119.90234.46210.51212.47216.36227.92NANANANANA1.361.951.761.821.932.11NANANANANA0.00100.240.090.140.10.110.10.090.10.070.1
4189libggml-cpu.so - vec.h:491-497ggml_compute_forward_flash_attn_extInnermost0.320.370.210.210.310.250.270.110.040.030.030.030.270.110.040.030.030.030.160.230.140.140.210.170.160.230.140.140.210.170.120.060.030.020.020.020.120.060.030.020.020.0283332343233157.26368.13853.701427.131495.161562.6310075111.451.391.981.872.082.151.5802000100.00100.240.080.380.030.40.020.290.020.260.02
5326libggml-cpu.so - sgemm.cpp:138-1044 [...]_ZN12_GLOBAL__N_115tinyBLAS_Q0_AVXI10block_q8_0S1_fE7gemm4xNILi2EEEvllll.AInnermost0.240.180.180.180.190.190.210.110.110.110.110.110.210.110.110.110.110.110.120.110.120.120.130.130.120.110.120.120.130.130.090.060.070.070.070.070.090.060.070.070.070.078649512815618982.29118.92107.66108.16101.93105.92NANANANANA1.351.841.821.861.861.96NANANANANA0.00100.180.090.110.10.090.10.060.110.060.1
901libggml-cpu.so - binary-ops.cpp:10-32 [...]ggml_compute_forward_add_non_quantizedInnermost0.510.270.280.100.130.130.080.000.000.000.000.000.080.000.000.000.000.000.260.170.190.070.090.090.260.170.190.070.090.090.030.000.000.000.000.000.030.000.000.000.000.002121124.0054.1554.73240.12251.02266.1106.2511.3316211.9111.803000100.00101.52-01.29-03.68-02.86-02.58-0
236libggml-cpu.so - ggml-cpu.c:1164-1198 [...]ggml_compute_forward_mul_matInBetween0.100.300.220.220.220.330.060.120.090.070.090.120.210.150.120.090.100.140.050.190.150.150.150.230.110.220.190.170.160.260.030.070.060.040.050.070.090.080.070.050.060.088649612816019292.7667.8961.3464.3353.6654.38011.14119.61.912.922.683.742.883.18NANANANANA0.00100.050.110.040.090.040.060.020.080.020.12
1793libggml-cpu.so - vec.h:1084-1116 [...]ggml_vec_swiglu_f32Single0.260.130.210.240.310.160.040.000.000.000.000.000.040.000.000.000.000.000.130.080.140.160.210.110.130.080.140.160.210.110.020.000.000.000.000.000.020.000.000.000.000.00777777295.653553.383376.793694.813722.567455.7010010011176.2276.5975.9203000100.00101.44-00.9200.7600.6100.990