Loop id | Source Location | Source Function | Level | Coverage run_0 (%) | Max Time Over Threads run_0 (s) | Time w.r.t. Wall Time run_0 (s) | Nb Threads run_0 | Vectorization Ratio (%) | Vectorization Efficiency (%) | Speedup If No Scalar Integer | Speedup If FP Vectorized | Speedup If Fully Vectorized | Speedup If Perfect Load Balancing run_0 | Stride 0 | Stride 1 | Stride n | Stride Unknown | Stride Indirect | Speedup If Data in L1 run_0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | convf32_avx512 - codelet.c:78-101 [...] | cache_block_conv_bp | Innermost | 94.69 | 7.67 | 7.67 | 1 | 58.62 | 63.58 | 1.7 | 1 | 1.25 | 1 | 1 | 0 | 3 | 1 | 6 | 0.96 |
3 | convf32_avx512 - codelet.c:77-101 [...] | cache_block_conv_bp | InBetween | 4.87 | 0.39 | 0.39 | 1 | 36.36 | 44.03 | 1.74 | 1 | 1.12 | 1 | 4 | 0 | 3 | 0 | 0 | NA |
2 | convf32_avx512 - codelet.c:76-101 [...] | cache_block_conv_bp | InBetween | 0.25 | 0.02 | 0.02 | 1 | 0 | 8.06 | 1 | 1 | 12.27 | 1 | 5 | 0 | 1.33 | 2.67 | 0 | NA |
0 | convf32_avx512 - codelet.c:75-101 [...] | cache_block_conv_bp | Outermost | 0.06 | 0 | 0 | 1 | 0 | 9.11 | 1 | 1 | 12.4 | 0 | 1 | 0 | 0 | 0 | 0 | NA |
6 | convf32_avx512 - driver.c:339-340 | main | Single | 0.06 | 0 | 0 | 1 | 0 | 10.94 | 1 | 1 | 16 | 0 | 1 | 0 | 0 | 0 | 0 | NA |