- r_1 - engine_NEON1M11-0001_o2_m26_ifx_full_30loops/ - 30 analyzed loop(s)
  - Loop 255174 - engine_linux64_intel_ifx_impi
  - Loop 18566 - engine_linux64_intel_ifx_impi
  - Loop 39475 - engine_linux64_intel_ifx_impi
  - Loop 37916 - engine_linux64_intel_ifx_impi
  - Loop 256165 - engine_linux64_intel_ifx_impi
  - Loop 19710 - engine_linux64_intel_ifx_impi
  - Loop 38054 - engine_linux64_intel_ifx_impi
  - Loop 121792 - engine_linux64_intel_ifx_impi
  - Loop 19918 - engine_linux64_intel_ifx_impi
  - Loop 167135 - engine_linux64_intel_ifx_impi
  - Loop 39361 - engine_linux64_intel_ifx_impi
  - Loop 16649 - engine_linux64_intel_ifx_impi
  - Loop 129166 - engine_linux64_intel_ifx_impi
  - Loop 16652 - engine_linux64_intel_ifx_impi
  - Loop 38910 - engine_linux64_intel_ifx_impi
  - Loop 19924 - engine_linux64_intel_ifx_impi
  - Loop 38255 - engine_linux64_intel_ifx_impi
  - Loop 129964 - engine_linux64_intel_ifx_impi
  - Loop 121790 - engine_linux64_intel_ifx_impi
  - Loop 24558 - engine_linux64_intel_ifx_impi
  - Loop 167621 - engine_linux64_intel_ifx_impi
  - Loop 39025 - engine_linux64_intel_ifx_impi
  - Loop 256166 - engine_linux64_intel_ifx_impi
  - Loop 38251 - engine_linux64_intel_ifx_impi
  - Loop 18565 - engine_linux64_intel_ifx_impi
  - Loop 167340 - engine_linux64_intel_ifx_impi
  - Loop 37997 - engine_linux64_intel_ifx_impi
  - Loop 121788 - engine_linux64_intel_ifx_impi
  - Loop 129946 - engine_linux64_intel_ifx_impi
  - Loop 38932 - engine_linux64_intel_ifx_impi
- r_2 - engine_NEON1M11-0001_o2_m26_ifort_full_30loops/ - 30 analyzed loop(s)
  - Loop 15282 - engine_linux64_intel_impi
  - Loop 193162 - engine_linux64_intel_impi
  - Loop 30046 - engine_linux64_intel_impi
  - Loop 28971 - engine_linux64_intel_impi
  - Loop 97970 - engine_linux64_intel_impi
  - Loop 15758 - engine_linux64_intel_impi
  - Loop 97971 - engine_linux64_intel_impi
  - Loop 98506 - engine_linux64_intel_impi
  - Loop 92421 - engine_linux64_intel_impi
  - Loop 15966 - engine_linux64_intel_impi
  - Loop 29120 - engine_linux64_intel_impi
  - Loop 97948 - engine_linux64_intel_impi
  - Loop 129229 - engine_linux64_intel_impi
  - Loop 97950 - engine_linux64_intel_impi
  - Loop 14003 - engine_linux64_intel_impi
  - Loop 15970 - engine_linux64_intel_impi
  - Loop 14012 - engine_linux64_intel_impi
  - Loop 29898 - engine_linux64_intel_impi
  - Loop 29269 - engine_linux64_intel_impi
  - Loop 92419 - engine_linux64_intel_impi
  - Loop 29752 - engine_linux64_intel_impi
  - Loop 19322 - engine_linux64_intel_impi
  - Loop 98529 - engine_linux64_intel_impi
  - Loop 29265 - engine_linux64_intel_impi
  - Loop 129922 - engine_linux64_intel_impi
  - Loop 92417 - engine_linux64_intel_impi
  - Loop 194156 - engine_linux64_intel_impi
  - Loop 129257 - engine_linux64_intel_impi
  - Loop 129681 - engine_linux64_intel_impi
  - Loop 29660 - engine_linux64_intel_impi

| Category | Analysis | Count | Percentage (%) | Weighted Count |
|---|---|---|---|---|
| Loop Computation Issues | Total | 85 | | |
| | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 48 | 80.00 | 0.69 |
| | Presence of a large number of scalar integer instructions | 17 | 28.33 | 0.25 |
| | Presence of expensive FP instructions | 16 | 26.67 | 0.16 |
| | Large loop body over microp cache size | 2 | 3.33 | 0.03 |
| | Bottleneck in the front-end | 2 | 3.33 | 0.03 |
| Control Flow Issues | Total | 26 | | |
| | Presence of 2 to 4 paths | 15 | 25.00 | 0.18 |
| | Presence of calls | 6 | 10.00 | 0.24 |
| | Non-innermost loop | 3 | 5.00 | 0.03 |
| | Presence of more than 4 paths | 2 | 3.33 | 0.03 |
| Data Access Issues | Total | 97 | | |
| | More than 10% of the vector loads instructions are unaligned | 30 | 50.00 | 0.31 |
| | More than 20% of the loads are accessing the stack | 23 | 38.33 | 0.45 |
| | Presence of indirect access | 15 | 25.00 | 0.21 |
| | Presence of constant non-unit stride data access | 12 | 20.00 | 0.19 |
| | Presence of special instructions executing on a single port | 9 | 15.00 | 0.12 |
| | Presence of expensive instructions: scatter/gather | 8 | 13.33 | 0.09 |
| Vectorization Roadblocks | Total | 62 | | |
| | Presence of indirect access | 15 | 25.00 | 0.21 |
| | Presence of 2 to 4 paths | 15 | 25.00 | 0.18 |
| | Presence of constant non-unit stride data access | 12 | 20.00 | 0.19 |
| | Presence of calls | 6 | 10.00 | 0.24 |
| | Presence of more than 4 paths | 6 | 10.00 | 0.24 |
| | ERROR | 5 | 8.33 | 0.23 |
| | Non-innermost loop | 3 | 5.00 | 0.03 |
| Inefficient Vectorization | Total | 27 | | |
| | Use of masked instructions | 10 | 16.67 | 0.10 |
| | Presence of special instructions executing on a single port | 9 | 15.00 | 0.12 |
| | Presence of expensive instructions: scatter/gather | 8 | 13.33 | 0.09 |
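
The Percentage column in the table above appears to be each count divided by the 60 loops analyzed across both runs (30 in r_1 plus 30 in r_2). The report does not state this; it is an inference from the numbers. A minimal Python sketch of that assumed relation follows, where `TOTAL_LOOPS` and `percentage` are hypothetical names introduced for illustration. The Weighted Count column is not reproduced, since its weighting is not described here.

```python
# Assumption: percentages are taken over all analyzed loops in both runs.
TOTAL_LOOPS = 30 + 30  # 30 loops in r_1 plus 30 loops in r_2

# A few counts copied from the summary table (labels shortened).
counts = {
    "FMA used for less than 10% of FP ADD/SUB/MUL": 48,
    "Large number of scalar integer instructions": 17,
    "More than 20% of loads access the stack": 23,
    "Use of masked instructions": 10,
}


def percentage(count, total=TOTAL_LOOPS):
    """Share of analyzed loops flagged with an issue, rounded to 2 decimals."""
    return round(100.0 * count / total, 2)


for label, count in counts.items():
    print(f"{label}: {percentage(count):.2f}")
```

Under this assumption the four rows above come out as 80.00, 28.33, 38.33 and 16.67, which agrees with the Percentage values in the table.
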

| Category | Analysis | r_1 | r_2 |
|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 8 | 8 |
| | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 24 | 24 |
| | Large loop body over microp cache size | 1 | 1 |
| | Presence of a large number of scalar integer instructions | 6 | 11 |
| | Bottleneck in the front-end | 1 | 1 |
| Control Flow Issues | Presence of calls | 2 | 4 |
| | Presence of 2 to 4 paths | 8 | 7 |
| | Presence of more than 4 paths | 1 | 1 |
| | Non-innermost loop | 2 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 5 | 7 |
| | Presence of indirect access | 7 | 8 |
| | More than 10% of the vector loads instructions are unaligned | 17 | 13 |
| | Presence of expensive instructions: scatter/gather | 3 | 5 |
| | Presence of special instructions executing on a single port | 8 | 1 |
| | More than 20% of the loads are accessing the stack | 9 | 14 |
| Vectorization Roadblocks | Presence of calls | 2 | 4 |
| | Presence of 2 to 4 paths | 8 | 7 |
| | Presence of more than 4 paths | 3 | 3 |
| | Non-innermost loop | 2 | 1 |
| | Presence of constant non-unit stride data access | 5 | 7 |
| | Presence of indirect access | 7 | 8 |
| | ERROR | 3 | 2 |
| Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 3 | 5 |
| | Presence of special instructions executing on a single port | 8 | 1 |
| | Use of masked instructions | 5 | 5 |