- r_1 - engine_NEON1M11-0001_o2_m26_ifx_full_30loops/ - 30 analyzed loop(s)
  - Loop 255174 - engine_linux64_intel_ifx_impi
  - Loop 18566 - engine_linux64_intel_ifx_impi
  - Loop 39475 - engine_linux64_intel_ifx_impi
  - Loop 37916 - engine_linux64_intel_ifx_impi
  - Loop 256165 - engine_linux64_intel_ifx_impi
  - Loop 19710 - engine_linux64_intel_ifx_impi
  - Loop 38054 - engine_linux64_intel_ifx_impi
  - Loop 121792 - engine_linux64_intel_ifx_impi
  - Loop 19918 - engine_linux64_intel_ifx_impi
  - Loop 167135 - engine_linux64_intel_ifx_impi
  - Loop 39361 - engine_linux64_intel_ifx_impi
  - Loop 16649 - engine_linux64_intel_ifx_impi
  - Loop 129166 - engine_linux64_intel_ifx_impi
  - Loop 16652 - engine_linux64_intel_ifx_impi
  - Loop 38910 - engine_linux64_intel_ifx_impi
  - Loop 19924 - engine_linux64_intel_ifx_impi
  - Loop 38255 - engine_linux64_intel_ifx_impi
  - Loop 129964 - engine_linux64_intel_ifx_impi
  - Loop 121790 - engine_linux64_intel_ifx_impi
  - Loop 24558 - engine_linux64_intel_ifx_impi
  - Loop 167621 - engine_linux64_intel_ifx_impi
  - Loop 39025 - engine_linux64_intel_ifx_impi
  - Loop 256166 - engine_linux64_intel_ifx_impi
  - Loop 38251 - engine_linux64_intel_ifx_impi
  - Loop 18565 - engine_linux64_intel_ifx_impi
  - Loop 167340 - engine_linux64_intel_ifx_impi
  - Loop 37997 - engine_linux64_intel_ifx_impi
  - Loop 121788 - engine_linux64_intel_ifx_impi
  - Loop 129946 - engine_linux64_intel_ifx_impi
  - Loop 38932 - engine_linux64_intel_ifx_impi
- r_2 - engine_NEON1M11-0001_o2_m26_ifort_full_30loops/ - 30 analyzed loop(s)
  - Loop 15282 - engine_linux64_intel_impi
  - Loop 193162 - engine_linux64_intel_impi
  - Loop 30046 - engine_linux64_intel_impi
  - Loop 28971 - engine_linux64_intel_impi
  - Loop 97970 - engine_linux64_intel_impi
  - Loop 15758 - engine_linux64_intel_impi
  - Loop 97971 - engine_linux64_intel_impi
  - Loop 98506 - engine_linux64_intel_impi
  - Loop 92421 - engine_linux64_intel_impi
  - Loop 15966 - engine_linux64_intel_impi
  - Loop 29120 - engine_linux64_intel_impi
  - Loop 97948 - engine_linux64_intel_impi
  - Loop 129229 - engine_linux64_intel_impi
  - Loop 97950 - engine_linux64_intel_impi
  - Loop 14003 - engine_linux64_intel_impi
  - Loop 15970 - engine_linux64_intel_impi
  - Loop 14012 - engine_linux64_intel_impi
  - Loop 29898 - engine_linux64_intel_impi
  - Loop 29269 - engine_linux64_intel_impi
  - Loop 92419 - engine_linux64_intel_impi
  - Loop 29752 - engine_linux64_intel_impi
  - Loop 19322 - engine_linux64_intel_impi
  - Loop 98529 - engine_linux64_intel_impi
  - Loop 29265 - engine_linux64_intel_impi
  - Loop 129922 - engine_linux64_intel_impi
  - Loop 92417 - engine_linux64_intel_impi
  - Loop 194156 - engine_linux64_intel_impi
  - Loop 129257 - engine_linux64_intel_impi
  - Loop 129681 - engine_linux64_intel_impi
  - Loop 29660 - engine_linux64_intel_impi
| Analysis | Count | Percentage | Weighted Count |
|---|---|---|---|
| Loop Computation Issues | 85 | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 48 | 80.00 | 0.69 |
| Presence of a large number of scalar integer instructions | 17 | 28.33 | 0.25 |
| Presence of expensive FP instructions | 16 | 26.67 | 0.16 |
| Large loop body over microp cache size | 2 | 3.33 | 0.03 |
| Bottleneck in the front-end | 2 | 3.33 | 0.03 |
| Control Flow Issues | 26 | | |
| Presence of 2 to 4 paths | 15 | 25.00 | 0.18 |
| Presence of calls | 6 | 10.00 | 0.24 |
| Non-innermost loop | 3 | 5.00 | 0.03 |
| Presence of more than 4 paths | 2 | 3.33 | 0.03 |
| Data Access Issues | 97 | | |
| More than 10% of the vector loads instructions are unaligned | 30 | 50.00 | 0.31 |
| More than 20% of the loads are accessing the stack | 23 | 38.33 | 0.45 |
| Presence of indirect access | 15 | 25.00 | 0.21 |
| Presence of constant non-unit stride data access | 12 | 20.00 | 0.19 |
| Presence of special instructions executing on a single port | 9 | 15.00 | 0.12 |
| Presence of expensive instructions: scatter/gather | 8 | 13.33 | 0.09 |
| Vectorization Roadblocks | 62 | | |
| Presence of indirect access | 15 | 25.00 | 0.21 |
| Presence of 2 to 4 paths | 15 | 25.00 | 0.18 |
| Presence of constant non-unit stride data access | 12 | 20.00 | 0.19 |
| Presence of calls | 6 | 10.00 | 0.24 |
| Presence of more than 4 paths | 6 | 10.00 | 0.24 |
| ERROR | 5 | 8.33 | 0.23 |
| Non-innermost loop | 3 | 5.00 | 0.03 |
| Inefficient Vectorization | 27 | | |
| Use of masked instructions | 10 | 16.67 | 0.10 |
| Presence of special instructions executing on a single port | 9 | 15.00 | 0.12 |
| Presence of expensive instructions: scatter/gather | 8 | 13.33 | 0.09 |
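
The Percentage values above are consistent with Count divided by the total number of analyzed loops (60, i.e. 30 in r_1 plus 30 in r_2). The following is a minimal sketch under that assumption, not the tool's own implementation; the helper name `percentage` is hypothetical, and the Weighted Count column is not reproduced because its formula is not stated here.

```python
# Assumption: Percentage = Count / total analyzed loops * 100.
# r_1 and r_2 each list 30 analyzed loops.
TOTAL_LOOPS = 30 + 30


def percentage(count: int, total: int = TOTAL_LOOPS) -> float:
    """Share of analyzed loops flagged by an analysis, rounded to 2 decimals."""
    return round(100.0 * count / total, 2)


# A few (analysis, Count) pairs copied from the table above.
counts = {
    "Less than 10% of the FP ADD/SUB/MUL ... using FMA": 48,
    "Presence of a large number of scalar integer instructions": 17,
    "More than 20% of the loads are accessing the stack": 23,
    "Use of masked instructions": 10,
}

for name, count in counts.items():
    print(f"{name}: {percentage(count):.2f}")
```

Category rows (for example, Loop Computation Issues with a count of 85) carry no percentage because a single loop can be flagged by several analyses in the same category.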
| Category | Analysis | r_1 | r_2 |
|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 8 | 8 |
| | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 24 | 24 |
| | Large loop body over microp cache size | 1 | 1 |
| | Presence of a large number of scalar integer instructions | 6 | 11 |
| | Bottleneck in the front-end | 1 | 1 |
| Control Flow Issues | Presence of calls | 2 | 4 |
| | Presence of 2 to 4 paths | 8 | 7 |
| | Presence of more than 4 paths | 1 | 1 |
| | Non-innermost loop | 2 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 5 | 7 |
| | Presence of indirect access | 7 | 8 |
| | More than 10% of the vector loads instructions are unaligned | 17 | 13 |
| | Presence of expensive instructions: scatter/gather | 3 | 5 |
| | Presence of special instructions executing on a single port | 8 | 1 |
| | More than 20% of the loads are accessing the stack | 9 | 14 |
| Vectorization Roadblocks | Presence of calls | 2 | 4 |
| | Presence of 2 to 4 paths | 8 | 7 |
| | Presence of more than 4 paths | 3 | 3 |
| | Non-innermost loop | 2 | 1 |
| | Presence of constant non-unit stride data access | 5 | 7 |
| | Presence of indirect access | 7 | 8 |
| | ERROR | 3 | 2 |
| Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 3 | 5 |
| | Presence of special instructions executing on a single port | 8 | 1 |
| | Use of masked instructions | 5 | 5 |
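
The per-run counts appear to add up to the overall Count reported for the same analysis (for example, 6 + 11 = 17 for scalar integer instructions). A short sketch of that cross-check follows, using a handful of rows copied from the two tables; the dictionary layout and variable names are illustrative assumptions, not part of the report.

```python
# (r_1, r_2) counts from the per-run table, keyed by (category, analysis).
per_run = {
    ("Loop Computation Issues", "Presence of expensive FP instructions"): (8, 8),
    ("Loop Computation Issues", "Presence of a large number of scalar integer instructions"): (6, 11),
    ("Data Access Issues", "Presence of special instructions executing on a single port"): (8, 1),
    ("Data Access Issues", "More than 20% of the loads are accessing the stack"): (9, 14),
    ("Vectorization Roadblocks", "Presence of more than 4 paths"): (3, 3),
}

# Overall Count values from the summary table for the same rows.
overall = {
    ("Loop Computation Issues", "Presence of expensive FP instructions"): 16,
    ("Loop Computation Issues", "Presence of a large number of scalar integer instructions"): 17,
    ("Data Access Issues", "Presence of special instructions executing on a single port"): 9,
    ("Data Access Issues", "More than 20% of the loads are accessing the stack"): 23,
    ("Vectorization Roadblocks", "Presence of more than 4 paths"): 6,
}

# Collect any row whose per-run sum disagrees with the overall count.
mismatches = {
    key: (runs, overall[key])
    for key, runs in per_run.items()
    if sum(runs) != overall[key]
}

print(mismatches)  # {} for the rows listed here
```

Rows with the largest gap between runs, such as single-port special instructions (8 in r_1 versus 1 in r_2), are the natural starting points when comparing the ifx and ifort builds.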