- r_1 - engine_NEON1M11-0001_o2_m26_ifx_full/ - 10 analyzed loop(s)
  - Loop 255174 - engine_linux64_intel_ifx_impi
  - Loop 18566 - engine_linux64_intel_ifx_impi
  - Loop 39475 - engine_linux64_intel_ifx_impi
  - Loop 37916 - engine_linux64_intel_ifx_impi
  - Loop 256165 - engine_linux64_intel_ifx_impi
  - Loop 19710 - engine_linux64_intel_ifx_impi
  - Loop 38054 - engine_linux64_intel_ifx_impi
  - Loop 121792 - engine_linux64_intel_ifx_impi
  - Loop 19918 - engine_linux64_intel_ifx_impi
  - Loop 167135 - engine_linux64_intel_ifx_impi
- r_2 - engine_NEON1M11-0001_o2_m26_ifort_full - 10 analyzed loop(s)
  - Loop 15282 - engine_linux64_intel_impi
  - Loop 193162 - engine_linux64_intel_impi
  - Loop 30046 - engine_linux64_intel_impi
  - Loop 28971 - engine_linux64_intel_impi
  - Loop 97970 - engine_linux64_intel_impi
  - Loop 15758 - engine_linux64_intel_impi
  - Loop 97971 - engine_linux64_intel_impi
  - Loop 98506 - engine_linux64_intel_impi
  - Loop 92421 - engine_linux64_intel_impi
  - Loop 15966 - engine_linux64_intel_impi

Analysis | Count | Percentage | Weighted Count |
---|---|---|---|
Loop Computation Issues | 20 | | |
○ Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 11 | 55.00 | 0.35 |
○ Presence of a large number of scalar integer instructions | 5 | 25.00 | 0.13 |
○ Presence of expensive FP instructions | 2 | 10.00 | 0.03 |
○ Large loop body over microp cache size | 1 | 5.00 | 0.01 |
○ Bottleneck in the front-end | 1 | 5.00 | 0.01 |
Control Flow Issues | 8 | | |
○ Presence of calls | 3 | 15.00 | 0.20 |
○ Presence of 2 to 4 paths | 3 | 15.00 | 0.05 |
○ Presence of more than 4 paths | 1 | 5.00 | 0.02 |
○ Non-innermost loop | 1 | 5.00 | 0.01 |
Data Access Issues | 22 | | |
○ More than 20% of the loads are accessing the stack | 7 | 35.00 | 0.30 |
○ Presence of indirect access | 4 | 20.00 | 0.12 |
○ Presence of constant non-unit stride data access | 4 | 20.00 | 0.13 |
○ Presence of special instructions executing on a single port | 3 | 15.00 | 0.06 |
○ More than 10% of the vector loads instructions are unaligned | 3 | 15.00 | 0.06 |
○ Presence of expensive instructions: scatter/gather | 1 | 5.00 | 0.03 |
Vectorization Roadblocks | 19 | | |
○ Presence of more than 4 paths | 4 | 20.00 | 0.22 |
○ Presence of constant non-unit stride data access | 4 | 20.00 | 0.13 |
○ Presence of indirect access | 4 | 20.00 | 0.12 |
○ Presence of 2 to 4 paths | 3 | 15.00 | 0.05 |
○ Presence of calls | 3 | 15.00 | 0.20 |
○ Non-innermost loop | 1 | 5.00 | 0.01 |
Inefficient Vectorization | 5 | | |
○ Presence of special instructions executing on a single port | 3 | 15.00 | 0.06 |
○ Use of masked instructions | 1 | 5.00 | 0.01 |
○ Presence of expensive instructions: scatter/gather | 1 | 5.00 | 0.03 |

Category | Analysis | r_1 | r_2 |
---|---|---|---|
Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 |
Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 5 | 6 |
Loop Computation Issues | Large loop body over microp cache size | 1 | 0 |
Loop Computation Issues | Presence of a large number of scalar integer instructions | 2 | 3 |
Loop Computation Issues | Bottleneck in the front-end | 1 | 0 |
Control Flow Issues | Presence of calls | 2 | 1 |
Control Flow Issues | Presence of 2 to 4 paths | 2 | 1 |
Control Flow Issues | Presence of more than 4 paths | 0 | 1 |
Control Flow Issues | Non-innermost loop | 1 | 0 |
Data Access Issues | Presence of constant non-unit stride data access | 2 | 2 |
Data Access Issues | Presence of indirect access | 2 | 2 |
Data Access Issues | More than 10% of the vector loads instructions are unaligned | 3 | 0 |
Data Access Issues | Presence of expensive instructions: scatter/gather | 0 | 1 |
Data Access Issues | Presence of special instructions executing on a single port | 3 | 0 |
Data Access Issues | More than 20% of the loads are accessing the stack | 3 | 4 |
Vectorization Roadblocks | Presence of calls | 2 | 1 |
Vectorization Roadblocks | Presence of 2 to 4 paths | 2 | 1 |
Vectorization Roadblocks | Presence of more than 4 paths | 2 | 2 |
Vectorization Roadblocks | Non-innermost loop | 1 | 0 |
Vectorization Roadblocks | Presence of constant non-unit stride data access | 2 | 2 |
Vectorization Roadblocks | Presence of indirect access | 2 | 2 |
Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 0 | 1 |
Inefficient Vectorization | Presence of special instructions executing on a single port | 3 | 0 |
Inefficient Vectorization | Use of masked instructions | 1 | 0 |
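
The two tables appear to be related in a simple way: each Count looks like the sum of the r_1 and r_2 values for the same issue, and each Percentage looks like that Count divided by the 20 analyzed loops (10 per run). The report does not state this, so it is an assumption inferred from the numbers. The report also does not say how Weighted Count is computed, so it is left out. A minimal Python sketch that checks a few rows copied from the tables, with `check_row` as a hypothetical helper name:

```python
# Assumption: 20 analyzed loops in total (10 in r_1 plus 10 in r_2).
TOTAL_LOOPS = 10 + 10

# (issue, r_1 count, r_2 count, reported Count, reported Percentage)
# A handful of rows copied from the tables, not the full report.
ROWS = [
    ("Less than 10% of FP ADD/SUB/MUL performed using FMA", 5, 6, 11, 55.00),
    ("Large number of scalar integer instructions", 2, 3, 5, 25.00),
    ("More than 20% of the loads are accessing the stack", 3, 4, 7, 35.00),
    ("Presence of calls", 2, 1, 3, 15.00),
]


def check_row(r1, r2, count, percentage, n_loops=TOTAL_LOOPS):
    """Return True if r1 + r2 equals count and percentage equals 100 * count / n_loops."""
    sums_match = (r1 + r2 == count)
    pct_match = abs(100.0 * count / n_loops - percentage) < 0.005
    return sums_match and pct_match


if __name__ == "__main__":
    for name, r1, r2, count, pct in ROWS:
        status = "consistent" if check_row(r1, r2, count, pct) else "MISMATCH"
        print(f"{name}: {status}")
```

A row reported as a mismatch would suggest either a transcription error in one of the tables or that the assumed relationship does not hold for that issue.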