Function: hypre_LowerBound | Module: exec | Source: binsearch.c:95-107 [...] | Coverage: 0.01% |
---|
/scratch_na/users/xoserete/qaas_runs/171-415-3872/intel/AMG/build/AMG/AMG/utilities/binsearch.c: 95 - 107 |
-------------------------------------------------------------------------------- |
95: { |
96: HYPRE_Int *it; |
97: size_t count = last - first, step; |
98: |
99: while (count > 0) { |
100: it = first; step = count/2; it += step; |
101: if (*it < value) { |
[...] |
107: return first; |
0x4e5280 MOV %RSI,%RAX |
0x4e5283 SUB %RDI,%RSI |
0x4e5286 JE 4e52bf |
0x4e5288 PUSH %RBP |
0x4e5289 MOV %RSP,%RBP |
0x4e528c SAR $0x3,%RSI |
0x4e5290 MOV %RDI,%RAX |
0x4e5293 MOV %RSI,%RCX |
0x4e5296 JMP 4e52a8 |
0x4e5298 NOPL (%RAX,%RAX,1) |
(4424) 0x4e52a0 MOV %RCX,%RSI |
(4424) 0x4e52a3 TEST %RCX,%RCX |
(4424) 0x4e52a6 JE 4e52be |
(4424) 0x4e52a8 SHR $0x1,%RCX |
(4424) 0x4e52ab CMP %RDX,(%RAX,%RCX,8) |
(4424) 0x4e52af JGE 4e52a0 |
(4424) 0x4e52b1 LEA 0x8(%RAX,%RCX,8),%RAX |
(4424) 0x4e52b6 NOT %RCX |
(4424) 0x4e52b9 ADD %RSI,%RCX |
(4424) 0x4e52bc JMP 4e52a0 |
0x4e52be POP %RBP |
0x4e52bf RET |
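The report elides source lines 102-106, so the branch bodies are not visible in the listing above. The following is a minimal sketch of a lower-bound binary search that is consistent with the visible source lines and with the disassembly of loop 4424. It is reconstructed from the instructions, not copied from binsearch.c. The name `lower_bound_sketch` is hypothetical, and the `HYPRE_Int` typedef is an assumption based on the 8-byte index scaling and the signed `JGE` compare.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: HYPRE_Int is a signed 64-bit integer in this build
   (the disassembly scales indices by 8 and uses a signed compare). */
typedef long long HYPRE_Int;

/* Hypothetical stand-in for hypre_LowerBound: returns a pointer to the
   first element of the sorted range [first, last) that is not less
   than value, or last if every element is less than value. */
static HYPRE_Int *lower_bound_sketch(HYPRE_Int *first, HYPRE_Int *last,
                                     HYPRE_Int value)
{
    ptrdiff_t n = last - first;           /* SUB %RDI,%RSI ; SAR $0x3,%RSI */
    assert(n >= 0);                       /* precondition: first <= last */
    size_t count = (size_t)n;

    while (count > 0) {
        size_t step = count / 2;          /* SHR $0x1,%RCX */
        HYPRE_Int *it = first + step;

        if (*it < value) {                /* CMP %RDX,(%RAX,%RCX,8) */
            first = it + 1;               /* LEA 0x8(%RAX,%RCX,8),%RAX */
            count -= step + 1;            /* NOT %RCX ; ADD %RSI,%RCX */
        } else {
            count = step;                 /* JGE path: MOV %RCX,%RSI */
        }
    }
    return first;
}
```

Each iteration performs one load whose result decides a conditional branch and the next address, which is why the loop shows no vectorizable instructions in the tables that follow.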
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►73.33+ | hypre_CSRMatrixGetLoadBalanced[...] | csr_matrix.c:663 | exec |
○ | hypre_CSRMatrixMatvecOutOfPlac[...] | csr_matvec.c:247 | exec |
○ | __kmp_invoke_microtask | libiomp5.so | |
○ | __kmp_invoke_task_func | libiomp5.so | |
►22.22+ | hypre_CSRMatrixGetLoadBalanced[...] | csr_matrix.c:663 | exec |
○ | hypre_CSRMatrixMatvecOutOfPlac[...] | csr_matvec.c:247 | exec |
○ | __kmp_invoke_microtask | libiomp5.so | |
○ | __kmp_invoke_task_func | libiomp5.so | |
►2.22+ | hypre_CSRMatrixGetLoadBalanced[...] | csr_matrix.c:663 | exec |
○ | hypre_CSRMatrixTranspose.extra[...] | csr_matop.c:470 | exec |
○ | __kmp_invoke_microtask | libiomp5.so | |
○ | __kmp_invoke_task_func | libiomp5.so | |
►2.22+ | hypre_BoomerAMGCreate2ndS.extr[...] | par_strength.c:1625 | exec |
○ | __kmp_invoke_microtask | libiomp5.so | |
○ | __kmp_invoke_task_func | libiomp5.so |
Path / |
Source file and lines | binsearch.c:95-107 |
Module | exec |
nb instructions | 12 |
nb uops | 12 |
loop length | 34 |
used x86 registers | 6 |
used mmx registers | 0 |
used xmm registers | 0 |
used ymm registers | 0 |
used zmm registers | 0 |
nb stack references | 0 |
micro-operation queue | 2.00 cycles |
front end | 2.00 cycles |
Port | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.50 | 0.40 | 0.67 | 0.67 | 0.50 | 0.40 | 1.50 | 0.50 | 0.50 | 0.50 | 0.20 | 0.67 |
cycles | 1.50 | 0.40 | 0.67 | 0.67 | 0.50 | 0.40 | 1.50 | 0.50 | 0.50 | 0.50 | 0.20 | 0.67 |
Cycles executing div or sqrt instructions | NA |
FE+BE cycles | 2.25-3.14 |
Stall cycles | 0.00-0.30 |
Front-end | 2.00 |
Dispatch | 1.50 |
Overall L1 | 2.00 |
Vectorization ratio |
all | 0% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | NA (no add-sub vectorizable/vectorized instructions) |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 0% |
Vector efficiency ratio |
all | 12% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | NA (no add-sub vectorizable/vectorized instructions) |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 12% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MOV %RSI,%RAX | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 |
SUB %RDI,%RSI | 1 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0 | 1 | 0.20 |
JE 4e52bf <hypre_LowerBound+0x3f> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
PUSH %RBP | 1 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50 | 0.50 | 0.50 | 0 | 0 | 5-12 | 0.50 |
MOV %RSP,%RBP | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 |
SAR $0x3,%RSI | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0-2 | 0.50 |
MOV %RDI,%RAX | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 |
MOV %RSI,%RCX | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 |
JMP 4e52a8 <hypre_LowerBound+0x28> | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.84 |
NOPL (%RAX,%RAX,1) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
POP %RBP | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1-6 | 0.33 |
RET | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0.33 | 0 | 2.13 |
Name | Coverage (%) | Time (s) |
---|---|---|
▼hypre_LowerBound | 0.01 | 0 |
○Loop 4424 - binsearch.c:99-101 - exec | 0.01 | 0.01 |