I am trying to reverse engineer a binary, and the following instruction is confusing me. Can anyone clarify exactly what it does?
=>0x804854e: repnz scas al,BYTE PTR es:[edi]
0x8048550: not ecx
Where:
EAX: 0x0
ECX: 0xffffffff
EDI: 0xbffff3dc ("aaaaaa\n")
ZF: 1
I see that it decrements ECX by 1 on each iteration and that EDI advances along the string. I know it calculates the length of the string, but I'm not sure exactly HOW it does so, or why "al" is involved.
I'll try to explain it by reversing the code back into C.
Intel's Instruction Set Reference (Volume 2 of Software Developer's Manual) is invaluable for this kind of reverse engineering.
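SCASB compares AL with the byte at ES:[EDI] and then steps EDI, which is why "al" is involved: it holds the byte being searched for. The REPNE prefix (REPNZ is a synonym) repeats the instruction while ECX is non-zero and the comparison has not matched. The logic for REPNE and SCASB combined: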
while (ecx != 0) {
    temp = al - *(BYTE *)edi;
    SetStatusFlags(temp);
    if (DF == 0) // DF = Direction Flag
        edi = edi + 1;
    else
        edi = edi - 1;
    ecx = ecx - 1;
    if (ZF == 1) break;
}
Or more simply:
while (ecx != 0) {
    ZF = (al == *(BYTE *)edi);
    if (DF == 0)
        edi++;
    else
        edi--;
    ecx--;
    if (ZF) break;
}
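To make the role of AL concrete, here is a minimal sketch of the loop above as runnable C. The `struct regs` type and the `repne_scasb` and `scan_index` names are hypothetical, invented for this illustration; they only model the registers in software and are not part of any real API. The sketch assumes the byte being searched for is actually present in the buffer, just as the real instruction would otherwise keep scanning past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register snapshot; the names mirror the x86 registers. */
struct regs {
    uint8_t        al;   /* byte to search for */
    uint32_t       ecx;  /* remaining iteration count */
    const uint8_t *edi;  /* current scan position */
    int            df;   /* direction flag: 0 = forward, 1 = backward */
    int            zf;   /* zero flag, set by the last comparison */
};

/* Software model of REPNE SCASB: scan until a byte equals AL or ECX hits 0. */
static void repne_scasb(struct regs *r)
{
    assert(r != NULL && r->edi != NULL);
    while (r->ecx != 0) {
        r->zf = (r->al == *r->edi);   /* compare AL with [EDI] */
        r->edi += r->df ? -1 : 1;     /* step EDI according to DF */
        r->ecx--;                     /* count this iteration */
        if (r->zf)
            break;                    /* REPNE stops on a match */
    }
}

/* Illustration only: index of the first `needle` in `buf`, using the model.
 * Assumes `needle` occurs in `buf` (the terminating NUL counts). */
static uint32_t scan_index(const char *buf, char needle)
{
    struct regs r = { (uint8_t)needle, UINT32_MAX, (const uint8_t *)buf, 0, 0 };
    repne_scasb(&r);
    /* ECX dropped once per byte examined, including the matching one. */
    return (UINT32_MAX - r.ecx) - 1;
}
```

With the string from the question, `scan_index("aaaaaa\n", '\n')` gives 6 and `scan_index("aaaaaa\n", '\0')` gives 7. Searching for the zero byte is the special case that yields the string length.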
However, the above is insufficient to explain how it computes the length of a string. Based on the presence of the not ecx in your question, I'm assuming the snippet belongs to this idiom (or similar) for computing string length using REPNE SCASB:
sub ecx, ecx
sub al, al
not ecx
cld
repne scasb
not ecx
dec ecx
Translating to C and using our logic from the previous section, we get:
ecx = (unsigned)-1;
al = 0;
DF = 0;
while (ecx != 0) {
    ZF = (al == *(BYTE *)edi);
    if (DF == 0)
        edi++;
    else
        edi--;
    ecx--;
    if (ZF) break;
}
ecx = ~ecx;
ecx--;
Simplifying using al = 0 and DF = 0:
ecx = (unsigned)-1;
while (ecx != 0) {
    ZF = (0 == *(BYTE *)edi);
    edi++;
    ecx--;
    if (ZF) break;
}
ecx = ~ecx;
ecx--;
Things to note:
~ecx is equivalent to -1 - ecx.
ecx is decremented before the loop breaks, so it decrements by length(edi) + 1 in total.
ecx can never reach zero in the loop, since the string would have to occupy the entire address space.
So after the loop above, ecx contains -1 - (length(edi) + 1), which is the same as -(length(edi) + 2). Flipping the bits gives length(edi) + 1, and the final decrement gives length(edi).
Or rearranging the loop and simplifying:
const char *s = edi;
size_t c = (size_t)-1;     // c == -1
while (*s++ != '\0') c--;  // c == -1 - length(edi)
c = ~c;                    // c == length(edi)
And inverting the count:
size_t c = 0;
while (*s++ != '\0') c++;
which is the strlen function from C:
size_t strlen(const char *s) {
    size_t c = 0;
    while (*s++ != '\0') c++;
    return c;
}