By Andreas Schuster
Copyright © 2008 int for(ensic){blog;}. All rights reserved. Reproduction for commercial purposes (including online advertisement) interdicted.
The introductory post already provided some indications of (dis-)similarities in "independent" implementations of a certain function by three vendors. In this post I'm comparing two of the three implementations. Also I will reveal the identities of their vendors.
The reference implementation is the Microsoft Windows NTLDR version 5.1.2600.2180 (XP SP2). I've also checked with an earlier version (5.1.2600.1106), but didn't find any significant differences. So I'll stick with that version.
Vendor "S" is Sandman, an open source project by Nicolas Ruff and Matthieu Suiche. They released their code under version 3 of the GNU Public License on February 26, 2008.
The function in question is "XpressDecode". It is used to decompress the hibernation file. The routine is called by "HbReadNextCompressedBlock". It then calls an anonymous decoder routine. I'll stick with Sandman's nomenclature and call it "Decode" henceforth.
For your convenience, here's the flow graph from the introductory post again. Microsoft's code is shown to the left and the implementation by the Sandman project in the middle.

There seem to be significant differences between both implementations. So I decided to analyze the original function on my own and compare the results with the code from the Sandman library. Also, I analyzed Sandman's hibinfo.exe sample program in the same way as NTLDR for later usage. Unless otherwise noted, line numbers refer to file source/src/compression.c of the Sandman library. If you're not interested in (admittedly very technical and boring) details you might want to skip down to the conclusions.
Now on to the differences:
No. 1: In file source/include/compression.h Sandman declares the _DECODE_DATA structure. It is 0x38 bytes in size.
Microsoft's implementation reserves 0x3c bytes on the stack for local variables. The differing 4 bytes are not accessed, though. So I conclude that the structure's size is 0x3c bytes. By chance the last DWORD is reserved for future use.
No. 2: Sandman initializes its DecodeData variable with null bytes, as it is common practice among programmers (line 72). Microsoft, however, does not.
No. 3: Sandman then continues and assigns to DecodeData.DataOffset first and then does some comparisons and branches. The Microsoft implementation does it just the other way round: First it checks whether Size1 and CompressedBlockSize are equal and then, if they're not, it assigns to DecodeData.DataOffset. This could save an unneeded assignment operation now and then.
No. 4: Sandman (in line 79) contains a little hack that increments Size2 by 0x108. This is a bit too early. Therefore line 83 will return a wrong value.
No. 5: Now follows a series of conditions. First the implementation from NTLDR in pseudo-code:
if (
(Size1 < CompressedBlockSize) ||
(CompressedBlockSize < 0) ||
(Size1 <=
||
(CompressedBlockSize <
)
{
return -1;
}
if ((Size1 > 0x10000) || (Size2 <= 0))
{
return Size2;
}
This gets somewhat over-simplified in Sandman. Please note that conditions are ordered differently.
if ((Size1 == CompressedBlockSize) || (Size1 > 0x10000) || (!Size2))
{
return Size2;
}
if (
(Size1 < CompressedBlockSize) ||
(!CompressedBlockSize) ||
(Size1 <
||
(CompressedBlockSize <
)
{
return -1;
}
No. 6: Beside the different ordering there were also some conditional jump instructions slightly misinterpreted by the Sandman project, for example
mov esi, [ebp+Size2]
test esi, esi ; bitwise AND of esi with itself
jle return_size2 ; jump if less or equal, Size2 <= 0
was interpreted as (!Size2), but that could be expected to look more like
cmp [ebp+Size2], 0
jnz somewhere ; jump if not zero, Size2 != 0
mov eax, [ebp+Size2] ; return Size2
jmp done
This affects the following expressions:
- (!Size2) instead of (Size2 <= 0) in line 81
- (!CompressedBlockSize) instead of (CompressedBlockSize<0) in line 86
- (Size1<8) instead of (Size1<=8) on line 86
No. 7: Some parts of the Sandman implementation lack a BlockEntry term. This affects the calculation of
- DecodeData.CompressedBlockSize1 in line 33
- DecodeData.CompressedBlockSize3 in line 34
- DecodeData.PageLimit in line 38
- DecodeData.CompressedBlockSize2 in line 39
No. 8: Finally there are again some differences in the ordering of conditions. First Microsoft's implementation:
if (
(DecodeData.Status == 0) ||
(DecodeData.DecodedSize > DecodeData.Size2) ||
(DecodeData.PageOffset > DecodeData.PageLimit)
)
{
return -1;
}
if (DecodeData.Size2 != DecodeData.BufferLimit)
{
return Size2;
}
if (DecodeData.Status2 == 0)
{
return -1;
}
return Size2;
And now Sandman:
if (
(DecodeData.PageOffset > DecodeData.PageLimit) ||
(DecodeData.Status == FALSE) ||
(DecodeData.Status2 == FALSE) ||
(DecodeData.DecodedSize > PtrToUlong(DecodeData.Size2))
)
{
return -1;
}
if (DecodeData.Size2 != DecodeData.BufferLimit)
{
return Size2;
}
return Size2;
Due to the wrong ordering the function will return -1 instead of Size2 in one out of 32 conditions. However, I don't know if this is of relevance in real-world conditions.
No. 9: Also please note that in the Sandman implementation the last comparison is superfluous, as it will return Size2 in any case.
Conclusion: The examined implementation of XpressDecode by the Sandman project differs from Microsoft's implementation in a dozen places. Some differences are marginal, e.g. the initialization of a variable with null bytes. However, other differences are likely to affect the proper decompression of the data stream.
What does that tell us? First, tools could benefit from peer review and (independent) tool testing. Second, bugs and subtle differences may make for a good fingerprint. What do you think are the odds that the very same differences can be found in an implementation that was independently researched and coded?