Reverse-Engineering a CRC Algorithm
By Gregory Ewing
greg.ewing@canterbury.ac.nz
http://www.cosc.canterbury.ac.nz/greg.ewing
March 2010
I
had occasion to want to write a program to make some changes to some
files that were protected by a CRC. The program that created the files
was closed-source and too big for disassembly to be practical, so I
wondered whether it would be possible to work out the polynomial and
other parameters being used to generate the CRC from examining examples
of the files.
After a rather long and interesting journey, I did
find a way. I'm writing this essay to document the techniques that I
developed, in case anyone else finds them useful.
CRC Parameters
First, a bit of background. I will be discussing CRC algorithms in terms of what is known as the "Rocksoft model", described in "A Painless Guide to CRC Error Detection Algorithms"
by Ross N. Williams. If you're not familiar with the way CRC generation
algorithms work, it would be a good idea to go and read that first
before coming back here.
The relevant parameters can be summarised as follows. Here I'm using the names that the Python program pycrc uses for them.
- Width - the number of bits in the final CRC result.
- Poly
- the polynomial being used to generate the CRC, expressed as a bit
string. (I will omit the leading 1 bit that is always understood to be
present.)
- ReflectIn
(boolean) - Whether to reverse the input bytes before applying the
algorithm to them. (This is equivalent to processing the input bytes
LSB-first instead of MSB-first.)
- XorIn - the initial value of the CRC register.
- ReflectOut
(boolean) - whether to reverse the bits of the CRC before presenting it
as the final result. (This is equivalent to mirroring the algorithm so
that the register is shifted right instead of left.)
- XorOut - a value to be exclusive-ored with the final CRC value.
There
are a number of equivalent ways in which a CRC algorithm can be
implemented. I will be discussing things in terms of a so-called "direct"
bit-by-bit implementation, in which incoming bits are xored with the
leading bit shifted out of the CRC register, rather than being shifted
through the register.
The Data Files
The
files concerned were user-defined data tables for a proprietary
application having a database component. Examination of the files
revealed that they consisted of three parts: a 48-byte header, a
description of the schema, and the data records. The schema section
appeared to be padded with zeroes to bring the header plus schema up to
a multiple of 512 bytes. The test files I was working with were
newly-created tables containing no records, so the end of the schema
was also the end of the file.
There were two bytes in the header that changed
in an apparently random fashion, which I surmised was a 16-bit CRC. I
was assuming it was a CRC rather than some other kind of checksum,
because the application would report a "CRC mismatch" when attempting
to open a modified file.
Since this was on Windows, it also
seemed a fair bet that the CRC would be stored little-endian, and this
later proved to be correct.
Another thing I needed to know was
how much of the file the CRC covered. Most of the header appeared to be
unused, and I found that I could change the last byte of the
header without the application complaining. However, if I changed the
last byte in the schema, it complained of a CRC mismatch. I concluded
that the CRC covered everything in the schema area, including the pad
bytes. (This turned out to be almost correct, but not quite.)
Initial Attempts
From
some googling and newsgroup enquiries, it appeared at first that there
would be no easy way of finding out the CRC parameters, and I would
have to rely on trial and error.
I found a program, pycrc,
that among other things allows you to calculate a CRC for supplied data
given the above parameters. The first thing I did was to try all the
standard 16-bit CRC models that pycrc knows about, none of which worked.
Next
I considered trying a brute-force search of all possible 16-bit
polynomials. This seemed feasible, although not quite as
straightforward as it might appear at first, because of the XorIn and
XorOut parameters. Without knowing them, I wouldn't be able to tell
when I'd found the right polynomial, and trying all possible
combinations of Poly+XorIn+XorOut would obviously not be practical.
Fortunately,
there are some properties of CRC algorithms that provide a way of getting around this difficulty.
Some CRC Theory
Because
the whole CRC algorithm is based on exclusive-or operations, CRCs obey
a kind of superposition principle. You can think of the CRC as being
made up of the exclusive-or of a set of component CRCs, each of which
depends on just one bit in the message.
We can also treat the
initial value of the CRC register (the XorIn parameter) as generating a
component of its own that gets xored into the final CRC. So we can
express the CRC of a message in terms of three components:
C = TnI + D + F
where + represents exclusive-or, and
I = the initial register contents (XorIn)
F = the final xor value (XorOut)
n = the length of the message
D = the homogeneous CRC of the data in the message
Ti x = the result of applying i shift-xor steps to register contents x with zero data
Here I'm borrowing a term from linear algebra and using "homogeneous CRC" to mean a CRC calculated using XorIn = XorOut = 0.
Now suppose we take two messages of the same length and exclusive-or them together. The exclusive-or of their CRCs is given by
C1 + C2 | = | (TnI + D1 + F) + (TnI + D2 + F) |
| = | D1 + D2 |
The I and F
terms have cancelled out, leaving us with just the homogeneous CRC of
the exclusive-ored data. So we can ignore the XorIn and XorOut
parameters initially, as long as we confine ourselves to studying this
kind of message, which I will call a difference message.
Applying some Brute Force
I constructed a difference message, and tried all possible 16-bit polys on it (there are
actually only 2**15 of them, because it doesn't make sense to use a poly
whose LSB isn't 1), and all four combinations of the ReflectIn and
ReflectOut parameters. I got some "hits" -- polys that produced the
expected CRC.
Encouraged by this, I
constructed another message with a different length and did the same
thing again. I got some more hits. But unfortunately, none of them
matched any of the ones I got the first time.
At
this point, I was stuck. I didn't have any more parameters to vary, so
I was beginning to doubt that I was dealing with a standard CRC
algorithm at all. If I wasn't, it seemed unlikely that I would ever be
able to figure it out.
Then
I got into a conversation with
Patrick Maupin, who suggested a test that
might help to clarify whether it was a true CRC or not. Due to the
superposition principle, if changing a message by xoring it with a bit
pattern B1 causes its CRC to change by
C1, and another bit pattern B2 causes the CRC to change by C2, then
xoring the message with (B1 xor B2) should change the CRC by (C1 xor
C2). If that doesn't happen, the algorithm can't be an ordinary
CRC algorithm.
I constructed some suitable test messages, and found that this property did
seem to hold. So there was hope that it could be a standard CRC algorithm, or something very similar to it.
I
also began to wonder what I could learn by studying what happens to
the CRC when individual bits of the file are changed. At first I
thought that the CRC corresponding to a single 1 bit might be a
rotation of the polynomial, but it turns out to be more complicated
than that. However, after some more thinking along those lines, I came
up with a possible plan of attack.
Deducing the Polynomial
Consider
what the CRC algorithm does when applied to a message containing a
single 1 bit with the rest zeroes. We are considering difference
messages here, so the register starts off with a value of zero. As long
as the input bits are zero, the register remains zero.
When the
1 bit arrives, the polynomial gets loaded into the register. After
that, it gets transformed by the shift-xor operation once for each
remaining 0 bit in the message, with the leading bit of the register
determining whether to xor the polynomial in or not.
So the final CRC value will be equal to the polynomial transformed by n shift-xor cycles, where n depends on the position of the 1 bit in the message.
Now
consider two CRC values obtained from two 1-bit messages, where the 1
bits are in adjacent positions. The resulting CRCs will differ by just one shift-xor cycle. To be precise, if C1 corresponds to the message with a 1 in position i, and C2 corresponds to the message with a 1 in position i+1,
then C1 is derived from applying one shift-xor cycle to C2. (If this
seems backwards, it's because the further the 1 bit is from the end of
the message, the more shift-xor cycles get applied to the CRC.)
There
are two possibilities. If the leading bit of C2 (the one about to be
shifted out) is 0, then C1 will be equal to C2 shifted by one place. If
it is 1, then C2 will be equal to C1 shifted one place and xored with
the polynomial.
So all we need to do is find a C1 and C2 such
that the leading bit of C2 is 1, shift C2 by one place, and xor it with
C1. The result will be equal to the polynomial!
Putting Theory into Practice
All
this seemed almost too magical to be true, so I had to try it out.
Although I didn't have complete control over the contents of the files,
I was able to construct a set of 1-bit difference messages for several
adjacent bits, spanning two consecutive bytes somewhere in the middle
of the data.
The byte values I came up with, and their
corresponding CRC values (after byte swapping to correct for
little-endianness) were as follows:
Bytes CRC
----- ----
02 00 763c
04 00 ec78
08 00 98f3
10 00 71e5
20 00 e3ca
40 00 8797
80 00 4f2d
00 01 9e5a
00 02 7cb7
00 04 f96e
00 08 b2df
00 10 25bd
Working
upwards through the list of CRCs, it is apparent that whenever the LSB
of a CRC is 0, the preceding one is obtained by simply right-shifting it,
which is consistent with the theory outlined above.
Now let's
see if we can extract the polynomial. Applying the shift-xor operation
to adjacent CRC values gives the following results, shown in blue:
02 00 763c
0000
04 00 ec78
a001
08 00 98f3
a001
10 00 71e5
0000
20 00 e3ca
a001
40 00 8797
a001
80 00 4f2d
0000
00 01 9e5a
a001
00 02 7cb7
0000
00 04 f96e
a001
00 08 b2df
a001
00 10 25bd
From this it is clear that the polynomial is 0xa001.
A
few other things are also apparent. The shifting direction indicates
that the ReflectOut parameter should be True, since shifting to the
right is equivalent to using the canonical left-shifting version of the
algorithm with the polynomial 0x8005 and then reflecting the resulting
CRC. It is notable that 0x8005 is one of the standard 16-bit
polynomials -- the one that pycrc calls "crc-16".
The fact that
the 1 bits proceed from right to left within each byte as we go down
the sequence indicates that bits are being processed LSB-first. In
terms of the model parameters, this means ReflectIn = True.
I had now established all except two of the CRC algorithm parameters. I was making considerable progress!
Clearing Up some Puzzles
At
this point there were some puzzling things that I needed to sort out.
Having just discovered that one of the standard polynomials was
apparently being used after all, I couldn't understand why I hadn't
found it during my initial experiments with pycrc.
The first
thing I did was go back and try pycrc again, in case I had made a
mistake of some kind the first time around. Armed with knowledge of the
polynomial, I should have been able to run pycrc over one of my difference messages and obtain the correct CRC for it. But, it still didn't work.
I
was convinced by now that I was dealing with a very standard CRC
algorithm, so this failure was perplexing. Thinking about what
could cause it, it occurred to me that my assumption about what region
of the file was covered by the CRC might be wrong. Perhaps not all of
the pad bytes were included, in which case I was going too far and
ending up with the wrong CRC.
Trying to think how I could tell
how far I should be going, I came up with the following idea. Start by
initialising the register with the polynomial -- this corresponds to
the state just after encountering the 1 in a 1-bit difference message.
Then run the algorithm and count the number of steps required before
the known CRC value is reached. Assuming it was eventually reached, that would tell me how many 0 bits following the 1 were included in the CRC.
The
test file I used had a 1 in the byte at offset 0xab. Running the
algorithm told me that a further 0x358 zero bytes were needed to reach
the correct CRC, meaning that the last byte was at offset 0x403...
which was 4 bytes past the end of the file!
So
an extra 4 bytes from somewhere were being included in the CRC. To
check this, I added another 4 zero bytes to the end of my difference
file and ran pycrc over it. This time, I got the correct checksum.
In Search of the Extra Bytes
Where
were the extra bytes coming from? One possibility was that they were in
the header somewhere. Another was that they were just padding.
Previously
I had assumed that the range of checked bytes was contiguous, but now I
retracted that assumption and went back to investigate the header area
more thoroughly. Poking around in it, I found that apart from the CRC
itself, there were three other bytes that would provoke a CRC
mismatch if I changed them.
One of them had a constant value of 0x04 in all the files I looked at. The other two bytes appeared to contain the
length of the schema portion of the file, excluding the header.
My
first thought was that perhaps these were three of the bytes
making up the extra data, or that the extra data was derived
from them in
some way. However, I realised that there was another likely
explanation for the CRC depending on the length bytes. If the
application was using these bytes to tell how much
schema data to read, then changing them would cause it to calculate a
CRC for the wrong amount of data. So the contents of the extra bytes
needn't have anything to do with the length bytes at all.
There
was another curious thing about the extra bytes. The fact that I was
able to successfully calculate CRCs for my difference files meant that
the unseen contents of these bytes must have been the same for both of
the files that went into each difference file. Otherwise, they would
have differed by more than one bit, and my technique for deducing the
polynomial wouldn't have worked.
This suggested that the extra
bytes might be constant, or at least they might depend only on the
length of the file and not the data in it. However, it might only have
been a coincidence, because I had only been working with files having
very small differences, and I might just not have happened to change
anything that would trigger a difference in the extra bytes.
It
would be fortuitous if the extra bytes were constant, because since
they appear at the end of the data being checked, their effect on the
CRC depends only on their contents and not on the length of the file.
If they were constant, their actual contents wouldn't matter, because I
would be able to assume they were zero and compensate for their effect
by making an adjustment to the XorOut parameter.
In any case, I
had gone about as far as I could go using difference files, and it was
time to start working with the original files.
Guessing Parameters
When
an XorIn or XorOut parameter is employed, the value almost always
used is 0xffff, so the first thing I tried was calculating CRCs for
some real files using both 0 and 0xffff for XorIn, and assuming zeroes
for the extra bytes. This gave me the XorOut value that I would need to
use in order to get a CRC matching the official one in the file.
The
XorOut values that I obtained this way were not 0 or 0xffff, but this
was to be expected, because the actual contents of the extra bytes may
not have been zero. However, all of the
files of a given length seemed to require the same XorOut value, providing more support for the idea that the extra bytes are the same for all files of a particular length.
To
further check this, I created some more test files with very different
contents. The hypothesis still seemed to hold up -- the XorOut values
required for a particular length was always the same.
I needed different
XorOut values for different lengths, however. This could have been
because the extra bytes varied somehow according to the length of the
file, but it could also have been because a non-standard XorIn was
being used. This would get transformed in different ways as it passed
through different numbers of shift-xor steps and change the resulting
CRC.
I decided to stick with the hypothesis that the extra bytes
were constant, and try to find an XorIn value consistent with it. This
meant finding a single pair of XorIn/XorOut values that would work
across files of different sizes.
Now, I could have used brute
force again here. There are only 2**16 possibilities, because choosing
an XorIn for some file uniquely determines the XorOut needed, and then
it's just a matter of trying the same combination on another file and
seeing if it works.
However, I wanted to see if I could come up
with a more direct approach, partly for the challenge, and partly
because it would give me a technique that could be applied to larger
CRCs for which exhaustive searching would not be feasible.
Deducing XorIn
Time
for some more theory. Earlier we considered exclusive-oring together
two messages of the same length. Now let's look at what happens if we
exclusive-or two messages of different lengths together.
Let n1 and n2 be the lengths of the messages. The exclusive-or of their CRCs will be
C1 + C2 | = | (Tn1I + D1 + F) + (Tn2I + D2 + F) |
| = | Tn1I + Tn2I + (D1 + D2) |
Notice that F has cancelled out again, but not I. We can further rearrange the equation to give
Tn1I + Tn2I | = | C1 + C2 + D1 + D2 |
or
where K = C1 + C2 + D1 + D2. We now have an equation in which I is the only unknown quantity.
Adding Tn1 and Tn2 together may look a bit strange, but it makes perfectly good sense when you realise that T is a linear operator, and can be manipulated using the techniques of linear algebra.
In particular, we can represent T using a matrix, and calculate Tx using matrix multiplication with mod-2 arithmetic.
To calculate T, we decompose x into a sum of basis elements bi, each of which corresponds to one bit: b1 = 0x0001, b2 = 0x0002, etc. Then feed each bi through one step of the shift-xor operation. The result becomes column i of the matrix T.
Having got T, we can calculate M = Tn1+ Tn2. All that remains then is to solve the matrix equation M I = K.
Solving the Equation
Solving
a matrix equation like this is a standard linear algebra problem, so it
seemed like it should be possible to use an algorithm such as
Gauss-Jordan elimination, modified to use mod-2 arithmetic.
When
I tried to do this, the elimination algorithm got stuck, unable to find
a pivot element for the last row. Examining the matrix revealed that it
had ended up with an all-zero row. When this happens, it is an
indication that the equation has more than one solution.
The final state of the matrix looked like this:
0: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
3: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
7: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
9: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
10: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
11: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
12: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
13: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
14: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
15: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This corresponds to the following set of equations for the bits of I:
b0 + b15 | = | 0 | | b8 | = | 0 |
b1 | = | 1 | | b9 | = | 0 |
b2 | = | 1 | | b10 | = | 0 |
b3 | = | 0 | | b11 | = | 0 |
b4 | = | 0 | | b12 | = | 1 |
b5 | = | 0 | | b13 | = | 0 |
b6 | = | 1 | | b14 + b15 | = | 1 |
b7 | = | 0 | |
| | |
By
inspection, there are two bit patterns that satisfy these constraints,
0x5046 and 0x9047. Trying them on the relevant files revealed the
corresponding XorOut values required:
XorIn | | XorOut |
5046 | | 3e64 |
5047 | | fe65 |
I tried these values on the rest of my test files, and both combinations worked for all of them, regardless of the file size.
An Aside
It's interesting to note that the difference between
these two XorIn values (0xc001) is also the difference between the
corresponding XorOut values.
It turns out that this is not a
coincidence, because 0xc001 is a fixed point of the shift-xor
operation: (0xc001 >> 1) ^ 0xa001 == 0xc001. So if you change the
initial register value by 0xc001, you change its value on all
subsequent steps by 0xc001 also.
In terms of linear algebra, the value e = 0xc001 is an eigenvector of the matrix T with eigenvalue 1, i.e. T e = e. This also means that Tn e = e for any n. So (Tn1 + Tn2) (x + e) = (Tn1 + Tn2) x + e + e = (Tn1 + Tn2) x.
Consequently, if X is a solution of (Tn1 + Tn2) x = K, then X + e must also be a solution. |
Success
Encouraged by this, I tried one of these pairs on a collection of about 50 real files collected from the wild. It worked for all of them.
I
never found out exactly what was going on with the extra bytes, or why
such strange-looking XorIn/XorOut values were needed, but I had found
an algorithm which seemed to be good enough, and I was satisfied with
that.
Conclusion
As
well as solving my immediate problem, I
had also developed a general method for determining all of the Rocksoft
parameters for a CRC algorithm, given a small collection of
specially-chosen example files. My case was complicated by the presence
of unknown data, but if the data being checked is fully known, the
method requires no searching or guesswork and can be used for CRCs
of any size.
So
if I ever need to figure out a 64-bit CRC, I'll be able to do it
without needing a Blue Gene. And now that you've read this, you will
too!