Reverse-Engineering a CRC Algorithm

By Gregory Ewing
greg.ewing@canterbury.ac.nz
http://www.cosc.canterbury.ac.nz/greg.ewing
March 2010

I had occasion to want to write a program to make some changes to some files that were protected by a CRC. The program that created the files was closed-source and too big for disassembly to be practical, so I wondered whether it would be possible to work out the polynomial and other parameters being used to generate the CRC from examining examples of the files.

After a rather long and interesting journey, I did find a way. I'm writing this essay to document the techniques that I developed, in case anyone else finds them useful.

CRC Parameters

First, a bit of background. I will be discussing CRC algorithms in terms of what is known as the "Rocksoft model", described in "A Painless Guide to CRC Error Detection Algorithms" by Ross N. Williams. If you're not familiar with the way CRC generation algorithms work, it would be a good idea to go and read that first before coming back here.

The relevant parameters can be summarised as follows. Here I'm using the names that the Python program pycrc uses for them.

Width - the number of bits in the final CRC result.
Poly - the polynomial being used to generate the CRC, expressed as a bit string. (I will omit the leading 1 bit that is always understood to be present.)
ReflectIn (boolean) - Whether to reverse the input bytes before applying the algorithm to them. (This is equivalent to processing the input bytes LSB-first instead of MSB-first.)
XorIn - the initial value of the CRC register.
ReflectOut (boolean) - whether to reverse the bits of the CRC before presenting it as the final result. (This is equivalent to mirroring the algorithm so that the register is shifted right instead of left.)
XorOut - a value to be exclusive-ored with the final CRC value.

There are a number of equivalent ways in which a CRC algorithm can be implemented. I will be discussing things in terms of a so-called "direct" bit-by-bit implementation, in which incoming bits are xored with the leading bit shifted out of the CRC register, rather than being shifted through the register.

The Data Files

The files concerned were user-defined data tables for a proprietary application having a database component. Examination of the files revealed that they consisted of three parts: a 48-byte header, a description of the schema, and the data records. The schema section appeared to be padded with zeroes to bring the header plus schema up to a multiple of 512 bytes. The test files I was working with were newly-created tables containing no records, so the end of the schema was also the end of the file.

There were two bytes in the header that changed in an apparently random fashion, which I surmised was a 16-bit CRC. I was assuming it was a CRC rather than some other kind of checksum, because the application would report a "CRC mismatch" when attempting to open a modified file.

Since this was on Windows, it also seemed a fair bet that the CRC would be stored little-endian, and this later proved to be correct.

Another thing I needed to know was how much of the file the CRC covered. Most of the header appeared to be unused, and I found that I could change the last byte of the header without the application complaining. However, if I changed the last byte in the schema, it complained of a CRC mismatch. I concluded that the CRC covered everything in the schema area, including the pad bytes. (This turned out to be almost correct, but not quite.)

Initial Attempts

From some googling and newsgroup enquiries, it appeared at first that there would be no easy way of finding out the CRC parameters, and I would have to rely on trial and error.

I found a program, pycrc, that among other things allows you to calculate a CRC for supplied data given the above parameters. The first thing I did was to try all the standard 16-bit CRC models that pycrc knows about, none of which worked.

Next I considered trying a brute-force search of all possible 16-bit polynomials. This seemed feasible, although not quite as straightforward as it might appear at first, because of the XorIn and XorOut parameters. Without knowing them, I wouldn't be able to tell when I'd found the right polynomial, and trying all possible combinations of Poly+XorIn+XorOut would obviously not be practical.

Fortunately, there are some properties of CRC algorithms that provide a way of getting around this difficulty.

Some CRC Theory

Because the whole CRC algorithm is based on exclusive-or operations, CRCs obey a kind of superposition principle. You can think of the CRC as being made up of the exclusive-or of a set of component CRCs, each of which depends on just one bit in the message.

We can also treat the initial value of the CRC register (the XorIn parameter) as generating a component of its own that gets xored into the final CRC. So we can express the CRC of a message in terms of three components:

C = TⁿI + D + F

where + represents exclusive-or, and

I = the initial register contents (XorIn)
F = the final xor value (XorOut)
n = the length of the message
D = the homogeneous CRC of the data in the message
Tⁱ x = the result of applying i shift-xor steps to register contents x with zero data

Here I'm borrowing a term from linear algebra and using "homogeneous CRC" to mean a CRC calculated using XorIn = XorOut = 0.

Now suppose we take two messages of the same length and exclusive-or them together. The exclusive-or of their CRCs is given by

C₁ + C₂	=	(TⁿI + D₁ + F) + (TⁿI + D₂ + F)
	=	D₁ + D₂

The I and F terms have cancelled out, leaving us with just the homogeneous CRC of the exclusive-ored data. So we can ignore the XorIn and XorOut parameters initially, as long as we confine ourselves to studying this kind of message, which I will call a difference message.

Applying some Brute Force

I constructed a difference message, and tried all possible 16-bit polys on it (there are actually only 2**15 of them, because it doesn't make sense to use a poly whose LSB isn't 1), and all four combinations of the ReflectIn and ReflectOut parameters. I got some "hits" -- polys that produced the expected CRC.

Encouraged by this, I constructed another message with a different length and did the same thing again. I got some more hits. But unfortunately, none of them matched any of the ones I got the first time.

At this point, I was stuck. I didn't have any more parameters to vary, so I was beginning to doubt that I was dealing with a standard CRC algorithm at all. If I wasn't, it seemed unlikely that I would ever be able to figure it out.

Then I got into a conversation with Patrick Maupin, who suggested a test that might help to clarify whether it was a true CRC or not. Due to the superposition principle, if changing a message by xoring it with a bit pattern B1 causes its CRC to change by C1, and another bit pattern B2 causes the CRC to change by C2, then xoring the message with (B1 xor B2) should change the CRC by (C1 xor C2). If that doesn't happen, the algorithm can't be an ordinary CRC algorithm.

I constructed some suitable test messages, and found that this property did seem to hold. So there was hope that it could be a standard CRC algorithm, or something very similar to it.

I also began to wonder what I could learn by studying what happens to the CRC when individual bits of the file are changed. At first I thought that the CRC corresponding to a single 1 bit might be a rotation of the polynomial, but it turns out to be more complicated than that. However, after some more thinking along those lines, I came up with a possible plan of attack.

Deducing the Polynomial

Consider what the CRC algorithm does when applied to a message containing a single 1 bit with the rest zeroes. We are considering difference messages here, so the register starts off with a value of zero. As long as the input bits are zero, the register remains zero.

When the 1 bit arrives, the polynomial gets loaded into the register. After that, it gets transformed by the shift-xor operation once for each remaining 0 bit in the message, with the leading bit of the register determining whether to xor the polynomial in or not.

So the final CRC value will be equal to the polynomial transformed by n shift-xor cycles, where n depends on the position of the 1 bit in the message.

Now consider two CRC values obtained from two 1-bit messages, where the 1 bits are in adjacent positions. The resulting CRCs will differ by just one shift-xor cycle. To be precise, if C1 corresponds to the message with a 1 in position i, and C2 corresponds to the message with a 1 in position i+1, then C1 is derived from applying one shift-xor cycle to C2. (If this seems backwards, it's because the further the 1 bit is from the end of the message, the more shift-xor cycles get applied to the CRC.)

There are two possibilities. If the leading bit of C2 (the one about to be shifted out) is 0, then C1 will be equal to C2 shifted by one place. If it is 1, then C2 will be equal to C1 shifted one place and xored with the polynomial.

So all we need to do is find a C1 and C2 such that the leading bit of C2 is 1, shift C2 by one place, and xor it with C1. The result will be equal to the polynomial!

Putting Theory into Practice

All this seemed almost too magical to be true, so I had to try it out. Although I didn't have complete control over the contents of the files, I was able to construct a set of 1-bit difference messages for several adjacent bits, spanning two consecutive bytes somewhere in the middle of the data.

The byte values I came up with, and their corresponding CRC values (after byte swapping to correct for little-endianness) were as follows:

Bytes CRC
----- ----
02 00    763c
04 00    ec78
08 00    98f3
10 00    71e5
20 00    e3ca
40 00    8797
80 00    4f2d
00 01    9e5a
00 02    7cb7
00 04    f96e
00 08    b2df
00 10    25bd

Working upwards through the list of CRCs, it is apparent that whenever the LSB of a CRC is 0, the preceding one is obtained by simply right-shifting it, which is consistent with the theory outlined above.

Now let's see if we can extract the polynomial. Applying the shift-xor operation to adjacent CRC values gives the following results, shown in blue:

02 00 763c
     0000
04 00 ec78
     a001
08 00 98f3
     a001
10 00 71e5
     0000
20 00 e3ca
     a001
40 00 8797
     a001
80 00 4f2d
     0000
00 01 9e5a
     a001
00 02 7cb7
     0000
00 04 f96e
     a001
00 08 b2df
     a001
00 10 25bd

From this it is clear that the polynomial is 0xa001.

A few other things are also apparent. The shifting direction indicates that the ReflectOut parameter should be True, since shifting to the right is equivalent to using the canonical left-shifting version of the algorithm with the polynomial 0x8005 and then reflecting the resulting CRC. It is notable that 0x8005 is one of the standard 16-bit polynomials -- the one that pycrc calls "crc-16".

The fact that the 1 bits proceed from right to left within each byte as we go down the sequence indicates that bits are being processed LSB-first. In terms of the model parameters, this means ReflectIn = True.

I had now established all except two of the CRC algorithm parameters. I was making considerable progress!

Clearing Up some Puzzles

At this point there were some puzzling things that I needed to sort out. Having just discovered that one of the standard polynomials was apparently being used after all, I couldn't understand why I hadn't found it during my initial experiments with pycrc.

The first thing I did was go back and try pycrc again, in case I had made a mistake of some kind the first time around. Armed with knowledge of the polynomial, I should have been able to run pycrc over one of my difference messages and obtain the correct CRC for it. But, it still didn't work.

I was convinced by now that I was dealing with a very standard CRC algorithm, so this failure was perplexing. Thinking about what could cause it, it occurred to me that my assumption about what region of the file was covered by the CRC might be wrong. Perhaps not all of the pad bytes were included, in which case I was going too far and ending up with the wrong CRC.

Trying to think how I could tell how far I should be going, I came up with the following idea. Start by initialising the register with the polynomial -- this corresponds to the state just after encountering the 1 in a 1-bit difference message. Then run the algorithm and count the number of steps required before the known CRC value is reached. Assuming it was eventually reached, that would tell me how many 0 bits following the 1 were included in the CRC.

The test file I used had a 1 in the byte at offset 0xab. Running the algorithm told me that a further 0x358 zero bytes were needed to reach the correct CRC, meaning that the last byte was at offset 0x403... which was 4 bytes past the end of the file!

So an extra 4 bytes from somewhere were being included in the CRC. To check this, I added another 4 zero bytes to the end of my difference file and ran pycrc over it. This time, I got the correct checksum.

In Search of the Extra Bytes

Where were the extra bytes coming from? One possibility was that they were in the header somewhere. Another was that they were just padding.

Previously I had assumed that the range of checked bytes was contiguous, but now I retracted that assumption and went back to investigate the header area more thoroughly. Poking around in it, I found that apart from the CRC itself, there were three other bytes that would provoke a CRC mismatch if I changed them.

One of them had a constant value of 0x04 in all the files I looked at. The other two bytes appeared to contain the length of the schema portion of the file, excluding the header.

My first thought was that perhaps these were three of the bytes making up the extra data, or that the extra data was derived from them in some way. However, I realised that there was another likely explanation for the CRC depending on the length bytes. If the application was using these bytes to tell how much schema data to read, then changing them would cause it to calculate a CRC for the wrong amount of data. So the contents of the extra bytes needn't have anything to do with the length bytes at all.

There was another curious thing about the extra bytes. The fact that I was able to successfully calculate CRCs for my difference files meant that the unseen contents of these bytes must have been the same for both of the files that went into each difference file. Otherwise, they would have differed by more than one bit, and my technique for deducing the polynomial wouldn't have worked.

This suggested that the extra bytes might be constant, or at least they might depend only on the length of the file and not the data in it. However, it might only have been a coincidence, because I had only been working with files having very small differences, and I might just not have happened to change anything that would trigger a difference in the extra bytes.

It would be fortuitous if the extra bytes were constant, because since they appear at the end of the data being checked, their effect on the CRC depends only on their contents and not on the length of the file. If they were constant, their actual contents wouldn't matter, because I would be able to assume they were zero and compensate for their effect by making an adjustment to the XorOut parameter.

In any case, I had gone about as far as I could go using difference files, and it was time to start working with the original files.

Guessing Parameters

When an XorIn or XorOut parameter is employed, the value almost always used is 0xffff, so the first thing I tried was calculating CRCs for some real files using both 0 and 0xffff for XorIn, and assuming zeroes for the extra bytes. This gave me the XorOut value that I would need to use in order to get a CRC matching the official one in the file.

The XorOut values that I obtained this way were not 0 or 0xffff, but this was to be expected, because the actual contents of the extra bytes may not have been zero. However, all of the files of a given length seemed to require the same XorOut value, providing more support for the idea that the extra bytes are the same for all files of a particular length.

To further check this, I created some more test files with very different contents. The hypothesis still seemed to hold up -- the XorOut values required for a particular length was always the same.

I needed different XorOut values for different lengths, however. This could have been because the extra bytes varied somehow according to the length of the file, but it could also have been because a non-standard XorIn was being used. This would get transformed in different ways as it passed through different numbers of shift-xor steps and change the resulting CRC.

I decided to stick with the hypothesis that the extra bytes were constant, and try to find an XorIn value consistent with it. This meant finding a single pair of XorIn/XorOut values that would work across files of different sizes.

Now, I could have used brute force again here. There are only 2**16 possibilities, because choosing an XorIn for some file uniquely determines the XorOut needed, and then it's just a matter of trying the same combination on another file and seeing if it works.

However, I wanted to see if I could come up with a more direct approach, partly for the challenge, and partly because it would give me a technique that could be applied to larger CRCs for which exhaustive searching would not be feasible.

Deducing XorIn

Time for some more theory. Earlier we considered exclusive-oring together two messages of the same length. Now let's look at what happens if we exclusive-or two messages of different lengths together.

Let n1 and n2 be the lengths of the messages. The exclusive-or of their CRCs will be

C₁ + C₂	=	(Tⁿ¹I + D₁ + F) + (Tⁿ²I + D₂ + F)
	=	Tⁿ¹I + Tⁿ²I + (D₁ + D₂)

Notice that F has cancelled out again, but not I. We can further rearrange the equation to give

Tⁿ¹I + Tⁿ²I

C₁ + C₂ + D₁ + D₂

(Tⁿ¹+ Tⁿ²) I

where K = C₁ + C₂ + D₁ + D₂. We now have an equation in which I is the only unknown quantity.

Adding Tⁿ¹and Tⁿ² together may look a bit strange, but it makes perfectly good sense when you realise that T is a linear operator, and can be manipulated using the techniques of linear algebra.

In particular, we can represent T using a matrix, and calculate Tx using matrix multiplication with mod-2 arithmetic.

To calculate T, we decompose x into a sum of basis elements b_i, each of which corresponds to one bit: b₁ = 0x0001, b₂ = 0x0002, etc. Then feed each b_i through one step of the shift-xor operation. The result becomes column i of the matrix T.

Having got T, we can calculate M = Tⁿ¹+ Tⁿ². All that remains then is to solve the matrix equation M I = K.

Solving the Equation

Solving a matrix equation like this is a standard linear algebra problem, so it seemed like it should be possible to use an algorithm such as Gauss-Jordan elimination, modified to use mod-2 arithmetic.

When I tried to do this, the elimination algorithm got stuck, unable to find a pivot element for the last row. Examining the matrix revealed that it had ended up with an all-zero row. When this happens, it is an indication that the equation has more than one solution.

The final state of the matrix looked like this:

0: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
3: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
7: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
9: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
10: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
11: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
12: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
13: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
14: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
15: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This corresponds to the following set of equations for the bits of I:

b₀ + b₁₅	=	0	b₈	=	0
b₁	=	1	b₉	=	0
b₂	=	1	b₁₀	=	0
b₃	=	0	b₁₁	=	0
b₄	=	0	b₁₂	=	1
b₅	=	0	b₁₃	=	0
b₆	=	1	b₁₄ + b₁₅	=	1
b₇	=	0

By inspection, there are two bit patterns that satisfy these constraints, 0x5046 and 0x9047. Trying them on the relevant files revealed the corresponding XorOut values required:

XorIn		XorOut
5046		3e64
5047		fe65

I tried these values on the rest of my test files, and both combinations worked for all of them, regardless of the file size.

An Aside

It's interesting to note that the difference between these two XorIn values (0xc001) is also the difference between the corresponding XorOut values.

It turns out that this is not a coincidence, because 0xc001 is a fixed point of the shift-xor operation: (0xc001 >> 1) ^ 0xa001 == 0xc001. So if you change the initial register value by 0xc001, you change its value on all subsequent steps by 0xc001 also.

In terms of linear algebra, the value e = 0xc001 is an eigenvector of the matrix T with eigenvalue 1, i.e. T e = e. This also means that Tⁿ e = e for any n. So (Tⁿ¹ + Tⁿ²) (x + e) = (Tⁿ¹ + Tⁿ²) x + e + e = (Tⁿ¹ + Tⁿ²) x.

Consequently, if X is a solution of (Tⁿ¹ + Tⁿ²) x = K, then X + e must also be a solution.

Success

Encouraged by this, I tried one of these pairs on a collection of about 50 real files collected from the wild. It worked for all of them.

I never found out exactly what was going on with the extra bytes, or why such strange-looking XorIn/XorOut values were needed, but I had found an algorithm which seemed to be good enough, and I was satisfied with that.

Conclusion

As well as solving my immediate problem, I had also developed a general method for determining all of the Rocksoft parameters for a CRC algorithm, given a small collection of specially-chosen example files. My case was complicated by the presence of unknown data, but if the data being checked is fully known, the method requires no searching or guesswork and can be used for CRCs of any size.

So if I ever need to figure out a 64-bit CRC, I'll be able to do it without needing a Blue Gene. And now that you've read this, you will too!