This is archived content, mostly untouched since 2003. For newer content/updated versions, see netfuture.ch.


CRC Tutorial: Other Forms of Checking

[ Tutorial/Introduction | Cyclic Redundancy Check ]

Background: Asynchronous Serial Communications (RS-232C)

idle  S   B0  B1  B2  B3  B4  B5  B6  P   S   idle
 1    0   0   1   1   0   0   1   0   1   1    ?

Serial line signal: empty line, start bit (S), 7 data bits (B0-B6), parity (P), stop bit (S), (empty line)

One of the oldest forms of computer-based error checking is the use of parity bits, predominantly used in asynchronous serial line communications, such as the RS-232 connection used between computer and modem. For RS-232 and some other serial asynchronous protocols, each symbol (cf. upper half of the image) is preceded by a start bit (S) indicating the beginning of the symbol, a series of data bits (B0-B6, least-significant bit first; i.e., the bit sequence shown actually encodes the seven-digit binary (base-2) number 0100110₂), an optional parity bit (P) to provide limited transmission verification, and 1-2 stop bits (S) concluding the symbol. The value of the start bit (cf. bottom half of the image) is the opposite of the "idle" level which is visible on the line if no symbol is currently transmitted. The transition from "idle" to "start" alerts the receiver to the incoming symbol. The stop bit(s) have the same value as the "idle" signal. Their purpose is to guarantee a minimum idle period, such that the next start bit is guaranteed to produce a recognizable idle-to-start transition. To further facilitate this recognition, 1.5 stop bits were popular in earlier times: as half bit-lengths could not occur within a symbol, the line level must belong to a stop bit. The start and stop bits are also referred to as framing.
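The framing described above can be sketched in a few lines of Python. This is an illustration only, assuming the conventions of the figure (idle level 1, start bit 0, even parity, one stop bit); the function name frame_symbol is made up for this sketch.

```python
def frame_symbol(value, data_bits=7):
    """Frame a symbol for asynchronous serial transmission:
    start bit (0), data bits LSB first, even parity bit, stop bit (1)."""
    bits = [(value >> i) & 1 for i in range(data_bits)]  # LSB first
    parity = 0
    for b in bits:
        parity ^= b  # even parity: XOR of all data bits
    return [0] + bits + [parity, 1]

frame_symbol(0b0100110)  # the symbol from the figure above
```

For the value 0100110₂ this yields the bit sequence shown between the two idle periods in the figure: 0 0 1 1 0 0 1 0 1 1.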

Parity Bits

XOR of data bits leads to even parity

Parity comes in two flavors, even and odd. Even parity is defined such that the sum of all transmitted bits in a symbol (including the parity bit) is even; odd parity is defined correspondingly. That is, the even parity bit is the exclusive-or of all the data bits (see image on the right), while the odd parity bit is its inverse. Consider the bit sequence 1010110₂. It already contains an even number of ones (four, to be precise), thus the even parity bit would be 0 (otherwise, it would make the entire symbol contain an odd number of ones); vice versa for odd parity.1
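The exclusive-or definition can be sketched directly (a minimal illustration; even_parity is a name chosen for this sketch):

```python
def even_parity(bits):
    p = 0
    for b in bits:
        p ^= b  # exclusive-or of all data bits
    return p

bits = [1, 0, 1, 0, 1, 1, 0]  # the sequence 1010110, four ones
even_parity(bits)             # → 0: the number of ones is already even
1 - even_parity(bits)         # odd parity is the inverse → 1
```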

Error detection capability: Adding a single parity bit to a code of length b bits makes a code of length b+1 bits. The number of valid codewords remains at 2^b, even though the number of possible codewords has doubled to 2^(b+1). In other words, only half of the possible codewords are still valid: we have added 1 bit of redundancy. This implies that a random codeword will pass the parity test with a 50% chance. If ten random codewords have been received, the probability is only (1/2)^10 = 1/1024 that all pass the test and none raises the "parity mismatch" flag.
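That exactly half of all codewords pass the parity test can be verified by brute force; the following sketch assumes b = 7 data bits, as in the serial-line example.

```python
b = 7
valid = sum(
    1 for word in range(2 ** (b + 1))   # all possible (b+1)-bit codewords
    if bin(word).count("1") % 2 == 0    # even-parity test passes
)
# valid equals 2 ** b: exactly half of the 2 ** (b + 1) possibilities
```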

But the main property is that the receiver can reliably detect whether a single bit has been flipped during transmission of a codeword (even though it will not be able to find out which bit it was): if we flip any of the bits (data or parity3), the number of bits that are set to one increases or decreases by one. As a result, the count will change between even and odd, failing the parity check. Note that it does not matter whether we change a data bit or the parity bit; the effect is the same.

This property is achieved because the 50% of the codewords that are illegal are interspersed into the legal code words such that any single-bit change in a legal code word will result in an illegal codeword and vice versa. As a consequence, any two legal codewords are at least two modifications away, giving this code a Hamming distance (least number of modifications between any two codewords) of two.

Typical error modes: Typical errors in such environments are misrecognition of data bits or line noise as start bits, line noise interfering with data bits, and bit-rate (bits per second, in such "ASYNC" environments often also known as "baud rate") mismatch between sender and receiver. Parity bits are helpful in these scenarios, even though they do not catch every problem. When reliable communication is required, such as when transferring program files, additional mechanisms need to be added to increase the reliability, such as checksums or CRCs.

Checksums

While parity is adequate to protect a single character or byte, its detection capabilities lack power when applied to larger messages: messages often span thousands of bits, and if just two bits are flipped, the corruption will not be detectable. And the chance of having multiple bit errors within a message increases rapidly with the message length.4

Therefore, there is a need for something more powerful. Parity is already a simple form of a checksum (it sums up all the bits in the data, modulo 2). The natural extension is to create a more powerful checksum. Typical versions are summing up the bytes (or 2-/4-byte words) in the message into a checkbyte (or 2-/4-byte checksum). This is again done modulo 256 (or 2^16 or 2^32, respectively), which can essentially be implemented by ignoring carry and overflow conditions in the addition.
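The byte-wide variant can be sketched as follows (checksum8 is a name invented for this illustration):

```python
def checksum8(data):
    """Sum the bytes of a message modulo 256."""
    s = 0
    for byte in data:
        s = (s + byte) & 0xFF  # modulo 256: carries are simply dropped
    return s

checksum8(bytes([250, 10]))  # 260 wraps around to 4
```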

Error detection capability: Even though the checksum is much larger than the parity bit and it generally detects many more errors than parity, it is not able to detect all 2-bit modifications to the message: if an error clears a bit at position i in one word but sets bit i in another word, the sum of these two words, and thus the total checksum, remains the same. But if two bits at different positions are flipped (in the same word or in different words), the error will be detected.

As a consequence of the undetected 2-bit modification above, we can also exchange entire words, especially swap adjacent words, and the result will pass the checksum test. (In effect, both the same-bit-swapping and the word-swapping are due to the commutative nature of the addition and will also apply to any other check method with commutative properties.)
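Both undetected failure modes can be demonstrated in a short sketch (the message bytes and the helper name checksum8 are invented for this illustration):

```python
def checksum8(words):
    return sum(words) & 0xFF  # byte-wide checksum, modulo 256

msg = [0x53, 0x21, 0x40]
good = checksum8(msg)

# 2-bit error: bit 1 is cleared in one word and set in another,
# so the sum (and thus the checksum) does not change
bad = [msg[0] & ~0x02, msg[1] | 0x02, msg[2]]

# word swap: addition is commutative, so the order does not matter
swapped = [msg[1], msg[0], msg[2]]
```

Both checksum8(bad) and checksum8(swapped) equal the original checksum, although the messages differ.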

Intermission: Classes of errors: For comparison of error detection/correction capabilities, two properties are prevalent: the number of random errors (how many one-bit errors the code is guaranteed to handle, independent of where in the message the errors are introduced) and the size of the largest burst error (in a burst, any non-zero number of distinct bits can be changed, but they have to be closer together than the burst size).

Checksums that are b bits wide are only guaranteed to detect a single random bit error (recall that two well-placed errors can go undetected) and a burst of size b: As long as the error changes only bits in a single word, it will be detected, as it cannot cancel itself out. The same still applies when the boundaries of the burst are no longer limited to coincide with the word boundaries, resulting in an upper part of one number and a lower part of an adjoining number being modified. As long as the two parts do not overlap, no subset of the bits to be modified can cancel each other out. A burst of size b+1 could go undetected, if only its border bits would be flipped (this again is the undetectable 2-bit modification).

The Internet protocol suite (IP, TCP, and UDP, among others) uses a variant of this basic checksum, the one's complement sum, which wraps the carry from the most significant digit around to the least significant digit. The influence on the detection capabilities is marginal, and the weaknesses of this checksum are widely discussed; for example, a paper by Stone and Partridge provides a good experimental analysis.
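The carry wrap-around can be sketched with the folding technique described in RFC 1071 (the function name is ours; a real implementation would also handle byte order and odd-length data):

```python
def internet_checksum(words):
    """One's complement sum of 16-bit words, then complemented,
    as used by the IP, TCP, and UDP checksums (cf. RFC 1071)."""
    s = 0
    for w in words:
        s += w
        s = (s & 0xFFFF) + (s >> 16)  # wrap the carry around
    return ~s & 0xFFFF
```

A useful property: recomputing the one's complement checksum over the data plus the transmitted checksum yields zero, which is how receivers verify a packet.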

Check Digits

While reordering of words may be uncommon in data communication networks, it is quite common for humans to swap digits in large numbers consisting of seemingly random digits, such as those used in the ISBN system or UPC/EAN product codes. Therefore, a checksum is not suitable for these numbers. A common algorithm is to multiply each digit by a different number before it is added to the sum. In addition to digit substitutions, this helps detect many swapped digits, as the position now becomes significant as well.

Using ISBN as an example, we find the following definition: The number consists of 9 decimal digits, followed by the check digit, which is a decimal digit or X, a digit representing 10. Each of these digits is weighted: the first has weight 10, the second 9, and so on until the last data digit with weight 2 and the check digit with weight 1. The check digit is chosen such that the weighted sum is zero (modulo 11).

ISBN check digit rule

What is the check digit of 3-423-10735-_ (the dashes in the sequence are for grouping only)? We have to ensure that 3*10 + 4*9 + 2*8 + 3*7 + 1*6 + 0*5 + 7*4 + 3*3 + 5*2 + c*1 = 0 (mod 11). Simplified, this is 156 + c = 0 (mod 11). As 156 mod 11 = 2, we get c = 9 and thus an ISBN of 3-423-10735-9.
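The computation above can be sketched in Python (isbn10_check_digit is a name chosen for this illustration):

```python
def isbn10_check_digit(digits):
    """Check digit for the 9 data digits of an ISBN-10, weighted
    10 down to 2, chosen so the weighted sum is 0 (mod 11)."""
    weighted = sum(w * int(d) for w, d in zip(range(10, 1, -1), digits))
    c = -weighted % 11
    return "X" if c == 10 else str(c)  # 10 is written as X

isbn10_check_digit("342310735")  # → '9'
```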

This behavior is much closer to CRCs than you probably would expect. If you understand this, you are a good step into understanding CRCs.

ISBN typo detection: adding 1 to the digit weighted 7 adds (1*7 mod 11) = 7 to the checksum

Error detection capability: As 11 is relatively prime to each of the weights used, the check digit will protect against any single-digit modification. A single check digit is unable to detect all two-digit modifications, the simplest example being the adjustment of the check digit after a modification of a data digit.5
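The single-digit claim is small enough to check exhaustively (a sketch; the variable names are ours):

```python
# Changing a single digit by a nonzero amount delta at a position with
# weight w alters the weighted sum by w * delta (mod 11). Since 11 is
# prime and 1 <= w <= 10, this product is never 0 (mod 11).
all_detected = all(
    (w * delta) % 11 != 0
    for w in range(1, 11)       # every weight used by ISBN-10
    for delta in range(1, 11)   # every possible nonzero change (mod 11)
)
# all_detected is True: no single-digit change leaves the sum unchanged
```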

ISBN swap detection: swapping adjacent digits differing by 2 adds or subtracts 2 to the checksum, depending on which digit was larger

It is obvious that the check digit protects against at least some digit swaps, but does it protect against all? The answer lies in the solution to e*d - f*d = 0 (mod 11), where e and f are the distinct weights of the positions being swapped and d is the (non-zero) difference between the digits. When changing the formula to (e-f)*d = 0 (mod 11), it becomes apparent that with e-f and d both being nonzero and relatively prime to 11, we cannot find a solution fulfilling the formula; therefore any swapping of digits will be detected.
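This swap argument, too, can be verified exhaustively (a sketch with made-up variable names, mirroring the formula above):

```python
# For a swap of two unequal digits at positions with weights e and f,
# the weighted sum changes by (e - f) * d (mod 11), with d the digit
# difference. Since 11 is prime, this is never 0 for nonzero e-f and d.
detected = all(
    ((e - f) * d) % 11 != 0
    for e in range(1, 11)   # weight of the first position
    for f in range(1, 11)   # weight of the second position
    if e != f
    for d in range(1, 11)   # nonzero digit difference
)
# detected is True: every swap of two differing digits changes the sum
```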

Error Correction Codes (ECC)

In addition to the error detection codes discussed so far, there also exist error-correcting codes. For a particular class of errors, these not only raise the flag that an error has occurred, they are able to indicate where the bad bit(s) are. This tutorial does not deal with ECCs, as they are generally even more complex than CRCs and frequently are based on Reed-Solomon codes. However, the interested reader may find single-bit-correcting Hamming codes to be relatively easy to understand and a fascinating piece of information, which you might plan to casually introduce in your next geeky small talk.

Footnotes

1 The number of data bits, the type of parity, and the number of stop bits are often encoded in a form [Data bits][Parity type][Stop bits], where [Data bits] is typically between 5 and 8, inclusive; [Parity type] is one of E (even), O (odd), N (none: no parity bit is transmitted), M (mark: send a one), S (space: send a zero); [Stop bits] is one of 1, 1.5, or 2. In the age of keyboard/text terminal communication, 7E1 (7 data bits, an even parity bit, 1 stop bit) was common2. Now that a lot of binary data (or national characters) are transmitted, 8N1 (8 data bits, no parity bit, 1 stop bit) is predominant.

2 One of the error conditions discussed is the possibility of line noise in "idle" state, which may be misinterpreted as a start bit. To avoid accepting the following idle bits as a character, parity can also be used. When the number of data bits is odd, the parity should be chosen odd as well: the false character will have an odd number of ones, followed by a parity bit which is read as one, resulting in an even number of ones and so failing the parity test. For even numbers of data bits, this property is achieved by choosing even parity. (Correct parity selection when idle is zero and the start bit is one is left to the reader.) Why then was 7E1 much more popular than 7O1? I do not know the answer, but I imagine that the lack of ambiguity in the notation (O (Oh) vs. 0 (zero)) and the saving of an inverter in the circuitry may have contributed to the choice.

3 Only the data bits are protected by the parity, the framing information is not. But flipping any of the framing bits generally results in a framing error. It is important to note that any error detection mechanism also needs to be able to handle modifications of the parity bit.

4 For analysis of error behavior, it is often assumed that bit errors are distributed using an independent identical random process. When the chance of any single bit being flipped is p, the chance of a message of size m not being corrupted is (1-p)^m, an exponentially decreasing function of the message size. The probability of the message surviving with at most one bit error is p1 = (1-p)^m + m*p*(1-p)^(m-1), resulting in a probability of 1-p1 of getting more than the one error which parity can detect.

5 Most modifications to one data digit can be compensated for by a modification to another data digit; it is not necessary to resort to updating the check digit. It is only "most", not "all", as only the check digit allows for all necessary 11 values; the data digits lack the X.


Copyright 2004 Marcel Waldvogel.