RC4 explained – with C implementation

Symmetric cryptography algorithms are utilized when a single key is used to both encrypt and decrypt data. RC4 (Ronald’s Code Version 4) designed in 1987 and published in 1994 is one of such algorithms which performs relatively fast and for many years had widespread usage in cryptography protocols such as in SSL/TLS and also in WiFi security. (In the recent years it has been suggested to be deprecated or be replaced by some revised versions but still in extensive use)

Although very popular and effective,  the algorithm itself is relatively simple (but smart). In this post we step through different stages to encrypt and decrypt data using RC4.

  • RC4 is a stream cipher algorithm. The bytes are encrypted one by one and decrypted one by one.
  • The encrypted and decrypted version of data in RC4 scheme are equal in size.
  • The key is variable in size and is between 1 to 256 bytes long.
  • The algorithm strength is only based on permutations of a vector using which the overall encryption/decryption operations are performed.
  • The algorithm shows fast running in software and so is appropriate for the occasions that the encryption/decryption must be done within short time limits to avoid the users feeling delays. One of the most familiar examples would be in TLS communication.
  • The algorithm is generally resistant against cryptanalysis attacks.

Cryptanalysis

Sometimes it’s possible to decrypt some encrypted data without having the associated key. This may be possible by by analyzing the cipher data and leveraging some known characteristics of the corresponding plain text. Consider an example where we use a simple alphabetic map to encrypt plain English text. The encryption/decryption in this example works by substituting each letter with what is specified in the map for that letter. So if our map is something like this:

Plain:    a b c d e f g h i j k l m n o p q r s t u v w x y z
Cipher: u h z y m j w c f v k x n r q d s b i p l a g e t o

Consequently encrypting a sample text like “Weather is hot today” will give this result as the corresponding cipher text:
“gmuicmbficqppqyut”
At the first look it might seem hard to guess the plain text (i.e encrypt the cipher) without having the related map using which the cipher text is generated. But since in this encryption scheme an invariant character is always used to replace another specific character (for example ‘a’ is always replaced by ‘u’ and vice versa) breaking the encryption is possible by cryptanalysis attacks. The reason is that the cipher text shares common nature characteristics with the plain language. These characteristics are related for example to the frequencies of characters in English or other regularities. Ergo if character “t” includes approximately 20% of the characters in an English text then its superseded character (which is “p” here) will also include almost 20% of the characters of a cipher text generated using this encryption system. Now if the attacker knows these relative frequencies of the characters in English texts, by calculating the frequencies of the characters of the generated cipher texts, he can correlate each cipher character to its plain version. The longer the cipher text is, the more accurate the attack can be performed. In the figure below you can see the relative frequencies of all characters in English[1]:

Having this, if the attacker sees for example the character “y” includes 12% of the cipher text he can infer this character is mapped to “e”. Continuing this strategy he can build the complete map and thus break the encryption.

This kind of encryption is usually simple to break since it reflects the same regularities of the original alphabet. But we see that this problem doesn’t exist for RC4 algorithm since a same byte maps to different values based on the permutation of S vector.

RC4 algorithm

A 256-byte state vector is used to hold the permutation of all 8-bit numbers (0 to 255). The vector is initialized to a permutation based on the key which can be 1 to 255 bytes wide. To calculate this initial permutation we use a temporary vector named “T”. T is also 256 bytes wide and contains a repeated sequence of key bytes. for example if key is “1234” then T would be “123412341234…..”. The key and also T are only used for the initialization phase and the encryption/decryption process is only based on S. This process includes XORing a value selected from S with the next byte of input; generating the next cipher/plain byte. As you notice, the process generates output, by reading one byte at a time, calculating one byte as the result. This is why we call these algorithms stream oriented as opposed to block cipher algorithms which work on blocks of data not single bytes.

The initial permutation is done using pseudo algorithm below:

 

 for i = 0 to 255
   S[i] = i;
   T[i] = Key[i mode KeyLength]

 temp = 0
 for i = 0 to 255
   temp = (temp + S[i] + T[i]) mod 256
   swap(S[i],S[temp])

After that we need neither T nor Key. Stream generation is done using this pseudo algorithm:

 i=0
 j=0
 while (true)
   i = (i + 1) mod 256 
   j = (j + S[i]) mod 256
   swap(S[i],S[j])
   t = (S[i] + S[j]) mod 256
   val = S[t]

val  is the output on each iteration and it is XORed against the input byte to produce the next output byte. So to encrypt, the next input byte which is considered plain data is XORed with val to produce the next cipher byte and to decrypt, the next input byte is XORed with val to produce the next plain byte. (If number A is XORed with number B twice, the result is A, i.e (A xor B) xor B = A)

To be able to produce the plain data from the encrypted data you need to know with what values the encrypted bytes should be XORed. Having the correct key generates those values so provides the algorithm with correct series of S permutations and correct sequence of bytes to XOR with input hence making it possible to reproduce the plain data. You can see stream generation involves a swap operation with generating each new byte and also the selected value (the val at the last line) does not depend on the input byte. That means a same input byte (consider character ‘A’ exists in different location of plain data to be encrypted) is XORed with different value each time. This is different than the example encryption scenario which we introduced at the beginning of the article which used a static mapping between plain and cipher text making the cryptanalysis attacks possible.

I implemented RC4 as a demo program which generates a random key using Linux’s “/dev/urandom” (so can be run on Linux only) at the beginning of execution and then encrypts a message using the proposed RC4 algorithm with that key, shows the encrypted message in HEX format then decrypts the data with the same key producing the original plain message. The algorithm surely is not limited to ASCII data. The input can be anything but for the purpose of demonstration I used a an English phrase as input in this program. By a little modification you can encrypt any file using this same code. You can see a sample output of running the program below:

> Generated sample key: AJAOAALGAA
> Secret text: Find the treasure 3 meters away from that tall tree where the kid passes by!
> Encryted message: 17144C9CC8C66D4AEEFFADABEEB7342E352C6E47BB320DF3BFF596D8C69AFF731760B9B8289660A9766A63762276636E6E227670676722756A67706722766A6722696B662272637171677122607B23
> Decryted RC4 data with the same key: Find the treasure at 3 meters away from that tall tree where the kid passes by!

The code is available for download from here

I have also created a static PHP page using the same algorithm to perform RC4 encryption/decryption for your test cases. You can visit this page at RC4 Encryptor Page.

 

________________________________________________________________________________________

[1] The figure is taken from “William Stallings, Cryptography and Network Security

One thought on “RC4 explained – with C implementation

Leave a Reply

Your email address will not be published. Required fields are marked *