A portable implementation of a modified RC5 encryption method ============================================================= Reinhard Wobst wobst@ifw-dresden.de Mar 25, 1997 1. General 2. The algorithm RC5 3. Cryptanalysis of RC5 4. The modification RC5a 5. Usage 6. Practical results 7. Implementation, porting to new machines, user's modifications 1. General ---------- In Summer 1995, Ron Rivest (MIT Laboratory for Computer Science) proposed a new encryption algorithm based on strongly data-dependend rotations. There is a pending patent, but his intention is that the non-commercial use would be free whereas the commercial use incurs a fixed, one-time licensing fee to support continued research at RSA Laboratories. RC5 is interesting because of its simplicity, its speed and its variability. It has a hardware-friendly design. You find here a ready-to-use implementation of RC5 as well as of RC5a which is a superset of RC5. This program has been tested under many UNIX systems and should be portable to other OS as well. COPYRIGHT You are free to apply, copy and modify this program provided that the copyright mark in the header is kept and the changes are clearly documented. Users should be aware of possible juristic consequences because of patent rights and export restrictions within the USA. 2. The algorithm RC5 -------------------- A detailed description of RC5 can be found in: ftp.rsa.com/pub/rc5/rc5.ps (or send mail to rc5-paper@rsa.com) ftp.cert.dfn.de/pub/virus/texts/crypto/rc5.zip Therefore, the procedure will only be sketched here. RC5 encrypts word pairs. A word is the canonic data unit of the machine; in our case, a 32 bit word. This pair (A,B) is encrypted in r rounds with the help of a key S[0...2*r+1] (i.e., 2*r+2 words) as follows: A = A + S[0]; B = B + S[1]; for i=1 to r do A = ((A ^ B) <<< B) + S[2*i]; B = ((B ^ A) <<< A) + S[2*i+1]; Here "^" denotes the bitwise exclusive OR (XOR), and "X <<< Y" means that X is rotated left by Y bits. (In our case, only the 5 least bits of Y are significant). The key S[] is produced from a "real" key K consisting of b bytes in a rather arbitrary manner, again with usage of data-dependend rotations. The RC5 algorithm using 32 bit words, r rounds and b-byte keys K is denoted by RC5(32,r,b). Rivest recommends RC5(32,12,16), for 64 bit processors RC5(64,16,16). 3. Cryptanalysis of RC5 ----------------------- There are two papers about cryptanalysis of RC5. In [1], Kaliski and Yin show that differential attacks against 12-round-RC5 are impractical, and linear cryptanalysis will probably be still less efficient. In [2], Knudsen finds "differentially weak keys", but this is neither a practical attack against the algorithm. All attacks try to find probable paths where at least in some rounds no rotations take place. So I generated 100 millions of random word pairs (each word with 32 bits) and for each pair, all 64 word pairs with exactly 1 bit toggled. Among these 6.4 billion comparisons were 34732 cases where rotations remained unchanged until the 7th round, 1915 cases where rotations remained unchanged until the 8th round, 104 cases where rotations remained unchanged until the 9th round, 15 cases where rotations remained unchanged until the 10th round. At the first glance, there were no obvious properties of the used keys, plaintext pairs, rotations amounts for each round, and bit positions. Based on these results, I investigated the following modification. 4. The modification RC5a ------------------------ My modification generates a KEYBOXSZ times longer key field by the same method. This field is considered as a collection of 2*r+2 "key boxes" of KEYBOXSZ words each. The difference to RC5 is merely that in a data-dependend manner, in the n-th half round we choose one key from the n-th keybox. In the underlying implementation, KEYBOXSZ equals to 16. The algorithm becomes the form A = A + S[B>>KEYBOXSHIFT]; B = B + S[KEYBOXSZ + (A>>KEYBOXSHIFT)]; for i=1 to r do A = ((A ^ B) <<< B) + S[2*i*KEYBOXSZ + (B>>KEYBOXSHIFT)]; B = ((B ^ A) <<< A) + S[(2*i+1)*KEYBOXSZ + (A>>KEYBOXSHIFT)]; Here KEYBOXSHIFT is 32-ld(KEYBOXSZ) = 28, i.e. if KEYBOXSZ = 2^n then the upper n bits are used to determine the key within the key box, whereas the lower 5 bits determine rotation amounts. Computer experiments showed that the mixing is better than for KEYBOXSHIFT = 5, at any rate much better than for the original algorithm. In detail, for n=4 (KEYBOXSZ=16) and 100 million word pairs, the above table looks like 5179 cases with unchanged rotations after 7 rounds, 220 cases with unchanged rotations after 8 rounds, 11 cases with unchanged rotations after 9 rounds, 0 cases with unchanged rotations after 10 rounds. The algorithm that generates the key field is strongly mixing and different from the RC5 round operation. So we can say that by the data-dependend key selection from the keyboxes, the key generating algorithm is mixed with RC5 quite strongly now. The number of bits which change when one bit is toggled has approximately a gaussian distribution with expectation value 0.5 (i.e., in the mean 50% of bits change if one bit is changed). And the mathematical approach seems to be more difficult now. As much as I understand, differential and linear cryptanalysis should not give better results for the new algorithm than for RC5. Even a known plaintext attack of RC5a(32,1,*) is not so easy now (cf. rc5_1_crack.txt for comparison). The keybox size (or better its dual logarithm) could be added as an additional parameter to RC5: RC5a(w,r,l,n) (w = word size, r = # of rounds, l = keyword length, 2^n = keybox size). For KEYBOXSIZE=1, we get the RC5 algorithm again. 5. Usage -------- The underlying implementation of RC5a was developped under UNIX and is designed for this operating system (but not restricted to it!). Hints for porting to other systems can be found in 7. rc5a works as an filter. The presence of any argument causes rc5a to decode rather than encode: Encryption: rc5a ciphertext Decryption: rc5a any_string >plaintext <ciphertext Exception: If the argument in the seconde case is "-v", a copyright an version message is shown. The pass phrase will be entered interactively by default; the terminal echo is switched off during input. Pass phrases must contain alphanumeric and non-alpha characters and must be at least 6 characters long, maximally 255. The pass phrase has to be entered twice when encoding. On decryption, there is a built-in consistency check so that the pass phrase has to be entered only once. You should not be irritated by the notion "pass phrase": It is just the same thing as a password, only that it can be composed by many words or odd characters. A whole sentence is much surer than a password only. An alternative but more dangerous way for entering the password is to place it in the environment variable CRYPTKEY. If this variable is set and has a non-empty contents, then the pass phrase request will be suppressed. Pay attention that ciphertexts produced with the old version rc5.c (not valid now) CANNOT be decoded with the rc5a.c! However, ciphertexts produced by this version of rc5a.c can be decoded by future releases of this program. 6. Practical results -------------------- rc5a was compiled and tested under the following systems: ESIX V.4.2MP (very similar to UnixWare 2.0) HP-UX 9.01, 9.05 OSF/1 V3.0 SunOS 4.1.3 Ultrix 4.3 0 Sinix V.42 (in essential System V.4.0) ESIX V.4.0.4 (a pure SVR4) FreeBSD 2.1.5 The measured speeds range: from about 1.5MB/sec. on an Pentium 133 (compiling option: -K pentium -K no_frame) as well as on a 200MHz Alpha machine (Dec 3000-600), via 600..1000 KB/sec. on HP 9000/720 and 735, down to 240KB/sec. on an PC 486-33 (ISA, 8MB RAM) under ESIX V.4.2. I encoded 8MB of plaintext and measured times first from disk to disk, then from memory to memory. Significant speed differences between these two methods could be observed under FreeBSD, SunOS, Ultrix and ESIX V.4.2 on a PC 486-66 (ISA, with 16 MB RAM) only (but not on other ESIX PC's). Under FreeBSD (with a 200MHz PentiumPro), the speed rose from about 1MB/sec. to 3MB/sec. The speed of RC5a is practically the same as of RC5. 7. Implementation, porting to new machines, user's modifications ---------------------------------------------------------------- +++ Portability: Byte order: The byte order is detected by the program. So you need not to care about binary input and output of plain- resp. ciphertext. When you intend to port rc5a to other UNIX systems, you should first look at the #define of PS_CMD in the program header. Place that command options that produce a long, state-sensitive output. Under System V and OSF/1, these options are "-elf". You can also combine several commands, for example, ps -elf; w; netstat -A; Next you MUST provide that WORD is a unsigned 32 bit data type. Normally, one would choose 'unsigned long', but because of OSF/1 on the Alpha machine I took "unsigned int' - the other type has 64 bits on Alpha. Finally, check if your system supports the POSIX terminal control: We need the include file termios.h and the library functions tcgetattr() and tcsetattr() here. They are used to switch the terminal echo off during pass phrase entry. All that is used exclusively in the final, machine-dependent part of the program. +++ Ciphering mode: If you translate rc5a with -DSIMPLE_CBC, it works in usual CBC mode. (Remark: In the former version, SIMPLE_CBC generated a different result; I hope this mode has not been used yet). The initialisation vector is generated automatically by help of the PS_CMD command defined in the program header. Otherwise, a modified CBC mode is used: The left resp. right half blocks of ciphertext are rotated left resp. right by amounts determined by the right resp. left plaintext block (for details, cf. function w_encode()), and summed up over all ciphertext (alternating for the right halves). This is a proposition only. You need not used it if you don't trust it, but I think it is not worse than original CBC mode, probably much better (for file encryption). +++ Implementation details: A length detection is implemented in the simplest way (padding): Bytes are appended after the plaintext end until a block boundary is reached (here a block consists of a word pair). The last byte contains the number of bytes appended. Bytes between plain text end and the last byte (if there are any) are arbitrary and meaningless. Note that 8 bytes will be appended if the plaintext length is a multiple of 8. The so-called ciphertext stealing would be more elegant but also technically more difficult because of the blockwise plaintext reading. It has less adavantage over the above method since 6 word pairs are added as header anyway, so why should we make efforts to eliminate eventually 1 word pair at the end? The header is built up as follows: After 4 random word pairs and before the checksum, a word pair with version and release number of the algorithm is inserted. The background is that RC5a could be further developped. But these 6 first word pairs should be always encoded with the present method. A newer algorithm can so detect the older encoding method and still decrypt the ciphertext. Changing to the next release so reduces to decrypting and then encrypting all files. I think this is necessary since cryptanalysis will never stop. +++ Possible problem: If you redirect the output of rc5a through 'more' or 'less': rc5a a <ciph.txt | more , you can get odd terminal settings after finishing the pipeline. The reason is that rc5a and 'more' both influence the terminal settings. Up to now I could not avoid such situations. Some more details are explained in my book "Abenteuer Kryptologie" (in German; Addison-Wesley 1997). +++ Using RC5: Set the macro KEYBOX_BITS to 0 in the header. No other changes are necessary since RC5a is as fast as RC5. References ---------- [1] B.S.Kaliski, Y.L.Yin: On Differential and Linear Cryptanalysis of the RC5 Encryption Algorithm; Advances in Cryptology - CRYPTO '95 (D.Coppersmith, ed.) Lecture Notes in Computer Science vol.963 Springer 1995 [2] Knudsen, L.R., Meier, W. Improved Differential Attacks von RC5 Advances in Cryptology - CRYPTO '96 (D.Coppersmith, ed.) Springer-Verlag 1996