A portable implementation of a modified RC5 encryption method
      =============================================================

			  Reinhard Wobst
		       wobst@ifw-dresden.de
			   Mar 25, 1997


1. General
2. The algorithm RC5
3. Cryptanalysis of RC5
4. The modification RC5a
5. Usage
6. Practical results
7. Implementation, porting to new machines, user's modifications


1. General
----------

In  Summer  1995,  Ron  Rivest  (MIT  Laboratory  for  Computer Science)
proposed a new  encryption algorithm  based  on strongly  data-dependend
rotations. There  is a  pending  patent, but  his intention is  that the
non-commercial  use would be  free whereas the  commercial use incurs  a
fixed,  one-time licensing  fee to  support  continued  research  at RSA
Laboratories.

RC5  is  interesting  because  of  its  simplicity,  its  speed  and its
variability. It has a hardware-friendly design.

You find here  a ready-to-use  implementation of RC5 as well  as of RC5a
which is a superset of RC5. This program has been tested under many UNIX
systems and should be portable to other OS as well.

COPYRIGHT

You are free  to apply,  copy and modify this program  provided that the
copyright  mark in  the  header is  kept  and the  changes  are  clearly
documented.  Users  should be  aware  of possible  juristic consequences
because of patent rights and export restrictions within the USA.


2. The algorithm RC5
--------------------

A detailed description of RC5 can be found in:

ftp.rsa.com/pub/rc5/rc5.ps (or send mail to rc5-paper@rsa.com)
ftp.cert.dfn.de/pub/virus/texts/crypto/rc5.zip

Therefore, the procedure will only be sketched here.

RC5 encrypts word pairs. A word is the canonic data unit of the machine;
in our  case, a 32  bit word. This  pair (A,B) is  encrypted in r rounds
with the help of a key S[0...2*r+1] (i.e., 2*r+2 words) as follows:

A = A + S[0];
B = B + S[1];

for i=1 to r do
	A =  ((A ^ B) <<< B) + S[2*i];
	B =  ((B ^ A) <<< A) + S[2*i+1];

Here "^" denotes  the bitwise  exclusive OR (XOR),  and "X  <<< Y" means
that X is rotated left by Y bits. (In our case, only the 5 least bits of
Y are significant).

The key S[] is produced from a  "real" key K consisting  of b bytes in a
rather arbitrary  manner, again with  usage of data-dependend rotations.
The  RC5 algorithm using  32  bit words, r  rounds and b-byte  keys K is
denoted by RC5(32,r,b). Rivest recommends RC5(32,12,16), for 64 bit
processors RC5(64,16,16).


3. Cryptanalysis of RC5
-----------------------

There are two papers about cryptanalysis of RC5. In [1], Kaliski and Yin
show that differential attacks against 12-round-RC5 are impractical, and
linear  cryptanalysis will  probably be  still less  efficient.  In [2],
Knudsen  finds  "differentially  weak  keys",  but  this  is  neither  a
practical attack against the algorithm.

All attacks try to find probable paths where at least  in some rounds no
rotations take  place. So I  generated 100 millions of random word pairs
(each  word  with 32 bits)  and for each  pair, all 64  word  pairs with
exactly 1 bit toggled. Among these 6.4 billion comparisons were

34732	cases where rotations remained unchanged until the 7th round,
1915	cases where rotations remained unchanged until the 8th round,
104	cases where rotations remained unchanged until the 9th round,
15	cases where rotations remained unchanged until the 10th round.

At the first glance, there  were no obvious properties of the used keys,
plaintext pairs, rotations amounts for each round, and bit positions.

Based on these results, I investigated the following modification.


4. The modification RC5a
------------------------

My modification generates a KEYBOXSZ times longer key  field by the same
method. This field is considered as a collection of 2*r+2 "key boxes" of
KEYBOXSZ  words  each.  The  difference  to  RC5  is merely  that  in  a
data-dependend manner, in the n-th half round we choose one key from the
n-th keybox. In  the underlying  implementation, KEYBOXSZ  equals to 16.
The algorithm becomes the form

A = A + S[B>>KEYBOXSHIFT];
B = B + S[KEYBOXSZ + (A>>KEYBOXSHIFT)];

for i=1 to r do
	A =  ((A ^ B) <<< B) + S[2*i*KEYBOXSZ + (B>>KEYBOXSHIFT)];
	B =  ((B ^ A) <<< A) + S[(2*i+1)*KEYBOXSZ + (A>>KEYBOXSHIFT)];

Here KEYBOXSHIFT  is 32-ld(KEYBOXSZ) =  28, i.e. if  KEYBOXSZ = 2^n then
the  upper  n bits are  used to determine  the key within  the  key box,
whereas   the  lower  5   bits  determine   rotation  amounts.  Computer
experiments showed that  the mixing is  better than for KEYBOXSHIFT = 5,
at any rate much better than for the original  algorithm. In detail, for
n=4 (KEYBOXSZ=16) and 100 million word pairs, the above table looks like

5179	cases with unchanged rotations after 7 rounds,
220	cases with unchanged rotations after 8 rounds,
11	cases with unchanged rotations after 9 rounds,
0	cases with unchanged rotations after 10 rounds.

The  algorithm  that generates  the  key field  is  strongly  mixing and
different from  the RC5  round  operation. So  we can  say  that by  the
data-dependend key  selection  from  the  keyboxes, the  key  generating
algorithm is mixed with RC5 quite strongly now. The number of bits which
change when one bit is toggled has approximately a gaussian distribution
with expectation value 0.5 (i.e., in  the mean 50% of bits change if one
bit  is  changed). And  the  mathematical  approach  seems  to  be  more
difficult  now.  As  much  as  I  understand,  differential  and  linear
cryptanalysis should not  give better results for the new algorithm than
for RC5. Even  a known plaintext attack of  RC5a(32,1,*) is  not so easy
now (cf. rc5_1_crack.txt for comparison).

The  keybox size  (or better  its dual logarithm)  could be  added as an
additional parameter  to RC5:  RC5a(w,r,l,n) (w  = word size,  r =  # of
rounds, l = keyword length, 2^n = keybox size).

For KEYBOXSIZE=1, we get the RC5 algorithm again.


5. Usage
--------

The underlying implementation of  RC5a was developped  under UNIX and is
designed  for this operating system  (but not restricted  to it!). Hints
for porting to other systems can be found in 7.

rc5a works as  an filter.  The presence of  any argument  causes rc5a to
decode rather than encode:

Encryption:   rc5a <plaintext >ciphertext
Decryption:   rc5a any_string >plaintext <ciphertext

Exception: If the argument in  the seconde case is "-v",  a copyright an
version message is shown.

The pass phrase will  be entered interactively  by default; the terminal
echo   is  switched  off  during  input.   Pass  phrases   must  contain
alphanumeric and non-alpha characters  and must be at least 6 characters
long,  maximally  255. The  pass phrase  has  to be  entered  twice when
encoding. On  decryption, there is a built-in consistency check  so that
the pass phrase has to be entered only once. You should not be irritated
by the notion  "pass phrase": It  is just the  same thing as a password,
only that it can be composed by  many words or  odd characters. A  whole
sentence is much surer than a password only.

An alternative  but more dangerous  way for entering  the password is to
place it in the environment variable CRYPTKEY.  If this variable  is set
and  has a  non-empty  contents, then  the pass phrase  request will  be
suppressed.

Pay attention that ciphertexts produced with the  old version rc5.c (not
valid now)  CANNOT  be  decoded  with the  rc5a.c! However,  ciphertexts
produced by this version of rc5a.c can be decoded by future  releases of
this program.


6. Practical results
--------------------

rc5a was compiled and tested under the following systems:

ESIX V.4.2MP		(very similar to UnixWare 2.0)
HP-UX 9.01, 9.05
OSF/1 V3.0
SunOS 4.1.3
Ultrix 4.3 0
Sinix V.42		(in essential System V.4.0)
ESIX V.4.0.4		(a pure SVR4)
FreeBSD 2.1.5

The measured speeds range:

from about 1.5MB/sec. on an Pentium 133
	(compiling option: -K pentium -K no_frame)
as well as on a 200MHz Alpha machine (Dec 3000-600),

via 600..1000 KB/sec. on HP 9000/720 and 735,

down to 240KB/sec. on an PC 486-33 (ISA, 8MB RAM) under ESIX V.4.2.

I encoded 8MB of plaintext  and measured times first from  disk to disk,
then from memory to memory.  Significant speed differences between these
two  methods could  be  observed under FreeBSD,  SunOS, Ultrix and  ESIX
V.4.2 on  a PC 486-66 (ISA, with 16 MB RAM) only (but not  on other ESIX
PC's). Under  FreeBSD (with a  200MHz  PentiumPro), the speed  rose from
about 1MB/sec. to 3MB/sec.

The speed of RC5a is practically the same as of RC5.


7. Implementation, porting to new machines, user's modifications
----------------------------------------------------------------

+++ Portability:

Byte order: The byte order is detected  by the program.  So you need not
to care about binary input and output of plain- resp. ciphertext.

When you intend  to port  rc5a to other  UNIX systems,  you should first
look  at the #define of PS_CMD in the program header. Place that command
options that produce a long, state-sensitive  output. Under System V and
OSF/1, these options  are "-elf". You can also combine several commands,
for example,

ps -elf; w; netstat -A;

Next  you MUST  provide  that  WORD  is a  unsigned  32  bit  data type.
Normally, one would choose 'unsigned  long', but because of OSF/1 on the
Alpha  machine I took  "unsigned  int' - the  other type has  64 bits on
Alpha.

Finally,  check if your  system supports the  POSIX terminal control: We
need the include  file termios.h  and the  library functions tcgetattr()
and  tcsetattr()  here. They  are used to  switch the terminal  echo off
during pass phrase entry.

All that is used exclusively in the final, machine-dependent part of the
program.

+++ Ciphering mode:

If you  translate rc5a with  -DSIMPLE_CBC, it works  in usual  CBC mode.
(Remark: In the former version, SIMPLE_CBC generated a different result;
I hope  this mode  has not been used yet).  The initialisation vector is
generated  automatically by help  of the PS_CMD  command  defined in the
program header. Otherwise,  a modified CBC  mode is used: The left resp.
right half blocks of ciphertext are rotated left resp. right by  amounts
determined  by the right  resp. left plaintext  block  (for details, cf.
function w_encode()), and summed up over all ciphertext (alternating for
the right halves). This is  a proposition only.  You need not used it if
you don't trust  it, but I think it is not worse than original CBC mode,
probably much better (for file encryption).

+++ Implementation details:

A length detection is  implemented in the  simplest way (padding): Bytes
are appended  after the plaintext end until a block boundary  is reached
(here a  block consists  of  a word  pair). The  last byte contains  the
number of bytes appended. Bytes between plain text end and the last byte
(if there are any) are arbitrary and meaningless. Note that 8 bytes will
be appended if the plaintext length is a multiple of 8.

The  so-called  ciphertext  stealing  would  be  more  elegant  but also
technically more difficult  because of the  blockwise plaintext reading.
It has less  adavantage over  the above method  since 6  word pairs  are
added  as header  anyway,  so why  should we make  efforts to  eliminate
eventually 1 word pair at the end?

The header is built up as follows: After 4 random  word pairs and before
the  checksum,  a word  pair  with  version  and release  number  of the
algorithm is  inserted.  The background is  that RC5a could  be  further
developped. But these 6 first  word pairs should  be always encoded with
the present  method. A newer algorithm can so detect the  older encoding
method and still decrypt the ciphertext. Changing to the next release so
reduces  to decrypting and  then encrypting all  files. I  think this is
necessary since cryptanalysis will never stop.

+++ Possible problem:

If you redirect the output of rc5a through 'more' or 'less':

rc5a a <ciph.txt | more		,

you  can get odd  terminal  settings after  finishing  the pipeline. The
reason is that  rc5a and 'more' both influence the terminal settings. Up
to now I could not avoid such situations.

Some more details are  explained in my  book "Abenteuer Kryptologie" (in
German; Addison-Wesley 1997).


+++ Using RC5:

Set  the macro  KEYBOX_BITS to  0  in the  header. No  other changes are
necessary since RC5a is as fast as RC5.


References
----------

[1] B.S.Kaliski, Y.L.Yin: On Differential and Linear Cryptanalysis of the
	RC5 Encryption Algorithm;
	Advances in Cryptology - CRYPTO '95 (D.Coppersmith, ed.)
	Lecture Notes in Computer Science vol.963
	Springer 1995

[2] Knudsen, L.R., Meier, W.
	Improved Differential Attacks von RC5
	Advances in Cryptology - CRYPTO '96 (D.Coppersmith, ed.)
	Springer-Verlag 1996