[File; TRANSGID.TXT Revision date; April 23, 1990]

A SHORT GUIDE TO NETWORKING AND FILE TRANSMISSION

Erich Neuwirth
Institute of Statistics and Computer Science
University of Vienna
Austria
(A4422DAB@AWIUNI11.BITNET)

GENERAL PRINCIPLES OF SENDING FILES IN ELECTRONIC NETWORKS

Networking is mainly used in 2 ways:

Electronic mail
Sending (binary) files

This paper tries to explain what some of the differences are and how one of the two transmission methods sometimes can be (mis)used for tasks which seem to belong to the other method.

Electronic Mail

Electronic mail means you are sending text from one computer site to another site. Letters of text are coded as numbers internally within computers. Problems arise from the fact that the same letter may be represented by different numbers on different computer systems and vice versa the same number may yield a different letter on different computer systems.

Mostly we are concerned with two such representation systems for letters by numbers.

ASCII (which is used on IBM-compatible PCs and on most non-IBM mainframe computers)
EBCDIC (which is used on IBM (and compatible) mainframe computers)

When you are sending text from one computer to another computer the computers "think" they only are sending numbers. People reading or writing text, on the other hand, expect characters, so some interpretation of the numbers producing the text must take place. Simply transferring the text file as a sequence of numbers (which is what it looks like to the computers involved) would result in an unreadable file on the receiving computer system.

Therefore when using computers with different character representation systems the transmission usually involves a "translation process" which has the net effect of yielding a different "sequence of numbers" (= file) on the receiving machine, but this file usually gives the same letters when read as a text file.
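A small sketch may make the translation process concrete. It is written in Python and assumes that Python's built-in "cp037" codec is an acceptable stand-in for EBCDIC (IBM mainframes actually use several EBCDIC code pages, which differ mainly in the special characters). The sample text and the variable names are made up for illustration.

```python
# Assumption: the "cp037" codec stands in for "EBCDIC" here; a real
# mainframe may use a different EBCDIC code page.
text = "Hello [1990]"

# The same letters are stored as different numbers on the two systems.
ascii_numbers = text.encode("ascii")
ebcdic_numbers = text.encode("cp037")
print(list(ascii_numbers))
print(list(ebcdic_numbers))

# Sending the ASCII numbers unchanged and reading them as EBCDIC
# gives an unreadable text on the receiving system.
untranslated = ascii_numbers.decode("cp037")
print(repr(untranslated))

# A translation step changes the numbers but preserves the letters.
translated_numbers = ascii_numbers.decode("ascii").encode("cp037")
translated = translated_numbers.decode("cp037")
print(translated)
```

In this sketch the letter A, for example, is the number 65 in ASCII and the number 193 in the EBCDIC code page used, so the file on the receiving machine contains different numbers although it shows the same text.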
Usually these translation processes work quite well for letters (lowercase and uppercase) and digits. Quite often you will encounter problems with special characters like parentheses, brackets, tildes, carets and so on. If you are interested in merely transferring texts this is not much of a problem, because even if some special characters get scrambled it is usually not too hard to reconstruct the original text by normal editing.

If you are setting up a new communications link it is a good idea to send a file containing all printable characters with descriptions and to test if they arrive at the other end as they should. At the end of this paper you will find an example of how such a test file could look. Of course such a file should be sent from both ends of the line because the scrambling process in many cases is asymmetrical, so different transpositions happen in the two different communication directions. Closely inspecting the file you receive will show you which characters are changed during the transmission process.

Now three different events can happen:

1) You receive all the characters as they should be:
   Action: Don't worry, be happy

2) Some characters are not what they should be, but different characters still are different (even when not identical with their original)
   Action: Do worry, but not too much. In this case you can use the FIND and REPLACE function of your text editing program to restore the original meaning of the file. You even could program a macro in your text editor (if you don't know what that means just ignore this sentence) which automatically performs the "retranslation" process.

3) Some characters are scrambled and different characters in the source text file come out as identical characters at the receiving end.
   Action: Do worry, because this is the worst possible situation. It is not possible to construct an automatic "retranslation" process.
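The three cases above can also be told apart mechanically. The following Python sketch compares the test file as it was sent with the test file as it was received. All function names are hypothetical, and the sketch assumes that the transmission neither adds nor drops characters, so that both texts have the same length.

```python
from typing import Dict, Optional


def build_retranslation(sent: str, received: str) -> Optional[Dict[str, str]]:
    """Return a table mapping received characters back to sent ones.

    Returns None if two different sent characters arrived as the same
    character (case 3), because then no such table can exist.
    """
    if len(sent) != len(received):
        raise ValueError("sent and received text must have the same length")
    table: Dict[str, str] = {}
    for original, arrived in zip(sent, received):
        # setdefault stores the first original seen for this arrived
        # character; a later, different original means a collision.
        if table.setdefault(arrived, original) != original:
            return None
    return table


def classify(sent: str, received: str) -> int:
    """Return 1, 2 or 3, matching the three cases in the text."""
    table = build_retranslation(sent, received)
    if table is None:
        return 3
    if all(arrived == original for arrived, original in table.items()):
        return 1
    return 2


def retranslate(text: str, table: Dict[str, str]) -> str:
    """Apply the table to a received text (the FIND and REPLACE step)."""
    return "".join(table.get(ch, ch) for ch in text)


# Made-up example: parentheses and brackets are swapped in transit.
table = build_retranslation("()[]", "[]()")
print(classify("()[]", "[]()"))
print(retranslate("f[x]", table))
```

The table is only reliable for characters that actually occur in the test file, which is one more reason to send a test file containing all printable characters.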
As long as you are only concerned with text you will not have too many problems, because letters, digits, commas and periods usually are not scrambled when sent between different computer systems. If these characters also are scrambled the transmission process does not deserve the name "communication process" any more and you should talk to the technical people in charge of the transmission channel to take care of these problems.

Things become more difficult when you want to send data files or program source files. Files of this kind usually contain special characters like parentheses, and to reconstruct the original text of the file you usually have to edit the received file by hand and infer from the context the original meaning of each recognizably incorrect character.

The automatic file transfer usually takes place between mainframe computers. So the simplest situation with text file transfer is that you use the editor on your mainframe computer to create your text and then you use the mailing program on the mainframe to send the text file (sometimes called e-mail or note) to its destination. At the destination site the receiver then can receive the file and read it with the help of the text editor program on the receiving mainframe computer.

Sometimes the situation is more difficult. The file you want to send may exist on your PC, but not yet on the mainframe which is your entrance to the international computer networks. There is an important detail you have to take care of here. Usually you can write texts on a PC using two different kinds of programs to write with:

Text editor programs or word processing programs

Text files produced by text editing programs usually give no problems when you try to send them over a network. With most word processor files you will experience difficulties. But most word processing programs have a special way of saving your text as a "plain ASCII file".
Remember to save your texts with this option if you intend to send them over networks. And if you are still considering which word processing program you should select for your personal use, only select a program which offers this option. If you do not know yourself how to verify the existence of such an option ask somebody more experienced than you to help you to find out.

Now you have to find a way to transfer the file from your PC to your mainframe computer. For this purpose you need a file transfer program on the PC and on the mainframe. Different varieties of programs of this kind exist, but the prevalent program in an academic environment at the moment is KERMIT. To use KERMIT to transfer files you need the version of KERMIT for your PC and an installed version of KERMIT on the mainframe. The mainframe KERMIT is not your responsibility; you just have to find out from the staff of your computing center if they already have installed this program. If they have not done so yet you should tell them to do so because KERMIT is one of the very few hardware independent standards and it should be supported. Additionally, all KERMIT versions are in the Public Domain, so they do NOT COST MONEY. Your local computing center also should help you to find the version of KERMIT you need for your PC.

KERMIT is a program used for 2 purposes; namely for using your PC as a terminal to your mainframe computer and for transferring files between these two systems.

Now things start to be complicated (even more complicated? I hear you complain!).

In this paper we will not deal with using KERMIT as a terminal emulator. There are many ways to do this and it mainly depends on which kind of mainframe you are using. You should try to get some help from the people from your local computing center who can show you exactly how to use KERMIT for this purpose.
An additional remark: If you only want to use KERMIT as a "terminal emulator", which means using your PC as a terminal, you do not need KERMIT on the mainframe computer you are connecting to. The mainframe version is only needed for file transfer between the mainframe and your PC.

Now things become really complicated! The PC KERMIT has only one way of transferring files. But the mainframe version usually has two ways (called "modes" by computer scientists). One way is text mode, the other way is binary mode. Text mode is used to transfer text files. E-mail consists of text files so it is this mode you need for downloading e-mail from your mainframe to your PC. Usually you need not care too much because practically all mainframe versions of KERMIT use text mode for file transfer if not told otherwise explicitly.

So simply transferring a text file from your PC to the PC of somebody else you want to send it to can be done using the following steps:

1) Upload the text file from your PC to your mainframe with KERMIT in text mode
2) Use the mail facilities of your mainframe to send the text file as mail to the intended receiver
3) The receiver finally has to download this mail file (it still is text) with KERMIT in text mode to his/her PC

In most cases the received file is identical with the original file. Letters and digits arrive as they should. The idea behind text mode of KERMIT is that the meaning of characters is preserved, so when transferring in text mode KERMIT automatically adjusts for different systems of character representations on the mainframe and on the PC. You might find that some of the special characters do not arrive as they should, but this usually is no problem when the text is only intended for reading and not as input to some computer program. Later we will see what you can do if you have to send a text file containing special characters and want to make sure that these characters arrive unchanged.
TRANSFERRING NON-TEXT FILES

It is becoming even more difficult in this section, but if you want to send programs and data files usable on other machines it is important that you understand this section.

Networks can also be used to send PC programs. If you want to send a program to somebody with the same kind of PC you have, the basic procedure is very much like the procedure for transferring text files from your PC via the network to somebody else's PC. The steps involved are:

Uploading to a mainframe
Using the sending facilities of the network
Downloading from the target mainframe to the target PC

The difficulty with program files is that they contain more different symbols than text files. In particular, they contain lots of so called "nonprintable" characters. You can see this if you try to look at your program file with a text editor program or a word processing program.

The simplest solution to transferring program files and similar files (called binary files in computer terminology) is to use the binary transfer mode of your mainframe KERMIT to upload the program to your mainframe. Binary mode means that no translation whatsoever takes place while sending the file (remember, sending text files often involves a translation process). Now you can use the facilities of your mainframe for sending files over the network. Sending a file is not the same as sending a text as mail. Mailing implies that your text is put into the electronic equivalent of an envelope. Sending a file does not add the envelope, so the file being sent is (almost) identical with what you have on your PC. The receiver then can download the file to his/her PC also using the binary transfer mode of his/her mainframe KERMIT and the PC version of KERMIT.

This file transfer quite often does not work.
Some reasons may be: the two mainframes involved come from different manufacturers, some intermediate mainframe causes problems or the file is passing through different networks. One situation where it makes sense to try this way of sending binaries is when both mainframes are members of the EARN, BITNET or NETNORTH networks. It usually does not work when the mainframes belong to different networks like EARN and JANET.

Now what can we do when we want to send a program or a data file from an EARN site to a JANET site? The main idea is translating your binary file (the one you cannot read because it contains nonprintable characters) into a file consisting only of printable characters. The most popular scheme for doing such a translation is the UUENCODE/UUDECODE process. It involves 2 programs, one usually called UUENCODE and the other one UUDECODE. UUENCODE takes a binary file and converts it into a file consisting only of printable characters. UUDECODE reverses this process and restores the original binary file from the encoded file.

So what do you need these programs for? You UUENCODE the binary file and upload it to your mainframe (using the text mode of your mainframe KERMIT). Since it consists of printable characters only, you can incorporate it into a mail file you send. This mail file hopefully arrives at its destination and the receiver can download the mail from his/her mainframe to the local PC. Then it is mandatory to remove the "electronic envelope" from the mail file. An appendix will describe how an UUENCODEd file looks and how to recognize the parts forming the "envelope". Then the UUDECODE program can be used to translate the UUENCODEd version of the file back into its binary version.

If you want to use this process you have to get hold of a copy of the UUENCODE and UUDECODE programs. It is not possible (at least not in an easy way) to send these programs over networks if you have no experience with encoding and decoding binary files.
These programs are binary files themselves and we cannot send unencoded binary files. So we would need the binary files already to translate the encoded versions into the binary version. It is a "who is first, the hen or the egg" kind of situation. There are ways of solving these problems, but the solutions involve a nontrivial amount of technical knowledge and also depend very much on the circumstances of the PCs and mainframes involved. (For the more technically inclined: we could send the source files of the translation programs as text files, but then we have to be sure that the recipient has a compiler for the programming language we are using.) So quite often the easiest way of setting up an environment where file transfer is possible involves sending a disk with the UUENCODE/UUDECODE programs to the sites involved. Once the programs are available file transfer can start.

Now let us look at what an UUENCODEd file looks like:

------- the file starts directly below this line ------------
begin 644 erich.com
MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
M=&AE('1E

Subject: File transfer demonstration
To: The catcher in the rye

begin 644 erich.com
MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
M=&AE('1E

> greater
? question mark
@ at-sign
uppercase letters
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[ left bracket
\ backslash
] right bracket
^ caret
_ underscore
` left single quote
lowercase letters
abcdefghijklmnopqrstuvwxyz
{ left curly brace
| vertical bar
} right curly bracket
~ tilde
ASCII 127 is nonprintable

APPENDIX B: TECHNICAL DETAILS OF ENCODING AND DECODING

The rest of the paper is very technical, so you should read it only if you have some knowledge of the mathematics underlying the functioning of computers.

How do UUENCODE and UUDECODE work?

For UUENCODing, the bytes forming the file are grouped in groups of three. Every byte is an 8-bit binary number, so every group of three bytes is a 24-bit binary number.
This number then is split into four groups of 6 bits each, i.e. into four 6-bit binary numbers. The 6-bit binary numbers give all decimal numbers from 0 to 63. To every such 6-bit number 32 (decimal) is added, giving numbers in the range from 32 to 95. Every number then is replaced by the ASCII character associated with this value. (32 becomes (a blank), 33 becomes !,... 95 becomes _ (an underscore)). So the translation process converts each group of 3 bytes into 4 printable characters.

Additionally every group of 45 bytes (giving 60 characters) is grouped into a line in the file to be sent. Then a leading character is added to this line. The leading character is calculated by applying the encoding scheme we just discussed to the number of bytes represented by the line. (45+32=77, so for a line representing 45 bytes the leading character is M (M is ASCII character 77)). Usually the last line is shorter and therefore the leading character of the last line also is different from M.

Finally a first line containing "begin", a 3 digit number (giving access privileges on UNIX systems and meaningless on other systems) and the name of the original file and a last line containing the word "end" is added.

The decoding program then mainly has to convert each group of 4 characters back into a group of three bytes (using the byte count given by the first character of each line for consistency checks).

There are some problems with this scheme. We already discussed the possibility of special characters being scrambled. Additionally some "smart" mailing programs assume that trailing blanks always are unnecessary. Therefore they strip trailing blanks from every mail file. If it is only text you want to read you will not notice the difference. But an UUDECODing program will find out that the lines are too short (the first character of the line gives information about the line length!). There are different solutions for this problem.
1) Replace blanks by ` (the single opening quote having ASCII value 32+64=96)
2) Add an additional nonblank character at the end of each line
3) Make the decoding program smart enough to produce the missing blanks by itself.

All the solutions are nonstandardized, so if you have some troubles when decoding you have to analyze them carefully. Solution number 2 usually works better than the two other solutions. So you should try to get an encoding program adding that additional character. Using an editor also makes it possible to transform the different "extended" formats of UUENCODEd files into one another.

How do XXENCODE and XXDECODE work?

XXENCODE uses the same splitting technique as the UU scheme (3 bytes into four 6-bit binary numbers). Then every such number is converted into a character according to the following sequence:

+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

So (decimal) 0 becomes +, 1 becomes -, (number) 2 becomes (character) 0, .... 63 becomes z. The mechanism for adding byte counts to lines is identical to the UU scheme with the difference that the numbers again are coded according to the above sequence of letters, digits, + and -. So it even is possible to convert UUENCODEd files into XXENCODEd files using the replace feature of a text editor.

ACKNOWLEDGEMENTS

The author wishes to thank Ted Werntz whose comments and suggestions helped enormously to improve the paper.
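SUPPLEMENT: A SKETCH OF THE ENCODING SCHEMES IN APPENDIX B

For readers who prefer code to prose, the line encoding described in Appendix B can be sketched in a few lines of Python. This is only an illustration, not the source of any actual UUENCODE or XXENCODE program: the function names are invented here, the "begin" and "end" lines are left out, and the sketch assumes that a final group of fewer than three bytes is filled up with zero bytes, a detail the description above does not state.

```python
XX_ALPHABET = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def six_bit_values(chunk: bytes) -> list:
    """Split the bytes into 6-bit numbers, three bytes at a time."""
    # Assumption: a short final group is padded with zero bytes.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    values = []
    for i in range(0, len(padded), 3):
        number = int.from_bytes(padded[i:i + 3], "big")  # 24-bit number
        values.extend((number >> shift) & 0x3F for shift in (18, 12, 6, 0))
    return values


def uu_line(chunk: bytes) -> str:
    """Encode up to 45 bytes as one UU line (byte count first, 32 added)."""
    if len(chunk) > 45:
        raise ValueError("a line represents at most 45 bytes")
    codes = [len(chunk)] + six_bit_values(chunk)
    return "".join(chr(code + 32) for code in codes)


def xx_line(chunk: bytes) -> str:
    """Encode up to 45 bytes as one XX line using the XX character sequence."""
    if len(chunk) > 45:
        raise ValueError("a line represents at most 45 bytes")
    codes = [len(chunk)] + six_bit_values(chunk)
    return "".join(XX_ALPHABET[code] for code in codes)


def uu_to_xx(line: str) -> str:
    """Convert one UU line to XX by replacing character for character."""
    # The "& 0x3F" also accepts ` (ASCII 96) in place of a blank.
    return "".join(XX_ALPHABET[(ord(ch) - 32) & 0x3F] for ch in line)


print(uu_line(b"Cat"))       # prints #0V%T
print(xx_line(b"Cat"))       # prints 1Eq3o
print(uu_to_xx("#0V%T"))     # prints 1Eq3o
```

A line of 45 zero bytes encodes to M followed by 60 blanks in this sketch, which is exactly the kind of line that is damaged when a mailing program strips trailing blanks.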