Uuencoding/Uudecoding with Director

December 12, 2000
by Krister Olsson

Over the years, developers have created increasingly complex network applications with Director. With the introduction of the postNetText() command, applications were suddenly free to send servers significantly more data than previously possible, without having to URL-encode it. (There is, however, an undocumented limit on how much data can actually be sent; don't try to send more than 64k.) postNetText() opened many eyes to the possibilities of client-server development with Director.

postNetText() is fine for sending most types of text data. It is, however, a less than adequate solution for sending binary data. There are two potential problems with trying to send binary data via postNetText(): your server probably won't like it; and Director almost certainly won't like it. Whether or not your server will be able to digest binary data depends on how it is configured and what server-side programming language you are using. PHP, for example, may try to add slashes to certain characters (this is something that can affect even text data). Other systems may have a hard time digesting 8-bit data.

Director has a lot of problems dealing with binary data in general. First, the only safe way to store such data is in a list (Lingo's array type). Strings do not like to have the value zero stored in them, as it is used as an end marker (though you can store any other value from 1 to 255 in a string using numToChar). Data to be sent to a server has to be passed to postNetText() as a string. As such, the value zero cannot be part of your data, or the string will be truncated at that point.

Type this code snippet into the message window and see what happens:

test="bob"
put numtochar(0) after char 1 of test
put test

After your put test command, the message window should show only the letter "b".

Example 1: Truncating a string by inserting the value zero.

There is another problem, however, with how postNetText() actually sends data. postNetText() inevitably mangles any data containing line breaks and/or carriage returns. The problem stems from the fact that postNetText() attempts to convert line formatting codes from those used on the system that is running the Director application, to those used on the system the data is being sent to. It is impossible to turn this behavior off, even through the setting of the serverOSString parameter. Even if Director passes binary data to an identical system, there is still a good chance that the data will be mangled. For example, line breaks will be stripped from any data sent from a Macintosh to a Macintosh server (as specified by serverOSString), even though both systems are identical.

There is a solution to this dilemma that doesn't involve Xtras or developing complicated schemes to squeeze round data into square holes: uuencoding the data before it is sent, and then uudecoding the data upon retrieval.

What is Uuencoding/Uudecoding?

Uuencode/uudecode is a pair of standard UNIX utilities that are often used to encode and decode files passed between systems in a network. The uuencoding process is simple: 8-bit 3-byte chunks of data are converted to 6-bit 4-byte chunks. 3-byte chunks are written as 24-bit streams. Each stream is prefixed by two bits, 00, and the bits 00 are inserted into each stream every six bits. The streams are then written back as 4-byte chunks.

1. We start with three bytes:

11111111  11111111  11111111

2. We add the prefix bits:

00111111  11111111  11111111  11

3. We finish by adding "00" every six bits:

00111111  00111111  00111111  00111111

Example 2: Converting an 8-bit 3-byte chunk to a 6-bit 4-byte chunk.
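
The same regrouping can be sketched in Lingo using only integer division and bitAnd. The handler name splitChunk is hypothetical, and the sketch assumes each argument is an integer between 0 and 255:

```lingo
-- Hypothetical helper: regroup three byte values (0-255) into four 6-bit values (0-63).
on splitChunk b1, b2, b3
  v1 = b1 / 4                            -- top six bits of byte 1
  v2 = (bitAnd(b1, 3) * 16) + (b2 / 16)  -- low two bits of byte 1, top four bits of byte 2
  v3 = (bitAnd(b2, 15) * 4) + (b3 / 64)  -- low four bits of byte 2, top two bits of byte 3
  v4 = bitAnd(b3, 63)                    -- low six bits of byte 3
  return [v1, v2, v3, v4]
end
```

In the message window, put splitChunk(255, 255, 255) should give [63, 63, 63, 63], which matches the four 00111111 bytes in step 3.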

Using only six bits, each byte of data in a 4-byte chunk has a value between 0 and 63. The value 32 is added to each byte (the ASCII code for space) unless the byte has the value zero. A byte with the value zero is converted to a back-quote (`). Thus, uuencoded data uses only 64 guaranteed printable characters, which allows it to be sent over text-only connections.
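
As a minimal sketch, the mapping from a 6-bit value to a printable character might look like this in Lingo. The handler name uuChar is hypothetical, and it assumes its argument is an integer between 0 and 63:

```lingo
-- Hypothetical helper: convert a 6-bit value (0-63) to its printable uuencode character.
on uuChar v
  if v = 0 then
    return "`"              -- zero becomes a back-quote rather than a space
  else
    return numToChar(v + 32)
  end if
end
```

For example, put uuChar(18) gives "2", since 18 + 32 is 50, the ASCII code for the digit 2.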

If you're just sending proprietary data to a server for later retrieval, all you need to do is perform the basic encoding, write the output to a string, and then send it via postNetText(). The server-side script that receives the postNetText() data should then write it to a database, the file system, or somewhere else for later retrieval. When it comes time to retrieve your data, simply grab it (how you do this depends on how you've stored it) and run it through your uudecoder.

The great thing about uuencoding your data, however, is that it can be easily decoded on the server side if it is stored in the proper format. If you send properly formatted encoded data to a server-side script using postNetText(), the script can pipe it to a uudecoder application residing on the server, creating a file or data stream that can then be processed by any application that can understand the data. An example: a Director application encodes an image in GIF format, then uuencodes the GIF data. The formatted encoded data is sent to a server-side script using postNetText(), at which point it is piped through uudecode. The uudecoder decodes and then saves the GIF file in a location accessible via a public web address, giving anyone on the Internet access to the file.

The standard format for uuencoded data is quite simple. Encoded data is broken into lines. These lines of data are sandwiched between a "begin line" and two "end lines." The begin line consists of the string "begin <mode> <filename>". When a uuencoded file or data stream is decoded, a file named <filename> is created. This file is assigned a UNIX mode (three octal digits signifying the file's permissions) of <mode>. A mode of 666, for example, allows universal read/write access. You can read more about modes in any UNIX systems administration book.

In order to avoid corruption during transmission via e-mail or other protocols that may add line breaks and/or carriage returns, the standard format requires that encoded data be broken up into lines of 61 characters or less, including a length byte prefixing each line. A line's length byte tells a uudecoder the number of unencoded bytes of data that follow it on a line. The length byte is also converted to printable ASCII (by adding 32 and converting a zero value to a back-quote). The most common length byte value is 45 (after conversion to printable ASCII, the character "M"). A line prefixed by an "M" holds 45 unencoded bytes of data, and (45/3)*4+1, or 61 encoded characters, including the length byte.
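
The line-length arithmetic can be checked with a one-line handler. The name uuLineLength is hypothetical, and the sketch rounds the byte count up to a whole number of 3-byte chunks, which is an assumption about how a short final chunk is padded:

```lingo
-- Hypothetical helper: number of characters on an encoded line, including the length character.
on uuLineLength byteCount
  -- (byteCount + 2) / 3 is integer division, giving the number of 3-byte chunks rounded up
  return (((byteCount + 2) / 3) * 4) + 1
end
```

put uuLineLength(45) gives 61, the full-line case described above, and put uuLineLength(12) gives 17.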

The last line of actual encoded data in a properly formatted file or data stream will probably not be 61 characters long, unless the total unencoded byte-count of the data happens to be divisible by 45. In fact, some extra unwanted data may be encoded, as the total unencoded byte-count may not be divisible by three. After decoding uuencoded data, it is up to the uudecoder to drop any extra bytes.
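
Here is a hedged sketch of how a decoder might use the length byte to drop padding. The handler names uuValue and uuDecodeLine are hypothetical. The decoded bytes are returned as a list rather than a string so that zero values survive:

```lingo
-- Hypothetical helper: turn one printable character back into its 6-bit value.
on uuValue c
  -- a back-quote (96) minus 32 is 64, which the mask turns back into 0
  return bitAnd(charToNum(c) - 32, 63)
end

-- Hypothetical helper: decode one encoded line into a list of byte values.
on uuDecodeLine encodedLine
  byteCount = uuValue(char 1 of encodedLine)  -- length character
  bytes = []
  pos = 2
  repeat while count(bytes) < byteCount
    v1 = uuValue(char pos of encodedLine)
    v2 = uuValue(char (pos + 1) of encodedLine)
    v3 = uuValue(char (pos + 2) of encodedLine)
    v4 = uuValue(char (pos + 3) of encodedLine)
    chunk = [(v1 * 4) + (v2 / 16), (bitAnd(v2, 15) * 16) + (v3 / 4), (bitAnd(v3, 3) * 64) + v4]
    repeat with i = 1 to 3
      -- keep only as many bytes as the length character promised
      if count(bytes) < byteCount then
        add bytes, chunk[i]
      end if
    end repeat
    pos = pos + 4
  end repeat
  return bytes
end
```

Decoding the line ,2&5L;&\@5V]R;&0A this way should return the twelve ASCII values of "Hello World!", starting with 72, 101, 108.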

After all the lines of data come the end lines. The first line is simply a back-quote character, which decodes to a length byte of zero. The string "end" sits by itself on the second end line, and often must be followed by a line break and/or carriage return.
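
A minimal sketch of the framing step follows. The handler name uuWrap and its parameter names are hypothetical, and using Lingo's RETURN constant as the line break is an assumption; the line ending that actually reaches the server depends on how postNetText() converts it:

```lingo
-- Hypothetical helper: wrap a list of already encoded lines in the begin line and two end lines.
on uuWrap encodedLines, permBits, outName
  outStr = "begin" && permBits && outName & RETURN
  repeat with i = 1 to count(encodedLines)
    put encodedLines[i] & RETURN after outStr
  end repeat
  -- first end line: a lone back-quote (length zero); second end line: "end" plus a line break
  put "`" & RETURN & "end" & RETURN after outStr
  return outStr
end
```

A call such as uuWrap([",2&5L;&\@5V]R;&0A"], "666", "hello.txt") would produce a four-line string ready to pass to postNetText().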

Uuencoding by Hand

To make the process crystal clear, we can easily simulate uuencoding a file hello.txt which contains one line of text: "Hello World!"

First, write the begin line: "begin 666 hello.txt" Upon decoding the file, a universally-readable/writeable file "hello.txt" would be created to store the data.

Next, process the data. There are 12 characters in the string "Hello World!" so all the data will fit on one line. To get the length character, add 32 to 12, getting 44, which is the ASCII code for comma. Next, the string is broken into 3-character chunks. Taking the ASCII values for each character in each chunk, the chunks are converted to binary bit streams. Then the additional bits are added, and the streams are converted back to 4-byte chunks. The entire string of bytes is then converted to printable characters (again, by adding 32 and converting zeroes to back-quotes). When the process is finished, we have the following string of characters:

2&5L;&\@5V]R;&0A

1.

"Hel" 72 101 108  
  01001000 01100101 01101100  
  00010010 00000110 00010101 00101100
  18 6 21 44
  2 & 5 L

2.

"lo " 108 111 32
  01101100 01101111 00100000
  00011011 00000110 00111100 00100000
  27 6 60 32
  ; & \ @

3.

"Wor" 87 111 114  
  01010111 01101111 01110010  
  00010101 00110110 00111101 00110010
  21 54 61 50
  5 V ] R

4.

"ld!" 108 100 33  
  01101100 01100100 00100001  
  00011011 00000110 00010000 00100001
  27 6 16 33
  ; & 0 A

Example 3: Encoding "Hello World!"

By adding the length byte to the beginning of the string, prepending the begin line, and appending the two end lines we have the following:

begin 666 hello.txt
,2&5L;&\@5V]R;&0A
`
end
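
The hand calculation can be reproduced with a short Lingo sketch. The handler names uuChar and uuEncodeLine are hypothetical, and the sketch assumes the input is a string of 45 characters or fewer that contains no zero bytes. A short final chunk is padded with zeroes here, which is one possible choice:

```lingo
-- Hypothetical helper: convert a 6-bit value (0-63) to its printable character.
on uuChar v
  if v = 0 then
    return "`"
  else
    return numToChar(v + 32)
  end if
end

-- Hypothetical helper: encode one line's worth of text, length character included.
on uuEncodeLine str
  n = length(str)
  outLine = uuChar(n)            -- length character comes first
  i = 1
  repeat while i <= n
    b1 = charToNum(char i of str)
    b2 = 0                       -- pad a short final chunk with zeroes
    b3 = 0
    if i + 1 <= n then
      b2 = charToNum(char (i + 1) of str)
    end if
    if i + 2 <= n then
      b3 = charToNum(char (i + 2) of str)
    end if
    put uuChar(b1 / 4) after outLine
    put uuChar((bitAnd(b1, 3) * 16) + (b2 / 16)) after outLine
    put uuChar((bitAnd(b2, 15) * 4) + (b3 / 64)) after outLine
    put uuChar(bitAnd(b3, 63)) after outLine
    i = i + 3
  end repeat
  return outLine
end
```

put uuEncodeLine("Hello World!") should give ",2&5L;&\@5V]R;&0A", the same data line worked out by hand.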

You can check the integrity of the encoded data yourself by copying the above text, pasting it into a file, and then running a uudecoder on the file (StuffIt Expander will do).

Coding the Algorithm: Bitwise Operators

At this point, you should have an idea of how to go about writing a uuencoder/uudecoder in Director. You will, however, need to use a couple of relatively new Director functions, the so-called bitwise functions. Director 7 introduced these functions, but left them undocumented. With Director 8 they finally found their way into the Lingo Dictionary. Lingo provides four bitwise functions that AND, OR, XOR, and NOT 32-bit values: bitAnd, bitOr, bitXor, and bitNot. It does not provide functions to shift or rotate bits in either direction; these have to be written from scratch. This is a bit frustrating; it should be noted that most other languages, such as C, Pascal, and even ActionScript in Flash 5, have a full suite of bitwise operators. Another irritation is that there is no way to use binary or hexadecimal notation in Director. You'll have to do your thinking in either binary or hex and then convert to decimal prior to using the bitwise functions.

The bitAnd, bitOr, bitXor, and bitNot functions should be easy to understand by reading the Lingo Dictionary. You won't need bit rotation functions, but you will need to write functions to shift bits left and right. Bit shifts are used to insert and remove '00's from bit streams during encoding and decoding.

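on bsr2 num
  -- shift right two bits: integer division by 4 drops the low two bits
  return (num / 4)
end

on bsl2 num
  -- shift left two bits, then mask with 252 (binary 11111100) to stay within one byte
  return bitAnd(num * 4, 252)
end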

Example 4: Sample bit shifting code.

Example 4 presents us with two routines. The first, bsr2, takes a byte and shifts it two bits to the right. The second, bsl2, takes a byte and shifts it two bits to the left. You'll notice that bit shifting is essentially multiplication and division by powers of two; a two-bit shift multiplies or divides by four. There are a couple of things to be aware of, however. First, these makeshift shift routines won't function properly if they are passed floating-point values. A byte can only hold integer values between 0 and 255. You won't need to use floating-point numbers to write your encoding or decoding routines, however, so this shouldn't be much of a concern. Second, you'll notice that the left shift routine is wrapped in a bitAnd function. This is to make sure that if we shift bits and the result is greater than 255, the value is "cropped" so we don't wind up with any unwanted data.
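
If shifts by other amounts are needed (by four or six bits, for instance), the same idea can be generalized. The handlers bsr and bsl below are hypothetical names, and they assume integer arguments and byte-sized results:

```lingo
-- Hypothetical helper: shift a byte right by the given number of bits.
on bsr num, bits
  repeat with i = 1 to bits
    num = num / 2               -- integer division drops the lowest bit
  end repeat
  return num
end

-- Hypothetical helper: shift a byte left by the given number of bits.
on bsl num, bits
  repeat with i = 1 to bits
    num = bitAnd(num * 2, 255)  -- mask keeps the result within one byte
  end repeat
  return num
end
```

For example, put bsr(108, 6) gives 1 and put bsl(3, 4) gives 48. Looping one bit at a time avoids floating-point math, at the cost of some speed.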

Now, you have everything you need to write a fully functional uuencoder/uudecoder in Director!

Optimization

So, what's next? Director isn't the best environment for coding fast data-processing algorithms, so one of the first things you'll want to do is optimize your code. Simple changes, like writing frequently used operations such as bit shifts directly inline instead of calling a handler each time, can go a long way.

Another thing you can do to dramatically improve processing speed is to create and process strings in chunks. If you are encoding or decoding large amounts of data, Director's string-handling speed can quickly become a liability. When creating a string of encoded data, first write lines of your output to a buffer of fixed size. When the buffer is full, append its contents to a master output string and then clear the buffer. Continue the process until all the data has been encoded. Before decoding a string of encoded data, it should be broken down into sub-strings. Each sub-string should then be processed individually.
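
The buffering idea for output might be sketched as follows. The handler name joinBuffered and its parameters are hypothetical, and the flush threshold is counted in lines purely for simplicity; the best buffer size would have to be found by testing:

```lingo
-- Hypothetical helper: join encoded lines into one string, flushing a small buffer periodically.
on joinBuffered encodedLines, linesPerFlush
  masterStr = ""
  outBuffer = ""
  held = 0
  repeat with i = 1 to count(encodedLines)
    put encodedLines[i] & RETURN after outBuffer
    held = held + 1
    if held >= linesPerFlush then
      put outBuffer after masterStr   -- one append to the large string per flush
      outBuffer = ""
      held = 0
    end if
  end repeat
  put outBuffer after masterStr       -- flush whatever is left over
  return masterStr
end
```

The point of the design is that most appends touch only the short buffer, so the long master string is modified far less often.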

A Director 8 movie is available for download in Mac or PC format

Krister Olsson is a partner and Technical Director at Tree-Axis, a San Francisco-based multimedia design firm. He has recently completed interactive projects on the Web for MTVi and Saatchi, among others. Along with programming and web development, his interests include electronic handheld board games, like Yahtzee, and social experimentation. Krister has also written articles for various industry rags and spoken at Macromedia UCon. He received his BA in Computer Science from Swarthmore College.

Copyright 1997-2024, Director Online. Article content copyright by respective authors.