Quick Reference: Base64 and UUEncode
Base64 and uuencode allow us to send arbitrary binary data through systems that only allow plain ASCII (e.g. email as defined by RFC 822).
Base64
Base64 takes 3 bytes and converts them into 4 printable, human-readable ASCII characters. It does that by first regrouping the 24 bits of the 3 bytes into 4 groups of 6 bits each and then using an encoding table to convert each 6-bit value to a character. Why 6 bits and not the full 7 bits of ASCII? Because Base64 only uses printable, human-readable ASCII characters for encoding, and there are fewer than 128 of those.
Since Base64 uses 6 bits per character, the total number of possible values is 2^6 = 64, hence the name Base64.
E.g.
Suppose we need to convert the following 5 bytes 220,230,210,255,240
1. Convert to Binary
220 -> 11011100
230 -> 11100110
210 -> 11010010
255 -> 11111111
240 -> 11110000
Thus the sequence of bits looks like
1101110011100110110100101111111111110000
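As a quick check, step 1 can be reproduced in a few lines of Python. This is only a sketch; the variable names are made up for illustration.

```python
# The five example bytes from the walkthrough.
data = bytes([220, 230, 210, 255, 240])

# Format each byte as exactly 8 binary digits.
per_byte = [format(b, "08b") for b in data]

# Join the per-byte strings into one continuous bit string.
bit_string = "".join(per_byte)

for value, bits in zip(data, per_byte):
    print(value, "->", bits)
print(bit_string)
```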
2. Group into groups of 6 bits each
110111
001110
011011
010010
111111
111111
0000?? <- Not right
Ahh, so we run into a problem: we are trying to encode a sequence of 5 bytes, which is not a multiple of 3, so the bits do not divide evenly into 6-bit groups. Base64 solves this by padding the input with zero bytes until its length is a multiple of 3.
Thus in our case our original sequence will now look like this
220, 230, 210, 255, 240, 0
And our sequence of bits is going to look like this
110111001110011011010010111111111111000000000000
And our grouping now correctly becomes
110111 -> 55
001110 -> 14
011011 -> 27
010010 -> 18
111111 -> 63
111111 -> 63
000000 -> 0
000000 -> 0
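The padding and grouping in step 2 can be sketched in Python as follows. The names (pad_len, groups, values) are illustrative assumptions, not part of any library.

```python
data = bytes([220, 230, 210, 255, 240])

# Number of zero bytes needed to reach a multiple of 3 (here: 1).
pad_len = (-len(data)) % 3
padded = data + bytes(pad_len)

# One continuous bit string for the padded input.
bit_string = "".join(format(b, "08b") for b in padded)

# Cut the bit string into 6-bit groups and read each group as an integer.
groups = [bit_string[i:i + 6] for i in range(0, len(bit_string), 6)]
values = [int(g, 2) for g in groups]

for g, v in zip(groups, values):
    print(g, "->", v)
```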
3. Convert the values to characters
For that we need to look at the Base64 encoding table
Value | Encoding | Value | Encoding | Value | Encoding | Value | Encoding |
0 | A | 16 | Q | 32 | g | 48 | w |
1 | B | 17 | R | 33 | h | 49 | x |
2 | C | 18 | S | 34 | i | 50 | y |
3 | D | 19 | T | 35 | j | 51 | z |
4 | E | 20 | U | 36 | k | 52 | 0 |
5 | F | 21 | V | 37 | l | 53 | 1 |
6 | G | 22 | W | 38 | m | 54 | 2 |
7 | H | 23 | X | 39 | n | 55 | 3 |
8 | I | 24 | Y | 40 | o | 56 | 4 |
9 | J | 25 | Z | 41 | p | 57 | 5 |
10 | K | 26 | a | 42 | q | 58 | 6 |
11 | L | 27 | b | 43 | r | 59 | 7 |
12 | M | 28 | c | 44 | s | 60 | 8 |
13 | N | 29 | d | 45 | t | 61 | 9 |
14 | O | 30 | e | 46 | u | 62 | + |
15 | P | 31 | f | 47 | v | 63 | / |
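The table follows a simple pattern: A to Z, then a to z, then 0 to 9, then '+' and '/'. A minimal Python sketch that builds it as a string, so that the value is simply the index (the name BASE64_ALPHABET is an illustrative choice):

```python
import string

# Index 0..25 = A..Z, 26..51 = a..z, 52..61 = 0..9, 62 = '+', 63 = '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

# Look up a few of the values from the example.
for value in (55, 14, 27, 18, 63, 0):
    print(value, "->", BASE64_ALPHABET[value])
```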
From this we get
55 -> 3
14 -> O
27 -> b
18 -> S
63 -> /
63 -> /
0 -> A
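0 -> = (padding)
Note: the 7th group still contains 4 real data bits (the low 4 bits of 240) plus 2 padding bits, so it is encoded normally as 'A'. The 8th group consists entirely of padding bits, so it is written as '=' instead of 'A'. This tells the decoder how many padding bytes to drop.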
Thus our initial sequence of 5 bytes is now Base64 encoded as
3ObS//A=
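Putting the three steps together, here is a minimal Python sketch of the whole procedure. The function name base64_encode is hypothetical; the comparison against Python's standard base64 module is only there as a sanity check.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def base64_encode(data: bytes) -> str:
    """Encode bytes as Base64 following the three steps of the walkthrough."""
    # Pad with zero bytes up to a multiple of 3.
    pad_len = (-len(data)) % 3
    padded = data + bytes(pad_len)

    chars = []
    for i in range(0, len(padded), 3):
        # Pack 3 bytes into one 24-bit integer, then peel off four 6-bit values.
        block = int.from_bytes(padded[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            chars.append(ALPHABET[(block >> shift) & 0x3F])

    # Groups made up entirely of padding bits are written as '='.
    if pad_len:
        chars[-pad_len:] = "=" * pad_len
    return "".join(chars)


example = bytes([220, 230, 210, 255, 240])
print(base64_encode(example))
# Sanity check against the standard library.
print(base64_encode(example) == base64.b64encode(example).decode("ascii"))
```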
The .NET Framework exposes this functionality primarily via the
Convert.ToBase64String() and
Convert.ToBase64CharArray() methods
So this line of code
System.Console.WriteLine(System.Convert.ToBase64String(new byte[] { 220, 230, 210, 255, 240 }));
would print
3ObS//A=
UUEncode
Uuencode stands for Unix-to-Unix encoding. It was the predominant system for binary-to-text encoding before Base64 and MIME. I searched Google to see what the disadvantages of uuencode were. It seems that uuencode depended on the code page of the current locale to encode the data, so it worked fine when data was transferred between systems with identical code pages but broke when the two systems exchanging data used different code pages. The assumption was that the gateways doing the content transfer would take care of the conversion, but that was patchy at best. (I am not too sure of this, since it was before my time.)
Uuencode encoding is quite similar to Base64. You first convert 3 bytes into 4 bytes by splitting them into groups of 6 bits each and turning each 6-bit group into a byte by adding 2 zero bits to the front. Next, add 32 to each byte to bring it into the printable, human-readable range of 32 to 95. Finally, write each byte out as the ASCII character with that code. The padding byte used in this example is 1 (0x01); the decoder relies on the length character at the start of each line, not on the padding value, to discard the padding.
E.g.
Suppose we want to encode a file named test.dat, which happens to contain only one byte 254
1. Convert to Binary
254 = 11111110
2. Group into groups of 6-bits each
Add 2 bytes of padding to round off input to multiple of 3 (Padding is 0x01)
Thus input bit stream becomes
11111110 00000001 00000001
Thus our grouping becomes
111111
100000
000100
000001
3. Add two zero bits to get full byte
111111 -> 00111111 = 63
100000 -> 00100000 = 32
000100 -> 00000100 = 4
000001 -> 00000001 = 1
4. Add 32
Thus our bytes now become
95,64,36,33
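Steps 1 to 4 can be reproduced with a short Python sketch. PAD_BYTE is set to 0x01 only because that is the value this walkthrough uses; the other names are illustrative.

```python
data = bytes([254])

# Pad with 0x01 bytes (the value used in this walkthrough) up to a multiple of 3.
PAD_BYTE = 0x01
padded = data + bytes([PAD_BYTE]) * ((-len(data)) % 3)

# Steps 1-3: bit string, 6-bit groups, each read as a value 0..63.
bit_string = "".join(format(b, "08b") for b in padded)
six_bit_values = [int(bit_string[i:i + 6], 2) for i in range(0, len(bit_string), 6)]

# Step 4: adding 32 moves each value into the printable range 32..95.
shifted = [v + 32 for v in six_bit_values]

print(six_bit_values)
print(shifted)
```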
5. Encode using standard ASCII table
Value | Encoding | Value | Encoding | Value | Encoding | Value | Encoding |
32 | (space) | 48 | 0 | 64 | @ | 80 | P |
33 | ! | 49 | 1 | 65 | A | 81 | Q |
34 | " | 50 | 2 | 66 | B | 82 | R |
35 | # | 51 | 3 | 67 | C | 83 | S |
36 | $ | 52 | 4 | 68 | D | 84 | T |
37 | % | 53 | 5 | 69 | E | 85 | U |
38 | & | 54 | 6 | 70 | F | 86 | V |
39 | ' | 55 | 7 | 71 | G | 87 | W |
40 | ( | 56 | 8 | 72 | H | 88 | X |
41 | ) | 57 | 9 | 73 | I | 89 | Y |
42 | * | 58 | : | 74 | J | 90 | Z |
43 | + | 59 | ; | 75 | K | 91 | [ |
44 | , | 60 | < | 76 | L | 92 | \ |
45 | - | 61 | = | 77 | M | 93 | ] |
46 | . | 62 | > | 78 | N | 94 | ^ |
47 | / | 63 | ? | 79 | O | 95 | _ |
95 -> _
64 -> @
36 -> $
33 -> !
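Because the table is just the standard ASCII table, step 5 needs no lookup table in code. A tiny Python sketch (variable names are illustrative):

```python
# The four byte values produced by step 4.
shifted = [95, 64, 36, 33]

# chr() returns the character with the given code point,
# which for values 32..95 is the ASCII character in the table.
encoded = "".join(chr(v) for v in shifted)
print(encoded)  # _@$!
```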
That’s basically how uuencode works. A couple of things to note about the output file format:
1. The first line will be ‘begin <unix file access mode> <filename>’, and two extra lines are used to indicate end-of-file: the second-to-last line contains a single byte 0x20 (a space, i.e. a length count of 0), and the last line contains ‘end’.
2. The first character of each line is a count of the number of bytes encoded in that line. For example, if a line has the full 60 characters (a limit imposed by early Unix email clients), then 60*3/4 = 45 bytes have been encoded. Add 32 to 45 (to get a printable character) and we get 77 = M; thus you will find that all full lines begin with M.
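The length character from the second note is easy to compute. The sketch below uses two hypothetical helper names, length_char and bytes_in_line, and assumes the simple "count + 32" scheme described above.

```python
MAX_BYTES_PER_LINE = 45  # 60 encoded characters * 3 / 4


def length_char(byte_count: int) -> str:
    """Return the character that starts a line holding byte_count bytes."""
    if not 0 <= byte_count <= MAX_BYTES_PER_LINE:
        raise ValueError("a line holds between 0 and 45 bytes")
    return chr(32 + byte_count)


def bytes_in_line(line: str) -> int:
    """Read the byte count back from the first character of a line."""
    return ord(line[0]) - 32


print(length_char(45))        # M
print(length_char(1))         # !
print(bytes_in_line("!_@$!"))  # 1
```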
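Thus our file will be encoded as follows (the data line starts with '!' = 32 + 1 for the single encoded byte, and the line before 'end' contains a single space)
begin 664 test.dat
!_@$!
 
end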
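To tie the format notes together, here is a minimal Python sketch that builds a complete uuencoded file: a 'begin' line, data lines that each start with a length character, a zero-length line, and 'end'. The function names and the pad_byte parameter are assumptions made for illustration. The default pad of 0x01 follows this walkthrough; Python's own binascii.b2a_uu pads with zero bits, so the cross-check at the end passes pad_byte=0.

```python
import binascii


def uuencode_line(chunk: bytes, pad_byte: int = 0x01) -> str:
    """Encode up to 45 bytes as one uuencoded line (without the newline)."""
    padded = chunk + bytes([pad_byte]) * ((-len(chunk)) % 3)
    out = [chr(32 + len(chunk))]  # length character counts the real bytes only
    for i in range(0, len(padded), 3):
        block = int.from_bytes(padded[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            out.append(chr(32 + ((block >> shift) & 0x3F)))
    return "".join(out)


def uuencode(data: bytes, filename: str, mode: str = "664",
             pad_byte: int = 0x01) -> str:
    """Build the full text of a uuencoded file."""
    lines = [f"begin {mode} {filename}"]
    for i in range(0, len(data), 45):
        lines.append(uuencode_line(data[i:i + 45], pad_byte))
    lines.append(" ")    # zero-length line: chr(32 + 0)
    lines.append("end")
    return "\n".join(lines) + "\n"


print(uuencode(bytes([254]), "test.dat"), end="")

# Cross-check one zero-padded line against the standard library.
print(uuencode_line(b"\xfe", pad_byte=0) + "\n"
      == binascii.b2a_uu(b"\xfe").decode("ascii"))
```

Since the decoder only trusts the length character, either padding value decodes back to the same single byte.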
In the .NET Framework, you can choose to encode your mail attachments using uuencode by specifying the MailEncoding.UUEncode value as the encoding of the MailAttachment object.