ASCII vs ANSI Encoding

A question posted on the Australian DOTNET Developer Mailing List ...

I'm having a character encoding problem that surprises me. In my C# code I have a string "© 2004" (that's a copyright/space/2/0/0/4). When I convert this string to bytes using the ASCIIEncoding.GetBytes method I get (in hex):

3F 20 32 30 30 34

The first character (the copyright) is converted into a literal '?' question mark. I need to get the result 0xA92032303034, which has 0xA9 for the copyright, just as happens when the text is saved in Notepad.

ASCII is a 7-bit encoding and therefore covers only the first 128 Unicode characters (U+0000 to U+007F). ASCIIEncoding.GetBytes replaces any character outside that range with a question mark, "?" (0x3F).

That explains the first byte returned using ASCIIEncoding.GetBytes()...

> 3F 20 32 30 30 34
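
The silent substitution is easy to miss, so it can help to make it fail loudly instead. The following is a minimal sketch, not part of the original answer: the class and method names (AsciiFallbackDemo, ToHex, IsPureAscii) are made up for illustration, and it relies on the Encoding.GetEncoding overload that accepts explicit fallbacks.

```csharp
using System;
using System.Text;

static class AsciiFallbackDemo
{
    // Formats bytes as upper-case hex pairs separated by spaces.
    public static string ToHex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", " ");

    // Returns true when every character survives a strict ASCII encode.
    public static bool IsPureAscii(string text)
    {
        // Strict variant: throw instead of silently substituting '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        string s = "\u00A9 2004"; // copyright sign, space, 2004
        Console.WriteLine(ToHex(Encoding.ASCII.GetBytes(s))); // 3F 20 32 30 30 34
        Console.WriteLine(IsPureAscii(s));      // False
        Console.WriteLine(IsPureAscii("2004")); // True
    }
}
```

A check like IsPureAscii is one way to decide up front whether ASCII is safe for a given string, rather than discovering the '?' later.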

What you're trying to achieve is an ANSI encoding of the string. To get an ANSI encoding you need to specify a "code page", which defines the characters from 128 upward. For example, the following code will produce the result you expect...

string s = "© 2004";
Encoding targetEncoding = Encoding.GetEncoding(1252); // Windows-1252 (Western European)
foreach (byte b in targetEncoding.GetBytes(s))
    Console.Write("{0:x} ", b); // each byte as lower-case hex

> a9 20 32 30 30 34

1252 is the code page for Western European (Windows), which is probably what you're using (check Encoding.Default.EncodingName). Specifying a different code page, say 54936 for Simplified Chinese, will produce a different result.
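
To see how much the code page matters, here is a small sketch that encodes the same character under two code pages. The choice of "é" and of code page 437 (the old OEM United States page) is mine, picked only because the byte values are easy to verify. CodePageDemo and EncodeHex are hypothetical names. The sketch assumes modern .NET (.NET 5 or later), where code pages such as 1252 and 437 have to be registered through CodePagesEncodingProvider before use. On the classic .NET Framework they are built in and the registration step is not needed.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    static CodePageDemo()
    {
        // Assumption: running on .NET 5 or later, where legacy code pages
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes text with the given code page and returns the bytes as hex.
    public static string EncodeHex(string text, int codePage) =>
        BitConverter.ToString(Encoding.GetEncoding(codePage).GetBytes(text));

    static void Main()
    {
        string s = "\u00E9"; // e with acute accent
        Console.WriteLine(EncodeHex(s, 1252)); // E9
        Console.WriteLine(EncodeHex(s, 437));  // 82
    }
}
```

The same character ends up as a different byte, so a file written under one code page and read under the other shows the wrong character.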

Ideally you should use the code page actually in use on the system as follows...

string s = "© 2004";
Encoding targetEncoding = Encoding.Default; // the system's current ANSI code page
foreach (byte b in targetEncoding.GetBytes(s))
    Console.Write("{0:x} ", b); // each byte as lower-case hex

> (can depend on where you are!)
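
If you are not sure what Encoding.Default resolves to on a given machine, it is easy to ask. The sketch below is only illustrative (DefaultEncodingDemo and CurrentAnsiCodePage are made-up names). One caveat worth knowing: on .NET Core and .NET 5 or later, Encoding.Default always returns UTF-8, so there the ANSI code page associated with the current culture has to be read from TextInfo.ANSICodePage instead. That value follows the culture and may not match the operating system's setting in every configuration.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DefaultEncodingDemo
{
    // Returns the ANSI code page number associated with the current culture.
    public static int CurrentAnsiCodePage() =>
        CultureInfo.CurrentCulture.TextInfo.ANSICodePage;

    static void Main()
    {
        // On .NET Framework this reports the system ANSI code page;
        // on .NET Core and .NET 5+ it reports UTF-8.
        Console.WriteLine("{0} (code page {1})",
            Encoding.Default.EncodingName, Encoding.Default.CodePage);
        Console.WriteLine("Culture ANSI code page: {0}", CurrentAnsiCodePage());
    }
}
```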

All this is particularly important if your application uses streams to write to disk. Unless care is taken, someone in another country (represented by a different code page) could write text to disk via a Stream within your application and get unexpected results when reading back the text.

In short, always specify an encoding when creating a StreamReader or StreamWriter - for example...

StreamReader reader = new StreamReader(path, System.Text.Encoding.Default);
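
To make the risk concrete, here is a rough sketch of a write/read round trip where the two sides disagree about the encoding. StreamEncodingDemo and RoundTrip are hypothetical names, and ISO-8859-1 stands in for an ANSI code page only because it is available on every .NET version without extra setup, and it also encodes the copyright sign as 0xA9.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamEncodingDemo
{
    // Writes text to a temp file with one encoding, reads it back with another.
    public static string RoundTrip(string text, Encoding writeWith, Encoding readWith)
    {
        string path = Path.GetTempFileName();
        try
        {
            using (var writer = new StreamWriter(path, false, writeWith))
                writer.Write(text);
            using (var reader = new StreamReader(path, readWith))
                return reader.ReadToEnd();
        }
        finally
        {
            File.Delete(path); // clean up the temp file
        }
    }

    static void Main()
    {
        string s = "\u00A9 2004";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Console.WriteLine(RoundTrip(s, latin1, latin1) == s);        // True
        Console.WriteLine(RoundTrip(s, latin1, Encoding.UTF8) == s); // False
    }
}
```

In the mismatched case the lone 0xA9 byte is not valid UTF-8, so the reader substitutes the Unicode replacement character (U+FFFD) and the copyright sign is lost.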
