Alex Hoffman

Perspective on development, management and technology





January 2004 - Posts

Win32 to .NET Framework API Map

“This article identifies the Microsoft .NET Framework version 1.0 or 1.1 APIs that provide similar functionality to Microsoft Win32 functions. One or more relevant .NET Framework APIs are shown for each Win32 function listed.

The intended audience for this article is experienced Win32 developers who are creating applications or libraries based on the Microsoft .NET Framework, but anyone looking for a managed counterpart for a Win32 function could find this document useful.”

MSDN Library Article Link

Posted Thursday, January 29, 2004 8:54 PM by Alex Hoffman

ASCII vs ANSI Encoding

A question posted on the Australian DOTNET Developer Mailing List ...

I'm having a character encoding problem that surprises me. In my C# code I have a string "© 2004" (that's a copyright/space/2/0/0/4). When I convert this string to bytes using the ASCIIEncoding.GetBytes method I get (in hex):

3F 20 32 30 30 34

The first character (the copyright) is converted into a literal '?' question mark. I need to get the result 0xA92032303034, which has 0xA9 for the copyright, just as happens when the text is saved in Notepad.

An ASCII encoding uses 7-bit characters and therefore supports only the first 128 Unicode characters (U+0000 to U+007F). Any character outside that range cannot be represented and is replaced with a substitute character, which for ASCIIEncoding is "?" (0x3F).

That explains the first byte returned using ASCIIEncoding.GetBytes()...

> 3F 20 32 30 30 34
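
To reproduce that result in isolation, here is a minimal sketch. The class name AsciiDemo and the helper ToHex are hypothetical names introduced for illustration; only Encoding.ASCII and BitConverter come from the framework. The copyright sign is written as the escape \u00A9 so the source file's own encoding cannot interfere.

```csharp
using System;
using System.Text;

static class AsciiDemo
{
    // Hypothetical helper: formats bytes as space-separated upper-case hex.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        string s = "\u00A9 2004"; // copyright sign, space, 2, 0, 0, 4

        // U+00A9 is outside the 7-bit ASCII range, so it is replaced by '?'.
        byte[] bytes = Encoding.ASCII.GetBytes(s);

        Console.WriteLine(ToHex(bytes)); // prints "3F 20 32 30 30 34"
    }
}
```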

What you're trying to achieve is an ANSI encoding of the string. To get an ANSI encoding you need to specify a "code page", which defines the characters from 128 upward. For example, the following code produces the result you expect...

// Requires: using System; using System.Text;
string s = "© 2004";
Encoding targetEncoding = Encoding.GetEncoding(1252); // Western European (Windows)
foreach (byte b in targetEncoding.GetBytes(s))
    Console.Write("{0:x} ", b);

> a9 20 32 30 30 34

1252 is the code page for Western European (Windows), which is probably what you're using (check Encoding.Default.EncodingName). Specifying a different code page, say the one for Simplified Chinese (54936), will produce a different result.
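
To make "a different result" concrete, the following sketch encodes the same string three ways. It is an illustration under assumptions, not part of the original answer: it uses ISO-8859-1, UTF-8 and UTF-16 rather than the Windows code pages named above, because those three are available on every .NET runtime without extra setup. The class name EncodingCompare and the helper HexBytes are hypothetical.

```csharp
using System;
using System.Text;

static class EncodingCompare
{
    // Hypothetical helper: encodes text and formats the bytes as hex.
    public static string HexBytes(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text)).Replace("-", " ");
    }

    static void Main()
    {
        string s = "\u00A9 2004"; // copyright sign, space, 2, 0, 0, 4

        // ISO-8859-1 maps U+00A9 to the single byte A9 (as code page 1252 does).
        Console.WriteLine(HexBytes(s, Encoding.GetEncoding("iso-8859-1")));
        // prints "A9 20 32 30 30 34"

        // UTF-8 needs two bytes for U+00A9.
        Console.WriteLine(HexBytes(s, Encoding.UTF8));
        // prints "C2 A9 20 32 30 30 34"

        // UTF-16 (little endian) uses two bytes for every character here.
        Console.WriteLine(HexBytes(s, Encoding.Unicode));
        // prints "A9 00 20 00 32 00 30 00 30 00 34 00"
    }
}
```

The same text therefore has no single "byte form"; the bytes only mean something together with the encoding that produced them.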

Ideally you should use the code page actually in use on the system as follows...

// Requires: using System; using System.Text;
string s = "© 2004";
Encoding targetEncoding = Encoding.Default; // the system's current ANSI code page
foreach (byte b in targetEncoding.GetBytes(s))
    Console.Write("{0:x} ", b);

> (can depend on where you are!)

All this is particularly important if your application uses streams to write to disk. Unless care is taken, someone in another country (represented by a different code page) could write text to disk via a Stream within your application and get unexpected results when reading back the text.

In short, always specify an encoding when creating a StreamReader or StreamWriter - for example...

StreamReader reader = new StreamReader(path, System.Text.Encoding.Default);
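
The effect of a mismatch can be sketched with a small round trip. This is an illustration under assumptions rather than a recommended implementation: the class StreamRoundTrip and the method WriteThenRead are hypothetical, the file is a throwaway temp file, and ISO-8859-1 stands in for "the code page the writer happened to use".

```csharp
using System;
using System.IO;
using System.Text;

static class StreamRoundTrip
{
    // Hypothetical helper: writes text with one encoding, reads it back with another.
    public static string WriteThenRead(string text, Encoding writeEncoding, Encoding readEncoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            using (StreamWriter writer = new StreamWriter(path, false, writeEncoding))
            {
                writer.Write(text);
            }
            // Byte-order-mark detection is off so only readEncoding decides the result.
            using (StreamReader reader = new StreamReader(path, readEncoding, false))
            {
                return reader.ReadToEnd();
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        string s = "\u00A9 2004";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Same encoding on both sides: the text survives.
        Console.WriteLine(WriteThenRead(s, latin1, latin1) == s);        // prints "True"

        // Written as ISO-8859-1, read as UTF-8: the byte A9 is not valid
        // UTF-8 on its own, so the copyright sign is lost.
        Console.WriteLine(WriteThenRead(s, latin1, Encoding.UTF8) == s); // prints "False"
    }
}
```

Whichever encoding is chosen, the point is that the reader and the writer name the same one explicitly.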

Posted Monday, January 19, 2004 6:06 PM by Alex Hoffman
