UTF8 Encoding Changes in Vista (Hashing Gotcha)
If you've used hashing to store passwords for your application, you may want to double-check your code to ensure it works on Vista.
Thanks to information found in Shawn Steele's post from almost a year ago, it seems Microsoft changed the .NET Framework on Vista to comply with the Unicode 5.0 specification, which requires that invalid byte sequences be replaced rather than silently omitted, as earlier versions of the .NET Framework on Windows XP, Server 2003, etc. did. As such, hashes converted to UTF8 strings on previous versions of Windows may not match those created on Vista.
For example, if you hash the string 'Password' on Windows XP using MD5CryptoServiceProvider, the hash decoded as a UTF8 string would be 'd~^g◄U7R↑!+9d'. Hashing the same value on Vista (without the workaround described below) would result in '�d~�^g_�U7R_!+9d' as the UTF8 string output. Note the Unicode Replacement Character (U+FFFD) interspersed in the latter output. This is because the Unicode 5.0 specification requires that this character be emitted for invalid sequences rather than omitting them, as was allowed in Unicode 4.1 (implemented in the earlier versions of the .NET Framework mentioned above).
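To make the difference easier to check on a given machine, here is a minimal sketch that hashes 'Password' and decodes the digest as UTF8, then counts the replacement characters. The class and method names are hypothetical, and it assumes MD5.Create() as a stand-in for MD5CryptoServiceProvider (both compute a standard MD5 digest).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class HashAsTextDemo
{
    // Hashes the UTF8 bytes of the input, then treats the raw digest as UTF8 text.
    // The digest is arbitrary binary data, so some of its bytes are not valid UTF8.
    public static string HashToUtf8String(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            return Encoding.UTF8.GetString(hash);
        }
    }

    // Counts how many U+FFFD replacement characters the decoder inserted.
    public static int CountReplacementChars(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == '\uFFFD') count++;
        }
        return count;
    }

    static void Main()
    {
        string text = HashToUtf8String("Password");
        Console.WriteLine("Length: {0}, U+FFFD count: {1}",
            text.Length, CountReplacementChars(text));
    }
}
```

A count of zero would indicate the older behavior of dropping invalid bytes; a non-zero count indicates the replacement behavior described above.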
Now, as noted in Shawn's post, you can get the same results on Vista as on previous versions of Windows by using classes derived from EncoderFallback and DecoderFallback. These let you specify what to substitute for invalid data in place of the default Unicode Replacement Character. For example, consider the following setup:
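// Encoding.UTF8 is read-only, so work on a writable clone (requires using System.Text).
Encoding encoding = (Encoding)Encoding.UTF8.Clone();
// Substitute an empty string for invalid data instead of U+FFFD.
encoding.EncoderFallback = new EncoderReplacementFallback(string.Empty);
encoding.DecoderFallback = new DecoderReplacementFallback(string.Empty);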
Based on the snippet above, the encoding substitutes an empty string for invalid data when encoding and decoding; thus, the result is the same as that from Windows XP in the previous example (which simply omitted invalid bytes by default).
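As a rough way to confirm this, the sketch below decodes the same MD5 digest twice, once with the stock Encoding.UTF8 and once with a clone that uses empty-string fallbacks. The class and helper names are hypothetical, and MD5.Create() is assumed here in place of MD5CryptoServiceProvider.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class FallbackDemo
{
    // Returns a writable UTF8 encoding that drops invalid data instead of
    // substituting U+FFFD.
    public static Encoding CreateLenientUtf8()
    {
        Encoding encoding = (Encoding)Encoding.UTF8.Clone();
        encoding.EncoderFallback = new EncoderReplacementFallback(string.Empty);
        encoding.DecoderFallback = new DecoderReplacementFallback(string.Empty);
        return encoding;
    }

    // Computes the MD5 digest of the UTF8 bytes of the input.
    public static byte[] Md5OfUtf8(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(input));
        }
    }

    static void Main()
    {
        byte[] hash = Md5OfUtf8("Password");
        string withDefault = Encoding.UTF8.GetString(hash);
        string withEmpty = CreateLenientUtf8().GetString(hash);
        Console.WriteLine("Default fallback: {0} chars", withDefault.Length);
        Console.WriteLine("Empty fallback:   {0} chars", withEmpty.Length);
    }
}
```

With the empty fallback, the decoded string should contain no U+FFFD characters, so it can be compared against values stored by the older runtime.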
While this solution will work (I had to make things work in an existing codebase), I would recommend storing hashes as Base64-encoded strings instead, as they do not suffer from the same issue of invalid UTF8 sequences. The same string, 'Password', hashed using the bytes obtained from Encoding.UTF8.GetBytes(string), with or without the fallbacks, yields '3GR+tl5nEeFVN1IYISs5ZA==' using Convert.ToBase64String(byte[]).
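A minimal sketch of the Base64 approach might look like the following. The class and method names are hypothetical, MD5.Create() is assumed in place of MD5CryptoServiceProvider, and the Matches helper is only an illustration of how a stored value could be compared.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class Base64HashDemo
{
    // Hashes the UTF8 bytes of the input and returns the digest as Base64 text.
    // Base64 maps every byte value to ASCII, so no decoder fallback is involved.
    public static string HashToBase64(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            return Convert.ToBase64String(hash);
        }
    }

    // Compares a candidate value against a previously stored Base64 hash.
    public static bool Matches(string candidate, string storedBase64)
    {
        return string.Equals(HashToBase64(candidate), storedBase64,
            StringComparison.Ordinal);
    }

    static void Main()
    {
        string stored = HashToBase64("Password");
        Console.WriteLine(stored);
        Console.WriteLine(Matches("Password", stored));
    }
}
```

Because the stored value never passes through a UTF8 decoder, it should be identical regardless of which fallback behavior the runtime uses.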