Loading XML with accented characters breaks System.Xml.XmlDocument.Load()

It took me a bit of time, but I finally found a solution for loading an XML file that contains accented characters (like áéíóúâä). I'm loading data for a client, and one of the names in it has an accented e. For the first time on this project, System.Xml.XmlDocument.Load() was blowing up. After a lot of Googling I found an article that gave me what I hope is the solution to this problem. For now it is working, so I'll go with it. The link to the article is in the code sample below. The magic happens by reading the file in while forcing a single-byte encoding (ISO-8859-1), then saving it back out in an encoding (ISO-8859-8) that has no accented Latin letters. Each accented character comes out as the plain letter we, in the US, would expect, and for my data the result is plain ASCII, which is also valid UTF-8, so the file loads cleanly.

Hope this helps someone else.


    #region Fix Character Encodings
    // Drop accented characters back to their unaccented equivalents.
    // Reference: http://www.codeproject.com/KB/cs/EncodingAccents.aspx
    // Requires: using System.IO; using System.Text;
    string fileContents;

    // Read the whole file, decoding its bytes as ISO-8859-1 (Latin-1).
    using (StreamReader sr = new StreamReader(_pfInfo.WorkingFile, Encoding.GetEncoding("iso-8859-1")))
    {
        fileContents = sr.ReadToEnd();
    }

    // Overwrite the file as ISO-8859-8. That encoding has no accented Latin
    // letters, so the encoder falls back to the closest match (é becomes e).
    using (StreamWriter sw = new StreamWriter(_pfInfo.WorkingFile, false, Encoding.GetEncoding("iso-8859-8")))
    {
        sw.Write(fileContents);
    }
    #endregion
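
For anyone who would rather not depend on how the ISO-8859-8 encoder substitutes characters, here is a rough sketch of two alternatives. It is not from the linked article, and the class and method names (AccentedXml, Load, StripAccents) are made up for illustration. Load assumes the file really is ISO-8859-1 (or whatever encoding you pass in) and hands XmlDocument already-decoded characters, so the accents survive. StripAccents swaps in a different technique from the one above: Unicode decomposition followed by removal of the combining marks.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class; none of these names come from the original post.
public static class AccentedXml
{
    // Decodes the file with the encoding the caller names, so XmlDocument
    // receives characters instead of having to guess at the raw bytes.
    public static XmlDocument Load(string path, Encoding encoding)
    {
        XmlDocument doc = new XmlDocument();
        using (StreamReader reader = new StreamReader(path, encoding))
        {
            doc.Load(reader);
        }
        return doc;
    }

    // Drops accents explicitly: decompose each character into a base letter
    // plus combining marks, then skip the combining marks.
    public static string StripAccents(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever is left into the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

In the code above, that would look something like AccentedXml.Load(_pfInfo.WorkingFile, Encoding.GetEncoding("iso-8859-1")), which leaves the file on disk untouched. If the accents still have to go, StripAccents can be run on individual values after loading.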
Cross posted from my blog at http://schema.sol3.net/kbarrows
