Loading XML with accented characters breaks System.Xml.XmlDocument.Load()
It took me a while, but I finally found a way to get an XML file that contains accented characters (like áéíóúâä) into a form that loads as UTF-8. I'm loading data for a client, and one record turned out to have a name with an accented e in it. For the first time on this project, XmlDocument.Load() was blowing up. After a lot of Googling I found an article that gave me what I hope is the solution to this problem. It's working for now, so I'll go with it. The link to the article is in the code sample below. The magic happens by reading the file in as ISO-8859-1 (Latin-1, a single-byte encoding in which every byte maps to a character) and then saving it back out as ISO-8859-8, which has no accented Latin letters, so the accented characters get written as their closest plain equivalents (é drops back to e). That gives the unaccented representation we, in the US, would expect. Hope this helps someone else.
#region Fix Character Encodings
// Need to drop accented characters back to normal characters...
// reference: http://www.codeproject.com/KB/cs/EncodingAccents.aspx
// (requires: using System.IO; using System.Text;)
string fileContents;
// Read the whole file, treating each byte as an ISO-8859-1 (Latin-1) character.
using (StreamReader sr = new StreamReader(_pfInfo.WorkingFile, Encoding.GetEncoding("iso-8859-1")))
{
    fileContents = sr.ReadToEnd();
}

// Overwrite the same file (append: false), encoding the text as ISO-8859-8.
using (StreamWriter sw = new StreamWriter(_pfInfo.WorkingFile, false, Encoding.GetEncoding("iso-8859-8")))
{
    sw.Write(fileContents);
}
#endregion
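
If the accents need to survive rather than be flattened, one possible alternative is to decode the file yourself and hand the already-decoded text to XmlDocument.Load(TextReader). This is only a sketch under an assumption I haven't verified for the client's data: that the file really is ISO-8859-1 encoded and Load() was failing because it tried to read those bytes as UTF-8. The names XmlEncodingSketch and LoadAsLatin1 are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlEncodingSketch
{
    // Hypothetical helper: decodes the file as ISO-8859-1 and passes the
    // decoded characters to XmlDocument, so the file itself is not rewritten.
    // Assumption: the file is actually ISO-8859-1 encoded.
    public static XmlDocument LoadAsLatin1(string path)
    {
        var doc = new XmlDocument();
        using (var reader = new StreamReader(path, Encoding.GetEncoding("iso-8859-1")))
        {
            doc.Load(reader);
        }
        return doc;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Write a small sample as ISO-8859-1; the accented e becomes the
            // single byte 0xE9, which is not valid UTF-8 on its own.
            byte[] bytes = Encoding.GetEncoding("iso-8859-1")
                .GetBytes("<client><name>Ren\u00E9</name></client>");
            File.WriteAllBytes(path, bytes);

            XmlDocument doc = LoadAsLatin1(path);
            // The accent is still present in the loaded document.
            Console.WriteLine(doc.DocumentElement.InnerText == "Ren\u00E9");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The trade-off compared with the fix above is that the working file is left untouched and names keep their accents, so anything downstream has to cope with non-ASCII characters.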