StringWriter Encoding Hack
A few months back I blogged about the problems I had with System.IO.StringWriter when dealing with XML. This whole business of dealing with XML is not very intuitive IMO. There are way too many steps involved in making it work. My other problem with it is kind of complicated to explain, so I will try my best.
I was trying to create an XML string to pass up the various layers of my application, to be later written to a file or some other destination. Well, when you start to march into the XML world, you have to deal with text encoding. XML is very finicky about that kind of thing, and if you don't do it just right, you're screwed. Now, you may not have known this, but .NET strings are ALWAYS UTF-16 in memory. It doesn't matter what you do to a string, or how you work with it, String = UTF-16. XML, on the other hand, is almost always stored and served as UTF-8, and the encoding named in the document header has to match the actual bytes, otherwise IE and just about every other parser on earth totally craps out. Has the problem begun to become apparent to you yet?
In the System.Xml namespace, you can specify the text encoding when dealing with Stream objects, and you can specify the encoding when writing straight to a file, but you CANNOT change the encoding when you write to a string. NEVER. The StringWriter always reports UTF-16, so you're stuck building XML strings whose header says UTF-16 even though XML is almost always in UTF-8. I don't know about you guys, but this is extremely confusing. It doesn't make one bit of sense to me.
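To make the complaint concrete, here is a minimal sketch of what happens with a plain StringWriter. The module and function names (DeclarationDemo, BuildWithStringWriter) are made up for illustration; the calls themselves are the standard System.IO and System.Xml ones.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module DeclarationDemo
    ' Hypothetical helper: builds a tiny document through a plain StringWriter.
    Function BuildWithStringWriter() As String
        Dim sw As New StringWriter()
        Dim writer As New XmlTextWriter(sw)
        writer.WriteStartDocument()
        writer.WriteElementString("document", "hello")
        writer.WriteEndDocument()
        writer.Close()
        Return sw.ToString()
    End Function

    Sub Main()
        ' StringWriter.Encoding is read-only and always reports UTF-16.
        Console.WriteLine(New StringWriter().Encoding.WebName)
        ' The declaration at the top of the output names that same encoding.
        Console.WriteLine(BuildWithStringWriter())
    End Sub
End Module
```

The writer takes the encoding name for the header from the TextWriter it is handed, which is why there is nowhere to pass a different one.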
I talked to some of the MS XML gurus (Dare and Joshua) and they told me that my scenario was the first legitimate one they had ever heard for needing to add encoding to the string. That surprised me, because, having never built XML with this namespace before, I would have expected it to work this way. My other beef is, if a string is always UTF-16 and XML is almost always UTF-8, shouldn't it automatically convert internally? Even if it means the Framework has to take the StringWriter, dump it into a StreamReader, change the encoding, and dump the encoded string back into a new StringWriter and pass it back... that's how I would think it would work. I'm hoping this is possible for System.Xml 2.0.
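For comparison, the stream route mentioned earlier can be sketched like this. It is not the hack this post is about, just one possible way to get real UTF-8 bytes and a matching header. BuildUtf8Xml and Utf8BytesToString are hypothetical names, and the sample content is invented.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module Utf8XmlSketch
    ' Hypothetical helper: writes the document to memory as genuine UTF-8 bytes.
    Function BuildUtf8Xml() As Byte()
        Dim ms As New MemoryStream()
        ' UTF8Encoding(False) leaves out the byte order mark.
        Dim writer As New XmlTextWriter(ms, New UTF8Encoding(False))
        writer.WriteStartDocument()
        ' ChrW(233) is a non-ASCII character, so it takes two bytes in UTF-8.
        writer.WriteElementString("document", "caf" & ChrW(233))
        writer.WriteEndDocument()
        writer.Flush()
        Dim bytes As Byte() = ms.ToArray()
        writer.Close()
        Return bytes
    End Function

    ' Hypothetical helper: decodes the bytes back into a (UTF-16) .NET string.
    Function Utf8BytesToString(ByVal bytes As Byte()) As String
        Return Encoding.UTF8.GetString(bytes)
    End Function

    Sub Main()
        Console.WriteLine(Utf8BytesToString(BuildUtf8Xml()))
    End Sub
End Module
```

The catch is the same one as before: once the bytes are decoded into a String, the text is UTF-16 in memory again, and only the header still says UTF-8.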
At any rate, I came up with a hack to at least make the XML document header show the right encoding. Now, I'm pretty sure that this code does not change the encoding of the document itself: the string you get back is still UTF-16 in memory, and the bytes only become UTF-8 when whatever writes them out encodes them that way. It is effective, though, in that you can now set the encoding yourself, and the doc header will be emitted properly. USE AT YOUR OWN RISK, because the text may still not be encoded properly, and may still break the app. I haven't had any problems so far with IE reading the output.
I'm gonna display the source code here. I broke out stuff like this into my own base library, so that I can use it anywhere, not just in GenX.NET. They reside in the “Interscape” base namespace. I also put in the Data Access source code that I use for all my samples. I got tired of dealing with that over and over again.... but more on that later. I will make the base library source available as soon as it has a few more classes.
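Imports System.IO
Imports System.Text

Namespace Text

    '''<summary>
    '''A StringWriter that reports a caller-supplied encoding instead of UTF-16.
    '''</summary>
    Public Class EncodedStringWriter
        Inherits StringWriter

        'Backing field for the overridden Encoding property
        Private _Encoding As Encoding

        '''<summary>
        '''Creates a writer over the given StringBuilder that reports the given encoding.
        '''</summary>
        '''<param name="sb">The formatted result to output.</param>
        '''<param name="Encoding">The encoding to report in the document header.</param>
        Public Sub New(ByVal sb As StringBuilder, ByVal Encoding As Encoding)
            MyBase.New(sb)
            _Encoding = Encoding
        End Sub

        '''<summary>
        '''Gets the encoding that was passed to the constructor.
        '''</summary>
        '''<remarks>
        '''Only the reported encoding changes; the underlying string is still UTF-16.
        '''</remarks>
        Public Overrides ReadOnly Property Encoding() As Encoding
            Get
                Return _Encoding
            End Get
        End Property

    End Class

End Namespace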
Basically what is happening is, I'm creating a new class called EncodedStringWriter that has the same good stuff that the regular StringWriter has. I create a private placeholder variable, and I allow that placeholder to be set in the new constructor I created. Then I override the Encoding property (which is ReadOnly for some damned reason) and return the private placeholder that was set on instantiation. Bingo, I've now made my read-only property not so read-only after all. Now, to use this new class to build an XML document (the way I thought I could in the first place), you do this (inside a function that I do not define here):
'Requires Imports for System.Data, System.Text and System.Xml at the top of the file
Dim i As Integer
Dim sb As New StringBuilder
Dim writer As New XmlTextWriter(New EncodedStringWriter(sb, Encoding.UTF8))
Dim dr As IDataReader = YourDataAccessFunctionHere

writer.Formatting = Formatting.Indented
writer.WriteStartDocument() 'Now the proper header will be rendered
writer.WriteStartElement("document")

'Write one <item> element per row in the DataReader
While dr.Read()
    writer.WriteStartElement("item")
    For i = 0 To dr.FieldCount - 1
        'WriteElementString escapes markup characters itself,
        'so wrapping the value in HtmlEncode would double-encode it
        writer.WriteElementString(dr.GetName(i), dr.GetValue(i).ToString)
    Next
    writer.WriteEndElement()
End While
dr.Close()

writer.WriteEndElement()
writer.WriteEndDocument()
writer.Flush()
writer.Close()

Return sb.ToString
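Since the returned string is still UTF-16 in memory, whoever finally writes it out has to encode it as UTF-8 for the header to be telling the truth. A minimal sketch of that last step, assuming the destination is a file; SaveAsUtf8 and its parameter names are hypothetical, not part of my base library:

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module SaveXmlSketch
    ' Hypothetical helper: writes the XML string to disk as UTF-8 bytes
    ' so the bytes agree with an encoding="utf-8" header.
    Sub SaveAsUtf8(ByVal filePath As String, ByVal xml As String)
        ' False = overwrite; UTF8Encoding(False) leaves out the byte order mark.
        Dim sw As New StreamWriter(filePath, False, New UTF8Encoding(False))
        Try
            sw.Write(xml)
        Finally
            sw.Close()
        End Try
    End Sub

    Sub Main()
        Dim target As String = Path.GetTempFileName()
        SaveAsUtf8(target, "<?xml version=""1.0"" encoding=""utf-8""?><document />")
        Console.WriteLine(target)
    End Sub
End Module
```

If the destination is an HTTP response or something else, the same idea applies: pick UTF-8 at the point where the string turns into bytes.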
Yes, I know this is a hack. I absolutely hate it. At the same time, I love it because I came up with it on my own, without asking anyone for help, and it gets the job done. Hopefully it will be fixed in Whidbey.
OK, my XML rant is over for the day. Later on I'll talk about my Universal Demo DAL, dealing with Access DBs, and why Northwind.mdb is once again the demo app's best friend.