About Reflector, Encoding, And an Article Error

My blog has moved.
You can view this post at the following address:
http://www.osherove.com/blog/2003/6/13/about-reflector-encoding-and-an-article-error.html
Published Saturday, June 14, 2003 12:27 AM by RoyOsherove

Comments

Friday, June 13, 2003 8:00 PM by Stephane

# re: About Reflector, Encoding, And an Article Error

What about either of these:

1) Derive from the StringWriter class:

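using System.IO;
using System.Text;

public class StringWriterWithEncoding : StringWriter
{
    // The encoding this writer reports to its consumers.
    Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    // StringWriter always reports UTF-16; override to report the chosen encoding.
    public override Encoding Encoding
    {
        get { return encoding; }
    }
}


With this class you can specify whatever encoding you like.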


2) Or use an appropriate memory buffer:

using System.IO;
using System.Text;
using System.Windows.Forms; // for MessageBox
using System.Xml;

MemoryStream ms = new MemoryStream();
XmlTextWriter tw = new XmlTextWriter(ms, new UTF8Encoding());
// -- your xml stuff begins here --
tw.WriteStartDocument();
tw.WriteStartElement("...");
tw.WriteEndElement();
tw.WriteEndDocument();
tw.Flush();
tw.Close();
// -- your xml stuff ends here --

// Convert byte[] to String. ToArray() returns only the bytes actually written;
// GetBuffer() returns the whole internal buffer, including unused trailing bytes.
String s = Encoding.UTF8.GetString(ms.ToArray());
MessageBox.Show(s);
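
For what it's worth, here is a rough sketch of how the derived writer from option 1 might be plugged into an XmlTextWriter. The class is repeated so the snippet stands alone; `Option1Demo`, `BuildXml` and the element name "root" are made up for illustration. The XML declaration should then name utf-8, although the string itself is still held as UTF-16 in memory.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Same idea as option 1: a StringWriter that reports a caller-chosen encoding.
public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

public static class Option1Demo
{
    // Hypothetical helper: writes a tiny document and returns it as a string.
    public static string BuildXml()
    {
        StringWriterWithEncoding sw = new StringWriterWithEncoding(Encoding.UTF8);
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartDocument();        // declaration takes its encoding name from sw.Encoding
        tw.WriteStartElement("root");   // "root" is a placeholder element name
        tw.WriteEndElement();
        tw.WriteEndDocument();
        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml());
    }
}
```

Only the declaration changes here. If the string is later written out as bytes, it still has to be encoded as UTF-8 at that point for the header to be truthful.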
Friday, June 13, 2003 10:54 PM by Ziv Caspi

# re: About Reflector, Encoding, And an Article Error

StringWriter writes to strings, whose underlying memory representation is always UTF-16 in the CLR (let's not get into the surrogate pair issue here...).

Thus, if you have StringWriter write to a string, it *will* be UTF-16 whether the first line in the string claims so or not. This is also how MSXML behaves, by the way.

Of course, when you take the string and convert it into a memory buffer in another encoding, you need to take care of converting the encoding-providing header, which means extra work for you. Whether this is actually a good design or not is another issue...
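
A small sketch that shows the behavior described above, assuming a plain StringWriter and a placeholder element named "root" (`DefaultWriterDemo` and `BuildXml` are hypothetical names): the writer reports UTF-16, and XmlTextWriter copies that name into the declaration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DefaultWriterDemo
{
    // Hypothetical helper: writes a tiny document through an unmodified StringWriter.
    public static string BuildXml()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartDocument();        // uses sw.Encoding for the declaration
        tw.WriteStartElement("root");   // placeholder element name
        tw.WriteEndElement();
        tw.WriteEndDocument();
        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        // StringWriter's Encoding property is fixed to UTF-16 unless overridden.
        Console.WriteLine(new StringWriter().Encoding.WebName);
        Console.WriteLine(BuildXml());
    }
}
```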
Friday, June 13, 2003 10:56 PM by Morten Abrahamsen

# re: About Reflector, Encoding, And an Article Error

I would expect the fact that System.String is unicode has something to do with it :) So, it just won't support another format.

If you want another encoding you should probably use something along the lines of a binary stream.

A UTF-8 encoded XML document stored in a UTF-16 string doesn't really make much sense, now does it :)

Just my 2c!
Saturday, June 14, 2003 12:47 AM by Roy Osherove

# re: About Reflector, Encoding, And an Article Error

Thanks for all the input, guys. Like I said, it seems to just reaffirm what I wrote: you can't make your XML document's header say UTF-8 without jumping through some hoops.
Perhaps this would best be solved with a method on XmlTextWriter, something like 'SetEncodingHeader(Encoding encoding)', that would allow this without too much hassle.
Saturday, June 14, 2003 4:46 AM by David Pickett

# re: About Reflector, Encoding, And an Article Error

"I would expect the fact that System.String is unicode has something to do with it :)"

Technically, UTF-8 is also Unicode--just a different encoding for it ;).
Saturday, June 14, 2003 5:23 AM by Morten Abrahamsen

# re: About Reflector, Encoding, And an Article Error

I know that UTF-8 is Unicode as well.

It's just that in the BCL UTF-16 is referred to as Encoding.Unicode... which is why I wrote that statement.

Anyways, the System.String class is a UTF-16 string, so an XmlTextWriter with SetEncodingHeader would only enable you to have an encoding mismatch...

Morty :)
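
To make the mismatch concrete, here is a rough sketch under a few assumptions: the sample document, `HeaderMismatchDemo` and its members are invented for illustration. The declaration is just text inside the string; the bytes you end up with depend entirely on which Encoding serializes it.

```csharp
using System;
using System.Text;

public static class HeaderMismatchDemo
{
    // Hypothetical sample document whose declaration claims utf-8.
    public const string Xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root />";

    public static void Main()
    {
        // Same string, two serializations. Only the first agrees with the header.
        byte[] asUtf8 = Encoding.UTF8.GetBytes(Xml);
        byte[] asUtf16 = Encoding.Unicode.GetBytes(Xml);

        Console.WriteLine("UTF-8 bytes:  " + asUtf8.Length);
        Console.WriteLine("UTF-16 bytes: " + asUtf16.Length);
    }
}
```

Since the sample is plain ASCII, the UTF-16 form takes two bytes per character and the UTF-8 form takes one, so a parser that trusts the header would misread the UTF-16 bytes.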
Thursday, July 31, 2003 11:17 AM by TrackBack

# XmlTextWriter StringWriter = Headache

Wednesday, March 17, 2004 9:56 AM by Kalyan Krishna

# re: About Reflector, Encoding, And an Article Error

First of all, thanks a lot for this piece of code, as it's just what I was looking for.

Unfortunately I am able to get only 4096 bytes of data from the MemoryBuffer object.

Is there any way to get around that limitation?

Any help is deeply appreciated.
Friday, June 25, 2004 10:53 AM by PaoloDM

# re: About Reflector, Encoding, And an Article Error

The problem is .NET always encodes strings as UTF-16. This is a documented fact. By using StringWriter, you will always get UTF-16. Microsoft, from what I've been reading on blogs, has no intention of changing this.

I have posted a work-around on this link:

http://www.gotdotnet.com/Community/MessageBoard/Thread.aspx?id=194275&Page=1#237674

Sincerely,

PaoloDM