System.Text.Encoding.Tip - ISerializable - Roy Osherove's Blog

System.Text.Encoding.Tip

I learned this the hard way, so you don't have to. It applies to anyone using System.IO to read ANSI files that mix Hebrew and English text.

Now, when I attempted to do this, I used the StreamReader class, which is pretty simple to use:

StreamReader r = new StreamReader(myFile);

string MyText = r.ReadToEnd();

r.Close();

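Ah! But there's one important caveat you'll notice when you try to display MyText: the Hebrew text inside vanishes without a trace, leaving you with a big heart attack... The reason is that StreamReader decodes the file as UTF-8 unless you tell it otherwise, and the Hebrew characters in an ANSI file are single-byte codes that generally don't form valid UTF-8 sequences.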
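To see the failure without a real file, here is a minimal sketch that encodes mixed English/Hebrew text with Windows-1255 (code page 1255, the usual Hebrew ANSI code page) and decodes the same bytes two ways. It assumes modern .NET (Core 3.0 or later), where legacy code pages have to be registered through CodePagesEncodingProvider, and it uses Encoding.UTF8 as a stand-in for what a StreamReader does when no encoding is passed. On current .NET the unreadable bytes come back as U+FFFD replacement characters rather than disappearing entirely.

```csharp
using System;
using System.Text;

// Assumption: modern .NET (Core 3.0+/5+), where code page 1255 is only
// available after registering the code-pages provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding windows1255 = Encoding.GetEncoding(1255); // Hebrew "ANSI" code page

// "Hello " followed by the Hebrew word shalom (U+05E9 U+05DC U+05D5 U+05DD).
string original = "Hello \u05E9\u05DC\u05D5\u05DD";
byte[] fileBytes = windows1255.GetBytes(original); // what an ANSI editor would save

// Stand-in for a StreamReader created without an encoding argument (UTF-8).
string viaUtf8 = Encoding.UTF8.GetString(fileBytes);
// Decoding with the encoding the file was actually saved in recovers the text.
string viaAnsi = windows1255.GetString(fileBytes);

Console.WriteLine(viaUtf8 == original); // False: the Hebrew bytes are invalid UTF-8
Console.WriteLine(viaAnsi == original); // True
```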

To fix this problem, you need to specify the encoding the file was saved in. You do this by passing one of the encodings available through the System.Text.Encoding class to the StreamReader constructor, like so:

StreamReader r = new StreamReader(myFile, System.Text.Encoding.Default);

What I found is that passing any of the other encodings exposed by the Encoding class (ASCII, UTF8, Unicode and so on) does not work for these files: it either truncates the text or displays garbage. Since Encoding.Default returns the encoding for your system's current ANSI code page, it saves you the trouble of figuring this out yourself. I haven't tested any edge cases, but it seems to work perfectly for me.
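One caveat on Encoding.Default: on the .NET Framework this post was written against, it is the system's ANSI code page, but on .NET Core and .NET 5+ it always returns UTF-8. A sketch that doesn't depend on the runtime or the machine's locale, assuming modern .NET and a Hebrew file saved as Windows-1255, names the code page explicitly. The temp file name is hypothetical and exists only for the demonstration.

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: modern .NET (Core 3.0+/5+); legacy code pages need this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// .NET Framework: the system ANSI code page. .NET Core / .NET 5+: always UTF-8.
Console.WriteLine($"Encoding.Default is {Encoding.Default.WebName}");

// Name the code page explicitly instead of relying on Encoding.Default.
Encoding windows1255 = Encoding.GetEncoding("windows-1255");

// Hypothetical sample file, saved the way a Hebrew ANSI editor would save it.
string path = Path.Combine(Path.GetTempPath(), "hebrew-english-sample.txt");
string expected = "Shalom = \u05E9\u05DC\u05D5\u05DD";
File.WriteAllText(path, expected, windows1255);

string myText;
using (StreamReader r = new StreamReader(path, windows1255))
{
    myText = r.ReadToEnd();
}
File.Delete(path);

Console.WriteLine(myText == expected); // True
```

On a machine whose system locale is Hebrew, running the .NET Framework, Encoding.GetEncoding(1255) and Encoding.Default should decode such a file identically.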

Conclusion: when reading an ANSI file, or if you're having problems reading any text file, first try passing in Encoding.Default, and only then try the other encodings.
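The "try one encoding first, then the others" advice can be turned into a small fallback routine. The sketch below is a variant rather than the author's method: it tries strict UTF-8 first (which, on .NET Core and later, is what Encoding.Default is anyway) and falls back to an ANSI code page you supply. DecodeWithFallback is a hypothetical helper name, code page 1255 is just the Hebrew example, and the whole thing is a heuristic: it can tell UTF-8 from ANSI, but not one single-byte code page from another.

```csharp
using System;
using System.Text;

// Assumption: modern .NET (Core 3.0+/5+); legacy code pages need this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: decode as strict UTF-8; if the bytes are not valid
// UTF-8, decode them with the given ANSI code page instead.
static string DecodeWithFallback(byte[] bytes, int ansiCodePage)
{
    // Skip a UTF-8 byte order mark (EF BB BF) if present.
    int start = bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF ? 3 : 0;

    // throwOnInvalidBytes: true makes invalid sequences throw
    // DecoderFallbackException instead of being replaced with U+FFFD.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        return strictUtf8.GetString(bytes, start, bytes.Length - start);
    }
    catch (DecoderFallbackException)
    {
        return Encoding.GetEncoding(ansiCodePage).GetString(bytes);
    }
}

string sample = "Hello \u05E9\u05DC\u05D5\u05DD";
byte[] utf8File = Encoding.UTF8.GetBytes(sample);
byte[] ansiFile = Encoding.GetEncoding(1255).GetBytes(sample);

Console.WriteLine(DecodeWithFallback(utf8File, 1255) == sample); // True
Console.WriteLine(DecodeWithFallback(ansiFile, 1255) == sample); // True
```

Trying UTF-8 first works as a filter because non-ASCII ANSI text is rarely valid UTF-8 by accident, while almost any byte sequence decodes "successfully" as a single-byte code page.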

Published Thursday, May 15, 2003 2:46 PM by RoyOsherove

Comments

Thursday, October 30, 2003 3:07 PM by Ingrid

# re: System.Text.Encoding.Tip

Thanks! This really solved a similar problem I had with Danish characters :-)
Thursday, January 01, 2004 10:26 PM by rob

# re: System.Text.Encoding.Tip

THANKS! I was having the identical problem with French and English and couldn't find anything in the help to indicate a solution.
Wednesday, March 31, 2004 11:35 AM by Stef@n

# re: System.Text.Encoding.Tip

Thank you very much! It works perfectly with Greek too!!!
Sunday, May 09, 2004 1:52 PM by theMooner

# re: System.Text.Encoding.Tip

Works with the Turkish charset iso-8859-9 via System.Text.Encoding.GetEncoding("iso-8859-9").
Thursday, June 03, 2004 11:46 AM by Mike Saeger

# re: System.Text.Encoding.Tip

I had the same problem with Microsoft's "Magic Quotes": they were removed without a trace until I added Text.Encoding.Default.
Wednesday, May 24, 2006 7:50 PM by Ferdie Dalida

# re: System.Text.Encoding.Tip

Thanks for this info.
I think I will use this to fix my problem
Friday, August 25, 2006 3:29 PM by John Zhang

# re: System.Text.Encoding.Tip

This is great and solved my issue.
Tuesday, September 26, 2006 11:38 AM by Shalom Sayag

# re: System.Text.Encoding.Tip

Toda raba (thank you very much). It did the job.
Monday, October 16, 2006 11:47 AM by Levanter30

# re: System.Text.Encoding.Tip

Brilliant, thanks! This solved it for me! :-)