Martin Spedding's Blog

Adventures in a disconnected world

How on earth can a regex return a match that throws the exception "Input string was not in a correct format"?

First of all, happy new year to everyone, wherever you live.

I have not blogged for a while, as I have been very busy and also suffering from a viral infection.

I decided to complete a personal project I have been working on: creating an RSS feed from my Yahoo email. Everything was going wonderfully well until I tried to extract the data I need from the HTML page that Yahoo returns when you view your inbox. Even though I am no expert in regular expressions, I decided to have a go. I was using the appropriate .NET Framework classes and looping through the collection of matches. The problem was that sometimes an exception would be thrown: "Input string was not in a correct format". I patiently searched Google for an explanation of how text passed to a regex as a string could be in the wrong format. Maybe someone can explain this exception, because it defeated my ability to use regular expressions. Is it a bug, or am I just being stupid?
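The failing code isn't shown, so the following is only a guess, along the lines one commenter suggests below. In .NET, "Input string was not in a correct format" is the message of the FormatException thrown by Int32.Parse and Convert.ToInt32, so a regex can match perfectly well while a later conversion of a captured group fails. This minimal sketch is a Java analogue: Pattern/Matcher stand in for the .NET regex loop, and Integer.parseInt's NumberFormatException plays the role of FormatException. The HTML fragment, the `size` cell, and the class and method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchThenParse {
    // Hypothetical pattern: capture whatever text sits inside a <td class="size"> cell.
    private static final Pattern SIZE_CELL =
            Pattern.compile("<td class=\"size\">([^<]*)</td>");

    // Returns the captured text of every match; the regex step itself does not throw here.
    public static List<String> captures(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = SIZE_CELL.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Converting a capture to a number is where the format exception can appear.
    public static boolean parsesAsInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical inbox fragment; the real Yahoo markup is not given in the post.
        String html = "<td class=\"size\">12</td><td class=\"size\">1,024</td>";
        for (String captured : captures(html)) {
            System.out.println("\"" + captured + "\" -> "
                    + (parsesAsInt(captured) ? "ok" : "NumberFormatException"));
        }
    }
}
```

Both cells match the regex, but only "12" converts cleanly; "1,024" fails at the parse step, which is where a .NET version would raise "Input string was not in a correct format".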

I spent hours trying to solve this problem. In the end I just used a combination of indexOf and substring to achieve the same effect as I would have done with the regex. Performance was great and it was much simpler to debug and program than the regex version.
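Here is a minimal Java sketch of the indexOf/substring approach, assuming each field on the page is wrapped in a fixed start marker and end marker. The markers, the sample markup, and the class and method names are hypothetical, not taken from Yahoo's page. In .NET the corresponding methods are String.IndexOf and String.Substring; note that .NET's Substring takes a start index and a length, whereas Java's substring takes a start and an end index.

```java
import java.util.ArrayList;
import java.util.List;

public class BetweenMarkers {
    // Collects every piece of text between startMarker and endMarker,
    // scanning left to right with indexOf and cutting with substring.
    public static List<String> extractAll(String text, String startMarker, String endMarker) {
        if (startMarker.isEmpty() || endMarker.isEmpty()) {
            throw new IllegalArgumentException("markers must be non-empty");
        }
        List<String> results = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startMarker, from);
            if (start < 0) {
                break; // no more start markers
            }
            start += startMarker.length();
            int end = text.indexOf(endMarker, start);
            if (end < 0) {
                break; // unterminated field: stop rather than guess
            }
            results.add(text.substring(start, end));
            from = end + endMarker.length();
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical inbox markup.
        String html = "<b class=\"subj\">Hello</b> ... <b class=\"subj\">Invoice</b>";
        System.out.println(extractAll(html, "<b class=\"subj\">", "</b>"));
    }
}
```

Running main prints `[Hello, Invoice]`. Each step is a plain index lookup, which is what makes this style easy to step through in a debugger.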

My conclusion: if plan A does not work, use plan B. Often the old-fashioned ways work best.

Comments

Sijin Joseph said:

Well, I don't think the message "Input string was not in a correct format" means that the input string was not in the format prescribed by the regular expression. This exception is thrown when the input string contains non-printable or invalid characters. Maybe the input string had non-ASCII chars; I'm not sure what the exact cause is, but it basically means that invalid chars were present in the string.

Debug and check the value of the string when the exception is thrown. If it contains some boxed chars, they might be the culprit.
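A minimal Java sketch of that debugging step, assuming the suspect value has already been captured into a string. It rewrites each UTF-16 character outside printable ASCII (including control characters such as tab) as a `[U+XXXX]` code, so invisible or "boxed" characters stand out. The class name, method name, and sample value are hypothetical.

```java
public class ShowHiddenChars {
    // Keeps printable ASCII as-is and replaces every other UTF-16 char with [U+XXXX].
    public static String reveal(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("[U+%04X]", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical captured value containing a non-breaking space and a tab.
        String captured = "12\u00A0KB\t";
        System.out.println(reveal(captured)); // prints 12[U+00A0]KB[U+0009]
    }
}
```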

There is no way an indexOf implementation could have been easier to implement. I use regexes a lot, and believe me, once you're hooked you'll absolutely love them.

BTW, check out some great regex tools:
Expresso http://www12.brinkster.com/ultrapico/Expresso.htm
Regulator http://royo.is-a-geek.com/regulator

Sijin
sijinjoseph@hotmail.com
# January 2, 2004 10:06 AM

Ken Hirsch said:

My wild guess (never having used .Net) is that you are giving it a string that is in the Windows-1252 character set but it is expecting a UTF character string. There are some security issues with invalid UTF character strings.
# January 2, 2004 10:08 AM

Darren Neimke said:

Martin, are you able to provide a cut-down sample of the code that you are using, the regex, and a sample input string that fails?
# January 2, 2004 7:13 PM

AT said:

I don't think your problem is Regexp.

I think you are trying to convert a String to some number and getting a FormatException as a result, or you are using Console.WriteLine with some tricky number formats.
# January 5, 2004 12:43 PM
