Following up on the benefits of continued use of int.Parse...

Alex Campbell followed up with some comments on his blog, "C# if(!FindFunction(IsNumeric)) { return "this language is retarded"; }", arguing that the continued use of int.Parse is acceptable because of its additional ability to determine whether a number truly is an integer. While I find this admirable, I can't see how possibly throwing exceptions could ever be considered a better method. In the interest of providing faster numeric processing methods than are currently available, I'll be parking a new number processing class under my articles, at DWC.Algorithms.NumberUtilities.

Why would I even bother writing these number processing functions to begin with? Well, I create a large number of applications that have to process input from streams. I also write a large number of XML applications. And further, I use regular expressions quite often. In all of these cases you have to convert the resulting string into a number on the back end, and that conversion always has the opportunity to throw an exception. If you have a few thousand numeric strings and 50 of them throw an exception, you just killed the performance of your application. Not to mention, int.Parse, even without exceptions, is VERY slow. My current algorithm is approximately 4x faster than int.Parse when processing valid integers. It is faster still when int.Parse would throw an exception, somewhere along the lines of 15-20x. You can estimate an average use case based on how many invalid integers you expect to parse and work out how much more performant my algorithm would be for you.
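For reference, here is a minimal sketch of the exception-based pattern being discussed. The class and method names (ExceptionBasedCheck, IsInt32, CountValid) are made up for illustration, and the invariant culture is an assumption chosen so the result does not depend on machine settings. Every rejected string costs one thrown and caught exception, which is where the slowdown on invalid input comes from.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class ExceptionBasedCheck
{
    // Returns true when int.Parse accepts the text.
    // Each rejection is reported through a thrown exception.
    public static bool IsInt32(string text)
    {
        try
        {
            int.Parse(text, CultureInfo.InvariantCulture);
            return true;
        }
        catch (FormatException) { return false; }       // not a number at all
        catch (OverflowException) { return false; }     // numeric, but outside the Int32 range
        catch (ArgumentNullException) { return false; } // null input
    }

    // Counts the entries that int.Parse accepts.
    public static int CountValid(string[] inputs)
    {
        int valid = 0;
        foreach (string input in inputs)
        {
            if (IsInt32(input)) valid++;
        }
        return valid;
    }
}
```

Calling ExceptionBasedCheck.CountValid on a few thousand strings with a handful of bad entries is the scenario described above: the valid entries are only as slow as int.Parse itself, while each bad one pays for an exception.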

Note that the algorithm is complex, and I am only concerned with US English processing. You should still use int.Parse if you want to parse numbers under different locales. This is the first iteration of the algorithm, and while I tested it against 40 billion numeric strings, that doesn't mean there aren't issues in it. Please let me know if you find any, and if you actually decide to use this function, let me know in the comments, so I can gauge its impact on the community and further develop it for use in string->number processing. (Sidenote: This improved a stream processing algorithm I was using from 41 ms down to 28 ms, and that was in the case of no invalidly formatted numbers.)
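To give a rough idea of what an exception-free parser can look like, here is a minimal sketch. It is not the code in DWC.Algorithms.NumberUtilities; the class name AsciiIntParser and its details are assumptions made for illustration. It handles only ASCII digits with an optional leading sign and surrounding whitespace, and it reports failure through the return value instead of throwing.

```csharp
using System;

// Hypothetical sketch, not the published NumberUtilities implementation.
public static class AsciiIntParser
{
    // Treats space and the ASCII control whitespace range (tab through carriage return) as white.
    private static bool IsWhite(char c)
    {
        return c == ' ' || (c >= '\t' && c <= '\r');
    }

    // Parses [whitespace][sign]digits[whitespace] into an Int32 without throwing.
    public static bool TryParse(string text, out int value)
    {
        value = 0;
        if (text == null) return false;

        int i = 0, end = text.Length;
        while (i < end && IsWhite(text[i])) i++;         // skip leading whitespace
        while (end > i && IsWhite(text[end - 1])) end--; // skip trailing whitespace
        if (i == end) return false;                      // empty or all whitespace

        bool negative = false;
        if (text[i] == '-' || text[i] == '+')
        {
            negative = text[i] == '-';
            i++;
        }
        if (i == end) return false;                      // a sign with no digits

        long acc = 0;
        for (; i < end; i++)
        {
            int digit = text[i] - '0';
            if (digit < 0 || digit > 9) return false;    // anything but an ASCII digit
            acc = acc * 10 + digit;
            if (acc > 2147483648L) return false;         // already beyond |int.MinValue|
        }

        if (negative) acc = -acc;
        if (acc > int.MaxValue) return false;            // +2147483648 does not fit
        value = (int)acc;
        return true;
    }
}
```

Accumulating in a long and bailing out as soon as the magnitude passes 2147483648 keeps the overflow check to one comparison per digit, at the cost of being tied to 32-bit results.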

Published Tuesday, March 30, 2004 9:59 PM by Justin Rogers

Comments

Wednesday, March 31, 2004 1:19 AM by Jesse Ezell

# re: Following up on the benefits of continued use of int.Parse...

I've always wondered why this wasn't built into the framework... It's such a common task. I guess Rico didn't stop by this team and talk to them about performance issues with exceptions...

Wednesday, March 31, 2004 2:33 AM by Alex Campbell

# re: Following up on the benefits of continued use of int.Parse...

Hi Justin,

Thanks for the feedback. As usual, I agree with almost all of what you say. Your alternative approaches are certainly faster. However, I think in some particular cases, try... int.Parse()... catch still has a place.

Your description of my (not even remotely original) approach as "admirable" is hard to judge. Did you mean "admirable" in the same way that I often describe really stupid ideas as "creative", "novel", "unique", "fascinating", "breathtaking" etc? :-)

I also agree with Jesse that it would have been great if the .NET Framework team had included this functionality in the framework.

Wednesday, March 31, 2004 2:47 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

Admirable in that you were willing to defend the method based on its additional power to differentiate ints from non-ints. None of the original routines that I tested were truly capable of discerning ints, and it was good that you pointed that out.

Even my newer techniques have issues in that they aren't globalizable since they aren't based on culture variances (I pointed this out above).

However, even if the BCL team had provided better options for programmers, my method would still be faster. Unless they dynamically compiled a string-to-integer parser based on the culture info, it wouldn't be faster than my implementation, which is hard-coded to work only in the US English case. I am currently in the process of discovering how this might be done, since fast string->number conversions are very important for several of my own applications.

Wednesday, March 31, 2004 3:17 AM by Jesse Ezell

# re: Following up on the benefits of continued use of int.Parse...

If you really want a flexible implementation that matches the .NET Framework's, you could probably just use Reflector to get the source and then remove the exception throwing stuff (or get the Mono source or Rotor source).

In any case, Whidbey adds a TryParse method, so at least you won't have to wait too long for this functionality.

Wednesday, March 31, 2004 3:23 AM by Jesse Ezell

# re: Following up on the benefits of continued use of int.Parse...

PS: If you don't mind casting in .NET 1.0/1.1, you can use Double.TryParse and convert to an integer...

Wednesday, March 31, 2004 3:28 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

As for a flexible implementation, a good deal of the code for numeric types is implemented as part of the VM, or virtual machine. There are EE (execution engine) calls, marked InternalCall, that actually handle the conversion of the string to a number. These calls happen to be where the exceptions are thrown from, so Reflector doesn't really help.

To answer your Double.TryParse comments. Double.TryParse is slow. Very slow in fact. It is much slower than int.Parse assuming all numbers are valid. You can read my previous blog entry that covers the speed of many other parsing methods.

Performance: Different methods for testing string input for numeric values... (http://weblogs.asp.net/justin_rogers/archive/2004/03/29/100982.aspx)

Wednesday, March 31, 2004 3:30 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

A second assertion would be that a user can input non-integer values into Double.TryParse that could then be validly cast into an integer. This would also enable loss of data, and a bunch of other *bad* things. I honestly don't think that using Double.TryParse and casting down is a valid way to test a string for integer value.

Wednesday, March 31, 2004 3:41 AM by Andy Smith

# re: Following up on the benefits of continued use of int.Parse...

Double.TryParse takes, as one of its parameters, a NumberStyles value. One of those NumberStyles is Integer. Using the Integer number style, you are guaranteed an int in the value parameter when it returns true.

And as for your perf argument... You've said yourself that your custom methods only work on US English. I'm sorry, but sacrificing localization for perf is not acceptable. "Anybody can write a fast program that does the wrong thing."

And as for the perf gain over Int32.Parse, well, that throws exceptions on bad data, which is not what you want for perf.

Wednesday, March 31, 2004 3:57 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

How often do you use a localized parse? You aren't using it in WebForms applications, since in most cases the locale is locked down to that of the server. Most of the time you aren't using it in WinForms applications either, and your configuration files and other files being parsed are most likely in some set format. Not to mention that unless you store the locale in the file, it will be mis-parsed if you send it across locales.

My argument would be that sacrificing perf for localization isn't acceptable in most cases. Presenting a reverse argument never really accomplishes much though.

I will note that NumberStyles.Integer is not linked to a specific bit length. So you can still improperly decode a 64-bit integer into a 32-bit integer. You can also decode a 32/64-bit integer into a 16-bit integer. I really don't see what this Double.TryParse approach buys, except for a bit of added complexity.

Hell, you can decode an infinite length string using Double.TryParse. Check the following program.

using System;
using System.Globalization;

public class HackDoubleTryParse {
    private static void Main(string[] args) {
        double retVal;

        // Digit strings far too large for any built-in integer type.
        string[] integers = new string[] {
            "7290847123984721093749812374092137498217309821098472198347",
            "-210943712908347091823409812734982370947213098412093741924"
        };

        // Print the input, whether TryParse accepted it under
        // NumberStyles.Integer, and the result of casting down to int.
        Console.WriteLine(integers[0]);
        Console.WriteLine(Double.TryParse(integers[0], NumberStyles.Integer, null, out retVal));
        Console.WriteLine((int) retVal);
        Console.WriteLine();

        Console.WriteLine(integers[1]);
        Console.WriteLine(Double.TryParse(integers[1], NumberStyles.Integer, null, out retVal));
        Console.WriteLine((int) retVal);
        Console.WriteLine();
    }
}

Wednesday, March 31, 2004 4:05 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

It appears to accept up to 308 characters before it returns false. So:

new string('5', 308) would succeed, while
new string('5', 309) would fail.

However, the limit may not be the number of characters, but instead the magnitude of the value. Since:
new string('1', 309) appears to work, but
new string('1', 310) doesn't.

Very strange indeed. I won't bother looking at the Rotor code for this method, since I'm not sure it would help stave off any affronts.
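One likely explanation is that the cutoff follows Double.MaxValue (roughly 1.797 x 10^308) rather than the string length: 309 ones is about 1.1 x 10^308 and still fits, while 309 fives is about 5.6 x 10^308 and does not. The sketch below probes this; the names DoubleRangeProbe and FitsInDouble are made up for illustration. It also checks for infinity, on the assumption that some newer runtimes report overflow by returning true with an infinite value instead of returning false.

```csharp
using System;
using System.Globalization;

// Hypothetical probe, for illustration only.
public static class DoubleRangeProbe
{
    // True when the digit string parses to a finite double under NumberStyles.Integer.
    public static bool FitsInDouble(string digits)
    {
        double value;
        bool parsed = double.TryParse(digits, NumberStyles.Integer,
                                      CultureInfo.InvariantCulture, out value);
        // Treat an infinite result the same as a failed parse.
        return parsed && !double.IsInfinity(value);
    }
}
```

Under that reading, DoubleRangeProbe.FitsInDouble(new string('1', 309)) succeeds and DoubleRangeProbe.FitsInDouble(new string('5', 309)) fails even though both strings have the same length.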

Wednesday, March 31, 2004 4:33 AM by Jesse Ezell

# re: Following up on the benefits of continued use of int.Parse...

I just ran your test with the Double.TryParse code inserted... 6 secs for 10 million conversions really isn't that bad for most apps. If you pass a pre-constructed NumberFormatInfo into the method call, it completes the loop a bit faster than Regex on my CPU (especially if the numbers get large, like 8 digits or so). I don't think the extra time is really all that bad, considering that you are getting a more powerful parsing algorithm that deals with culture and other things like exponents and commas (2.2 vs 6.6 secs for 10 million ints isn't too bad). For 99.9% of all apps that kind of perf is still in the acceptable range (it beats int.Parse by 50x on my machine... I had to go down to 100,000 iterations just to make it complete in a reasonable amount of time with the int.Parse method included).

Adding a few sanity checks after the double conversion could easily mitigate any conversion issues that TryParse might lead to.

Wednesday, March 31, 2004 4:37 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

Explain your sanity checks, and then examine the timing with your sanity checks added. Since those would count as part of your *algorithm* for parsing integers.

Wednesday, March 31, 2004 5:05 AM by Jesse Ezell

# re: Following up on the benefits of continued use of int.Parse...

Passing in integer as the number style should take care of the loss of any decimal data issues at almost no cost. Then, all you have to do is check to make sure the double is in range of Int32.MaxValue and Int32.MinValue.
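A minimal sketch of what that could look like, with the caveat that this is a reconstruction and not the code that was timed; the names DoubleBackedIntParser and TryParseInt32 and the choice of the invariant culture are assumptions. The range test is written so that NaN and infinite values fail it as well.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of the Double.TryParse approach plus a range check.
public static class DoubleBackedIntParser
{
    public static bool TryParseInt32(string text, out int value)
    {
        value = 0;
        double parsed;

        // NumberStyles.Integer rejects decimal points, exponents and separators.
        if (!double.TryParse(text, NumberStyles.Integer,
                             CultureInfo.InvariantCulture, out parsed))
        {
            return false;
        }

        // Sanity check: only accept values inside the Int32 range.
        // Phrased as a positive test so NaN and infinities are rejected too.
        if (!(parsed >= int.MinValue && parsed <= int.MaxValue))
        {
            return false;
        }

        value = (int)parsed; // exact: every Int32 is representable as a double
        return true;
    }
}
```

With the check in place, the oversized digit strings from the earlier sample return false instead of a successful parse followed by a meaningless cast.
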
Wednesday, March 31, 2004 5:12 AM by Jesse Ezell

# re: Following up on the benefits of continued use of int.Parse...

It actually runs faster with the sanity checks added.

6.0 secs vs 6.6

Wednesday, March 31, 2004 5:24 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

You'd actually have to post the code so we can make sure the time testing is all in the proper places, and that the JIT isn't optimizing out your checks. You also need to come up with some way to process the method returning 0 and True, when the value really isn't 0. See my above code sample for cases where the string is clearly not an Int, but the method returns True and 0.

Wednesday, March 31, 2004 5:36 AM by Justin Rogers

# re: Following up on the benefits of continued use of int.Parse...

Just to let you know, if you only pass in Integer as your number format style, then my method actually is doing everything that double.TryParse will do for you. Note that Integer actually maps to AllowLeadingSign, AllowLeadingWhite and AllowTrailingWhite. So go ahead and use that method if you would like, but it is quite a bit slower than mine.

I guess the Rotor code examination really does make the difference here since it points out the lack of additional processing that TryParse is doing, and that my algorithm should work on any culture for the purposes of processing integers. If you'd like I can even add the flags for turning on parsing of the thousands separator.
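The flag mapping is easy to confirm, and the same check shows what the thousands separator flag changes. This is a small sketch with made-up names (IntegerStyleCheck, IntegerIsThreeFlags, Accepts); the invariant culture is assumed so that "," is the group separator.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class IntegerStyleCheck
{
    // True when NumberStyles.Integer is exactly the three flags named above.
    public static bool IntegerIsThreeFlags()
    {
        NumberStyles expected = NumberStyles.AllowLeadingWhite
                              | NumberStyles.AllowTrailingWhite
                              | NumberStyles.AllowLeadingSign;
        return NumberStyles.Integer == expected;
    }

    // True when double.TryParse accepts the text under the given style.
    public static bool Accepts(string text, NumberStyles style)
    {
        double ignored;
        return double.TryParse(text, style, CultureInfo.InvariantCulture, out ignored);
    }
}
```

For example, IntegerStyleCheck.Accepts("1,000", NumberStyles.Integer) is false, while adding NumberStyles.AllowThousands to the style makes the same input acceptable.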
