Tales from the Evil Empire

Bertrand Le Roy's blog

More on string concatenation performance

This one is kind of obvious when you think about it, but I've seen code like this so many times it's probably worth showing. Strangely enough, I've seen this kind of code in VB most of the time.
The reason why people write this kind of code is to construct a SQL query while keeping the source code easily readable by splitting the string across several lines of code. The code looks like this:

string a = "Some string that we want to build";
a += " from several strings for better";
a += " source code readability:";
a += " we don't want a long string on one line.";
 
Notice how each new piece is appended to the string using += (in VB, you would typically use &=, although += works there too). This is of course not very efficient because real concatenations happen at run time. Here's the IL that this code compiles to:
 
 ldstr "Some string that we want to build"
 stloc.0
 ldloc.0
 ldstr " from several strings for better"
 call string [mscorlib]System.String::Concat(string, string)
 stloc.0
 ldloc.0
 ldstr " source code readability:"
 call string [mscorlib]System.String::Concat(string, string)
 stloc.0
 ldloc.0
 ldstr " we don't want a long string on one line."
 call string [mscorlib]System.String::Concat(string, string)
 
It's much more efficient to split the expression across several lines without ending the statement, this way:

string a = "Some string that we want to build"
    + " from several strings for better"
    + " source code readability:"
    + " we don't want a long string on one line.";
 
The IL that this code compiles to is exactly equivalent to what you'd get if you had left the string on one line because the compiler knows that only static strings are concatenated here:
 
 ldstr "Some string that we want to build from several str"
 + "ings for better source code readability: we don't want a long string on"
 + " one line."
 
+ signs here are an artefact of ILDASM for readability, similar to what we're trying to do in our source code, and are not real concatenations: notice how the lines are split at different places when compared to our source. To make it perfectly clear: the + signs here are *not* real. If you read the exe file with a hex editor, you'll see that the string is in one piece. No quotes, no concatenation, nothing.
 
So you get both readability and optimal performance.
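
One way to check this without reading IL is a small sketch like the one below. It relies on the fact that C# only accepts a compile-time constant expression as the initializer of a const: the split declaration compiles only because the compiler folds the literals into a single string. The class name, member names and query text are hypothetical, chosen purely for illustration.

```csharp
using System;

public static class QueryText
{
    // Hypothetical query text, used only for illustration.
    // A const initializer must be a compile-time constant expression, so this
    // declaration compiles only because the literals are folded into one string.
    public const string Split =
        "SELECT Id, Name"
        + " FROM Customers"
        + " WHERE Active = 1";

    // The same text written as a single literal, for comparison.
    public const string OneLine = "SELECT Id, Name FROM Customers WHERE Active = 1";

    // The += style from the first listing: same resulting text, but the
    // pieces are appended to a local variable one statement at a time.
    public static string BuiltStepByStep()
    {
        string a = "SELECT Id, Name";
        a += " FROM Customers";
        a += " WHERE Active = 1";
        return a;
    }

    public static void Main()
    {
        // Both comparisons are value comparisons on strings.
        Console.WriteLine(Split == OneLine);
        Console.WriteLine(BuiltStepByStep() == Split);
    }
}
```

All three forms produce the same text, so both comparisons should print True. The difference is only in when the work happens: the += version could not be declared as a const, because its value is assembled by separate statements rather than by a single constant expression.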
 
In VB, of course, you can use the line continuation symbol (underscore: _ ) and the & operator. The compiler will also optimize it to a single string.

Comments

EtIeNnE said:

Please note a small mistake here: in VB you CAN use += as a concatenation operator!
# February 22, 2005 3:26 PM

Doug Seven said:


This is a great post. Russ Nemhauser and I just covered this exact topic as one of our "small-but-useful" tips in a recent conference workshop on Performance and Scalability. It's little things like this that, when multiplied by hundreds and thousands of users, really add up to tangible performance gains.

I should add that when you are concatenating strings with variable values, you should use a StringBuilder to "accumulate" the string. Concatenating variable values with "+" or "& _" can add up to a perf hit because at the IL level this is still multiple values that are concatenated at run-time.
# February 22, 2005 4:58 PM

Bertrand Le Roy said:

Etienne: sure, you can. Most VB6 developers are used to the & syntax, though.

Doug: see this other post: http://blogs.msdn.com/bleroy/archive/2005/01/07/348831.aspx on concatenating with variable values. It's not the fact that you concatenate variable values that really matters in the choice of a StringBuilder, it's whether the number of things you concatenate is variable: the compiler uses String.Concat(object[] a) when you concatenate a fixed number of objects, and that's significantly faster than StringBuilder.
# February 22, 2005 5:06 PM

Sushant Bhatia said:

I have a question on strings and how they are represented in IL. A while back I ran into this.

PROBLEM STATEMENT - Going through my project for work with the ILDASM tool was quite interesting. I noticed the following line:-

IL_000b: ldstr "There was a problem while trying to cancel authent" + "ication. Please inform the trainer."

Why is the IL split? My first thought was that the string is being concatenated at runtime by the JIT. Then I looked up the ldstr on MSDN and found this quote:

The ldstr instruction pushes an object reference (type O) to a new string object representing the specific string literal stored in the metadata. The ldstr instruction allocates the requisite amount of memory and performs any format conversion required to convert the string literal from the form used in the file to the string format required at runtime.


So I scratch my head again and wonder why the string was broken into two parts. First part exactly 50 characters long?

POSSIBLE IMPLICATIONS - Anytime you have a string that is more than 50 characters long, it will automatically be split when written in IL. Then during JIT, it will be concatenated. I also read somewhere that if you have 4 or more strings concatenated, then you really should use the StringBuilder. Does this mean that any string 200 characters or above should be created using the StringBuilder?

What effect does this have on Resource Strings from your Resx files?
# February 22, 2005 6:10 PM

Bertrand Le Roy said:

First, the plus sign that you see in ILDASM is *NOT* anything real. I think it's just here to make it more readable. In reality, there's only one string. You can check by opening the exe file in a hex editor instead of with ILDASM.

Second, saying that concatenating more than four strings should be done with a StringBuilder is an oversimplification, and it's simply not true when concatenating a fixed number of strings (see http://blogs.msdn.com/bleroy/archive/2005/01/07/348831.aspx ). Four is a little low, too, even if the number of strings is variable.
# February 22, 2005 6:23 PM

Brian Beatty said:

What about using StringBuilder
sb.append("line1")
sb.append("Line2")

or String.Format?
# February 24, 2005 12:58 PM

Bertrand Le Roy said:

Brian: see this other post on why StringBuilder is not always a good choice: http://blogs.msdn.com/bleroy/archive/2005/01/07/348831.aspx

In particular, when concatenating a fixed number of static strings, it's just a waste of time and it will perform significantly worse than just putting the concatenations in a multiline instruction: the compiler optimizes this away perfectly. There is no way anything can be faster, and it's readable.

String.Format is the worst approach. It performs very poorly when compared to both concatenation and StringBuilder.
String.Format has the advantage that it's very readable, but if you care about performance, you should only use it when the format string is not known in advance (for example if it can be set by user code).
If you know the format string in advance, it will be functionally equivalent to a fixed number of concatenations of variable objects.
So this means that string concatenation is what performs the best in this case too, not StringBuilder, not String.Format.

By the way, if you're thinking about inserting variable elements in a SQL query using String.Format or any form of concatenation, think again. The security risk is just too high.
Read http://weblogs.asp.net/bleroy/archive/2004/08/18/216861.aspx for more details on this.
# February 24, 2005 3:10 PM

CumpsD said:

I've done some tests as well, on the memory usage of various string concatenation methods, which might be a useful addition to your research: blog.cumps.be/string-concatenation-vs-memory-allocation

# September 16, 2007 1:52 PM