Tales from the Evil Empire

Bertrand Le Roy's blog

More on string concatenation performance

This one is kind of obvious when you think about it, but I've seen code like this so many times it's probably worth showing. Strangely enough, I've seen this kind of code in VB most of the time.
People usually write this kind of code to construct a SQL query while keeping the source readable, by splitting the string across several lines of code. The code looks like this:

string a = "Some string that we want to build";
a += " from several strings for better";
a += " source code readability:";
a += " we don't want a long string on one line.";
 
Notice how the string has the new bit added to itself using concatenation (in VB, you would typically use &= instead of +=). This is of course not very efficient, because each += is a real concatenation that happens at run time and allocates a new string. Here's the IL that this code compiles to:
 
 ldstr "Some string that we want to build"
 stloc.0
 ldloc.0
 ldstr " from several strings for better"
 call string [mscorlib]System.String::Concat(string, string)
 stloc.0
 ldloc.0
 ldstr " source code readability:"
 call string [mscorlib]System.String::Concat(string, string)
 stloc.0
 ldloc.0
 ldstr " we don't want a long string on one line."
 call string [mscorlib]System.String::Concat(string, string)
 
It's much more efficient to write a single statement and split it across lines without ending it, this way:

string a = "Some string that we want to build"
    + " from several strings for better"
    + " source code readability:"
    + " we don't want a long string on one line.";
 
The IL that this code compiles to is exactly equivalent to what you'd get if you had left the string on one line because the compiler knows that only static strings are concatenated here:
 
 ldstr "Some string that we want to build from several str"
 + "ings for better source code readability: we don't want a long string on"
 + " one line."
 
+ signs here are an artefact of ILDASM for readability, similar to what we're trying to do in our source code, and are not real concatenations: notice how the lines are split at different places when compared to our source. To make it perfectly clear: the + signs here are *not* real. If you read the exe file with a hex editor, you'll see that the string is in one piece. No quotes, no concatenation, nothing.
 
So you get both readability and optimal performance.
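
If you'd rather not open a hex editor, here is a minimal sketch that puts the two styles side by side so you can compare them from code. The class and method names (ConcatDemo, BuildAtRunTime, BuildAtCompileTime, OneLine) are hypothetical and only there for illustration. The sketch assumes the usual behavior of the C# compiler, which folds a concatenation of string literals into one literal.

```csharp
using System;

// Hypothetical demo class: the names are illustrative, not from any library.
public static class ConcatDemo
{
    // Built with +=: every step is a call to String.Concat at run time.
    public static string BuildAtRunTime()
    {
        string a = "Some string that we want to build";
        a += " from several strings for better";
        a += " source code readability:";
        a += " we don't want a long string on one line.";
        return a;
    }

    // One statement split across lines: only literals are involved,
    // so the compiler can emit a single string literal.
    public static string BuildAtCompileTime()
    {
        return "Some string that we want to build"
            + " from several strings for better"
            + " source code readability:"
            + " we don't want a long string on one line.";
    }

    // The same text written as one long literal, for comparison.
    public static string OneLine()
    {
        return "Some string that we want to build from several strings for better source code readability: we don't want a long string on one line.";
    }
}
```

All three methods return the same text. On the standard runtime, object.ReferenceEquals(ConcatDemo.BuildAtCompileTime(), ConcatDemo.OneLine()) should also be true, since both methods load the same literal from the assembly, whereas the string returned by BuildAtRunTime would normally be a separate object created by the last Concat call.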
 
In VB, of course, you can use the line continuation symbol (underscore: _ ) and the & operator. The compiler will also optimize it to a single string.

Comments

EtIeNnE said:

Please note a small mistake here: in VB you CAN use += as a concatenation operator!
# February 22, 2005 3:26 PM

Doug Seven said:


This is a great post. Russ Nemhauser and I just covered this exact topic as one of our "small-but-useful" tips in a recent conference workshop on Performance and Scalability. It's little things like this that, when multiplied by hundreds and thousands of users, really add up to tangible performance gains.

I should add that when you are concatenating strings with variable values, you should use a StringBuilder to "accumulate" the string. Concatenating variable values with "+" or "& _" can add up to a perf hit because at the IL level this is still multiple values that are concatenated at run-time.
# February 22, 2005 4:58 PM

Sushant Bhatia said:

I have a question on strings and how they are represented in IL. A while back I ran into this.

PROBLEM STATEMENT - Going through my project for work with the ILDASM tool was quite interesting. I noticed the following line:-

IL_000b: ldstr "There was a problem while trying to cancel authent" + "ication. Please inform the trainer."

Why is the IL split? My first thought was that the string is being concatenated at runtime by the JIT. Then I looked up the ldstr on MSDN and found this quote:

The ldstr instruction pushes an object reference (type O) to a new string object representing the specific string literal stored in the metadata. The ldstr instruction allocates the requisite amount of memory and performs any format conversion required to convert the string literal from the form used in the file to the string format required at runtime.


So I scratch my head again and wonder why the string was broken into two parts, with the first part exactly 50 characters long?

POSSIBLE IMPLICATIONS - Anytime you have a string that is more than 50 characters long, it will automatically be split when written in IL. Then during JIT, it will be concatenated. I also read somewhere that if you have 4 or more strings concatenated, then you really should use the StringBuilder. Does this mean that any string 200 characters or above should be created using the StringBuilder?

What effect does this have on Resource Strings from your Resx files?
# February 22, 2005 6:10 PM

Brian Beatty said:

What about using StringBuilder
sb.Append("line1")
sb.Append("Line2")

or String.Format?
# February 24, 2005 12:58 PM