Tales from the Evil Empire

Bertrand Le Roy's blog

Are StringBuilders always faster than concatenation?

While it's certainly a very good thing that more and more developers are aware of the performance problems of string concatenation and of how StringBuilder can solve them, it's not as simple as switching to StringBuilder every time you're concatenating more than five strings.
So just in case you don't know, here's what's wrong with concatenating a lot of strings using the + operator (& for our VB friends)... A string is an immutable object, which means that every time you "change" it, a new string object is allocated, the old contents are copied into it, and the old object gets thrown away. So if you repeatedly concatenate a large number of strings, the total memory you allocate (and the time you spend copying) grows roughly with the square of the number of strings.
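To put a rough number on that quadratic growth, here is a minimal sketch that counts how many characters get copied when pieces are appended one at a time. It is a simplified model, not a measurement: it assumes each step copies the whole string built so far, and ConcatCost and CharsCopiedByRepeatedConcat are names made up for this illustration.

```csharp
using System;

public static class ConcatCost
{
    // Simplified model: appending the k-th piece with + allocates a new
    // string of k * pieceLength characters and copies all of them.
    public static long CharsCopiedByRepeatedConcat(int count, int pieceLength)
    {
        long total = 0;
        for (int k = 1; k <= count; k++)
        {
            total += (long)k * pieceLength;
        }
        return total;
    }

    public static void Main()
    {
        // Ten times more pieces costs roughly a hundred times more copying.
        Console.WriteLine(CharsCopiedByRepeatedConcat(100, 1));   // 5050
        Console.WriteLine(CharsCopiedByRepeatedConcat(1000, 1));  // 500500
    }
}
```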
A StringBuilder, on the other hand, copies each string bit into an internal buffer that it grows as needed, and only produces the final string when you call ToString(). Everything stays nicely linear and you can safely concatenate several million strings with reasonable performance (not that it's a good idea, but you can).
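The loop case, where the number of pieces is only known at runtime, might look like the sketch below. LoopJoin, WithConcat and WithBuilder are hypothetical names; both methods return the same string, and only the amount of intermediate allocation differs.

```csharp
using System;
using System.Text;

public static class LoopJoin
{
    // Each += allocates a brand new string holding everything so far.
    public static string WithConcat(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // Append copies each part into the builder's buffer; the final
    // string is only created by ToString().
    public static string WithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string[] parts = new string[] { "a", "b", "c" };
        Console.WriteLine(WithConcat(parts));   // abc
        Console.WriteLine(WithBuilder(parts));  // abc
    }
}
```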
Now, that doesn't mean you should use StringBuilder all the time. For example, a common piece of code that you write in ASP.NET constructs a client-side script string dynamically from static and dynamic bits (typically, a control's ClientID gets injected into some script that uses the client-side HTML structure rendered by the control). There are three approaches you can take to that.
The first one, probably the easiest to write and to read, is to use String.Format. Unfortunately, it performs poorly. Use it if readability is more important to you than performance.
The second one is to use StringBuilder, and the third one is to use string concatenation.
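For illustration, the three approaches could look like this for a tiny script. The script text, the sample ID and the ScriptSnippets class are invented for this sketch; in a real control the ID would come from ClientID.

```csharp
using System;
using System.Text;

public static class ScriptSnippets
{
    // 1. String.Format: easiest to read.
    public static string WithFormat(string clientId)
    {
        return String.Format("document.getElementById('{0}').focus();", clientId);
    }

    // 2. StringBuilder: one Append per bit.
    public static string WithBuilder(string clientId)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("document.getElementById('");
        sb.Append(clientId);
        sb.Append("').focus();");
        return sb.ToString();
    }

    // 3. Concatenation: all the bits in a single expression.
    public static string WithConcat(string clientId)
    {
        return "document.getElementById('" + clientId + "').focus();";
    }

    public static void Main()
    {
        // "ctl00_TextBox1" is a made-up sample ID.
        Console.WriteLine(WithFormat("ctl00_TextBox1"));
        Console.WriteLine(WithBuilder("ctl00_TextBox1"));
        Console.WriteLine(WithConcat("ctl00_TextBox1"));
    }
}
```

All three return the same string, so the choice only affects readability and speed.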
The crucial point here is that in the script generation case, we know in advance how many string bits we want to concatenate, which is very different from concatenating an arbitrary number of strings in a loop. Let's look at the last two approaches in detail, as simplified in this sample code:


Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fffffff"));

// Fixed number of bits, concatenated in a single expression.
for (int c = 0; c < 5000000; c++) {
  int i = 0;
  string a = "a" + i++ + "a" + i++ + "a" + i++ + "a" + i++ + "a" + i++ + "a" + i++ + "a" + i++ + "a";
}

Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fffffff"));

// The same string, built with a StringBuilder.
for (int c = 0; c < 5000000; c++) {
  int i = 0;
  StringBuilder sb = new StringBuilder(15);
  sb.Append("a");
  sb.Append(i++);
  sb.Append("a");
  sb.Append(i++);
  sb.Append("a");
  sb.Append(i++);
  sb.Append("a");
  sb.Append(i++);
  sb.Append("a");
  sb.Append(i++);
  sb.Append("a");
  sb.Append(i++);
  sb.Append("a");
  sb.Append(i++);
  sb.Append("a");
  string a = sb.ToString();
}

Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fffffff"));

What's interesting is to look at the compiled IL for the concatenation approach.


newarr [mscorlib]System.Object
... lots of array initialization code...
call string [mscorlib]System.String::Concat(object[])

The compiled code does not concatenate the string bits one by one. Instead, because the number of bits is known at compile time, it creates an array holding all of them and then calls String.Concat, passing it the array as a parameter. String.Concat can then work out the total length and build the result in one go. This is actually faster than the StringBuilder, which makes no assumptions about how many bits it will receive (even though here we've used the constructor overload that takes an initial capacity: that number, which counts characters rather than string bits, is only a starting size for the builder's buffer).
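Written out by hand, the array-based call described above would look roughly like the second method in this sketch. It is a shortened version with three numbers instead of seven, ConcatArrayDemo is a made-up name, and the exact code a given compiler version emits may differ.

```csharp
using System;

public static class ConcatArrayDemo
{
    // What you write: a single expression using +.
    public static string ViaOperator()
    {
        int i = 0;
        return "a" + i++ + "a" + i++ + "a" + i++ + "a";
    }

    // Roughly what the IL above does: put every bit into an object
    // array (the ints get boxed), then make one String.Concat call.
    public static string ViaConcatArray()
    {
        int i = 0;
        object[] bits = new object[] { "a", i++, "a", i++, "a", i++, "a" };
        return String.Concat(bits);
    }

    public static void Main()
    {
        Console.WriteLine(ViaOperator());     // a0a1a2a
        Console.WriteLine(ViaConcatArray());  // a0a1a2a
    }
}
```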
I've observed this code to run about four times faster in the concatenation case than in the StringBuilder case. Of course, this test is very primitive and should not be taken as some kind of absolute truth. For example, I chose the number of strings to concatenate quite arbitrarily, and your own numbers may vary widely. As always, never trust anyone on performance: do your own tests in your own context.
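If you want to run that kind of test yourself, a small harness along these lines may be a more convenient starting point than printing timestamps. It swaps in System.Diagnostics.Stopwatch, which is not what the sample above uses, and ConcatTiming and its methods are hypothetical names. The numbers it prints depend entirely on your machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTiming
{
    public static string BuildWithConcat(int i)
    {
        return "a" + i++ + "a" + i++ + "a" + i++ + "a";
    }

    public static string BuildWithBuilder(int i)
    {
        StringBuilder sb = new StringBuilder(7); // capacity in characters
        sb.Append("a");
        sb.Append(i++);
        sb.Append("a");
        sb.Append(i++);
        sb.Append("a");
        sb.Append(i++);
        sb.Append("a");
        return sb.ToString();
    }

    // Runs the given builder function repeatedly and returns the elapsed time.
    public static TimeSpan Time(Func<int, string> build, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int c = 0; c < iterations; c++)
        {
            build(0);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        Console.WriteLine("concat:  " + Time(BuildWithConcat, iterations));
        Console.WriteLine("builder: " + Time(BuildWithBuilder, iterations));
    }
}
```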
But as long as you keep all your concatenations in a single expression (which doesn't prevent you from making it readable with line breaks and indentation; our VB friends can use the underscore to continue on the next line), concatenation should be faster when the number of pieces is fixed.
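As a sketch of that distinction, the first method below spreads one expression over several lines, while the second splits the work into separate statements, each of which produces its own intermediate string. The script text and the SingleExpression class are made up for the example.

```csharp
using System;

public static class SingleExpression
{
    // One expression, broken across lines for readability.
    public static string OneExpression(string id)
    {
        return "var el = document.getElementById('" + id + "');" +
               "el.focus();";
    }

    // Several statements: every += creates a new intermediate string.
    public static string SplitStatements(string id)
    {
        string s = "var el = document.getElementById('";
        s += id;
        s += "');";
        s += "el.focus();";
        return s;
    }

    public static void Main()
    {
        // Same output either way; only the work done to get there differs.
        Console.WriteLine(OneExpression("x"));
        Console.WriteLine(SplitStatements("x"));
    }
}
```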

Comments

bwaldron said:

So it looks like the compiler made an optimization because it knew that you were always concatenating the same number of values in your statement. I guess the rule of thumb for using StringBuilder is to use it when what you're appending varies based on something at runtime.

Good post...
# January 7, 2005 6:04 PM

Aapo Laakkonen said:

The compiler should take care of switching to StringBuilder whenever necessary. Users should just concatenate with '+'. That's what Java compilers have almost always done. And it seems like .NET compilers do the same, sort of.
# January 7, 2005 6:57 PM

Fabrice said:

One thing to know when using a StringBuilder is that it's good only when using Append(). If you try to use the Insert() method, you'll get really bad performance, because existing characters get shifted to make room for the new text.
# January 10, 2005 8:05 AM

Deepak V said:

I want to know why CONCAT() is slower than using the pipe operator || in SQL.

Does the same array concept apply to this also?

Regards,

Deepak V

# May 19, 2008 2:04 AM

Bertrand Le Roy said:

@Deepak: I'm sorry, but for SQL I have no idea. You should try asking on a SQL forum or asking a member of that team.

# May 28, 2008 8:03 PM