CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Raymond Lewallen

Professional Learner

String.Concat versus Text.Stringbuilder

Ok, I'll get straight to the point. Keep in mind, the execution times were not gathered during running the application through the profiler, as that always increases execution time due to the data is has to grab, so you will see a discrepency in how long I say it takes, and what time time is on the graphs. Don't pay attention to the time on the graphs:

Example 1 - String.Concat a single character 1,000 times.
C#
string text1 = "A";
int num1 = 0;
do
{
text1 = text1 + "A";
num1++;
}
while (num1 <= 1000);
IL
// Code Size: 36 byte(s)
.maxstack 2
.locals (
string text1,
int32 num1)
L_0000: nop
L_0001: ldstr "A"
L_0006: stloc.0
L_0007: ldc.i4.0
L_0008: stloc.1
L_0009: ldloc.0
L_000a: ldstr "A"
L_000f: call string string::Concat(string, string)
L_0014: stloc.0
L_0015: nop
L_0016: ldloc.1
L_0017: ldc.i4.1
L_0018: add.ovf
L_0019: stloc.1
L_001a: ldloc.1
L_001b: ldc.i4 10000
L_0020: ble.s L_0009
L_0022: nop
L_0023: ret



This produces a heap size of 1,230,848 with a System.String allocation of 986,112 which is 80.12% of the entire heap allocation. System.Byte[] allocation is 109,568 which is 8.9% of the total heap allocation. All other allocations are too small to mention. The time to execute is < 1 ms. A garbage collection is performed 3 times (not going to get into then GC gens.)

Example 2 - Text.StringBuilder.Append() a single character 1,000 times.
C#
StringBuilder builder1 = new StringBuilder("B");
int num1 = 0;
do
{
builder1.Append("B");
num1++;
}
while (num1 <= 1000);
IL
// Code Size: 41 byte(s)
.maxstack 2
.locals (
[mscorlib]System.Text.StringBuilder builder1,
int32 num1)
L_0000: nop
L_0001: ldstr "B"
L_0006: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor(string)
L_000b: stloc.0
L_000c: ldc.i4.0
L_000d: stloc.1
L_000e: ldloc.0
L_000f: ldstr "B"
L_0014: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0019: pop
L_001a: nop
L_001b: ldloc.1
L_001c: ldc.i4.1
L_001d: add.ovf
L_001e: stloc.1
L_001f: ldloc.1
L_0020: ldc.i4 1000
L_0025: ble.s L_000e
L_0027: nop
L_0028: ret


This produces a heap size of 421,888 with a System.String allocation of 145,408 which is 34.47% of the entire heap allocation. System.Byte[] allocates 118,784 which is 28.16% and System.Char[] allocates 49,152 which is 11.65%. System.Object allocates less than 5% of the total heap size. The time to execute is < 1 ms. A garbage collection is performed twice.

Example 1 allocates a heap size almost 3 times the size of example 2. The System.String has an allocation of more than 4 times the size in example 1 as it does in example 2.

Example 3 - String.Concat a single character 10,000 times.

The source is the same as example 1 but loops 10000 times insead of 1000. This produces a heap size of 686,080 with a System.String allocation of 532,480 which is 77.61% of the entire heap allocation. System.Byte[] allocates 90,112 which is 13.13% and System.Object[] allocates 20,480 which is 2.99%. The mean time to execute is 110 ms. A garbage collection is performed a whopping, but expected, 338 times!

Example 4 - Text.StringBuilder.Append() a single character 10,000 times.

The source is the same as example 2 but 10,000 times insead of 1000. This produces a heap size of 483,328 with a System.String allocation of 206,848 which is 42.8% of the entire heap allocation. System.Byte[] allocates 118,784 which is 24.58% and System.Char[] allocates 49,152 which is 10.17%. System.Object[] allocates 20,480 which is 4.24% of the heap. The time to execute is < 1 ms. A garbage collection is performed twice.

Example 4 runs the loop the same amount of times, 10000, but performs over 100 times faster than example 3! This is mainly do the the allocations and garbage collections that occur when doing concats on string. When you concat a string to another string, you are creating a 3rd string. A = A + B doesn't replace the original A, it actually leaves the original A and produces a C to hold the value of A + B. So for each String.Concat, you are producing a whole new value on the heap. This is why the garbage collector ran 338 times during that 110 ms execution time, and why the execution time took so long. In example 4, the stringbuilder only had to collect twice, because we are not performing new allocations for object space, therefore not requiring the gc to run to reclaim that memory space for abandoned objects.

Example 5 - Text.StringBuilder.Append() a single character 1,000,000 times.

The source is the same as example 2 but loops 1 million (1,000,000) times insead of 1000. This still executes faster than example 3. The mean execution time for 1 million appends is 90 ms. Heap size is where the big difference is, naturally. The heap size is 4,252,637 with a System.String allocation of 4,119,552 which is 96.87% of the entire heap allocation. A garbage collection is only performed 3 times in this case, which is the main reason why we have such an outstanding executiong time. Example 5 can append 100 times the amount of data, when dealing with single characters, than example 3 can concat, and do it 20 ms faster on average.

Here's a bit more explanation on the "why" of the heap allocation and the garbage collections.

In example 1, we see a heap size of 1.2 MB with 3 GC's. System.String allocates 986KB of the heap, and we are only doing 1000 concats. In example 3, where we do 100 times as many concats, 10,000. In example 3, our entire heap allocation is now half of what is was in example 1, in the System.String allocation is 532KB, almost half of what is was in example 1. However, in example 3, we had 338 GC's. Also note, these allocations are what we have left at the end of the process. See the charts in the original post for allocation sizes along the way, or do your own profiling.

So the question was asked, if example 3 is doing 100 times as much work, why is it only using half the amount of memory? GC my friend. In example 1, enough space is allocated to deal with all 1001 strings that will be created (1000 concats plus the 1 original string), but the space used is not so large that a GC wants to have to clean it up to make room for more strings. In example 3, when dealing with 10001 strings, as we loop through the concats, the GC sees that we have a bunch of string objects that we aren't using anymore, such as "A", "AA", "AAA", "AAAA" etc. Since we needs memory to hold the strings we are continuing to create, we need to allocate more memory, but the GC sees that we have old objects that we can throw away to make room for the new strings, so instead of allocating more memory, it reclaims old memory that we aren't using anymore. It does this over and over, hence the 338 garbage collections, instead of 338 increases in heap allocations.

After explaining that, it was asked, "Then why does example 5 allocate over 4 MB of memory then, instead of cleaning it up and reclaiming old memory like example 3 does?" Simple. We don't have any object to throw away. We are dealing with a single Text.StringBuilder object that we are appending a string object to.

String.concat keeps heap allocation low because every time you concatenate, a copy of the string is made. Thus, the older string becomes ready for garbage collection. As you saw, making and discarding many String objects is slow and inefficient.

StringBuilder was designed to overcome this deficiency by using an internal buffer that expands when characters/strings are appended. This buffer is initialized to a capacity of 16 characters. Whenever an append pushes the amount of characters past this capacity, the buffer is doubled. Considering the amount of characters we were appending, it's no surprise that the allocation for StringBuilder had become huge. The garbage collector won't clean this up since the object is still in use.

Note that you can construct a StringBuilder object with an initial capacity and even a maximum capacity.


Comments

Scott said:

Graphs aren't showing
# December 3, 2004 3:58 AM

Scott Hanselman said:

This is interesting, but it doesn't seem to me to be a realistic use case, as you are concatenating a single char, 1000 times. I don't remember ever needing to do that, and the GC characteristcs will be different based on:

"Big string -> small appends
It's substantially likely that appends will fit in the slop and so they're fast, this is the best case (buffer size becomes double the string when it no longer fits so on average the slop is half the current string length) (if there are lots of small appends to a big string you win the most using stringbuilder)

Big string -> big appends
While the string is comparable in size (or smaller) to the appends stringbuilder won't save you much, if this continues to the point where the appends are small compared to the accumlated string you're in the good case

Small string -> big appends
bad case, string builder will just slow you down until enough slop has built up to hold those appends, you move to "big string big appends" as you append and finally to "big string small appends" if/when the buffer becomes collossal

Small string -> small appends
could be ok if you had a good idea how big your string was going to get and preallocated enough so that you have sufficient slop for the appends. You might be able to do better if you just concated all the small appends together in one operation."

Take a look at my post that points to Rico's:
http://www.hanselman.com/blog/PermaLink.aspx?guid=c41857da-5f8a-45a0-99bf-66dd1dbfd5c0
# December 4, 2004 7:06 AM

Raymond Lewallen said:

Sorry about the graphs. I was gone hunting over the weekend and didn't have time to fix them, but they are up now.

Thanks for the information from Rico, Scott. This was really testing the small string/small appends and big string/small appends scenarios. The reason I testing with this many loops is because of some scenarios we deal with on my project, not to the extreme I tested though. I was going to do an additional test this week with the small string/big appends and big string/big appends scenarios.
# December 6, 2004 12:49 AM

Il blog di Alessandro said:

# November 6, 2006 2:32 AM

PureEviL said:

Javascript Tips

# November 19, 2007 6:34 AM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add

About Raymond Lewallen

Working primarily in the public sector during his career, Raymond has designed and built several high profile enterprise level applications for all levels of the government. Raymond now works as a solutions architect for EMC. Raymond is an agile coach, Microsoft MVP C# and also president of the Oklahoma City Developers Group and Oklahoma Agile Developers Group. Raymond spends a lot of his time learning and teaching such things as Test Driven Development, Domain Driven Design, Design Patterns and Extreme Programming practices and principles, to name a few. Raymond is also an advocate of Alt.Net. Raymond is primarily a framework guy, so don't ask him anything about UI :) Check out Devlicio.us!

Our Sponsors

Free Tech Publications