Ok, I'll get straight to the point. Keep in mind, the execution times were not gathered during running the application through the profiler, as that always increases execution time due to the data is has to grab, so you will see a discrepency in how long I say it takes, and what time time is on the graphs. Don't pay attention to the time on the graphs:
Example 1 - String.Concat a single character 1,000 times.C#
string text1 = "A";
int num1 = 0;
do
{
text1 = text1 + "A";
num1++;
}
while (num1 <= 1000);
IL
// Code Size: 36 byte(s)
.maxstack 2
.locals (
string text1,
int32 num1)
L_0000: nop
L_0001: ldstr "A"
L_0006: stloc.0
L_0007: ldc.i4.0
L_0008: stloc.1
L_0009: ldloc.0
L_000a: ldstr "A"
L_000f: call string string::Concat(string, string)
L_0014: stloc.0
L_0015: nop
L_0016: ldloc.1
L_0017: ldc.i4.1
L_0018: add.ovf
L_0019: stloc.1
L_001a: ldloc.1
L_001b: ldc.i4 10000
L_0020: ble.s L_0009
L_0022: nop
L_0023: ret

This produces a heap size of 1,230,848 with a System.String allocation of 986,112 which is 80.12% of the entire heap allocation. System.Byte[] allocation is 109,568 which is 8.9% of the total heap allocation. All other allocations are too small to mention. The time to execute is < 1 ms. A garbage collection is performed 3 times (not going to get into then GC gens.)
Example 2 - Text.StringBuilder.Append() a single character 1,000 times.C#
StringBuilder builder1 = new StringBuilder("B");
int num1 = 0;
do
{
builder1.Append("B");
num1++;
}
while (num1 <= 1000);
IL
// Code Size: 41 byte(s)
.maxstack 2
.locals (
[mscorlib]System.Text.StringBuilder builder1,
int32 num1)
L_0000: nop
L_0001: ldstr "B"
L_0006: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor(string)
L_000b: stloc.0
L_000c: ldc.i4.0
L_000d: stloc.1
L_000e: ldloc.0
L_000f: ldstr "B"
L_0014: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0019: pop
L_001a: nop
L_001b: ldloc.1
L_001c: ldc.i4.1
L_001d: add.ovf
L_001e: stloc.1
L_001f: ldloc.1
L_0020: ldc.i4 1000
L_0025: ble.s L_000e
L_0027: nop
L_0028: ret

This produces a heap size of 421,888 with a System.String allocation of 145,408 which is 34.47% of the entire heap allocation. System.Byte[] allocates 118,784 which is 28.16% and System.Char[] allocates 49,152 which is 11.65%. System.Object allocates less than 5% of the total heap size. The time to execute is < 1 ms. A garbage collection is performed twice.
Example 1 allocates a heap size almost 3 times the size of example 2. The System.String has an allocation of more than 4 times the size in example 1 as it does in example 2.
Example 3 - String.Concat a single character 10,000 times.
The source is the same as example 1 but loops 10000 times insead of 1000. This produces a heap size of 686,080 with a System.String allocation of 532,480 which is 77.61% of the entire heap allocation. System.Byte[] allocates 90,112 which is 13.13% and System.Object[] allocates 20,480 which is 2.99%. The mean time to execute is 110 ms. A garbage collection is performed a whopping, but expected,
338 times!Example 4 - Text.StringBuilder.Append() a single character 10,000 times.
The source is the same as example 2 but 10,000 times insead of 1000. This produces a heap size of 483,328 with a System.String allocation of 206,848 which is 42.8% of the entire heap allocation. System.Byte[] allocates 118,784 which is 24.58% and System.Char[] allocates 49,152 which is 10.17%. System.Object[] allocates 20,480 which is 4.24% of the heap. The time to execute is < 1 ms. A garbage collection is performed twice.
Example 4 runs the loop the same amount of times, 10000, but performs over 100 times faster than example 3! This is mainly do the the allocations and garbage collections that occur when doing concats on string. When you concat a string to another string, you are creating a 3rd string. A = A + B doesn't replace the original A, it actually leaves the original A and produces a C to hold the value of A + B. So for each String.Concat, you are producing a whole new value on the heap. This is why the garbage collector ran 338 times during that 110 ms execution time, and why the execution time took so long. In example 4, the stringbuilder only had to collect twice, because we are not performing new allocations for object space, therefore not requiring the gc to run to reclaim that memory space for abandoned objects.
Example 5 - Text.StringBuilder.Append() a single character 1,000,000 times.
The source is the same as example 2 but loops 1 million (1,000,000) times insead of 1000. This
still executes faster than example 3. The mean execution time for 1 million appends is 90 ms. Heap size is where the big difference is, naturally. The heap size is 4,252,637 with a System.String allocation of 4,119,552 which is 96.87% of the entire heap allocation. A garbage collection is only performed 3 times in this case, which is the main reason why we have such an outstanding executiong time. Example 5 can append 100 times the amount of data, when dealing with single characters, than example 3 can concat, and do it 20 ms faster on average.
Here's a bit more explanation on the "why" of the heap allocation and the garbage collections.
In example 1, we see a heap size of 1.2 MB with 3 GC's. System.String allocates 986KB of the heap, and we are only doing 1000 concats. In example 3, where we do 100 times as many concats, 10,000. In example 3, our entire heap allocation is now half of what is was in example 1, in the System.String allocation is 532KB, almost half of what is was in example 1. However, in example 3, we had 338 GC's. Also note, these allocations are what we have left at the end of the process. See the charts in the original post for allocation sizes along the way, or do your own profiling.
So the question was asked, if example 3 is doing 100 times as much work, why is it only using half the amount of memory? GC my friend. In example 1, enough space is allocated to deal with all 1001 strings that will be created (1000 concats plus the 1 original string), but the space used is not so large that a GC wants to have to clean it up to make room for more strings. In example 3, when dealing with 10001 strings, as we loop through the concats, the GC sees that we have a bunch of string objects that we aren't using anymore, such as "A", "AA", "AAA", "AAAA" etc. Since we needs memory to hold the strings we are continuing to create, we need to allocate more memory, but the GC sees that we have old objects that we can throw away to make room for the new strings, so instead of allocating more memory, it reclaims old memory that we aren't using anymore. It does this over and over, hence the 338 garbage collections, instead of 338 increases in heap allocations.
After explaining that, it was asked, "Then why does example 5 allocate over 4 MB of memory then, instead of cleaning it up and reclaiming old memory like example 3 does?" Simple. We don't have any object to throw away. We are dealing with a single Text.StringBuilder object that we are appending a string object to.
String.concat keeps heap allocation low because every time you concatenate, a copy of the string is made. Thus, the older string becomes ready for garbage collection. As you saw, making and discarding many String objects is slow and inefficient.
StringBuilder was designed to overcome this deficiency by using an internal buffer that expands when characters/strings are appended. This buffer is initialized to a capacity of 16 characters. Whenever an append pushes the amount of characters past this capacity, the buffer is doubled. Considering the amount of characters we were appending, it's no surprise that the allocation for StringBuilder had become huge. The garbage collector won't clean this up since the object is still in use.
Note that you can construct a StringBuilder object with an initial capacity and even a maximum capacity.