A long time ago I promised to release the buffer manager class that I use with my async socket server, along with some explanation on my blog as to why I use it. Chris Mullins and JD Conley were a great help to me in coming up with this strategy some time ago (August of last year, I think), and they continue to be excellent resources on advanced socket programming to bounce ideas off of. Unfortunately, being so gosh darn busy, I never had the time to share it with everyone. I should also add that Sebastien Lorion made a really valuable suggestion in the move from a queue to a stack, as the operation is much faster and can provide better locality, and that Jeffrey Richter also provided some information on the non-assurance of LOH behavior in the future.
Yun Jin gives a very good description of the issue in this post, but I will briefly summarize it here. When you call BeginReceive/BeginSend on a socket, the memory you pass to the method is pinned for the duration of the call. This leads to heap fragmentation and, to make matters worse, the buffer will often be in your gen0 heap if you aren't careful. This is especially true in the most common socket applications (request/response), as BeginReceive is often called and then waits a long time to receive data (by a long time I mean a few seconds or more).
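To make that concrete, here is a minimal sketch (mine, not from the listings below) of the usual naive pattern: a fresh byte[] allocated per receive, which is exactly the small gen0 object that ends up pinned while the socket sits waiting for data.

using System;
using System.Net.Sockets;

class NaiveReceiver
{
    // Illustration of the problem only: each call allocates a small gen0 buffer
    // that stays pinned until EndReceive runs, so the GC cannot compact around it.
    public static void StartReceive(Socket socket)
    {
        byte[] buffer = new byte[4096];
        socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
            delegate(IAsyncResult ar)
            {
                int read = socket.EndReceive(ar);
                // ... process 'read' bytes, then typically call StartReceive again ...
            }, null);
    }
}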
Unfortunately the solution presented by Yun Jin is far from optimal in 2.0. This is of course no fault of his own, as it was written for 1.1, although some of these optimizations could apply there as well. Even as presented it will run into a few issues. The first issue, and I consider this the biggest, is based on one of his own rules:

"If the pinned objects are allocated around same time, the free slots between each two objects would be smaller, and the situation is better."
The code presented there does a good job of creating objects at the same time. The problem is that you don't generally pin them for the same amount of time (and they are often pinned for very long periods), so as collections run the GC will move the unpinned ones, creating holes around the pinned ones (heap fragmentation), which is exactly what we were trying to avoid. Granted, he does qualify his buffer manager with another rule:

"The shorter the objects are pinned, the easier GC could compact the heap."

Unfortunately, in the case of most TCP servers this is not something that can always be realized.
The main area where we can improve by moving to .NET 2.0 is by using ArraySegment<byte> structures instead of raw byte arrays. This may seem like a very simple change, but it in fact makes a huge difference in two areas.
The first thing it allows us to do, which was a limitation of the previous buffer manager, is to vary our buffer size per client. If you have a client that is receiving a large file, for instance, you probably want to use larger send buffers than for a client that is not really doing much. The reason you want to do this is to limit the number of thread context switches associated with that socket (fewer context switches = better and more scalable). The buffer manager as presented used single-sized byte arrays, and as such all buffers were considered to be of equal size. .NET 2.0 added a few new overloads of BeginReceive that allow me to pass an IList<ArraySegment<byte>>; by making my BufferManager manage ArraySegment<byte> objects, a connection can at its whim increase or decrease its buffer size by checking out more buffers or returning some back to the cache.
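As a rough sketch of what that looks like at the call site (the CheckOut/CheckIn names are placeholders for whatever the buffer manager in the listings below actually exposes, not its exact API):

using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical surface for the buffer cache; the real class is in the listings below.
interface IBufferManager
{
    ArraySegment<byte> CheckOut();
    void CheckIn(ArraySegment<byte> segment);
}

class SegmentedReceiver
{
    // A busy connection simply checks out more segments; an idle one checks out fewer.
    public static void StartReceive(Socket socket, IBufferManager buffers, int segmentCount)
    {
        List<ArraySegment<byte>> segments = new List<ArraySegment<byte>>(segmentCount);
        for (int i = 0; i < segmentCount; i++)
            segments.Add(buffers.CheckOut());

        socket.BeginReceive(segments, SocketFlags.None,
            delegate(IAsyncResult ar)
            {
                int read = socket.EndReceive(ar);
                // ... process 'read' bytes spread across the segments ...
                foreach (ArraySegment<byte> segment in segments)
                    buffers.CheckIn(segment);   // shrink back down when done
            }, null);
    }
}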
The next change enabled by the use of ArraySegment was the further isolation of the memory. Instead of creating numerous small arrays which then move slowly from gen0 to gen1 and so on, I can create really big arrays which will be put straight onto the LOH (Large Object Heap). At present objects over 80k or so are put on the LOH, but this is subject to change in the future (keep this in mind, as it may affect your performance slightly later: you could end up with a really big array in gen0 which is pinned almost all of the time). One of the nice things about the LOH is that there is no compaction; in other words, we don't have to worry about the object being pinned or about fragmentation. As you can see, the ability to create this one BIG array is quite a bit better for us than dealing with lots of little arrays.
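A sketch of the idea (the numbers and names are just for illustration): one allocation well over the LOH threshold, carved into fixed-size ArraySegment<byte> windows.

using System;
using System.Collections.Generic;

static class SegmentFactory
{
    // e.g. 8192 * 1000 ~= 8 MB: one array on the large object heap (no compaction)
    // instead of a thousand small gen0 arrays that each get pinned individually.
    public static List<ArraySegment<byte>> CreateSegments(int chunkSize, int chunkCount)
    {
        byte[] block = new byte[chunkSize * chunkCount];
        List<ArraySegment<byte>> segments = new List<ArraySegment<byte>>(chunkCount);
        for (int i = 0; i < chunkCount; i++)
            segments.Add(new ArraySegment<byte>(block, i * chunkSize, chunkSize));
        return segments;
    }
}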
The biggest issue I had with the code as it stood was the handling when we ran out of buffers: it would create them one by one, leading to all the same issues we would have without a buffer manager (objects not living near each other and being pinned, thus fragmenting the heap). This issue goes away automatically when dealing with chunks, because we always allocate N chunks at a time (i.e. if we run out of buffers and we are told to deal with 8k chunks in 1000-buffer segments, we won't create 1 new buffer but a whole new 8 MB segment consisting of 1000 buffers; the checkout sketch after the sidebar below illustrates this). Although this could be wasteful of memory, one is generally better off in a server application growing items like this once rather than often. In fact, the reason this is not such a big issue overall is that, as with the previous incarnation, it should always be the case that BUFFER_SIZE is larger than the maximum number of buffers you think you will need, and thus you never need to resize (an extra few MB of buffer space is no big deal in the context of such an application, and this is a common trade-off).
The other change was to make the buffer cache thread safe. I used a LockFreeStack to do this. There are many implementations of lock-free stacks available on the net that you can use; Julian Bucknall has a great series of articles on this as well. I used this one in particular because I run on a machine with lots of processors; you will likely be better off with simple monitor-based locking around a normal Stack<T> if you are running on fewer than 8 processors, and this should be a fairly easy change to the code. If people are interested I will spend the 15 minutes to make this change.
UPDATE: I put the lock version in a separate listing (you could make the locks finer grained than my quick update, though).
As a quick sidebar: the checkout method may look odd to you in the LockFreeStack version. The TryPop method will try to pop a value off the stack for me (in the TryXxx pattern). The reason this pattern is needed is that, because I don't lock, a separate count test would be inaccurate; I need to do the test-and-pop atomically. If a buffer is not found, then I need to create a new segment of buffers … but another thread may have already done this for me (so I check the count inside my lock before creating again). There is also an interesting case where, although a new segment was created, I am unlucky enough that all of it has been used up by the time I get to it, so I have to loop. This has all of the usual problems of lock-free algorithms, especially starvation, but it is so unlikely to happen that I am not considering it a major concern.
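For those who want the sidebar in code form, here is a minimal sketch of that checkout loop. The LockFreeStack<T> here is only a locked stand-in exposing the same TryPop/Push surface so the sample stands alone, and names like CreateNewSegment are my own; the real listings differ in the details.

using System;
using System.Collections.Generic;

// Stand-in only: a Monitor-locked stack with the TryPop/Push shape described above.
// Substitute a real lock-free stack (see Julian Bucknall's articles) in production.
class LockFreeStack<T>
{
    private readonly Stack<T> _items = new Stack<T>();
    public void Push(T item) { lock (_items) { _items.Push(item); } }
    public bool TryPop(out T item)
    {
        lock (_items)
        {
            if (_items.Count > 0) { item = _items.Pop(); return true; }
            item = default(T);
            return false;
        }
    }
    public int Count { get { lock (_items) { return _items.Count; } } }
}

class BufferManagerSketch
{
    private readonly LockFreeStack<ArraySegment<byte>> _free = new LockFreeStack<ArraySegment<byte>>();
    private readonly object _growLock = new object();
    private readonly int _chunkSize;
    private readonly int _chunksPerBlock;

    public BufferManagerSketch(int chunkSize, int chunksPerBlock)
    {
        _chunkSize = chunkSize;
        _chunksPerBlock = chunksPerBlock;
    }

    public ArraySegment<byte> CheckOut()
    {
        ArraySegment<byte> segment;
        // Loop: another thread may drain a freshly created block before we reach it.
        while (!_free.TryPop(out segment))
        {
            lock (_growLock)
            {
                // Re-check inside the lock: another thread may already have grown the cache.
                if (_free.Count == 0)
                    CreateNewSegment();
            }
        }
        return segment;
    }

    public void CheckIn(ArraySegment<byte> segment)
    {
        _free.Push(segment);
    }

    private void CreateNewSegment()
    {
        // Grow by a whole block (e.g. 8k * 1000 = ~8 MB on the LOH), never one buffer at a time.
        byte[] block = new byte[_chunkSize * _chunksPerBlock];
        for (int i = 0; i < _chunksPerBlock; i++)
            _free.Push(new ArraySegment<byte>(block, i * _chunkSize, _chunkSize));
    }
}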
All of that said, here is the code. Note that even with all of this long explanation the code is very short (as is the original), but it's a perfect example of how subtle asynchronous socket programming with .NET can be!
Listing 1: Code with LockFreeStack
Listing 2: Code for BufferManager with Stack<T> + locking