Async Sockets and Buffer Management

A long time ago I promised to release the buffer manager class that I use with my async socket server, along with some explanation on my blog of why I use it. Chris Mullins and JD Conley were a lot of help to me in coming up with this strategy some time ago (August of last year, I think), and they continue to be great resources on advanced socket programming to bounce ideas off of. Unfortunately, being so gosh darn busy, I never had the time to share it with everyone. I should also add that Sebastien Lorion made a really valuable suggestion in the move from a queue to a stack, as the operation is much faster and can provide better locality, and that Jeffrey Richter also provided some information on the non-assurance of the LOH's behavior in the future.

Yun Jin gives a very good description of the issue in this post, but I will briefly summarize it here. When you call BeginReceive/BeginSend on a socket, the memory you pass to the method is pinned for the duration of the call. This leads to heap fragmentation, and to make matters worse, the pinned buffer will often be in your gen0 heap if you aren't careful. This is especially true in the most common socket applications (request/response), as BeginReceive is often called and then waits a long time to receive data (by a long time I mean a few seconds or more).
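
To make the problem concrete, here is a minimal sketch of the naive pattern (illustrative only; socket is assumed to be an already-connected Socket). Every call allocates a fresh array, and that array stays pinned for as long as the receive is outstanding:

    // A fresh byte[] per call. The runtime pins this buffer for as long as the
    // BeginReceive is outstanding, which on an idle request/response connection
    // can easily be seconds or more.
    byte[] buffer = new byte[8192];
    socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
        delegate(IAsyncResult ar)
        {
            int bytesRead = socket.EndReceive(ar);
            // ... process the data; the buffer was pinned the whole time ...
        }, null);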

Unfortunately the solution presented by Yun Jin is far from optimal on 2.0. This is of course no fault of his own, as it was written for 1.1, although some of the optimizations could apply there as well. Even as presented it runs into a few issues. The first issue, and I consider this the biggest, is based on one of his statements:

    If the pinned objects are allocated around same time, the free slots
    between each two objects would be smaller, and the situation is better.

The code presented there does a good job of creating objects at the same time. The problem is that you don't generally pin them for the same amount of time (and they are often pinned for very long periods of time), so as collections run the GC will move the unpinned ones, creating holes around the pinned ones (heap fragmentation), which is exactly what we were trying to avoid. Granted, he does qualify his buffer manager with another rule:

    The shorter the objects are pinned, the easier GC could compact the heap.

Unfortunately, in the case of most TCP servers this is not something that can always be realized.

The main area where we can improve by moving to .NET 2.0 is by using ArraySegment<byte> structures instead of raw byte[] arrays. This may seem like a very simple change, but it in fact makes a huge difference in two areas.

The first thing it allows us to do, which was a limitation of the previous buffer manager, is to vary our buffer size per client. If you have a client that is receiving a large file, for instance, you probably want to use larger send buffers than for a client that is not really doing much. The reason you want to do this is to limit the number of thread context switches associated with that socket (fewer context switches = better and more scalable). The buffer manager as previously presented used single-sized byte arrays, and as such all buffers were considered to be of equal size. .NET 2.0 added a few new overloads of BeginReceive and BeginSend that allow me to pass an IList<ArraySegment<byte>>. By making my BufferManager manage ArraySegment<byte> objects, a connection can at its whim increase or decrease its buffer size by checking out more buffers or returning some buffers back to the cache, respectively.
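
From a connection's point of view this looks roughly like the sketch below (names are illustrative; bufferManager is an instance of the BufferManager shown later in this post and OnReceiveCompleted is a hypothetical callback):

    // Check out however many chunks this connection currently needs.
    List<ArraySegment<byte>> receiveBuffers = new List<ArraySegment<byte>>();
    for (int i = 0; i < 4; i++)
    {
        receiveBuffers.Add(bufferManager.CheckOut());
    }

    // The .NET 2.0 overload accepts an IList<ArraySegment<byte>>, so all of the
    // pinned memory lives inside the manager's pre-allocated segments.
    socket.BeginReceive(receiveBuffers, SocketFlags.None, OnReceiveCompleted, socket);

A connection that wants more (or less) throughput simply checks out more chunks or checks some back in before issuing its next call.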

The next change that the use of ArraySegment allows is the further isolation of the memory. Instead of creating numerous small arrays which then move slowly from gen0 to gen1 and so on, I can create really big arrays which will be put directly into the LOH (Large Object Heap). At present objects over 80 KB or so are put in the LOH, but this is subject to change in the future (keep this in mind, as it may affect your performance slightly later: you could end up with a really big array in gen0 which is pinned almost all of the time). One of the nice things about the LOH is that there is no compaction; in other words, we don't have to worry about the object being pinned or about fragmentation. As you can see, the ability to create this one BIG array will be quite a bit better for us than dealing with lots of little arrays.
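
If you want to see where an allocation lands, a quick check like this works today (just a sketch; the exact threshold is an implementation detail of the runtime). Because the LOH is collected with gen2, GC.GetGeneration reports 2 for a large array even immediately after it is allocated:

    byte[] small = new byte[8 * 1024];          // ordinary small-object heap allocation
    byte[] large = new byte[1000 * 8 * 1024];   // well past the LOH threshold
    Console.WriteLine(GC.GetGeneration(small)); // typically 0
    Console.WriteLine(GC.GetGeneration(large)); // 2, i.e. it went straight to the LOH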

The biggest issue I had with the code as it stood was the handling when we ran out of buffers: it would create them one by one, leading to all the same issues we would have without a buffer manager (objects not living near each other and being pinned, thus fragmenting the heap). This issue goes away automatically when dealing with chunks, because we always create N chunks at a time (i.e. if we run out of buffers and we are told to deal with 8 KB chunks in 1000-buffer segments, we won't create 1 new buffer but a whole new 8 MB segment consisting of 1000 buffers). Although this could be wasteful of memory, in a server application one is generally better off growing items like this once as opposed to often. In fact, the reason this is not such a big issue overall is that, as with the previous incarnation, it should always be the case that BUFFER_SIZE is larger than the maximum number of buffers you think you will need, and thus you never need to resize (an extra few MB of buffer space is no big deal in the context of such an application, and this is a common pattern).
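
For example, with the numbers above, a single construction call pre-allocates everything up front and each segment is comfortably past the LOH threshold (a sketch only; pick the chunk size and count from your own profiling):

    // 1000 chunks of 8 KB each = one 8 MB segment, allocated straight onto the LOH.
    // Size the initial allocation generously so CreateNewSegment is rarely, if ever, hit.
    BufferManager bufferManager = new BufferManager(1000, 8 * 1024, 1);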

The other change I made was to make the buffer cache thread safe. I used a LockFreeStack to do this. There are many implementations of lock-free stacks available on the net; Julian Bucknall has a great series of articles on this as well. I used one in particular because I run on a machine with lots of processors; you will likely be better off with simple monitor-based locking around a normal Stack<T> if you are running on fewer than 8 processors, and this should be a fairly easy change to the code. If people are interested I will spend the 15 minutes to make this change.

UPDATE: I put the lock version in a separate listing (you could make the locks finer-grained than my quick update, though).

As a quick sidebar: the CheckOut method may look odd to you in the LockFreeStack version. The TryPop method tries to pop a value off the stack for me (in the TryXxx pattern format). The reason this pattern is needed is that I don't lock, so a count test would be inaccurate; I need to do the operation atomically. If a buffer is not found, then I need to create a new segment of buffers ... but another thread may have already done this for me (so I check the count inside my lock before creating again). There is also an interesting case where, although a new segment was created, I am unlucky enough that all of it has been used up by the time I get to it, so I have to loop. This has all of the problems of lock-free algorithms, especially starvation, but it is so unlikely to happen that I am not considering it to be a major concern.

All of that said, here is the code. Note that even with all of this long explanation the code is very short (as is the original), but it is a perfect example of how subtle asynchronous socket programming with .NET can be!

 

    /// <summary>
    /// A manager to handle buffers for the socket connections
    /// </summary>
    /// <remarks>
    /// When used in an async call a buffer is pinned. Large numbers of pinned buffers
    /// cause problems with the GC (in particular it causes heap fragmentation).
    ///
    /// This class maintains a set of large segments and gives clients pieces of these
    /// segments that they can use for their buffers. The alternative to this would be to
    /// create many small arrays which it then maintained. This methodology should be slightly
    /// better than the many small array methodology because in creating only a few very
    /// large objects it will force these objects to be placed on the LOH. Since the
    /// objects are on the LOH they are at this time not subject to compacting which would
    /// require an update of all GC roots as would be the case with lots of smaller arrays
    /// that were in the normal heap.
    /// </remarks>
    public class BufferManager {
        private readonly int m_SegmentChunks;
        private readonly int m_ChunkSize;
        private readonly int m_SegmentSize;
        private readonly LockFreeStack<ArraySegment<byte>> m_Buffers;
        private readonly object m_LockObject = new Object();
        private readonly List<byte[]> m_Segments;

        /// <summary>
        /// The current number of buffers available
        /// </summary>
        public int AvailableBuffers {
            get { return m_Buffers.Count; } //do we really care about volatility here?
        }

        /// <summary>
        /// The total size of all buffers
        /// </summary>
        public int TotalBufferSize {
            get { return m_Segments.Count * m_SegmentSize; } //do we really care about volatility here?
        }

        /// <summary>
        /// Creates a new segment, makes buffers available
        /// </summary>
        private void CreateNewSegment() {
            byte[] bytes = new byte[m_SegmentChunks * m_ChunkSize];
            m_Segments.Add(bytes);
            for (int i = 0; i < m_SegmentChunks; i++) {
                ArraySegment<byte> chunk = new ArraySegment<byte>(bytes, i * m_ChunkSize, m_ChunkSize);
                m_Buffers.Push(chunk);
            }
        }

        /// <summary>
        /// Checks out a buffer from the manager
        /// </summary>
        /// <remarks>
        /// It is the client's responsibility to return the buffer to the manager by
        /// calling <see cref="CheckIn"></see> on the buffer
        /// </remarks>
        /// <returns>An <see cref="ArraySegment"></see> that can be used as a buffer</returns>
        public ArraySegment<byte> CheckOut() {
            ArraySegment<byte> b;
            while (true) {
                // TryPop is the lock-free fast path; if it succeeds we are done.
                if (m_Buffers.TryPop(out b)) {
                    return b;
                }
                // No buffer was available: take the lock, re-check the count (another
                // thread may have already grown the pool), create a whole new segment
                // of chunks if we really are empty, then loop and try again.
                lock (m_LockObject) {
                    if (m_Buffers.Count == 0) {
                        CreateNewSegment();
                    }
                }
            }
        }

        /// <summary>
        /// Returns a buffer to the control of the manager
        /// </summary>
        /// <param name="_Buffer">The <see cref="ArraySegment"></see> to return to the cache</param>
        public void CheckIn(ArraySegment<byte> _Buffer) {
            m_Buffers.Push(_Buffer);
        }

        #region constructors

        /// <summary>
        /// Constructs a new <see cref="BufferManager"></see> object
        /// </summary>
        /// <param name="_SegmentChunks">The number of chunks to create per segment</param>
        /// <param name="_ChunkSize">The size of a chunk in bytes</param>
        public BufferManager(int _SegmentChunks, int _ChunkSize) :
            this(_SegmentChunks, _ChunkSize, 1) { }

        /// <summary>
        /// Constructs a new <see cref="BufferManager"></see> object
        /// </summary>
        /// <param name="_SegmentChunks">The number of chunks to create per segment</param>
        /// <param name="_ChunkSize">The size of a chunk in bytes</param>
        /// <param name="_InitialSegments">The initial number of segments to create</param>
        public BufferManager(int _SegmentChunks, int _ChunkSize, int _InitialSegments) {
            m_SegmentChunks = _SegmentChunks;
            m_ChunkSize = _ChunkSize;
            m_SegmentSize = m_SegmentChunks * m_ChunkSize;
            m_Buffers = new LockFreeStack<ArraySegment<byte>>(_SegmentChunks * _InitialSegments);
            m_Segments = new List<byte[]>();
            for (int i = 0; i < _InitialSegments; i++) {
                CreateNewSegment();
            }
        }

        #endregion
    }

Listing 1: Code with LockFreeStack

 

using System;
using System.Collections.Generic;
using System.Text;

namespace TickerPlant
{
    /// <summary>
    /// A manager to handle buffers for the socket connections
    /// </summary>
    /// <remarks>
    /// When used in an async call a buffer is pinned. Large numbers of pinned buffers
    /// cause problems with the GC (in particular it causes heap fragmentation).
    ///
    /// This class maintains a set of large segments and gives clients pieces of these
    /// segments that they can use for their buffers. The alternative to this would be to
    /// create many small arrays which it then maintained. This methodology should be slightly
    /// better than the many small array methodology because in creating only a few very
    /// large objects it will force these objects to be placed on the LOH. Since the
    /// objects are on the LOH they are at this time not subject to compacting which would
    /// require an update of all GC roots as would be the case with lots of smaller arrays
    /// that were in the normal heap.
    /// </remarks>
    public class BufferManager {
        private readonly int m_SegmentChunks;
        private readonly int m_ChunkSize;
        private readonly int m_SegmentSize;
        private readonly Stack<ArraySegment<byte>> m_Buffers;
        private readonly object m_LockObject = new Object();
        private readonly List<byte[]> m_Segments;

        /// <summary>
        /// The current number of buffers available
        /// </summary>
        public int AvailableBuffers {
            get { return m_Buffers.Count; } //do we really care about volatility here?
        }

        /// <summary>
        /// The total size of all buffers
        /// </summary>
        public int TotalBufferSize {
            get { return m_Segments.Count * m_SegmentSize; } //do we really care about volatility here?
        }

        /// <summary>
        /// Creates a new segment, makes buffers available
        /// </summary>
        private void CreateNewSegment() {
            byte[] bytes = new byte[m_SegmentChunks * m_ChunkSize];
            m_Segments.Add(bytes);
            for (int i = 0; i < m_SegmentChunks; i++) {
                ArraySegment<byte> chunk = new ArraySegment<byte>(bytes, i * m_ChunkSize, m_ChunkSize);
                m_Buffers.Push(chunk);
            }
        }

        /// <summary>
        /// Checks out a buffer from the manager
        /// </summary>
        /// <remarks>
        /// It is the client's responsibility to return the buffer to the manager by
        /// calling <see cref="CheckIn"></see> on the buffer
        /// </remarks>
        /// <returns>An <see cref="ArraySegment"></see> that can be used as a buffer</returns>
        public ArraySegment<byte> CheckOut() {
            lock (m_LockObject) {
                if (m_Buffers.Count == 0) {
                    CreateNewSegment();
                }
                return m_Buffers.Pop();
            }
        }

        /// <summary>
        /// Returns a buffer to the control of the manager
        /// </summary>
        /// <param name="_Buffer">The <see cref="ArraySegment"></see> to return to the cache</param>
        public void CheckIn(ArraySegment<byte> _Buffer) {
            lock (m_LockObject) {
                m_Buffers.Push(_Buffer);
            }
        }

        #region constructors

        /// <summary>
        /// Constructs a new <see cref="BufferManager"></see> object
        /// </summary>
        /// <param name="_SegmentChunks">The number of chunks to create per segment</param>
        /// <param name="_ChunkSize">The size of a chunk in bytes</param>
        public BufferManager(int _SegmentChunks, int _ChunkSize) :
            this(_SegmentChunks, _ChunkSize, 1) { }

        /// <summary>
        /// Constructs a new <see cref="BufferManager"></see> object
        /// </summary>
        /// <param name="_SegmentChunks">The number of chunks to create per segment</param>
        /// <param name="_ChunkSize">The size of a chunk in bytes</param>
        /// <param name="_InitialSegments">The initial number of segments to create</param>
        public BufferManager(int _SegmentChunks, int _ChunkSize, int _InitialSegments) {
            m_SegmentChunks = _SegmentChunks;
            m_ChunkSize = _ChunkSize;
            m_SegmentSize = m_SegmentChunks * m_ChunkSize;
            m_Buffers = new Stack<ArraySegment<byte>>(_SegmentChunks * _InitialSegments);
            m_Segments = new List<byte[]>();
            for (int i = 0; i < _InitialSegments; i++) {
                CreateNewSegment();
            }
        }

        #endregion
    }
}

Listing 2: Code for BufferManager with Stack<T> + monitor locking

39 Responses to Async Sockets and Buffer Management

  1. gregyoung says:

    Many people have commented that the links are broken since codebetter moved sites.

     You can check out the full code as it's used in the Event Store: http://github.com/eventstore/eventstore (look in core at buffermanagement).

    Cheers,

    Greg

  2. wsky says:

    excellent

  3. Pingback from TCP: Buffer Management – taccato! trend tracker, cool hunting, new business ideas

  4. Confucius says:

    Where are the files? Download link returns error!!!!??? HELP!!!

  5. 出会い says:

     A mysterious place that soothes you.

  6. overred says:

     Very good. Overlapped really takes up a lot of space.

  7. jay says:

     Hi Greg:

     A simple question: can I keep a socket connection alive after we have finished a trade?

  8. Garey says:

     Thank you for the article ... it is enlightening and I am still studying it for my current purpose and possible future ones too.

    I saw other people’s posts requesting a fuller example. Would you have one available?

  9. Greg says:

     Aaron, there are a couple around on the net. I think I mentioned one in the previous post, on Julian's blog in this series: http://www.boyet.com/Articles/LockFreeLimitedPriorityQ.html (the stack is before this one I think, but the whole series is worth a read!)

  10. Aaron says:

     Where can I find the LockFreeStack source code? Thanks.

  11. Philipp says:

    Thanks for your response. Okay, I can see that the GC issues can really hamper the system’s performance. So this solution is not really used to increase the number of sockets which can connect, rather it is used to boost each Socket’s performance by ensuring nice properties like buffers being near one another on the heap.

     Have you ever considered how many connections a .NET TCP/IP server can support? Have you ever done any tests to determine how many simultaneous connections your solution can support on a typical PC? Have you ever seen anything written on this subject?

    I would really like to support between 20000 and 40000 simultaneous connections on my server (a normal dual core PC with 4Gb of RAM) and am worried that this may not be attainable with .Net. I am starting to wonder if I should use a Linux/C++ implementation where I would have greater control of the kernel etc. The .Net Library is just so tempting because it makes development so fast and easy. So I am very eager to find out whether I can reach my connection targets using it.

    Thanks!
    Philipp

  12. Greg says:

    Phillip:

     The main thing is that without the buffer pool you will wreak havoc on the GC :( Every one of your buffers that is being held has to be pinned (which causes fragmentation of your heap). With even 500 this is not so big of a deal, but with say 10000 it becomes a pretty serious problem. There is also the problem of creating/destroying a lot of objects (especially if you are using byte arrays to send your data and creating/destroying one each time).

    Cheers,

    Greg

  13. Philipp says:

    Hi Greg, great article! I have been looking for this sort of stuff for a while now. I am just about to embark on an implementation of a high-performance TCP/IP server. My server should be able to host a very large number of clients, each of whose connections will be active for days but each will be idle most of the time. May I ask you some high-level questions about this?

    As I understand it, the buffers which receive information at each socket are drawn from nonpaged memory and are therefore a real limiting factor to the number of simultaneous connections which a server can maintain. So the idea behind building a buffer pool is to have n sockets share a pool of m buffers (n > m), relying on the fact that since all the sockets are not expected to be actively transmitting/receiving at the same time, we can squeeze more performance out of the same amount of memory. Is this correct?

    I have more questions, but I think I should understand the answer to one before I start on the next.

    Thanks!
    Philipp

  14. Greg says:

    Jared.

     #1, is your example on the send or on the receive side? I agree that dealing with differing sized buffers can be good (in particular for very large buffers which occasionally need to be used). For this I generally just create a second buffer manager (I only really do it on my send side). As for "CheckIn"/"CheckOut" being too slow to, say, check out 10 buffers: I generally hold my buffers while my connection is open and reuse them on the connection. I have an unusual case though, in that I am sending hundreds to thousands of messages a second and it is rare that I don't have an operation executing in my server.

    2. I admit in hindsight that the LockFreeStack was a bit of a premature optimization on my part. This is why I provided the version with just a normal stack and monitor as well. Since I maintain my buffers with connections and don’t checkin/out on every call I really don’t call into this as much as I originally thought that I would. In practice I really don’t call checkout very much.

     3. Sounds more like a design decision, but it would be nice to support, perhaps with another method like TryCheckOutWithoutResize(int Timeout).

    4. Yes I used raw ArraySegment<byte> but not in my system actually running this. I use a class to encapsulate it there, for the example I tried to limit coupling to other objects as much as possible.

     5. Per the server GC: I have always used it for my server; I have never even thought to try the workstation GC. Rico has a great article set on the server GC:

    blogs.msdn.com/…/156626.aspx

    blogs.msdn.com/…/234273.aspx

    and most interesting of all for you  …

    blogs.msdn.com/…/workstation-gc-for-server-applications.aspx

  15. Jared Allen says:

    Nice work, i’m also working on high performance servers in .NET and i’ve found the info that you and JD Conley extremely interesting.

    I am currently implementing my own buffer pooling similar to your here. I’ve added a number of extra features to my pool that i think should help performance.

    1. using different sized fixed buffers. when a request is made a buffer with “Best fit” is selected for use. This is useful for me as 80% time the buffers required are small < 1024bytes, but sometimes a much large buffer is required 32kb, so to avoid having to do multiple requests to the buffer and chaining them all together i can return a larger buffer. I also preallocate different numbers of each buffer size based on expected usage. e.g 1000x1024 + 128*32kb etc

    2. I'm not 100% confident in lockfree data structures. so i still need to lock the buffer lists\stacks etc. But this can cause lock contention and really slow down memory requests if there are alot of threads being used. So i separate my buffer pool into "apartments" each apartment effectively has it's own buffers that it owns. Then when a memory request is made i can select an apartment to use - either based on thread id of best fit\most free buffers. I only need to lock the single apartment for the request... though i'm thinking i can go one level further and only lock a single buffer list inside the apartment - (apartment contains multiple buffer lists of different sized buffers)

    3. I also don't want to just blindly allocate additional buffers if there were none free. This is a case i really wanted to avoid so that memory usage can be completely controlled and even denied if it will make a negative impact on the quality of service of the other clients. So i am prepared to wait for a buffer to free up if required ( <100ms ) so when ever buffers are returned to the pool i trigger a AutoResetEvent to notify any waiting threads. - if you choose the size of your buffer pool correctly based on performance and usage statistics this should be a rare case - burst\high demand patterns. And in these cases this helps because it means your server wont crash with out of memory errors or be over run by requests.

    4. you are using raw ArraySegment as the items in the pool. I suggest wrapping the buffer in a class that implements IDisposable. So that you can make sure the buffer always gets returned to the pool. I had a problem where my BeginSend was failing so the EndSend callback never got called – which is where i would normally return the buffer to the pool.

    I’m interested to know if you or anyone has used the “Server GC” instead of the normal workstation version and have any performance stats? – mscorsvr.dll

    Jared.

  16. Greg, excellent example. I can use this in a "memcached"-like set of services on a webfarm, where each service truly syncs all the others when it receives a cache operation.

  17. Greg says:

     It's coming ... I just have to do the test runs with SOS to show the difference it provides in heap fragmentation. I hope to have a few minutes tonight to run that.

    Cheers,

    Greg

  18. Robert says:

    I’d still like to see this when you get a chance.

  19. Jon T says:

    Any news on the example? *Been waiting everyday ;o)*

  20. Alex says:

    I second Robert’s request

  21. Alex says:

    I second Robert’s request

  22. Greg says:

    Will do might take me a day or two.

     Cheers,

    Greg

  23. Robert says:

    I would like to see an example of an Async Socket server using this as well. This is good stuff, thanks.

  24. Ryan E says:

    I could really use some example code on how to use this. I have implemented it to the best of my ability, but the program I am writing still suffers from quickly growing memory space when using BeginSend.

  25. Ryan says:

    It would be great to get an example of this manager in use on something like an Asyc Socket server.

  26. Jon T says:

    Hi Greg. Thanks for sharing your buffer pool!

    I have a few questions regarding your buffer management. I’m creating a game server where each user connected has an instance of a class. Should I make a Buffer Pool for each user? Or should I have a large one that every user uses?

    Secondly, would you be so kind to give us an short example of usage? How I would send and receive packet using the buffer pool?

    Thank you very much!
    -Jon

  27. Greg says:

     I have a class that encapsulates my connection ... in this class I have send/receive buffers:

     List<ArraySegment<byte>> m_SendBuffers;
     List<ArraySegment<byte>> m_ReceiveBuffers;

     When my connection is started I request buffers from the manager:

     for (int i = 0; i < sendBufferCount; i++) {
         m_SendBuffers.Add(BufferManager.CheckOut());
     }

     for (int i = 0; i < receiveBufferCount; i++) {
         m_ReceiveBuffers.Add(BufferManager.CheckOut());
     }

     Then I just use my BeginReceive etc. calls, passing the IList (link in the post above to the overload).

     There are times where I adjust my buffer sizes by adding or removing. When the connection is closed, just remember to call CheckIn() for all checked-out buffers.

  28. Shweta says:

     Hi Greg, first of all thanks for such an informative post. I am running into a similar problem, which I attribute to the pinning behaviour of buffers in the socket library. My server crashes after a few hours of load (which includes lots of incoming and outgoing messages), and at the time of the crash I see in perfmon that the number of pinned objects is pretty high (around 4000 objects); it remains that way for a while and then boom, the server crashes. I was thinking of implementing something like what you have shown. It would be great if you could show me snippets of the client code (the BeginReceive part maybe?) if you have time. Your post is very helpful.

  29. JD Conley says:

    Yeah, there are certainly some CPU/Memory bound trade-offs with the stream wrapper. :)

  30. Greg says:

    I’ll discuss the performance implications of that more in another post :)

  31. JD Conley says:

     Finally! :) We’ve been talking about this thing for ages. It should be noted that the best place to put this for extensibility is in a stream inheriting from NetworkStream. That way you can use the early-allocated buffers on your socket through SslStream, NegotiateStream, and the like.

  32. Greg says:

    Fixed code which wasn’t posted right for first 5-10 minutes and added second code listing.
