About six weeks ago we hit a bizarre, randomly failing integration test over at http://geteventstore.com.
The test would bring up a node, then do some writing, reading, and verifying; it failed maybe 3% of the time. When we dug into the failures, we found some quite interesting results!
During a failure, the client was receiving back a message that it normally sends to the server. This was quite frightening for us, because both our client and our server do a lot of buffer re-use rather than allocating new buffers every time (we reuse socket args, buffers, etc.). Our initial thought was: crap, we have a rarely-hit threading bug somewhere in our heavily multi-threaded code. This should be fun!
After some digging, though, we found that nothing was wrong with our buffers. We were getting hit by a TCP self-connect! Most people have never heard of a self-connect, or would even imagine it's possible, but because the server and the client were both running on ephemeral ports, the client was actually connecting to itself. Anything the client sent would end up back in the client's own receive.
You can actually see this behaviour in the TCP state diagram: the simultaneous-open path (SYN / SYN+ACK) allows a socket whose source and destination addresses and ports match to complete a handshake with itself.
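One possible client-side defence (a sketch of the general technique, not necessarily what we shipped): after connect() returns, compare the socket's local and remote addresses. On a self-connect they are identical, so the connection can be dropped and retried.

```python
import socket


def connect_checked(host: str, port: int) -> socket.socket:
    """Connect to (host, port), rejecting a TCP self-connect.

    On a self-connect the kernel has picked the destination port as the
    ephemeral source port, so the local and remote addresses are identical.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    if s.getsockname() == s.getpeername():
        # We connected to ourselves; tear it down so the caller can retry.
        s.close()
        raise ConnectionError("TCP self-connect detected")
    return s
```

On a normal connection the local address holds an ephemeral source port while the remote address holds the server's port, so the check passes silently.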
This ended up costing us a ton of time: we only found it after debugging through (and philosophically questioning) thousands of lines of multi-threaded buffer-caching code. It's something to keep in mind when debugging TCP code; we knew TCP fairly well but had never run into this particular problem. Want to try it yourself?
while true; do telnet 127.0.0.1 50000; done
Want to prevent getting bit by it? Our mistake was running the server on an ephemeral port. Had these tests used a port outside the ephemeral range, the kernel could never have assigned that same port as the client's source port, and the bug could never have hit us.
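A cheap way to enforce that rule in test setup (a Linux-specific sketch; the path and format below are the standard procfs interface, but other platforms expose the range differently) is to fail fast when a test server is about to bind inside the kernel's ephemeral range:

```python
def ephemeral_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the kernel's ephemeral (local) port range. Linux-specific."""
    with open(path) as f:
        low, high = map(int, f.read().split())
    return low, high


def check_server_port(port: int) -> None:
    """Raise if a test server is about to listen inside the ephemeral range,
    where a client's randomly chosen source port could collide with it."""
    low, high = ephemeral_port_range()
    if low <= port <= high:
        raise ValueError(
            f"port {port} is inside the ephemeral range {low}-{high}; "
            "a client could self-connect to it"
        )
```

Run it once at test startup against every fixed port the suite binds; the default Linux range is 32768-60999, so ports below that are typically safe.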