Why waiting for asynchronous IO is evil

Are you doing IO? If so, who’s waiting for it to complete? IO looks easy on the surface, but if you intend to build a fully scalable application you’ll soon see that it is a delicate and tricky business. To illustrate this, let’s quickly make a web server using low-level IO!

Old-school IO

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var tcpListener = new TcpListener(IPAddress.Any, 1337);
        tcpListener.Start();
        while (true)
        {
            var tcpClient = tcpListener.AcceptTcpClient();
            new Thread(() => HandleRequest(tcpClient)).Start();
        }
    }

    public static void HandleRequest(TcpClient client)
    {
        try
        {
            // Read the entire header chunk
            var stream = client.GetStream();
            var headers = new StringBuilder();
            while (!(headers.Length >= 4 && headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
            {
                var b = stream.ReadByte();
                if (b < 0) return;   // Client disconnected before sending complete headers
                headers.Append((char) b);
            }

            // Find out what was requested in the first line
            // Assume GET xxxx HTTP/1.1         
            var path = new string(headers.ToString().Skip("GET ".Length).TakeWhile(c => !Char.IsWhiteSpace(c)).ToArray());

            // Read the file and serve it back with the minimal headers
            var file = new FileInfo(path);
            var fileStream = file.OpenRead();

            // Minimal headers
            var responseHeaders = Encoding.ASCII.GetBytes(
                string.Format("HTTP/1.1 200 OK\r\n" + "Content-Length: {0}\r\n" + "\r\n", file.Length));
            stream.Write(responseHeaders, 0, responseHeaders.Length);
            fileStream.CopyTo(stream);
            fileStream.Close();
        } 
        catch {} 
        finally
        {
            client.Close();
        }
    }
}

The code is easy to read and straightforward to follow. But this simplicity is deceptive. What actually goes on here is a lot of waiting and blocked threads, and blocked threads are bad stuff. The read itself does no work on your thread: while the data isn’t available, the operating system fetches it from the network and your thread merely sits blocked until it arrives. This means that you’re wasting a thread. Threads are expensive resources. Not only do they cause context switching, but each one also reserves its own stack space. This implementation of a web server will never scale because of its memory usage.

Waiting is evil

Every time you wait, you are locking up a resource. It’s an important point to make, since the simplicity of the synchronous functions is such a lure to the developer. So we need to make use of the asynchronous pattern, either through the Task Parallel Library or through the Begin/End function pairs. The trouble is that both also hand you a wait handle that is far too easy to reach for. If you’re writing an application that needs to scale and handle lots of IO, you can’t do any waiting.

In fact, the Task Parallel Library presents another very nasty gotcha. If you wrap code that waits inside tasks, you are screwing over the task thread pool: its threads are tied up waiting, which prevents new tasks from starting. This leads to thread pool starvation and an unresponsive application. When you use the TPL for IO, each task should encapsulate a single IO operation, created with Task.Factory.FromAsync, so that the background IO runs without consuming a thread for waiting.
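As a minimal sketch of the FromAsync approach, the following wraps one Begin/End read pair in a task. The class name FromAsyncSketch and the helper ReadChunkAsync are made up for illustration, and a MemoryStream stands in for a network stream so the example runs on its own.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class FromAsyncSketch
{
    // Hypothetical helper: wraps a single BeginRead/EndRead pair in a task.
    // No thread is parked while the read is pending.
    public static Task<int> ReadChunkAsync(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }

    static void Main()
    {
        // Stand-in for a NetworkStream, so the sketch needs no real connection
        var stream = new MemoryStream(Encoding.ASCII.GetBytes("GET / HTTP/1.1\r\n\r\n"));
        var buffer = new byte[64];

        // React to completion with a continuation instead of calling Wait() or
        // blocking on Result. Inside the continuation the task is already
        // complete, so reading t.Result does not block.
        ReadChunkAsync(stream, buffer).ContinueWith(t =>
            Console.WriteLine("Read {0} bytes", t.Result));

        Console.ReadLine();   // To avoid the program exiting
    }
}
```

The point of the design is that the task represents the IO operation itself rather than a thread that waits for it, so the thread pool stays free for actual work.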

Thinking asynchronously

The great thing about doing IO asynchronously is that when data is not yet available, none of your threads has to sit around waiting for it. You get your result back in a callback, which runs on the internal worker thread pool. You do NOT want to run any sort of long-running operation on this pool. It needs to stay available for other things.

This has a few other interesting implications. Since you can’t wait for things, iterating becomes really awkward. Consider the loop that reads each byte from the stream to get the header block. You can’t write it as a loop anymore, since going back and iterating means that the thread that controls the loop has to wait for the result of each operation. So iteration has to be accomplished using recursion: each callback starts the next operation.
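To make the loop-as-recursion idea concrete, here is a small sketch, separate from the web server, that reads a stream to its end in chunks. ReadAll and CallbackLoopSketch are hypothetical names, and the 4096-byte buffer size is an arbitrary choice.

```csharp
using System;
using System.IO;

static class CallbackLoopSketch
{
    // Hypothetical helper: reads the stream to its end in chunks and hands
    // all collected bytes to onDone. There is no while loop; each callback
    // issues the next read.
    public static void ReadAll(Stream source, Action<byte[]> onDone)
    {
        var buffer = new byte[4096];
        var collected = new MemoryStream();

        Action readNext = null;
        readNext = () => source.BeginRead(buffer, 0, buffer.Length, result =>
        {
            int read = source.EndRead(result);
            if (read == 0)
            {
                onDone(collected.ToArray());   // End of stream: the "loop" terminates
                return;
            }
            collected.Write(buffer, 0, read);
            readNext();                        // "Iterate" by starting the next read
        }, null);

        readNext();
    }

    static void Main()
    {
        var source = new MemoryStream(new byte[10000]);
        ReadAll(source, bytes => Console.WriteLine("Got {0} bytes", bytes.Length));

        Console.ReadLine();   // To avoid the program exiting
    }
}
```

The loop condition has turned into the branch inside the callback: either the operation is finished, or another read is issued and the current thread is released immediately.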

Error handling also becomes difficult. If you throw an exception from inside an AsyncCallback, the application will often die straight away. There is no call stack of yours for the exception to propagate back up, since the callback was invoked by the system when the operation completed.
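One way to cope, sketched below under the assumption that errors are routed to a separate error callback, is to catch exceptions inside the callback itself. The names ReadOnce, onRead and onError are invented for this example.

```csharp
using System;
using System.IO;

static class SafeCallbackSketch
{
    // Hypothetical helper: starts one read and reports either the byte count
    // or the exception, so nothing escapes from the callback unhandled.
    public static void ReadOnce(Stream stream, byte[] buffer,
                                Action<int> onRead, Action<Exception> onError)
    {
        try
        {
            stream.BeginRead(buffer, 0, buffer.Length, result =>
            {
                int read;
                try
                {
                    read = stream.EndRead(result);   // Rethrows any error from the IO operation
                }
                catch (Exception ex)
                {
                    onError(ex);                     // Hand the failure to the caller's handler
                    return;
                }
                onRead(read);
            }, null);
        }
        catch (Exception ex)
        {
            onError(ex);   // BeginRead itself can throw, for example on a closed stream
        }
    }

    static void Main()
    {
        var stream = new MemoryStream(new byte[16]);
        stream.Dispose();   // Force a failure for the demonstration

        ReadOnce(stream, new byte[16],
            read => Console.WriteLine("Read {0} bytes", read),
            ex => Console.WriteLine("Failed: {0}", ex.GetType().Name));

        Console.ReadLine();   // To avoid the program exiting
    }
}
```

A real server would probably catch narrower exception types and close the connection in the error handler; catching Exception here just keeps the sketch short.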

An asynchronous web server

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var tcpListener = new TcpListener(IPAddress.Any, 1337);
        tcpListener.Start();
        AcceptClient(tcpListener);

        // To avoid the program exiting
        Thread.Sleep(Timeout.Infinite);
    }

    private static void AcceptClient(TcpListener tcpListener)
    {
        tcpListener.BeginAcceptTcpClient(result =>
        {
            var client = tcpListener.EndAcceptTcpClient(result);
            var stream = client.GetStream();

            // Start next connection attempt
            AcceptClient(tcpListener);

            var buffer = new byte[1];
            var headers = new StringBuilder();

            Action readAction = null;
            readAction = () => stream.BeginRead(buffer, 0, 1, readResult =>
            {
                if (stream.EndRead(readResult) == 0)
                {
                    stream.Close();   // Client disconnected before sending complete headers
                    return;
                }
                headers.Append(Encoding.ASCII.GetString(buffer));
                if (!(headers.Length >= 4 && headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
                {
                    readAction();   // Recurse to read one more byte
                }
                else
                {
                    // Assume GET xxxx HTTP/1.1         
                    var path = new string(headers.ToString().Skip("GET ".Length).TakeWhile(c => !Char.IsWhiteSpace(c)).ToArray());

                    // Read the file and serve it back with the minimal headers
                    if (!File.Exists(path))
                    {
                        stream.Close();
                        return;
                    }
                    var file = new FileInfo(path);
                    var fileStream = file.OpenRead();

                    // Minimal headers
                    var responseHeaders = Encoding.ASCII.GetBytes(
                        string.Format("HTTP/1.1 200 OK\r\n" + "Content-Length: {0}\r\n" + "\r\n", file.Length));
                    stream.BeginWrite(responseHeaders, 0, responseHeaders.Length, writeResult =>
                    {
                        stream.EndWrite(writeResult);
                        byte[] fileBuffer = new byte[file.Length];
                        fileStream.BeginRead(fileBuffer, 0, (int)file.Length, fileReadResult =>
                        {
                            fileStream.EndRead(fileReadResult);

                            stream.BeginWrite(fileBuffer, 0, fileBuffer.Length, contentWriteResult =>
                            {
                                stream.EndWrite(contentWriteResult);
                                fileStream.Close();
                                stream.Close();
                            }, stream);
                        }, fileStream);
                    }, stream);
                }
            }, stream);
            readAction();

        }, tcpListener);
    }
}

The code above is obviously for demonstration purposes. Generally it’s not a good idea to read single bytes from streams, and in this case it’s an especially bad idea since it can generate an impressive call stack from the recursion. But it shows the general idea of how you should code to achieve an IO-bound application that will scale. There is no explicit threading. No thread is ever in a waiting state. The application becomes entirely reactive to IO instead of reading and waiting for IO to complete. Interesting things happen when you run this and break at a random point. All threads in the worker thread pool are normally doing nothing, which is great because it means that they’re available to the system to quickly process IO callbacks. A request to this server will not spin up a thread. The memory usage will be kept absolutely minimal.

The situation improves somewhat with the new .NET 4.5, which supports the async/await keywords. It improves in the sense that the syntax becomes nicer, but the lessons still hold true. If you’re waiting on anything, you’re killing your application with wait handles, and no amount of async keywords is going to rescue you. It’s a pity that so many of the examples of asynchronous operations in the documentation show off ways to use WaitOne, which pretty much totally defeats the purpose of being asynchronous in the first place.
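For comparison, here is a rough sketch of how the header-reading step might look with async/await. ReadHeadersAsync and AwaitSketch are hypothetical names, and the byte-at-a-time read is kept only to mirror the examples above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class AwaitSketch
{
    // Reads until the blank line that ends the HTTP headers. The compiler
    // rewrites the loop into continuations, so no thread is blocked while
    // a read is pending.
    public static async Task<string> ReadHeadersAsync(Stream stream)
    {
        var buffer = new byte[1];
        var headers = new StringBuilder();
        while (!(headers.Length >= 4 && headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
        {
            int read = await stream.ReadAsync(buffer, 0, 1);
            if (read == 0) break;   // Stream ended before the headers were complete
            headers.Append((char) buffer[0]);
        }
        return headers.ToString();
    }

    static void Main()
    {
        // Stand-in for a NetworkStream, so the sketch needs no real connection
        var stream = new MemoryStream(Encoding.ASCII.GetBytes("GET /index.html HTTP/1.1\r\n\r\n"));

        // Attach a continuation; calling Wait() or blocking on Result here
        // would bring back exactly the waiting this article warns about.
        ReadHeadersAsync(stream).ContinueWith(t =>
            Console.WriteLine("Header block is {0} characters long", t.Result.Length));

        Console.ReadLine();   // To avoid the program exiting
    }
}
```

The while loop is back, and the recursion is gone, because await gives the thread up at each read and resumes the method when the data arrives.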
