Monthly Archives: August 2012

Using IDisposable with Autofac

Using IDisposable with Autofac presents a few challenges and requires you to really think about lifetime scopes to avoid memory leaks. These days, with the pretty excellent garbage collection built into .NET, the age-old chore of hunting down memory leaks is usually something you need not worry about anymore. However, the contract of IDisposable still requires a manual release. And if you ever resolve anything implementing IDisposable through Autofac, you are going to run into problems.

Take a look at this code. What does it do?

using System;
using Autofac;

namespace AutoFacDisposableDemo
{
    public interface IResource : IDisposable 
    {
    }

    public class MyResource : IResource
    {
        private byte[] _lotsOfMemory;

        public MyResource()
        {
            _lotsOfMemory = new byte[1000000];
        }

        public void Dispose()
        {
            _lotsOfMemory = null;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var builder = new ContainerBuilder();
            builder.RegisterType<MyResource>().As<IResource>();
            var container = builder.Build();

            while (true)
            {
                using (container.Resolve<IResource>())
                {
                }
            }
        }
    }
}

At first glance, it seems to do nothing but feed the garbage collector. But take a look at the memory usage. It will be constantly increasing, and you will run out of memory! If you try removing the IDisposable interface from IResource, this code will stop running out of memory. So why does this happen?

Autofac manages your IDisposables

Yes, Autofac tries to be smart and will actually keep a reference to the object internally whenever you resolve a component that implements IDisposable. This is because Autofac doesn't know what other objects might be referencing your resource, and you haven't told it anything about when the resource is supposed to go out of scope. This matters especially when Autofac is wired up to create non-transient instances, where many consumers could be using your disposable object and only the last usage should dispose of it.

This happens transparently, and because you've done what is normally the right thing and called Dispose, you have released the expensive resources – leaving only a small skeleton object floating around that will never be garbage collected. This is scary because the leak isn't huge and obvious, like forgetting to dispose of sockets, which shows up pretty quickly. If you run this through a memory profiler, the object will be held by an internal stack of IDisposables somewhere, rooted to a closure somewhere deep down.

It really is not a solution to work around this by removing IDisposable from the interface, since that will cause all sorts of problems down the road – breaking the semantics of the interface in the process. Instead, what you need to do is use lifetime scopes. If you change the main loop to this, it will run without a leak:

while (true)
{
    using (var scope = container.BeginLifetimeScope())
    {
       var myResource = scope.Resolve<IResource>();
    }
}

Note that we are resolving from the opened scope, and disposing the scope instead of the allocated resources. This is all fine but a bit simplistic. What happens if we are using factory functions instead?

public interface IResourceFactory
{
    IResource GetResource();
}

public class MyResourceFactory : IResourceFactory
{
    private readonly Func<IResource> _newResource;

    public MyResourceFactory(Func<IResource> newResource)
    {
        _newResource = newResource;
    }

    public IResource GetResource()
    {
        return _newResource();
    }
}

class Program
{
    static void Main(string[] args)
    {
        var builder = new ContainerBuilder();
        builder.RegisterType<MyResource>().As<IResource>();
        builder.RegisterType<MyResourceFactory>().As<IResourceFactory>();
        var container = builder.Build();

        var factory = container.Resolve<IResourceFactory>();

        while (true)
        {
            using (var scope = container.BeginLifetimeScope())
            {
                var myResource = factory.GetResource();
            }
        }
    }
}

Out of memory again. Autofac will give you a Func&lt;IResource&gt; that, in a sense, does the new for you. But that Func is dynamically created to make objects that have the same lifetime scope as the factory object – not the lifetime scope you called it in! This makes perfect sense in a way, since you can have multiple lifetime scopes going at the same time – even nested ones, since you can create a lifetime scope from another lifetime scope. Changing it to this will eliminate the problem:

class Program
{
    static void Main(string[] args)
    {
        var builder = new ContainerBuilder();
        builder.RegisterType<MyResource>().As<IResource>();
        builder.RegisterType<MyResourceFactory>().As<IResourceFactory>();
        var container = builder.Build();

        while (true)
        {
            using (var scope = container.BeginLifetimeScope())
            {
                var factory = scope.Resolve<IResourceFactory>();
                var myResource = factory.GetResource();
            }
        }
    }
}

Singletons

Ah, the global variables of the 21st century. What happens if you make the resource factory into a singleton?

var builder = new ContainerBuilder();
builder.RegisterType<MyResource>().As<IResource>();
builder.RegisterType<MyResourceFactory>().As<IResourceFactory>().SingleInstance();
var container = builder.Build();

Bam. Out of memory again! Singletons might be better than globals, but they are still not a very good idea. The Func will be bound to the root lifetime scope, and all the IDisposables it creates are also bound to that scope, regardless of how many times you call Dispose on them. A better idea would be to use InstancePerLifetimeScope instead. This removes the problem, but also causes the factory to be instantiated several times. Singletons are generally a bad idea, since you can never be sure who is going to add a dependency on an IDisposable and cause a memory or resource leak.
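A sketch of what that registration looks like with the types from the earlier examples (InstancePerLifetimeScope is Autofac's own registration method):

```csharp
var builder = new ContainerBuilder();
builder.RegisterType<MyResource>().As<IResource>();

// One factory instance per lifetime scope instead of one for the
// whole container. Disposables created through its Func<IResource>
// are tracked by the scope that resolved the factory.
builder.RegisterType<MyResourceFactory>()
       .As<IResourceFactory>()
       .InstancePerLifetimeScope();

var container = builder.Build();

while (true)
{
    using (var scope = container.BeginLifetimeScope())
    {
        var factory = scope.Resolve<IResourceFactory>();
        var myResource = factory.GetResource();
    }   // MyResource instances die with the scope
}
```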

More options

There is an Owned&lt;T&gt; class that you can resolve for. If you resolve Owned&lt;IResource&gt; instead, you are required to release the resource yourself, and Autofac makes no effort to keep a reference to it in memory. Just make sure you call Dispose on the Owned&lt;IResource&gt; wrapper instead of the inner IResource.
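As a minimal sketch, using the same container as in the first example (Owned&lt;T&gt; lives in the Autofac.Features.OwnedInstances namespace):

```csharp
using Autofac.Features.OwnedInstances;

// Resolving Owned<IResource> hands the dispose responsibility to
// the caller - Autofac keeps no internal reference to the instance.
using (var ownedResource = container.Resolve<Owned<IResource>>())
{
    IResource resource = ownedResource.Value;
    // ... use the resource ...
}   // disposing the Owned<T> wrapper disposes the inner resource
```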

Creating lifetime scopes

You don’t want to pass your scopes around, so you can have the lifetime scope injected for you by taking a dependency on ILifetimeScope. If you do so, the lifetime scope that the Resolve happened in is passed to your constructor, from which you can create child lifetime scopes or resolve objects in the current scope. It leaves a bad taste, since this has all the hallmarks of exposing your container: the scope will let you create an instance of any registered type without declaring an explicit dependency. A better solution is to avoid this and rely on auto-generated Funcs to create objects in the correct lifetime scope.
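A minimal sketch of what the injection looks like – the UnitOfWork class here is hypothetical, but ILifetimeScope is Autofac's own interface:

```csharp
public class UnitOfWork
{
    private readonly ILifetimeScope _scope;

    // Autofac injects the lifetime scope this object was resolved from.
    public UnitOfWork(ILifetimeScope scope)
    {
        _scope = scope;
    }

    public void DoWork()
    {
        // Child scope: anything disposable resolved from it
        // is disposed when the using block ends.
        using (var childScope = _scope.BeginLifetimeScope())
        {
            var resource = childScope.Resolve<IResource>();
            // ... use the resource ...
        }
    }
}
```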

In conclusion

  • Be really careful. The leaks are not obvious, and if you’re injecting other people’s code you can never be sure when it uses something that is disposable.
  • If you’re using factories, they cannot be singletons if they are used to create anything IDisposable. In fact, I’d avoid them in general, since it’s way too easy to pass a factory from one scope into a child scope and use it to create objects that will leak.
  • Find distinct units-of-work and begin and end lifetime scopes there. This is the place to resolve objects that all belong in the same scope.
  • Don’t dispose of injected IDisposables manually. If you work with Autofac instead of against it, the owning scope will dispose of them for you. This makes you safe whenever someone decides to add another usage of your object or change the number of instances created. If you absolutely need to dispose of something manually – use the Owned&lt;T&gt; class, and get an object that you yourself are responsible for.

Why waiting for asynchronous IO is evil

Are you doing IO? If so, who’s waiting for it to complete? IO looks easy on the surface, but if you intend to have a fully scalable application you’ll soon see that this is a delicate and tricky area. To illustrate, let’s quickly write a web server using low-level IO!

Old-school IO

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var tcpListener = new TcpListener(IPAddress.Any, 1337);
        tcpListener.Start();
        while (true)
        {
            var tcpClient = tcpListener.AcceptTcpClient();
            new Thread(() => HandleRequest(tcpClient)).Start();
        }
    }

    public static void HandleRequest(TcpClient client)
    {
        try
        {
            // Read the entire header chunk
            var stream = client.GetStream();
            var headers = new StringBuilder();
            while (!(headers.Length > 4 && headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
            {
                headers.Append(Encoding.ASCII.GetChars(new[] {(byte) stream.ReadByte()}));
            }

            // Find out what was requested in the first line
            // Assume GET xxxx HTTP/1.1         
            var path = new string(headers.ToString().Skip("GET ".Length).TakeWhile(c => !Char.IsWhiteSpace(c)).ToArray());

            // Read the file and serve it back with the minimal headers
            var file = new FileInfo(path);
            var fileStream = file.OpenRead();

            // Minimal headers
            var responseHeaders = Encoding.ASCII.GetBytes(
                string.Format("HTTP/1.1 200 OK\r\n" + "Content-Length: {0}\r\n" + "\r\n", file.Length));
            stream.Write(responseHeaders, 0, responseHeaders.Length);
            fileStream.CopyTo(stream);
            fileStream.Close();
        } 
        catch {} 
        finally
        {
            client.Close();
        }
    }
}

The code is easy to read and straightforward to follow. But this simplicity is deceptive. What will actually go on here is waiting, and locked threads. And locked threads are bad news. The read itself will not really run on your thread; instead .NET will be smart and use an IO completion port to get your data from the network if it isn’t immediately available. This means you’re wasting a thread. Threads are expensive resources: not only do they cause context switching, but each one also incurs its own stack space. This implementation of a web server will never scale, because of memory usage alone.

Waiting is evil

Every time you wait, you are locking up a resource. It’s an important point to make, since the simplicity of the synchronous functions is such a lure to the developer. So we need to use the asynchronous pattern, either through the Task Parallel Library or through the Begin/End function pairs. The trouble is that these also put a wait handle far too easily within reach. If you’re building an application that needs to scale and handle lots of IO, you can’t do any waiting.

In fact, the Task Parallel Library presents another very nasty gotcha. If you wrap code that waits inside tasks, you are starving the task thread pool by occupying its threads with waiting and preventing new tasks from starting. This leads to thread pool starvation and an unresponsive application. When you use the TPL for IO, you need to wrap each single IO operation in a task using Task.Factory.FromAsync, to make sure the background IO runs without consuming a thread for waiting.
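For instance, a read from a stream can be wrapped like this – a sketch against the standard Begin/End pair on Stream, using the FromAsync overload from the TPL:

```csharp
byte[] buffer = new byte[4096];

// Wraps the asynchronous read in a Task without blocking any thread;
// the IO completion port completes the task when data arrives.
Task<int> readTask = Task<int>.Factory.FromAsync(
    stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

// React in a continuation instead of calling readTask.Wait(),
// which would defeat the whole point.
readTask.ContinueWith(t =>
{
    int bytesRead = t.Result;
    // ... process buffer[0..bytesRead) ...
});
```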

Thinking asynchronously

The great thing about doing IO asynchronously is that if data is not available, you’re not the one running the function that fetches it. You’ll get your data back in a callback. This callback runs on the internal worker thread pool, and that pool is something you do NOT want to run any long-running operations on. It needs to stay available to other things.

This has a few other interesting consequences. Since you can’t wait for things, iterating becomes really awkward. Consider the loop that reads the stream byte by byte to collect the header block. You can’t write a plain loop anymore, since going back and iterating means the thread that controls the loop has to wait for the result of each operation. So iteration has to be accomplished through recursion.

Error handling also becomes difficult. If you throw an exception from inside an AsyncCallback, the application will often die straight away. There is no call stack for the exception to propagate back on, since the callback was initiated by the system when the async operation completed.

An asynchronous web server

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var tcpListener = new TcpListener(IPAddress.Any, 1337);
        tcpListener.Start();
        AcceptClient(tcpListener);

        // To avoid the program exiting
        Thread.Sleep(Timeout.Infinite);
    }

    private static void AcceptClient(TcpListener tcpListener)
    {
        tcpListener.BeginAcceptTcpClient(result =>
        {
            var client = tcpListener.EndAcceptTcpClient(result);
            var stream = client.GetStream();

            // Start next connection attempt
            AcceptClient(tcpListener);

            var buffer = new byte[1];
            var headers = new StringBuilder();

            Action readAction = null;
            readAction = () => stream.BeginRead(buffer, 0, 1, readResult =>
            {
                if (stream.EndRead(readResult) == 0)
                {
                    // Connection closed before the headers completed
                    stream.Close();
                    return;
                }
                headers.Append(Encoding.ASCII.GetString(buffer));
                if (!(headers.Length > 4 && headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
                {
                    readAction();   // Recurse to read one more byte
                }
                else
                {
                    // Assume GET xxxx HTTP/1.1         
                    var path = new string(headers.ToString().Skip("GET ".Length).TakeWhile(c => !Char.IsWhiteSpace(c)).ToArray());

                    // Read the file and serve it back with the minimal headers
                    if (!File.Exists(path))
                    {
                        stream.Close();
                        return;
                    }
                    var file = new FileInfo(path);
                    var fileStream = file.OpenRead();

                    // Minimal headers
                    var responseHeaders = Encoding.ASCII.GetBytes(
                        string.Format("HTTP/1.1 200 OK\r\n" + "Content-Length: {0}\r\n" + "\r\n", file.Length));
                    stream.BeginWrite(responseHeaders, 0, responseHeaders.Length, writeResult =>
                    {
                        stream.EndWrite(writeResult);
                        byte[] fileBuffer = new byte[file.Length];
                        fileStream.BeginRead(fileBuffer, 0, (int)file.Length, fileReadResult =>
                        {
                            fileStream.EndRead(fileReadResult);

                            stream.BeginWrite(fileBuffer, 0, fileBuffer.Length, contentWriteResult =>
                            {
                                stream.EndWrite(contentWriteResult);
                                fileStream.Close();
                                stream.Close();
                            }, stream);
                        }, fileStream);
                    }, stream);
                }
            }, stream);
            readAction();

        }, tcpListener);
    }
}

The code above is obviously for demonstration purposes. It’s generally not a good idea to read single bytes from streams, and in this case it’s an especially bad idea since the recursion generates an impressive call stack. But it shows the general idea of how to code an IO-bound application that will scale. There is no explicit threading. No thread is ever in a waiting state. The application becomes entirely reactive to IO instead of issuing reads and waiting for them to complete. Interesting things happen when you run this and break at a random point: the threads in the worker thread pool are normally all doing nothing, which is great, because it means they’re available to the system to quickly process IO callbacks. A request to this server will not spin up a thread. Memory usage is kept to an absolute minimum.

The situation improves somewhat with the new .NET 4.5, which has the async/await keywords built in. Improves in the sense that the syntax becomes nicer, but the lessons still hold true. If you’re waiting on anything, you’re killing your application with wait handles, and no amount of async keywords is going to rescue you. It’s a pity that so many examples of asynchronous operations in the documentation show off ways to use WaitOne, which pretty much defeats the purpose of being asynchronous in the first place.
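For comparison, here is a sketch of the byte-by-byte header read from the example above, rewritten with .NET 4.5 async/await. The compiler turns each await into a callback, so it reads like the synchronous loop but no thread ever blocks:

```csharp
static async Task<string> ReadHeadersAsync(NetworkStream stream)
{
    var buffer = new byte[1];
    var headers = new StringBuilder();

    // Same termination condition as before, but each await releases
    // the thread until the IO completion port delivers the next byte.
    while (!(headers.Length > 4 &&
             headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
    {
        if (await stream.ReadAsync(buffer, 0, 1) == 0)
            break;  // connection closed early
        headers.Append(Encoding.ASCII.GetString(buffer));
    }
    return headers.ToString();
}
```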
