
And now, for the thrilling conclusion! The thrilling conclusion to what, you might ask? Check back here for the first part in this saga.

I mentioned very briefly in the first part of this epic that we attempted to fix a strange threading bug with token generation via the ASP.NET Identity Framework by making sure that the Entity Framework DbContext was fully initialized (i.e. model created, connection established, etc) before it left the factory method. Initial tests were promising, but it turns out this did not fix the issue.

I mention this because I had absolutely no luck reproducing the connection leak when I was running locally (with or without a profiler attached). I could easily force timeouts when getting a connection from the pool (because it was too busy), but I couldn’t reproduce the apparent situation where there were connections established that could not be actively used.

When going through the combination of CloudWatch logs for RDS (to track connection usage) and our own ELK stack, I found a correlation between the errors that sometimes occurred when generating tokens and the increase in connection usage. This pattern was pretty consistent. Whenever there was a cluster of errors related to token generation, there was an increase in the total number of connections used by the service, which never went down again until the application pool was recycled (at its default interval of 29 hours after the last recycle).

Token Failure

We’ve been struggling with the root cause of the token generation failures for a while now. The most annoying part is that it doesn’t fail all the time. In fact, my initial load tests showed only around a 1% failure rate, which is pretty low in the scheme of things. The problem manifests as exceptions thrown when a part of the Identity Framework attempts to use the Entity Framework DbContext that it was given. It looks as though there is some sort of threading issue with Entity Framework, which makes sense conceptually: EF DbContext objects are not thread safe, so you should never attempt to use one from two different threads at the same time.

The errors were many and varied, but all consistently came from our implementation of the OAuthAuthorizationServerProvider. A few examples are below:

System.Data.Entity.Core.EntityCommandExecutionException: An error occurred while executing the command definition. See the inner exception for details. ---> System.InvalidOperationException: Operation is not valid due to the current state of the object.
   at Npgsql.NpgsqlConnector.StartUserAction(ConnectorState newState)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReaderInternal(CommandBehavior behavior)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext,TResult](TTarget target, Func`3 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed)
   at System.Data.Entity.Infrastructure.Interception.DbCommandDispatcher.Reader(DbCommand command, DbCommandInterceptionContext interceptionContext)
   at System.Data.Entity.Core.EntityClient.Internal.EntityCommandDefinition.ExecuteStoreCommands(EntityCommand entityCommand, CommandBehavior behavior)
   --- End of inner exception stack trace ---
   at System.Data.Entity.Core.EntityClient.Internal.EntityCommandDefinition.ExecuteStoreCommands(EntityCommand entityCommand, CommandBehavior behavior)
   at System.Data.Entity.Core.Objects.Internal.ObjectQueryExecutionPlan.Execute[TResultType](ObjectContext context, ObjectParameterCollection parameterValues)
   at System.Data.Entity.Core.Objects.ObjectContext.ExecuteInTransaction[T](Func`1 func, IDbExecutionStrategy executionStrategy, Boolean startLocalTransaction, Boolean releaseConnectionOnSuccess)
   at System.Data.Entity.Core.Objects.ObjectQuery`1.<>c__DisplayClass7.<GetResults>b__5()
   at System.Data.Entity.Core.Objects.ObjectQuery`1.GetResults(Nullable`1 forMergeOption)
   at System.Data.Entity.Core.Objects.ObjectQuery`1.<System.Collections.Generic.IEnumerable<T>.GetEnumerator>b__0()
   at System.Data.Entity.Internal.LazyEnumerator`1.MoveNext()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at System.Linq.Queryable.FirstOrDefault[TSource](IQueryable`1 source, Expression`1 predicate)
   at [OBFUSCATION!].Infrastructure.Repositories.AuthorizationServiceRepository.GetApplicationByKey(String appKey, String appSecret) in c:\[OBFUSCATION!]\Infrastructure\Repositories\AuthorizationServiceRepository.cs:line 412
   at [OBFUSCATION!].Infrastructure.Providers.AuthorizationServiceProvider.ValidateClientAuthentication(OAuthValidateClientAuthenticationContext context) in c:\[OBFUSCATION!]\Infrastructure\Providers\AuthorizationServiceProvider.cs:line 42
   
System.NullReferenceException: Object reference not set to an instance of an object.
   at Npgsql.NpgsqlConnector.StartUserAction(ConnectorState newState)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReaderInternal(CommandBehavior behavior)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext,TResult](TTarget target, Func`3 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed)
   at System.Data.Entity.Infrastructure.Interception.DbCommandDispatcher.Reader(DbCommand command, DbCommandInterceptionContext interceptionContext)
   at System.Data.Entity.Core.EntityClient.Internal.EntityCommandDefinition.ExecuteStoreCommands(EntityCommand entityCommand, CommandBehavior behavior)
   at System.Data.Entity.Core.Objects.Internal.ObjectQueryExecutionPlan.Execute[TResultType](ObjectContext context, ObjectParameterCollection parameterValues)
   at System.Data.Entity.Core.Objects.ObjectContext.ExecuteInTransaction[T](Func`1 func, IDbExecutionStrategy executionStrategy, Boolean startLocalTransaction, Boolean releaseConnectionOnSuccess)
   at System.Data.Entity.Core.Objects.ObjectQuery`1.<>c__DisplayClass7.<GetResults>b__5()
   at System.Data.Entity.Core.Objects.ObjectQuery`1.GetResults(Nullable`1 forMergeOption)
   at System.Data.Entity.Core.Objects.ObjectQuery`1.<System.Collections.Generic.IEnumerable<T>.GetEnumerator>b__0()
   at System.Data.Entity.Internal.LazyEnumerator`1.MoveNext()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at System.Linq.Queryable.FirstOrDefault[TSource](IQueryable`1 source, Expression`1 predicate)
   at [OBFUSCATION!].Infrastructure.Repositories.AuthorizationServiceRepository.GetApplicationByKey(String appKey, String appSecret) in c:\[OBFUSCATION!]\Infrastructure\Repositories\AuthorizationServiceRepository.cs:line 412
   at [OBFUSCATION!].Infrastructure.Providers.AuthorizationServiceProvider.ValidateClientAuthentication(OAuthValidateClientAuthenticationContext context) in c:\[OBFUSCATION!]\Infrastructure\Providers\AuthorizationServiceProvider.cs:line 42

In the service, this doesn’t make a huge amount of sense. There is one DbContext created per request (via Owin), and while the Owin middleware is asynchronous by nature (meaning that execution can jump between threads), there is no parallelism. The DbContext should never have been in use on multiple threads at once, but apparently it was.

It was either that, or something was going seriously wrong in the connection pooling code for Npgsql.

Scope Increase

As I didn’t quite understand how the dependency injection/object lifetime management worked via the OwinContext, I had my suspicions that something was going awry there. Either the DbContext was not in fact generated once per request, or there was some strange race condition that allowed a DbContext to be reused on more than one thread.

So I decided to rewrite the way in which dependencies are obtained in the service. Instead of generating a DbContext per request, I would supply a DbContextFactory to everything, and let each consumer generate its own short-lived DbContext that it is responsible for disposing.
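The shape of that change, as a minimal sketch (the entity, context and repository here are hypothetical stand-ins for the service’s real classes; EF6’s own IDbContextFactory&lt;TContext&gt; interface fits the bill nicely):

using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Linq;

// Hypothetical entity and context, standing in for the real ones.
public class DbApplication
{
    public int Id { get; set; }
    public string Key { get; set; }
}

public class AuthDbContext : DbContext
{
    public IDbSet<DbApplication> Applications { get; set; }
}

// EF6 already ships IDbContextFactory<TContext>; consumers depend on the
// factory rather than on a shared DbContext instance.
public class AuthDbContextFactory : IDbContextFactory<AuthDbContext>
{
    public AuthDbContext Create()
    {
        return new AuthDbContext();
    }
}

public class ApplicationRepository
{
    private readonly IDbContextFactory<AuthDbContext> _factory;

    public ApplicationRepository(IDbContextFactory<AuthDbContext> factory)
    {
        _factory = factory;
    }

    public DbApplication GetApplicationByKey(string appKey)
    {
        // A fresh, privately owned context per operation, disposed on exit.
        using (var db = _factory.Create())
        {
            return db.Applications.FirstOrDefault(a => a.Key == appKey);
        }
    }
}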

In order to accomplish this I switched to an IoC container that I was more familiar with, Ninject. Not a small amount of work, and not without added complexity, but I felt that it made the code more consistent with the rest of our code bases and generally better.

In retrospect, I should have verified that I could reproduce the token generation errors at will first, but I didn’t. I wrote the test after I’d spent the better part of a day switching out the dependency injection mechanisms. This was a mistake.

Since the errors always occurred during the execution of a single endpoint, I wrote a test that uses 10 tasks to spam that particular endpoint. If none of the tasks fault within a time limit (i.e. no exceptions are thrown), then the test is considered a success. Basically a very small, focused stress test to be run automatically as part of our functional test suite.

[Test]
[Category("functional")]
public void WhenAttemptingToGenerateMultipleTokensAtTheSameTime_NoRequestsFail()
{
    var authClientFactory = _resolver.Get<IAuthClientFactory>();
    var app = new ApplicationBuilder(authClientFactory.CreateSeeder())
        .WithRole("auth_token_generate")
        .WithRole("auth_customer_register")
        .WithRole("auth_database_register")
        .WithRole("auth_user_register")
        .WithRole("auth_query")
        .Build();

    var userBuilder = new UserBuilder(authClientFactory.CreateFromApplication(app.ApplicationDetails.Key, app.ApplicationSecret));
    userBuilder.Build();

    List<Task> tokenGenerationTasks = new List<Task>();
    var cancellation = new CancellationTokenSource();
    for (int i = 0; i < 10; i++)
    {
        var task = Task.Factory.StartNew
        (
            () =>
            {
                var client = authClientFactory.CreateFromApplication(app.ApplicationDetails.Key, app.ApplicationSecret);
                while (true)
                {
                    if (cancellation.Token.IsCancellationRequested) break;
                    var token = client.TokenGenerate(userBuilder.CustomerId + "/" + userBuilder.DatabaseId + "/" + userBuilder.UserCode, userBuilder.Password);
                }
            },
            cancellation.Token,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default
        );

        tokenGenerationTasks.Add(task);
    }

    // The idea here is that if any of the parallel token generation tasks throw an exception, it will come out here
    // during the wait.
    Task.WaitAny(tokenGenerationTasks.ToArray(), TimeSpan.FromSeconds(15));
    cancellation.Cancel();

    var firstFaulted = tokenGenerationTasks.FirstOrDefault(a => a.IsFaulted);
    if (firstFaulted != null) throw firstFaulted.Exception;
}

The first time I ran the test against a local service it passed successfully…

Now, I don’t know about anyone else, but when a test works the first time I am immediately suspicious.

I rolled my changes back and ran the test again, and it failed.

So my restructuring successfully fixed the issue, but why?

The Root Of The Problem

I hadn’t actually understood the issue; all I did was make enough changes that it seemed to no longer occur. Without that understanding, if it recurred, I would have to start all over again, possibly misdirecting myself with the changes I had made last time.

Using the test that guaranteed a reproduction, I investigated in more depth. Keeping all of my changes reverted, I was still getting a weird sampling of lots of different errors, but they were all consistently coming from one of our repositories (classes which wrap a DbContext and add extra functionality) whenever it was used within our OAuthAuthorizationServerProvider implementation.

Staring at the code for a while, the obviousness of the issue hit me.

At startup, a single OAuthAuthorizationServerProvider implementation is created and assigned to generate tokens for requests to the /auth/token endpoint.

This of course meant that all of the functions in that provider needed to be thread safe.

They were not.

Both of the functions in the class set and then used a class-level variable, which in turn had a dependency on a DbContext.

This was the smoking gun. If two requests came in quickly enough, one would set the variable (using the DbContext for its request), the other would do the same (using a different DbContext), and then the first would attempt to use another thread’s DbContext (indirectly, through the variable). This would rightly cause an error (as multiple threads tried to use the same DbContext) and throw an exception, failing the token generation.
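Reconstructed in miniature, the buggy shape looked something like this (the names and the way the repository is retrieved are hypothetical, not the service’s real code):

using System.Threading.Tasks;
using Microsoft.Owin.Security.OAuth;

// Hypothetical repository interface, standing in for the real one.
public interface IAuthorizationServiceRepository
{
    object GetApplicationByKey(string appKey, string appSecret);
}

public class AuthorizationServiceProvider : OAuthAuthorizationServerProvider
{
    // A single provider instance serves every request to the token endpoint,
    // so this field is state shared across all request threads.
    private IAuthorizationServiceRepository _repository;

    public override Task ValidateClientAuthentication(OAuthValidateClientAuthenticationContext context)
    {
        // Request A stores a repository wrapping A's request-scoped DbContext...
        _repository = context.OwinContext.Get<IAuthorizationServiceRepository>("repository");

        // ...but if request B reassigns the field in the meantime, this line
        // runs A's query through B's DbContext, from A's thread.
        var application = _repository.GetApplicationByKey(context.ClientId, "secret");
        if (application != null)
        {
            context.Validated(context.ClientId);
        }
        return Task.FromResult(0);
    }
}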

I abandoned my changes (though I will probably make them again over time), removed the class variable and re-ran the test.
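In terms of the sketch above, the fix is simply to keep the repository in a local variable:

public override Task ValidateClientAuthentication(OAuthValidateClientAuthenticationContext context)
{
    // A local variable instead of a class-level field: nothing is shared,
    // so no request can ever observe another request's DbContext.
    var repository = context.OwinContext.Get<IAuthorizationServiceRepository>("repository");

    var application = repository.GetApplicationByKey(context.ClientId, "secret");
    if (application != null)
    {
        context.Validated(context.ClientId);
    }
    return Task.FromResult(0);
}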

It was all good. No errors at all, even after running for a few hours.

But why did the error cause a resource leak at the database connection level?

Leaky Logic

In the end I didn’t find out exactly why threading errors with Entity Framework (using Npgsql) were causing connection leaks. I plan to investigate in more depth in the future, and I’ll probably blog about my findings, but for now I was just happy to have the problem solved.

With the bug fixed, profiling over a period of at least 24 hours showed no obvious connection leaks as a result of normal traffic. Previously this would have guaranteed at least 10 leaked connections, possibly more. So for now the problem is solved, and I need to move on.

Summary

Chasing down resource leaks can be a real pain, especially when you don’t have a reliable reproduction.

If I had realised earlier that the token generation failures and connection leaks were related, I would have put more effort into reproducing the first in order to reproduce the second. It wasn’t immediately obvious that they were linked though, so I spent a lot of time analysing the code trying to figure out what could possibly be leaking valuable resources. This was a time consuming and frustrating process, ultimately leading nowhere.

Once I finally connected the dots between the token failures and the connection leak, everything came together, even if I didn’t completely understand why the connections were leaking in error situations.

Ah well, can’t win em all.


It’s time to step away from web services, log aggregation and AWS for a little while.

It’s time to do some UI work! Not HTML though, unfortunately; Windows desktop software.

The flagship application that my team maintains is an old (15+ years) VB6 application. It brings in a LOT of money, but to be honest, it’s not the best maintained piece of software I’ve ever seen. It’s not the worst either, which is kind of sad. Like most things it falls somewhere in the middle. More on the side of bad than good though, for sure.

In an attempt to keep the application somewhat current, it has a .NET component to it. I’m still not entirely sure how it works, but the VB6 code uses COM to call into some .NET functionality, primarily through a message bus (and some strongly typed JSON messages). It’s pretty clever actually, even if it does lead to some really strange threading issues from time to time.

Over the years, good sized chunks of new functionality have been developed in .NET, with the UI in Windows Forms.

Windows Forms is a perfectly fine UI framework, and don’t let anyone tell you different. Sure it has its flaws and trouble spots, but on the whole, it works mostly as you would expect it to, and it gets the job done. The downside is that most Windows Forms UIs look the same, and you often end up with business logic tightly coupled to the UI. When your choice is Windows Forms or VB6 though, well, it’s not really a choice.

For our most recent project, we wanted to try something new. Well, new for the software component anyway, certainly not new to me, some of the other members of the team, or reality in general.

WPF.

Present that Framework

I’m not going to go into too much detail about WPF, but it’s the replacement for Windows Forms. It focuses on separating the actual presentation of the UI from the logic that drives it, and is extremely extensible. It’s definitely not perfect, and the first couple of versions of it were a bit rough (Avalon!), but it’s in a good spot now.

Personally, I particularly like how it makes using the MVVM (Model-View-ViewModel) pattern easy, with its support for bindings and commands (among other things). MVVM is a good pattern to strive for when it comes to developing applications with complicated UIs, because you can test your logic independently of the presentation. You do still need to test it all together obviously, because bindings are code, and you will make mistakes.

I was extremely surprised when we tried to incorporate a WPF form into the VB6 application via the .NET channels it already had available.

Mostly because it worked.

It worked without some insane workaround or other crazy thing. It just worked.

Even showing a dialog worked, which surprised me, because I remember having a huge amount of trouble with that when I was writing a mixed Windows Form/WPF application.

I quickly put together a fairly primitive framework to support MVVM (no need to invest in one of the big frameworks just yet, we’re only just starting out) and we built the new form up to support the feature we needed to expose.

That was a particularly long introduction, but what I wanted to talk about in this post is using design-time properties in WPF to make the designer actually useful.

Intelligent Design

The WPF designer is…okay.

Actually, it’s pretty good, if a little bit flaky from time to time.

The problem I’ve encountered every time I’ve used MVVM with WPF is that a lot of the behaviour of my UI component is dependent on its current data context. Sure, it has default behaviour when it’s not bound to anything, but if you have error overlays, or a message channel used to communicate with the user, or expandable bits within an ItemsControl whose items have their own data templates, it can be difficult to visualise how everything looks just from the XAML.

It takes time to compile and run the application as well (to see it running in reality), even if you have a test harness that opens the component you are working on directly.

Tight feedback loops are easily one of the most important parts of developing software quickly, and the designer is definitely the quickest feedback loop for WPF by far.

Luckily, there are a number of properties that you can set on your UI component which will only apply at design time, and the most useful one is the design time DataContext.

This property, when applied, will set the DataContext of your component when it is displayed in the designer, and assuming you can write appropriate classes to interface with it, gives you a lot of power when it comes to viewing your component in its various states.

Contextual Data

I tend towards a common pattern when I create view models for the designer. I will create an interface for the view model (based on INotifyPropertyChanged), a default implementation (the real view model) and a view model creator or factory, specifically for the purposes of the designer. They tend to look like this (I’ve included the view model interface for completeness):

namespace Solavirum.Unspecified.ViewModel
{
    public interface IFooViewModel : INotifyPropertyChanged
    {
        string Message { get; }
    }
}

namespace Solavirum.Unspecified.View.Designer
{
    public class FooViewModelCreator
    {
        private static readonly FooViewModelCreator _Instance = new FooViewModelCreator();
        
        public static IFooViewModel ViewModel
        {
            get
            {
                return _Instance.Create();
            }
        }
        
        public IFooViewModel Create()
        {
            return new DummyFooViewModel
            {
                Message = "This is a message to show in the designer."
            };
        }
        
        private class DummyFooViewModel : BaseViewModel, IFooViewModel
        {
            public string Message { get; set; }
        }
    }
}

As you can see, it makes use of a private static instance of the creator, but it doesn’t have to (it’s just for caching purposes, so the view model doesn’t have to be recreated all the time; it’s probably not necessary). It exposes a read-only view model property which just executes the Create function, and can be bound to in XAML like so:

<UserControl 
    x:Class="Solavirum.Unspecified.View.FooView"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
    xmlns:Designer="clr-namespace:Solavirum.Unspecified.View.Designer"
    mc:Ignorable="d"
    d:DataContext="{x:Static Designer:FooViewModelCreator.ViewModel}">
    <Grid>
        <TextBlock Text="{Binding Message}" />
    </Grid>
</UserControl>

With this hook into the designer, you can do all sorts of crazy things. Take this creator for example:

namespace Solavirum.Unspecified.View.Designer
{
    public class FooViewModelCreator
    {
        private static readonly FooViewModelCreator _Instance = new FooViewModelCreator();

        public static IFooViewModel ViewModel
        {
            get { return _Instance.Create(); }
        }

        public IFooViewModel Create()
        {
            // FromCurrentSynchronizationContext captures the designer's UI
            // thread, so property changes can be marshalled back onto it
            // (TaskScheduler.Current here would just be the default pool scheduler).
            var foreground = TaskScheduler.FromCurrentSynchronizationContext();
            var messages = new List<string>()
            {
                "This is the first message. Its short.",
                "This is the second message. Its quite a bit longer than the first message, and is useful for determining whether or not wrapping is working correctly."
            };

            var vm = new DummyFooViewModel();

            var messageIndex = 0;
            Task.Factory.StartNew
                (
                    () =>
                    {
                        while (true)
                        {
                            Thread.Sleep(TimeSpan.FromSeconds(5));
                            var newMessage = messages[messageIndex];
                            messageIndex = (messageIndex + 1)%messages.Count;
                            Task.Factory.StartNew
                                (
                                    () => vm.Message = newMessage,
                                    CancellationToken.None,
                                    TaskCreationOptions.None,
                                    foreground
                                );

                        }
                    },
                    CancellationToken.None,
                    TaskCreationOptions.LongRunning,
                    TaskScheduler.Default
                );

            return vm;
        }

        private class DummyFooViewModel : IFooViewModel
        {
            private string _Message;

            public string Message
            {
                get { return _Message; }
                set
                {
                    _Message = value;
                    RaisePropertyChanged("Message");
                }
            }

            public event PropertyChangedEventHandler PropertyChanged;

            private void RaisePropertyChanged(string propertyName)
            {
                var handlers = PropertyChanged;
                if (handlers != null)
                {
                    handlers(this, new PropertyChangedEventArgs(propertyName));
                }
            }
        }
    }
}

It uses a long running task to change the state of the view model over time, so you can see it in its various states in the designer. This is handy if your view model exposes an error property (i.e. if it does a refresh of some sort, and you want to notify the user in an unobtrusive way when something bad happens) or if you just want to see what it looks like with varying amounts of text or something similar. Notice that it has to marshal the change onto the foreground thread (which in the designer is the UI thread), or the property changed event won’t be picked up by the binding engine.

Once you’ve set up the creator and bound it appropriately, you can do almost all of your UI work in the designer, which saves a hell of a lot of time. Of course you can’t verify button actions, tooltips or anything else that requires interaction (at least to my knowledge), but it’s still a hell of a lot better than starting the application every time you want to see if your screen looks okay.

Getting Inside

Now, because you are running code, and you wrote that code, it will have bugs. There’s no shame in that, all code has bugs.

It’s hard to justify writing tests for designer-specific support code (even though I have done it), because they don’t necessarily add a lot of value, and they definitely increase the time required to make a change.

Instead I mostly just focus on debugging when the designer view models aren’t working the way that I think they should.

In order to debug, you will need to open a second instance of Visual Studio, with the same project, and attach it to the XAML designer process (XDesProc). Now, since you have two instances of Visual Studio, you might also have two instances of the XAML designer process, but it’s not hard to figure out the right one (trial and error!). Once you’ve attached to the process, you can put breakpoints in your designer-specific code and figure out where it’s going wrong.

I’ll mention it again below, but the app domain for the designer is a little bit weird, so sometimes it might not work at all (no symbols, breakpoints not being hit, etc). Honestly, I’m not entirely sure why, but a restart of both instances of Visual Studio, combined with a fresh re-compilation usually fixes that.

Gotchas

There are a few gotchas with the whole designer thing I’ve outlined above that are worth mentioning.

The first is that if you do not version your DLL appropriately (within Visual Studio, not just within your build server), you will run into issues where old versions of your view model are bound into the designer. This is especially annoying when you have bugs, as it will quite happily continue to use the old, broken version. I think the designer only reloads your libraries when it detects a version change, but I can’t back that up with proof.

The solution is to make sure that your version changes every time you compile, which honestly, you should be doing anyway. I’ve had success with just using the reliable 0.0.* version attribute in the assembly when the project is compiled as debug (using an #if DEBUG). You just have to make sure that whatever approach you use for versioning in your build server doesn’t clash with that.
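In AssemblyInfo.cs, that looks something like this (the wildcard makes the compiler generate new build and revision numbers on each compile, so the designer sees a version change):

using System.Reflection;

#if DEBUG
// Debug builds get a fresh version on every compile, forcing the designer
// to reload the library instead of using a stale copy.
[assembly: AssemblyVersion("0.0.*")]
#else
// Release builds are versioned by the build server instead.
[assembly: AssemblyVersion("1.0.0.0")]
#endif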

The second gotcha is that the app domain for the designer is a bit…weird. For example, Ninject won’t automatically load its extension modules in the designer; you have to load them manually. For Ninject specifically, this is a fairly straightforward process (just create a DesignerKernel, sketched below), but there are other issues as well.
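A DesignerKernel can be as simple as a StandardKernel that turns off extension scanning and registers its modules explicitly. A sketch, with hypothetical module names:

using Ninject;
using Ninject.Modules;

public class DesignerKernel : StandardKernel
{
    public DesignerKernel()
        : base(
            // The designer's app domain won't scan for extensions, so don't try.
            new NinjectSettings { LoadExtensions = false },
            // Load every module the view models need, explicitly.
            new FooViewModule(),
            new LoggingModule())
    {
    }
}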

Sometimes the designer just won’t run the code you want it to. Typically this happens after you’ve been working on it for a while, constantly making new builds of the view model creator. The only solution I’ve found is to restart Visual Studio. I’m using Visual Studio 2013 Update 5, so it might be fixed/better in 2015, but I don’t know. It’s not a deal breaker anyway; basically just be on the lookout for failures that look like they are definitely not the fault of your code, and restart Visual Studio before you start pulling your hair out.

Conclusion

I highly recommend going to the extra effort of creating view models that can be bound in your component at design time. It’s a great help when you’re building the component, but it also helps you to validate (manually) whether or not your component acts as you would expect when the view model is in various states.

It can be a little bit difficult to maintain if your code is changing rapidly (breaking view models apart can have knock-on effects on the creator for example, increasing the amount of work required in order to accomplish a change), but the increase in development speed for UI components (which are notoriously fiddly anyway) is well worth it.

It’s also really nice to see realistic looking data in your designer. It makes the component feel more substantial, like it actually might accomplish something, instead of being an empty shell that only fills out when you run the full application.


Update: I wrote the code for the seeder below outside of a development environment. It doesn’t work. I’ll try to revisit this post at a later date if I get a chance to implement a seeder class, but I’m currently using the Seed extension methods to great effect. My apologies to anyone who finds this post and is surprised when it doesn’t work.

Entity Framework and ORMs in general have come a long way. Entity Framework in particular is pretty amazing now, compared to where it was 5 years ago. It was around then that my team made the decision to use NHibernate as our ORM, instead of EF. EF has obviously matured a lot since then, and seems to be the default choice now when working in the .NET world.

I’ve made a couple of posts on this blog involving Entity Framework and some of my adventures with it, one on creating test databases leveraging a scratch MSSQL instance in AWS and another on using a different EF provider to allow for in-memory databases.

One of the great things about working with ORMs, is that your persisted data is just objects, which means you have far more control over it than you ever did before. No need to use SQL statements (or similar) to build up some test data, just create some objects, insert them and off you go.

This post is going to talk about a mechanism for creating those objects, specifically about the concept of seeding.

Also, obviously, all of the sub-titles will be puns based on seeds and seeding.

Just Tossing My Seed Around

Most of the time I see seeding functions built into the DbContext class. They are typically executed whenever the context is created, making sure that certain data is available.

To me this is a violation of the Single Responsibility Principle, because now you have a class that is responsible for both managing data access and for putting some subset of the data there in the first place. While this alone is definitely a good reason to have a dedicated seeder class, there are others as well:

  • If you have a hardcoded seed method inside your DbContext, it’s much harder to customise it to seed different data based on your current needs.
  • Commonly, seed implementations inside the DbContext are wasteful, always trying to seed the database whenever you create a new DbContext. I’m in favour of using a DbContextFactory and creating a DbContext per operation (or at least per request), which can make the time spent dealing with seeding significant. The sketch below shows the shape of the anti-pattern.
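The anti-pattern looks something like this (a sketch, using a hypothetical FooContext and the DbFoo entity that appears later in this post):

using System.Data.Entity;
using System.Linq;

public class FooContext : DbContext
{
    public IDbSet<DbFoo> DbFoos { get; set; }

    public FooContext()
    {
        // Seeding wired into construction: this runs every single time a
        // context is created, whether the caller wanted it or not.
        EnsureSeedData();
    }

    private void EnsureSeedData()
    {
        if (!DbFoos.Any())
        {
            DbFoos.Add(new DbFoo { Do = "bananas", Re = "purple" });
            SaveChanges();
        }
    }
}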

I find that the best way to think about seeding is to use the specification pattern (or at least the concept). You want to be able to create an object that describes how you want your data to look (or what capabilities you want your data to have), and then execute it. Let the object sort out the seeding as its dedicated function.

This works fairly well. You define a Seeder or DataSpecification class, and expose appropriate properties and methods on it to describe the data (how many DbFoo entries I want, how many DbBar entries, what they look like, etc). You implement a method that takes a DbContext of the appropriate type, and in that method you use the information supplied to create and save the appropriate entities.

If you follow this approach, you find that your Seeder can become very complicated very quickly, especially because its entire purpose is to be highly configurable. It’s also responsible for knowing how to construct many different varieties of objects, which is another violation of SRP.

I find SRP to be a pretty good guideline for handling class complexity. If you think about the responsibilities that your class has, and it has more than a few, then those responsibilities either need to be very tightly coupled, such that you couldn’t reasonably pull them apart, or you should really consider having more than one class. The downside of SRP is that you tend to have quite a lot of small classes, which is another form of complexity. The upside is that you have a lot of small, composable, modular classes, which are extremely useful once you get over that initial complexity bump of having many, many classes.

Ready For Round Two

I didn’t like that my Seeder class had detailed knowledge about how to construct the various entities available from the DbContext. Plus it was huge and hard to understand at a glance.

The next step was to split the logic for how to create an entity into classes dedicated to that. I tend to use the naming convention of XBuilder for this purpose, and they all look very similar:

using System;

public interface IEntityBuilder<TEntity>
{
    TEntity Build();
}

public class DbFoo
{
    public int Id { get; set; }
    public string Do { get; set; }
    public string Re { get; set; }
}

public class DbFooBuilder : IEntityBuilder<DbFoo>
{
    private string _Do = "bananas";
    private string _Re = "purple";

    public DbFooBuilder WithDo(string v)
    {
        _Do = v;
        return this;
    }

    public DbFoo Build()
    {
        return new DbFoo()
        {
            Do = _Do,
            Re = _Re
        };
    }
}

As you can see, the builder features a somewhat fluent syntax (the WithX methods), allowing you to chain calls to customise the constructed entity, but has sane defaults for all of the various properties that matter.

The Faker.Net package is handy here, for generating company names, streets, etc. You can also simply generate random strings for whatever properties require it, but it’s generally much better to generate real looking data than completely nonsensical data.
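For instance, a builder for a hypothetical DbCompany entity might default its properties like this (the exact Faker.Net method names vary between versions of the package, so treat those calls as an assumption):

public class DbCompany
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Street { get; set; }
}

public class DbCompanyBuilder : IEntityBuilder<DbCompany>
{
    // Faker.Net produces plausible-looking defaults instead of random noise.
    private string _Name = Faker.Company.Name();
    private string _Street = Faker.Address.StreetAddress();

    public DbCompanyBuilder WithName(string v)
    {
        _Name = v;
        return this;
    }

    public DbCompany Build()
    {
        return new DbCompany
        {
            Name = _Name,
            Street = _Street
        };
    }
}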

With the addition of dedicated builders for entities, the Seeder looks a lot better, being mostly dedicated to the concept of “how many” of the various entities. It could be improved though, because it’s difficult to use the Seeder to specify a subset of entities that meet certain criteria (like generate 10 DbFoos with their Do property set to “bananas”, and 200 where it’s set to “apples”).

We can fix that by providing some additional methods on the Seeder that allow you to customise the builders being used, instead of just letting the Seeder create X number of them to fulfil its “number of entities” requirement.

public class Seeder
{
    private readonly List<IEntityBuilder<DbFoo>> FooBuilders = new List<IEntityBuilder<DbFoo>>();

    public Seeder WithDbFoos<TEntityBuilder>(IEnumerable<TEntityBuilder> builders)
        where TEntityBuilder : IEntityBuilder<DbFoo>
    {
        FooBuilders.AddRange(builders.Cast<IEntityBuilder<DbFoo>>());

        return this;
    }

    public Seeder WithDbFoos<TEntityBuilder>(int number, Func<TEntityBuilder, TEntityBuilder> customise = null)
        where TEntityBuilder : IEntityBuilder<DbFoo>, new()
    {
        // No customisation supplied means the builders are used with their defaults.
        if (customise == null)
        {
            customise = b => b;
        }

        var builders = Enumerable.Range(0, number).Select(a => customise(new TEntityBuilder()));
        return WithDbFoos(builders);
    }

    public void Seed(DbContext db)
    {
        foreach (var builder in FooBuilders)
        {
            db.Set<DbFoo>().Add(builder.Build());
        }
        db.SaveChanges();
    }
}

Much better and extremely flexible.
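Usage then reads almost like a specification of the data you want (FooContext being the hypothetical context from earlier):

var seeder = new Seeder()
    .WithDbFoos<DbFooBuilder>(10, b => b.WithDo("bananas"))
    .WithDbFoos<DbFooBuilder>(200, b => b.WithDo("apples"));

// db is whatever DbContext you are seeding into.
using (var db = new FooContext())
{
    seeder.Seed(db);
}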

Bad Seed

I actually didn’t quite implement the Seeder as specified above, though I think it’s definitely a better model, and I will be implementing it in the near future.

Instead I implemented a series of builders for each of the entities I was interested in (just like above), and then wrote a generic Seed extension method for IDbSet:

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

namespace Solavirum.Database.EF
{
    public interface IEntityBuilder<TEntity>
    {
        TEntity Build();
    }

    public static class SeedExtensions
    {
        private static Random _random = new Random();

        public static void Seed<TEntity, TBuilder>(this IDbSet<TEntity> set, Func<TBuilder, TBuilder> modifications = null, int number = 10)
            where TEntity : class
            where TBuilder : IEntityBuilder<TEntity>, new()
        {
            // No modifications supplied; use the builder's defaults as-is.
            if (modifications == null)
            {
                modifications = b => b;
            }

            for (int i = 0; i < number; i++)
            {
                var builder = new TBuilder();
                builder = modifications(builder);
                set.Add(builder.Build());
            }
        }

        public static T Random<T>(this IEnumerable<T> enumerable)
        {
            int index = _random.Next(0, enumerable.Count());
            return enumerable.ElementAt(index);
        }
    }
}

This is nice from a usability point of view, because I can seed any entity that has an appropriate builder, just by using the one method. The Random<T> method exists so I can get a random element out of a DbSet for linking purposes, if I need to (it was used in a method that I removed, dealing specifically with an entity with links to other entities).
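For illustration, seeding with the extension method looks something like this (again using the hypothetical FooContext from earlier):

using (var db = new FooContext())
{
    // Ten DbFoos with the builder's default values...
    db.DbFoos.Seed<DbFoo, DbFooBuilder>();

    // ...and two hundred more with a specific Do value.
    db.DbFoos.Seed<DbFoo, DbFooBuilder>(b => b.WithDo("apples"), 200);

    db.SaveChanges();
}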

What I don’t like about it:

  • It’s difficult to supply dependencies to the Seed method (unless you expose them in the method signature itself) because it’s inside a static class. This means supplying a logger of some description is hard.
  • The builders have to have parameterless constructors, again because it’s hard to supply dependencies. This isn’t so bad, because the builders are meant to be simple and easy to use, with sane default values.
  • Builders with dependencies on other entities (like a hypothetical DbFooBar class that has references to both a DbFoo and a DbBar) have to have their own Seed method in order to use entities that exist in the current DbContext. This isn’t a deal breaker, but it does complicate things.

I think a well constructed Seeder class better encapsulates the concept, even though it’s nice to be able to just hit up a Seed method right off the IDbSet and have it all just work.

Conclusion

Being able to easily create data that meets certain criteria is an amazing tool when it comes to development and testing. Doing it in a provider agnostic way is even better, because you can push data into an in-memory database, an SQL database or a Postgres database, using the same code. In my opinion, the ability to sub out providers is one of the best parts of using an ORM (that and not having to deal with query/data manipulation languages directly).

Nothing that I’ve written above is particularly ground breaking, but it’s still very useful, and I highly recommend following the general strategy when working with a persistence layer via Entity Framework.

I hope you enjoyed all the terrible seed puns.

I regret nothing.


Another short one this time, as I still haven’t been up to anything particularly interesting recently. This should change in the coming weeks as we start work on a brand new service, which will be nice. We learnt a lot from the first one, but we still have some unanswered questions like database schema versioning and deploying/automating functional tests as part of a continuous integration environment, which should make for some good topics.

Anyway, this week’s post is a simple one about a component I ran across the other day while refactoring an application.

It’s called Fluent Command Line Parser, and it fills one of those niche roles that you come across all the time but never really spend any time thinking about: supplying command line arguments to an application.

Yes Commander?

Most of the time when you see an application that takes input from the command line, you’ll probably see assumptions about the order of elements supplied (like args[0] is the database, args[1] is the username, etc). Typically there is little, if any, validation around the arguments, and you would be lucky if there was help or any other documentation to assist the people who want to use the application from the command line. There are many reasons for this, but the most common is usually effort. It takes effort to put that sort of thing in, and unless the command line tool is going to be publicly consumed, that effort probably isn’t necessary.

Fluent Command Line Parser (to be referred to as FCLP from here on) makes this sort of thing much, much easier, reducing the effort to the level of “why not?”.

Essentially you configure a parser object to recognise short and long tokens to indicate a particular command line argument, and then use the presence of those tokens to fill properties in a strongly typed object. The parser can be configured to provide descriptions of the tokens, whether or not they are required, callbacks to be run when the token is encountered and a few other things that I haven’t really looked into yet.

FCLP also offers the ability to set a help callback, executed whenever /? or /help is present, which is nice. The help callback automatically ties in with the definitions and descriptions for the tokens that have been configured, so you don’t need to document anything twice.
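Setting that up is only a couple of lines (this is from memory of the FCLP API, so treat it as a sketch):

// Wire up /? and /help; FCLP generates the usage text from the same
// Setup definitions shown in the example below.
parser.SetupHelp("?", "help")
      .Callback(text => Console.WriteLine(text));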

Enough rambling; let’s see a full example:

public static void Main(string[] args)
{
    var parser = new FluentCommandLineParser<Arguments>();
    parser
        .Setup(a => a.Address)
        .As('a', "address")
        .WithDescription("The URL to connect to the service.");
        .Required();

    parser
        .Setup(a => a.Username)
        .As('u', "username")
        .WithDescription("The username to use for authentication.");
    
    parser
        .Setup(a => a.Password)
        .As('p', "password")
        .WithDescription("The password to use for authentication.");

    var result = parser.Parse(args);

    if (!result.HasErrors)
    {
        // parser.Object holds the populated Arguments instance.
        var parsedArgs = parser.Object;

        // Use parsedArgs in some way.
    }
}

private class Arguments
{
    public string Address { get; set; }
    public string Username { get; set; }
    public string Password { get; set; }
}

As you can see, it’s pretty clear what’s going on here. It’s simple to add new arguments as well, which is nice.

The Commander is Down!

It’s not all puppies and roses though. The FCLP library is very simple, and it’s missing the ability to do at least one important thing that I needed, which was a shame.

It only supports simple/primitive types (like strings, integers, booleans, enums and collections of those primitives) for the properties being set in your parsed arguments object. This means that if you want to supply a Uri on the command line, you have to add an extra step, or an extra property on your parsed arguments class that exposes the string value passed in as a Uri. It’s not a total deal breaker, it’s just unfortunate, because it seems like something you should be able to do with a callback or something similar, but the context (the object containing the parsed arguments) is just not available at that point in the workflow.
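The workaround looks something like this (AddressUri being a hypothetical convenience property):

private class Arguments
{
    // FCLP binds the raw string from the command line...
    public string Address { get; set; }

    // ...and this property converts it on demand.
    public Uri AddressUri
    {
        get { return new Uri(Address); }
    }
}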

I might take a stab at adding that ability to the library (it’s open source) if I get a chance. I feel like you could probably accomplish it by adding another method to override the default parsing of the object from the entered string (supplying a Func&lt;string, T&gt; to transform the string with custom logic).

Conclusion

If you’re looking for a nice simple library to make your command line parsing code a bit clearer, this is a good one to grab. It really does help to improve the quality of a typically messy area of an application, which makes it a lot easier for someone coming in later to understand. Just add a reference to the FluentCommandLineParser package via NuGet (you are using NuGet, aren’t you? No reason not to) and off you go.

Remember, every little bit like this helps, because we don’t write code for ourselves. We write it for the people who come after us. Even if that person is you 6 months from now, when you’ve forgotten everything.