Tuesday, 6 September 2016

Legacy database testing with PeanutButter

Preamble

Recently, I've been asked about a strategy for testing code which works against an existing database. It's not the first time I've been asked about this -- and it probably won't be the last. I should have listened to Scott Hanselman's advice long ago and blogged it.

Better late than never, I suppose. Let's get to it!

The problem

You have some code which hits an MSSQL database to do something. It could be hitting a stored procedure for a report. It could be inserting or updating rows. Whatever the use-case, you'd like to have that code under test, if not just for your own sanity, then because you're about to extend that code and are just plain scared that you're about to break something which already exists. This is a valid reason -- and it drove me to the strategy I'll outline below.

You may also simply wish to write your code test-first, but have this great existing legacy mass which you have to work with (and around) and you're just struggling to get the first test out.

Please note: this strategy outlines more integration-style testing than true unit testing. However, I'd rather have an integration test than no test any day. This kind of testing also leads to tests which take a few seconds to run (instead of the preferred milliseconds) -- but I'd rather have slow tests than no tests.

Note that I'm tackling MSSQL because:
  1. It's a common production database
  2. If you were dealing with simpler databases like SQLite or SQLCE, you may already have a strategy to deal with this (though PB can still make it easier, so read on)
  3. I haven't found (yet) a nice way to do temporary in-process MySQL or PostgreSQL. You could use this strategy with Firebird since server and embedded can even use the same file (but not concurrently, of course) -- though currently PeanutButter.TempDb has no baked-in Firebird provider. I guess I should fix that!

So, let's really get to it!

The general idea

Ideally, I'd like to have some kind of test framework which would spin up a temporary database, create all the structures (tables, views) that I need, perhaps the programmability (procedures, functions) I'd like to test (if applicable) and also provide a mechanism for loading in data to test against, so that I can write "when {condition} then {expectation}"-style tests.

I'd also like that temporary database to die by fire when my tests are done. I don't want to have to clean anything up manually.

Traditionally, running instances of database services have been used for this style of testing -- but that leaves you with a few sticky bits:
  1. Your development and CI environments have to be set up the same, with credentials baked into the test suite. Or perhaps you can use environment overrides -- still, authorization to the test database has to be a concern
  2. Test isolation is less than trivial and as the test suite grows, the tests start interacting with each other, even if there is cleanup at some point along the way.
  3. Getting a new developer on the team is more effort than it really should be, mainly because of (1) above. For the same reasons, developing on a "loaner" or laptop is also more effort than it should be. You can't just "check out and go".
Some prior strategies exist to cope with this, but they are unsatisfactory:
  1. A shared development/test database which accumulates cruft and potentially causes strange test behaviour when unexpected data is matched by systems under test
  2. Swapping out the production database layer for something like SQLite. Whilst I really like SQLite, the problem boils down to differences in functionality between SQLite and whatever production database you're using. I've come across far too many of these recently, in a project where tests are run against SQLite and production code runs against PostgreSQL. I've seen similar issues with testing code targeting SQL Server on SQLCE. Even if you have a fairly beefy ORM layer (EF, NHibernate, etc) to abstract a lot of the database layer away from you, you're going to hit issues. I can think of too many to list here -- if you really want a rundown of the issues I've hit in this kind of scenario, feel free to ask. I've learned enough to feel fear when someone suggests testing on a database other than the engine you're going to deploy on.
    Sometimes you have tests which work when production code fails. Sometimes your tests simply can't test what you want to do because the test database engine is "differently capable".
  3. For similar reasons to (2) above, even if you're testing down to the ORM layer, mocked / substituted database contexts (EF) can provide you with tests which work when your production code is going to fail.
So we'd like to test against "real iron", but we'd like that iron to be transient.

PeanutButter to the rescue (:

The strategy that emerged

  1. Create a temporary database (PeanutButter.TempDb.(some flavor))
    1. Fortunately, when production code is going to be talking to SQL Server, we can use a LocalDb database for testing -- it gives you pretty much all the functionality of SQL Server, certainly enough for application code (you'll be missing full-text search, for example, but the engine is basically the same).
  2. Create the database structures required (via migrations or scripts)
  3. Run the tests on the parts of the system to be tested
  4. Dispose of the temporary database when done, leaving no artifacts and no cruft for another test or test fixture.


// you'll need NUnit and the PeanutButter.TempDb.LocalDb package
using NUnit.Framework;
using PeanutButter.TempDb.LocalDb;

[TestFixture]
public class TestSomeLegacySystem
{
  [Test]
  public void TestSomeLegacyMethod()
  {
    // Arrange
    using (var db = new TempDbLocalDb())
    {
      using (var conn = db.CreateConnection())
      {
        // run in database schema script(s)
        // insert any necessary data for the test
      }
      // Act
      // write actual test action here
      // Assert
      using (var conn = db.CreateConnection())
      {
        // perform assertions on the database with the new connection
      }
    } // our TempDb is destroyed here -- nothing to clean up!
  }
}

This is very raw, ADO-style code. Of course, it's not too difficult to extrapolate to using an EF context: TempDb exposes a connection string, and you can also pass a new connection from CreateConnection() into your context's constructor, which in turn calls into DbContext's constructor taking a DbConnection -- setting the second (boolean) parameter based on whether or not you'd like to dispose of the connection yourself.
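As a minimal sketch of that wiring (assuming EF6; LegacyContext and Customer are illustrative names, not part of PeanutButter):

```csharp
using System.Data.Common;
using System.Data.Entity;

// Hypothetical code-first context over the legacy schema
public class LegacyContext : DbContext
{
    // contextOwnsConnection: false -- the TempDb using-block disposes
    // the connection for us, so the context shouldn't also try to
    public LegacyContext(DbConnection connection)
        : base(connection, contextOwnsConnection: false)
    {
    }

    public virtual DbSet<Customer> Customers { get; set; }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// usage inside a test:
// using (var db = new TempDbLocalDb())
// using (var ctx = new LegacyContext(db.CreateConnection()))
// {
//     ctx.Customers.Add(new Customer { Name = "test" });
//     ctx.SaveChanges();
// }
```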

I did this often enough that it became boring and my laziness kicked in. Ok, so that happened at about the third test...

And so TestFixtureWithTempDb was born: it forms a base class for a test fixture requiring a temporary database of the provided type. It has a protected Configure() method which must be used to instruct the base class how to create the database (with an initial migrator to get the database up to speed), as well as to provide hints. For example, by default a new TempDb is spun up for every test, but if you're willing to take care of cleaning out crufty data after each test (perhaps with a [TearDown]-decorated method), then you can share the database between tests in the fixture for a little performance boost. The boost is more noticeable when you also have EF talking to the temporary database, as EF caches model information per database on first access -- so even if you have the same structures in two different iterations, EF will go through the same mapping steps for each test, adding a few seconds (I find typically about 2-5) per test.

Indeed, if you have an EF context, you may want to step one up the chain to EntityPersistenceTestFixtureBase, which takes a generic type parameter: your DbContext. Your implementation must have a constructor which takes just a DbConnection for this base class to function. If you've created your context from EDMX, you'll have to create another partial class with the same name and the expected constructor signature, passing off to an intermediary static method which transforms a DbConnection into an EntityConnection; otherwise, just add a constructor.
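For the EDMX case, the partial class might look something like the following sketch (assuming EF6, with LegacyEntities standing in for your generated context name; the metadata-loading details here are one way to build the EntityConnection, not the only one):

```csharp
using System.Data.Common;
using System.Data.Entity.Core.EntityClient;
using System.Data.Entity.Core.Metadata.Edm;

// The EDMX-generated partial class lives elsewhere; this partial just
// adds the DbConnection constructor the fixture base class expects.
public partial class LegacyEntities
{
    public LegacyEntities(DbConnection connection)
        : base(CreateEntityConnection(connection), true)
    {
    }

    private static EntityConnection CreateEntityConnection(DbConnection connection)
    {
        // load the csdl/ssdl/msl metadata embedded in this assembly
        var workspace = new MetadataWorkspace(
            new[] { "res://*/" },
            new[] { typeof(LegacyEntities).Assembly });
        return new EntityConnection(workspace, connection);
    }
}
```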

And this is where my laziness kicks in again: the test fixture for EntityPersistenceTestFixtureBase provides a reasonable example for usage. Note the call to Configure() in the constructor -- you could also have this call in a [OneTimeSetUp] method. If you forget it, PeanutButter will bleat at you -- but helpfully, instructing you to Configure() the test fixture before running tests (:

Some interesting points:
  1. Configure's first parameter is a boolean: when true, the configuration will create ASP.NET tables in the temporary database for you (since it's highly unlikely you'll have them in your own migrations). This is useful only if you're intending to test, for example, an MVC controller which changes behaviour based on the default ASP.NET authentication mechanisms. Mostly, you'll want this to be false.
  2. The second parameter is a factory function: it takes in a connection string and should emit something which implements IDBMigrationsRunner -- an instance of a class with a MigrateToLatest() method which performs whatever is necessary to build the required database structures. You could wrap a FluentMigrator instance, or you can use the handy DbSchemaImporter, given a dump of your database as scripts (without the use statement!) to run in your existing schema. When doing the latter, I simply import said script into a regular old .net resource -- you'll then get a property on that resource which is a string: the script to run (:
  3. You can configure a method to run before a new EF context is provided -- when configured, this method will be given the first EF context created in a test so that it can, for example, clear out old data. Obviously, this only makes sense if you're going full-EF.
  4. If you have a hefty legacy database, expect some minor issues that you'll have to work through. I've found, for instance, procedures which compiled fine in SSMS but failed when run from the dumped script because a semi-colon was missing. Don't despair: the effort will be worth it. You can also try scripting out only the bare minimum of the target database required by your tests.
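Pulling those points together, a fixture might look something like this sketch (TestLegacyPersistence, LegacyEntities and TestResources.LegacySchema are illustrative names, and the exact Configure() signature may vary between PeanutButter versions -- check the shipped test fixtures):

```csharp
using NUnit.Framework;
using PeanutButter.TestUtils.Entity;

[TestFixture]
public class TestLegacyPersistence
    : EntityPersistenceTestFixtureBase<LegacyEntities>
{
    public TestLegacyPersistence()
    {
        // false: we don't need the ASP.NET membership tables;
        // TestResources.LegacySchema is a hypothetical string resource
        // holding the scripted-out schema (without any "use" statement!)
        Configure(false,
            connectionString => new DbSchemaImporter(
                connectionString, TestResources.LegacySchema));
    }

    [Test]
    public void SomeLegacyQuery_ShouldReturnExpectedRows()
    {
        // arrange / act / assert against a context supplied
        // by the base class...
    }
}
```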

Enough blathering!

Ok, this has been a post well worth a TL;DR. It's the kind of thing that would probably work better as a 15-minute presentation, but I suppose some blog post is better than no post (:

Questions? Comments? They're all welcome. If there's something you'd like me to go more in-depth with, shout out -- I can always re-visit this topic (:
