Saturday, February 7, 2009

There's nothing wrong with a test hitting the database

Let me preface this rant with a preface. I work at a company that's all about agile, TDD, Scrum, and other cool words. Personally, I am all for those things. On the rare occasions that I have code unaccompanied by tests, I feel guilty. There should always be tests.

But, that's the extent of my opinion of it. Write tests that test the code you're about to write. I'm not into mock objects. I don't debate the validity or non-validity of any one approach vs any other approach. I do what I have to do to produce a test that proves the code works. I really try to keep it simple. If I need a dummy implementation of an interface to prove something, then I spend 6 seconds to write the implementation.

But, being in a company where there are lots of people with much stronger opinions about it, I hear a lot of stuff. When is a unit test no longer a unit test but a functional test? Should unit tests be allowed to hit the database? What should tests do and not do? Yadda yadda. I do not doubt the importance of those conversations or the ramifications of the results, its just not something I participate in. I'm more about the code and proving the code works; not the philosophy or implementation behind it.

One of thoe things that comes up quite a bit is "the tests should not hit the database". My response to most things is "well, it depends on the test". If you're writing tests that are implicity hitting the database, then sure, in that case the database component should be swapped out with something simpler and faster without the environmental requirements. Yippee.

But, sooner or later, you come down to the object that actually does the writes to and/or reads from the database. I'm sure you can emulate it, but if the object is a db object, then I'm of the opinion that you should make sure it reads and writes to/from the db. I don't know where that opinion stands in the overall view of the agile/tdd community, but I have heard blanket statements that tests should not hit the database.

Last night, I got a call after hours asking me to look at some tests that were failing. I immediately stated it was environmental since the tests were 2 years old and hadn't been touched in 6 months, and then I set out to prove it. The cause of the failure was a missing row of "system delivered data" from the database that was there prior to the related project, and should always be there.

If my test mocked the db activity rather than run it against the real scenarios, then we wouldn't have learned that the data was gone until someone fired up the product for real and tried to use it. The missing row was an adverse affect of a major database effort of another team. I wasn't involved with the fix, but as soon as it was identified they had no problem fixing it, so it seems to have been minor.

So is it a functional test or is it a unit test? I don't know, and it doesn't matter to me. I wrote a test to prove that the code works, and as soon as an environmental dependency vanished, the test failed. That's the imporant part.

Maybe there should've been a fitenesse test or some other automated test that would've tested the functionality within the website. Maybe that test does exist and we just didn't get to it yet. That's possible. But, it never got that far. They did the build and they ran the mbunit tests, and we immediately knew there was a problem.

There is one take away from this: As a developer, the exception message allowed me to quickly identify what the problem was. But, it was implicit; I mentally traced it to the actual cause. My take away is to proactively check for this condition and throw an explicit error message.

To those who say "your tests shouldn't hit the db", I say nay. Maybe I won't get the Agile Developer of the Year award, or maybe some in the TDD community will frown upon me, but the db hitting test identified a problem.

We (my team) have lots of tests that hit the db. This one is a bit different since its know system delivered data, but most of other tests are not. In those cases, they insert all the test data they need, run the tests, then clean all the test data. We achieve this, in part, by not using identity fields on our setup tables. All our test data gets inserted with ids > 10,000,000. We have a db test harness that's a facade for all of the things we need to do. The finally of every test calls a method that clears out all of the test data. All of these tests are flagged as "long running"; we run them locally, but not as part of the nightly build. In fact, we have a unit test that stress tests certain procs for thread safety.

That concludes tonight's rant.