Tuesday, March 25, 2014

making testable database access layer

Although I am a C# developer, the principles advocated here can apply to any language/platform. 

Why is testing important?

The question is very subjective and its answer depends on the profile of the person asking.

If its a product manager, he'd probably answer, testing is something done to see if the product works fine (in a very broad sense so the end-user/business function is not impeded).

If its an independent tester trying to answer this question; he'd have various responsibilities which sort of answer these questions:

  • are all the use cases of the product are satisfied?
  • has the product or software satisfied all non-functional requirements?
  • etc..
So for one such person described above, testing is important to ensure if the software does what its designed to do. 

Now, if you happen to be writing a lot of code, and "you" ask this question...in all good probability it means you are a novice software developer. 

If this is disheartening to learn, take solace in the fact that you reading this post is some indication you want to achieve professional excellence. "You want to write good code". :) But, be warned, being a software engineer/developer encompasses a much greater responsibility than just writing good code.

So what is testing to a developer/software engineer?

I'll digress a bit here to say something about writing testable code "in general". Many of the projects that people land into these days are not exactly new development, or, from-the-scratch deal. Of course, this is a decision, a senior manager/architect has to decide. This decision affects project schedules. People actually start with fixating the schedules first. These things are usually negotiated after engaging with client/end-users, or, decided after studying market trends where people decide a suitable release date. After the schedules are fixed a conscious decision is made on the amount of engineering required and level of quality that is acceptable.

Therefore, not every software development can afford to be engineered well enough, and, thereby, the level of quality will also show itself that way.

As a software engineer/developer, writing testable code ensures that your code does what it is supposed to do. Therefore, we have to carefully plan and foresee what sort of inputs should go into a particular program, what inputs make sense, and what inputs will fail, etc. We are also trying to see if the program does exactly does what it is supposed to do (without errors).

Testing your data access layer (DAL)

Here is a small idea to help you understand how testing your DAL works. Say you are writing a blog engine. One crucial thing to test for might be, to see if the submitted blog post is saved in database correctly. The database identifies each post with a unique postID. Your test procedures would be:
  1. to "post" a blog (with title and post body) to database and retrieve its postID.
  2. retrieve a post with the above postID
  3. check if the post title and post body match with what you started with.
If you happen to encounter error anywhere in the above steps it means your code fails, and you have to investigate further.

Repository pattern - independently testing your DAL

We are going to focus on segregating all your data access to another layer (DAL). So that your application code need not concern itself with directly talking to the database. There are multiple advantages to this - one of them is being able to test it independently. 

So suppose we take the blog engine as an example. We'd design our dal as follows:

1. Entities and interfaces first


Entities:
public class BlogPost
{
 public int PostID { get; set; }
 public string PostTitle { get; set; }
 public string PostBody { get; set; }
 public List<Comment> Comments { get; set; }
 public DateTime PostedOn { get; set; }
 public User BlogUser { get; set; }
}

public class Comment
{
 public int CommentID { get; set; }
 public strimg CommentText { get; set; }
 public DateTime CommentedOn { get; set; }
}

public class User
{
 public int UserID { get; set; }
 public string UserFullName { get; set; }
 public string UserNickname { get; set; }
}

Interface:
public interface IBlogEngine
{
 void InsertPost(BlogPost newPost);
 void AddComment(Comment newComment);
 List<BlogPost> GetPosts();
 BlogPost GetPostById(int PostID);
}

2. Concrete implementation

public class BlogEngineMsSql: IBlogEngine
{
 public void InsertPost(BlogPost newPost)
 {
  int newPostId;
  //SQL server data access code...
  
  //new post id is retrieved from db somehow...
  newPost.PostID = newPostId;
 }
 
 //other implementstions follow
}

3. Test code

public class UnitTest1
{
 [TestMethod]
 public void TestBlogInsert()
 {
  BlogPost testPost = new BlogPost() { 
   PostTitle = "test",
   PostBody = "test",
   BlogUser = new User() { UserID = 123, UserNickname = "Mock", UserFullName = "Mock" }
  };
  
  IBlogEngine db = new BlogEngineMsSql();
  db.InsertPost(testPost);
  var id = testPost.PostID;
  var post = db.GetPostById(id);
  Assert.AreEqual(testPost.Title, post.Title);
  Assert.AreEqual(testPost.Body, post.Body);
 }
}

So in this example, we've created a simple blog engine. Note the interface and its concrete implementation. (For now assume its Sql Server specific code). The use of an interface allows us to mask the underlying implementation. In other words, I can make my application code completely unaware of the db implementation like this:

IBlogEngine db = BlogEngineFactory.GetMsSqlInstance();

...or even better, if I used some DI pattern:

IBlogEngine db = BlogEngineFactory.GetEngineInstance();

This way my unit test code is independent of the underlying db implementation.

The repository pattern is simply the act of exposing or channelling your database access code via interfaces. The dual advantage of this is,

(a) you can write test code which is fairly independent of the underlying implementation, and
(b) enforces a modular approach.

If tomorrow we want to run the same tests against an oracle database, we just add the corresponding dal assembly in our project and run the tests. If we are using some dependency injection, we'd change our application configuration to reflect likewise.

This decoupled nature, allows us to write test code, and, the actual implementation code, in parallel. And in theory, no person should write both; he should write code for any one - either the implementation or the test code. But if you are stuck with yourself, then the only go is to write the tests first, then actual implementation. This should progress incrementally, method after method. You write a test for the method; write the implementation, test it; fix errors and re-test, else move to the next method, and repeat as necessary.

No comments: