What is your Test Data Strategy? Do you even have one? Do you even care?
In my less-than-humble opinion, test data can make or break any test effort. Yet, second only to error handling, it is one of the most neglected aspects of testing. Personally, I think it is way too important to overlook. As a consultant, whenever I ask clients about their test data strategy, I’m usually met with blank stares.
I was once contracted to test an online banking application. One of the key tests covered transfers between accounts. U.S. banking regulations apparently limit the number of online transfers between two accounts (like savings to checking, or vice versa) to 5 transfers per month (they did while I was testing it, anyway). So, at the end of a very long test day, I reached the transfer limit test. I logged on to my test user’s account and made 5 transfers of $100 from the user’s savings account to their checking account. The transfers all succeeded. Then I attempted a sixth transfer, which was correctly prevented with an appropriate message explaining why. So far, so good... I thought. I went home for the day and figured I’d try again the next morning, just to be sure it wasn’t a daily limit.
The next morning, coffee and bagel in hand (with chives and onion cream cheese), I sat down at my computer to resume testing. I launched the application and attempted to log on to my test user’s account. The log on failed. Surely a typo. So I tried again. It failed a second time. Third time’s the charm. This time I took great care to enter the user name and password correctly. Strike 3 – it failed again. So, logically, I asked if any changes had been made to the database. My manager informed me that the user had changed their password because someone had been tampering with their account. What!!! Apparently I had been using a live account to test, and the account holder, quite rightly upset, had changed her password. The incident was also reported to the managers of the bank. I was testing with live production data! I couldn’t believe it.
I assumed it was a blinding glimpse of the obvious that you never, never, ever test with live data! Well, you know what happens when you assume. Lesson learned – the hard way.
So now I give a lot of thought to test data. Rule Number 1 – separate test data from production data. But where do you get test data? There are essentially 3 approaches you can take: create it, copy it from production, or combine the two. Which one you use will depend on your particular situation, schedules, and database savvy. Let’s take a closer look at each.
Option 1: Create it from scratch. If you are testing a brand spanking new application, this may be your only option. Creating your own test data gives you the most flexibility. You can tailor the data to each specific test case. Once it is created, you can save a snapshot of it or write scripts to recreate it, which lets you restore the data to a clean copy at the beginning of each test cycle or as needed. The downside is that it can take a lot of time to build the data, especially if you need a lot of it. Get to know your DBA. Take them to lunch. Buy them a muffin.
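To make the "write scripts to recreate it" idea concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The accounts table, its columns, and the three rows are all made up for illustration; your schema and database engine will differ. The point is only that the data lives in a script, so every test cycle can start from the same known rows.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
SEED_ACCOUNTS = [
    # (account_id, owner, kind, balance)
    (1, "test_user_a", "savings", 1000.00),   # positive balance
    (2, "test_user_a", "checking", 0.00),     # zero balance
    (3, "test_user_b", "checking", -25.50),   # overdrawn (negative) balance
]


def build_test_db(path=":memory:"):
    """Create a fresh database containing only hand-written test rows."""
    conn = sqlite3.connect(path)
    # Dropping first means the script can be re-run to reset the data.
    conn.execute("DROP TABLE IF EXISTS accounts")
    conn.execute(
        "CREATE TABLE accounts ("
        "account_id INTEGER PRIMARY KEY, owner TEXT, kind TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?)", SEED_ACCOUNTS)
    conn.commit()
    return conn


def balances_sorted(conn):
    """Return balances in ascending order, as a sorting test might expect."""
    rows = conn.execute("SELECT balance FROM accounts ORDER BY balance").fetchall()
    return [row[0] for row in rows]
```

Because the rows are written by hand, each one can exist for a reason: here there is one positive, one zero, and one negative balance, so a numeric sorting test has all three cases to work with.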
Option 2: Copy the data from production. If you are working on an update to an existing application, you may be able to take a copy of the production database. Even if the database structure is modified from the previous application to the new one, it may still be more efficient to get a copy and modify it than to create data from scratch. Of course, the existing data may not support testing. For example, if you are testing data filtering, the production data may not have all of the filter values you need to adequately test the filter. If you are testing numeric sorting where values may be positive, negative, or zero, some of those values may not exist in the current copy of the data. As a result, the data needed for a specific test may not even be available. Another issue – production data is constantly changing. The data used in one test cycle may be different from the copy you take for the next cycle. Records may be added, deleted, or modified between tests. As a result, any defect you find may be a code issue or may be a data issue. Because results can vary from cycle to cycle, test results become unreliable. One way around this problem is to take a copy of the production data before the start of the test cycle and save it. Then restore the test database using this saved copy rather than a current copy. The data becomes more consistent and therefore more reliable. Of course, if the database structure changes, you may have some work to do. Another downside to using or copying live data – privacy issues. The data may contain sensitive information such as bank account numbers, social security numbers, usernames and passwords, etc. You may need to cleanse the data before you can use it.
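As a rough illustration of what "cleansing" can look like, the sketch below overwrites the sensitive columns in a copy of the data with obviously fake values derived from the row's ID. It is written in Python against SQLite, and the customers table and its column names are assumptions invented for this example. Your own privacy and compliance rules decide what actually has to be scrubbed, so treat this as a starting point, not a checklist.

```python
import sqlite3


def cleanse_customers(conn):
    """Overwrite sensitive columns with synthetic values.

    Run this against a COPY of the data, never the production database.
    The table and column names are hypothetical.
    """
    rows = conn.execute("SELECT customer_id FROM customers").fetchall()
    for (customer_id,) in rows:
        conn.execute(
            "UPDATE customers "
            "SET username = ?, password = ?, ssn = ?, account_number = ? "
            "WHERE customer_id = ?",
            (
                f"test_user_{customer_id}",           # no real usernames
                "not-a-real-password",                # no real credentials
                f"000-00-{customer_id % 10000:04d}",  # obviously fake SSN
                f"TEST{customer_id:08d}",             # obviously fake account number
                customer_id,
            ),
        )
    conn.commit()
```

Non-sensitive columns, such as balances, are left alone so the copy still behaves like realistic data in filtering and sorting tests.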
Option 3: Combine Options 1 and 2. Start with a copy of production and then modify it or add data to meet the needs of your tests. Write an update script. Once it’s ready – copy it, and save it. At the beginning of each new test cycle, you can restore the database using the saved copy. The benefit – you can have a lot of data and still meet your test needs. Again, do not pull a fresh copy of the production data for each cycle. This is actually my preferred option, especially if I’m testing any kind of filtering, sorting, or searching. If you do those actions with a small data set, you may believe these functions are pretty quick performance-wise. However, add a real data load and these functions may slow dramatically or even break.
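The "copy it, save it, restore it" routine can be sketched in a few lines. This example assumes SQLite and uses the backup method on Python's sqlite3 connections (available since Python 3.7). The function names and the baseline file path are hypothetical, and a bigger database engine would use its own backup and restore tooling (another reason to buy your DBA that muffin).

```python
import sqlite3


def save_baseline(source_conn, baseline_path):
    """Save the prepared test database as the baseline for later cycles."""
    baseline = sqlite3.connect(baseline_path)
    try:
        # Copies the full contents of source_conn into the baseline file.
        source_conn.backup(baseline)
    finally:
        baseline.close()


def restore_from_baseline(baseline_path, test_conn):
    """Overwrite the test database with the saved baseline.

    Note: sqlite3.connect creates an empty file if baseline_path does not
    exist, so check that the baseline is really there before restoring.
    """
    baseline = sqlite3.connect(baseline_path)
    try:
        # Replaces whatever is currently in test_conn with the baseline data.
        baseline.backup(test_conn)
    finally:
        baseline.close()
```

Call save_baseline once, after the production copy has been cleansed and your update script has run. Then call restore_from_baseline at the start of every test cycle so each cycle sees exactly the same data.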
Regardless of the Test Data Strategy you use, be sure to give your data needs some thought. Work with your DBAs to find the best approach. DBAs are your friends. Since most people usually avoid them, they might enjoy the attention. Lastly, open an account at your local bagel or donut shop.