The country needs more quality! High test coverage combined with a reduced number of test cases is increasingly becoming a basic requirement in software testing. Driven by this goal, but also by the reform of the EU Data Protection Directive, both test data management and tool-supported test data generation have moved in recent years from being the industry's stepchild to one of its focus areas.
In his article “Test data management – the right input for your tests”, my colleague Maximilian Wallisch explained why test data management is a process that ideally accompanies an IT project from start to finish, who benefits from it, and which steps are required to implement it professionally in the development process.
The subject of test data management (TDM) is gaining importance in both testing and development, driven not least by Agile and DevOps. As a result, most major tool vendors now offer TDM tools in the field of test automation. Furthermore, an amendment to EU data protection law is imminent: in the future, every company will have to obfuscate personal data, for example by anonymizing or pseudonymizing it. This makes it harder to use production data in non-production environments, and test data and development data are no exception.
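What such obfuscation can look like in practice is shown in the following minimal sketch (my own illustration in Python; the field names and the hashing scheme are hypothetical, not a prescribed procedure): directly identifying fields are replaced with stable pseudonyms before a record leaves production.

```python
import hashlib

def pseudonymize(record: dict) -> dict:
    """Replace directly identifying fields with stable pseudonyms
    so the record can be used outside production (illustrative only)."""
    masked = dict(record)
    # A stable hash maps the same input to the same pseudonym,
    # which preserves referential integrity across tables.
    digest = hashlib.sha256(record["email"].encode()).hexdigest()[:10]
    masked["name"] = f"Customer-{digest}"
    masked["email"] = f"user-{digest}@example.com"
    return masked

print(pseudonymize({"name": "Jane Doe", "email": "jane@example.com", "balance": 42.0}))
```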
Everyday stories from the test department
From testers and developers we hear again and again that test data in projects is often generated and managed in a “historically grown” fashion. This frequently means that testers and developers construct their data in their own database as it is needed for their test cases, and then change and adapt this test data over and over again. It can also mean that a defined data set (e.g. a Golden Copy) is imported into the test environment and used by every developer and tester.
The test data is changed individually as each test case requires. There is no versioning of the test data; identical test cases are often run with different data sets; the data in the database is changed and “adapted” until its consistency suffers and test cases fail. This is usually the point at which a new Golden Copy is installed and the whole process starts all over again. The man from La Mancha sends his regards.
What is “good” test data?
When testing software in general, and in automated tests in particular, specific preconditions have to be established before each test. This includes making certain settings in the test environment (e.g. setting the language, changing the date and time, installing the software version, etc.), but also bringing the database, i.e. the test data for the respective test case, into the required state. As self-evident as the first step may be to anyone who tests software, many people react with disbelief at the suggestion that cleaning up the state of the test data should be just as much a matter of course.
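To make this concrete, here is a minimal sketch of the idea (in Python with pytest, my own choice of tooling; the fixture, schema, and values are hypothetical): each test builds its required data state from scratch instead of inheriting whatever the previous test left behind.

```python
import sqlite3

import pytest

@pytest.fixture
def clean_customer_db():
    """Build a fresh, known data state for every test instead of
    relying on leftovers from earlier runs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL)"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Test Customer', 1000.0)")
    conn.commit()
    yield conn
    conn.close()  # the state is discarded; the next test starts clean

def test_credit_limit_has_known_start_value(clean_customer_db):
    row = clean_customer_db.execute(
        "SELECT credit_limit FROM customers WHERE id = 1"
    ).fetchone()
    assert row[0] == 1000.0
```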
Synthetic data can be BETTER than productive data? How is that supposed to work?
Quite simple: test data can be consumed. This often leads to problems, because one of the most important aspects of software testing is the reproducibility of results. Just think of a somewhat more complex test case: the test uses data that is assembled in the background by joining several database tables. At first glance this background magic goes unnoticed, and at the moment of the test it is not relevant.
The test then changes the parameters of the data set in such a way that it cannot be used in another test run (e.g. a limit value is exceeded or a record is deleted). The test case starts, the point of no return for the data set is passed, the test case keeps running, and then fails somewhat later at an unexpected point with a previously unseen error message.
Test data can not only be consumed, it can also become obsolete.
The same consideration applies to production versus synthetic data when it comes to data aging. In other words, an obsolete data record is just as useless as a missing one.
Test cases may require specific dates or times to deliver the desired result. A data record may need to lie in the future, or a specific number of days in the past, to test particular constellations of the test environment. Since extracting production data and importing it into the test environment often means considerable extra effort, testers and developers have to make do with the same data snapshot for longer periods. With future-dated test data, for example, the pool of usable records shrinks with every passing day until finally no data is left to test with.
At this point you can laboriously import new data records by hand, accept the extra effort of a new Golden Copy – if that is even possible – or use professional test data management: the required data is generated just in time, carries the required parameters, and both developers and testers are spared a lot of frustration. On top of that, time and money are saved once again!
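What “generated just in time” can mean is shown in the following minimal sketch (my own illustration in Python; the record layout is hypothetical): instead of hunting for a record that happens to lie the right number of days in the past, the test creates one relative to the day it runs.

```python
from datetime import date, timedelta

def make_contract(days_offset: int) -> dict:
    """Create a synthetic contract record whose date lies a given number
    of days in the past (negative offset) or in the future (positive
    offset), always relative to the day the test runs (hypothetical schema)."""
    return {
        "contract_id": f"TEST-{abs(days_offset)}",
        "valid_from": date.today() + timedelta(days=days_offset),
        "status": "active",
    }

# One record 30 days in the past, one 14 days in the future:
# reproducible on whatever day the test suite is executed.
old_contract = make_contract(-30)
future_contract = make_contract(14)
print(old_contract["valid_from"], future_contract["valid_from"])
```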
Synthetic test data makes life easier!
Let’s take as an example a transaction that is supposed to behave differently on public holidays than during the rest of the year. The requirement for the data record is clear: either it is a holiday or it is not. No background magic, no outdated data, no consumed data.
As a student I hated the time between Three Kings' Day (6 January) and Good Friday (usually in April) – not a single holiday! As a tester I now feel much the same. Especially when debugging, shortly before Easter you often have to dig very far back in the databases to find a data record that actually falls on a holiday.
If you then have further special conditions – for example a day that is a holiday AND falls on a weekend – the scrolling quickly turns into a real chore.
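With synthetic data, such a constellation does not have to be found at all; it can simply be built. Here is a minimal sketch (my own illustration in Python; the holiday list and record layout are hypothetical, and a real project would use a proper holiday calendar for its region): it generates a transaction dated on a holiday that also falls on a weekend.

```python
from datetime import date

# Hypothetical fixed-date holidays as (month, day) pairs; a real
# project would use a proper holiday calendar for its region.
FIXED_HOLIDAYS = [(1, 1), (1, 6), (5, 1), (12, 25), (12, 26)]

def weekend_holiday(start_year: int) -> date:
    """Search forward for the next fixed-date holiday that falls
    on a Saturday or Sunday."""
    for year in range(start_year, start_year + 12):
        for month, day in FIXED_HOLIDAYS:
            candidate = date(year, month, day)
            if candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
                return candidate
    raise ValueError("no weekend holiday found in range")

# Build the test transaction directly on the required date:
# no scrolling through old production data.
booking_date = weekend_holiday(2024)
transaction = {"id": "TEST-HOLIDAY-1", "booked_on": booking_date}
print(transaction)
```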