SDG: Search-based Synthetic Data Generator

Many testing activities, such as usage-based statistical testing, require synthetic test data that can be used to build confidence in the reliability of the system under test. Generating such data is not trivial, because the underlying data schemas are usually large, complex, and subject to numerous domain-related logical constraints. The goal of the SDG tool is to generate this data automatically, so that it is both logically valid and statistically representative.

System Requirements

Eclipse IDE (Mars or higher) [link].
Java Development Kit (JDK) 1.8.0 (or higher) [link].
All other required third-party libraries are included in the installation package.
We also recommend using the Papyrus modeling environment for building and managing models [link].

Demonstration Material

Profile for expressing the statistical characteristics of the test data [link].
Example of a domain model annotated with statistical information (TaxCard) [link].
OCL constraints expressing the logical validity of the data [link].
Example of a valid and representative test data sample generated using SDG [link].

Installation Material for SDG

The SDG tool can be found [here].
Installation and usage instructions can be found [here].

Relevant Publications

G. Soltana, M. Sabetzadeh, and L. C. Briand, "Synthetic Data Generation for Statistical Testing", in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017), Illinois, USA, October 30 - November 3, 2017.

Contact Information

Ghanem Soltana
Interdisciplinary Centre for Security, Reliability and Trust
29, Avenue John Fitzgerald Kennedy
L-1855, Luxembourg
E-mail: ghanem(dot)soltana(at)uni(dot)lu