1. Introduction
Software quality is an important goal that all developers of software systems want to achieve. It currently attracts a great deal of attention, since software is everywhere and affects our lives on a daily basis. Software testing is the main factor in enhancing the quality of software, and it requires generating test cases according to certain coverage criteria such as graph, logic, input space, and syntax coverage (Ammann & Offutt, 2008). The size and complexity of software systems are growing dramatically; in addition, automated tools can generate a huge number of test cases, whose execution incurs substantial losses in cost and time (Lilly & Uma, 2010). According to Rothermel et al. (2001), a product of about 20,000 lines of code requires seven weeks to run all its test cases. Ultimately, the challenge is to find a way to reduce the number of test cases, or to order them, so as to validate the system under test efficiently.
The main goal of software testing is to ensure that the software is almost free of errors. A test process can be called effective when its test cases are able to locate errors. Several tools in the literature can automatically generate thousands of test cases for a simple program in a few seconds, but executing those test cases takes a great deal of time; moreover, such tools may also generate redundant test cases (Muthyala et al., 2011). The problem is compounded for complex systems, where executing the test cases may take several days to complete, and much of that time is spent executing redundant or unnecessary test cases.
To identify the redundant test cases, a technique such as data mining (Lilly & Uma, 2010) is needed to understand the properties of the test cases, determine the similarities between them, and remove the redundant ones.
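As a minimal illustration of what "removing redundant test cases by similarity" can mean in practice, the sketch below flags test cases whose coverage footprint is identical to one already seen. The test names and coverage sets are invented for the example; they are not from the paper.

```python
def redundant_pairs(coverage):
    """coverage maps a test-case name to the frozenset of statements it covers.

    Returns (redundant, kept) pairs: a test case is redundant when an
    earlier test case already covers exactly the same statements.
    """
    seen = {}        # coverage footprint -> first test case with that footprint
    redundant = []
    for name, covered in coverage.items():
        if covered in seen:
            redundant.append((name, seen[covered]))
        else:
            seen[covered] = name
    return redundant

# Hypothetical test suite: t2 covers the same statements as t1.
tests = {
    "t1": frozenset({1, 2, 3}),
    "t2": frozenset({1, 2, 3}),   # redundant with t1
    "t3": frozenset({4, 5}),
}
print(redundant_pairs(tests))  # [('t2', 't1')]
```

Exact-match comparison is the simplest similarity measure; the clustering approach described below generalises it by grouping test cases that are close, rather than identical, in feature space.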
This paper addresses this issue of reducing the number of test cases in order to minimise the time and cost of executing them. Several techniques can be used to reduce test cases, such as information retrieval, pairwise testing (Yoo et al., 2009), and data mining. We use the data mining approach, mainly because of its ability to extract patterns in test cases that would otherwise remain invisible.
We present our approach, concentrating on the two most effective attributes of test cases: coverage and complexity (Kameswari et al., 2011). An empirical study (Jeffrey & Gupta, 2007) suggested that, during test case reduction, using several coverage criteria rather than a single criterion is more effective in selecting test cases able to expose different faults.
We start by collecting the test cases for a given system, and then build the dataset by extracting the coverage and complexity of each test case. Next, we apply a data mining technique, K-means clustering, to group the test cases into clusters. Finally, redundant test cases that lie at the same distance from their cluster centre are removed. To evaluate our approach, we calculate the coverage ratio of the original test cases and compare it with the coverage ratio of the reduced test cases.
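The steps above can be sketched in pure Python. This is only an illustrative implementation under stated assumptions, not the authors' actual tool: each test case is represented as a hypothetical (coverage, complexity) point, clustering uses a naive Lloyd's algorithm with the first k points as initial centres, and "redundant" means lying at the same (rounded) distance from the cluster centre, of which only one representative is kept.

```python
import math

def kmeans(points, k, iters=20):
    """Naive Lloyd's K-means over 2-D tuples; first k points seed the centres."""
    centres = points[:k]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centre.
            i = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[i].append(p)
        # Recompute each centre as the mean of its cluster (keep old if empty).
        centres = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
    return centres, clusters

def reduce_tests(points, k):
    """Keep one test case per distinct distance-to-centre within each cluster."""
    centres, clusters = kmeans(points, k)
    kept = []
    for centre, cluster in zip(centres, clusters):
        seen = set()
        for p in cluster:
            d = round(math.dist(p, centre), 6)  # equal distance => redundant
            if d not in seen:
                seen.add(d)
                kept.append(p)
    return kept

# Hypothetical dataset: (coverage, complexity) per test case; k = 2.
suite = [(10, 2), (12, 3), (10, 2), (40, 8), (41, 8)]
print(reduce_tests(suite, 2))
```

In this toy run the duplicate (10, 2) point is dropped, and of the pair (40, 8) and (41, 8), which are equidistant from their shared centre, only the first is kept. In the evaluation step, one would then compare the coverage achieved by the kept points against that of the full suite.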