1. Introduction
Software fault prediction is vital to benchmark the present state of software quality and to help in predicting its future quality. The prediction of software fault-proneness is very important to minimize software costs and to improve the effectiveness of software verification and validation (Jin & Jin, 2015). Many factors contribute to introducing faults into software, including size, complexity, coupling, cohesion and inheritance. These factors can be measured using product metrics, such as the Chidamber and Kemerer (CK) metrics suite (Chidamber & Kemerer, 1994). Product metrics are widely used to predict software quality because they are easy to collect and interpret (Malhotra & Bansal, 2015). Often, many metrics are used to measure different characteristics of software, as a single metric is not adequate to measure the properties of software under development (Malhotra & Bansal, 2015). However, product metrics measure the software product statically and provide no information on software evolution. Software process metrics, which measure the evolution of software, also have a great effect on assessing and evaluating software quality (Rahman & Devanbu, 2013). Software metrics are used to predict the fault-proneness of software using machine-learning and statistical models, which help software developers, test engineers and managers allocate the appropriate resources to achieve acceptable quality in software products within budget. In addition, the software evolution lifecycle is much longer than the development lifecycle; therefore, the measurement of software evolution should be included in fault prediction models. Fault prediction models provide insights into software modules so that inspection and testing can be limited to the most fault-prone modules whenever resources are limited or unavailable.
As shown by the power law reported in many previous studies (Andersson & Runeson, 2007; Shatnawi & Althebyan, 2013), a small number of modules account for most software faults. Therefore, it is more cost-effective to validate only a portion of the software in the search for faults. Fault prediction models can be used to detect fault-prone modules; hence, quality assurance efforts should be focused on the most fault-prone modules in a system to reduce the effort and cost of improving product quality. In addition, adding the change-proneness of modules to fault-prediction models provides more information about the quality of individual modules and of the software as a whole. The development and evolution of software are inseparable in most software development processes. Prediction models that include both product and process metrics measure systems from two perspectives: development and evolution.
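The fault concentration described above can be illustrated with a short sketch: given per-module fault counts, compute what fraction of all faults lies in the most fault-prone 20% of modules. The fault counts below are hypothetical illustration data, not measurements from the studies cited.

```python
# Sketch of the power-law concentration of faults across modules.
# The counts below are hypothetical, for illustration only.
fault_counts = [0, 0, 1, 0, 12, 0, 2, 0, 25, 0, 1, 0, 0, 7, 0, 0, 3, 0, 0, 0]

counts = sorted(fault_counts, reverse=True)
top_n = max(1, len(counts) // 5)              # the top 20% of modules
top_share = sum(counts[:top_n]) / sum(counts)  # share of faults they hold
print(f"Top 20% of modules hold {top_share:.0%} of all faults")
```

With this illustrative data, a fifth of the modules hold the large majority of the faults, which is the pattern that makes targeted inspection cost-effective.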
Some fault prediction models built using product metrics were reported in (Basili et al., 1996; Shatnawi, 2010; Shepperd et al., 2013; Hamill & Goseva-Popstojanova, 2014; Zhou et al., 2014; Kaur et al., 2016; Jindal et al., 2016). These models were built from product metrics only, such as the CK metrics. In this work, we use the well-known CK product metrics to build fault-proneness models. In addition, we propose to use change-proneness in building fault prediction models. The change-proneness of modules is vital in assessing the evolution of software systems and helps in finding which modules are more fault-prone. Change-proneness is therefore measured in three scenarios: a module is marked change-prone if it has been changed in a long evolution period (the last three years), a medium evolution period (the last twelve months), or a short evolution period (the last six months). The proposed change-proneness metric measures the effect of software evolution in a particular period. Measuring software change over three time spans helps to determine the extent of the correlation between change and fault-proneness and, therefore, which time span is the most appropriate to consider for fault prediction.
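The three change-proneness scenarios can be sketched as a simple labeling function over a module's change dates. The function and window names below are illustrative assumptions, not code from the study.

```python
from datetime import datetime, timedelta

# Illustrative sketch: a module is labeled change-prone under a scenario
# if it was changed at least once within that evolution window, counted
# back from a reference date. Window lengths mirror the three scenarios
# in the text: long (3 years), medium (12 months), short (6 months).
WINDOWS = {
    "long": timedelta(days=3 * 365),
    "medium": timedelta(days=365),
    "short": timedelta(days=182),
}

def is_change_prone(change_dates, scenario, reference_date):
    """True if the module changed at least once inside the window."""
    cutoff = reference_date - WINDOWS[scenario]
    return any(cutoff <= d <= reference_date for d in change_dates)

ref = datetime(2017, 1, 1)
changes = [datetime(2014, 6, 1)]
print(is_change_prone(changes, "long", ref))   # changed within 3 years
print(is_change_prone(changes, "short", ref))  # not within 6 months
```

A module that changed only early in the evolution period is change-prone under the long scenario but not under the short one, which is exactly the distinction the three time spans are meant to capture.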
Six machine-learning classifiers were used to validate the proposed metrics. These classifiers have been used repeatedly to predict fault-proneness in previous works (Basili et al., 1996; Shatnawi, 2010; Shepperd et al., 2013; Hamill & Goseva-Popstojanova, 2014; Zhou et al., 2014; Kaur et al., 2016; Jindal et al., 2016). The classifiers were trained and tested on five large open-source systems, once using the product metrics alone and once using a combination of product and process metrics, and a non-parametric signed-rank test was used to assess the significance of the difference in performance between the two configurations.
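A minimal sketch of this evaluation design, using one illustrative classifier rather than the six in the study: score a model on product metrics alone versus product metrics plus a change-proneness feature, then compare the paired per-fold scores with a Wilcoxon signed-rank test. The data is synthetic, and scikit-learn and scipy are assumed to be available.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: 6 CK-style product metrics plus one binary
# change-proneness flag; the fault label loosely depends on both.
n = 400
product = rng.normal(size=(n, 6))
change_prone = rng.integers(0, 2, size=(n, 1))
y = ((product[:, 0] + 2 * change_prone[:, 0] + rng.normal(size=n)) > 1).astype(int)

clf = LogisticRegression(max_iter=1000)
scores_product = cross_val_score(clf, product, y, cv=10, scoring="roc_auc")
scores_combined = cross_val_score(
    clf, np.hstack([product, change_prone]), y, cv=10, scoring="roc_auc")

# Paired non-parametric test over the ten fold scores.
stat, p = wilcoxon(scores_product, scores_combined)
print(f"AUC product-only: {scores_product.mean():.3f}, "
      f"combined: {scores_combined.mean():.3f}, p={p:.4f}")
```

Because the paired fold scores are few and not necessarily normally distributed, a non-parametric signed-rank test is a natural choice for judging whether adding the process feature changes performance.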