1. Introduction
Outlier detection, which aims to detect objects that are substantially different from the rest of the data, is one of the most important techniques in knowledge discovery in databases (KDD).
Many algorithms have been employed in Data Mining, Machine Learning, and Statistics to find outliers. These algorithms can be classified as distance-based (Knorr & Ng, 1997; Knorr & Ng, 1998; Knorr & Ng, 1999; Knorr, Ng & Tucakov, 2000), density-based (Breunig, Kriegel, Ng & Sander, 2000), clustering-based (Jain, Murty & Flynn, 1999), depth-based (Johnson, Kwok & Ng, 1998) or distribution-based (Barnett & Lewis, 1994). They have been applied in several domains such as network intrusion (Ding et al., 2012), (Idé & Kashima, 2004), credit card fraud (Richard & Hand, 2001), email spam (Castillo et al., 2007), customer activity monitoring (Fawcett & Provost, 1999), and many others. Nevertheless, little attention has been paid to detecting outliers in the multicriteria decision aid (MCDA) field.
A large number of multicriteria methods have been developed (Brans & Vincke, 1985), (Figueira, Mousseau & Roy, 2005), (Dyer, 2005) and applied in several domains (Joerin, 1997), (Belacel, 2000), (Zopounidis, 1999), but the robustness of these methods has not been sufficiently studied (Hites et al., 2006). Most of these methods rely on subjective parameters that represent the decision makers' preferences (such as preference, indifference and veto thresholds, and weights). A change in the value of any one of these parameters can affect the results. In addition, if the values of these parameters are not estimated carefully, the results become more sensitive to outliers. For instance, adding a single arbitrarily high (or low) action to a given data-set is sufficient to completely modify the result (ranking, choice or sorting) of the method. As a consequence, the detection of outliers can be exploited to improve the estimation of such parameters. Furthermore, in multicriteria classification and sorting methods, the presence of outliers is detrimental to classification results, since these objects can prevent other objects from being classified correctly. Thus, the detection of multicriteria outliers can serve as a preprocessing step before executing multicriteria classification methods.
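The effect of a single extreme action can be illustrated with a small sketch. It assumes a PROMETHEE II-style net flow with a linear (V-shape) preference function, two criteria to be maximized, and equal weights. The preference threshold is deliberately set to the observed range of each criterion, which stands in for a careless estimate. The data, the function names `net_flows` and `ranking`, and this choice of threshold are all hypothetical and are not taken from the paper.

```python
def net_flows(actions, weights):
    """PROMETHEE II-style net flows with a linear (V-shape) preference function."""
    names = list(actions)
    n_crit = len(weights)
    # Assumption: the preference threshold of each criterion is its observed range
    # (a careless, data-driven estimate; it must be non-zero for this sketch).
    p = [max(actions[a][j] for a in names) - min(actions[a][j] for a in names)
         for j in range(n_crit)]

    def pref(a, b):
        # Weighted preference degree of action a over action b.
        total = 0.0
        for j in range(n_crit):
            d = actions[a][j] - actions[b][j]
            total += weights[j] * min(max(d, 0.0) / p[j], 1.0)
        return total

    return {a: sum(pref(a, b) - pref(b, a) for b in names if b != a) / (len(names) - 1)
            for a in names}


def ranking(flows):
    # Actions sorted from best to worst net flow.
    return sorted(flows, key=flows.get, reverse=True)


weights = (0.5, 0.5)
base = {"A": (10, 3), "B": (4, 4), "C": (2, 2)}   # made-up evaluations
rank_before = ranking(net_flows(base, weights))

# Add one arbitrarily high action on the first criterion.
with_outlier = dict(base, D=(1000, 2))
rank_after = [a for a in ranking(net_flows(with_outlier, weights)) if a != "D"]

print(rank_before)  # ['A', 'B', 'C']
print(rank_after)   # ['B', 'A', 'C']
```

In this toy setting, the extreme action inflates the threshold of the first criterion, so the differences among A, B and C on that criterion become negligible and the order of A and B is reversed.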
For these reasons, it is important to study the presence of multicriteria outliers. This new research direction has not been sufficiently addressed in the literature. The only work that has dealt with this problem is that of De Smet, Hubinont & Rosenfeld (2017). The authors proposed a distance-based model to detect multicriteria outliers, extending the multicriteria distance proposed in (De Smet & Guzman, 2004) to different samplings of the set of objects. Outliers were detected by identifying bi-modal distributions of the distance values.
The present paper proposes a new idea for detecting outliers in the MCDA field. It introduces the concept of a relation-based outlier, which makes it possible to detect outliers using the preference relations that characterize multicriteria outranking methods. More explicitly, in our approach, each object is identified by its relations with the rest of the objects in the data-set. Outliers are then detected by applying the local outlier factor (LOF) algorithm (Breunig, Kriegel, Ng & Sander, 2000) to the distributions of the outranking relations.
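As a rough illustration of this idea, and not the algorithm detailed later in the paper, the sketch below describes each object by how many other objects it is preferred to, preferred by, indifferent to, or incomparable with, and then scores these profiles with a minimal LOF. Several assumptions are made for brevity: the preference relation is plain Pareto dominance on criteria to be maximized rather than a genuine outranking relation, each neighbourhood contains exactly k objects (ties are broken by index), and the names `relation_profile` and `lof_scores` as well as the data are hypothetical.

```python
from math import dist


def relation_profile(i, data):
    """Counts of (preferred-to, preferred-by, indifferent, incomparable) for object i.

    Assumption: preference is plain Pareto dominance, a stand-in for the
    outranking relations of an actual MCDA method.
    """
    better = worse = same = incomparable = 0
    for j, other in enumerate(data):
        if j == i:
            continue
        ge = all(x >= y for x, y in zip(data[i], other))
        le = all(x <= y for x, y in zip(data[i], other))
        if ge and le:
            same += 1
        elif ge:
            better += 1
        elif le:
            worse += 1
        else:
            incomparable += 1
    return (better, worse, same, incomparable)


def lof_scores(points, k):
    """Minimal local outlier factor using exactly k nearest neighbours per point."""
    n = len(points)
    neighbours = []
    for i in range(n):
        ranked = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbours.append(ranked[:k])
    k_distance = [nb[-1][0] for nb in neighbours]
    lrd = []
    for i in range(n):
        # Reachability distance of i from each neighbour j.
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        # Small constant guards against division by zero for duplicated profiles.
        lrd.append(1.0 / (sum(reach) / k + 1e-10))
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Made-up evaluations on two criteria; the last object dominates all others.
data = [(1, 5), (2, 6), (3, 3), (4, 4), (5, 1), (6, 2), (50, 50)]
profiles = [relation_profile(i, data) for i in range(len(data))]
scores = lof_scores(profiles, k=3)
```

With this toy data, the six ordinary objects obtain a score of about 1, while the dominating object obtains about 4.58, because its relation profile (6, 0, 0, 0) lies far from all the others.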
The rest of the paper is organized as follows. The main concepts related to preference relations in the MCDA field are presented first. Then, a brief description of the LOF algorithm is given. Next, the algorithm of the proposed approach is detailed, followed by an experimental study. Finally, we conclude with some general remarks and directions for future work.