2.1. Cluster Analysis (CA)
Clustering is a popular data mining technique that partitions a set of objects into mutually exclusive clusters such that the similarity between observations within each cluster (i.e., subset) is high, whereas the similarity between observations from different clusters is low (Samoilenko & Osei-Bryson, 2008, 2010). Unlike decision trees, which assign a class to an instance (a supervised method), clustering procedures are applied when instances are to be divided into natural groups or clusters without predefined classes (an unsupervised method). There are different ways to produce these clusters. The groups may be exclusive, i.e., any instance belongs to only one group; probabilistic or fuzzy, i.e., an instance belongs to each group with a certain probability or degree (membership value); or hierarchical, i.e., there is a crude division of instances into groups at the top level, and each of these groups is refined further, down to individual instances (Thomassey & Fiordaliso, 2006). Other literature distinguishes two general approaches to clustering: hierarchical clustering and partitional clustering (e.g., k-means, k-median) (Samoilenko & Osei-Bryson, 2008).
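To make the exclusive, partitional case concrete, the following is a minimal k-means sketch in Python. It is an illustration only and is not taken from the cited works. The function name kmeans, the toy two-dimensional points, and the choice to initialize the centroids with the first k points are all assumptions made for this example; practical implementations typically use more robust initialization and stopping rules.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Illustrative k-means sketch: returns (labels, centroids).

    Assumption: the first k points serve as initial centroids. This keeps
    the example deterministic but is not a robust initialization.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins exactly one cluster, the one
        # with the nearest centroid (exclusive membership).
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so assignments are stable.
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two visually separate groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In this sketch each point receives a single label, which corresponds to the exclusive grouping described above. A fuzzy variant would instead return a membership degree for every cluster, and a hierarchical variant would return a nested sequence of partitions.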
- Examples of applications of clustering are reported in the literature (Banfield & Raftery, 1992; Ben-Dor, Shamir, & Yakhini, 1999; Dhillon, 2001; Fisher, 1997; Hirschberg & Lye, 2001; Lai, Fan, Huang, & Chang, 2009; Okazaki, 2006; Wallace, Keil, & Rai, 2004).