Literature Survey
In a previous paper (Schumacher et al., 2010), data mining techniques were applied to investigate the likelihood that incoming college freshmen majoring in Actuarial Mathematics (AM) will graduate in this major. That study revisited, using data mining, an earlier investigation (Smith and Schumacher, 2006) that had predicted success using only traditional logistic regression. The original study used data spanning seven years of incoming university freshmen who started as AM majors in the years 1995-2001.
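The general form of a logistic regression model, such as the one used in the earlier AM study, can be illustrated with a minimal sketch. The predictor names (`sat_math`, `hs_percentile_rank`) and all coefficient values below are hypothetical. They only show how a fitted model turns a student's characteristics into a probability of graduating in the major, and they are not the variables or estimates reported by Smith and Schumacher (2006) or Schumacher et al. (2010).

```python
import math

# Hypothetical coefficients for illustration only; these are NOT the
# estimates from Smith and Schumacher (2006) or Schumacher et al. (2010).
INTERCEPT = -8.0
COEFFICIENTS = {
    "sat_math": 0.012,           # hypothetical predictor: SAT mathematics score
    "hs_percentile_rank": 0.03,  # hypothetical predictor: high school class rank percentile
}


def graduation_probability(student):
    """Return P(graduate in AM) under a logistic regression model.

    The linear predictor z = b0 + sum(b_i * x_i) is passed through the
    logistic function 1 / (1 + exp(-z)), which maps any real z into (0, 1).
    """
    z = INTERCEPT + sum(COEFFICIENTS[name] * student[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def predict_graduates(student, threshold=0.5):
    """Classify a student as a predicted graduate if the probability meets the threshold."""
    return graduation_probability(student) >= threshold


strong = {"sat_math": 700, "hs_percentile_rank": 90}
weak = {"sat_math": 500, "hs_percentile_rank": 50}
```

With these made-up coefficients, the `strong` student has z = 3.1 and a graduation probability of about 0.96, while the `weak` student has z = -0.5 and a probability of about 0.38, so only the first is classified as a likely graduate at the 0.5 threshold.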
Data mining applications in education are not limited to higher education. For example, Sen et al. (2012) used four techniques (neural networks, support vector machines, decision trees, and logistic regression) to predict high school placement test results for 8th graders in Turkey. In that study, the decision tree was the best predictor, while logistic regression was the least accurate.
Many investigations of issues in higher education have also used data mining methods. In one comprehensive paper, Davis et al. (2007) generated predictive models for three important educational concerns: student retention, student enrollment, and donor giving. Herzog (2006) used logistic regression, decision trees, and neural networks to predict student retention and degree completion time for new and transfer students. Similarly, Campbell (2008) analyzed student retention through six-year graduation predictive models developed with various data mining techniques. Delen (2010) used four individual models (artificial neural networks, decision trees, support vector machines, and logistic regression), along with ensemble techniques, to predict student attrition from five years of first-year student enrollment data. Support vector machines gave the best predictions, with decision trees the next most accurate. Zhang et al. (2010) also applied data mining to college student retention, comparing three techniques: Naïve Bayes, support vector machines, and decision trees. They found that the Naïve Bayes algorithm had the highest prediction accuracy for students who dropped out. Finally, in another study of college student retention, Lin (2012) applied various machine learning algorithms to data on eight years of first-year students. The five most accurate predictive models were all decision trees, and the best technique for these data was the alternating decision tree (ADT).
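Several of the studies above rank techniques by prediction accuracy, and Zhang et al. (2010) single out accuracy for students dropping out. The survey does not say exactly how each study computed these figures; the minimal sketch below assumes that accuracy "for students dropping out" means the share of actual dropouts that a model correctly flags (the recall for that class). The outcome labels and the two sets of predictions are hypothetical toy data, not results from any cited study.

```python
def overall_accuracy(actual, predicted):
    """Fraction of all students whose outcome was predicted correctly."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def class_accuracy(actual, predicted, label):
    """Fraction of students truly in `label` who were also predicted as `label`
    (the recall, or true positive rate, for that class)."""
    in_class = [p for a, p in zip(actual, predicted) if a == label]
    return sum(1 for p in in_class if p == label) / len(in_class)


# Hypothetical outcomes for ten students: eight stay, two drop out.
actual = ["stay"] * 8 + ["drop"] * 2

# Hypothetical model A predicts that every student stays.
model_a = ["stay"] * 10

# Hypothetical model B misclassifies two students who stay,
# but correctly flags both students who drop out.
model_b = ["stay"] * 6 + ["drop"] * 2 + ["drop"] * 2
```

Both hypothetical models reach an overall accuracy of 0.8, yet model A identifies none of the dropouts (class accuracy 0.0) while model B identifies both (class accuracy 1.0). This is why a model can be the best at predicting dropouts without being the best by overall accuracy.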