1. Introduction
As today's learners spend more and more of their learning time on digital media platforms, data analytics increasingly focuses on extracting data about these digital users. Analyzing this data can help learners gradually improve their learning capabilities by providing appropriate guidance and by recommending relevant digital learning materials that optimize their learning methods. The growth of digital media has led learners to subscribe to many online tutorials, driven by their interest in recent topics. In addition, Virtual Learning Environments (VLEs) now offer many new online courses that let students explore evolving technologies worldwide. In this context, Massive Open Online Courses (MOOCs) emerged, aiming to make courses available all over the world and to let students study subjects of interest on demand.
Along similar lines, Coursera was founded in 2011 and gained considerable momentum from its inception: in its first year it attracted almost 1.65 million digital learners, who registered for most of the online courses offered. Similarly, edX, launched as a collaboration between Harvard University and MIT, attracted strong interest in its online courses, with almost 350 K registered students, and it continues to grow rapidly year after year. In 2012, Harvard University offered the popular course "Introduction to Computer Science" as a MOOC and received 150 K registrations, but only 1,400 of these students completed the course successfully, a completion rate of under 1% (that is, a dropout rate of over 99%). Dropout rates are very high across digital courses, which has created a need to analyze their causes. These high dropout rates have also increased skepticism about the way online courses are conducted on digital platforms and about how their exams are administered. This research focuses mainly on why dropout rates are so high in online courses and on identifying the underlying causes.
Nevertheless, MOOCs offer benefits both to users (besides being mostly free and giving access to world-class specialists) and to providers. The large amount of data generated by digital interactions opens up new possibilities for studying and understanding student behavior (Onah, Sinclair, Boyatt, & Foss, 2014). Researchers can now examine this data in depth to learn about individual users, their learning interests, and their search patterns on online platforms, and to better understand how digital learners behave and access resources. To analyze digital users' behavior in MOOCs, we extracted a substantial volume of data from these sources and performed a statistical analysis of the factors that prevent digital learners from successfully completing a course and thus contribute to the high dropout rate. This paper focuses on using effective feature extraction methods and suitable machine learning classification techniques to estimate the dropout rate and to assess the dropout behavior of digital users.
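Estimating a dropout rate first requires a rule for labelling each learner as a dropout or not. The rule is not given at this point in the paper, so the sketch below assumes a hypothetical one for illustration only: a learner counts as a dropout if they record no activity during the final `window` days of the course. The function names, the 10-day default window, and the representation of activity as a list of day indices are all assumptions.

```python
# Hypothetical labelling rule (an assumption, not the paper's definition):
# a learner is a dropout if no activity falls in the last `window` days.

def is_dropout(activity_days, course_length, window=10):
    """Return True if none of the learner's activity days (0-based day
    indices since course start) fall in the final `window` days."""
    cutoff = course_length - window
    return not any(cutoff <= d < course_length for d in activity_days)


def dropout_rate(learners, course_length, window=10):
    """Fraction of learners labelled as dropouts.

    learners: dict mapping a learner id to a list of activity day indices.
    """
    if not learners:
        return 0.0
    dropped = sum(is_dropout(days, course_length, window)
                  for days in learners.values())
    return dropped / len(learners)


if __name__ == "__main__":
    # Toy data for a 30-day course: only "b" is inactive in days 20..29.
    learners = {"a": [0, 3, 25], "b": [0, 1, 2], "c": [29]}
    print(dropout_rate(learners, course_length=30))
```

With the toy data above, one of three learners is labelled a dropout, so the printed rate is 1/3. In practice the window length strongly affects the estimated rate, which is one reason a labelling rule needs to be fixed before comparing classifiers.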
The major contribution of this research is threefold. First, we apply a suitable feature extraction method to understand a digital learner's activity within a MOOC; this method can also be applied to other domain-specific datasets. Second, to evaluate model performance effectively, we use three machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF) and Conditional Random Fields (CRF), to predict dropout and investigate the reasons behind the high dropout rate. Finally, we carry out an empirical analysis in a temporal context to find strong indicators of dropout, combining static and temporal features to highlight these indicators.
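The specific features are not listed at this point, so the following is only a hypothetical sketch of what combining static and temporal features for one learner can look like. It assumes a clickstream log of `(student, day, kind)` records, an arbitrary set of event types (`video`, `problem`, `forum`) and 7-day weeks; none of these names come from the paper. Static features are whole-course counts per event type, temporal features are activity counts per week, and the two are concatenated into a single vector of the kind a classifier such as SVM or RF would take as input.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical event types; a real MOOC log would define its own.
EVENT_KINDS = ("video", "problem", "forum")


@dataclass(frozen=True)
class Event:
    student: str  # learner identifier
    day: int      # days since course start, 0-based
    kind: str     # one of EVENT_KINDS (assumed)


def static_features(events, student):
    """Whole-course event counts per event type (static view)."""
    counts = Counter(e.kind for e in events if e.student == student)
    return [counts[k] for k in EVENT_KINDS]


def temporal_features(events, student, n_weeks):
    """Number of events per 7-day week (temporal view)."""
    weekly = [0] * n_weeks
    for e in events:
        if e.student == student:
            week = e.day // 7
            if week < n_weeks:
                weekly[week] += 1
    return weekly


def feature_vector(events, student, n_weeks):
    """Concatenate static and temporal features into one vector."""
    return (static_features(events, student)
            + temporal_features(events, student, n_weeks))


if __name__ == "__main__":
    log = [
        Event("s1", 0, "video"),
        Event("s1", 1, "problem"),
        Event("s1", 8, "video"),
        Event("s2", 2, "forum"),
    ]
    print(feature_vector(log, "s1", n_weeks=3))
    print(feature_vector(log, "s2", n_weeks=3))
```

For learner `s1` the vector is `[2, 1, 0, 2, 1, 0]`: two videos, one problem and no forum posts overall, followed by two events in week 0, one in week 1 and none in week 2. Keeping both views lets a model see overall engagement as well as when engagement fades, which is the intuition behind using temporal indicators of dropout.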
The rest of this paper is organized as follows. Section 2 surveys related work and presents a summary table. Section 3 explains how the data were selected and the methodology used. Section 4 describes our proposed dropout prediction model in detail. Section 5 briefly introduces the three machine learning techniques used for dropout classification, presents the outcome of the empirical analysis, and discusses the results. Finally, Section 6 presents the conclusions.