1. Introduction
The use of Information Technology (IT) grows every day, to the point that almost everything we do can be carried out online, on the spot. Information technology covers any software or tool for storing, retrieving, and sending information through technologies such as computers, mobile phones, and computer networks. With IT, people can upload, retrieve, and store their information, and that information accumulates as Big Data. Because Big Data holds massive amounts of information that can be reached through IT such as the internet, students can now study online, a practice known as e-Learning. As the tools provided by IT have continued to multiply, they have affected all aspects of our lives, and academia in particular. Because of their many functions, Big Data and e-Learning bring their users both benefits and disadvantages: they affect our social skills, mental growth, and physical health, and they carry the risk of exposing personal information (Internet of Things, n.d.). Web Semantics for Textual and Visual Information Retrieval is a pivotal reference source for the latest academic research on embedding and associating semantics with multimedia information to improve data retrieval techniques (Singh et al., 2017).
Data is the concrete form in which information is presented, and text data is the main source of the knowledge we acquire. To meet users' needs for fast and accurate information acquisition, massive amounts of text data must therefore be classified and managed effectively. Traditional text categorization and clustering techniques face many problems with such data, including limited scalability, a lack of corpora, and inadequate classification accuracy.
In recent years, many text classification methods have been proposed. Liu Lu et al. (2013) proposed a clustering-based PU (positive-unlabeled) active text classification method that combines SVM active learning with an improved Rocchio classifier; it refines the weight evaluation function and improves classification accuracy to some extent. Xu Li et al. (2012) introduced a genetic algorithm into an SVM text classifier, which reduced the number of misclassified texts to some extent. Dhar et al. (2018) proposed categorizing Bangla web text documents with a tf-idf-icf text analysis scheme, arguing that adding an Inverse Class Frequency (ICF) measure to the Term Frequency (TF) and Inverse Document Frequency (IDF) measures yields better feature extraction for a language like Bangla. Ranjan (2017) proposed automatic text classification using a BPLion neural network and semantic word processing; the technique categorizes text with semantic keywords instead of treating the keywords in a document as independent features. Zhang Xiaofei et al. (2009) fused a clustering operation into the KNN text classification method to improve classification accuracy. Zhang Zhilin (2013) improved semi-supervised text classification with Wikipedia knowledge, proposing a new similarity measure based on the semantic relevance between Wikipedia features and applying it to clustering-based classification. Zhu Jun et al. (2014) proposed an SVM-based method for gene/protein name extraction whose classification accuracy reached 71.9%. This method performs well on long text, but it cannot handle short text classification, where feature words are sparse and samples are highly imbalanced, so it clearly cannot meet the data classification needs of today's network platforms.
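The tf-idf-icf weighting mentioned above for Dhar et al. (2018) can be illustrated with a small Python sketch. It assumes whitespace tokenization and one common formulation of the weight, w(t, d) = tf(t, d) · (1 + log(N/df(t))) · (1 + log(C/cf(t))), where N is the number of documents, df(t) the number of documents containing t, C the number of classes, and cf(t) the number of classes with at least one document containing t. The exact variant used in that paper may differ, and the function name `tf_idf_icf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Return one {term: weight} dict per document.

    Assumed formulation (the exact variant in Dhar et al. (2018) may differ):
        w(t, d) = tf(t, d) * (1 + log(N / df(t))) * (1 + log(C / cf(t)))
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    n_classes = len(set(labels))

    # Document frequency: each term is counted at most once per document.
    df = Counter()
    # Class frequency: the set of classes in which each term appears.
    classes_of = {}
    for tokens, label in zip(tokenized, labels):
        for term in set(tokens):
            df[term] += 1
            classes_of.setdefault(term, set()).add(label)

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({
            term: count
            * (1 + math.log(n_docs / df[term]))                   # IDF factor
            * (1 + math.log(n_classes / len(classes_of[term])))   # ICF factor
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus with two classes.
docs = ["goal match team", "team election vote", "goal score match"]
labels = ["sport", "politics", "sport"]
weights = tf_idf_icf(docs, labels)
```

On this toy corpus, `goal` and `team` each occur in two of the three documents and so share the same IDF factor, but `team` occurs in both classes, so its ICF factor is 1 + log(2/2) = 1 and its weight in the first document falls below that of `goal`. This is the class-discriminating effect the ICF term is meant to add.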
There are also clustering algorithms designed for short text, such as the dynamic combination classification method for short text proposed by Yan Rui (2009). Liu Kang et al. (2014) used a deep learning network to transform the high-dimensional, sparse space vector of short text into a new low-dimensional feature space made up of essential features. The method classifies short text by constructing a tree-structured combination classifier.
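The general idea of mapping sparse, high-dimensional short-text vectors into a dense, low-dimensional space can be sketched in a few lines. The sketch below deliberately swaps in Gaussian random projection, a simple non-learned technique, for the deep learning network of Liu Kang et al. (2014); it shows only the shape of the transformation, not their learned features or their tree combination classifier. Whitespace tokenization, the helper names `bag_of_words` and `random_projection`, and the sample texts are assumptions for illustration.

```python
import math
import random


def bag_of_words(texts):
    """Build sparse {term_index: count} vectors and the shared vocabulary."""
    vocab = {}
    vectors = []
    for text in texts:
        vec = {}
        for term in text.lower().split():
            idx = vocab.setdefault(term, len(vocab))
            vec[idx] = vec.get(idx, 0) + 1
        vectors.append(vec)
    return vectors, vocab


def random_projection(sparse_vecs, dim, k, seed=0):
    """Map sparse vectors of length `dim` to dense k-dimensional lists.

    R is a k x dim matrix with i.i.d. N(0, 1/k) entries; y = R x roughly
    preserves pairwise Euclidean distances (Johnson-Lindenstrauss lemma).
    Substitute technique: this is NOT the deep network of Liu Kang et al.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]
    # Only the non-zero coordinates of each sparse vector contribute to y.
    return [[sum(row[i] * v for i, v in vec.items()) for row in R]
            for vec in sparse_vecs]


# Hypothetical short texts: each has 3 non-zero entries in an 8-term vocabulary.
texts = ["cheap flight deals", "flight delayed again", "new phone release"]
sparse, vocab = bag_of_words(texts)
dense = random_projection(sparse, dim=len(vocab), k=4)
```

A learned model such as the one described above would replace the random matrix `R` with weights trained to capture the essential features of the text; random projection only guarantees that distances are approximately preserved.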