Introduction
In the early 21st century, organizations capture all of their business activities in computer storage systems. Data are also gathered from, or used in, systems owned by others. Over time, the volume of data held by organizations has grown significantly. The emergence of the Internet, social networking tools (e.g., Twitter, Facebook and LinkedIn), and online shopping sites allows organizations to capture huge volumes of business-related data. In 2014, the US government mandated the release of a large volume of privacy-protected healthcare and Medicare data, which researchers, policy makers, business organizations, and the general public could use for analysis and decision making (US Govt. Health and Human Services Office, 2014). With the advent of commodity hardware for processing big data, growth in computer processing power (thanks to Moore’s Law), the maturing of computer and software engineering, greater network bandwidth, and the steadily falling cost of data storage, companies are able to capture, process, transform, and store large volumes of data. Organizations find business value in this data and have come to rely on it in their decision-making processes.
Traditional Business Intelligence (BI) tools consist of reports, interactive queries, and Online Analytical Processing (OLAP), all of which provide intelligence about what happened in the past. Businesses now want to understand what is happening in the present and what will happen in the future, for example by using predictive analytics. With the increase in fraudulent activities, there is also a desire to detect fraud immediately, for example by applying anomaly detection to credit card transactions. This is where data mining techniques and algorithms come into the picture and play a prominent role in providing solutions to complex business problems.
Data mining techniques are used to discover previously unknown, valuable, and interesting patterns and relations in large data sets (Phyu, 2009). Given that the amount of data in enterprise data warehouses has been growing, it is apparent that data-driven decision making will help organizations achieve competitive advantage. However, using this data to achieve business success depends on efficient data mining methods (Wu et al., 2014) that help extract hidden knowledge and translate it into business value.
In this study, the author presents the data mining problems that correspond to solving business problems. These data mining problems include anomaly detection (Ogwueleka et al., 2011), prediction, classification, pattern recognition (Jain, 2010), sequence discovery, data visualization (Shaw et al., 2001), and recommendation systems. The author also presents the data mining techniques used to solve these problems: Bayesian networks, neural networks, decision trees, association rules, clustering, support vector machines, logistic regression, and k-nearest neighbors.
Over the last two decades, most research has focused on the theoretical and computational processes of data mining and knowledge discovery (Shaw et al., 2001). It is now time to evaluate those data mining techniques and classify them from the perspective of a data mining problem taxonomy. This will allow users to choose an optimal algorithm to solve a particular data mining problem. This paper addresses data mining problem taxonomy issues by presenting a complete taxonomy of data mining problems in the context of real-world applications.