1. Introduction
A practical definition from Gartner for Big Data is “…high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision-making, and process automation…” (Gartner, n.d.). The 3Vs of Big Data – volume, variety and velocity – originally defined by Gartner, have since been substantially expanded with veracity, value, validity, variability and even visualization, leading to the term “V confusion” (Grimes, 2013). To avoid this “confusion”, the authors wish to clarify that the complexity of the data set (due to the volume and variety of the data), together with the advanced analytics, generates the need for “velocity”, i.e. processing power, to support the software used to analyze the data and to change and integrate it in support of new insights and new uses, which together constitute Big Data goals (Lokshina, Durkin, & Lanting, 2017). Hence “velocity” is not an inherent characteristic of the data sets, but rather indicative of the complexity of the data set combinations and analytical goals associated with Big Data.
Big Data receives a lot of press and attention, and rightly so. Big Data, the combination of greater size and complexity of data with advanced analytics (Manyika et al., 2011; Boyd & Crawford, 2012; Rubinstein, 2012; Hartzog & Selinger, 2013; Lokshina, Durkin, & Lanting, 2017; Lokshina & Lanting, 2018), has been effective in:
- Improving national security
- Detecting tax evasion and black money streams
- Making marketing more effective
- Reducing credit risks
- Improving medical research
- Facilitating urban planning
In leveraging observable characteristics and events, Big Data combines information from diverse sources in new ways to create knowledge, make better predictions and tailor services (Mayer-Schonberger & Cukier, 2013; Lokshina, Durkin, & Lanting, 2017; Lokshina & Lanting, 2018). Governments can serve their citizens better, hospitals can become safer, firms can extend credit to those previously excluded from the market, law enforcement can detect more criminal activity, and nations can become safer.
Yet, Big Data (sometimes included in academic circles under “data analytics”) has been criticized for points of weakness, such as:
- Difficulties in obtaining suitable data to complement “normal” data sets (Manyika et al., 2011; Agrawal et al., 2012)
- High cost to generate and analyze Big Data, attributable to the large number of programmer-analysts involved in the data and analytics generation process (Gartner, 2014; Lokshina, Durkin, & Lanting, 2017)
- Slow decisions because of processing large data sets and tuning data analysis (Manyika et al., 2011; Agrawal et al., 2012)
- Limited reproducibility and repeatability due to limited control over large data sets (Manyika et al., 2011; Agrawal et al., 2012)
- Requirement for new tactics and strategies, as well as new accounting rules, to capture the value and risks created in new transactions, for firms monetizing the value of data (Monga, 2014)
- Market difficulties (Boyd & Crawford, 2012; Hartzog & Selinger, 2013), for example:
  - Data aggregators and brokers (Duhigg, 2012; Hill, 2013)
  - Analysis tools, generic and market/application specific (Chen et al., 2012; Blake, 2017)
- Ethical concerns (Rubinstein, 2012; Wen, 2012), such as:
  - Questionable data uses, not respecting original intended use, anonymity and privacy (Calo, 2011; Calo, 2013a; Lokshina & Thomas, 2013; Lokshina, Durkin, & Lanting, 2017; Lokshina & Lanting, 2018)
  - Shared data sets (Manyika et al., 2011; Agrawal et al., 2012)
  - Inappropriate and unjustifiable conclusions, creating tendentious assumptions and “fake news” (Eastwood, 2017; Ortutay, 2017; Chahal, 2017), etc.