“Big data” describes innovative techniques and technologies for capturing, storing, distributing, managing, and analyzing datasets that traditional data management methods are normally unable to handle. The concept of “big data” was first defined by Laney in a research note (Laney, 2001). According to that definition, big data is characterized mainly by three Vs: Volume, Velocity, and Variety (Zikopoulos et al., 2012).

The first V, volume, refers to the size of the data. Generally speaking, big data sets are huge compared to regular data, although there is no fixed threshold for how large a dataset must be to qualify as big data; the threshold may vary across disciplines. Traditional software can usually handle megabyte- and even gigabyte-sized data sets, whereas big data tools should be able to handle terabyte- and petabyte-sized data sets.

The second V, velocity, refers to data that is created dynamically and must be accessed quickly: data may arrive as frequently as every second, and access often has to complete in a fraction of a second. Sometimes processing must be done in real time, so the software system must sustain a high throughput.

The third V, variety, refers to data heterogeneity, which makes big data sets harder to organize and analyze. The data typically collected by researchers or businesses is strictly structured, such as data entered into a spreadsheet with specific rows and columns. Big data sets, however, often contain unstructured data and mixed data types, such as email messages or free-text notes.
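To make the variety point concrete, the following is a minimal Python sketch contrasting the two kinds of data; the field names, email text, and regular expression are invented for this illustration and do not come from the article:

```python
import csv
import io
import re

# Structured data: every value sits in a known row and column, so a
# generic parser can read it with no domain knowledge.
spreadsheet = io.StringIO("region,year,sales\nEMEA,2011,1200\nAPAC,2011,950\n")
for row in csv.DictReader(spreadsheet):
    print(row["region"], int(row["sales"]))

# Unstructured data: a free-text email message. There is no schema, so
# extracting the same figures requires an ad-hoc rule (here, a regular
# expression for dollar amounts) that is brittle and domain-specific.
email_body = "Hi team, Q3 sales reached $1,200 in EMEA and about $950 in APAC."
amounts = [int(m.replace(",", "")) for m in re.findall(r"\$([\d,]+)", email_body)]
print(amounts)  # -> [1200, 950]
```

The contrast illustrates the variety challenge: structured parsing generalizes across datasets, while extraction from unstructured sources must be handcrafted for each one.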