Big Data Analytics is a major branch of data science in which huge amounts of raw data are processed to gain insight into relevant business processes. Integrating big data and its analytics with Service Oriented Architecture (SOA) is the need of the hour; such integration brings reusability and scalability to various business processes. This chapter explains the concepts of Big Data and Big Data Analytics at the implementation level. It further describes Hadoop and its technologies, one of the popular frameworks for Big Data Analytics, and envisages integrating SOA with them. The chapter demonstrates SOA integration with Big Data through two case studies of different scenarios, which connect real-world implementation with theory and enable a better understanding of industrial-level processes and practices.
Big Data: An Introduction
The term Big Data is somewhat misleading: it does not simply label data as small or big, but refers to both the volume of the data and the types of data (structured or unstructured) in a system. Big Data is normally defined as a data set that is beyond the ability of traditional systems to process (Zikopoulos et al., 2011).
Evolution of Big Data and Beyond
As Figure 1 shows, the big data landscape encompasses a huge collection of technologies, architectures and concepts. The evolution of Big Data can be traced back to the dot-com period of the late 1990s. Over the course of this evolution, both the volume of data accumulated over the years and the rate at which data is generated have reached new highs. Big Data is generated by various sources, primarily social networks, but also the Internet of Things and high-end information and analysis systems such as airplane black boxes, DNA and forensic analysis, and stock markets.
The term Big Data was coined by “Gartner Inc.” in 2007. In 2012, Gartner defined it as follows: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization”. This definition suggests multiple V’s for characterizing Big Data; these V’s are discussed in the next section.
V’s of Big Data
The Vs are used to define Big Data (Sagiroglu et al., 2013). They can be stated as Volume, Value, Velocity, Veracity and Variety, as shown in Figure 2 (Verma et al., 2016). Each V is described below.
Volume
Volume measures the data in terms of size. The high rate of data generation, with many sources producing data in parallel, has led to a manifold increase in the amount of data.
Value
Value is the worth of the data to its users. For example, data about a discount offer on a certain product may have high value for a prospective customer, while the same data may have no value for an uninterested person.
Figure 2. Different Vs of big data with key points (Designed by the authors)
Velocity
Velocity describes the speed at which data flows into and out of the system. Real-time data entry and access is the general trend in most software-based solutions, so the rate of data flow through a system is very high.
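As a rough illustration of velocity, the arrival rate of records can be estimated by counting the events that fall inside a time window. The following is a minimal sketch, not a method taken from this chapter: the function name `arrival_rate` and the sample arrival times are hypothetical.

```python
def arrival_rate(event_times, window_start, window_seconds):
    """Return events per second inside [window_start, window_start + window_seconds)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    window_end = window_start + window_seconds
    # Count only the events whose arrival time falls inside the window.
    in_window = sum(1 for t in event_times if window_start <= t < window_end)
    return in_window / window_seconds


# Hypothetical arrival times, in seconds since an arbitrary reference point.
event_times = [0.1, 0.4, 0.9, 1.2, 1.3, 1.8, 2.5]
rate = arrival_rate(event_times, window_start=0.0, window_seconds=2.0)
print(rate)  # six events in a two-second window
```

A production system would typically compute such a rate over a sliding window on a live stream rather than over a list held in memory.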
Variety
Variety specifies the different types of data, such as text, images, video and metadata (Sagiroglu et al., 2013). The data may be structured or unstructured, and both types are used together. For example, a social networking application presents text, images, video and metadata in a single page or view.
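The social networking example can be sketched as a single page whose items are of different kinds. This is an illustrative sketch only: the record layout, the `summarize_variety` helper, and the choice to treat only metadata as structured are assumptions, not something defined in this chapter.

```python
from collections import Counter

# Assumption for this sketch: only metadata is treated as structured data.
STRUCTURED_KINDS = {"metadata"}


def summarize_variety(items):
    """Count items per kind and split the total into structured and unstructured."""
    kinds = Counter(item["kind"] for item in items)
    structured = sum(n for kind, n in kinds.items() if kind in STRUCTURED_KINDS)
    return {
        "by_kind": dict(kinds),
        "structured": structured,
        "unstructured": len(items) - structured,
    }


# Hypothetical contents of one social networking page.
page = [
    {"kind": "text", "payload": "Great trip!"},
    {"kind": "image", "payload": "beach.jpg"},
    {"kind": "video", "payload": "surf.mp4"},
    {"kind": "metadata", "payload": {"likes": 12}},
]
summary = summarize_variety(page)
print(summary)
```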
Veracity
Veracity relates to the uncertainty and reliability of the data. It concerns data whose trustworthiness, and that of its source, is tested before the data is subsequently processed.