Big Data Analytics Lifecycle

Copyright © 2024 | Pages: 19
DOI: 10.4018/979-8-3693-0413-6.ch003

Abstract

Big data analysis is the process of examining and extracting important insights from enormous, intricate datasets that are too diverse and massive to be processed with conventional data processing techniques. It entails gathering, storing, managing, and analyzing massive amounts of data to find patterns, trends, correlations, and other important information. Datasets that exhibit the three Vs (volume, velocity, and variety) are referred to as "big data." Volume refers to the vast amount of data produced from numerous sources, including social media, sensors, devices, transactions, and more. Velocity refers to the rate at which data is generated and must be processed, in real time or very close to it. Variety refers to data that differs in type and format, spanning structured, semi-structured, and unstructured data.

Introduction

Big data arose in recent years to meet the demands and challenges of expanding data volumes. Big data refers to the process of managing massive amounts of data from many sources, such as DBMSs, log files, social media posts, and sensor data (Bajaj et al., 2014). When we hear the term "big data," we immediately think of the massive amounts of data that must be stored and processed. Indeed, a defining feature of big data is a volume that can exceed an exabyte (10^18 bytes), necessitating unique storage solutions, high-performance data processing, and particular analytics capacity (Kaisler et al., 2013). Big data is a collection of complex datasets (text, numbers, photos, and videos) in volumes that exceed the capabilities of typical database management systems (Govindarajan et al., 2014).

Big data, in particular, has three primary characteristics: volume, velocity, and variety. Aside from the three Vs, other big data traits include value and complexity (Kaisler et al., 2013; Katal et al., 2013). The volume attribute denotes the amount of data. In general, big data has a vast volume of data that is beyond the capacity of typical storage systems. According to Bajaj et al. (2014), 90 percent of the world's current data was created in the last two years, with an average of 2.5 quintillion bytes of data created every day. The velocity aspect of big data relates to the rate at which data is generated and processed (Bajaj et al., 2014). Currently, data and information are generated and processed at a high pace, resulting in a massive amount of knowledge being contributed to the knowledge base; this velocity necessitates more processing power than older systems can offer. Furthermore, the term velocity also alludes to the rapid movement of data between data storage locations via networks (Bajaj et al., 2014).
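The velocity characteristic described above can be illustrated with a minimal sketch: rather than storing a whole dataset before analyzing it, a high-velocity pipeline maintains a bounded window over an unbounded stream and updates its result as each record arrives. The function name and the sample readings below are illustrative, not taken from the chapter.

```python
from collections import deque

def rolling_average(stream, window=3):
    """Maintain a fixed-size window over an unbounded stream of readings,
    yielding the running average after each new value arrives."""
    buf = deque(maxlen=window)  # old values are evicted automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

# Simulated high-velocity feed (e.g., sensor readings arriving one by one).
readings = [10, 20, 30, 40]
averages = list(rolling_average(readings, window=2))
# averages == [10.0, 15.0, 25.0, 35.0]
```

The key design point is bounded memory: the deque holds at most `window` values no matter how long the stream runs, which is what makes near-real-time processing feasible at scale.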

Variety is another important aspect of big data. The term "variety" refers to the various resources that generate data in different formats and types (Bajaj et al., 2014; Govindarajan et al., 2014; Kaisler et al., 2013; Katal et al., 2013). Digital photographs and videos, social media, sensor data, healthcare records, text, log files, tweets, and purchase transaction records are all examples of such resources. In other words, big data is made up of several data forms, including structured, unstructured, and semi-structured data.
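A short sketch of the three data forms named above, using only the Python standard library. The sample values are hypothetical; the point is that structured data carries a fixed schema, semi-structured data is self-describing, and unstructured data has no schema until the analyst imposes one.

```python
import csv
import io
import json

# Structured: a CSV row conforming to a fixed schema (id, amount).
csv_text = "id,amount\n1,250\n"
structured = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON is self-describing but has no rigid schema;
# fields and nesting can vary from record to record.
semi = json.loads('{"id": 2, "tags": ["sensor", "temperature"]}')

# Unstructured: free text; any structure (here, a word count)
# must be imposed during analysis.
unstructured = "Customer reported a delay in shipment."
word_count = len(unstructured.split())
```

Each form demands different handling downstream, which is why variety, not just volume, strains conventional database systems.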

Value and complexity are two further big data properties (Kaisler et al., 2013). The value attribute refers to the usefulness of the information (knowledge) that may be derived from processing and analyzing big data. This newly produced information is beneficial and supports decision-making (Katal et al., 2013). The complexity attribute refers to the complexity of the relationships and links within a large data structure. In this regard, even a few changes in an enormous dataset can cascade into a significant number of modifications (Katal et al., 2013).

Key Terms in this Chapter

Data Visualization: Data visualization involves the creation of visual representations, such as graphs, charts, plots, or infographics, with the purpose of facilitating comprehension and interpretation of data and analytical outcomes, hence enabling the extraction of practical insights.
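As a toy illustration of the idea in this entry, the sketch below renders category counts as a plain-text bar chart using only the standard library; real analytics work would use a charting library, and the function name and counts here are invented for the example.

```python
def bar_chart(counts, width=20):
    """Render category counts as a plain-text bar chart.
    The longest bar is scaled to `width` characters."""
    peak = max(counts.values())
    lines = []
    for label, n in counts.items():
        bar = "#" * round(n / peak * width)  # scale relative to the peak
        lines.append(f"{label:<10}{bar} {n}")
    return "\n".join(lines)

chart = bar_chart({"mobile": 120, "web": 80, "api": 40})
print(chart)
```

Even this crude rendering shows why visual encodings help: the relative magnitudes are apparent at a glance, without reading the numbers.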

Data Security and Privacy: Data security is the practice of protecting large-scale datasets from unauthorized access, use, disclosure, disruption, modification, or destruction. Data privacy is the practice of safeguarding individuals' rights to exercise control over their personal data.
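One common privacy technique, consistent with this entry, is pseudonymization: replacing a direct identifier with a salted hash so records can still be linked without exposing the raw value. The function name, salt, and record below are illustrative assumptions, and a fixed salt is shown only for reproducibility; production systems would manage salts or keys securely.

```python
import hashlib

def pseudonymize(value, salt="example-salt"):
    """Replace a direct identifier with a short salted hash.
    The same input always maps to the same token, so joins still work."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

record = {"email": "user@example.com", "purchase": 42.0}
safe = {**record, "email": pseudonymize(record["email"])}
```

The analytic field (`purchase`) survives untouched while the identifier is no longer readable, illustrating the balance between data utility and individual privacy.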

Data Analysis: Data analysis is the systematic examination, manipulation, and modelling of data in order to uncover valuable insights, propose valid conclusions, and facilitate informed decision-making. This process entails the utilization of analytical and statistical methodologies and algorithms to analyse and assess data.
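A minimal sketch of the statistical side of this entry, using the standard library's `statistics` module on a hypothetical week of transaction counts. The outlier rule (more than two standard deviations from the mean) is a deliberately simple illustration, not a method prescribed by the chapter.

```python
import statistics

# Hypothetical daily transaction counts for one week.
transactions = [120, 135, 128, 310, 126, 131, 129]

mean = statistics.mean(transactions)
median = statistics.median(transactions)
stdev = statistics.pstdev(transactions)  # population standard deviation

# A crude outlier check: values more than 2 standard deviations from the mean.
outliers = [x for x in transactions if abs(x - mean) > 2 * stdev]
```

Note how the median (robust to the spike) and the mean (pulled upward by it) disagree; choosing the right summary statistic is itself an analytical decision.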

Big Data: The term “big data” pertains to data sets that are characterized by their vast size and intricate nature, posing challenges for processing and analysis by conventional data processing techniques. The primary attributes associated with big data encompass: The generation of data from diverse sources has resulted in a substantial volume. In the realm of big data, data sets are typically characterized by their substantial size, sometimes measured in terabytes and petabytes. Diverse Range - Big data encompasses a diverse array of data sources and formats. The data may be organized in a structured format, such as databases, or in an unstructured format, encompassing emails, images, videos, and similar content. The velocity of data generation and processing in the context of big data is characterized by a high rate of speed. Streaming data plays a key role within the realm of big data. The veracity of data can be compromised by inconsistencies, noise, and irregularities. The task of maintaining the high quality and integrity of data presents significant challenges. The primary attribute of big data is in its capacity to reveal insights that contribute to company value, impact, and improved decision-making capabilities. The extraction of value from intricate data is of paramount importance.

Data Acquisition: Data acquisition refers to the systematic procedure of collecting, obtaining, or retrieving pertinent structured and unstructured data from diverse sources. The collection of this raw data is necessary prior to conducting any analysis.

Data Pre-Processing: Data pre-processing involves the necessary steps to prepare raw data for analysis. The process encompasses many techniques such as data cleansing, data integration, data reduction, data transformation, and data discretization, which are employed to ensure that the data aligns with the analysis prerequisites.
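The techniques listed in this entry can be sketched in a few lines. The example below performs cleansing (dropping incomplete rows), transformation (normalizing a text field), and discretization (binning a numeric field); the record schema, field names, and bin boundaries are all invented for illustration.

```python
def preprocess(records):
    """Clean raw records and discretize a numeric field into coarse bins."""
    def bin_age(age):
        # Discretization: map a continuous value onto categorical bins.
        return "young" if age < 30 else "middle" if age < 60 else "senior"

    cleaned = []
    for rec in records:
        if rec.get("age") is None:  # cleansing: discard incomplete rows
            continue
        cleaned.append({
            "name": rec["name"].strip().title(),  # transformation
            "age_group": bin_age(rec["age"]),     # discretization
        })
    return cleaned

raw = [{"name": " alice ", "age": 25},
       {"name": "BOB", "age": None},
       {"name": "carol", "age": 64}]
clean = preprocess(raw)
```

Each step trades raw fidelity for analytical convenience; which rows to drop and how to bin are decisions driven by the analysis prerequisites the entry mentions.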

Data Generation: Data generation refers to the process by which data is produced from diverse sources, including but not limited to social media platforms, sensors, and mobile devices. The ongoing generation of new data necessitates its continuous collection and analysis.
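A small simulation of the kind of continuous source this entry describes: a sensor emitting timestamped readings. The sensor id, field names, and value range are illustrative; a fixed seed is used only to keep the sketch reproducible.

```python
import random
import time

def generate_readings(n, sensor_id="s1", seed=42):
    """Simulate continuous data generation from one sensor: each record
    carries an id, a timestamp, and a temperature value."""
    rng = random.Random(seed)  # seeded for reproducibility of the sketch
    base = time.time()
    for i in range(n):
        yield {"sensor": sensor_id,
               "ts": base + i,  # one reading per second
               "temp": round(20.0 + rng.uniform(-2, 2), 2)}

batch = list(generate_readings(5))
```

Because the source never stops, real systems consume such a generator incrementally (streaming) rather than materializing the full history, which ties data generation back to the velocity attribute discussed in the introduction.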
