Multi-Objective Big Data View Materialization Using Improved Strength Pareto Evolutionary Algorithm

Akshay Kumar, T. V. Vijay Kumar
Copyright: © 2022 |Pages: 23
DOI: 10.4018/JITR.299947

Abstract

Big data refers to the enormous heterogeneous data being produced at a brisk pace by a large number of diverse data-generating sources. Since traditional data processing technologies are unable to process big data efficiently, big data is processed using newer distributed storage and processing frameworks. Big data view materialization is a technique for processing big data queries efficiently on these distributed frameworks. It generates valuable information, which can be used to take timely decisions, especially in cases of disasters. As there is a very large number of big data views, it is not possible to materialize all of them. Therefore, a subset of big data views needs to be selected for materialization, one that optimizes the query response time for a given set of workload queries with minimum overheads. This big data view materialization problem, whose objectives are to minimize the query evaluation cost of a set of workload queries while simultaneously minimizing the update processing cost of the materialized views, is addressed in this paper using the improved strength Pareto evolutionary algorithm (SPEA-2). The proposed big data view selection algorithm, which is able to compute a set of diverse non-dominated big data views, is shown to perform better than existing big data view selection algorithms.
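
The notion of "non-dominated" view selections in the abstract can be illustrated with a minimal sketch. This is not the paper's SPEA-2 implementation; it shows only the Pareto dominance relation over the two objectives named above. The view names and cost values are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# A candidate is (view subset label, query evaluation cost, update processing cost).
# All labels and numbers below are hypothetical illustration values.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both costs and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (the Pareto front)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [
    ("{V1}", 90.0, 10.0),
    ("{V1,V2}", 60.0, 25.0),
    ("{V2,V3}", 70.0, 30.0),     # dominated by {V1,V2}: worse on both costs
    ("{V1,V2,V3}", 40.0, 55.0),
    ("{V3}", 95.0, 12.0),        # dominated by {V1}: worse on both costs
]

front = non_dominated(candidates)
for label, query_cost, update_cost in front:
    print(label, query_cost, update_cost)
```

Each surviving candidate represents a different trade-off: materializing more views lowers the query evaluation cost but raises the update processing cost, so none of the remaining selections is better than another on both objectives.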

1 Introduction

Advances in Information and Communication Technology (ICT) have impacted almost every aspect of human life, including health, education, commerce, agriculture and scientific exploration, triggering the generation of large amounts of data. Such Big data has a very large volume; is produced by a variety of sources; is generated at high speed, or velocity; generally has low veracity, or trustworthiness; and has low value. These characteristics are referred to as the five V's of Big data (Jacobs, 2009; Zikopoulos et al., 2011; Gupta et al., 2012; Gandomi & Haider, 2015; Kumar & Vijay Kumar, 2015). Big data cannot be processed efficiently by traditional technologies. This led to the emergence of Big data processing frameworks, which entail distributed processing over a Distributed File System (DFS). Technologies and frameworks that can be used to process Big data include Apache Hadoop, the map-reduce framework, NoSQL databases, Apache Spark etc. (Hadoop, 2008; Hadoop, 2012; Manyika, 2011; Dezyre, 2015; Dean & Ghemawat, 2012; Kumar & Vijay Kumar, 2021a). These frameworks provide redundant Big data storage along with reliable distributed processing of Big data.

Big data processing has the potential to provide useful, unforeseeable information, which can benefit society in many different ways. For example, healthcare systems generate large amounts of clinical, diagnostic, medical imaging and public health data received from a large number of hospitals and health centers, which can be used to predict and monitor the spread or outbreak of infectious diseases (Luo et al., 2016). One of the most recent applications of Big data involves determining the extent and possible future spread of the coronavirus disease COVID-19, which is threatening human health internationally. This Big data application has faced the technological challenges of integrating redundant data from multiple diverse data sources and processing such geographically spread Big data in real time (Zhou et al., 2020). Another interesting application of Big data, proposed in (Bibri, 2018), relates to the use of Internet of Things (IoT) devices in smart cities. IoT devices can produce large amounts of Big data in smart cities, relating to people's health, water systems, electrical appliances, vehicles, machines, plants, soil, air etc. This Big data can be processed to determine the environmental impact of smart cities, which can be used to model environmentally sustainable cities (Bibri, 2018). Thus, Big data applications can be created to facilitate healthcare, disease control, resource management, environmental protection etc. A Big data application is required to collect, clean, integrate, store, process and analyze data, and to present information in various visual forms. Further, it must generate accurate real-time information, as incorrect or delayed information has no value, especially in the case of disasters. Thus, a Big data application must process data efficiently to produce information that can be used for making timely decisions.
