Multi-Objective Big Data View Materialization Using NSGA-III

Akshay Kumar, T. V. Vijay Kumar
Copyright: © 2022 | Pages: 28
DOI: 10.4018/IJDSST.311066

Abstract

Present-day applications process large amounts of data that are produced at a brisk rate and are heterogeneous, with varying levels of trustworthiness. This Big data largely consists of semi-structured and unstructured data, which must be processed within an admissible time so that timely decisions can be taken that benefit the organization and society. Such real-time processing requires Big data view materialization, which enables faster and timelier processing of decision-making queries. Several algorithms exist for Big data view materialization. These algorithms aim to select Big data views that minimize the total query processing cost for the query workload. In the literature, this problem has been articulated as a bi-objective optimization problem, which minimizes the query evaluation cost along with the update processing cost. This paper adapts the reference point based non-dominated sorting genetic algorithm (NSGA-III) to design an NSGA-III based Big data view selection algorithm (BDVSANSGA-III) that addresses this bi-objective Big data view selection problem. Experimental results revealed that the proposed BDVSANSGA-III was able to compute diverse non-dominated Big data views and performed better than the existing algorithms.

1. Introduction

Information, in the present-day world, is one of the key resources that businesses and governments use to make informed decisions. A large amount of data is produced by a variety of sources, such as database transactions, human-computer interactions, social media interactions, machines, sensors and the internet of things (IoT), medical systems, and government data. This data is, in general, extremely large, and processing it requires advanced processing technologies. This Big data is characterized by its large Volume, its heterogeneity or Variety, its rapid rate of data generation or Velocity, a reasonable level of trustworthiness or Veracity, and its decision-making capability or Value. These five characteristics are also referred to as the 5 Vs of Big data (Jacobs, 2009; Zikopoulos et al., 2011; Gandomi & Haider, 2015; Kumar & Vijay Kumar, 2015). Big data has enhanced the decision-making capabilities of business organizations (Niu et al., 2021). Araujo et al. (2020) discuss trust in the decisions made by applying artificial intelligence techniques to Big data. Big data consists of structured, semi-structured and unstructured data. However, the volume of unstructured and semi-structured data is very large in comparison to that of structured data. Therefore, Big data is stored using a distributed file system (DFS), which uses large storage blocks (DocumentationH 2008; Hadoop 2012; Dean & Ghemawat, 2012). Processing this distributed data requires coordination and communication amongst the various distributed processing units. This led to the development of several distributed systems for Big data storage and processing, such as Hadoop and the MapReduce framework (DocumentationH 2008; Hadoop 2012; Manyika 2011; Dean & Ghemawat, 2012), Apache Hadoop, Apache Spark and cloud-based MapReduce frameworks (Dahiphale et al., 2014; Dezyre, 2015). In addition, a large number of NoSQL databases and Big data warehousing tools were developed to store and process Big data.
Decision making through Big data processing has to address many challenges in order to succeed. These challenges mostly fall into the categories of data collection, data pre-processing, data processing and visualization of Big data. This paper attempts to address the issue of efficient Big data processing (Zhou et al., 2020; Shneiderman, 2020), which has the potential to support informed and timely decision-making. Big data processing distributes the processing tasks to a large number of connected data processing nodes, each of which holds part of the data to be processed. These nodes can process data using deep learning algorithms by performing computation on extremely large data sets, which may be generated by IoT-enabled smart cities (Xiaoming et al., 2022) or 6G-enabled massive IoT devices (Lv et al., 2021). Such computation is scheduled and coordinated by a master node, which also monitors the execution of the Big data processing tasks. Big data view materialization (BDVM) is one of the mechanisms that improve the efficiency of Big data processing. There can be a very large number of Big data views (BDVs), from which only a subset that optimizes the query processing costs is selected for materialization. For workload queries, the BDVM problem was formulated as a single-objective constrained optimization problem in Kumar & Vijay Kumar (2021a) and as a bi-objective constrained optimization problem in Kumar & Vijay Kumar (2021b). The objectives of this bi-objective Big data view materialization (BiBDVM) problem are the minimization of the query evaluation (query processing) cost of a set of workload queries and the minimization of the view update (view maintenance) cost of the materialized views. The BiBDVM problem has a constraint on the total size of the BDVs. It has been addressed using several multi-objective evolutionary algorithms, viz. the vector evaluated genetic algorithm (VEGA), the multi-objective genetic algorithm (MOGA), the strength Pareto evolutionary algorithm (SPEA-2) and the non-dominated sorting genetic algorithm-II (NSGA-II) (Kumar & Vijay Kumar, 2021b, 2021c, 2021d, 2021e). This paper adapts the reference point based non-dominated sorting genetic algorithm (NSGA-III) (Deb & Jain, 2014) to address the BiBDVM problem given in (Kumar & Vijay Kumar, 2021b).
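
To make the structure of the bi-objective problem concrete, the following is a minimal sketch in Python of what "non-dominated" view selections look like under a size constraint. It is not the BDVSANSGA-III algorithm and does not reproduce the cost model of Kumar & Vijay Kumar (2021b): the view names, sizes, savings, update costs, `BASE_QUERY_COST` and `SIZE_LIMIT` are all hypothetical toy values, and the additive cost functions are an assumption made purely for illustration. The sketch enumerates every feasible subset exhaustively, which is only possible for a tiny example; an evolutionary algorithm such as NSGA-III is used precisely because the real search space is too large for this.

```python
from itertools import combinations

# Hypothetical toy views (NOT from the paper): size, query-cost saving
# when materialized, and update (maintenance) cost.
VIEWS = {
    "v1": {"size": 4, "saving": 10, "update": 3},
    "v2": {"size": 3, "saving": 6, "update": 1},
    "v3": {"size": 5, "saving": 12, "update": 6},
}
BASE_QUERY_COST = 40  # assumed workload cost with no views materialized
SIZE_LIMIT = 8        # assumed bound on the total size of selected views


def evaluate(selection):
    """Return (query_cost, update_cost); both objectives are minimized."""
    query_cost = BASE_QUERY_COST - sum(VIEWS[v]["saving"] for v in selection)
    update_cost = sum(VIEWS[v]["update"] for v in selection)
    return (query_cost, update_cost)


def is_feasible(selection):
    """Check the constraint on the total size of the selected views."""
    return sum(VIEWS[v]["size"] for v in selection) <= SIZE_LIMIT


def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def feasible_selections():
    """Enumerate every subset of views that satisfies the size constraint."""
    names = sorted(VIEWS)
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            if is_feasible(combo):
                yield combo


def non_dominated(candidates):
    """Keep only selections whose cost pair is dominated by no other."""
    scored = {c: evaluate(c) for c in candidates}
    return {
        c: s
        for c, s in scored.items()
        if not any(dominates(t, s) for t in scored.values())
    }


front = non_dominated(list(feasible_selections()))
for selection, costs in sorted(front.items(), key=lambda item: item[1]):
    print(selection, costs)
```

In this toy instance, selecting only `v3` is dominated by selecting `v1` and `v2` together, which has both a lower query cost and a lower update cost, so it drops out of the front. The remaining selections each trade one objective against the other, which is the kind of trade-off set that a multi-objective algorithm aims to approximate.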
