A Multi-Objective Approach to Big Data View Materialization

Akshay Kumar, T. V. Vijay Kumar
Copyright: © 2021 |Pages: 21
DOI: 10.4018/IJKSS.2021040102

Abstract

Big data comprises voluminous and heterogeneous data that has a limited level of trustworthiness. This data is used to generate valuable information that can be used for decision making. However, decision-making queries on Big data take a long time to process, resulting in high response times. For effective and efficient decision making, this response time needs to be reduced. View materialization has been used successfully to reduce the query response time in the context of a data warehouse. Selection of such views is a complex problem vis-à-vis Big data and is the focus of this paper. In this paper, the Big data view selection problem is formulated as a bi-objective optimization problem, with the two objectives being the minimization of the query evaluation cost and the minimization of the update processing cost. Accordingly, a Big data view selection algorithm that selects Big data views for a given query workload, using the vector evaluated genetic algorithm, is proposed. The proposed algorithm aims to generate views that are able to reduce the response time of decision-making queries.

1. Introduction

Big data analysis is an essential element of the Business Intelligence (BI) process, which generates beneficial information that enables an organization to take appropriate and timely decisions. Big data is collected from various sources, such as transactional data, e-commerce data, social media, scientific explorations and IoT devices. A Big data application must efficiently process large amounts of data, which are collected and integrated from a large number of unverifiable sources. The Big data sources, which could be heterogeneous (structured, semi-structured and unstructured), generate data at a brisk pace, with such data having varying levels of integrity (Gupta et al., 2012; Jacobs, 2009; Kumar & Vijay Kumar, 2015; Zikopoulos, 2011). The raw Big data is cleaned, collated and analyzed to make it more reliable and valuable so that it can be used for effective and efficient business decision making. However, this high value visual information should be valid and should not be vulnerable to heterogeneity, considering the low veracity of Big data (Firican, 2017; Gandomi & Haider, 2015; Khan et al., 2014).

A Big data application must efficiently process the large and heterogeneous data generated from various sources. Big data view materialization is a technique that can optimize the processing time of Big data queries, even as Big data continues to receive updates in real time. View materialization, which is a widely studied problem for various types of database systems, is concerned with the identification of sets of views which, when materialized, would optimize the query processing time and the resource utilization, even as the data continues to receive updates. This has been shown to be an NP-Hard problem (Harinarayan et al., 1996). View materialization was first studied in the context of RDBMSs and data warehouses (Chirkova et al., 2001; Gupta, 1996; Harinarayan et al., 1996; Mami & Bellahsene, 2012; Roussopoulos, 1998). Empirical (Agrawal et al., 2000), heuristic (Gupta, 1996; Harinarayan et al., 1996) and meta-heuristic (Goswami et al., 2017; Arun & Vijay Kumar, 2015a, 2015b, 2017a, 2017b; Vijay Kumar & Arun, 2016, 2017; Vijay Kumar & Kumar, 2014, 2015; Kumar & Vijay Kumar, 2018) techniques have been used to address this problem.

One of the key characteristics of materializing views over a data warehouse was the ease of data representation due to the structured nature of the data. However, a large portion of Big data is semi-structured or unstructured. Big data is voluminous, has a high rate of growth and is low on authenticity. In addition, Big data is not directly processed by structured tools like RDBMSs and data warehouses. Rather, a large number of frameworks and tools are being used to store and process Big data. These include the Hadoop distributed file system (HDFS), the map-reduce framework, Apache Hadoop, the Apache Spark framework (Dean & Ghemawat, 2012; Dezyre, 2015; Hadoop, 2008; Hadoop, 2012; Manyika et al., 2011) and many other tools, including NoSQL databases, Hive, BigTable and Neo4j. As Big data is voluminous, it is stored and processed using distributed systems. Thus, the Big data view materialization problem needs to be addressed for distributed file systems (DFS).

This paper defines the Big data view materialization problem as a bi-objective optimization problem, with the objectives being to minimize the query evaluation cost of workload queries and to minimize the update processing cost of the materialized views, subject to a constraint on the total size of the materialized views. These two objectives generally conflict with each other, as minimizing the query evaluation cost may lead to an increase in the update processing cost and vice versa. This paper uses the vector evaluated genetic algorithm (VEGA), a multi-objective evolutionary algorithm given in (Schaffer, 1985), to select Big data views that can reduce the response times for the workload queries. Accordingly, a VEGA based Big data view selection algorithm that selects Big data views for a given query workload is proposed herein.
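
The general VEGA scheme can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper, and everything specific in it is an assumption made for illustration: the additive cost model (each candidate view has a fixed query-cost saving, update cost and size), the random repair step that enforces the size constraint, and all function names and parameter values. What it does follow from VEGA is the core idea of filling each share of the mating pool by proportional selection on a single objective, then shuffling the pool before crossover and mutation.

```python
import random

# Hypothetical cost model: a candidate solution is a 0/1 mask over candidate views.
def query_cost(mask, base_cost, savings):
    # Workload cost falls by a fixed (assumed) saving for each materialized view.
    return base_cost - sum(s for bit, s in zip(mask, savings) if bit)

def update_cost(mask, updates):
    # Each materialized view adds its (assumed) maintenance cost.
    return sum(u for bit, u in zip(mask, updates) if bit)

def make_objectives(base_cost, savings, updates):
    return [lambda m: query_cost(m, base_cost, savings),
            lambda m: update_cost(m, updates)]

def repair(mask, sizes, limit, rng):
    # Assumed constraint handling: drop random views until the size limit holds.
    mask = list(mask)
    while sum(s for bit, s in zip(mask, sizes) if bit) > limit:
        mask[rng.choice([i for i, bit in enumerate(mask) if bit])] = 0
    return mask

def vega(savings, updates, sizes, limit, base_cost,
         pop_size=20, generations=40, p_mut=0.05, seed=0):
    # pop_size is assumed to be even, and at least two candidate views are needed.
    rng = random.Random(seed)
    n = len(sizes)
    objectives = make_objectives(base_cost, savings, updates)
    pop = [repair([rng.randint(0, 1) for _ in range(n)], sizes, limit, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pool = []
        share = pop_size // len(objectives)
        for f in objectives:
            # One share of the mating pool per objective (the VEGA step).
            costs = [f(m) for m in pop]
            worst = max(costs)
            # Minimization: lower cost gets a larger selection weight.
            weights = [worst - c + 1e-9 for c in costs]
            pool.extend(rng.choices(pop, weights=weights, k=share))
        rng.shuffle(pool)  # mix the per-objective sub-populations
        nxt = []
        for a, b in zip(pool[::2], pool[1::2]):
            cut = rng.randrange(1, n)  # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                child = [1 - bit if rng.random() < p_mut else bit for bit in child]
                nxt.append(repair(child, sizes, limit, rng))
        pop = nxt
    return pop

def pareto_front(pop, objectives):
    # Keep only the solutions that no other solution dominates.
    scored = {tuple(m): tuple(f(m) for f in objectives) for m in pop}
    def dominated(s):
        return any(o != s and all(x <= y for x, y in zip(o, s))
                   for o in scored.values())
    return {m: s for m, s in scored.items() if not dominated(s)}
```

Calling `vega(...)` with per-view savings, update costs and sizes returns the final population; passing it to `pareto_front` together with `make_objectives(...)` keeps the non-dominated trade-offs between query evaluation cost and update processing cost. Because selection in each share looks at only one objective, VEGA is known to favour solutions that are strong on a single objective, which is one reason the final non-dominated filter is useful.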

This paper is organized as follows: Section 2 gives a brief account of the view materialization problem for different data types and database management systems. View materialization in the context of Big data is discussed in section 3. Section 4 discusses the formulation of the Big data view materialization problem as a bi-objective Big data view selection problem. The VEGA based Big data view selection algorithm is given in section 5, followed by an example illustrating its use to select Big data views in section 6. Experimental results are discussed in section 7. Section 8 is the conclusion.
