1. Introduction
In the present-day world, information is a key resource that businesses and governments use to make informed decisions. Large amounts of data are produced by a variety of sources, such as database transactions, human-computer interactions, social media interactions, machines, sensors and the internet of things (IoT), medical systems and Government data. This data is generally extremely large, and processing it requires advanced processing technologies. Such Big data is characterized by its large Volume, its heterogeneity or Variety, its rapid rate of generation or Velocity, a reasonable level of trustworthiness or Veracity, and its decision-making capability or Value. These five characteristics are referred to as the 5 Vs of Big data (Jacobs, 2009; Zikopoulos et al., 2011; Gandomi & Haider, 2015; Kumar & Vijay Kumar, 2015). Big data has enhanced the decision-making capabilities of business organizations (Niu et al., 2021). Araujo et al. (2020) discuss trust in decisions made by applying artificial intelligence techniques to Big data. Big data consists of structured, semi-structured and unstructured data, with the unstructured and semi-structured portions being far larger than the structured portion. Therefore, Big data is stored using a distributed file system (DFS), which uses large storage blocks (DocumentationH 2008; Hadoop 2012; Dean & Ghemawat, 2012). Processing this distributed data requires coordination and communication among the distributed processing units. This has led to the development of several distributed systems for Big data storage and processing, such as Apache Hadoop and the MapReduce framework (DocumentationH 2008; Hadoop 2012; Manyika 2011; Dean & Ghemawat, 2012), Apache Spark, and cloud-based MapReduce frameworks (Dahiphale et al., 2014; Dezyre, 2015). In addition, a large number of NoSQL databases and Big data warehousing tools have been developed to store and process Big data.
Decision making through Big data processing has to address many challenges in order to succeed. These challenges mostly fall into the categories of data collection, data pre-processing, data processing and visualization of Big data. This paper addresses the issue of efficient Big data processing (Zhou et al., 2020; Shneiderman, 2020), which has the potential to support informed and timely decision-making. Big data processing distributes the processing tasks across a large number of connected data processing nodes, each of which holds part of the data to be processed. These nodes can run deep learning algorithms over extremely large data, which may be generated by IoT-enabled smart cities (Xiaoming et al., 2022) or 6G-enabled massive IoT devices (Lv et al., 2021). Such computation is scheduled and coordinated by a master node, which also monitors the execution of the Big data processing tasks. Big data view materialization (BDVM) is one of the mechanisms that improve the efficiency of Big data processing. The number of possible Big data views (BDVs) can be very large, so only a subset of views that optimizes the query processing cost is selected for materialization. For workload queries, the BDVM problem was formulated as a single-objective constrained optimization problem in (Kumar & Vijay Kumar, 2021a) and as a bi-objective constrained optimization problem in (Kumar & Vijay Kumar, 2021b). The objectives of this bi-objective Big data view materialization (BiBDVM) problem are to minimize the query evaluation (query processing) cost of a set of workload queries and to minimize the view update (view maintenance) cost of the materialized views, subject to a constraint on the total size of the BDVs. The BiBDVM problem has been addressed using several multi-objective evolutionary algorithms, viz. the vector evaluated genetic algorithm (VEGA), the multi-objective genetic algorithm (MOGA), the strength Pareto evolutionary algorithm (SPEA-2) and the non-dominated sorting genetic algorithm-II (NSGA-II) (Kumar & Vijay Kumar, 2021b, 2021c, 2021d, 2021e). This paper adapts the reference point based non-dominated sorting genetic algorithm (NSGA-III) (Deb & Jain, 2014) to address the BiBDVM problem given in (Kumar & Vijay Kumar, 2021b).
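To make the structure of the BiBDVM problem concrete, the following minimal Python sketch evaluates both objectives for candidate view subsets under a size constraint and keeps the non-dominated subsets. It is an illustration only, not the formulation or the algorithm of the cited papers: the view names, sizes, costs, the size limit and the simple cost model (each query is answered from its cheapest available source) are all hypothetical assumptions, and exhaustive enumeration stands in for an evolutionary algorithm such as NSGA-III, which would be needed for realistic numbers of views.

```python
from itertools import combinations

# Hypothetical toy instance: every name and number below is an assumption.
VIEW_SIZE = {"v1": 4, "v2": 3, "v3": 2}         # storage needed per view
VIEW_UPDATE_COST = {"v1": 5, "v2": 2, "v3": 1}  # maintenance cost per view
BASE_QUERY_COST = {"q1": 10, "q2": 8, "q3": 6}  # cost with no view available
# Cost of answering a query from a materialized view (only listed pairs apply).
QUERY_COST_WITH_VIEW = {
    ("q1", "v1"): 2,
    ("q2", "v1"): 5,
    ("q2", "v2"): 3,
    ("q3", "v3"): 1,
}
SIZE_LIMIT = 6  # constraint on the total size of the materialized views


def objectives(selected):
    """Return (query evaluation cost, view update cost) for a set of views."""
    query_cost = 0
    for query, base in BASE_QUERY_COST.items():
        options = [base] + [
            QUERY_COST_WITH_VIEW[(query, view)]
            for view in selected
            if (query, view) in QUERY_COST_WITH_VIEW
        ]
        query_cost += min(options)  # use the cheapest available source
    update_cost = sum(VIEW_UPDATE_COST[view] for view in selected)
    return (query_cost, update_cost)


def is_feasible(selected):
    """Check the total-size constraint."""
    return sum(VIEW_SIZE[view] for view in selected) <= SIZE_LIMIT


def dominates(a, b):
    """True if a is no worse than b in both objectives and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front():
    """Enumerate all feasible view subsets and keep the non-dominated ones."""
    views = sorted(VIEW_SIZE)
    candidates = [
        frozenset(combo)
        for r in range(len(views) + 1)
        for combo in combinations(views, r)
    ]
    feasible = {s: objectives(s) for s in candidates if is_feasible(s)}
    return {
        s: f
        for s, f in feasible.items()
        if not any(dominates(g, f) for g in feasible.values())
    }


if __name__ == "__main__":
    for subset, (q_cost, u_cost) in sorted(pareto_front().items(), key=lambda kv: kv[1]):
        print(sorted(subset), "query cost:", q_cost, "update cost:", u_cost)
```

In this toy instance the two objectives pull in opposite directions: materializing more views lowers the query cost but raises the update cost, so the result is a set of trade-off solutions rather than a single optimum.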