1 Introduction
Advances in Information and Communication Technology (ICT) have affected almost every aspect of human life, including health, education, commerce, agriculture, and scientific exploration, triggering the generation of large amounts of data. This Big data has a very large volume; is produced by a variety of sources; is generated at high speed, or velocity; generally has low veracity, or trustworthiness; and has low value. These characteristics are referred to as the five V's of Big data (Jacobs, 2009; Zikopoulos et al., 2011; Gupta et al., 2012; Gandomi & Haider, 2015; Kumar & Vijay Kumar, 2015). Big data cannot be processed efficiently by traditional technologies. This led to the emergence of Big data processing frameworks, which perform distributed processing over a Distributed File System (DFS). Technologies and frameworks that can be used to process Big data include Apache Hadoop, the MapReduce framework, NoSQL databases, and Apache Spark (Hadoop, 2008; Hadoop, 2012; Manyika, 2011; Dezyre, 2015; Dean & Ghemawat, 2012; Kumar & Vijay Kumar, 2021a). These frameworks provide redundant storage of Big data along with reliable distributed processing.
Big data processing has the potential to provide useful, previously unforeseeable information that can benefit society in many different ways. For example, healthcare systems generate large amounts of clinical, diagnostic, medical imaging, and public health data from a large number of hospitals and health centers, which can be used to predict and monitor the spread or outbreak of infectious diseases (Luo et al., 2016). One of the most recent applications of Big data involves determining the extent and possible future spread of the coronavirus disease COVID-19, which is threatening human health internationally. This application has faced the technological challenges of integrating redundant data from multiple diverse data sources and processing such geographically distributed Big data in real time (Zhou et al., 2020). Another interesting application of Big data, proposed by Bibri (2018), relates to the use of Internet of Things (IoT) devices in smart cities. IoT devices in smart cities can produce large amounts of data relating to people's health, water systems, electrical appliances, vehicles, machines, plants, soil, air, and so on. This Big data can be processed to determine the environmental impact of smart cities, which in turn can be used to model environmentally sustainable cities (Bibri, 2018). Thus, Big data applications can be created to facilitate healthcare, disease control, resource management, environmental protection, and similar goals. A Big data application is required to collect, clean, integrate, store, process, and analyze data, and to present the resulting information in various visual forms. Further, it must generate accurate information in real time, as incorrect or delayed information has no value, especially in the case of disasters. Thus, a Big data application must process data efficiently to produce information that can be used for making timely decisions.