Introduction
The Internet of Things (IoT) emerged from progress in wireless communication. IoT transforms the interconnection among objects such as smartphones, PCs, tablets, smart sensors, wearable devices and household appliances. Its advantage is that connected items become intelligent and are equipped to communicate with one another and with humans (Al-Fuqaha et al., 2015). IoT draws on many essential technologies, such as embedded devices, communication technologies, Internet protocols and applications, and sensor networks, to convert objects in the environment from traditional to smart. IoT has been identified as one of the most powerful tools for the future of Information and Communication Technology (ICT). Physical devices in the IoT environment can hear, see, think, talk (with other entities), share information, and coordinate decisions. The world itself continues to transition to a more complex form of connectivity, that is, “the Internet of People, Things, and Services (IoPTS)” (Shinde & Olesen, 2018).
IoT comprises three main layers: the application, network and physical layers. Attacks such as the increasingly common Denial of Service (DoS) and Distributed Denial of Service (DDoS) can be launched against any of these layers. For example, jamming attacks target the sensor/physical layer; flooding attacks target the network layer; and reprogramming and path-based DDoS attacks target the application layer. The physical layer is also exposed to Sybil, RF interference, tampering, object replication and tag cloning attacks. Attacks found in the network layer include traffic analysis, spoofing, sinkhole, Hello flood, black-hole and man-in-the-middle (MITM) attacks. Attacks in the application layer include malicious code injection, exploitation of software vulnerabilities, privacy leaks, buffer overflows and cross-site scripting (XSS) (Ahmed, Nasr, Abdel-Mageid, & Aslan, 2019).
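The layer-by-layer taxonomy above can be restated as a small lookup table. The following Python sketch is illustrative only: the names `ATTACKS_BY_LAYER`, `LAYER_BY_ATTACK` and `layer_of` are hypothetical, and the entries simply transcribe the attacks listed in the paragraph rather than a complete or authoritative classification.

```python
from typing import Dict, List, Optional

# Attack names per IoT layer, transcribed from the paragraph above.
# (Hypothetical structure for illustration; not an exhaustive taxonomy.)
ATTACKS_BY_LAYER: Dict[str, List[str]] = {
    "physical": [
        "jamming", "Sybil", "RF interference",
        "tampering", "object replication", "tag cloning",
    ],
    "network": [
        "flooding", "traffic analysis", "spoofing", "sinkhole",
        "Hello flood", "black-hole", "man-in-the-middle",
    ],
    "application": [
        "reprogramming", "path-based DDoS", "malicious code injection",
        "software vulnerabilities", "privacy leak",
        "buffer overflow", "cross-site scripting",
    ],
}

# Reverse index: lower-cased attack name -> layer it targets.
LAYER_BY_ATTACK: Dict[str, str] = {
    attack.lower(): layer
    for layer, attacks in ATTACKS_BY_LAYER.items()
    for attack in attacks
}


def layer_of(attack: str) -> Optional[str]:
    """Return the layer an attack targets, or None if it is not listed."""
    return LAYER_BY_ATTACK.get(attack.strip().lower())
```

For instance, `layer_of("Sinkhole")` looks up the network layer, while an attack name that does not appear in the paragraph yields `None`.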
By 2022, the number of IoT smart devices was expected to reach 200 billion, with machine-to-machine (M2M) traffic flows accounting for 45% of all internet traffic (Gantz & Reinsel, 2013; Taylor, 2013). The McKinsey Global Institute also reported that the number of connected machines (units) had increased by 300% over the preceding five years (Manyika, Chui, & Bughin, 2013). In the era of IoT, connectivity between things will continue to increase and the motto will be “connect each with every (EwE)” (Sharvari Tamane, Kumar, & Dey, 2017). Big Data is generated by the growing number of IoT connected devices and applications. It has been described as “high-volume, high-velocity, and high-variety data”, which is challenging to process and to use for decision-making. It is also difficult to store, manage and process these data efficiently (Fazal-E-Amin, Alghamdi, Ahmad, & Hussain, 2015; Hadi, Lawey, El-Gorashi, & Elmirghani, 2018). The difficulty is more pronounced in a complicated and fast-changing environment such as IoT. Currently, around 90% of the world’s data is generated by digital devices such as laptops, tablets, smartphones, desktop computers, cameras, and wireless network sensors (Jat, Bishnoi, & Nambahu, 2018).
Big Data techniques support data storage and processing. Apache Hadoop is a highly scalable platform designed to store and process bulk data across numerous computing nodes that work in parallel. MapReduce is the programming model through which data on this platform is analyzed, and it is therefore at the heart of Hadoop. The Hadoop Distributed File System (HDFS) and YARN are the two other main components of Apache Hadoop. HDFS is a distributed file system designed to run on commodity hardware, while YARN serves as a distributed operating system for Big Data analysis (Yadav & Chandra, 2019).
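To make the MapReduce programming model concrete, the following is a minimal single-process sketch in plain Python of its three stages (map, shuffle, reduce) applied to word counting. It does not use the Hadoop API and runs on one machine; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical and chosen only for illustration. In a real Hadoop job, the map and reduce functions would run in parallel across nodes and the framework would perform the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` depends only on its own key, both stages can be distributed across nodes without coordination, which is the property that gives the model its scalability.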