Introduction
The actors involved in IoT scenarios have extremely heterogeneous characteristics (in terms of processing and communication capabilities, energy supply and consumption, availability, and mobility), ranging from constrained devices, also denoted as “Smart Objects (SOs),” to smartphones and other personal devices, Internet hosts, and the Cloud. Smart Objects are typically equipped with sensors and/or actuators and are thus capable of perceiving and acting on the environment in which they are deployed. By 2020, 50 billion Smart Objects are expected to be deployed in urban, home, industrial, and rural scenarios (Evans, 2011) to collect relevant information, which may be used to build new useful applications.
Shared and interoperable communication mechanisms and protocols are currently being defined and standardized, allowing heterogeneous nodes to efficiently communicate with each other and with existing common Internet-based hosts or general-purpose Internet-ready devices. The most prominent driver for interoperability in the IoT is the adoption of the Internet Protocol (IP), namely IPv6 (Postel, 1981; Deering & Hinden, 1998). An IP-based IoT will be able to extend and interoperate seamlessly with the existing Internet.
In a typical IoT scenario, sensed data are collected by the SOs that populate the IoT network and are sent uplink to collection entities (servers or the Cloud). In some cases, an intermediate element may support the Cloud by carrying out storage, communication, or computation operations in local networks (e.g., data aggregation or protocol translation). This approach is the basis of Fog Computing (Bonomi, Milito, Zhu, & Addepalli, 2012) and is explained in more detail in the “Background” section.
Figure 1 shows the hierarchical structure of layers involved in data collection, processing, and distribution in IoT scenarios.
Figure 1. The hierarchy of layers involved in IoT scenarios: the Fog works as an extension of the Cloud to the network edge to support data collection, processing, and distribution
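As an illustration of the data aggregation role that an intermediate Fog element may play, the following is a minimal Python sketch, not taken from the chapter. The `Reading` type, the `fog_aggregate` function, and the choice of summary statistics are hypothetical names and assumptions introduced only for this example: raw readings from Smart Objects are grouped per sensor and reduced to a compact summary before being forwarded uplink.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """A single sensed value (hypothetical structure for this sketch)."""
    sensor_id: str
    value: float


def fog_aggregate(readings):
    """Group raw readings by sensor and reduce each group to a summary.

    Assumption: the Cloud only needs count, mean, and max per sensor,
    so the Fog node can forward one small record instead of every reading.
    """
    grouped = defaultdict(list)
    for r in readings:
        grouped[r.sensor_id].append(r.value)
    return {
        sensor_id: {
            "count": len(values),
            "mean": mean(values),
            "max": max(values),
        }
        for sensor_id, values in grouped.items()
    }


# Three raw readings collapse into two per-sensor summaries.
raw = [Reading("t1", 20.0), Reading("t1", 22.0), Reading("t2", 5.0)]
summary = fog_aggregate(raw)
```

In this sketch the uplink traffic grows with the number of sensors rather than with the number of readings, which is one possible motivation for placing such logic at the network edge.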
With billions of nodes capable of gathering data and generating information, the availability of efficient and scalable mechanisms for collecting, processing, and storing data is crucial.
Big Data techniques, which were developed in the last few years and became popular due to the evolution of online and social/crowd services, address the need to process extremely large amounts of heterogeneous data for multiple purposes. These techniques have been designed mainly to deal with huge volumes of information (focusing on storage, aggregation, analysis, and provisioning of data), rather than to provide real-time processing and dispatching (Zaslavsky, Perera, & Georgakopoulos, 2013; Leavitt, 2013). Cloud Computing has found a direct application in Big Data analysis due to its scalability, robustness, and cost-effectiveness.
One of the distinctive features of IoT systems is the deployment of a huge number of heterogeneous data sources that collect data from the environment and send information through the Internet to collectors. Taken as a whole, these data sources generate streams with a very high frequency. Moreover, several relevant IoT scenarios (such as industrial automation, transportation, and networks of sensors and actuators) require real-time or predictable latency.
The number of data sources, on one side, and the resulting frequency of incoming data, on the other, create a new need for Cloud architectures able to handle such massive information flows.
Big Data approaches typically have an intrinsic inertia because they are based on batch processing. For this reason, they are poorly suited to the dynamic nature of IoT scenarios with real-time requirements.
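The inertia of batch processing can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not a description of any specific Big Data framework: batches are assumed to be processed only at fixed window boundaries, processing time itself is ignored, and the function names `batch_latencies` and `stream_latencies` are hypothetical.

```python
import math


def batch_latencies(arrival_times, batch_interval):
    """Waiting time of each item when processing happens only at window ends.

    Assumption: windows end at multiples of batch_interval, and an item
    arriving exactly on a boundary is processed at that boundary.
    """
    latencies = []
    for t in arrival_times:
        window_end = math.ceil(t / batch_interval) * batch_interval
        latencies.append(window_end - t)
    return latencies


def stream_latencies(arrival_times, per_item_delay=0):
    """Waiting time when each item is handled as soon as it arrives.

    Assumption: every item incurs the same fixed per-item delay.
    """
    return [per_item_delay for _ in arrival_times]


arrivals = [1, 4, 9, 10, 13]  # arrival times in arbitrary time units
batch = batch_latencies(arrivals, batch_interval=10)
stream = stream_latencies(arrivals)
```

Under this model, the waiting time of a batched item depends on where it falls within the window and can approach the full batch interval, whereas per-item handling keeps it constant. This is the kind of unpredictable delay that conflicts with real-time requirements.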