Special Offers
- IGI Global’s New Emerging Topic e-Book Collections
  Acquire highly focused and affordable Cutting-Edge Peer-Reviewed Research Content through a selection of 17 topic-focused e-Book Collections discounted up to 90%, compared to list prices. Collection topics include Artificial Intelligence, Data Science, Language Learning, Marketing and Customer Relations, Sustainability, and many more. Hosted on the InfoSci^® platform, these collections feature no DRM, no additional cost for multi-user licensing, no embargo of content, full-text PDF & HTML format, and more.
  Learn More
- Open Access Book (Free Access) - Encyclopedia of Information Science and Technology, Sixth Edition (ISBN: 9781668473665)
  The Encyclopedia of Information Science and Technology, Sixth Edition) continues the legacy set forth by the first five editions by providing comprehensive coverage and up-to-date definitions of the most important issues, concepts, and trends pertaining to technological advancements and information management within a variety of settings and industries. The entire book is being published under open access.
  Read Now
- Open Access Book (Free Access) - Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries (ISBN: 9781668456293)
  Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries provides information on the recent technology, mitigation, and environmental protection that must be applied for food sustainability in developing countries. This book is being published under Platinum Open Access through funding from Diponegoro University, Indonesia.
  Read Now
- Open Access Book (Free Access) - New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY (ISBN: 9781668438091)
  The Walmart Corporation and the Lumina Foundation have provided funding to make New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY fully open access, completely removing any paywall between scholars in education and the latest research on new models for the future of higher education.
  Read Now
- Open Access Book (Free Access) - Handbook of Research on the Global View of Open Access and Scholarly Communications (ISBN: 9781799898054)
  Through a collaboration between IGI Global and the University of North Texas, the Handbook of Research on the Global View of Open Access and Scholarly Communications has been published as fully open access, completely removing any paywall between researchers of any field, and the latest research on the equitable and inclusive nature of Open Access and all of its complications.
  Read Now
Books
- - Books by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Books by Field
Journals
- - Journals
  - OnDemand Journal Articles
  - Journals by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Journals by Field
e-Collections
OnDemand
Open Access
- View All Open Access Opportunities
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Find an Open Access Journal for Your Next Manuscript
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Submit an Open Access Book Proposal
  Learn more about open access book publishing and how it can propel your research forward in the field.
  Convert Your Work to Open Access
  Already published? You can convert your work to open access to increase its impact through IGI Global’s Restrospective Open Access Program.
  Utilize Open Access Collection Database
  Open up your research potential by utilizing our open access content or integrating the open access collection into your library
  Consider Open Access Agreements
  For Libraries: consider no-cost or investment-level open access agreements with IGI Global to support your faculty's research endeavors.
  Search Funding Resources
  Looking for additional funding resources to support your open accesss endeavors? View industry resources compiled by our open access team.
  Review Open Access Policies & Ethical Guidelines
  Considering IGI Global to publish your work under open access? Review IGI Global’s open access policies and ethical guidelines
Publish with Us
Resources
- - Instructors
  - Course Adoption
  - Teaching Cases
  - K-12 Online Learning Collection
  - Authors and Editors
  - eEditorial Discovery^® System
  - Peer Review Process
  - Ethics and Malpractice
  - COPE Membership
  - Fair Use Policy
  - Open Access Publishing
  - FAQ
Catalogs
About Us

Data Streams Processing Techniques Data Streams Processing Techniques

Fatma Mohamed, Rasha M. Ismail, Nagwa. L. Badr, Mohamed F. Tolba

Source Title: Handbook of Research on Machine Learning Innovations and Trends

DOI: 10.4018/978-1-5225-2229-4.ch015

OnDemand:

(Individual Chapters)

Available

$37.50

Current Special Offers

No Current Special Offers

Abstract

Many modern applications in several domains such as sensor networks, financial applications, web logs and click-streams operate on continuous, unbounded, rapid, time-varying streams of data elements. These applications present new challenges that are not addressed by traditional data management techniques. For the query processing of continuous data streams, we consider in particular continuous queries which are evaluated continuously as data streams continue to arrive. The answer to a continuous query is produced over time, always reflecting the stream data seen so far. One of the most critical requirements of stream processing is fast processing. So, parallel and distributed processing would be good solutions. This paper gives (1) analysis to the different continuous query processing techniques; (2) a comparative study for the data streams execution environments; and (3) finally, we propose an integrated system for processing data streams based on cloud computing which apply continuous query optimization technique on cloud environment.

Chapter Preview

Top

Introduction

Recently a new class of data-intensive applications has become widely recognized: applications in which the data are modeled best not as persistent relations but as transient data streams. However, their continuous arrival in multiple, rapid, time-varying, possibly unpredictable and unbounded streams appear to yield some fundamentally new research problems. These applications also have inherent real-time requirements, and queries on the streaming data should be finished within their respective deadlines (Kapitanova, Son, Kang & Kim, 2011; Lijie & Yaxuan, 2010). In this context, researchers have proposed a new computing paradigm based on Stream Processing Engines (SPEs). SPEs are computing systems designed to process continuous streams of data with minimal delay. Data streams are not stored, but are processed on-the-fly using continuous queries. The latter differs from queries in traditional database systems because a continuous query is constantly “standing” over the streaming tuples and results are continuously output. In the last few years, there have been substantial advancements in the field of data stream processing. From centralized SPEs, to Distributed Stream Processing Engines (DSPEs), which distribute different queries among a cluster of nodes (interquery parallelism) or even distributing different operators of a query across different nodes (interoperator parallelism). However, some applications have reached the limits of current distributed data streaming infrastructures (Gulisano, Jimenez-Peris, Patino-Martinez, Soriente & Valduriez, 2012).

Because of the continuous changes in input rates, DSPSs need techniques for adjusting resources dynamically with workload changes. Making decisions when to update resource allocation in response to workload changes and how, is an important issue. Effective algorithms for elastic resource management and load balancing were proposed, where resizes the number of VMs in a DSPS deployment in response to workload demands by taking throughput measurements of each involved VM (Cerviño, Kalyvianaki, Salvachúa & Pietzuch, 2012; Fernandez, Migliavacca, Kalyvianaki & Pietzuch, 2013; Gulisano et al., 2012). Thus, cloud computing has emerged as a flexible for facilitating resource management for elastic application deployments at unprecedented scale. Cloud providers offer a shared set of machines to cloud tenants, often following an Infrastructure-as-a-Service (IaaS) model. Tenants create their own virtual infrastructures on top of physical resources through virtualization. Virtual machines (VMs) then act as execution environments for applications (Cerviño et al., 2012).

Thus, we categorize research challenges in data streams to: 1) Continuous queries processing which focus on continuous queries optimization, how to provide real time answering of continuous queries, how to process different typed of continuous queries, and how efficiently process multiple continuous queries. 2) Data streams execution environments where different environments were proposed to execute data streams such as parallel, distributed, and cloud environments. Where, exploiting parallelism and distribution techniques to fast data streams processing, also exploiting virtualization strategies in cloud to provide elastic processing environment in response to workload demands.

The rest of the chapter is organized as follows: related background is introduced first, and then efficient processing techniques for continuous queries, including different algorithms for effective continuous query optimization are presented. Then, we present different execution environments for data streams, which include parallel, distributed, and cloud environments. Then, we present the related research issues. And then our proposed system for data streams processing over cloud computing is presented. Then future research directions are presented. Finally, we present the conclusion.

Key Terms in this Chapter

Skyline Query: A type of query which used to return the objects which are not dominated by any other objects.

Sliding Window: A processing model which is used to process the continuous data streams in an incremental manner.

MapReduce: A programming model which process massive amounts of unstructured data in parallel and distributed cluster of processors.

Nearest Neighbor Query: A type of query which used to find the nearest neighbor objects to a given point in space.

Cloud Computing: A type of internet-based computing which based on sharing computing resources rather than having local servers or personal devices to handle applications.

Continuous Query: A data streams’ query which is evaluated continuously over time with the continuous arrival of data streams.

Data Streams: A continuous, unbounded, rapid and time-varying data elements which generated from many modern applications such as sensor networks, financial applications and web logs applications.

Pattern Mining: An important concept in data mining which is used to find existing patterns in data.

Complete Chapter List

Search this Book:

Reset

MLA

APA

Chicago

Export Reference

Data Streams Processing Techniques Data Streams Processing Techniques

Abstract

Introduction

Key Terms in this Chapter

Complete Chapter List