A Free Service of IGI Global Publishing House

What is a Data Pipeline?

Encyclopedia of Data Science and Machine Learning
A sequence of data processing components connected in series, where the output of one component is the input of the next. The components of the pipeline can be operated in parallel or in a time-sliced manner.
Published in Chapter:
Sustainable Big Data Analytics Process Pipeline Using Apache Ecosystem
Jane Cheng (UBS, USA) and Peng Zhao (INTELLIGENTRABBIT LLC, USA)
Copyright: © 2023 | Pages: 13
DOI: 10.4018/978-1-7998-9220-5.ch073
Abstract
This article provides a comprehensive understanding of the cutting-edge big data workflow technologies that have been widely applied in industrial applications, covering a broad range of the most current big data processing methods and tools, including Hadoop, Hive, MapReduce, Sqoop, Hue, Spark, Cloudera, Airflow, and GitLab. An industrial data workflow pipeline is proposed and its system architecture is investigated; the pipeline is designed to meet the needs of data-driven industrial big data analytics applications concentrated on large-scale data processing. It differs from traditional data pipelines and workflows in its support for ETL and analytical portals. The proposed data workflow can improve industrial analytics applications across multiple tasks. This article also provides big data researchers and professionals with an understanding of the challenges facing big data analytics in real-world environments and informs interdisciplinary studies in this field.
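
Illustration
The definition above (components connected in series, each output feeding the next input) can be sketched in a few lines of Python. This is only a minimal illustration and is not taken from the chapter: the stage names parse, keep_valid, and to_record, the helper run_pipeline, and the sample input are all hypothetical. Because the stages are generators, each record passes through all stages before the next record is read, which loosely resembles time-sliced operation; true parallel execution is not shown.

```python
from functools import reduce


def parse(lines):
    """Stage 1 (hypothetical): split each raw text line into fields."""
    for line in lines:
        yield line.strip().split(",")


def keep_valid(rows):
    """Stage 2 (hypothetical): drop rows that are not 'name,integer'."""
    for row in rows:
        if len(row) == 2 and row[1].isdigit():
            yield row


def to_record(rows):
    """Stage 3 (hypothetical): convert each row into a dictionary."""
    for name, value in rows:
        yield {"name": name, "value": int(value)}


def run_pipeline(source, stages):
    """Connect stages in series: the output of one is the input of the next."""
    return reduce(lambda data, stage: stage(data), stages, source)


# Made-up sample input; the second line is malformed on purpose.
raw = ["a,1\n", "bad line\n", "b,20\n"]
records = list(run_pipeline(raw, [parse, keep_valid, to_record]))
print(records)
```

Adding, removing, or reordering entries in the list passed to run_pipeline changes the pipeline without touching the individual stages, which is the main appeal of the series-connected design.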