ETL Logs Under a Pattern-Oriented Approach

Bruno Oliveira, Óscar Oliveira, Orlando Belo
Copyright: © 2021 | Pages: 19
DOI: 10.4018/IJDWM.2021100102

Abstract

Considering extract-transform-load (ETL) as a complex and evolutionary process, development teams must conscientiously and rigorously create logging strategies that extract the most value from the information gathered from the events that occur throughout the ETL workflow. Efficient logging strategies must be structured so that metrics, logs, and alerts can, beyond their troubleshooting capabilities, provide insights about the system. This paper presents a configurable and flexible ETL component for creating logging mechanisms in ETL workflows. A pattern-oriented approach is followed as a way to abstract ETL activities and enable their mapping to physical primitives that can be interpreted by commercial ETL tools.

Introduction

Data Warehouses (DW) store massive, integrated data sets that represent a unified organisational view of the transactional data needed to understand current business activities and, possibly, forecast future ones. As the DW makes faster and more flexible reports available, better and more expeditious decisions can be delivered within the organisation. Because it usually deals with vast amounts of data collected from several business sources, each with its own requirements, technology, and availability, the Extract-Transform-Load (ETL) system (Kimball and Caserta 2004) becomes a vital component of any DW. This system is responsible for processing and controlling how data is extracted from diverse information sources, cleaned, transformed, and loaded according to established DW requirements. ETL must ensure data integrity and correctness as fundamental properties for adequate data analysis to support decision-making.

Considering ETL as a complex and evolutionary process, development teams must conscientiously and rigorously create strategies that extract the most value from all the information that can be gathered from events occurring throughout the ETL workflow. Logs suit this requirement well, as they provide available and suitable information about the events (Swennen et al. 2015). Logs should be well structured and easily extended so that they can be used efficiently as an asset (Kreps 2014), either as relevant data that can be correlated and queried (Andrews et al. 2018) for troubleshooting and development, or for analytics and business intelligence (Khouri and Bellatreche 2018). Efficient logging strategies must be structured so that metrics, logs, and alerts can, beyond their troubleshooting capabilities, provide insights and discoveries about the system behaviour, e.g., allowing the ETL team to consider alternative approaches (Leemans and van der Aalst 2015).
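To make the idea of structured, extendable, and queryable logs concrete, the following is a minimal sketch in Python. It is not taken from the paper: the record layout, the field names (task, level, message, rows), and the helper functions are hypothetical and only illustrate how log events stored as one JSON object per line can later be correlated and queried.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable, List


@dataclass
class EtlLogEvent:
    """One structured log record (all field names are hypothetical)."""
    task: str
    level: str
    message: str
    rows: int = 0
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(event: EtlLogEvent) -> str:
    """Serialise an event as a single JSON line, ready to append to a log."""
    return json.dumps(asdict(event), sort_keys=True)


def rows_by_task(lines: Iterable[str]) -> Dict[str, int]:
    """Query the log: total number of rows reported by each task."""
    totals: Dict[str, int] = {}
    for line in lines:
        record = json.loads(line)
        totals[record["task"]] = totals.get(record["task"], 0) + record["rows"]
    return totals


def error_events(lines: Iterable[str]) -> List[dict]:
    """Query the log: every record logged at the ERROR level."""
    records = (json.loads(line) for line in lines)
    return [record for record in records if record["level"] == "ERROR"]


if __name__ == "__main__":
    log = [
        to_json_line(EtlLogEvent("extract", "INFO", "source read", rows=100)),
        to_json_line(EtlLogEvent("load", "INFO", "batch loaded", rows=60)),
        to_json_line(EtlLogEvent("load", "ERROR", "constraint violated")),
    ]
    print(rows_by_task(log))
    print(error_events(log))
```

Because every record shares the same keys, the same log can serve both troubleshooting (filtering error events) and simple analytics (aggregating row counts) without any text parsing.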

As more ETL requirements, such as low latency, high availability, and flexibility, are progressively introduced, alternative development approaches for ETL have been devised (Biswas et al. 2019; Raj et al. 2020). The Pattern-Oriented Approach (POA), proposed in (Oliveira and Belo 2017), simplifies and decouples process components for ETL development and maintenance. Under POA, the most used tasks, denoted hereafter as patterns, are identified and used as configurable components covering all ETL development phases (i.e., conceptual, logical, and physical phases). ETL metadata can be used to support error handling and logging strategies as an integral part of the pattern configuration. As presented in (Oliveira and Belo 2017), the Error Handling (Throwable) and Log patterns can be used as subsystems of other patterns, encompassing all the logic behind error and log handling. These two components are a crucial resource for capturing ETL metadata that contributes to improving or correcting the logic of an ETL process (Belo et al. 2017), since they allow for monitoring and handling unexpected exceptions or errors that can occur.

This paper presents the Log Pattern, following the POA for ETL development. This pattern can be coupled with others present in the ETL workflow to generate information about the process life cycle. It supports effective logging strategies that, alongside the traditional system audit and recovery uses, allow (in conjunction with other technologies and approaches) ETL development and maintenance to be streamlined. As this pattern allows structured, extendable, and flexible logging mechanisms, diverse log analysis approaches can be exploited and used to drive the ETL development.
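One way to picture a logging component being coupled with other tasks in a workflow is a wrapper that records life-cycle events around any task. The sketch below is only an illustration under stated assumptions and does not reproduce the Log Pattern specified in the paper: the decorator name with_log, the in-memory list used as a log sink, and the event names (start, end, error) are all hypothetical.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical log sink: a plain list of event dictionaries.
LogSink = List[Dict[str, Any]]


def with_log(task_name: str, sink: LogSink) -> Callable:
    """Couple a logging step with an arbitrary ETL task (illustrative only)."""

    def decorator(task: Callable) -> Callable:
        @functools.wraps(task)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            started = time.perf_counter()
            sink.append({"task": task_name, "event": "start"})
            try:
                result = task(*args, **kwargs)
            except Exception as exc:
                # Record the failure, then let the caller handle the error.
                sink.append(
                    {"task": task_name, "event": "error", "error": repr(exc)}
                )
                raise
            elapsed = time.perf_counter() - started
            sink.append({"task": task_name, "event": "end", "seconds": elapsed})
            return result

        return wrapper

    return decorator


def deduplicate(rows: List[int]) -> List[int]:
    """Stand-in for an ETL task: remove duplicate values and sort them."""
    return sorted(set(rows))


if __name__ == "__main__":
    events: LogSink = []
    logged_deduplicate = with_log("deduplicate", events)(deduplicate)
    print(logged_deduplicate([3, 1, 3, 2]))
    print([event["event"] for event in events])
```

Keeping the logging logic in the wrapper leaves the task itself unchanged, so the same logging behaviour can be attached to, or detached from, any task in the workflow.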

The rest of this paper is organised as follows. Section 2 presents an overview of the POA for ETL development, which, through a set of patterns (i.e., reusable components representing the most used processes), aims to promote domain best practices and improve the overall system quality. In Section 3, the Log Pattern is presented in detail. Section 4 presents a case study applying the patterns and their respective logging strategies. Finally, conclusions and future work directions are presented in the last section.
