Initial Optimization Techniques for the Cube Algebra Query Language: The Relational Model as a Target

Thomas Mercieca, Joseph G. Vella, Kevin Vella
Copyright: © 2022 | Pages: 17
DOI: 10.4018/IJDWM.299016

Abstract

A common model used in addressing today's overwhelming amounts of data is the OLAP cube. The OLAP community has proposed several cube algebras, although a standard has still not been nominated. This study focuses on a recent addition to the cube algebras: the user-centric Cube Algebra Query Language (CAQL). The study aims to explore the optimization potential of this algebra by applying parallelism and logical rewriting inspired by classic relational algebra. The lack of a standard algebra is often cited as a problem in such discussions. Thus, the significance of this work lies in strengthening the position of this algebra among the OLAP algebras by addressing implementation details. The modern open-source PostgreSQL relational engine is used to encode the CAQL abstraction. A query workload based on a well-known dataset is adopted, and CAQL and SQL implementations are compared. Finally, the quality of each generated query is evaluated through its observed performance characteristics. Results show strong improvements over the baseline case of the unoptimized query.

Introduction

The availability of massive amounts of data in every domain and application has led researchers to develop innovative techniques for representing and handling data, e.g., time-series data (Fu, 2011; Esling & Agon, 2012). Within the database field, such techniques are categorized under Online Transaction Processing (OLTP), dealing with voluminous transactions, and Online Analytical Processing (OLAP), addressing sophisticated data analysis over large and varied data sources.

In more detail, the OLAP field is concerned with keeping insightful analysis intuitive and the related computation efficient. The OLAP Council (1995), although currently inactive, provided the following OLAP requirements: (a) modelling across dimensions and through data hierarchies; (b) trend analysis over sequential time periods; (c) slicing subsets for data visualizations; and (d) drilling down to deeper levels for consolidation. Such requirements emphasize the importance of performance and the need for an adequate data model.

The prevailing data model for OLAP is the data cube. Several OLAP database algebras have been proposed in the literature, suggesting a need to reason in terms of this data model. Romero and Abelló (2007) survey several OLAP algebras, observing that such algebras express the same fundamental OLAP operations differently, i.e., through different formalisms and semantics for interacting with the OLAP cube abstraction. Although there is interest in this area, both for its potential for logical optimization and for reaching a consensus on modelling issues, work on this aspect has been lacking. A standard algebra has yet to be nominated despite being indicated as important for facilitating future research.
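
As a rough illustration of the data cube model and of two of the operations named in the OLAP requirements (slicing and aggregating along a dimension), the following minimal sketch represents a cube as a mapping from dimension coordinates to a measure. The dimension names, the sample figures, and the helper functions `slice_cube` and `roll_up` are assumptions made for this example; they are not CAQL operators and do not reflect the syntax of any particular cube algebra.

```python
from collections import defaultdict

# Hypothetical toy cube: (year, region, product) -> sales amount.
DIMENSIONS = ("year", "region", "product")
cube = {
    (2020, "EU", "bikes"): 10,
    (2020, "EU", "cars"): 5,
    (2020, "US", "bikes"): 7,
    (2021, "EU", "bikes"): 12,
    (2021, "US", "cars"): 3,
}


def slice_cube(cells, dimension, value):
    """Keep only the cells whose coordinate on `dimension` equals `value`."""
    idx = DIMENSIONS.index(dimension)
    return {key: m for key, m in cells.items() if key[idx] == value}


def roll_up(cells, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idxs = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, measure in cells.items():
        totals[tuple(key[i] for i in idxs)] += measure
    return dict(totals)


# Slice the cube to the year 2020, then aggregate the sales per region.
sales_2020 = slice_cube(cube, "year", 2020)
by_region = roll_up(sales_2020, keep=("region",))
print(by_region)
```

Different cube algebras would expose these same two operations under different names and formalisms, which is the divergence the survey cited above points out.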

A recent development in the algebras is the Cube Algebra Query Language (CAQL) by Ciferri et al. (2013). The focus here is on a data model which is more straightforward and more intuitive for the end-user than comparable models proposed in the past. Thus, this study provides an overview of several algebras to clarify this point. This work aims to build on CAQL by exploring and commenting on the optimization potential of querying through this algebra, specifically by applying parallelism and logical optimization. The main contributions of this article are:

  1. The identification and application of several optimization methods, based on parallel computing and logical rewriting, that are applicable to this type of algebra.

  2. The adoption of a query workload and its application to the cube algebra domain, together with an evaluation that focuses on the quality of the query generated. The observed performance characteristics of the executed query are the metric used for evaluation.

  3. The strengthening of CAQL's position by addressing implementation details when several OLAP algebras exist, but a standard is lacking.

  4. The identification of a database engine that adequately implements the data cube abstraction. The DBMS must be extensible, allowing for seamless integration of this algebra.
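
The kind of logical rewriting named in the first contribution can be illustrated with a classic relational algebra equivalence: a selection on a grouping attribute may be applied before the aggregation instead of after it, so that fewer rows are aggregated. The sketch below is only an assumption-laden example of that general idea, using a hypothetical in-memory fact table and hypothetical helpers `select` and `aggregate`; it is not the rewriting rule set used in the article.

```python
from collections import defaultdict

# Hypothetical fact table; the column names and values are illustrative only.
rows = [
    {"year": 2020, "region": "EU", "sales": 10},
    {"year": 2020, "region": "US", "sales": 7},
    {"year": 2021, "region": "EU", "sales": 12},
    {"year": 2021, "region": "US", "sales": 3},
    {"year": 2021, "region": "EU", "sales": 4},
]


def select(table, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def aggregate(table, group_by, measure):
    """Group by the given columns and sum the measure column."""
    totals = defaultdict(int)
    for row in table:
        totals[tuple(row[g] for g in group_by)] += row[measure]
    return [
        dict(zip(group_by, key), **{measure: total})
        for key, total in sorted(totals.items())
    ]


def is_2021(row):
    return row["year"] == 2021


# Unoptimized plan: aggregate every row, then filter the grouped result.
unoptimized = select(aggregate(rows, ("year", "region"), "sales"), is_2021)

# Rewritten plan: filter first, so the aggregation touches fewer rows.
rewritten = aggregate(select(rows, is_2021), ("year", "region"), "sales")

# The rewrite is valid because the predicate only uses a grouping column.
assert unoptimized == rewritten
print(rewritten)
```

The equivalence holds only because the predicate refers to a column that the aggregation groups by; a predicate on the aggregated measure itself could not be moved in this way.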

It is not within the scope of this article to study the extension of the cube algebras, e.g., through the proposal of new operators. Moreover, the algebra's expressiveness and its relationship with the relational algebra's expressiveness are not within this article's scope.

The remainder of this article is structured as follows. The following section discusses related work in cube data models and algebras. Then, the article describes the main optimization techniques used in this study. Following this section, the performance and experimental analysis of the techniques used are presented using a case study. The article then concludes with final remarks on this study and ideas for future research related to this work.
