1. Introduction
In database management systems (DBMS), query workloads are segmented into two broad modes (Elnaffar et al., 2002; Li et al., 2019). Online transactional processing (OLTP) workloads typically consist of write queries that modify small amounts of data, and read queries that retrieve a few records whilst projecting most of the available attributes (Bach & Werner, 2016). In OLTP, queries are expected to have short response times, often in the order of microseconds (Harizopoulos et al., 2018), in order to avoid user frustration and business impact (Poggi et al., 2014). At the other end of the spectrum, online analytical processing (OLAP) workloads typically consist of read-only queries that traverse a large number of records, performing aggregations and projecting a narrow set of attributes (Bach & Werner, 2016). A system dedicated to OLAP queries is also known as a Business Intelligence (BI) system or Decision Support System (DSS), since such queries often aim to elicit information from a data warehouse to support decision making.
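The contrast between the two query shapes can be made concrete with a minimal sketch. It assumes a hypothetical `orders` table in an in-memory SQLite database; the schema, the data, and the choice of SQLite are illustrative assumptions and are not taken from the systems discussed in this article.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
    [("alice", "EU", 10.0), ("bob", "EU", 20.0), ("carol", "US", 5.0)],
)

# OLTP-style write: modifies a small amount of data (a single record).
conn.execute("UPDATE orders SET amount = 12.5 WHERE id = 1")
conn.commit()

# OLTP-style read: fetches one record and projects most of its attributes.
oltp_row = conn.execute(
    "SELECT id, customer, region, amount FROM orders WHERE id = 1"
).fetchone()

# OLAP-style read: traverses every record, aggregates, and projects
# only a narrow set of attributes.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)   # (1, 'alice', 'EU', 12.5)
print(olap_rows)  # [('EU', 32.5), ('US', 5.0)]
```

With only three rows the difference in cost is not observable; the sketch is meant to show the access patterns only: a point lookup by key on one side, and a full scan with grouping on the other.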
Traditionally, longer response times for OLAP queries have been tolerated, and such queries tend to execute within a dedicated data warehouse that is periodically loaded with data from operational (OLTP) systems, typically via extract-transform-load (ETL) processes. However, modern business requirements are challenging these assumptions. The phenomenon of perishable insights (E. A. Lee, 2018), as illustrated in Figure 1, indicates that, in some application domains such as fraud detection, data might lose value for decision making as time passes. In such use cases, increasing the data freshness in the OLAP database is beneficial.
Figure 1. Perishable Insights (E. A. Lee, 2018)
2. Problem Definition
Running transactional and analytical workloads efficiently on the same dataset is an open problem that attracts both research and commercial interest (Yang et al., 2020). This problem is referred to as Hybrid Transactional and Analytical Processing (HTAP), and several approaches have been proposed to tackle the ostensibly conflicting demands of preserving the performance of transactional workloads whilst at the same time running analytical queries efficiently on fresh data to facilitate time-critical business decisions.
Several HTAP systems presented in the literature are bespoke DBMSs. These range from systems adopting the Single System for OLTP and OLAP approach (Yang et al., 2020), which typically rely on support from cutting-edge hardware (Appuswamy et al., 2017) to handle both OLTP and OLAP workloads on the same hardware, to those adopting the Separate OLTP and OLAP Systems approach, which deploy loosely-coupled OLTP and OLAP DBMSs.
Several problems can be identified. Firstly, although data freshness is largely improved by taking the Single System for OLTP and OLAP approach, OLTP and OLAP workloads running on the same hardware conflict, with some systems reporting a threefold reduction in OLTP throughput when running OLAP queries concurrently (J. Lee et al., 2018).
Secondly, reliance on cutting-edge hardware, such as fast non-volatile memory (NVM), prevents DBMS users from exploiting commodity hardware for their workloads. Such a solution may therefore be infeasible if the hardware is not available, or may require a costlier hardware setup (Neumann & Freitag, 2020).
Lastly, an approach based on bespoke solutions forces the use of specific DBMSs, which might not be compatible with the rest of the software ecosystem or might require specialised expertise within the database administrator (DBA) team, increasing the complexity of the information system (IS).