Article Preview
Top1. Introduction
The data warehouse is used to hold historical and cross-functional data. Organizations use data warehouses as their integrated enterprise repository of data coming from disparate operational sources. As the business environment has become more global and competitive the data warehouse has proved to be a very critical technology for an organization to better manage and enhance its information. This in turn helps an organization become more competitive, better understand its customers, and more rapidly meet market demands (Roberts & Grover, 2012; Akhter & Rahman, 2015; Wixom & Watson, 2001).
In order to provide faster business information to different groups of users including the business intelligence (BI) community, a data warehouse should be stable and its computing resources available to users (Rahman, 2010). To achieve that goal, the queries that get executed on a data warehouse need to be efficient. Efficiency of queries can be ensured only by putting some consistent and repeatable measures (Chandra et al., 2015; Mahmood et al., 2014) on the quality of queries which will provide necessary indicators about the on-going health of a data warehouse environment. We need to make sure the analytical tools and SQL queries we run on the data warehouse, use optimal resources, and have parallel efficiency (Rahman, 2013b). In addition to queries, data warehouse design, extract-transform-load (ETL) development, and load strategy need to be efficient. We need load process strategies that save database management system (DBMS) resources (Rahman, 2010; Narasayya et al., 2015) and ensure DBMS resources availability of analytical tools and query processing while the load is running.
Kaplan and Norton (1992) proposed a balance scorecard approach and compared it with dials and indicators of an airline cockpit that provides detailed information about complex tasks used by pilots when navigating and flying an airplane. In order to ensure the quality of SQL that run on the data warehouse and get indicators on SQL performance we propose a balanced scorecard approach (Rahman, 2013a) for the SQL used to load tables or retrieve information from the data warehouse.
In our case, a balanced scorecard allows the data warehouse team and other stakeholders to look at the data warehouse performance from four important perspectives and provides answers to the following questions. How do users, including analytical and BI community, see us running the data warehouse (customer perspective)? What must we excel at in maintaining a data warehouse (internal perspective)? How can we continue to improve and create value by overcoming technical challenges such as innovation (Ramamurthy et al., 2008) (learning perspective)? How do we enable management initiatives (financial perspective)? A balanced scorecard minimizes information overload by limiting the number of measures used (Kaplan & Norton; 1992, 2007). We need to make sure that the database system resources such as CPU time and I/O utilization are as optimal as possible. This can be achieved by introducing SQL scorecard tools.
A data warehouse environment performance and stability depends on several factors including the design of SQL queries, data warehouse logical and physical data modeling (Bowen et al., 2009; Schlesinger & Rahman, 2015), and capacity planning. Writing efficient SQL queries is the most important piece among the stated factors. Gartner reports that analysts are running more ad hoc, but equally, complex queries, each competing for CPU, memory and disk access (Feinberg & Beyer, 2010). This research is conducted to tackle the issues of SQL queries that are run to load tables by executing stored procedures and macros, executing queries by various reporting and business intelligence (BI) tools. In a data warehouse hundreds of queries are run at any point in time either to load tables via batch cycles (Rahman, 2010) or retrieve information by analytical community using reporting and BI tools. It is very important that SQL queries are written efficiently. In a parallel processing data warehousing architecture, the SQL queries that are written must be compliant with PE architecture (Deepak et al., 2012; Rahman, 2013b).