1. Introduction
Business intelligence (BI) is a bundle of tools and techniques for the timely detection of key business factors and the effective solution of strategic decision problems. The “new wave” of BI, often called BI 2.0, specifically aims at addressing more sophisticated user needs. Among the characterizing trends of BI 2.0 we focus on pervasive BI, where information can be accessed easily and in a timely fashion by everyone in the organization, through devices with different computation and visualization capabilities and with sophisticated, customizable presentations (Rizzi, 2012).
In the context of pervasive BI, one of the key factors determining the effectiveness of analysis is achieving a compromise, satisfactory from the users’ viewpoint, between the precision and the size of the information displayed while analyzing multi-dimensional cubes. The OLAP paradigm provides significant support in this direction by enabling users to interactively slice, dice, and aggregate cube facts, but this is not always sufficient: more detail gives more information, but at the risk of missing the overall picture, while focusing on general trends may prevent users from observing specific small-scale phenomena (Marcel, Missaoui, & Rizzi, 2012). This issue is also closely related to the “information flooding” problem, which may arise when the user drills down a cube to a very detailed level, where a huge number of facts is returned. In this case it may be very hard for the user to browse and analyze the results, especially if the device used has limited visualization and data-transmission capabilities.
Different approaches can be taken to cope with this issue. For instance, query personalization attempts to tune the size and pertinence of the facts returned by considering the users’ preferred aggregation levels, measures, and slices (Golfarelli, Rizzi, & Biondi, 2011). Approximate query answering focuses on quickly returning an answer at the price of some imprecision in the returned values (Vitter & Wang, 1999). In intensional query answering, the set of facts returned by a query is summarized with a concise description of the properties shared by those facts (Marcel, Missaoui, & Rizzi, 2012). Other works couple the OLAP paradigm with data mining techniques into an OLAM approach, where cubes can be mined “on-the-fly” to extract concise patterns for the user’s evaluation (Han, 1997).
The shrink approach is a form of OLAM based on hierarchical clustering, specifically aimed at balancing precision with size in the visualization of multi-dimensional cubes via pivot tables like the one shown in Figure 1. The shrink operator can be applied during an OLAP session to the cube resulting from a query, to decrease its size while controlling the approximation introduced, as sketched in Figure 2. The idea is to fuse similar facts together and replace them with a single representative fact (computed as their average), while respecting the bounds posed by dimension hierarchies.
Figure 1. A simple pivot table showing data per city and year
Figure 2. Functional overview of the shrink approach
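The core fusion step described above can be illustrated with a minimal sketch. The function below is an assumption for exposition, not the authors' implementation: it fuses a group of measure values into their average representative and quantifies the approximation introduced (here, total absolute deviation; the paper's actual error metric may differ).

```python
def fuse_facts(facts):
    """Replace a group of measure values with their average representative,
    returning the representative and the approximation error introduced."""
    representative = sum(facts) / len(facts)
    # Error introduced by the fusion: total deviation from the representative.
    error = sum(abs(v - representative) for v in facts)
    return representative, error

# Example: sales for three cities belonging to the same region.
rep, err = fuse_facts([10.0, 12.0, 11.0])
print(rep, err)  # 11.0 2.0
```

In the full shrink approach, only facts allowed to be grouped by the dimension hierarchy (e.g., cities within the same region) would be candidates for fusion.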
A mono-dimensional version of the shrink operator has been proposed by Golfarelli, Graziani, & Rizzi (in press). In that work one shrink dimension is explicitly chosen by the user, and cube slices are fused together along that dimension until a user-specified precision/size trade-off is achieved. Though the mono-dimensional version has been shown to be quite effective and efficient in delivering compact visualizations of cubes, it suffers from two main drawbacks: (i) since the shrink dimension is fixed a priori, some possibly more effective directions for shrinking may be lost; and (ii) the approach is subject to the user’s discretion in choosing the shrink dimension.
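The mono-dimensional behavior can be sketched as a greedy agglomerative procedure. The following is an illustrative assumption, not the published algorithm: slices along the shrink dimension are represented as lists of measure values, and at each step the adjacent pair whose fusion adds the least error is merged, until a user-given size bound is met (hierarchy constraints are omitted for brevity).

```python
def shrink_1d(slices, max_slices):
    """Greedily fuse adjacent slices along one dimension until at most
    max_slices groups remain; each group is shown as its average."""
    groups = [list(s) for s in slices]  # each group keeps its raw values

    def fusion_error(group):
        rep = sum(group) / len(group)
        return sum(abs(v - rep) for v in group)

    while len(groups) > max_slices:
        # Choose the adjacent pair whose fusion increases error the least.
        best = min(
            range(len(groups) - 1),
            key=lambda i: fusion_error(groups[i] + groups[i + 1])
            - fusion_error(groups[i]) - fusion_error(groups[i + 1]),
        )
        groups[best:best + 2] = [groups[best] + groups[best + 1]]

    return [sum(g) / len(g) for g in groups]

# Four yearly slices compressed into two representative slices.
print(shrink_1d([[10.0], [11.0], [30.0], [31.0]], 2))  # [10.5, 30.5]
```

With this greedy scheme the two similar year pairs are fused while the large jump between them is preserved, which mirrors the precision/size trade-off the operator targets.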