1. Introduction
As the volume of data grows rapidly both inside and outside organizations, it is becoming important to analyze internal and external data seamlessly, so that decision makers can better understand the business processes of their organizations and make well-founded decisions that a conventional (numerical) Data Warehouse (DW) alone cannot support. In this context, several studies have addressed the manipulation of documentary information, either by OLAPing documents (Zhang et al., 2009) or by modeling documents with facets (Kumar et al., 2012) that describe several viewpoints.
For OLAPing documents, two categories of work can be distinguished: (1) those that enrich the classical Multidimensional Models (MM) (i.e., star, snowflake and constellation schemas) with extensions for textual data processing (for data-centric documents: Feki et al., 2013; Hachaichi et al., 2010; for document-centric documents: Lin et al., 2008), and (2) those that propose MM specific to documents, such as the Galaxy model (Ravat et al., 2008) and the Diamond model (Azabou et al., 2018).
Other research works, such as (Cabanac et al., 2010) and (Hernandez et al., 2008), address the multi-representation of documents through the concept of facet; a facet describes useful aspects of documents, such as semantics and context. Various types of facets have been proposed in the literature; however, they are application domain-dependent. It would therefore be worthwhile to look for standard facets, i.e., application domain-independent ones, which would enable the modeling of documents from any field.
In this paper, we propose the CobWeb model as an extension of the Galaxy model (Ravat et al., 2008), and we base it on standard facets. Each facet includes a set of data and is considered a means for users to express their needs; that is why we later transform facets into dimensions. Note that in multidimensional modeling, a dimension is a set of attributes called parameters that are organized, from the finest to the coarsest granularity, into hierarchies (e.g., a hierarchy for the TIME dimension could be: Day < Month < Quarter < Semester < Year) (Kimball, 1997). A dimension is an analysis axis, whereas a parameter of a hierarchy represents an analysis level. Hierarchy parameters make it possible to aggregate the fact's measures and support the DrillDown and RollUp OLAP operations.
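To make the notions of hierarchy and RollUp concrete, the following is a minimal Python sketch, not the model's implementation. It uses the TIME hierarchy quoted above; the fact rows, the integer measure (assumed here to be a document count) and the helper names `time_key` and `roll_up` are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from datetime import date

# Levels of the TIME hierarchy, finest to coarsest (taken from the text).
TIME_HIERARCHY = ["Day", "Month", "Quarter", "Semester", "Year"]


def time_key(d: date, level: str) -> tuple:
    """Map a date to the member it belongs to at the given hierarchy level."""
    quarter = (d.month - 1) // 3 + 1
    semester = (d.month - 1) // 6 + 1
    keys = {
        "Day": (d.year, d.month, d.day),
        "Month": (d.year, d.month),
        "Quarter": (d.year, quarter),
        "Semester": (d.year, semester),
        "Year": (d.year,),
    }
    return keys[level]


def roll_up(facts, level):
    """Sum the measure of (date, measure) facts per member of `level`."""
    totals = defaultdict(int)
    for d, measure in facts:
        totals[time_key(d, level)] += measure
    return dict(totals)


# Hypothetical facts: (day, number of documents), an assumed measure.
facts = [
    (date(2018, 1, 15), 3),
    (date(2018, 2, 20), 5),
    (date(2018, 7, 1), 2),
    (date(2019, 3, 9), 4),
]

by_quarter = roll_up(facts, "Quarter")  # coarser than Day and Month
by_year = roll_up(facts, "Year")        # coarsest level of the hierarchy
```

Moving from `by_quarter` to `by_year` corresponds to a RollUp along the hierarchy; recomputing the aggregate at a finer level from the same facts corresponds to a DrillDown.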
Integrating facets into an OLAP model raises specific problems that the classical DW MM were not designed for and therefore offer no solutions to. Examples of such problems are the recursion of a parameter within a given hierarchy and the multiple use of the same dimension within the same analysis. To alleviate the drawbacks of existing models, the CobWeb model brings a set of extensions, namely: i) the inter-dimension exclusion constraint, which prohibits using a given pair of dimensions in the same analysis; ii) the recursive parameter, a multi-valued parameter whose values are organized hierarchically; iii) the duplicated dimension, i.e., a dimension used twice in the same analytical query; and iv) the correlated dimension, which enables moving between dimensions during the same analysis.
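As an illustration of two of these extensions, the sketch below shows one possible way to check an exclusion constraint and to navigate a recursive parameter in Python. The formal definitions are not given in this introduction, so everything here is an assumption: the dimension names (AUTHOR, EDITOR, TIME), the keyword values and the functions `violated_exclusions` and `ancestors` are hypothetical and are not part of the CobWeb model as published.

```python
from itertools import combinations

# Hypothetical exclusion constraints: unordered pairs of dimension names
# that must not be used together in the same analysis.
EXCLUSIONS = {frozenset({"AUTHOR", "EDITOR"})}


def violated_exclusions(dimensions, exclusions=EXCLUSIONS):
    """Return the excluded pairs that appear among the query's dimensions."""
    # Duplicates are collapsed: a duplicated dimension is a separate
    # extension and is not handled by this check.
    distinct = sorted(set(dimensions))
    return [
        pair
        for pair in combinations(distinct, 2)
        if frozenset(pair) in exclusions
    ]


# Hypothetical recursive parameter: each value refers to its parent value,
# so the values of a single parameter form a hierarchy of arbitrary depth.
KEYWORD_PARENT = {
    "OLAP": "Data Warehousing",
    "Data Warehousing": "Databases",
    "Databases": None,
}


def ancestors(value, parent_of=KEYWORD_PARENT):
    """List the values above `value` in the recursion, nearest first."""
    chain = []
    current = parent_of[value]
    while current is not None:
        chain.append(current)
        current = parent_of[current]
    return chain


conflicts = violated_exclusions(["AUTHOR", "TIME", "EDITOR"])
allowed = violated_exclusions(["AUTHOR", "TIME"])
levels_above_olap = ancestors("OLAP")
```

In this sketch a query would be rejected when `violated_exclusions` returns a non-empty list, and a RollUp on the recursive parameter would replace a value by one of its `ancestors`.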
In addition, we have proposed a set of three operators for visualizing the results of OLAP queries on documents; these operators rely on the concept of Tag-clouds to help decision makers better see and interpret query results.