Article Preview
TopIntroduction
Top-k and skyline queries are two alternatives to pose preferences in query processing. In a top-k query a ranking function is required to associate a score to each item. The answer to the query is the set of k items with the best scores. A skyline query does not require a ranking function, and the result is based on preferences (minimization or maximization) posed in each attribute. The result is composed of all items that are not dominated. For the rest of the work we deal with multidimensional items, where each dimension corresponds to an attribute. Formally, a multidimensional item dominates another item when:
where
d is the number of dimensions. A top-
k dominating query may be seen as a combination of a top-
k and a skyline query. More specifically, a top-
k dominating query returns the
k items with the highest domination scores. The domination value of an item
p, denoted as
dom(
p), equals the number of items that
p dominates (Yiu & Mamoulis, 2007; Yiu & Mamoulis, 2009).
The maximum domination value is the number of items dominated by the top-1 (best) item. More formally, let us assign to each item t of the data set D a score, m(t), which equals the number of items that t dominates: Then, if p is the item with the maximum domination value we have:
An example is illustrated in Figure 1. A tourist wants to select the best hotel according to the attributes distance to the beach and price per night. The domination values of all hotels A, B, C, D, E, F, G, H, I, J are 0, 1, 0, 0, 2, 0, 4, 6, 2, 3 respectively, thus the hotel with the maximum domination value is H. This hotel is the best possible selection, whereas the next two best choices are hotels G and J.
In this work, we focus on estimating the maximum domination value and the skyline cardinality in multi-dimensional data under the dinstinct values and attribute independence assumptions. Estimating the maximum domination value and the skyline cardinality contributes in:
- •
Optimizing top-k dominating and skyline algorithms,
- •
Estimating the cost of top-k dominating and skyline queries,
- •
Developing pruning strategies for these queries and algorithms.
Moreover, we show that the maximum domination value is closely related to the cardinality of the skyline set.
A preliminary version of this study appears in (Tiakas et al., 2010) where we have presented the estimation methods for the maximum domination value only. The current version is more complete with the following new material:
- •
The proposed estimation method is further extended to provide efficient estimations for the skyline cardinality also,
- •
Additional experimental results are presented to study the accuracy of all estimation methods (including the proposed) for the skyline cardinality,
- •
A study for the eliminating dimension, i.e., the dimension after which there are no dominations between the items is presented.