Volunteer Data Warehouse: State of the Art

Amir Sakka, Sandro Bimonte, Francois Pinet, Lucile Sautot
Copyright: © 2021 |Pages: 21
DOI: 10.4018/IJDWM.2021070101

Abstract

With the maturity of crowdsourcing systems, new analysis possibilities appear in which volunteers play a crucial role by contributing the implicit knowledge gained from practical, daily experience. At the same time, data warehouse (DW) and OLAP systems are first-class citizens of decision-support systems. They allow huge volumes of data to be analyzed according to the multidimensional model. The more closely the multidimensional model reflects the decision-makers' analysis needs, the more successful the DW project is. However, when volunteers are involved in the design of DWs, existing DW design methodologies present some limitations. In this work, the authors present the main features of volunteer data warehouse (VDW) design, and they study the main existing DW design methodologies to find out how they can help fulfil the features needed by this particular DW approach. To provide a formal framework for classifying existing work, they study the differences between classical DW users and volunteers. The paper also presents a set of open issues for VDW.

1. Introduction

Crowd science (i.e., citizen science or volunteer science) has been defined as an “online, distributed problem-solving and production model” (Brabham et al., 2008). Crowdsourcing systems are increasingly common thanks to the democratization of new acquisition systems such as smartphone sensors, social networks, etc. This has led to the advent of many citizen observatories in different domains (for example agriculture, urban planning, the environment, etc.). However, when volunteers (the users of the observatories) do not get a direct return on investment from their data collection tasks, they usually abandon the project quickly. Gratification for volunteers can be economic, taking the form of a useful service provided to them (as in the context of self-data; for example, traffic-jam information and itinerary planning), or the activity can be seen as a hobby (as with naturalist applications). In other cases, volunteers participate because they are fully engaged with the main mission of the observatory for personal reasons (such as climate change). In this case, it is important to provide volunteers with results about their collected data that are easy to understand and that match their wishes. Therefore, there is a need to change the classical “bottom-up” approach of citizen science, which defines volunteers as active data producers and passive analysis consumers, to a “top-down” approach where volunteers also play an active role in defining what to analyze in the collected data and how to analyze it.

In some recent works, Business Intelligence (BI) technologies have been used to analyze crowdsourced data (Bimonte et al., 2014). Among BI technologies, Data Warehouse (DW) and OLAP systems allow the exploration and analysis of huge volumes of data using simple visual analytical tools (Kimball, 2013). Warehoused data is stored according to the multidimensional model, which defines the concepts of facts (i.e., analysis subjects) and dimensions (i.e., analysis axes). Facts are described by numerical attributes (called measures) that are aggregated along dimension hierarchies using classical SQL aggregation functions. OLAP systems implement operators (such as Drill-down, Roll-up, Slice, etc.) to explore warehoused data by means of user-friendly visual analytics interfaces composed of interactive pivot tables and graphical displays.
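To make the multidimensional vocabulary concrete, the following is a minimal sketch in Python of facts, a measure, a time hierarchy (day to month), and the Roll-up and Slice operators. The observation data, the column names, and the helpers `month_of`, `roll_up`, and `slice_facts` are all hypothetical and chosen only for illustration; they are not taken from the paper or from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: "count" is the measure; species, site, and date
# are the dimensions along which it can be analyzed.
FACTS = [
    {"species": "robin", "site": "A", "date": "2021-03-01", "count": 3},
    {"species": "robin", "site": "A", "date": "2021-03-15", "count": 2},
    {"species": "robin", "site": "B", "date": "2021-04-02", "count": 5},
    {"species": "wren", "site": "A", "date": "2021-03-01", "count": 1},
]


def month_of(date):
    """Map a day (YYYY-MM-DD) to its parent level in the time hierarchy."""
    return date[:7]


def roll_up(facts, levels, measure="count"):
    """Aggregate the measure with SUM, grouped by the given dimension levels.

    `levels` maps a level name to a function that extracts it from a fact row.
    """
    totals = defaultdict(int)
    for row in facts:
        key = tuple(extract(row) for extract in levels.values())
        totals[key] += row[measure]
    return dict(totals)


def slice_facts(facts, dimension, value):
    """Slice: keep only the facts with a fixed value on one dimension."""
    return [row for row in facts if row[dimension] == value]


# Roll-up from day to month, keeping the species dimension.
by_species_month = roll_up(
    FACTS,
    {"species": lambda r: r["species"], "month": lambda r: month_of(r["date"])},
)

# Roll-up further: drop the time dimension entirely.
by_species = roll_up(FACTS, {"species": lambda r: r["species"]})

# Slice on site "A", then aggregate by species.
site_a_by_species = roll_up(
    slice_facts(FACTS, "site", "A"), {"species": lambda r: r["species"]}
)
```

Drill-down is the inverse navigation: moving from `by_species` back to `by_species_month` reintroduces a finer level of the hierarchy.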

In the context of DW, existing works do not support the “top-down” approach for citizen science described above. Warehoused data is collected by volunteers and then analyzed by a few experts. Therefore, the Volunteer Data Warehouse has recently been proposed (Bimonte et al., 2019; Sakka et al., 2019), where volunteers also participate in the design of DW and OLAP applications, and in the data analysis. Indeed, the common DW implementation approach consists of three main phases: (i) requirement elicitation (i.e., the collection of analysis needs from decision-makers), (ii) DW design (i.e., definition of the DW schema according to the data sources and the requirement elicitation phase), and (iii) Extraction, Transformation, Loading (ETL) (i.e., moving data from the data sources to the DW).
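The three phases can be sketched as a small pipeline. This is only an illustrative toy under stated assumptions: the `Requirement` and `StarSchema` types, the function names, and the sample rows are hypothetical, and real requirement elicitation is an interactive process with decision-makers (or volunteers), not a hard-coded list.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One analysis need, e.g. 'total count by species and month' (hypothetical)."""
    measure: str
    dimensions: tuple


@dataclass
class StarSchema:
    measures: set = field(default_factory=set)
    dimensions: set = field(default_factory=set)


def elicit_requirements():
    # Phase (i): hard-coded here; in practice collected from the users.
    return [
        Requirement("count", ("species", "month")),
        Requirement("count", ("site",)),
    ]


def design_schema(requirements, available_columns):
    # Phase (ii): keep what the requirements ask for AND the sources can supply.
    schema = StarSchema()
    for req in requirements:
        if req.measure in available_columns:
            schema.measures.add(req.measure)
        schema.dimensions.update(d for d in req.dimensions if d in available_columns)
    return schema


def etl(source_rows, schema):
    # Phase (iii): extract rows, transform (derive month from date),
    # and load only the columns that belong to the schema.
    facts = []
    for row in source_rows:
        row = {**row, "month": row["date"][:7]}
        facts.append({k: row[k] for k in schema.dimensions | schema.measures})
    return facts


# Hypothetical source data; "month" is obtainable because it derives from "date".
SOURCE_ROWS = [
    {"species": "robin", "site": "A", "date": "2021-03-01", "count": 3, "note": "x"},
    {"species": "wren", "site": "B", "date": "2021-04-02", "count": 1, "note": "y"},
]
AVAILABLE = {"species", "site", "month", "count"}

schema = design_schema(elicit_requirements(), AVAILABLE)
facts = etl(SOURCE_ROWS, schema)
```

Because the schema is derived from the requirements, any column that no requirement mentions (here `note`) never reaches the warehouse, which is why the elicitation phase weighs so heavily on the outcome.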

In existing work, correctly handling the requirements of decision-makers is a crucial step in DW projects, since the more closely the DW implementation reflects their analysis needs, the more successful the DW project is. This is also stated by (Golfarelli et al.), who point out the main problems affecting a successful DW implementation process. The authors identify three main problem classes:

- “Uncertain and changing requirements”, which deals with the difficulties related to:

  • requirements that can be unclear and ambiguous among decision-makers of the same organization, and

  • correctly understanding decision-makers' analysis needs and translating them into DW schemata. This could be due to the complexity of the application domain, the difficulty of getting decision-makers and DW experts to communicate, and the decision-makers' vague and/or changing vision of the needed requirements.

- “Linear approach to design”, which refers to the rigid temporal organization of the implementation project that does not prioritize any DW schema,

- “Design complexity”, which is usually found in real-life applications when dealing with data quality, data integration issues, etc.
