Crowdsourced Document Summarization as a Support for Research Publication

Ouassila Askratni, Sihem Mostefai
Copyright: © 2022 |Pages: 24
DOI: 10.4018/IJOCI.313598

Abstract

Understanding the large volume of research published online has become extremely complex and time-consuming. The European Commission and scientific commissions have recently highlighted the importance of implementing research data management systems in higher education institutes, combining technical and organizational solutions. To deal with these problems, the analysis and summarization of research findings can be outsourced to a crowd of researchers. In this paper, a semi-automated crowdsourcing approach is proposed to engage humans in the task of summarizing scientific data. Semantic wiki technology is used to enable self-organization of crowdworkers and collaboration between them. A scientific data ontology and a crowdsourcing approach are proposed. They represent the main aspects of a collaborating research community that exploits data crowdsourcing to derive state-of-the-art and/or survey reports. The approach is designed to fit the requirements for a quality report reflecting timely tracking of research progress in a given domain.

Introduction

Undertaking research in any domain has always been a challenging and engaging task. The scientific approach to research and investigation requires a deep understanding of what already exists in a given domain. This is known as the survey of the literature, or state-of-the-art phase, and it should take place prior to the research itself. This step is crucial to any research path since it guides researchers in their investigations and helps them target the main issues of a field of interest. It requires reading and understanding numerous papers, reports and other written resources to grasp the essential ideas, findings and challenges that exist before deciding to target specific issues.

The exponential growth of research papers published online aggravates this problem and makes surveying the literature a very hard and time-consuming task. As reported in recent studies such as (Ronzano, F., & Saggion, H., 2016) for the European project Dr. Inventor (DRI) and (Xu, Y et al., 2020), the publication rate on the Internet was estimated at about one paper every 20 seconds in 2015. This rate has since grown to more than one new article every 13 seconds. One of the most important citation and abstract databases is Elsevier’s Scopus. It covers 22,000 peer-reviewed journals, with approximately 3 million new items added each year (Rob Johnson et al., 2018). More recently (2021), Scopus covered nearly 36,377 titles and published more than 500,000 articles annually in 2,500 journals.

The European Commission and scientific commissions have recently highlighted the importance of implementing a research data management system (RDMS) in higher education institutes (HEI) that combines technical and organizational solutions (Donner, 2022).

Scientific journals are basic channels of scientific communication. Therefore, the growing pressure to provide access to research data raises questions about research data management strategy in publication groups (Jachimczyk, 2020).

Progress in science today demands more effort and competence, and requires sophisticated methods to fully understand all the experimental facts and models of large-scale scientific endeavors (Aberer et al., 2012). To get a clear idea of research in a specific domain, a new investigator can spend a lot of time collecting and analyzing existing works, and may still miss the best path of progress along a given research axis. Reducing the time and effort spent in the state-of-the-art phase of a scientific process will therefore accelerate and improve the research performance of scientific communities. To achieve this goal, a crowdsourcing approach is proposed for collecting, annotating and structuring the data contained in PDF resource papers. This data will be structured in semantic datasets (RDF triples) that will facilitate the reusability and reproducibility of the published material.
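As a minimal sketch of what structuring annotated paper data as RDF triples could look like, the following Python fragment turns a set of crowdworker annotations into subject-predicate-object statements and serializes them in N-Triples syntax. The namespace `http://example.org/sareport#`, the property names (`hasProblem`, `hasMethod`) and the helper functions are hypothetical and chosen only for illustration; they are not taken from the ontology proposed in the paper.

```python
# Hypothetical namespace and property names, for illustration only;
# they are NOT the vocabulary defined by the paper's ontology.
EX = "http://example.org/sareport#"


def iri(local_name):
    """Wrap a local name from the illustrative namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(text):
    """Encode a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def annotation_to_triples(paper_id, annotations):
    """Turn the annotations of one paper into (subject, predicate, object) triples.

    Properties are sorted by name so the output order is deterministic.
    """
    subject = iri(paper_id)
    return [
        (subject, iri(prop), literal(value))
        for prop, value in sorted(annotations.items())
    ]


def to_ntriples(triples):
    """Serialize triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Assumed example data: one paper annotated with two properties.
triples = annotation_to_triples(
    "paper42",
    {"hasProblem": "literature overload", "hasMethod": "crowdsourcing"},
)
print(to_ntriples(triples))
```

Keeping each annotation as an independent statement is what would let such datasets be merged, queried and reused across reports; a real deployment would more likely rely on an RDF library and the project's own vocabulary than on hand-built strings.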

This paper describes a collaborative approach that uses semantic wikis and crowdsourcing techniques, together with ontologies and semantic web technology, to ease the state-of-the-art analysis phase of a research process. The data collection and summarization task can be outsourced to a large online research community. The topics of a state-of-the-art report are organized in a set of topic tasks and subtasks. As a result, a SAReport (State-of-the-Art Report) ontology is created. It will be correlated with a set of reports organized and published on a semantic wiki platform. The proposed approach prepares an open collaborative environment for a scientific community engaged in the semantic publishing of scientific RDF datasets. Such a system would be very useful for PhD and master's students, who need to comprehend, compare and analyze existing works; for reviewers, who need efficient tools to evaluate the huge amount of research published online; and for researchers, who can reuse the extracted knowledge to create new research works.
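The organization of a report into topic tasks and subtasks can be pictured as a simple tree. The sketch below is one possible reading under stated assumptions, not the SAReport ontology itself: the class `TopicTask`, its fields, and the idea that leaf tasks are the units assigned to crowdworkers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class TopicTask:
    """One topic of a report; subtasks refine it (illustrative structure only)."""

    title: str
    done: bool = False
    subtasks: List["TopicTask"] = field(default_factory=list)

    def leaves(self) -> Iterator["TopicTask"]:
        """Yield tasks without subtasks, assumed here to be the assignable units."""
        if not self.subtasks:
            yield self
        for sub in self.subtasks:
            yield from sub.leaves()

    def progress(self) -> float:
        """Return the fraction of leaf tasks marked as done."""
        leaf_tasks = list(self.leaves())
        return sum(task.done for task in leaf_tasks) / len(leaf_tasks)


# Assumed example report with made-up topic titles.
report = TopicTask(
    "Crowdsourced summarization",
    subtasks=[
        TopicTask(
            "Related approaches",
            subtasks=[
                TopicTask("Semantic wikis", done=True),
                TopicTask("Crowdsourcing platforms"),
            ],
        ),
        TopicTask("Open challenges"),
    ],
)

print([task.title for task in report.leaves()])
print(report.progress())
```

A tree of this kind makes it straightforward to see which parts of a report are still open, which is one way a community could coordinate contributions on a shared wiki.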
