Hierarchical Interpretable Topical Embeddings for Exploratory Search and Real-Time Document Tracking

Anastasia Ianina, Konstantin Vorontsov
DOI: 10.4018/IJERTCS.2020100107

Abstract

Real-time monitoring of scientific papers and technological news requires fast processing of complex search requests aimed at acquiring thematically relevant information. For this task, the authors develop an exploratory search engine based on probabilistic hierarchical topic modeling. The topic model produces a low-dimensional sparse interpretable vector representation (topical embedding) of a text, which is used for ranking documents by their similarity to the query. They explore several ways of comparing topical vectors, including search over thematically homogeneous text segments. Topical hierarchies are built with the regularized EM-algorithm from the BigARTM project. The topic-based search achieves better precision and recall than other approaches (TF-IDF, fastText, LSTM, BERT) and even than human assessors who spend up to an hour on the same search task. The authors also find that blending hierarchical topic vectors with pretrained neural embeddings is a promising way of enriching both models, pushing precision and recall above 90%.
Article Preview

Introduction

Fast and high-quality retrieval of relevant scientific and technological information has become an important task in the era of new global challenges such as a pandemic. Real-time monitoring of domain-oriented papers and news requires fast processing of complex search queries that detect semantically similar text documents without asking the user to formulate new queries. When navigating through a large amount of data, query-document matching alone is not enough to acquire the full picture of the problem domain, which brings us to the idea of switching from known-item to exploratory search.

Exploratory search is a relatively new paradigm in information retrieval. It focuses on learning activities such as understanding new concepts, knowledge acquisition, investigation, and analysis (Marchionini, 2006; White & Roth, 2009). The exploratory search setup implies that there is neither an exact query nor a unique search result: a user may not be familiar with the terminology to google with, or may have no clear road map of the search domain. Current search systems aim to satisfy the needs of known-item search, and solving exploratory search problems with them may require considerable effort: a user has to formulate many short queries iteratively, gradually expanding the search domain by repeated steps of querying, browsing search results, and refining the query. These exploratory search demands call for completely different approaches to information seeking. Instead of conventional "googling" with a precisely formulated short text query, we use long text queries: a document, a set of documents, or a document fragment may play the role of the query. Due to significant differences between exploratory and known-item search, standard learning-to-rank techniques (Liu, 2009) cannot be applied here. Moreover, we focus on document-by-document search, in which both the query and the documents are long texts.

We present an exploratory search approach based on probabilistic topic modeling (Blei, 2012; Blei, Ng, & Jordan, 2003; Hofmann, 1999). A probabilistic topic model extracts a set of latent topics from a collection of text documents and represents each document with a discrete probability distribution over topics, also called a topical embedding. We search for semantically similar documents by comparing the topical embeddings of the query and the documents. This approach is similar to standard full-text search based on an inverted index, except that topics take the place of words. In this work, we focus on hierarchical multimodal topical embeddings. The hierarchy enables a cascade search, which starts by matching general topics in low-dimensional vectors and then proceeds to more specific topics in higher-dimensional vectors. In experiments, we show that cascading increases both precision and recall of the search.
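
As a minimal sketch of the cascade idea (the two-level setup, the Hellinger-based similarity, and the candidate pool size below are illustrative assumptions, not the exact configuration used in the paper), the search can first filter candidates by coarse parent-level topic vectors and then re-rank them by fine child-level vectors:

import numpy as np

def hellinger_similarity(p, q):
    """Similarity between two discrete topic distributions
    (one minus the Hellinger distance)."""
    return 1.0 - np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def cascade_search(query_coarse, query_fine, docs_coarse, docs_fine,
                   pool_size=100, top_k=10):
    """Two-stage cascade: filter candidates by coarse (parent-level)
    topic vectors, then rank the pool by fine (child-level) vectors."""
    coarse_scores = np.array([hellinger_similarity(query_coarse, d)
                              for d in docs_coarse])
    candidates = np.argsort(coarse_scores)[::-1][:pool_size]
    fine_scores = [(i, hellinger_similarity(query_fine, docs_fine[i]))
                   for i in candidates]
    fine_scores.sort(key=lambda pair: pair[1], reverse=True)
    return fine_scores[:top_k]

Because the coarse level has few topics, the first stage is cheap and prunes most of the collection before the more expensive fine-grained comparison.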

To obtain the desired topical representation of documents, the topics should also be well interpretable and significantly different from each other. To combine these requirements with hierarchy and modalities, we use additive regularization for topic modeling (ARTM) (Vorontsov & Potapenko, 2015). For the technical implementation, we rely on the efficient parallel implementation of the online EM-algorithm from the open-source library BigARTM (Frei & Apishev, 2016).
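
For illustration, the sketch below fits a flat regularized model with BigARTM's Python API; the collection path, topic count, and regularization coefficients are placeholders, parameter names may vary across library versions, and the full model adds hierarchy levels and modalities on top of this:

import artm

# Convert a Vowpal Wabbit formatted collection into BigARTM batches
# (paths here are placeholders).
batch_vectorizer = artm.BatchVectorizer(data_path='collection.vw',
                                        data_format='vowpal_wabbit',
                                        target_folder='batches')
dictionary = batch_vectorizer.dictionary

# Sparsing regularizers (negative tau) push topics toward sparse,
# interpretable distributions; the decorrelator makes topics distinct.
model = artm.ARTM(num_topics=50, dictionary=dictionary)
model.regularizers.add(artm.SmoothSparsePhiRegularizer(name='sparse_phi', tau=-0.5))
model.regularizers.add(artm.SmoothSparseThetaRegularizer(name='sparse_theta', tau=-0.3))
model.regularizers.add(artm.DecorrelatorPhiRegularizer(name='decorrelator', tau=1e5))

# Regularized EM: several offline passes over the collection.
model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=20)

# Columns of the theta matrix are the topical embeddings used for search.
theta = model.transform(batch_vectorizer=batch_vectorizer)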

Compared to our previous work (Ianina, Golitsyn, & Vorontsov, 2017; Ianina & Vorontsov, 2019), in this paper we continue to explore the topical hierarchy and take a step further by merging topical embeddings with neural approaches. We build models that combine pretrained transformer-based representations and LSTM-based embeddings with topical vectors, and show the effectiveness of such combinations in terms of search precision and recall. Furthermore, we expand the experimental design by testing more search setups and more ways of comparing topical embeddings. Finally, we move beyond the conventional document-by-document search paradigm and develop a segmentation-based search, which divides the query and each document into thematically uniform segments and compares these segments pairwise to obtain a more accurate ranking; a sketch of both ideas follows.
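
One simple way to realize these two ideas is sketched below; the weighted concatenation of vectors and the average-of-best-match segment scoring are illustrative assumptions, not necessarily the exact schemes evaluated in the paper:

import numpy as np

def blend_embeddings(topical_vec, neural_vec, alpha=0.5):
    """Blend a topical embedding with a pretrained neural embedding by
    weighted concatenation of L2-normalized parts; alpha is an
    illustrative mixing weight."""
    t = topical_vec / (np.linalg.norm(topical_vec) + 1e-12)
    n = neural_vec / (np.linalg.norm(neural_vec) + 1e-12)
    return np.concatenate([alpha * t, (1.0 - alpha) * n])

def segment_similarity(query_segments, doc_segments):
    """Segmentation-based matching: for each query segment, take the
    cosine similarity of its best-matching document segment, then
    average these maxima into a single query-document score."""
    sims = []
    for q in query_segments:
        best = max(float(q @ d) /
                   (np.linalg.norm(q) * np.linalg.norm(d) + 1e-12)
                   for d in doc_segments)
        sims.append(best)
    return float(np.mean(sims))

With normalized parts, cosine similarity over the blended vectors behaves like a weighted sum of the topical and neural similarities, so alpha controls how much each representation contributes to the ranking.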
