Enhanced Frequent Itemsets Based on Topic Modeling in Information Filtering

Than Than Wai, Sint Sint Aung

Source Title: International Journal of Software Innovation (IJSI) 5(4)

DOI: 10.4018/IJSI.2017100103

Abstract

Many term-based and pattern-based approaches have been used in Information Filtering (IF) to derive a user's information needs from a collection of documents. These approaches assume that the documents in the collection are all about one topic. However, a user's interests can be diverse, and the documents in the collection often involve multiple topics. Topic modeling, which is widely used in machine learning and text mining, generates models that discover the hidden topics in a collection of documents, with each topic represented by a distribution over words. Its effectiveness in information filtering, however, has not been well explored. Patterns are generally considered more discriminative than single terms for describing documents, but the major challenge in frequent pattern mining is the large number of resulting patterns: as the minimum support threshold is lowered, an exponentially large number of patterns is generated. To address these limitations, this paper proposes a novel information filtering model, EFITM (Enhanced Frequent Itemsets based on Topic Model). Experimental results on the CRANFIELD dataset for the task of information filtering show that the proposed model outperforms state-of-the-art models.

Introduction

Information filtering (IF) is a system that removes redundant or unwanted information from an information or document stream, based on document representations that capture the user's interests. The input to IF is usually a collection of documents that a user is interested in; these represent the user's long-term interests and are often called the user's profile. Term-based approaches, one family of IF models that includes BM25 and Rocchio, are computationally efficient (Beil et al., 2002; Robertson et al., 2004). However, term-based document representations suffer from the problems of polysemy and synonymy. To overcome this limitation, pattern mining techniques are used (Bastide et al., 2000; Cheng et al., 2007), because patterns carry more semantic meaning than single terms. Pattern mining relies on data mining algorithms that discover interesting, surprising and useful patterns in databases. These algorithms can be applied to various types of data, such as transactional databases, sequence databases, streams, spatial data and graphs. The goal is to discover all patterns whose frequency in the underlying dataset exceeds a user-specified threshold. Database model filtering helps to create mining models that use a subset of the data in a mining structure. Pattern-based topic filtering is used to filter out irrelevant documents and return relevant documents from a collection (Vishnu, 2016). In software testing, the number of test cases is reduced in order to minimize the time and cost of executing them. Several techniques can be used to reduce test cases, such as information retrieval, data mining and pairwise testing. Data mining approaches are used mainly because of their ability to extract patterns of test cases that are otherwise invisible (Saifan et al., 2016).
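The frequency-threshold idea above can be illustrated with a minimal sketch. It is not the mining procedure of the proposed model; it is a plain Apriori-style level-wise search, and the function name `frequent_itemsets`, the toy documents and the threshold values are all assumptions made for illustration. Each document is treated as a transaction (a set of terms), and an itemset is kept when it occurs in at least `min_support` documents.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in
    at least `min_support` transactions (Apriori-style level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    result = {}
    candidates = [frozenset([item]) for item in items]
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        keys = list(frequent)
        if not keys:
            break
        k = len(keys[0]) + 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result


# Hypothetical toy collection: each document reduced to a set of terms.
docs = [
    {"topic", "model", "pattern"},
    {"topic", "model"},
    {"pattern", "mining", "model"},
    {"topic", "pattern", "model"},
]

for threshold in (3, 2, 1):
    print(threshold, len(frequent_itemsets(docs, threshold)))
```

Running the loop on the toy collection shows the pattern count growing as the threshold is lowered, which is the small-scale version of the pattern explosion that motivates restricting the set of patterns used for filtering.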
Some work (Néji et al., 2014) focuses on Information Retrieval Systems (IRS) that integrate human emotion recognition to gauge the user's degree of satisfaction with the results found, through the user's facial expression, physiological state, gestures and voice. The authors proposed an algorithm for recognizing the emotional state of a user during a search session in order to return the relevant documents that the user needs, and also presented the agent architecture of the envisaged system and its organizational model.
Topic modeling (Blei & Jordan, 2003; Blei & Wang, 2011; Croft & Wei, 2006) is one of the text modeling techniques. It can automatically classify documents into a number of topics and represent every document by multiple topics and their corresponding distribution. Two representative approaches are PLSA (Hofmann, 1999) and LDA (Blei & Jordan, 2003). A topic model contains clusters of words with similar meanings. Some topic models also take time into account on the basis of a user interest model, which can confound topic discovery, and some applications of these methods have been reported (Vishnupriya, 2015). Comparing the features of different topic models is essential for designing a new proposal for information filtering based on a user interest model. All of these models consider time a vital factor. Directly applying topic models to IF raises two problems, both caused by the limited number of dimensions available to represent documents. First, the topic distribution alone is insufficient to represent documents. Second, word-based topic representations are limited in distinguishing documents that have different semantic content. To overcome these problems, pattern-enhanced LDA (Gao et al., 2015) is used. Its pattern-based representations carry more concrete and identifiable meaning than the word-based representations of LDA (Blei & Jordan, 2003).
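To make "multiple topics and their corresponding distribution" concrete, the following sketch estimates a document's topic distribution from per-word topic assignments, using the standard smoothed estimate theta_k = (n_k + alpha) / (N + K * alpha). It is an illustration only, not the procedure used in the paper: the assignments are hypothetical values of the kind a fitted LDA model would produce, and the function name `topic_distribution` and the value of `alpha` are assumptions.

```python
from collections import Counter


def topic_distribution(assignments, num_topics, alpha=0.1):
    """Estimate a document's topic distribution from the topic id assigned
    to each of its words, with symmetric Dirichlet smoothing `alpha`."""
    counts = Counter(assignments)
    total = len(assignments) + num_topics * alpha
    return [(counts[k] + alpha) / total for k in range(num_topics)]


# Hypothetical topic ids for the six words of one document (3 topics).
word_topics = [0, 0, 1, 0, 2, 0]
theta = topic_distribution(word_topics, num_topics=3)
print(theta)
```

With only as many dimensions as there are topics, documents whose words fall into the same topics in similar proportions receive nearly identical vectors, which is one way to see why the topic distribution alone can be insufficient for filtering.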
The number of patterns in some topics can be huge, and many of these patterns are not discriminative enough to represent a specific topic. To deal with this problem, the Maximum matched Pattern-based Topic Model (MPBTM) was introduced. MPBTM (Gao et al., 2015) consists of topic distributions, which describe the topic preferences of a document or a collection of documents, and structured pattern-based topic representations, which capture the semantic meaning of the topics in a document. Even so, the number of patterns in some topics can still be too large to represent specific topics. The main distinctive features of the proposed model are as follows:
