C-HUIM: A Novel Framework for Customized High-Utility Itemset Mining

Sandipkumar Chandrakant Sagare, Dattatraya Vishnu Kodavade
Copyright: © 2022 | Pages: 11
DOI: 10.4018/IJSI.307015

Abstract

High-utility itemset mining is a heavily researched area. Researchers have developed many techniques and algorithms to mine high-utility itemsets from transaction databases. One limitation of the existing high-utility itemset mining techniques is that there is no generalized framework for applying custom combinations of input parameters and other constraints when mining high-utility itemsets. This paper proposes a novel customizable framework to discover customized high-utility itemsets (C-HUI). Users can customize the constraints and/or input parameters according to their requirements. A novel C-HUIM algorithm is used to discover customized high-utility itemsets (C-HUI) from real-life datasets. The experimental results of the proposed framework and the C-HUIM algorithm highlight the effectiveness of the approach.

1. Introduction

Pattern mining is currently one of the most popular areas of research. It discovers patterns in input datasets of various kinds. Businesses need accurate patterns to support decisions that drive their growth. Furthermore, machine learning algorithms can be applied to the patterns extracted from datasets to learn how a given pattern supports predictions and helps in making decisions for business growth.

High-utility pattern mining is a sub-domain of pattern mining that deals with mining high-profit or high-utility patterns from a given dataset. High-utility itemset mining is specifically concerned with extracting itemsets that yield a high profit (or utility). One of the best-known data mining tasks is frequent itemset mining (FIM) (Agrawal & Srikant, 1994; Han et al., 2004; Uno et al., 2004; Farzanyar et al., 2012; Fournier-Viger, Lin, Vo et al., 2017; Fournier-Viger, Lin, Kiran et al., 2017; Fournier-Viger et al., 2018). Processes for finding frequent itemsets usually assume a threshold against which the support of each itemset is compared. The itemsets that pass this comparison are termed frequent itemsets. During this process, all items in the transaction database are considered equally important, and each item can appear at most once per transaction. Three notable limitations of frequent itemset mining are as follows. First, the quantities of items purchased in the transactions are neglected, so purchasing several units of an item is treated the same as purchasing a single unit. Second, all items in transactions are given equal importance, although in reality some items can be more important to users than others. Third, frequent itemsets may not be the most interesting ones to users. For example, in market basket analysis, the profit gained may be more important than the frequency of selling.
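The support-threshold procedure described above can be sketched as follows. This is a minimal brute-force illustration, not one of the cited FIM algorithms; the toy database and the names `transactions`, `support`, and `frequent_itemsets` are assumptions made for this example. Quantities are stored in the data only to show that support counting never looks at them.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 1, "milk": 2},
    {"bread": 5, "caviar": 1},
    {"milk": 1, "caviar": 1},
    {"bread": 1, "milk": 1},
]


def support(itemset, db):
    """Count the transactions that contain every item of `itemset`.

    Quantities and profits are ignored: buying 5 loaves counts like buying 1.
    """
    return sum(1 for t in db if all(item in t for item in itemset))


def frequent_itemsets(db, minsup):
    """Enumerate all itemsets and keep those whose support is >= minsup.

    Brute force for clarity only; real FIM algorithms prune the search space.
    """
    items = sorted({item for t in db for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, db)
            if s >= minsup:
                result[candidate] = s
    return result


if __name__ == "__main__":
    for itemset, s in frequent_itemsets(transactions, minsup=2).items():
        print(itemset, s)
```

In this toy data, the transaction with five units of bread contributes exactly as much to the support of bread as a transaction with one unit, which is the first limitation listed above.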

These limitations of frequent itemset mining algorithms are addressed by High-Utility Itemset Mining (HUIM) (Liu et al., 2005) to (Fournier-Viger et al., 2019). HUIM involves discovering itemsets that have a high utility in transactional databases. A High Utility Itemset (HUI) is an itemset whose utility is not less than the minimum utility threshold specified by the user. High-utility itemset mining is considered a harder problem than frequent itemset mining because utility does not have the anti-monotonicity property: the utility of a superset of an itemset may be smaller than, equal to, or greater than the utility of the itemset itself. Existing techniques for frequent itemset mining therefore cannot be applied directly to high-utility itemset mining. To reduce the search space, algorithms such as Two-Phase (Liu et al., 2005) compute upper bounds on utility that do possess anti-monotonicity.
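A small worked example may help make the lack of anti-monotonicity concrete. The sketch below assumes the usual definitions (utility of an item in a transaction is quantity times unit profit) and uses the transaction-weighted utility (TWU) as the anti-monotonic upper bound, which is the bound used by Two-Phase. The profit table, database, and function names are hypothetical.

```python
# Hypothetical unit profits and toy database (item -> purchased quantity).
profit = {"a": 5, "b": 2, "c": 1}
db = [
    {"a": 1, "b": 2},          # transaction utility: 5 + 4 = 9
    {"b": 4, "c": 3},          # transaction utility: 8 + 3 = 11
    {"a": 2, "b": 1, "c": 1},  # transaction utility: 10 + 2 + 1 = 13
]


def contains(transaction, itemset):
    return all(item in transaction for item in itemset)


def utility(itemset, db, profit):
    """Sum of quantity * profit of the itemset's items over transactions containing it."""
    return sum(
        sum(t[item] * profit[item] for item in itemset)
        for t in db
        if contains(t, itemset)
    )


def transaction_utility(transaction, profit):
    """Total utility of all items in one transaction."""
    return sum(qty * profit[item] for item, qty in transaction.items())


def twu(itemset, db, profit):
    """Transaction-weighted utility: sum of the utilities of transactions containing the itemset."""
    return sum(transaction_utility(t, profit) for t in db if contains(t, itemset))


if __name__ == "__main__":
    for itemset in [("a",), ("a", "b"), ("a", "b", "c")]:
        print(itemset, utility(itemset, db, profit), twu(itemset, db, profit))
```

With this data the utilities of {a}, {a, b}, and {a, b, c} are 15, 21, and 13: utility first rises and then falls as items are added, so it cannot be used for pruning directly. The corresponding TWU values are 22, 22, and 13, which never increase for supersets and are never smaller than the utility, so an itemset whose TWU is below the threshold can be discarded together with all of its supersets.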

This research work concentrates on designing a framework that includes an algorithm for mining Correlated Time Constrained High Utility Itemsets (CTC-HUI) and on implementing it on a real-life dataset. The task is to discover itemsets whose utility is greater than or equal to a user-specified threshold during one or more periods of time of a minimum length, and which also satisfy a correlation threshold given by the user (Fournier-Viger et al., 2016). This framework makes it possible to find useful patterns that would otherwise be missed. For example, {firecracker} yields a high profit during the Diwali festival season, but it is not a High Utility Itemset in the whole transaction database or in predefined time periods such as winter or summer. Likewise, {firecracker, TV} can be a high-utility itemset, but because the items firecracker and TV are not correlated, it will not be considered a correlated time constrained high-utility itemset. This research proposes to discover CTC-HUIs using a suitable data structure and a search procedure.
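The two conditions in this definition can be illustrated with a rough sketch. It is not the C-HUIM algorithm, whose details are not given in this preview. It assumes fixed-length windows of days as the "periods of time" and the bond measure (transactions containing all items divided by transactions containing at least one item) as the correlation measure; both choices, along with the data and all names, are assumptions for illustration only.

```python
# Hypothetical unit profits and timestamped toy database: (day, {item: quantity}).
profit = {"firecracker": 4, "tv": 100, "sweets": 2}
db = [
    (1, {"tv": 1}),
    (2, {"sweets": 1}),
    (10, {"firecracker": 5, "sweets": 3}),
    (11, {"firecracker": 6, "sweets": 2}),
    (12, {"firecracker": 4, "tv": 1}),
    (20, {"sweets": 1}),
]


def contains(transaction, itemset):
    return all(item in transaction for item in itemset)


def utility_in_window(itemset, db, profit, start, length):
    """Utility of the itemset over transactions with start <= day < start + length."""
    return sum(
        sum(t[item] * profit[item] for item in itemset)
        for day, t in db
        if start <= day < start + length and contains(t, itemset)
    )


def high_utility_periods(itemset, db, profit, minutil, length):
    """Start days of windows (assumed period shape) whose utility reaches minutil."""
    days = sorted({day for day, _ in db})
    return [
        d for d in days
        if utility_in_window(itemset, db, profit, d, length) >= minutil
    ]


def bond(itemset, db):
    """Assumed correlation measure: |transactions with all items| / |transactions with any item|."""
    with_all = sum(1 for _, t in db if contains(t, itemset))
    with_any = sum(1 for _, t in db if any(item in t for item in itemset))
    return with_all / with_any if with_any else 0.0


def is_ctc_hui(itemset, db, profit, minutil, length, mincor):
    """True if some window reaches minutil and the itemset meets the correlation threshold."""
    periods = high_utility_periods(itemset, db, profit, minutil, length)
    return bool(periods) and bond(itemset, db) >= mincor


if __name__ == "__main__":
    for itemset in [("firecracker",), ("firecracker", "tv")]:
        print(itemset, is_ctc_hui(itemset, db, profit, minutil=50, length=3, mincor=0.5))
```

In this toy data, {firecracker} reaches the utility threshold only in the three-day window starting on day 10 and is trivially correlated with itself, so it qualifies. {firecracker, tv} reaches the utility threshold thanks to a single transaction, but its bond is only 0.25, so it is rejected by the correlation check, mirroring the example in the text.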

The remainder of this paper is organized as follows. Section 2 discusses the literature survey. Section 3 introduces the problem statement and objectives. Section 4 explains the proposed methodology in detail, including the architecture. Section 5 explains the proposed algorithms and data collection. Section 6 presents the experimental evaluation. Lastly, Section 7 presents the conclusion and future work, summarizing the findings and providing direction for further research.
