Efficient Implementations for UWEP Incremental Frequent Itemset Mining Algorithm

Mehmet Bicer, Daniel Indictor, Ryan Yang, Xiaowen Zhang
Copyright: © 2021 | Pages: 20
DOI: 10.4018/IJAL.2021010102

Abstract

Association rule mining is a common technique used in discovering interesting frequent patterns in data acquired in various application domains. The search space combinatorially explodes as the size of the data increases. Furthermore, the introduction of new data can invalidate old frequent patterns and introduce new ones. Hence, while finding the association rules efficiently is an important problem, maintaining and updating them is also crucial. Several algorithms have been introduced to find the association rules efficiently. One of them is Apriori. There are also algorithms written to update or maintain the existing association rules. Update with early pruning (UWEP) is one such algorithm. In this paper, the authors propose that under certain conditions it is preferable to use an incremental algorithm as opposed to the classic Apriori algorithm. They also propose new implementation techniques and improvements to the original UWEP paper in an algorithm they call UWEP2. These include the use of memoization and lazy evaluation to reduce scans of the dataset.

1. Introduction

Association rule mining (ARM) is a common technique used in discovering interesting frequent patterns in data acquired in various application domains. Many algorithms (Agrawal et al., 1993; Agrawal & Srikant, 1994; Borgelt, 2003) have been proposed to deal with this problem. One of the classic algorithms is Apriori (Agrawal & Srikant, 1994), an iterative, breadth-first, level-wise algorithm with join steps and candidate generation. Apriori requires a full scan of the database at each level. Since the itemsets are grouped by size into levels, with level k holding the k-itemsets, discovering frequent itemsets up to size k requires k scans of the database.
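The level-wise behavior described above can be illustrated with a short sketch of textbook Apriori. This is our own illustrative Python (the function name and the set-based itemset representation are assumptions for the example), not the authors' implementation; note that each pass through the `while` loop performs one full scan of the transactions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori sketch: level k costs one full database scan."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1: count single items in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c / n >= min_support}
    all_frequent = {s: counts[s] for s in frequent}
    k = 2
    while frequent:
        # Join step: unite frequent (k-1)-itemsets that differ in one item.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset
        # (the Apriori property: all subsets of a frequent set are frequent).
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count the surviving candidates: one full scan per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c for c, cnt in counts.items() if cnt / n >= min_support}
        all_frequent.update((c, counts[c]) for c in frequent)
        k += 1
    return all_frequent
```

Running this on the example database of Table 1 with a 60% minimum support yields the frequent itemsets {1}, {3}, {5}, and {3, 5}.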

In ARM, the search space combinatorially explodes as the size of the data increases. If the data is static, that is, if it never gets updated, running one of the classical algorithms, such as Apriori, suffices to discover the association rules. But in the real world, new transactions are added to existing databases every day. These additions may invalidate old association rules and create new ones. So while finding the association rules on a fresh dataset efficiently is an important problem, maintaining and updating them is also crucial, as it allows researchers to avoid rerunning Apriori on the entire dataset each time the database is updated.

Incremental algorithms were created to deal with periodic or continuous data added to the transaction databases (Cheung et al., 1996; Cheung et al., 1997; Omiecinski & Savasere, 1998; Sarda & Srinivas, 1998). The advantage of incremental algorithms is that they allow the user to avoid reading and analyzing the same dataset multiple times by saving select characteristics of it for future runs. Conventional algorithms like Apriori require the entire dataset to be reevaluated when it is updated. One incremental algorithm is Update with Early Pruning (UWEP) (Ayan et al., 1999). As the database scan is one of the major costly operations in finding frequent itemsets, decreasing the amount of reading can substantially improve runtime performance. UWEP reads the existing database at most once and the new database exactly once, while creating and counting a minimal number of candidate itemsets (Ayan et al., 1999). In addition, the UWEP algorithm has mechanisms to prune infrequent itemsets as soon as they are identified. Apriori, by contrast, waits for the appropriate level to prune infrequent itemsets.
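The bookkeeping idea behind such incremental updates can be sketched as follows. This is a minimal illustration restricted to 1-itemsets, with a hypothetical function name and signature of our own; the full UWEP algorithm additionally prunes supersets of infrequent itemsets as soon as an itemset is found infrequent. Saved support counts from the old database are combined with one scan of the increment, and the old database is rescanned (at most once) only for items whose counts were not saved.

```python
def update_counts(old_counts, old_n, increment, min_support, old_db=None):
    """Combine saved 1-itemset support counts from the old database with
    counts from the new increment; rescan the old database only for items
    that lack a saved count but appear in the increment."""
    new_n = old_n + len(increment)
    # One scan of the increment.
    inc_counts = {}
    for t in increment:
        for item in t:
            inc_counts[item] = inc_counts.get(item, 0) + 1
    combined = dict(old_counts)
    need_old_scan = set()
    for item, c in inc_counts.items():
        if item in combined:
            combined[item] += c
        else:
            # No saved count: the item may now be frequent overall,
            # so its count in the old database must be recovered.
            need_old_scan.add(item)
            combined[item] = c
    if need_old_scan and old_db is not None:
        # At most one scan of the old database, for the missing items only.
        for t in old_db:
            for item in need_old_scan & set(t):
                combined[item] += 1
    frequent = {i: c for i, c in combined.items()
                if c / new_n >= min_support}
    return frequent, new_n
```

For example, starting from the saved counts of the items frequent in Table 1 at 60% support ({1: 3, 3: 4, 5: 4} over 5 transactions) and adding the two transactions {2, 4} and {2, 3, 4}, only item 3 remains frequent over the 7 combined transactions.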

The rest of the article is organized as follows. In Section 2, we describe the association rule mining and incremental association rule mining problems. In Section 3, we describe Apriori and the Apriori-based incremental algorithms FUP and FUP2. In Section 4, we explain the UWEP algorithm in detail. In Section 5, we describe our algorithm, UWEP2. In Section 6, we explain the experimental results and the datasets used. We conclude the article in Section 7.

Table 1.
An example database
  TID  Items_purchased
  001  1, 3, 5
  002  2, 3, 5
  003  1, 4, 5
  004  1, 3
  005  2, 3, 4, 5
Table 2.
Support count and relative support (support) of 1-itemsets
  Itemset  Support_count σ(itemset)  Support % (σ(X)/N)
  1        3                         60
  2        2                         40
  3        4                         80
  4        2                         40
  5        4                         80
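The figures in Table 2 follow directly from Table 1. A short computation (variable names are ours, for illustration), with N = 5 transactions:

```python
# Transactions from Table 1; N = 5.
transactions = [{1, 3, 5}, {2, 3, 5}, {1, 4, 5}, {1, 3}, {2, 3, 4, 5}]
N = len(transactions)

# Support count sigma(X): the number of transactions containing item X.
support_count = {x: sum(1 for t in transactions if x in t)
                 for x in range(1, 6)}

# Relative support: sigma(X) / N, expressed here as a percentage.
support_pct = {x: 100 * c // N for x, c in support_count.items()}
```

This reproduces the table: items 3 and 5 have support count 4 (80%), item 1 has 3 (60%), and items 2 and 4 have 2 (40%).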
