1. Introduction
Data mining techniques help extract the knowledge needed for decision making from huge amounts of available data. Frequent item-sets mining (FIM) is one of the fundamental data mining tasks (Agrawal, Imielinski, et al., (1993)), (Rakesh Agrawal and Ramakrishnan Srikant (1994)). It extracts the item-sets whose frequency of occurrence in a transactional database exceeds a user-specified cut-off. Most basic FIM algorithms are designed to work on binary transactional databases: they assume that each item in a transaction is purchased in a quantity of one unit and that all items are equally profitable. However, in real life, each item generates a different profit (also called external utility), and purchase quantities (also called internal utility) often differ. For example, the quantities and profits of items such as a rice packet and a gold ring are far apart (Dawar, Siddharth, et al., (2017)). Hence, treating all items alike may not produce valuable results. Utility item-sets mining (UIM) (Fournier-Viger, Chun-Wei Lin, et al., (2019)) emerged to address these issues. Even though utility item-sets are more useful than frequent item-sets, UIM is hard and intractable. The reason is that FIM methods exploit the anti-monotone property (or downward closure property) of item-set frequency, which prunes the search space effectively, whereas item-set utility does not satisfy this property (Krishnamoorthy, Srikumar, et al., (2015)). The property states that any superset of an infrequent item-set cannot be frequent (Tin Truong, Hai Duong, et al., (2019)).
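The contrast between frequency and utility can be made concrete with a small sketch. The toy database, the profit table, and the cut-off values below are hypothetical and chosen only for illustration. The utility of an item-set in a transaction is taken, as is usual in this literature, to be the sum of quantity times unit profit over its items. The enumeration is brute force, not an efficient mining algorithm.

```python
from itertools import combinations

# Hypothetical toy database: item -> purchased quantity (internal utility).
transactions = [
    {"rice": 5, "ring": 1},
    {"rice": 3},
    {"rice": 4, "milk": 2},
    {"ring": 1, "milk": 1},
]
# Hypothetical unit profits (external utility).
profit = {"rice": 1, "milk": 2, "ring": 100}


def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset, db):
    """Sum quantity * unit profit over all transactions containing `itemset`."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def all_itemsets(db):
    """Enumerate every non-empty item-set (brute force, for illustration only)."""
    items = sorted({i for t in db for i in t})
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)


MIN_SUP = 1     # assumed cut-off: keep item-sets with support > 1
MIN_UTIL = 100  # assumed cut-off: keep item-sets with utility > 100

frequent = [s for s in all_itemsets(transactions)
            if support(s, transactions) > MIN_SUP]
high_utility = [s for s in all_itemsets(transactions)
                if utility(s, transactions) > MIN_UTIL]

print(frequent)      # [('milk',), ('rice',), ('ring',)]
print(high_utility)  # [('ring',), ('milk', 'ring'), ('rice', 'ring')]
```

In this toy data, support never increases when an item is added, so an infrequent item-set can be discarded together with all of its supersets. Utility behaves differently: ("rice",) has utility 12 and falls below the cut-off, yet its superset ("rice", "ring") has utility 105 and passes, so a low-utility item-set cannot be pruned in the same way.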
Utility item-sets mining has opened the way for new applications that address challenges in market basket analysis, web click-stream analysis, wireless sensor networks, bio-medical data analysis, and stock market prediction (Liu, Junqiang, et al., (2016))(Reddy, Prasad, et al., (2017)). It has also been combined with other mining techniques, such as sequential pattern mining, episode pattern mining, stream mining, top-k high utility item-sets mining, high utility item-sets mining, high utility rare pattern mining, and high average utility item-sets mining (Fournier Viger, Philippe & Lin, et al., (2017)), (Wensheng Gan, Jerry Chun-Wei Lin, et al., (2018)), (Truong-Chi, Fournier-Viger, (2019)), (Zhang, Chongsheng, et al., (2018)). This study aims to develop a high-performance algorithm for retrieving high utility item-sets over a data stream. It extracts the item-sets whose utility exceeds the user-specified cut-off.