1. Introduction
In many application areas of data analytics and machine learning, the data to be analyzed is organized as sequences of events. Identifying relationships between event occurrences hidden in a database is useful because it gives a good understanding of how events relate to one another and supports prediction of the next event (Mannila et al., 1999). In data mining, one useful technique for discovering temporal relations between events in discrete time series is sequential pattern mining (Mannila et al., 1999; Agrawal & Srikant, 1995; Pei et al., 2004). Sequential pattern mining discovers sequences of events that appear frequently in a sequence database. That is, it finds the subsequences whose support in the sequence database is greater than or equal to a support threshold set by the user (Mannila et al., 1999; Agrawal & Srikant, 1995; Pei et al., 2004).
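The support test described above can be illustrated with a minimal sketch. It assumes a simplified setting in which every event is a single item and support is the fraction of sequences that contain the pattern as a subsequence. The names `contains_subsequence`, `support`, `frequent`, and the toy `database` are hypothetical and are not taken from the cited algorithms.

```python
def contains_subsequence(sequence, pattern):
    """Return True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the match, which preserves order.
    return all(item in remaining for item in pattern)


def support(database, pattern):
    """Fraction of sequences in `database` that contain `pattern` (assumed definition)."""
    matches = sum(contains_subsequence(sequence, pattern) for sequence in database)
    return matches / len(database)


def frequent(database, candidates, min_support):
    """Keep only the candidate patterns whose support reaches the user threshold."""
    return [p for p in candidates if support(database, p) >= min_support]


# Toy sequence database (illustrative data only).
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "b"],
    ["b", "a", "d"],
    ["a", "d", "c"],
]

candidates = [["a", "c"], ["a", "d"], ["b", "d"]]
result = frequent(database, candidates, min_support=0.75)
```

With this toy data, `["a", "c"]` and `["a", "d"]` each occur in three of the four sequences, while `["b", "d"]` occurs in only two and is discarded at a threshold of 0.75.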
A wide variety of algorithms has been developed for mining standard sequential rules. CMRules (Fournier-Viger et al., 2012) mines sequential rules that are common to several sequences in a sequence database. The algorithm is based on association rule mining and is very efficient. It can be used to find both sequential rules and association rules in a database.
For Partially Ordered Sequential Rules (POSR), the RuleGrowth algorithm (Fournier-Viger et al., 2011) uses a pattern-growth approach to find POSR that are common to several sequences. RuleGrowth does not follow the existing technique of generating candidate rules and then testing them. Instead, it discovers rules incrementally: rule discovery starts with two items, and rules then grow as the database is scanned to expand their left and right parts. The TRuleGrowth algorithm (Fournier-Viger et al., 2015) takes one extra parameter compared to RuleGrowth, the window size, and uses it to discover only the rules that occur within a sliding window. The constraint is enforced on rules of size 1*1 (one item on each side), and the left and right sides of each sequential rule are modified accordingly. TRuleGrowth is therefore an extension of RuleGrowth (Fournier-Viger et al., 2011) that ensures the sliding-window constraint is taken into consideration while rules are generated. Finding rules that occur within a sliding window has two useful advantages. First, it can reduce execution time by reducing the search space. Second, it can generate a much smaller set of sequential rules, which reduces the disk space needed to store the generated rules and makes the results easier to analyze (Fournier-Viger et al., 2015).
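To make the sliding-window constraint concrete, the following brute-force sketch checks whether a partially ordered rule X ⇒ Y occurs in a sequence, optionally within a window. It is not the RuleGrowth or TRuleGrowth procedure itself, which grows rules incrementally. It assumes that a sequence is a list of itemsets, that the window is counted in consecutive itemsets, and that confidence divides the rule count by the number of sequences containing all items of X. The names `occurs`, `rule_stats`, and the toy `database` are hypothetical.

```python
def occurs(sequence, antecedent, consequent, window=None):
    """True if all antecedent items appear before all consequent items,
    within `window` consecutive itemsets (whole sequence if window is None)."""
    n = len(sequence)
    width = n if window is None else window
    for start in range(n):
        end = min(start + width, n)
        # Try every split point inside the current window.
        for split in range(start + 1, end):
            before = set().union(*sequence[start:split])
            after = set().union(*sequence[split:end])
            if antecedent <= before and consequent <= after:
                return True
    return False


def rule_stats(database, antecedent, consequent, window=None):
    """Return (support, confidence) of the rule under the assumptions stated above."""
    rule_count = sum(occurs(s, antecedent, consequent, window) for s in database)
    antecedent_count = sum(antecedent <= set().union(*s) for s in database)
    support = rule_count / len(database)
    confidence = rule_count / antecedent_count if antecedent_count else 0.0
    return support, confidence


# Toy sequence database of itemsets (illustrative data only).
database = [
    [{"a"}, {"b"}, {"c"}, {"d"}],
    [{"a"}, {"d"}, {"b"}],
    [{"b"}, {"a"}, {"c"}, {"d"}],
    [{"a", "b"}, {"c"}],
]

unconstrained = rule_stats(database, {"a"}, {"d"})
narrow = rule_stats(database, {"a"}, {"d"}, window=2)
wider = rule_stats(database, {"a"}, {"d"}, window=3)
```

On this toy data the rule {a} ⇒ {d} holds in three sequences without a window, in two sequences with a window of three itemsets, and in one sequence with a window of two, which shows how a tighter window yields a smaller rule set.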
The proposed system for mining POSR is thus an extension of the TRuleGrowth algorithm (Fournier-Viger et al., 2015). It uses the M_TRuleGrowth approach, a multithreaded version of the preprocessing part of the existing TRuleGrowth algorithm. This approach analyzes the input and applies multithreading to it. Multithreading reduces the time required for preprocessing and, in turn, the overall execution time. The generated sequential rules can then be used for decision making in applications such as e-commerce and stock market analysis.
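The idea of multithreaded preprocessing can be sketched as follows. The text does not specify what the preprocessing step computes, so this example assumes it records, for each item, the first and last position at which the item occurs in each sequence. The database is split into chunks, each chunk is scanned by a worker thread, and the partial results are merged. The names `scan_chunk`, `preprocess`, and `workers` are hypothetical and do not describe the actual M_TRuleGrowth implementation. Note also that in CPython the global interpreter lock limits the speedup of CPU-bound threads, so the sketch shows the structure of the approach rather than its performance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk):
    """Scan (sequence_id, sequence) pairs and return
    item -> {sequence_id: (first_position, last_position)}."""
    occurrences = defaultdict(dict)
    for sid, sequence in chunk:
        for position, itemset in enumerate(sequence):
            for item in itemset:
                first, _ = occurrences[item].get(sid, (position, position))
                occurrences[item][sid] = (first, position)
    return occurrences


def preprocess(database, workers=2):
    """Split the database across worker threads and merge their partial maps."""
    indexed = list(enumerate(database))
    chunks = [indexed[i::workers] for i in range(workers)]
    merged = defaultdict(dict)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(scan_chunk, chunks):
            for item, by_sid in partial.items():
                # Each sequence id belongs to exactly one chunk, so updates never conflict.
                merged[item].update(by_sid)
    return merged


# Toy sequence database of itemsets (illustrative data only).
database = [
    [{"a"}, {"b"}, {"a"}],
    [{"b"}, {"c"}],
    [{"a", "c"}],
]

index = preprocess(database, workers=2)
```

Because every sequence is assigned to exactly one chunk, the workers share no mutable state and the merge step needs no locking.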