Distributed Frequent Subgraph Mining Using Gaston and MapReduce

Distributed Frequent Subgraph Mining Using Gaston and MapReduce

Jagannadha Rao D. B.
Copyright: © 2021 |Pages: 18
DOI: 10.4018/IJSWIS.2021040103
OnDemand:
(Individual Articles)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

This paper addresses this issue and devises a new method for frequent subgraph mining in order to retrieve the valuable information from the database that captured the attention of the users. This paper proposes the recurrent-Gaston (R-Gaston) algorithm for the frequent subgraph mining process by enhancing the existing Gaston algorithm. Moreover, the method uses support measures based on the frequency and page duration parameters in order to define the support for the proposed R-Gaston algorithm. The simulation of the proposed R-Gaston is carried out using the weblog and the MSNBC databases. The proposed R-Gaston has attained values of number of structures mined and the execution time as 184, and 1282ms for the MSNBC database, with 60 and 75ms for the weblog database, respectively.
Article Preview
Top

1. Introduction

Graph mining plays an important role in the data analysis, but move violently as the size of the input graph increases. (Purohit, et al., 2017). It is applicable in several areas, such as bioinformatics, computer networks, chemical reactions, social networks, and program flow structures (Kumar & Gupta, 2017). For the graph with edges, the possible frequent subgraph increases in an exponential manner. The subgraph isomorphism is NP-complete problem since it becomes decisive to reduce the number of subgraph, which must be considered for determining the frequency counts. The mining of frequent structures from a graph set is considered as a major issue with applications, like XML documents, chemical and biological data, web link data, and protein folding data. The major issue in mining the graph database is the huge number of recurring patterns (Almethen & Che, 2015).

Extraction of frequent sub-graphs using a large data graphs is a primary task in several information mining applications. Most appropriate and distinct subgraph is produced using weighted frequent subgraph mining (FSM) (Kumar & Gupta, 2017). The number of frequent subgraph minimizes the efficiency and effectiveness of the mining. The FSM is a core graph operation utilized in different domains, like knowledge exploration, graph data management, bioinformatics, and security (Almethen & Che, 2015). Most of the conventional techniques aim to use static graphs, but the advanced applications, like social networks use huge evolving graphs. The mining of these graphs using the conventional methods is not reliable due to high cost of computation (Abdelhamid, et al., 2016).

The FSM is major task and adapted in many domains, like social network analysis, web mining, biology, and chemical structure analysis (Kavitha, et al., 2013). Numerous algorithms are devised for solving the issues of FSM. These algorithms consider that the structure of the data for mining is much smaller to fit in the computer memory. However, the real-world graph data increase in quantity and size. Several graph database-centric methods are devised to solve the FSM, but the distributed solutions based on MapReduce paradigm are not yet solved (Saha & Hasan, 2014). As the MapReduce poses huge paradigm, it is difficult to compute the massive data and thus, the design of an effective FSM algorithm is essential (Bhuiyan & Hasan, 2014).

The primary intention of this paper is to develop a method for mining frequent subgraphs, which has achieved more attention from the users. The proposed method is R-Gaston, which is designed by incorporating the existing Gaston algorithm with the recurrent support measure. As a result, two support measures are devised namely recurrent support measure, and aggregate recurrent support measure for retrieving the frequent subgraphs. The proposed R-Gaston algorithm along with the support measures help to determine the frequent subgraphs using the transaction database.

The major contributions of the research work are mentioned below:

  • Design R-Gaston algorithm by enhancing the recurrent support provided in the existing Gaston algorithm for mining frequent subgraphs.

  • Although, the GASTON method is effective, it failed to be applied in the distributed framework. The proposed method overcomes the flaw of the GASTON method by outperforming in the distributed framework.

  • Even though, several FSM techniques have been developed, they lack algorithmic tools to count the subgraphs present in the extensive network. The proposed R-Gaston algorithm effectively performs the subgraph mining even if the size of the network is very large.

The organization of the paper is done in following manner: Section 1 gives the introductory part of the subgraph mining, Section 2 elaborates the survey of eight existing techniques along with the challenges, Section 3 provides the illustration of proposed R-Gaston algorithm for mining frequent subgraphs, and Section 4 deliberates the results of the comparative methods, and section 5 reveals the short summary of the proposed work.

Complete Article List

Search this Journal:
Reset
Volume 20: 1 Issue (2024)
Volume 19: 1 Issue (2023)
Volume 18: 4 Issues (2022): 2 Released, 2 Forthcoming
Volume 17: 4 Issues (2021)
Volume 16: 4 Issues (2020)
Volume 15: 4 Issues (2019)
Volume 14: 4 Issues (2018)
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing