Article Preview
TopIntroduction
Link prediction is outlined as a task of predicting whether any pair of nodes is likely to form links in the future by Lü and Zhou (2011). Recently, it has pulled in expanding consideration from both machine learning communities and data mining as given in Liben‐Nowell and Kleinberg (2007). It has an extensive variety of applications in the field of recommendation systems, e-commerce, bibliography, bioinformatics, world wide web, etc. In an e-commerce domain, the most well-known usages of link prediction is in the construction of recommendation systems (Talasu et al., 2017). In the bibliography, link prediction is mainly used for record linkage and de-duplication according to Al Hasan et al. (2011). In case of bioinformatics, link prediction is used in predicting protein to protein interaction (Lü, & Zhou, 2011) (Xu, & Guan, 2014). Besides all the above applications, it has also been used in security purposed such as identifying a hidden group of radicals and criminal activities (Ozgul, Erdem, & Bowerman, 2009).
In the early stage, similarity scores obtained from different heuristic methods are used for predicting the links between the nodes in the network (Lü & Zhou, 2011), (Liben‐Nowell, & Kleinberg, 2007). A heuristic score is assigned to each unconnected pair of nodes to measure the proximity of connection between the two nodes. Heuristic methods range from simple to more complex. Some of the simplest heuristics include common-neighbors (Liben‐Nowell et al., 2007), Katz index (Katz, 1953), Adamic-Adar (Adamic et al., 2003), etc. Such heuristics work well in social network and is more scalable and interpretable. For heuristics, which lies in mid-level complexities include methods which used network topologies to calculate proximity scores. Later, many sophisticated and complex models have been developed, such as Matrix factorization (Salakhutdinov, & Mnih, 2008, July) and stochastic block model (Airoldi et al., 2008) which utilizes maximum likelihood estimation approaches.
However, in the last decade, an additional interest has arisen for researchers, to determine which method is most suitable for a specific situation. These heuristics are not universally suitable for different kinds of networks. For instance, the simplest heuristic common neighbor is more suitable in a social network for predicting friendships and it is also used in the collaboration network for prediction of co-authorship. This simplest heuristic has also a drawback, as it gives poor performance for networks such as biological networks and electrical networks (Lü & Zhou, 2011). On the contrary, the resistance distance (RD) gives a great performance in electrical and internet router networks (Klein & Randić, 1993). But the inverse happens in case of social networks. From the survey of over 20 different heuristics, it can be concluded that none of the heuristics perform well on all the different types of networks (Lü & Zhou, 2011). The existing heuristics used the topological features of the network. So, from the local patterns of each pair of nodes, one can automatically determine the possibility of formation of links. As the existing heuristics are not applicable across all the networks, the only resolution to find, which heuristics is suitable for a particular network is by using trial and error or expert selection (Brodhead, 2018). Recently, Zhang and Chen (2017) defines Weisfeiler-Lehman (WL) algorithm with a combination of the neural machine as a predictive model. WL algorithm is generally based on graph theory and modified to the palette-WL algorithm, which has order preserving facility and provides better computational efficiency than WL. So, we used palette-WL graph labeling algorithm in this paper to extract the features from the several real complex networks.