Disease Interactome: An Assessment Case Study Based on Analysis and Measures to Predict Secondary Diseases

Suma Dawn, Nidhi Jain, Tulika Gangwar
DOI: 10.4018/978-1-7998-4414-3.ch005

Abstract

The disease interactome is a network of genes that are related to each other through some attributes. Because these genes are implicated in many diseases, they reveal strong correlations among those diseases. Genes, being a major part of the interactome, can thus be used to determine the relationships between diseases in terms of symptoms, clinical similarity, and comorbidity. Subgraphs and similarity measures such as Jaccard distance, cosine similarity, and others have been used to calculate the relationship between two or more diseases. Many diseases that did not show much resemblance on the basis of gene similarity or symptom similarity were seen to be closely related according to the network interactome. A quantitative disease-disease analysis was also carried out. Hierarchical clustering with single, complete, and average linkage was applied to obtain a visual representation in the form of a dendrogram. Thus, a disease-disease interactome was created and analyzed to find related secondary diseases, and their basic nature was understood.

Introduction

A disease network interactome can be described as a network of genes interrelated through some attribute. In our case, there are seven attributes: regulatory, binary, literature, metabolic, kinase, complexes, and signaling. Any kind of relationship between two genes adds an edge to the network and thus makes the interactome more complete. Research in this area is being actively pursued; however, because the interactome is incomplete and knowledge of disease-gene associations is limited, predictive mapping of the available data is challenging.
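As a minimal sketch of how such a network might be assembled, the following Python snippet builds an undirected gene graph from typed interaction records. The record layout `(gene_a, gene_b, type)`, the function name `build_interactome`, and the gene labels `G1` to `G3` are assumptions made for illustration; the chapter does not specify its data format.

```python
from collections import defaultdict

# The seven interaction types named in the text.
INTERACTION_TYPES = {
    "regulatory", "binary", "literature", "metabolic",
    "kinase", "complexes", "signaling",
}

def build_interactome(interactions):
    """Build an undirected gene graph from (gene_a, gene_b, type) records.

    Returns (adjacency, edge_types): adjacency maps a gene to the set of its
    neighbours; edge_types maps an unordered gene pair to the set of
    interaction types that support that edge.
    """
    adjacency = defaultdict(set)
    edge_types = defaultdict(set)
    for gene_a, gene_b, kind in interactions:
        if kind not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {kind}")
        if gene_a == gene_b:
            continue  # self-loops are ignored in this sketch
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
        edge_types[frozenset((gene_a, gene_b))].add(kind)
    return adjacency, edge_types

# Hypothetical records, for illustration only.
records = [
    ("G1", "G2", "binary"),
    ("G1", "G2", "kinase"),      # same pair, second type: still one edge
    ("G2", "G3", "regulatory"),
]
adjacency, edge_types = build_interactome(records)
print(len(edge_types))  # number of distinct edges
```

Keeping the interaction types per edge, rather than discarding them, leaves room to filter the network by attribute later; whether the authors do so is not stated.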

The disease-gene network modules of the diseases are interrelated in the form of subgraphs and thus have a distance factor associated with them, which can be exploited to calculate the mathematical relationship between two or more diseases. Mathematical conditions for identifying the similarity between any two diseases on the basis of the network interactome, using shortest distances, can thus be determined. Measures such as Jaccard distance and cosine similarity, amongst others, have been used to determine the similarity between disease-gene associations. A prediction can also be made as to which diseases are strongly correlated and are thus likely to occur along with a primary disease. A quantitative comparative study of disease similarity or dissimilarity can thus be achieved, along with the classification of similar or dissimilar diseases. Furthermore, the diseases can be clustered into groups using hierarchical clustering, and a visual representation in the form of plots has also been prepared. The interactome, comprising 13,460 genes and 141,296 connections, has also been plotted.
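To make the two named measures concrete, here is a short sketch that treats each disease as a set of associated genes. This set-based reading, the function names, and the gene labels are assumptions for illustration; the chapter may compute the measures over a different representation.

```python
import math

def jaccard_distance(genes_a, genes_b):
    """One minus the ratio of shared genes to all genes in either set."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)

def cosine_similarity(genes_a, genes_b):
    """Cosine of the binary membership vectors of the two gene sets.

    For 0/1 vectors this reduces to shared / sqrt(size_a * size_b).
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical gene sets for two diseases.
disease_x = {"G1", "G2", "G3", "G4"}
disease_y = {"G3", "G4", "G5"}

print(jaccard_distance(disease_x, disease_y))   # 2 shared of 5 total
print(cosine_similarity(disease_x, disease_y))  # 2 / sqrt(4 * 3)
```

Both measures depend only on gene overlap, which is why two diseases with no shared genes look unrelated here even if their modules sit close together in the network.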

The disease interactome is a network of genes that are related to each other through some attribute. These genes, being part of various diseases, show a high correlation among the diseases. The purpose of the project is therefore to understand the gene-gene interactome and to find disease-disease relationships, so as to quantify the relationship between any pair of diseases and to predict the secondary disease that is most likely to occur together with a given primary disease. The main objectives of the project are to

  • Quantify the similarity between the diseases.

  • Find distance measures between all diseases and identify the most similar secondary disease, the one most likely to occur along with the primary one.

  • Plot a disease-disease interactome.

  • Determine a distance matrix between diseases on the basis of network separation, which can be used in further research.

  • Cluster diseases on the basis of the distance matrix thus obtained.

  • Use the prepared dataset to calculate a distance matrix representing the number of mutations required to convert one gene into another.

  • Plot a mutation path between these genes.
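The clustering objective in the list above can be sketched as a naive agglomerative procedure over a precomputed distance matrix, supporting the three linkage rules the chapter names. The function name, the disease labels `D1` to `D3`, and the toy matrix are hypothetical; a real analysis would more likely rely on a library routine and draw the dendrogram from its output.

```python
def hierarchical_clustering(dist, labels, linkage="average"):
    """Naive agglomerative clustering over a symmetric distance matrix.

    Returns the merges in order, each as (members_a, members_b, distance).
    """
    combine = {
        "single": min,                                         # closest pair
        "complete": max,                                       # farthest pair
        "average": lambda values: sum(values) / len(values),   # mean of all pairs
    }[linkage]
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_dists = [dist[p][q] for p in clusters[i] for q in clusters[j]]
                d = combine(pair_dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((
            sorted(labels[p] for p in clusters[i]),
            sorted(labels[q] for q in clusters[j]),
            d,
        ))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return merges

# Hypothetical distance matrix for three diseases.
labels = ["D1", "D2", "D3"]
dist = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
for rule in ("single", "complete", "average"):
    print(rule, hierarchical_clustering(dist, labels, rule))
```

The merge list carries the same information a dendrogram displays: which groups join, and at what height. The three rules agree on the first merge here and differ only in the height of the second.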

Based on experimental results, it was found that there is a close relationship between diseases that can be determined using network-based distances and similarity measures. Many diseases that did not show much resemblance on the basis of gene similarity or symptom similarity were seen to be closely related according to the network interactome. Thus, a prediction can be made as to whether two diseases are interrelated and, if they are, to what extent. Comorbidity can also be predicted, and a similarity matrix can be calculated. Clustering can then be implemented on the basis of that similarity.

To quantify the similarity between the diseases, network-based distance measures such as Jaccard distance, cosine similarity, and relative risk ratio were applied. A distance matrix was then constructed for all diseases available in the dataset, signifying the similarity between the diseases according to each of these methods. The related secondary diseases were determined with the help of the similarity matrices: the disease exhibiting the minimum distance, or maximum similarity, to a primary disease can be inferred to be its secondary disease. Further, the distance matrix so calculated is also used to implement hierarchical clustering with single linkage, complete linkage, and average linkage. The complexity of calculating the network-based similarity between the diseases was reduced from the order of n^4 to the order of n^2 by applying the memoization technique of dynamic programming.
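The chapter preview does not spell out how memoization was applied, so the following is only one plausible reading: cache the shortest-path distances from each gene so that they are computed once and reused across every disease pair. The toy graph, the `module_distance` definition (mean hop count to the closest gene in the other module), and all names are assumptions made for this sketch, not the authors' stated method.

```python
from collections import deque
from functools import lru_cache

# Hypothetical toy interactome: a simple chain G1 - G2 - G3 - G4 - G5.
ADJACENCY = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4"},
    "G4": {"G3", "G5"},
    "G5": {"G4"},
}

@lru_cache(maxsize=None)
def distances_from(source):
    """Breadth-first hop counts from source; cached so each gene is searched once."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for neighbour in ADJACENCY.get(gene, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[gene] + 1
                queue.append(neighbour)
    return dist

def module_distance(genes_a, genes_b):
    """Mean, over genes of both modules, of hops to the closest gene in the other module."""
    closest = []
    for source_set, target_set in ((genes_a, genes_b), (genes_b, genes_a)):
        for gene in source_set:
            reachable = distances_from(gene)
            hops = [reachable[t] for t in target_set if t in reachable]
            if hops:
                closest.append(min(hops))
    return sum(closest) / len(closest) if closest else float("inf")

def secondary_disease(primary, modules):
    """Return the other disease with the smallest module distance to the primary."""
    others = [name for name in modules if name != primary]
    return min(others, key=lambda name: module_distance(modules[primary], modules[name]))

# Hypothetical disease modules (sets of associated genes).
MODULES = {
    "D1": frozenset({"G1", "G2"}),
    "D2": frozenset({"G2", "G3"}),
    "D3": frozenset({"G5"}),
}

print(secondary_disease("D1", MODULES))
print(distances_from.cache_info())  # cache hits show reuse across disease pairs
```

Without the cache, every disease pair would repeat the same graph searches for the genes it shares with other pairs; with it, the number of searches is bounded by the number of genes rather than by the number of gene pairs across all disease pairs.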
