Role of Data Mining Techniques in Bioinformatics

Pushpa Singh, Narendra Singh
Copyright: © 2021 | Pages: 10
DOI: 10.4018/IJARB.2021010106


Data mining offers highly effective techniques for research and development in bioinformatics. Bioinformatics deals with biological information such as DNA, RNA, and protein data. Data mining tasks and techniques include classification, prediction, clustering, association, outlier detection, regression, and pattern tracking. Data mining reveals important correlations, hidden patterns, and knowledge in bioinformatics data sets. This paper presents the role of data mining techniques in bioinformatics applications. Classification of gene and protein structures, analysis of gene expression, association of co-diseases, outlier detection and gene selection, protein structure prediction, and drug discovery are some typical biological examples that have shown data mining to be a suitable technique for bioinformatics.
Article Preview


Bioinformatics integrates biology, mathematics, statistics, medicine, information technology, and computer science. It is the discipline of storing, retrieving, and analyzing huge amounts of biological information such as DNA, RNA, and protein data (Bayat, 2002). Recent technological advances allow biologists to produce huge volumes of data, held in DNA databases, protein sequence databases, protein structure databases, phenotype databases, and genomic sequence databases. Bioinformatics holds great potential for analysis in areas such as genomics, proteomics, drug discovery and development, protein structure, cell biology, molecular modelling, and gene expression (Khan, 2018), as represented in Figure 1. One can analyze and extract valuable patterns in gene expression, classify protein structures, predict and identify genes, and diagnose different types of disease (such as cancer) based on which genes are expressed. Data mining offers the capability to analyze bioinformatics data and is useful for pattern identification, classification, prediction, and genetic network induction (Mabu, 2018).

Figure 1. Bioinformatics areas


In today’s world, data is the basis for everything, provided it is analyzed and the relevant information is extracted properly. In bioinformatics, various types of data are available for mining, as shown in Figure 2.

Figure 2. Types of data in bioinformatics


DNA: DNA carries the genetic code that determines the characteristics of a living thing. It is hereditary material, meaning a child inherits DNA from its parents. The smaller units of DNA are called nucleotides. Each nucleotide consists of three parts: a nitrogenous base, a sugar (deoxyribose), and a phosphate group. The four nitrogenous bases are adenine (A), thymine (T), guanine (G), and cytosine (C). The order of these bases governs the genetic code (Dua & Chowriappa, 2012).

Proteins: Proteins are large, complex molecules that are essential to the body. Proteins are built from twenty different amino acids. The sequence of these amino acids determines each protein’s unique 3D structure and its specific function.

Gene: A gene is a segment of DNA made up of a sequence of As, Cs, Ts, and Gs in a particular order. Human genes vary in size from a few hundred bases to more than a million bases.

Genome: The complete set of genetic material of an organism, including all of its genes.
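As an illustration of how the sequence data described above can be prepared for mining, the following minimal Python sketch treats a DNA sequence as a string over the four bases and derives simple numeric features from it (base counts, GC content, and k-mer counts). The function names and the example sequence are assumptions made for illustration; they do not come from the paper.

```python
from collections import Counter

DNA_BASES = "ACGT"  # adenine, cytosine, guanine, thymine


def base_counts(seq: str) -> dict:
    """Count how often each of the four bases occurs in a DNA string."""
    seq = seq.upper()
    invalid = set(seq) - set(DNA_BASES)
    if invalid:
        raise ValueError(f"unexpected symbols in sequence: {sorted(invalid)}")
    counts = Counter(seq)
    return {base: counts.get(base, 0) for base in DNA_BASES}


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    counts = base_counts(seq)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up sequence, used only to show the calls.
example = "ATGCGATACGCTTGA"
print(base_counts(example))
print(round(gc_content(example), 3))
print(kmer_counts(example, k=3).most_common(3))
```

Feature vectors of this kind are one common starting point for tasks such as classification and clustering; real pipelines would typically read sequences from a database or file rather than from a literal string.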

Data mining techniques can be used to identify correlations and patterns and to discover knowledge in bioinformatics data sets. Data mining refers to digging, or “mining”, knowledge from vast amounts of data. It uncovers important patterns and hidden information in a data set. Data mining techniques have been successfully applied in diverse domains such as retail, e-business, marketing, health care, and research. Bioinformatics is no exception. In fact, any domain with a rich set of data is a good candidate for data mining. Hence, there is great potential to strengthen the interaction between data mining techniques and bioinformatics (Hashemi et al., 2018).

Bioinformatics faces various challenges, such as the classification of proteins and genes and the association between co-diseases. Data mining techniques help to overcome these challenges and add new insights for finding knowledge and patterns in biological databases.
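To make the idea of association between co-diseases concrete, the sketch below computes two standard association measures, support and confidence, over a handful of patient records. The records are hypothetical toy data invented for this example, and the function names are assumptions; the sketch shows the general technique, not the method used in the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each set lists the diagnoses of one patient.
records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "obesity"},
    {"hypertension"},
    {"diabetes", "obesity"},
    {"asthma"},
]


def pair_support(records):
    """Support of each disease pair: fraction of records containing both."""
    n = len(records)
    pair_counts = Counter()
    for record in records:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(record), 2):
            pair_counts[pair] += 1
    return {pair: count / n for pair, count in pair_counts.items()}


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [r for r in records if antecedent in r]
    if not with_antecedent:
        return 0.0
    hits = sum(consequent in r for r in with_antecedent)
    return hits / len(with_antecedent)


support = pair_support(records)
print(support[("diabetes", "hypertension")])
print(confidence(records, "diabetes", "hypertension"))
```

A pair with high support and high confidence is a candidate co-disease association; on real data, algorithms such as Apriori extend the same counting idea to larger item sets.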

In this paper, the authors highlight the role of data mining techniques in bioinformatics. The remainder of the paper is organized as follows. Section 2 introduces the challenges involved in the field of bioinformatics. Section 3 presents the different data mining tasks in bioinformatics. An application of data mining to disease prediction is presented in Section 4, and Section 5 concludes the paper.
