Combining BPSO and ELM Models for Inferring Novel lncRNA-Disease Associations

Wenqing Yang, Xianghan Zheng, QiongXia Huang, Yu Liu, Yimi Chen, ZhiGang Song
Copyright: © 2023 | Pages: 18
DOI: 10.4018/IJDWM.317092

Abstract

Long non-coding RNA (lncRNA) is widely known to play an important role in gene expression and regulation. However, several characteristics of lncRNA data (e.g., large volume, high dimensionality, and a lack of labeled samples) make it extremely difficult to identify the key lncRNAs closely related to a specific disease. In this paper, the authors propose a computational method for predicting the key lncRNAs closely related to a given disease. The proposed solution uses a binary particle swarm optimization (BPSO) based intelligent algorithm to select candidate optimal lncRNA subsets, and then uses a multilayer extreme learning machine (ML-ELM) based deep learning model to evaluate each subset. Next, a wrapper feature selection method is used to select, from massive data, the lncRNAs that are closely related to the pathophysiology of the disease. Experiments on three typical open datasets demonstrate the feasibility and efficiency of the proposed solution, which achieves above 93% accuracy, reported as the best result to date.

Introduction

Bioinformatics research on long non-coding RNAs (lncRNAs) has attracted much attention in academia and industry because of their important role in gene expression across the genome. lncRNAs are defined as transcripts longer than 200 nt with limited protein-coding potential. They make up a large part of the non-coding portion of human DNA, which represents over 90% of the whole genome. Furthermore, recent studies have shown that lncRNAs are involved in pathophysiology in various ways, e.g., through gene expression, transcription, and post-translational processing.

Early lncRNA bioinformatics research focused mainly on sequence acquisition and data collection, e.g., tools for collecting and annotating lncRNAs. However, as understanding of the datasets has deepened, more and more research has shifted to data analysis and application. For instance, under hypotheses such as “expression-related genes have a relevant function” and “interacting molecules have a relevant function,” it is possible to evaluate the similarity between different lncRNAs and thus predict the relationship between an lncRNA and a corresponding disease. However, several technical challenges remain open:

  1. Feature Extraction Challenge: High-throughput genomics data has specific characteristics, e.g., high dimensionality and a lack of labeled samples. The key technical challenge is therefore to explore the data distribution, characteristic patterns, and potential relationships using prior knowledge from only a few labeled samples.

  2. Computation Challenge: Given the huge amount of genomics data, there is an urgent need for computation models, especially lightweight, intelligent, and efficient algorithms.

  3. Transfer Learning Challenge: When the data distribution changes (for instance, when gene expression data comes from a different species), seamlessly transferring a previously trained model to the new domain is a further technical challenge.

This paper investigates lncRNA-related issues and proposes a generic, lightweight, intelligent, and efficient computing model to predict the key lncRNAs related to disease pathophysiology. Our work makes three contributions:

  1. We propose a binary PSO (BPSO) based algorithm that selects candidate lncRNA subsets based on extracted features and logical connections. As a result, an optimal lncRNA subset can be obtained through repeated, iterative optimization.

  2. We adopt and implement an ELM-based classification model to evaluate each lncRNA’s influence on disease. The evaluation result is used to guide subsequent selection preferences.

  3. We select three datasets for experiment and evaluation: breast invasive carcinoma, carcinoma of the colon, and lung adenocarcinoma. The results show that our proposed solution achieves 93.6% classification accuracy, which is the best result reported to date.
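
As an illustration only, the interaction between the first two contributions can be sketched as a wrapper loop in which BPSO proposes feature subsets and an ELM classifier scores them. This is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: a single-hidden-layer ELM stands in for ML-ELM, synthetic data replaces the lncRNA expression data, and the function names (`elm_accuracy`, `bpso_select`) and all parameter values (swarm size, inertia `w`, coefficients `c1` and `c2`, number of hidden nodes) are hypothetical choices made for the example.

```python
import numpy as np

def elm_accuracy(X_train, y_train, X_test, y_test, n_hidden=20, seed=0):
    """Fit a single-hidden-layer ELM (stand-in for ML-ELM); return test accuracy."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X_train.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                      # random hidden biases
    H = np.tanh(X_train @ W + b)                       # hidden-layer output matrix
    T = np.eye(int(y_train.max()) + 1)[y_train]        # one-hot targets
    beta = np.linalg.pinv(H) @ T                       # output weights by least squares
    pred = np.argmax(np.tanh(X_test @ W + b) @ beta, axis=1)
    return float(np.mean(pred == y_test))

def bpso_select(X, y, n_particles=10, n_iter=15, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over 0/1 feature masks; fitness is ELM accuracy on a held-out split."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    idx = rng.permutation(len(y))
    cut = len(y) * 2 // 3
    tr, te = idx[:cut], idx[cut:]                      # fixed split shared by all particles

    def fitness(mask):
        cols = mask.astype(bool)
        if not cols.any():                             # an empty subset cannot classify
            return 0.0
        return elm_accuracy(X[tr][:, cols], y[tr], X[te][:, cols], y[te])

    pos = rng.integers(0, 2, size=(n_particles, n_feat))   # binary positions
    vel = rng.uniform(-1, 1, size=(n_particles, n_feat))   # real-valued velocities
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -6.0, 6.0)                  # keep the sigmoid away from 0 and 1
        prob = 1.0 / (1.0 + np.exp(-vel))              # sigmoid transfer function
        pos = (rng.random(pos.shape) < prob).astype(int)   # resample each bit
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    return gbest.astype(bool), gbest_fit

# Synthetic stand-in data: 120 samples, 30 features, only the first 3 informative.
data_rng = np.random.default_rng(42)
y = data_rng.integers(0, 2, size=120)
X = data_rng.normal(size=(120, 30))
X[:, :3] += 2.0 * y[:, None]

mask, acc = bpso_select(X, y)
print("selected features:", int(mask.sum()), "held-out accuracy:", round(acc, 3))
```

Because the same held-out split both scores and selects the subsets in this sketch, the accuracy it prints is an optimistic estimate; an honest evaluation would score the final subset on data that the search never saw.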

The rest of the paper is organized as follows. We first present the relationship between lncRNA and disease (especially cancer) and then describe existing machine learning-based lncRNA research. Next, we introduce the lncRNA data collection, preprocessing, and noise filtering. The following section introduces the proposed BPSO-ML-ELM solution for lncRNA function prediction. We then describe the corresponding experiments and evaluation and discuss the proposed solution on the three datasets. Finally, we present the conclusion and suggest future work.
