A Unified Multi-View Clustering Method Based on Non-Negative Matrix Factorization for Cancer Subtyping

Zhanpeng Huang, Jiekang Wu, Jinlin Wang, Yu Lin, Xiaohua Chen
Copyright: © 2023 | Pages: 19
DOI: 10.4018/IJDWM.319956

Abstract

Non-negative matrix factorization (NMF) has gained sustained attention due to its ability to learn compact representations. Cancer subtyping is important for cancer prognosis analysis and clinical precision treatment. Integrating multi-omics data for cancer subtyping helps uncover the characteristics of cancer at the system level. A unified multi-view clustering method based on adaptive graph and sparsity regularized non-negative matrix factorization (multi-GSNMF) was developed for cancer subtyping. The local geometrical structure of each omics data type was incorporated into the learning of a common consensus matrix, and sparsity constraints were used to reduce the effect of noise and outliers in bioinformatics datasets. The performance of multi-GSNMF was evaluated on ten cancer datasets. Compared with 10 state-of-the-art multi-view clustering algorithms, multi-GSNMF performed better, yielding subtypes with significantly different survival in 7 out of 10 cancer datasets, the highest count among all the compared methods.

Introduction

Even with the rapid development of medical technology, the number of new cancer cases and deaths continues to increase, and cancer remains a serious threat to human health and an important cause of death. The latest estimates from the International Agency for Research on Cancer (IARC, 2021) show 19.3 million new cases of cancer worldwide and 10 million cancer deaths in 2020. Cancer is expected to surpass cardiovascular disease as the main cause of premature death in most countries in this century. The rapid development of high-throughput technologies such as deep sequencing has enabled the discovery of massive amounts of biological information, which helps to better characterize human diseases and facilitates personalized treatment. In oncology, analysis of high-throughput biological data sets has revealed new cancer subtypes, which have been used to guide cancer treatment decisions (Parker et al., 2009; Prasad et al., 2016).

Machine learning is widely used to analyze bioinformatics data and can support doctors in decision-making and treatment planning (Amin et al., 2021; Kumar-Sinha & Chinnaiyan, 2018; Rajinikanth & Kadry, 2021). To improve cancer diagnosis and treatment, genomic and other molecular profiles of tumor biopsies have been analyzed for precision tumor therapy. For example, a coclustering algorithm that incorporates gene network interactions has been proposed for identifying cancer subtypes (Liu et al., 2014). However, the role of the human genome is complex, and it regulates biological processes at different levels. It can be better understood by integrating several types of genomic data, such as gene expression, copy number variation, and DNA methylation (Huang et al., 2017). Modern genomic and clinical research urgently needs machine learning models that integrate multiomics data, so that large amounts of heterogeneous information can be used to understand biological systems in depth. Multiomics data provide information from different perspectives and levels, which helps in understanding complex biological systems (Li et al., 2016). The integration and clustering of multiomics data is therefore an active research topic for machine learning in bioinformatics.

To take advantage of both the local geometrical structures and the global structures of bioinformatics data, a novel multiview clustering method based on nonnegative matrix factorization (NMF) is proposed for cancer subtyping. The local geometrical structure of each omics data set is encoded by constructing a nearest neighbor graph. The global structure of the multiomics data set is captured by sparsity regularized constraints. A unified objective function is then formed by incorporating the local geometrical structures of each omics data set and a sparsity regularized common consensus matrix into the NMF-based framework. The resulting multiview NMF-based method learns a common consensus representation of a multiomics data set, while the sparsity constraints handle the noise and outliers in bioinformatics data. Figure 1 illustrates the framework of the unified multiview clustering method: multiview NMF is combined with graph regularization and sparsity constraints in a single framework, and the final clustering results are obtained by spectral clustering. The main contributions are as follows:

  1. A unified framework for cancer subtyping that accounts for the characteristics of cancer data sets is proposed. It is useful in precision medicine for identifying cancer subtypes that would otherwise be obscured by noise and outliers in bioinformatics data.

  2. The local geometrical structures and sparsity constraints are incorporated into the multiview clustering process to form a unified objective function for cancer subtyping based on nonnegative matrix factorization.

  3. By incorporating the local geometrical structures of each omics data set and the sparsity constraints on a common consensus matrix into the clustering process, Multi-GSNMF provides a unified model and a novel solution for fusing multiview data for clustering.

Figure 1. Framework of the Proposed Algorithm
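The pipeline described in the Introduction (a nearest neighbor graph per omics view, a shared consensus matrix, and a sparsity penalty inside an NMF objective) can be illustrated with a short sketch. This is not the authors' implementation. The preview does not give the objective function, so the code assumes a commonly used form, sum over views of ||X_v - W_v H||_F^2 + alpha * tr(H L_v H^T), plus beta * ||H||_1, and optimizes it with standard multiplicative updates. The names `knn_graph` and `multiview_gsnmf_sketch`, the parameters `alpha`, `beta`, and `n_neighbors`, and the synthetic data are all hypothetical. The graph here is fixed rather than adaptive, and the final labels are taken by a simple argmax over the consensus matrix as a stand-in for the spectral clustering step named in the text.

```python
import numpy as np

def knn_graph(X, n_neighbors=5):
    """Binary, symmetric k-nearest-neighbor adjacency over the columns (samples) of X."""
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a sample is not its own neighbor
    n = X.shape[1]
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                           # symmetrize

def multiview_gsnmf_sketch(views, n_clusters, alpha=0.1, beta=0.1,
                           n_neighbors=5, n_iter=200, seed=0):
    """Assumed objective (not taken from the paper):
    sum_v ||X_v - W_v H||_F^2 + alpha * tr(H L_v H^T), plus beta * ||H||_1,
    with H >= 0 shared by all views and L_v = D_v - A_v the graph Laplacian."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    graphs = [knn_graph(X, n_neighbors) for X in views]
    degrees = [A.sum(axis=1) for A in graphs]           # diagonal of each D_v
    Ws = [rng.random((X.shape[0], n_clusters)) for X in views]
    H = rng.random((n_clusters, n))
    eps = 1e-10
    for _ in range(n_iter):
        # Per-view basis update (standard NMF multiplicative rule).
        for v, X in enumerate(views):
            Ws[v] *= (X @ H.T) / (Ws[v] @ (H @ H.T) + eps)
        # Consensus update: negative gradient terms in num, positive terms in den.
        num = np.zeros_like(H)
        den = np.full_like(H, beta / 2.0)               # contribution of the L1 penalty
        for X, W, A, deg in zip(views, Ws, graphs, degrees):
            num += W.T @ X + alpha * (H @ A)
            den += (W.T @ W) @ H + alpha * (H * deg)    # H * deg equals H @ diag(deg)
        H *= num / (den + eps)
    return Ws, H

# Synthetic two-view data: 60 samples in 3 groups, features x samples, non-negative.
data_rng = np.random.default_rng(42)
true_groups = np.repeat(np.arange(3), 20)
views = []
for n_features in (30, 20):
    prototypes = data_rng.random((n_features, 3)) * 5.0
    views.append(prototypes[:, true_groups] + 0.1 * data_rng.random((n_features, 60)))

Ws, H = multiview_gsnmf_sketch(views, n_clusters=3)
labels = H.argmax(axis=0)  # stand-in for the spectral clustering step
```

Sharing one `H` across views is what makes the representation a consensus: every view must be reconstructed from the same sample coefficients, while each view keeps its own basis `W_v`. In practice the argmax step would be replaced by spectral clustering on a similarity matrix derived from `H`, and `alpha`, `beta`, and `n_neighbors` would need tuning per data set.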
