Performance Analysis of Classifiers on Filter-Based Feature Selection Approaches on Microarray Data

Arunkumar Chinnaswamy, Ramakrishnan Srinivasan

Source Title: Bio-Inspired Computing for Information Retrieval Applications

DOI: 10.4018/978-1-5225-2375-8.ch002

Abstract

Feature selection in machine learning involves reducing the number of features (genes) in a way that still yields an acceptable level of classification accuracy. This paper discusses filter-based feature selection methods such as Information Gain and Correlation coefficient. After feature selection is performed, the selected genes are passed to five classifiers: Naïve Bayes, Bagging, Random Forest, J48 and Decision Stump. The same experiment is also performed on the raw data. Experimental results show that the filter-based approaches effectively reduce the number of gene expression levels, yielding a reduced feature subset that produces higher classification accuracy than the same experiment performed on the raw data. In addition, Correlation Based Feature Selection uses far fewer genes and produces higher accuracy than the Information Gain based feature selection approach.

1. Introduction

Statistical analysis of differentially expressed genes helps to assign them to different classes, which improves the basic understanding of the biological processes in the system. Microarray gene expression technology allows the activity of thousands of genes to be investigated simultaneously. Gene expression profiles are used to estimate the presence and relative abundance of the mRNA transcribed from the genes. The results obtained using suitable discriminant analysis represent the state of the cell, which serves as a tool for the diagnosis, prediction and treatment of diseases.

DNA microarray samples are generated through hybridization, which can be done in two ways. In the first method, which uses spotted arrays, the messenger RNA (mRNA) taken from sample tissues or from the blood stream is converted to cDNA during hybridization. RNA profiles may be noisy and might be unequally sampled over time. The second method uses Affymetrix chips, in which the oligonucleotides are hybridized on the surface of the chip array. DNA microarray technology makes it possible to measure and monitor thousands of genes simultaneously in a single experiment (Li Yeh Chuang, Kuo-Chuan Wu, & Cheng-Hong Yang, 2008). The level of protein production from a gene signifies its expression level, which aids in identifying class membership.

The wide variety of gene expression problems addressed by microarray experiments helps to advance the field of clinical medicine. Microarray data is applied in cancer classification, in disease diagnosis, prediction and treatment, and most importantly in gene identification for later stages of drug development. This has been a recent advancement in the area of clinical research. Microarray cancer data is combined with statistical techniques to analyze gene expression patterns and identify potential biomarkers for the diagnosis and treatment of different types of cancer (Arunkumar C & Ramakrishnan S, 2014).

The most common challenge in bioinformatics is selecting relevant and non-redundant genes from the dataset. Complex biological problems can only be solved by predicting and classifying the genes efficiently. Feature selection and classification are considered the two key tasks in microarray gene expression analysis. Classification depends heavily on feature selection, since smaller gene subsets can contribute to an increase in classifier accuracy. The main goal of feature selection is to identify a subset of differentially expressed genes. This subset correlates strongly with the different classes, which helps to distinguish between them. Feature selection also helps to avoid overfitting and to build faster and more cost-effective models. During feature selection, however, a weakly ranked gene might in fact perform well, and a critical gene might be left out of the classification step.

Classification is time-consuming because the sample size is very small and the dimensionality of the data is very large. Performing feature selection before classification reduces the running time and also increases the accuracy of prediction. Much research has therefore been carried out on identifying the essential features before classification in order to increase the accuracy of prediction. In general, two key aspects govern gene (feature) selection. The first is to identify functionally similar and closely related genes; the second is to find the smallest subset of genes that can provide meaningful diagnostic information for disease prediction and treatment without reducing accuracy (Li Yeh Chuang et al., 2008). Disease diagnosis and treatment require only a small subset of genes, and this subset helps to increase predictive accuracy.

Choosing the best feature selection method increases predictive accuracy and keeps the resulting model comprehensible. The primary goal of classification is to build an efficient model that identifies differentially expressed genes and can then be used to identify the classes of unknown samples. In this study, we used two filter-based approaches to perform feature selection. They are simple, easy to use and computationally efficient (Cheng San Yang, Cheng-San Yang, Li-Yeh Chuang, Chao-Hsuan Ke, & Cheng-Hong Yang, 2008).
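To make the idea of a filter-based approach concrete, the following is a minimal sketch of ranking genes by a score computed independently of any classifier. It is illustrative only and is not the chapter's implementation. Several things here are assumptions: the gene names and expression values are invented toy data, the information gain is computed after a simple binary split at the median (the chapter does not state a discretization), and the correlation filter is a plain per-gene absolute Pearson correlation with the class label, which is a simpler stand-in for Correlation Based Feature Selection rather than the same method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one gene after a binary split at the median.

    The median split is an assumed discretization, chosen for brevity.
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for x, y in zip(values, labels) if x < threshold]
    high = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (low, high) if p)
    return entropy(labels) - conditional


def abs_pearson(values, labels):
    """Absolute Pearson correlation between one gene and numeric class labels."""
    n = len(values)
    mean_x = sum(values) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(values, labels))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in values))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in labels))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant gene carries no information about the class
    return abs(cov / (sd_x * sd_y))


def top_k(genes, labels, score, k):
    """Return the names of the k genes with the highest filter score."""
    ranked = sorted(genes, key=lambda g: score(genes[g], labels), reverse=True)
    return ranked[:k]


# Hypothetical toy data: six samples, two classes, three genes.
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "gene_a": [0.10, 0.20, 0.15, 0.90, 0.80, 0.95],  # separates the classes
    "gene_b": [0.50, 0.40, 0.60, 0.50, 0.45, 0.55],  # uninformative
    "gene_c": [0.30, 0.20, 0.70, 0.60, 0.40, 0.50],  # weakly informative
}

selected_by_ig = top_k(genes, labels, information_gain, k=2)
selected_by_corr = top_k(genes, labels, abs_pearson, k=2)
print(selected_by_ig, selected_by_corr)
```

In a workflow like the one described above, only the selected genes would then be passed to the classifiers, and the same classifiers would be run on the full gene set for comparison.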
