Novel PSSM-Based Approaches for Gene Identification Using Support Vector Machine

Heena Farooq Bhat, M. Arif Wani

Source Title: Journal of Information Technology Research (JITR) 14(2)

DOI: 10.4018/JITR.2021040108

Abstract

Understanding the function of each protein encoded in a genome makes it possible to recognize the molecular mechanisms of the cell. In the field of genome annotation, several methods have been developed to locate or predict gene patterns in a genome sequence. However, recognizing the gene that corresponds to a given protein sequence using conventional tools is inherently complicated and error prone. This paper first examines the problem of gene prediction and its challenges. The authors then present a novel two-step method for identifying genes. First, new features are extracted from protein sequences using a position-specific scoring matrix (PSSM), and the PSSM profiles are converted into a uniform numeric representation. Then, a new structured approach is applied to the PSSM vector, using a decision tree-based technique to obtain rules. Finally, the rules of a single class are joined to form a matrix, which is given as input to an SVM for classification. The rules derived from the algorithm correspond to genes. The authors also introduce a second approach for predicting genes based on PSSM using SVM. Both methods were implemented on the genome DNAset dataset. Empirical evaluation shows that the PSSM-based SAFARI approach produces better results.
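The abstract does not say how the PSSM profiles are converted into a uniform numeric representation. As a rough illustration only, the sketch below assumes one common choice: each score is squashed with a logistic function and each of the 20 amino-acid columns is averaged over the sequence length, so that proteins of different lengths map to vectors of the same size. The function name `pssm_to_vector` and the scaling step are assumptions made for this example, not the authors' method.

```python
import math

# The 20 standard amino acids, one PSSM column per residue (assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_to_vector(pssm):
    """Collapse an L x 20 PSSM into a fixed-length vector of 20 values.

    Hypothetical helper: each score is squashed into (0, 1) with a
    logistic function, then each column is averaged over the L
    sequence positions.
    """
    if not pssm or any(len(row) != len(AMINO_ACIDS) for row in pssm):
        raise ValueError("expected a non-empty L x 20 matrix")
    scaled = [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in pssm]
    length = len(scaled)
    return [
        sum(row[j] for row in scaled) / length
        for j in range(len(AMINO_ACIDS))
    ]


# Two toy profiles of different lengths yield vectors of the same size.
short_profile = [[0] * 20 for _ in range(3)]
long_profile = [[1] * 20 for _ in range(7)]
print(len(pssm_to_vector(short_profile)), len(pssm_to_vector(long_profile)))
```

A fixed-length vector of this kind is what a classifier such as an SVM needs as input, since it expects every sample to have the same number of features.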

1. Introduction

As genome sequence data grows at a rapid pace, a large number of gene prediction programs have come into existence. The big data associated with gene identification can be projected into sub-spaces or clusters so that the problem is divided into sub-problems, each of which can then be optimized with an appropriate model. One way of dividing a problem into sub-problems is to project the data onto various sub-space grids (Wani, 2012; Wani & Yesilbudak, 2013), where each sub-space grid represents a sub-problem. Each sub-problem can then be represented independently by its own model, and the models can be combined using a rule-based system (Wani, 2001). Identifying genes in large genome sequences is one of the significant open problems in bioinformatics, and there is an essential need for methods that find genes and their corresponding functions. The primary task in gene prediction is to locate the protein-coding genes in a genomic DNA sequence. Despite the large number of protein amino acid sequences that have been produced, the function of only a small fraction of proteins has been interpreted. DNA-binding proteins play an essential role in cell functions such as DNA replication, DNA repair, DNA modification, and other activities associated with DNA. Most genes carry the information for producing proteins, and these proteins then carry out a wide variety of functions in the cell. Other genes, known as non-coding genes, encode functional RNA molecules that are involved in regulating gene expression and protein production. These DNA sequences are not translated into amino acids and therefore lack the characteristic sequence constraints of coding sequences.

The accurate recognition of genes is one of the elementary steps in all metagenomic sequencing projects (Goel et al., 2013). Gene prediction also makes use of Support Vector Machines. The Support Vector Machine (SVM) is a supervised learning algorithm used to classify patterns in data (Bhat & Wani, 2014). SVMs have been applied in various other domains, including wind speed prediction (Wani & Bhat, 2017), fingerprint recognition (Khan & Wani, 2015), face recognition (Bhat & Wani, 2014), global solar radiation forecasting (Mujtaba & Wani, 2017), and the evaluation of information retrieval algorithms using linear algebra (Bhat & Wani, 2017). In higher organisms, the sequences that encode functional proteins are not contiguous; they are frequently divided into coding and non-coding regions. The coding fragments are known as exons. The exons are interspersed with non-coding sections of highly variable length known as introns. The widely used approach to genome annotation consists of two kinds of methods, extrinsic and intrinsic. Extrinsic methods are used for homology detection (Bhat & Wani, 2017) and intrinsic methods are used for gene prediction (Mathe et al., 2002). Homology methods can predict only about half of the genes, and the rest remain unknown. Therefore, more predictive, fast, and reliable methods are needed that can detect all protein-coding genes accurately. The integration of nucleic acid similarity search has proved successful in a long line of tools, including GRAIL (Xu et al., 1996), HMMgene (Krogh, 2000), Genscan (Burge & Karlin, 1997), and GenomeScan (Yeh et al., 2001).

The most important challenge that follows the sequencing of either a small segment of DNA or a long genome sequence is to establish the location of functional units such as protein-coding genes (exons), splice sites, and terminators. This provides a procedure for identifying the regions that encode proteins. Protein homology detection also involves recognizing patterns in multidimensional data (Wani, 2012). A segment that resembles a gene but has not yet been proved to be one is known as an Open Reading Frame (ORF) (Klasberg et al., 2016).
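To make the notion of an ORF concrete, the following minimal sketch scans the three forward reading frames of a DNA string for stretches that begin with the start codon ATG and end at the first in-frame stop codon (TAA, TAG, or TGA). It is an illustration under simplifying assumptions and not the method of the paper: the reverse strand, introns, and alternative start codons are ignored, and the name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch.

    Only the forward strand is scanned. `end` is exclusive and includes
    the stop codon. `min_codons` is the minimum number of codons before
    the stop codon, counting the start codon itself.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and look for the next ATG
    return orfs


sequence = "CCATGAAATTTTAGGG"
for begin, end, frame in find_orfs(sequence):
    print(frame, sequence[begin:end])  # prints: 2 ATGAAATTTTAG
```

Candidates found this way are only open reading frames; deciding which of them are real genes is the classification problem the rest of the article addresses.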
