A way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
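As a concrete illustration of the definition above, the following minimal sketch aligns two short sequences globally with the classic Needleman-Wunsch dynamic-programming method. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; they are not taken from the chapter, and practical tools use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    Illustrative scoring only: the default values are assumptions.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = global_align("ACGT", "ACT")
    print(s)
    print(x)
    print(y)
```

The gap characters in the returned strings mark positions where one sequence has no counterpart in the other; the regions without gaps or mismatches are the "regions of similarity" the definition refers to.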
Published in Chapter:
High-Throughput GRID Computing for Life Sciences
Giulia De Sario (Istituto di Tecnologie Biomediche at the Consiglio Nazionale delle Ricerche, Italy), Angelica Tulipano (Istituto di Tecnologie Biomediche at the Consiglio Nazionale delle Ricerche, Italy), Giacinto Donvito (INFN, Italy), and Giorgio Maggi (INFN Bari, Italy)
Copyright: © 2009
Pages: 19
DOI: 10.4018/978-1-60566-374-6.ch010
Abstract
The number of fully sequenced genomes increases daily, driving exponential growth of the sequence, annotation, and metadata databases. Data analysis at the genome-wide level, or investigation within a specific data repository, has become a data- and computation-intensive process that can occupy single computers and even large computer clusters for months or even years. In most cases such applications can be subdivided into many smaller, independent tasks. These tasks are particularly well suited to distribution over a computational GRID infrastructure, which drastically reduces the time needed to reach the final result. In our analysis of gene ontology data and their associations with gene products from any kind of organism, carried out to find gene products with similar functionalities, we developed a system that divides the full search into a large number of jobs and submits these jobs to the GRID infrastructure until all jobs have been processed successfully, guaranteeing an analysis of the data without missing any information.
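The split-and-resubmit pattern described in the abstract can be sketched roughly as follows. This is not the authors' system and it calls no GRID middleware: `split_into_jobs`, `run_until_complete`, the `run_job` callable, and the `max_rounds` limit are all hypothetical names introduced for illustration, with `run_job` standing in for whatever actually submits a job and collects its output.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_jobs(items: Sequence[T], job_size: int) -> List[List[T]]:
    """Divide the full workload into independent jobs of at most job_size items."""
    if job_size < 1:
        raise ValueError("job_size must be at least 1")
    return [list(items[i:i + job_size]) for i in range(0, len(items), job_size)]


def run_until_complete(
    jobs: Sequence[List[T]],
    run_job: Callable[[List[T]], R],  # hypothetical stand-in for a GRID submission
    max_rounds: int = 5,              # assumed safety limit, not from the source
) -> List[R]:
    """Run every job, resubmitting failed ones until all have succeeded."""
    results: Dict[int, R] = {}
    pending = list(range(len(jobs)))

    for _ in range(max_rounds):
        if not pending:
            break
        still_failed = []
        for job_id in pending:
            try:
                results[job_id] = run_job(jobs[job_id])
            except Exception:
                # A failed job is kept for the next round instead of being dropped,
                # so no part of the data is silently skipped.
                still_failed.append(job_id)
        pending = still_failed

    if pending:
        raise RuntimeError(f"jobs {pending} still failing after {max_rounds} rounds")

    # Return results in the original job order.
    return [results[i] for i in range(len(jobs))]


if __name__ == "__main__":
    seen = set()

    def flaky_sum(chunk):
        # Simulated unreliable worker: each distinct job fails on its first attempt.
        key = tuple(chunk)
        if key not in seen:
            seen.add(key)
            raise RuntimeError("simulated job failure")
        return sum(chunk)

    jobs = split_into_jobs(list(range(10)), job_size=3)
    print(run_until_complete(jobs, flaky_sum))
```

Tracking the pending job identifiers explicitly is what lets the sketch guarantee completeness: the function either returns a result for every job or raises an error naming the jobs that never succeeded.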