Introduction
Modern technological developments are making massive amounts of genomic data available. Next-generation sequencing technologies allow us to quickly determine genomes (DNA sequences), transcriptomes (quantitative genome-wide gene-expression measurements), and other ‘omes’ (for reviews on the use of next-generation sequencing, see the special series on this topic in Nature Reviews Genetics: http://www.nature.com/nrg/series/nextgeneration/index.html). Inferring Gene Regulatory Networks from genome-wide gene-expression measurements (also called ‘reverse-engineering’, ‘network reconstruction’ and ‘network identification’) is one of the key challenges in modern biology, and a large number of algorithms have been proposed for this task (De Smet & Marchal, 2010; Marbach et al., 2012).
Several terms have been used to indicate models of regulatory processes and functional relations between genes, such as Gene Regulatory Networks, Gene Networks, Gene Expression Networks, Co-Expression Networks, Genetic Regulatory Networks, Transcriptional Regulatory Networks and Genetic Interaction Networks. Although these terms are often used interchangeably in the literature, they are not all synonyms. I will therefore provide a precise definition of the ‘Gene Regulatory Network’ and point out the essential differences from two other network models frequently used for gene regulation, i.e. Transcriptional Regulatory Networks and Co-Expression Networks.
Before a clear definition of Gene Regulatory Networks can be given, we first need to consider the abstract definition of a ‘network’, also formally called ‘graph’. The mathematical theory of graphs is called graph theory (Bollobas, 1998; Erdös & Renyi, 1959), but recent advances in Complex Network Science go beyond graph theory alone and incorporate ideas from physics, sociology and biology (Barabasi & Oltvai, 2004; Dorogovtsev & Mendes, 2003; Newman, 2003; Pieroni et al., 2008; Watts & Strogatz, 1998). Three main types of graphs are essential in the context of Gene Regulatory Networks:
An undirected graph G is an ordered pair G := (V, U) that is subject to the following conditions.
V is a set, whose elements are called vertices or nodes (the latter will be used in the remainder of the paper), and U is a set of unordered pairs of distinct nodes, called undirected edges, links or lines (‘undirected edges’ will be used in the remainder of the paper). For each edge u_ij = {v_i, v_j}, the nodes v_i and v_j are said to be connected, linked or adjacent to each other. Undirected graphs can be effectively used to represent the existence of associations or functional relationships (edges) between entities (nodes).
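As an illustration of this definition, the following minimal Python sketch stores V as a set of nodes and U as a set of unordered pairs. The gene names and the helper functions `is_valid_undirected_graph`, `adjacent` and `neighbors` are hypothetical and serve only as an example; they are not taken from the paper.

```python
# Node set V. The gene names are hypothetical, for illustration only.
V = {"geneA", "geneB", "geneC", "geneD"}

# Edge set U: unordered pairs of distinct nodes.
# A frozenset has no order, so {geneA, geneB} equals {geneB, geneA}.
U = {frozenset(pair) for pair in [("geneA", "geneB"), ("geneB", "geneC")]}


def is_valid_undirected_graph(nodes, edges):
    """Check that every edge joins exactly two distinct nodes taken from `nodes`."""
    return all(len(edge) == 2 and edge <= nodes for edge in edges)


def adjacent(edges, vi, vj):
    """Return True if an undirected edge connects vi and vj."""
    return frozenset((vi, vj)) in edges


def neighbors(edges, v):
    """Return all nodes that share an undirected edge with v."""
    return {w for edge in edges if v in edge for w in edge if w != v}
```

Because the edges are unordered, `adjacent(U, "geneA", "geneB")` and `adjacent(U, "geneB", "geneA")` give the same answer, which matches the symmetric nature of an association between two entities.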
A directed graph or digraph G is an ordered pair G := (V, D), with V being a set of nodes and D a set of ordered pairs of nodes, called directed edges, arcs, or arrows (‘directed edges’ will be used in the remainder of the paper). A directed edge d_ij = (v_i, v_j) is considered to be directed from node v_i to node v_j; v_j is called the head or target and v_i is called the tail or source; v_j is said to be a direct successor, or child, of v_i, and v_i is said to be a direct predecessor, or parent, of v_j. If a directed path leads from v_i to v_j, then v_i is said to be an ancestor of v_j. Directed graphs can be effectively used to represent causal influences or communication between the nodes.
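The parent, child and ancestor relations defined above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the gene names and the helper functions `children`, `parents` and `ancestors` are hypothetical, and each directed edge is stored as an ordered (tail, head) tuple.

```python
# Node set V. The gene names are hypothetical, for illustration only.
V = {"geneA", "geneB", "geneC", "geneD"}

# Edge set D: ordered pairs (tail, head), each directed from tail to head.
D = {("geneA", "geneB"), ("geneB", "geneC"), ("geneA", "geneD")}


def children(edges, v):
    """Direct successors of v: the heads of all edges whose tail is v."""
    return {head for tail, head in edges if tail == v}


def parents(edges, v):
    """Direct predecessors of v: the tails of all edges whose head is v."""
    return {tail for tail, head in edges if head == v}


def ancestors(edges, v):
    """All nodes from which a directed path leads to v."""
    found = set()
    stack = list(parents(edges, v))
    while stack:
        u = stack.pop()
        if u not in found:
            found.add(u)
            # Walk further up: the parents of an ancestor are ancestors too.
            stack.extend(parents(edges, u))
    return found
```

In this toy graph, geneA is a parent of geneB and an ancestor of geneC, whereas geneC has no children. Unlike the undirected case, swapping the two elements of an edge changes its meaning, which is why tuples are used here rather than unordered sets.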