Comparative Genomics

Comparative genomics is a field of biological research in which researchers use a variety of tools to compare the complete genome sequences of different species. By carefully comparing characteristics that define various organisms, researchers can pinpoint regions of similarity and difference.

Comparison of whole genome sequences provides a highly detailed view of how organisms are related to each other at the genetic level. How are genomes compared, and what can these findings tell us about how the overall structure of genes and genomes has evolved?

Comparative genomics is a field of biological research in which the genome sequences of different species — human, mouse, and a wide variety of other organisms from bacteria to chimpanzees — are compared. By comparing the sequences of genomes of different organisms, researchers can understand what, at the molecular level, distinguishes different life forms from each other. Comparative genomics also provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics.

What Is a Genome Made Of?

The genomes of almost all living creatures, both plants and animals, consist of DNA (deoxyribonucleic acid), the chemical chain that includes the genes that code for different proteins and the regulatory sequences that turn those genes on and off. Precisely which protein is produced by any given gene is determined by the sequence in which four building blocks - adenine (A), thymine (T), cytosine (C) and guanine (G) - are laid out along DNA's twisted, double-helix structure.

Although living creatures look and behave in a myriad of ways, all of their genomes consist of DNA, the chemical chain that harbors the genes that code for thousands of different kinds of proteins. Within DNA are the instructions sufficient to make an organism and the means by which organisms pass information along to their offspring. Remarkably, this information is coded by only four nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). Understanding the order of these nucleotides in linear DNA molecules has been an active pursuit since the discovery of DNA’s double-helical structure (Watson & Crick 1953). As such, DNA sequencing has emerged as a fundamental approach to molecular biology research. The power of DNA sequencing as a research tool has spurred the dramatic advancement of DNA sequencing technology, which is allowing ever more genomes to be sequenced and making comparative genomics an accessible focal point for the study of any form of life.

What Genomes Have Been Sequenced?

In addition to sequencing the three billion letters in the human “genetic instruction book” (Lander et al. 2001), researchers involved in the International Human Genome Project (HGP) sequenced the genomes of a number of important model organisms. These include chimpanzee (Lander et al. 2005), mouse (Waterston et al. 2002), rat (Gibbs et al. 2004), two puffer fish (Jaillon et al. 2004; Aparicio et al. 2002), fruit fly (Adams et al. 2000), two sea squirts (Dehal et al. 2002; Small et al. 2007), two roundworms (Stein et al. 2003; Stein et al. 1998), baker's yeast (Goffeau et al. 1996), and the bacterium Escherichia coli (Blattner et al. 1997). Since the completion of the HGP, sequence drafts of the chicken (Blattner et al. 2004), cow (Elsik et al. 2009), dog (Lindblad-Toh et al. 2005), honey bee (Lindblad-Toh et al. 2006), sea urchin (Sodergren et al. 2006), and rhesus macaque monkey (Gibbs et al. 2007), to name just a few, have also been established.

Counting more than 1,000 prokaryotes, over 1,300 species have had their genomes completely sequenced and published (as of about 2010), and this number continues to grow at a prodigious rate, providing a rich source of genomic data for comparison.

How Are Genomes Compared?

A simple comparison of general genome features such as genome size, number of genes, and chromosome number presents an entry point into comparative genomic analysis. Data for several fully sequenced model organisms are shown in Table 1. The comparisons highlight some striking findings. For example, while the tiny flowering plant Arabidopsis thaliana has a smaller genome than the fruit fly Drosophila melanogaster (157 million base pairs vs. 165 million base pairs, respectively), it possesses nearly twice as many genes (25,000 vs. 13,000). In fact, A. thaliana has approximately the same number of genes as humans (~25,000). Thus, a very early lesson learned in the "genomic era" is that genome size does not correlate with evolutionary status, nor is the number of genes proportionate to genome size.
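To make the last point concrete, the short Python sketch below computes average gene density from the two genome sizes and gene counts quoted above. The function name genes_per_megabase is just an illustrative choice, and the figures are the rounded values from the text rather than current annotation counts.

```python
# Figures quoted in the text: (genome size in base pairs, approximate gene count).
GENOMES = {
    "Arabidopsis thaliana": (157_000_000, 25_000),
    "Drosophila melanogaster": (165_000_000, 13_000),
}


def genes_per_megabase(size_bp, gene_count):
    """Return the average number of genes per million base pairs."""
    return gene_count / (size_bp / 1_000_000)


for species, (size_bp, gene_count) in GENOMES.items():
    density = genes_per_megabase(size_bp, gene_count)
    print(f"{species}: {density:.1f} genes per Mb")
```

With these rounded inputs, the plant comes out at roughly 159 genes per megabase and the fly at roughly 79, about a twofold difference in density between two genomes of nearly the same size.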

Finer-resolution comparisons are possible by direct DNA sequence comparisons between species. Figure 1 depicts a chromosome-level comparison of the human and mouse genomes that shows the level of synteny between these two mammals. Synteny is a situation in which genes are arranged in similar blocks in different species. The nature and extent of conservation of synteny differ substantially among chromosomes. For example, the X chromosomes are represented as single, reciprocal syntenic blocks. Human chromosome 20 corresponds entirely to a portion of mouse chromosome 2, with nearly perfect conservation of order along almost the entire length, disrupted only by a small central segment. Human chromosome 17 corresponds entirely to a portion of mouse chromosome 11. Other chromosomes, however, show evidence of more extensive interchromosomal rearrangement. Results such as these provide an extraordinary glimpse into the chromosomal changes that have shaped the mouse and human genomes since their divergence from a common ancestor 75–80 million years ago.
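The idea of a syntenic block can be sketched in a few lines of Python. This is a toy illustration, not the method used to produce Figure 1: it assumes orthologous gene pairs have already been identified, and it treats a block as a run of neighbouring genes in one species whose orthologs are also immediate neighbours on a single chromosome in the other species. The gene names, chromosome labels, and ranks are invented for the example.

```python
def syntenic_blocks(ortholog_map, min_genes=2):
    """Group consecutive species-A genes whose orthologs are neighbours in species B.

    ortholog_map: list of (gene, chrom_b, rank_b) tuples in species-A gene order,
    where rank_b is the ortholog's position in the gene order of chrom_b.
    Returns a list of blocks, each a list of gene names.
    """
    blocks, current = [], []
    prev = None  # (chrom_b, rank_b) of the previous gene
    for gene, chrom, rank in ortholog_map:
        adjacent = (
            prev is not None
            and chrom == prev[0]
            and abs(rank - prev[1]) == 1  # either orientation counts
        )
        if not adjacent and current:
            # The run is broken: keep it only if it is long enough.
            if len(current) >= min_genes:
                blocks.append(current)
            current = []
        current.append(gene)
        prev = (chrom, rank)
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical genes along one chromosome of species A.
example = [
    ("g1", "2", 10), ("g2", "2", 11), ("g3", "2", 12),
    ("g4", "7", 3),
    ("g5", "11", 40), ("g6", "11", 39), ("g7", "11", 38),
]
print(syntenic_blocks(example))
```

Here g1 to g3 form one block, g5 to g7 form a second block in inverted orientation, and g4 is left out because it stands alone. Real synteny tools tolerate gaps, missing orthologs, and local rearrangements, which this sketch deliberately ignores.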

Comparison of discrete segments of genomes is also possible by aligning homologous DNA from different species. An example of such an alignment is shown in Figure 2, where a human gene (pyruvate kinase: PKLR) and the corresponding PKLR homologs from macaque, dog, mouse, chicken, and zebrafish are aligned. Regions of high DNA sequence similarity with human across a 12-kilobase region of the PKLR gene are plotted for each organism. Notice the high degree of sequence similarity between human and macaque (two primates) in both PKLR exons (blue) as well as introns (red) and untranslated regions (light blue) of the gene. In contrast, the chicken and zebrafish alignments with human only show similarity to sequences in the coding exons; the rest of the sequence has diverged to a point where it can no longer be reliably aligned with the human DNA sequence. Using such computer-based analysis to zero in on the genomic features that have been preserved in multiple organisms over millions of years, researchers are able to locate the signals that represent the location of genes, as well as sequences that may regulate gene expression. Indeed, many of the functional parts of the human genome have been discovered or verified by this type of sequence comparison (Lander et al. 2001), and it is now a standard component of the analysis of every new genome sequence.
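A similarity plot of this kind boils down to computing percent identity in windows along an alignment. The Python sketch below shows that calculation under simplifying assumptions: the two sequences are already aligned to equal length (with "-" marking gaps), windows do not overlap by default, and the sequences are short invented strings, not real PKLR data. The function name windowed_identity is hypothetical.

```python
def windowed_identity(aligned_a, aligned_b, window=10, step=10):
    """Return (start, percent_identity) for each window of a pairwise alignment.

    Gap characters ("-") never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        pairs = zip(aligned_a[start:start + window],
                    aligned_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results


# Invented toy alignment: a well-conserved stretch followed by a diverged one.
seq_a = "ATGGCCATTGTAATGGGCCG"
seq_b = "ATGGCCATTGTTACGGACAG"
for start, identity in windowed_identity(seq_a, seq_b):
    print(f"window at {start}: {identity:.0f}% identity")
```

In this toy case the first window is fully conserved and the second is only 60% identical. Plotting such values along a real alignment, and marking which windows fall in exons, introns, or untranslated regions, gives the kind of picture described for Figure 2.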

We have learned from homologous sequence alignment that the information that can be gained by comparing two genomes is largely dependent upon the phylogenetic distance between them. Phylogenetic distance is a measure of the degree of separation between two organisms or their genomes on an evolutionary scale, usually expressed as the number of accumulated sequence changes, number of years, or number of generations. The distances are often placed on phylogenetic trees, which show the deduced relationships among the organisms (Figure 3). The more distantly related two organisms are, the less sequence similarity or shared genomic features will be detected between them. Thus, only general insights about classes of shared genes can be gathered by genomic comparisons at very long phylogenetic distances (e.g., over one billion years since their separation). Over such very large distances, the order of genes and the signatures of sequences that regulate their transcription are rarely conserved.
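One simple way to express distance as accumulated sequence changes is the proportion of aligned sites that differ (the p-distance). Because repeated changes at the same site go unobserved, it is commonly corrected with a substitution model. The Python sketch below uses the Jukes-Cantor model, d = -(3/4) ln(1 - 4p/3), purely as an illustration; the text does not say which measure underlies Figure 3, and the sequences here are invented.

```python
import math


def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    compared = differences = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Estimated substitutions per site under the Jukes-Cantor model."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Invented toy alignment with 2 differences over 10 sites.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(f"p-distance: {p:.2f}, Jukes-Cantor distance: {jukes_cantor(p):.3f}")
```

The corrected distance is always at least as large as the raw proportion, and the gap widens as sequences diverge. Once the observed difference approaches 75%, two DNA sequences are no more alike than random ones and the estimate breaks down, which mirrors the point above that very distant genomes can no longer be compared site by site.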

At closer phylogenetic distances (50–200 million years of divergence), both functional and non-functional DNA is found within the conserved segments. In these cases, the functional sequences will show signatures of selection by virtue of their sequences having changed less, or more slowly than, non-functional DNA. Moreover, beyond the ability to discriminate functional from non-functional DNA, comparative genomics is also contributing to the identification of general classes of important DNA elements, such as coding exons of genes, non-coding RNAs, and some gene regulatory sites.

In contrast, very similar genomes separated by about 5 million years of evolution (such as human and chimpanzee) are particularly useful for finding the sequence differences that may account for subtle differences in biological form. These are sequence changes under directional selection, a process whereby natural selection favors a single phenotype and continuously shifts the allele frequency in one direction. Comparative genomics is thus a powerful and promising approach to biological discovery that becomes more and more informative as genomic sequence data accumulate.

What Results Has the Field of Comparative Genomics Produced?

Comparative genomics has yielded dramatic results. Investigators are increasingly using comparative genomics to explore areas ranging from human development and behavior to metabolism and susceptibility to disease. These studies are uncovering new behavioral, neurological and developmental pathways and genes that are shared or related among species. Some researchers are using comparative genomics to reveal the genomic underpinnings of disease in animals with the hope of gaining new insights into disease development in humans.

Among the results so far are the following:

A study discovered that about 60 percent of genes are conserved between fruit flies and humans, meaning that the two organisms appear to share a core set of genes. Two-thirds of human genes known to be involved in cancer have counterparts in the fruit fly.

A comparative genomics analysis of six species of yeast prompted scientists to significantly revise their initial catalog of yeast genes and to predict a new set of functional elements that play a role in regulating genome activity, not just in yeast but across many species.

Researchers studying milk production have mapped genes that increase the yield of high-fat milk in cows, resulting in higher production levels and potentially a significant economic impact. This is one of many studies aimed at increasing food production.

Scientists have found genes that double muscling in cattle; they found the same genes in racing dogs, and such results may foster human performance studies.

Comparisons of nearly 50 bird species' genomes revealed a gene network that underlies singing in birds and that may have an important role in human speech and language. The bird researchers also found gene networks responsible for traits such as feathers and beaks.

What Are the Benefits of Comparative Genomics?

Identifying DNA sequences that have been "conserved" - that is, preserved in many different organisms over millions of years - is an important step toward understanding the genome itself. It pinpoints genes that are essential to life and highlights genomic signals that control gene function across many species. It helps us to further understand what genes relate to various biological systems, which in turn may translate into innovative approaches for treating human disease and improving human health.

Comparative genomics also provides a powerful tool for studying evolution. By taking advantage of - and analyzing - the evolutionary relationships between species and the corresponding differences in their DNA, scientists can better understand how the appearance, behavior and biology of living things have changed over time.

As DNA sequencing technology becomes more powerful and less expensive, comparative genomics is finding wider applications in agriculture, biotechnology and zoology as a tool to tease apart the often subtle differences among animal species. Such efforts have led to new insights into some branches on the evolutionary tree, as well as improving the health of domesticated animals and pointing to new strategies for conserving rare and endangered species.

Dramatic results have emerged from the rapidly developing field of comparative genomics. Comparison of the fruit fly genome with the human genome reveals that about sixty percent of genes are conserved (Adams et al. 2000). That is, the two organisms appear to share a core set of genes. Researchers have also found that two-thirds of human genes known to be involved in cancer have counterparts in the fruit fly.

In addition to its implications for human health, comparative genomics may benefit the broader animal world and ecological studies as well. As sequencing technology grows easier and less expensive, it will find wide applications in agriculture, biotechnology, and zoology as a tool to tease apart the often-subtle differences among animal and plant species. Such efforts might also lead to the rearrangement of our understanding of some branches of the evolutionary "tree of life," as well as point to new strategies for conserving rare and endangered species.