Comparative Genomics
Comparative genomics is a field of biological research in which researchers use a variety of tools to compare the complete genome sequences of different species. By carefully comparing characteristics that define various organisms, researchers can pinpoint regions of similarity and difference.
Comparison of whole genome sequences provides a highly
detailed view of how organisms are related to each other at the genetic level.
How are genomes compared, and what can these findings tell us about how the
overall structure of genes and genomes has evolved?
Comparative genomics is a field of biological research in
which the genome sequences of different species — human, mouse, and a wide
variety of other organisms from bacteria to chimpanzees — are compared. By
comparing the sequences of genomes of different organisms, researchers can
understand what, at the molecular level, distinguishes different life forms
from each other. Comparative genomics also provides a powerful tool for
studying evolutionary changes among organisms, helping to identify genes that
are conserved or common among species, as well as genes that give each organism
its unique characteristics.
What Is a Genome Made Of?
The genomes of almost all living creatures, both plants and
animals, consist of DNA (deoxyribonucleic acid), the chemical chain that
includes the genes that code for different proteins and the regulatory
sequences that turn those genes on and off. Precisely which protein is produced
by any given gene is determined by the sequence in which four building blocks -
adenine (A), thymine (T), cytosine (C) and guanine (G) - are laid out along
DNA's twisted, double-helix structure.
Although living creatures look and behave in a myriad of
ways, all of their genomes consist of DNA, the chemical chain that harbors the
genes that code for thousands of different kinds of proteins. Within DNA are
the instructions sufficient to make an organism and the means by which
organisms pass information along to their offspring. Remarkably, this
information is coded by only four nucleotides: adenine (A), cytosine (C),
guanine (G), and thymine (T). Understanding the order of these nucleotides in
linear DNA molecules has been an active pursuit since the discovery of DNA’s
double-helical structure (Watson & Crick 1953). As such, DNA sequencing has
emerged as a fundamental approach to molecular biology research. The power of
DNA sequencing as a research tool has spurred the dramatic advancement of DNA
sequencing technology, which is allowing ever more genomes to be sequenced and
making comparative genomics an accessible focal point for the study of any form
of life.
What Genomes Have Been Sequenced?
In addition to sequencing the three billion letters in the
human “genetic instruction book” (Lander et al. 2001), researchers involved in
the International Human Genome Project (HGP) sequenced the genomes of a number
of important model organisms. These include chimpanzee (Lander et al. 2005),
mouse (Waterston et al. 2002), rat (Gibbs et al. 2004), two puffer fish
(Jaillon et al. 2004; Aparicio et al. 2002), fruit fly (Adams et al. 2000), two
sea squirts (Dehal et al. 2002; Small et al. 2007), two roundworms (Stein et
al. 2003; Stein et al. 1998), baker's yeast (Goffeau et al. 1996), and the
bacterium Escherichia coli (Blattner et al. 1997). Since the completion of the
HGP, sequence drafts of the chicken (Hillier et al. 2004), cow (Elsik et al.
2009), dog (Lindblad-Toh et al. 2005), honey bee (Weinstock et al. 2006),
sea urchin (Sodergren et al. 2006) and rhesus macaque monkey (Gibbs et al.
2007) (to name just a few) have also been established.
Together with over 1,000 prokaryote genomes, a total of over
1,300 species have been completely sequenced and published (ca. 2010)
and this number continues to grow at a prodigious rate, providing a rich source
of genomic data for comparison.
How Are Genomes Compared?
A simple comparison of the general features of genomes such
as genome size, number of genes, and chromosome number presents an entry point
into comparative genomic analysis. Data for several fully sequenced model
organisms are shown in Table 1. The comparisons highlight some striking
findings. For example, while the tiny flowering plant Arabidopsis thaliana has
a smaller genome than that of the fruit fly Drosophila melanogaster (157
million base pairs vs. 165 million base pairs, respectively), it possesses nearly
twice as many genes (25,000 vs. 13,000). In fact, A. thaliana has approximately
the same number of genes as humans (~25,000). Thus, a very early lesson learned
in the "genomic era" is that genome size does not correlate with
evolutionary status, nor is the number of genes proportionate to genome size.
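The mismatch between genome size and gene number can be made concrete by
computing gene density. The following is a minimal sketch that uses only the
figures quoted in this article; the human genome size is taken from the "three
billion letters" mentioned earlier, and the dictionary layout and function name
are illustrative assumptions rather than part of any standard tool.

```python
# Genome sizes (million base pairs, Mbp) and approximate gene counts as quoted
# in the text. Human size assumes the "three billion letters" figure.
genomes = {
    "Arabidopsis thaliana": {"size_mbp": 157, "genes": 25_000},
    "Drosophila melanogaster": {"size_mbp": 165, "genes": 13_000},
    "Homo sapiens": {"size_mbp": 3_000, "genes": 25_000},
}


def gene_density(size_mbp, genes):
    """Return the number of genes per million base pairs."""
    return genes / size_mbp


densities = {name: gene_density(**g) for name, g in genomes.items()}

# Print species from most to least gene-dense.
for name, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<25} {density:7.1f} genes per Mbp")
```

With these rounded inputs, A. thaliana carries roughly 159 genes per Mbp, the
fruit fly about 79, and human about 8, which illustrates that gene number is
not proportionate to genome size.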
Finer-resolution comparisons are possible by direct DNA
sequence comparisons between species. Figure 1 depicts a chromosome-level
comparison of the human and mouse genomes that shows the level of synteny
between these two mammals. Synteny is a situation in which genes are arranged
in similar blocks in different species. The nature and extent of conservation
of synteny differ substantially among chromosomes. For example, the X
chromosomes are represented as single, reciprocal syntenic blocks. Human
chromosome 20 corresponds entirely to a portion of mouse chromosome 2, with
nearly perfect conservation of order along almost the entire length, disrupted
only by a small central segment. Human chromosome 17 corresponds entirely to a
portion of mouse chromosome 11. Other chromosomes, however, show evidence of
more extensive interchromosomal rearrangement. Results such as these provide an
extraordinary glimpse into the chromosomal changes that have shaped the mouse
and human genomes since their divergence from a common ancestor 75–80 million
years ago.
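The idea of a syntenic block can be sketched in code. The example below is a
simplified illustration, not the method used to produce Figure 1: it assumes
each orthologous gene pair is already known and is described by its chromosome
and its rank along that chromosome in two species (called A and B here). Genes
are grouped into a block while they stay adjacent and in a consistent
orientation in both genomes. The gene names, chromosome labels, and the
Ortholog type are hypothetical.

```python
from typing import NamedTuple


class Ortholog(NamedTuple):
    gene: str
    chrom_a: str  # chromosome in species A
    pos_a: int    # rank of the gene along that chromosome in species A
    chrom_b: str  # chromosome in species B
    pos_b: int    # rank of the gene along that chromosome in species B


def syntenic_blocks(orthologs, min_genes=2):
    """Group orthologs into runs that are adjacent and collinear in both genomes."""
    ordered = sorted(orthologs, key=lambda o: (o.chrom_a, o.pos_a))
    blocks, current, direction = [], [], 0
    for o in ordered:
        if current:
            prev = current[-1]
            step = o.pos_b - prev.pos_b
            same_chroms = o.chrom_a == prev.chrom_a and o.chrom_b == prev.chrom_b
            adjacent = o.pos_a - prev.pos_a == 1 and abs(step) == 1
            consistent = direction in (0, step)  # same orientation as the block so far
            if same_chroms and adjacent and consistent:
                current.append(o)
                direction = step
                continue
            if len(current) >= min_genes:
                blocks.append(current)
        # Start a new candidate block with this gene.
        current, direction = [o], 0
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical data: g1-g3 keep their order, g4-g6 are inverted, g7 is isolated.
pairs = [
    Ortholog("g1", "1", 1, "2", 10),
    Ortholog("g2", "1", 2, "2", 11),
    Ortholog("g3", "1", 3, "2", 12),
    Ortholog("g4", "1", 4, "5", 3),
    Ortholog("g5", "1", 5, "5", 2),
    Ortholog("g6", "1", 6, "5", 1),
    Ortholog("g7", "1", 7, "9", 4),
]
blocks = syntenic_blocks(pairs)
for block in blocks:
    print(block[0].chrom_a, "->", block[0].chrom_b, [o.gene for o in block])
```

Requiring strict adjacency is a deliberate simplification; genome-scale
analyses generally tolerate gaps and small local rearrangements within a block.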
Comparison of discrete segments of genomes is also possible
by aligning homologous DNA from different species. An example of such an
alignment is shown in Figure 2, where a human gene (pyruvate kinase: PKLR) and
the corresponding PKLR homologs from macaque, dog, mouse, chicken, and
zebrafish are aligned. Regions of high DNA sequence similarity with human
across a 12-kilobase region of the PKLR gene are plotted for each organism.
Notice the high degree of sequence similarity between human and macaque (two
primates) in both PKLR exons (blue) as well as introns (red) and untranslated regions
(light blue) of the gene. In contrast, the chicken and zebrafish alignments
with human only show similarity to sequences in the coding exons; the rest of
the sequence has diverged to a point where it can no longer be reliably aligned
with the human DNA sequence. Using such computer-based analysis to zero in on
the genomic features that have been preserved in multiple organisms over
millions of years, researchers are able to locate the signals that represent
the location of genes, as well as sequences that may regulate gene expression.
Indeed, many of the functional parts of the human genome have been discovered
or verified by this type of sequence comparison (Lander et al. 2001) and it is
now a standard component of the analysis of every new genome sequence.
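A similarity plot of the kind described for Figure 2 can be approximated by
sliding a window along a pairwise alignment and recording the percent identity
in each window. The sketch below assumes the two sequences have already been
aligned (gaps written as "-"); the sequences are short invented fragments, not
real PKLR sequence, and the window and step sizes are arbitrary choices.

```python
def window_identity(ref, other, window=10, step=5):
    """Percent identity between two pre-aligned sequences in sliding windows.

    Columns where either sequence has a gap ('-') count as mismatches.
    Returns a list of (window_start, percent_identity) tuples.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(ref) - window + 1, step):
        columns = zip(ref[start:start + window], other[start:start + window])
        matches = sum(a == b and a != "-" for a, b in columns)
        results.append((start, 100.0 * matches / window))
    return results


# Hypothetical aligned fragments, 30 columns each.
human = "ATGGCGTTAGCCTGAACGTTAGGCTAACGT"
species = "ATGGCGTTAGCCTGATCG--AGGATAACGA"

profile = window_identity(human, species)
for start, identity in profile:
    print(f"columns {start:2d}-{start + 9:2d}: {identity:5.1f}% identical")
```

Windows that stay near 100 percent identity across several species are
candidates for conserved, and therefore possibly functional, sequence; windows
where identity collapses correspond to regions that can no longer be reliably
aligned.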
We have learned from homologous sequence alignment that the
information that can be gained by comparing two genomes depends largely
on the phylogenetic distance between them. Phylogenetic distance is
a measure of the degree of separation between two organisms or their genomes on
an evolutionary scale, usually expressed as the number of accumulated sequence
changes, number of years, or number of generations. The distances are often
placed on phylogenetic trees, which show the deduced relationships among the
organisms (Figure 3). The more distantly related two organisms are, the less
sequence similarity and the fewer shared genomic features will be detected between them.
Thus, only general insights about classes of shared genes can be gathered by genomic
comparisons at very long phylogenetic distances (e.g., over one billion years
since their separation). Over such very large distances, the order of genes and
the signatures of sequences that regulate their transcription are rarely
conserved.
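When phylogenetic distance is expressed as accumulated sequence changes, a
simple starting point is the proportion of aligned sites that differ (the
p-distance). Because repeated changes at the same site hide earlier ones, this
raw proportion underestimates the true number of changes at long distances. The
sketch below applies the Jukes-Cantor correction, one common model that this
article does not itself name; it is chosen here only because it is the simplest
such correction. The function names are illustrative.

```python
import math


def p_distance(seq1, seq2):
    """Fraction of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in compared) / len(compared)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: expected substitutions per site."""
    if p >= 0.75:
        # Sequences look as different as random ones; no finite estimate exists.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Compare a small and a large observed difference.
for p in (0.05, 0.40):
    print(f"observed p = {p:.2f} -> corrected distance = {jukes_cantor(p):.3f}")
```

For a small observed difference the correction barely matters, while for a
large one the corrected distance is substantially greater than the observed
proportion. This is one way to see why comparisons across very long distances
carry so little reliable signal.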
At closer phylogenetic distances (50–200 million years of
divergence), both functional and non-functional DNA is found within the
conserved segments. In these cases, the functional sequences will show
signatures of selection by virtue of their sequences having changed less, or
more slowly, than non-functional DNA. Moreover, beyond the ability to
discriminate functional from non-functional DNA, comparative genomics is also
contributing to the identification of general classes of important DNA
elements, such as coding exons of genes, non-coding RNAs, and some gene
regulatory sites.
In contrast, very similar genomes separated by about 5
million years of evolution (such as human and chimpanzee) are particularly
useful for finding the sequence differences that may account for subtle
differences in biological form. These are sequence changes under directional
selection, a process whereby natural selection favors a single phenotype and
continuously shifts the allele frequency in one direction. Comparative genomics
is thus a powerful and promising approach to biological discovery that becomes
more and more informative as genomic sequence data accumulate.
What Results Has the Field of Comparative Genomics Produced?
Comparative genomics has yielded dramatic results.
Investigators are increasingly using comparative genomics to explore areas
ranging from human development and behavior to metabolism and susceptibility to
disease. These studies are uncovering new behavioral, neurological and
developmental pathways and genes that are shared or related among species. Some
researchers are using comparative genomics to reveal the genomic underpinnings
of disease in animals with the hope of gaining new insights into disease
development in humans.
Among the results so far are the following:
A study discovered that about 60 percent of genes are
conserved between fruit flies and humans, meaning that the two organisms appear
to share a core set of genes. Two-thirds of human genes known to be involved in
cancer have counterparts in the fruit fly.
A comparative genomics analysis of six species of yeast
prompted scientists to significantly revise their initial catalog of yeast
genes and to predict a new set of functional elements that play a role in
regulating genome activity, not just in yeast but across many species.
Researchers studying milk production have mapped genes that
increase the yield of high-fat milk in cows, resulting in higher production
levels and potentially a significant economic impact. This is one of many
studies aimed at increasing food production.
Scientists have found genes that increase muscling in cattle
twofold; they found the same genes in racing dogs, and such results may
foster human performance studies.
Comparisons of nearly 50 bird species' genomes revealed a gene network that underlies singing in birds and that may have an important role in human speech and language. The bird researchers also found gene networks responsible for traits such as feathers and beaks.
What Are the Benefits of Comparative Genomics?
Identifying DNA sequences that have been
"conserved" - that is, preserved in many different organisms over
millions of years - is an important step toward understanding the genome
itself. It pinpoints genes that are essential to life and highlights genomic
signals that control gene function across many species. It helps us to further
understand what genes relate to various biological systems, which in turn may
translate into innovative approaches for treating human disease and improving
human health.
Comparative genomics also provides a powerful tool for
studying evolution. By taking advantage of, and analyzing, the evolutionary
relationships between species and the corresponding differences in their DNA,
scientists can better understand how the appearance, behavior and biology of
living things have changed over time.
As DNA sequencing technology becomes more powerful and less
expensive, comparative genomics is finding wider applications in agriculture,
biotechnology and zoology as a tool to tease apart the often subtle differences
among animal species. Such efforts have led to new insights into some branches
on the evolutionary tree, as well as improving the health of domesticated
animals and pointing to new strategies for conserving rare and endangered
species.
Dramatic results have emerged from the rapidly developing
field of comparative genomics. Comparison of the fruit fly genome with the
human genome reveals that about sixty percent of genes are conserved (Adams et
al. 2000). That is, the two organisms appear to share a core set of genes.
Researchers have also found that two-thirds of human genes known to be involved
in cancer have counterparts in the fruit fly.
In addition to its implications for human health,
comparative genomics may benefit the broader animal world and ecological
studies as well. As sequencing technology grows easier and less expensive, it
will find wide applications in agriculture, biotechnology, and zoology as a
tool to tease apart the often-subtle differences among animal and plant
species. Such efforts might also lead us to revise our understanding
of some branches of the evolutionary "tree of life," as well as point
to new strategies for conserving rare and endangered species.