Restricted expression patterns.
While the diversity of expression patterns that we detected was
considerable, our hybrid clustering approach identified a number of tissue or
domain specific expression patterns shared among a significant number of genes.
While these clusters are more easily categorized than the broad clusters, there
is still considerable ambiguity between clusters.
Clusters 1R-4R contain 383 genes expressed in various combinations
of the yolk nuclei, fat body and blood related tissues. Clusters
1R and 2R genes are more likely to be expressed in combinations of these
different structures, while 3R genes are primarily expressed in the fat body,
and 4R genes in the head mesoderm and related tissues. Interestingly, the
various tissues included in these expression clusters derive from distinct
developmental lineages, raising the question of whether a single coordinated
expression program may underlie expression in these seemingly unrelated
developmental domains. They may be linked by having a conserved role in
immunity.
Clusters 5R-7R contain 1,160 genes with expression in various
epithelial structures late in embryogenesis. These tissues
include the epidermis, hindgut, foregut, and trachea, among others. The
appearance of the staining pattern is highly dynamic and it appears variable
depending on the precise intermediate stage captured. However, the fully formed
epidermal pattern is common and represents the most
recognizable and most abundant tissue-restricted pattern in embryogenesis. The
epidermal pattern is frequently combined with expression in the tracheal system. A subset of genes is expressed earlier in
embryogenesis and most likely carries out a very different set of developmental
roles, including morphogenesis. The differences between the late epithelial
clusters and the early epithelial cluster are
apparent not only in the CV annotations, but also in the average microarray
profiles of these clusters.
Clusters 13R-16R contain 525 genes expressed specifically in the
central and peripheral nervous system. In contrast to the broad
clusters 4B and 5B, these genes lack maternally contributed transcripts and any
detectable staining at or immediately after gastrulation. The central nervous
system specific gene expression begins at stage 11 and almost
always includes both the brain and the ventral nerve cord. A subset of genes is also expressed in the midline, with a small number showing
transcription before stage 11. Genes expressed exclusively in the midline were
extremely rare. Many genes are expressed in both the central and peripheral
nervous systems, while a significant number are expressed in the
peripheral nervous system alone.
Clusters 18R and 19R contain 229 genes expressed in either
differentiated somatic muscle or differentiated visceral muscle. Most genes that were detected in the visceral muscle became active
earlier in the mesoderm primordia. As with the head and trunk components of the
nervous system, expression in trunk muscles was almost always accompanied by
expression in head muscles.
Clusters
23R-29R contain 422 genes expressed in a domain-specific manner beginning in
the blastoderm stage embryo and typically continuing in a tissue-specific
manner throughout embryogenesis. These expression patterns tend
to be extremely diverse at every stage of embryogenesis, and many are assigned
to more than a single cluster. In fact, only 148 (35%) are assigned to a single
unique cluster. From our dataset, we can conclude that genes patterned in the
blastoderm have a tendency to be expressed in certain tissues later, especially
the CNS and epidermis. The relationship between blastoderm-stage expression and
later tissue-specific expression is elusive. While continuity of expression in
particular lineage-specific regulatory genes is well-documented, we fail to
detect any statistically significant relationship between annotations at the
blastoderm and later stages in our full, unbiased set of genes. While we cannot
conclusively rule out that this is due to a limitation of our controlled
vocabulary or some other artifact of our approach, it more likely indicates
that expression of such genes is initiated independently at different stages of
development rather than maintained through developmental lineages.
An
additional eight clusters contain 349 genes with various stereotypical
expression patterns. Some of these, like the cluster of
continuous pole and germ-cell expression, consist of a single
distinct tissue across stages, while others like the cluster of midgut-specific
genes are primarily expressed in a particular tissue at a
particular time. The fact that these tissues formed their own clusters under
our clustering scheme indicates that many genes are expressed specifically in
these structures, reflecting their functional specialization.
Despite
the significant number of genes that conform well to the patterns represented
by the above clusters, a large fraction are expressed in various and often
unique combinations of structures. We attempted to characterize these genes by
assigning them to the set of clusters
that best described their expression pattern. Of the 1,947 genes expressed in a
restricted manner, 795 (41%) are assigned to
more than one cluster. We illustrate this by showing several examples
of genes assigned to multiple clusters. By categorizing genes into
more than a single expression cluster, we also hope to facilitate more useful
online searches of our dataset by more fully representing the range of each
gene’s expression. The 29 restricted clusters can be viewed as distinct
transcriptional programs, and the numerous genes that are expressed in unique combinations
of tissues combine these basic programs. Such a view is consistent with our
current understanding of how complex patterns of expression are generated by a
set of independently acting cis-regulatory modules. An interesting direction
for future research will be to uncover the cis-regulatory modules that are
associated with the individual restricted clusters and to examine whether or
how these modules are utilized to achieve the indisputable diversity in
gene expression regulation.
Can we estimate the
number of distinct expression patterns in Drosophila
embryogenesis? When we apply the criteria that genes with 75% or more of
their annotation terms in common are considered ‘indistinguishable’, we
identify X multi-gene groups
and X singletons among the genes in restricted
clusters. Thus, by removing the broad genes, which are prone to inconsistent
annotation, the number of distinct patterns within our dataset drops from 2,197
to X, providing an estimate of the number of
‘distinct’ patterns. On the other hand, these patterns are not unrelated. We
consider the 29 restricted clusters the most prominent recurring patterns in
the dataset and these define 29 sets of related patterns. We can only speculate
where to place the ‘real’—that is, biologically significant—number of patterns
within these defined extremes. It is clear that the clusters are not homogeneous,
since 41% of the genes exhibit composite patterns. We favor the idea that the
majority of the composite patterns result from simple additive combinations of
the basic patterns rather than from completely new gene-specific regulation. We
believe this analysis will be much more informative once we can enumerate each
of the independently acting cis-regulatory modules that drive the
various elements of these restricted expression patterns. These cis-acting
regulatory modules are the fundamental units determining gene expression
patterns and we believe that performing the clustering analysis on the patterns
that each of these elements generates, rather than the patterns of entire genes
that result from the combined action of many such modules, will be more
powerful in revealing the underlying mechanisms and logic governing the
generation and evolution of each gene’s expression pattern.
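The 75% criterion used above to count 'indistinguishable' patterns can be illustrated with a minimal sketch. The text does not specify how the shared fraction is computed or how pairwise matches are merged into groups, so two assumptions are made here purely for illustration: the fraction is taken as the number of shared annotation terms divided by the size of the union of both genes' terms, and genes are grouped by single linkage (any chain of pairwise matches joins a group). The gene names and annotation terms are hypothetical toy data, not values from our dataset.

```python
from itertools import combinations


def shared_fraction(terms_a, terms_b):
    """Fraction of annotation terms in common.

    Assumption: computed as |A intersect B| / |A union B|; the text
    does not define the denominator.
    """
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def group_indistinguishable(annotations, threshold=0.75):
    """Group genes whose shared fraction meets the threshold.

    Assumption: single-linkage grouping, implemented with union-find.
    """
    parent = {gene: gene for gene in annotations}

    def find(gene):
        # Walk up to the group representative, shortening the path as we go.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in combinations(annotations, 2):
        if shared_fraction(annotations[a], annotations[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for gene in annotations:
        groups.setdefault(find(gene), []).append(gene)
    return list(groups.values())


# Hypothetical toy annotations (not taken from the dataset).
toy = {
    "geneA": {"brain", "ventral nerve cord", "midline", "PNS"},
    "geneB": {"brain", "ventral nerve cord", "midline", "PNS"},
    "geneC": {"brain", "ventral nerve cord", "midline"},
    "geneD": {"fat body"},
}

groups = group_indistinguishable(toy)
multi_gene_groups = [g for g in groups if len(g) > 1]
singletons = [g for g in groups if len(g) == 1]
print(len(multi_gene_groups), "multi-gene group(s);", len(singletons), "singleton(s)")
```

Under these assumptions, the number of 'distinct' patterns is the number of multi-gene groups plus the number of singletons. A different definition of the shared fraction (for example, relative to the smaller annotation set) or a stricter grouping rule would change the counts.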