The evidence for, and purpose of, functional non-coding DNA

"The failure to recognise the implications of the non-coding DNA will go down as the biggest mistake in the history of molecular biology" Prof. John Mattick

1. Introduction

Biochemical experiments have demonstrated that only a small fraction of the genome of most eukaryotic organisms directly codes for proteins. In humans, estimates of this fraction vary between roughly 1.5% and 2% [ref 1]. This is an enigmatic observation given the orthodox view of how the information coded in the genome is utilised by the organism. The central dogma of molecular biology states that DNA codes for RNA, which in turn is used by ribosomes to produce proteins. These proteins are then assumed to carry out the vast majority of structural and enzymatic functions required by the cell. Until recently most of the DNA in the genome had been labelled as 'junk' and was assumed to have little or no function.

Recent studies, however, have shown that numerous parts of this 'junk' are in fact functional. Many geneticists have even speculated that the information in these regions of the genome may be far more important to the organism than the protein-coding segments. In this review we will look at the evidence for functional non-coding DNA (ncDNA) and the roles it plays in the development and maintenance of the organism. We shall define ncDNA to be any region of DNA that does not code for a protein.

2. Evidence for functional ncDNA

There are numerous special cases of ncDNA where a function has already been described. For example, the promoter sequence upstream of a coding gene is known to initiate transcription of that gene by allowing the binding of RNA polymerase, but is not itself transcribed. Likewise the repeating sequences at the ends of chromosomes (telomeres) are known to prevent the loss of functional DNA that would otherwise result from the inability of replication to copy the very end of a linear DNA molecule. There are also well-understood cases of DNA coding for functional RNA that does not in turn code for protein, such as tRNA and rRNA. As we shall see shortly, there is also a mounting body of evidence for the existence of an abundance of new varieties of functional RNA (non-coding RNA, ncRNA).

It is observed that large quantities of RNA are transcribed in cells but never subsequently translated [ref 2]. Thus ncRNA is abundant in the cell. The question therefore is whether it is functional or simply represents transcriptional wastage.

Compelling evidence of the functionality of ncDNA has come from the field of comparative genomics. By comparing the genomes of different organisms and looking for shared sequences we can deduce the existence of functional regions. If a segment of DNA has diverged much less between the two organisms than we would expect from neutral evolution over the time since they diverged, the region is inferred to be experiencing negative or 'purifying' selection [ref 3]. That is, we assume that the region in question must have some essential function for the organism such that any variation has been selected out. While less than 2% of the human genome is thought to code for proteins, around 5% is found to be conserved when compared to the mouse and rat genomes, and the most extreme of these conserved segments are termed ultra-conserved regions [ref 4]. These regions exhibit almost 100% fidelity over lengths of 200bp or more. Human variation in these ultra-conserved segments has also been investigated and found to be incredibly low. Only 6 of 106,767 bp in these regions were found to have recognised single nucleotide polymorphisms (SNPs), compared to a predicted value of 119 [ref 5]. This ~20-fold decrease has a P value < 10^-42 under the assumption of neutral evolution. We can essentially be sure that some form of negative selection is active on these regions. An estimate of their importance can be made by considering that the conservation is considerably higher in these regions than in coding exons.
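The size of this effect can be checked with a short calculation. The following is a minimal sketch, assuming that the number of SNPs in the region follows a Poisson distribution with a mean equal to the neutral prediction. The source does not state which statistical test was actually used, so the model and the function name poisson_cdf are our own assumptions.

```python
import math

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for k in range(1, k_max + 1):
        term *= lam / k  # P(X = k) obtained from P(X = k - 1)
        total += term
    return total

observed_snps = 6    # SNPs found in the ultra-conserved segments [ref 5]
expected_snps = 119  # SNPs predicted under neutral evolution [ref 5]

# How many times fewer SNPs were seen than predicted.
fold_decrease = expected_snps / observed_snps

# Probability of seeing 6 or fewer SNPs if the neutral prediction were true
# (Poisson model: an assumption made for illustration).
p_value = poisson_cdf(observed_snps, expected_snps)

print(f"fold decrease: {fold_decrease:.1f}")
print(f"P(X <= {observed_snps}) = {p_value:.2e}")
```

Under this assumed model the tail probability falls just below 10^-42, which is consistent with the figure quoted above.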

A similar study was carried out by Woolfe et al. [ref 6], in which the human genome was compared to that of the pufferfish. The rationale behind the comparison was that, given the length of time since the divergence of the two species, any ncDNA conserved in both would imply a mechanism essential for all vertebrate organisms. The group found over 1400 regions in excess of 500bp where there was 90% conservation between the genomes. Most of these were found in the vicinity of genes that regulate development. The group therefore concluded that these non-coding segments help regulate the expression of these genes, either as conventional enhancers/silencers (see below) or by some mechanism involving transcribed but untranslated RNA (active RNA, see section 3).
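The idea of searching for conserved regions can be illustrated with a sliding window over two aligned sequences. This is a minimal sketch only: it assumes the sequences have already been aligned without gaps, the function name conserved_windows is hypothetical, and the toy sequences and thresholds are invented for illustration. It is not the method used by Woolfe et al.

```python
def conserved_windows(seq_a, seq_b, window, min_identity):
    """Return (start, identity) for every window of an ungapped alignment
    whose fraction of matching positions is at least min_identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [a == b for a, b in zip(seq_a, seq_b)]
    hits = []
    if len(matches) < window:
        return hits
    count = sum(matches[:window])  # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one: add the position entering, drop the one leaving.
            count += matches[start + window - 1] - matches[start - 1]
        identity = count / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits

# Toy aligned sequences (invented for illustration, not real genomic data).
human = "ACGTACGTACGTTTGACCA"
fugu = "ACGTACGTACCTTAGTCGA"

hits = conserved_windows(human, fugu, window=10, min_identity=0.9)
for start, identity in hits:
    print(f"window at {start}: {identity:.0%} identical")
```

For a genome-scale comparison of the kind described above, the window would be of the order of 500bp with a 90% threshold, and the alignment step itself is the harder problem.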

Some of these conserved ncDNA segments almost certainly have functions we are already familiar with, such as enhancers and silencers. These allow transcription factors (proteins) to bind to the chromosome, which subsequently causes the transcription of particular genes to be (as the names suggest) enhanced or silenced. It seems, however, that many more may have functions that had not previously been considered. In the next section we will look at the range of functions that have been proposed.

It should be noted that there are examples in the literature that suggest that the conservation of a DNA sequence does not imply a function. Nóbrega et al. removed two so-called 'gene deserts' of the order of 1Mbp from the mouse genome, within which were over 1000 segments > 100bp that were conserved between humans and mice with a fidelity > 70%. They were subsequently unable to distinguish these mice from unaltered controls by any organism-level parameters [ref 7]. These results suggest that a significant portion of the conserved DNA may be dispensable, though the group do not rule out a phenotypic effect that was undetectable. However, a mild phenotypic effect does not sit well with the idea that these sequences have been conserved as a result of their absolute necessity. Another possibility is that the sequence in question is so essential that there is significant redundancy built into the genome. On the other hand, Martens, Laprade and Winston are one example of a research group who have demonstrated the existence of a non-coding gene that regulates the expression of an adjacent coding gene [ref 8]. Until more experiments are carried out there is no way to estimate with any certainty the proportion of ncDNA that has an important function. Any demonstration that ncDNA is in fact junk would preferably come with an explanation of why this conservation effect is observed.

As an aside we should consider the aesthetic argument. While providing no evidence for the functionality of ncDNA, the aesthetic belief that the genome should be an efficient design has been important in persuading researchers to look for functions. As so often in science, research has been driven by faith in the beauty of the answer.

3. The functions of ncDNA

A large number of different functions have been proposed for ncDNA. We will briefly describe the well-understood examples of tRNA and rRNA, and will not treat introns in great detail as these are described in "Contribution of introns to protein diversity" by Adrian Groves [ref 9].

3.1. Transfer and ribosomal RNA

Transfer and ribosomal RNA are two well-understood forms of RNA that do not code for proteins. Transfer RNA serves to align a particular amino acid with a specific 3-nucleotide section (codon) of mRNA in the process of translation. As the name suggests, ribosomal RNA is a constituent part of the ribosome, which carries out the process of translating mRNA into a polypeptide chain. To produce these types of RNA one requires the corresponding sequence of DNA in the genome, which according to the definitions in this work is technically ncDNA. However, these examples are not of great interest to us here.

3.2. Introns

A conventional protein-coding gene is generally composed of exons (those parts of the sequence that specify amino acids) and introns (non-coding regions separating the exons). It is known that transcribed mRNA has the introns spliced out of the sequence before it is translated. Several functions have been proposed for the introns. Some are assumed to simply be parasitic or junk DNA that serves no purpose at the level of the organism. Introns almost certainly have a role in the preservation of important genes by reducing the likelihood that recombination will split exonic regions. Separating the exons also opens up the possibility of 'alternative splicing', whereby exons can be joined in different combinations to code for multiple proteins, increasing the compression of information in the genome.
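The way alternative splicing multiplies the number of products from one gene can be shown with a toy example. This is a minimal sketch under simplifying assumptions: exon positions are supplied by hand, only exon skipping is modelled, the first and last exons are always retained, and at least two exons are given. The sequence and the names splice and alternative_isoforms are invented for illustration.

```python
from itertools import combinations

def splice(pre_mrna, exon_spans):
    """Join the exon segments of a transcript, discarding everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exon_spans)

def alternative_isoforms(pre_mrna, exon_spans):
    """Enumerate isoforms that keep the first and last exon plus any subset
    of the internal exons, preserving exon order (exon skipping only)."""
    first, *internal, last = exon_spans
    isoforms = []
    for n_kept in range(len(internal), -1, -1):
        for kept in combinations(internal, n_kept):
            isoforms.append(splice(pre_mrna, [first, *kept, last]))
    return isoforms

# Toy transcript: exons in upper case, introns in lower case (invented data).
pre_mrna = (
    "AUGGCC" "guaag" "UUUAAA" "guccag" "CCCGGG" "guuuag" "GGAUAA"
)
# (start, end) positions of the four exons within pre_mrna.
exon_spans = [(0, 6), (11, 17), (23, 29), (35, 41)]

isoforms = alternative_isoforms(pre_mrna, exon_spans)
for isoform in isoforms:
    print(isoform)
```

With two internal exons that may each be kept or skipped, this toy gene yields four distinct mature transcripts from a single stretch of DNA.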

3.3. Active RNA

It has been speculated that RNA might play an active role in many more cell processes than had previously been believed. As mentioned before, to a good approximation RNA was presumed to act only via its translation into proteins. The discovery of RNA interference (RNAi) has widened the debate regarding RNA functions. There is now experimental evidence of naturally occurring RNAi [ref 10] and many researchers expect to find that it plays a central role in the natural regulation of gene expression. As described before, Martens et al. located a non-coding gene which seemingly regulates a nearby gene by the simple act of being transcribed. In 1993 Lee et al. discovered the first functional microRNA (miRNA) [ref 11]. These are small (20-25 nucleotide) strands of RNA that perform regulatory tasks in the cell. Some operate in a similar fashion to RNAi by binding and degrading complementary mRNA. Others are found to regulate gene expression without degrading mRNA and thus are assumed to interact with the translation apparatus or directly with proteins in the cytoplasm.
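The notion of a small RNA binding a complementary mRNA can be made concrete with a short search for complementary sites. This is a minimal sketch, assuming perfect base pairing along the whole length of the small RNA; real pairing is often imperfect, which this sketch ignores. The sequences and the function names are invented for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the sequence that would pair with rna, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_target_sites(mirna, mrna):
    """Return every start position in mrna that is perfectly complementary to mirna."""
    site = reverse_complement(mirna)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)  # allow overlapping sites
    return positions

# Toy sequences (invented, not real miRNA or mRNA).
mirna = "UAGCUUAUC"
target_mrna = "AAGGCGAUAAGCUACCUUGG"
unrelated_mrna = "AAGGCCCUUGGAACCGGUUA"

print(find_target_sites(mirna, target_mrna))
print(find_target_sites(mirna, unrelated_mrna))
```

An mRNA containing the complementary site is flagged as a candidate target, while one without it is not; what happens after binding (degradation or blocked translation) is outside the scope of this sketch.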

Some miRNA are also produced as the 'waste' from splicing introns out of mRNA. Functions have also been proposed for this 'waste', from the regulation of genes to more mundane operations such as permitting the removal of the next intron in the mRNA sequence.

miRNAs had been ignored for two main reasons. Firstly, they form a small fraction of untranslated RNA (~95% is tRNA and rRNA). Secondly, they had been assumed in many cases to be 'waste' as described before.

3.4. Structural Elements

We will briefly note that the function of some DNA is purely structural. As mentioned before, this includes the telomeres. It is also presumed to be true of the centromeres, the regions where the two sister chromatids of a mitotic chromosome are joined during metaphase. The evidence for the non-transcriptional nature of this region comes from the observation that it remains in a heterochromatin state, whereas transcribed regions tend to form euchromatin when being read. It is also found to consist of a large number of repeat sequences in a similar fashion to the telomeres.

4. Conclusion

That ncDNA is an important, and perhaps the most important, part of the genome is now the orthodoxy in genetics. There is a constant stream of papers that show a correlation between the presence of a non-coding gene and the expression of one or several coding genes. Most current research has focussed on these correlations rather than the actual mechanism behind the regulation, but it seems likely that transcribed ncRNA interacts with the cell machinery in a variety of ways. RNAi is one relatively well-understood mechanism by which it can do this, and one that is now an important part of the geneticist's toolkit. The importance of ncDNA means that the task of understanding the genome may be several orders of magnitude greater than previously thought.

5. References

  1. Gibbs W.W. "The unseen genome: gems among the junk", Scientific American, 289(5): 46-53, November 2003.
  2. J. Mattick. "The Functional Genomics of Noncoding RNA", Science, 309: 1527-1528, September 2005.
  3. G. Bejerano et al. "Ultraconserved Elements in the Human Genome". Science 304: 1321-1325, May 2004.
  4. G. Bejerano et al. "Ultraconserved Elements in the Human Genome". Science 304: 1321-1325, May 2004.
  5. G. Bejerano et al. "Ultraconserved Elements in the Human Genome". Science 304: 1321-1325, May 2004.
  6. A. Woolfe et al. "Highly conserved non-coding sequences are associated with vertebrate development". PLoS Biology 3: e7, January 2005.
  7. M.A. Nóbrega, Y. Zhu, I. Plajzer-Frick, V. Afzal and E.M. Rubin. "Megabase deletions of gene deserts result in viable mice". Nature 431: 988-993, 2004.
  8. J.A. Martens, L. Laprade and F. Winston. "Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene". Nature 429: 571-574, 3 June 2004.
  9. Adrian Groves. "Contribution of introns to protein diversity", LSI DTC 2005
  10. P.A. Sharp, "The biology and possible uses of short RNAs". Speech to the New York Academy of Sciences, October 27, 2003
  11. Lee RC, Feinbaum RL, Ambros V (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14, Cell, 75(5): 843-854.