Molecular Phylogeny

©2000 Gary Olsen, University of Illinois

Although the morphologies and physiologies of prokaryotes are much simpler than those of eukaryotes, there is a large amount of information in the molecular sequences of their DNA, RNAs and proteins.[15] Thus, it is possible to use molecular similarities to infer the relationships of genes, and, by extension, to learn the relationships of the organisms themselves.

DNA-DNA Hybridization

One important technique for comparing prokaryotes at the molecular level is DNA-DNA hybridization. In this test, the genomic DNA from one species is mixed with the DNA from a second species and the similarity of the DNAs is reflected in the extent to which strands of DNA from one organism anneal with strands of DNA from the other organism. The sensitivity of DNA-DNA hybridization declines rapidly as the organisms become more diverged,[16] limiting the method to characterization of closely related strains, species and genera. In addition, testing the relationships of a new organism can require many hybridizations. If no close relative is found by this test, we have only learned what the new organism is not - whereas we wish to know what it is.

Phylogenies Based on Specific Genes

DNA-DNA hybridization gives a measure of relatedness across the whole genome. As mentioned above, the average similarity falls off rapidly when looking at more diverged species. In principle, the range accessible to molecular analysis could be increased by looking at specific genes with above average conservation.[17] There are several important requirements if we wish to use a gene phylogeny to infer organismal relationships:

  1. The gene must be present in all organisms of interest. Thus, to infer relationships that span the diversity of prokaryotes (or life), we must look at the central (universal) cellular functions. Examples include genes whose products function in replication, transcription, or translation - the processes constituting the "Central Dogma" of molecular biology.
  2. The gene cannot be subject to transfer between species (lateral transfer). Since we wish to infer organismal relationships, if a gene is transferred, then the gene history is not the same as the organismal history. If a gene performs a central function, an organism is unlikely to acquire a copy by lateral transfer, since the organism must already have a functional copy to be alive.
  3. The gene must display an appropriate level of sequence conservation for the divergences of interest. If there is too much change, then the sequences become randomized, and there is a limit to the depth of the divergences that can be accurately inferred. If there is too little change (if the gene is too conserved), then there may be little or no change between the evolutionary branchings of interest, and it will not be possible to infer close (genus or species level) relationships.
  4. The gene must be sufficiently large to contain a record of the historical information. Thus, although transfer RNA (tRNA) genes are present in all species, they are too small (about 75 nucleotides) to provide an accurate sample of evolutionary history.
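
Requirement 3 can be made concrete with a small sketch. It uses the Jukes-Cantor correction, a standard substitution model that is not part of the original text, to show why heavily diverged sequences stop being informative: as the observed fraction of differing sites approaches 0.75 (the value expected for two unrelated nucleotide sequences), the estimated number of substitutions per site grows without bound. The function names are hypothetical, and the model's assumptions (equal base frequencies, equal substitution rates) are a simplification.

```python
import math


def observed_difference(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def jukes_cantor_distance(p):
    """Estimated substitutions per site for an observed difference fraction p.

    Assumes the Jukes-Cantor model (an assumption of this sketch, not of the
    source text). Returns infinity once p reaches the saturation value 0.75.
    """
    if p < 0:
        raise ValueError("difference fraction cannot be negative")
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# The estimate tracks the observed value for small p and diverges near 0.75.
for p in (0.05, 0.30, 0.60, 0.74):
    print(f"observed {p:.2f} -> estimated {jukes_cantor_distance(p):.2f}")
```

At the other extreme, a gene that is too conserved gives an observed difference of zero between closely related organisms, so there is nothing to correct and nothing to resolve.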

Ribosomal RNA Genes and Their Sequences

To infer relationships that span the diversity of known life, it is necessary to look at genes conserved through the billions of years of evolutionary divergence. One example is the set of genes that encode the ribosomal RNAs (rRNAs). Most prokaryotes have three rRNAs, called the 5S, 16S and 23S rRNA.

Ribosomal RNAs in Prokaryotes
Name(a)   Size (nucleotides)   Location
5S        120                  Large subunit of ribosome
16S       1500                 Small subunit of ribosome
23S       2900                 Large subunit of ribosome
(a) The name is based on the rate at which the molecule sediments (sinks) in water. Bigger molecules sediment faster than small ones.

The 5S rRNA has been extensively studied, but it is usually too small for reliable phylogenetic inference. The 16S and 23S rRNAs are sufficiently large to be quite useful.[18]

The extraordinary conservation of rRNA genes can be seen in these fragments of the small subunit (16S) rRNA gene sequences from organisms spanning the known diversity of life:


As a graduate student at the University of Illinois, Bernadette Pace used the annealing of rRNA with genomic DNA to measure the similarity of rRNAs in various species.[19] These experiments demonstrated that rRNA-based methods are applicable to directly comparing a broader range of organisms (i.e., spanning greater phylogenetic distances) than is whole genome DNA-DNA hybridization. However, as with DNA-DNA measurements, it was necessary to have DNA and/or RNA from each species of interest.

If relationships were analyzed by comparing sequence data, rather than hybridizing the molecules, one could infer relationships without having all of the molecules in hand (only the sequence data from previous studies are necessary). This was already being done with protein sequences.[20]
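
As a rough illustration of comparing stored sequence data instead of hybridizing molecules, the sketch below computes percent identity for every pair of pre-aligned sequences. The three fragments and their labels are invented toy data, not real rRNA sequences, and the function names are hypothetical. The only input needed is the sequences themselves, which is the practical advantage described above.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":  # skip positions with an alignment gap
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def pairwise_identities(sequences):
    """Map each unordered pair of names to the percent identity of the pair."""
    return {
        (x, y): percent_identity(sequences[x], sequences[y])
        for x, y in combinations(sequences, 2)
    }


# Invented, pre-aligned toy fragments (NOT real rRNA sequences).
aligned = {
    "organism_A": "ACGUACGGAUCCAGUUACGA",
    "organism_B": "ACGUACGGAUCCAGUUACGC",
    "organism_C": "ACGUACGCAUCC-GUUACGC",
}

for (x, y), identity in pairwise_identities(aligned).items():
    print(f"{x} vs {y}: {identity:.1f}% identical")
```

Adding a newly sequenced organism only requires adding one entry to the dictionary; no material from the earlier organisms has to be on hand.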

Carl Woese recognized the full potential of rRNA sequences as a measure of phylogenetic relatedness. He initially used an RNA sequencing method that determined about 1/4 of the nucleotides in the 16S rRNA (the best technology available at the time). This amount of data greatly exceeded anything else then available. Using newer methods, it is now routine to determine the sequence of the entire 16S rRNA molecule. Today, the accumulated 16S rRNA sequences (about 10,000) constitute the largest body of data available for inferring relationships among organisms.

Molecular Phylogenies Can Reflect Genealogy and Amount of Change

By comparing the inferred rRNA sequences (or those of any other appropriate molecule) it is possible to estimate the historical branching order of the species, and also the total amount of sequence change. An example of a 16S rRNA-based phylogenetic tree showing the three (identified) Domains of life (Bacteria, Archaea and Eucarya)[21] is below. In this tree, lineages diverge from a common ancestral lineage on the far left. The lengths of the individual lines reflect the amount of sequence change (note that some lineages have modified the gene sequence substantially more than others, and thus have accumulated longer total branch lengths).
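
The relationship between branch lengths and accumulated change can be sketched with a toy tree. The topology and branch lengths below are invented placeholders, not values read from the tree described above, and the list-of-pairs representation and function name are assumptions made for this example. Summing the branch lengths from the root to each tip gives the total amount of change along that lineage.

```python
def root_to_tip_lengths(node, distance_from_root=0.0):
    """Return {leaf_name: summed branch length from the root to that leaf}.

    A node is either a leaf name (str) or a list of (child, branch_length)
    pairs, where each child is itself a node.
    """
    if isinstance(node, str):
        return {node: distance_from_root}
    totals = {}
    for child, branch_length in node:
        totals.update(
            root_to_tip_lengths(child, distance_from_root + branch_length)
        )
    return totals


# Hypothetical tree: lineage_1 splits off first; lineage_2 and lineage_3
# share an internal branch of length 0.08 before diverging from each other.
toy_tree = [
    ("lineage_1", 0.10),
    ([("lineage_2", 0.05), ("lineage_3", 0.30)], 0.08),
]

totals = root_to_tip_lengths(toy_tree)
for name in sorted(totals, key=totals.get):
    print(f"{name}: {totals[name]:.2f} substitutions per site from the root")
```

In this toy case lineage_2 and lineage_3 are each other's closest relatives, yet lineage_3 has accumulated far more change, which is why branching order and branch length have to be read as separate pieces of information.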

This page was last built with Frontier and Web Warrior on a Macintosh on Thu, Sep 21, 2000 at 1:09:14 PM.