Thursday, June 14, 2007
DNA sequencing has revolutionized biology. Not long ago, obtaining the sequence of an individual gene was a big deal. Now sequencing entire genomes is becoming routine. To obtain the sequence of an organism's entire genome, the genome is broken into fragments, and a great many of those fragments are sequenced individually using a PCR-based method. In the standard process, the sequence generated from each fragment is about 500 to 1,000 bases long. To reconstruct the genome, computers align all of the reads, find where they overlap, and assemble the overlapping reads into contiguous sequences. Because errors are made in the process, the entire genome is sequenced multiple times so that the data can be trusted. At least 4x to 5x coverage is considered necessary for good data.
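To make the assembly step concrete, here is a toy sketch in Python of merging reads by their overlaps. It is an illustration only, not how production assemblers work: it assumes error-free reads from a single strand, and the function names (`overlap`, `greedy_assemble`) and the `min_len` threshold are made up for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored (returns 0).
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Returns the list of contiguous sequences left when no pair overlaps
    by at least min_len bases.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up reads that overlap end to end.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"])
print(contigs)  # ['ATGGCGTACCGGA']
```

Real data adds sequencing errors, repeats, and reads from both strands, which is why multiple-fold coverage and far more careful algorithms are needed in practice.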
A typical bacterial genome of 3.5 megabases sequenced to 5x coverage amounts to over 21,000 individual sequencing runs, which works out if each read averages roughly 800 bases. For organisms with multiple chromosomes, such as humans, each chromosome is sequenced individually. With about 3 billion bases in the human genome and 10x coverage, the data in the Human Genome Project represent an enormous sequencing and alignment effort.
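The arithmetic behind these numbers is simple enough to check with a few lines of Python. This is a back-of-the-envelope sketch: the 800-base read length is an assumption chosen from the middle of the 500 to 1,000 base range, and `reads_needed` is a name invented for this example.

```python
def reads_needed(genome_bases, coverage, read_length):
    """Number of reads required to reach the given fold coverage.

    Uses integer ceiling division so a partial read counts as a whole one.
    """
    total_bases = genome_bases * coverage
    return -(-total_bases // read_length)


# Bacterial genome: 3.5 megabases at 5x coverage, assumed 800-base reads.
bacterial = reads_needed(3_500_000, 5, 800)
print(bacterial)  # 21875

# Human genome: about 3 billion bases at 10x coverage, same assumed read length.
human = reads_needed(3_000_000_000, 10, 800)
print(human)  # 37500000
```

With reads at the short end of the range (500 bases), the bacterial example needs 35,000 reads; at the long end (1,000 bases), it needs 17,500.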
A relatively new innovation called 454 sequencing has sped up the process. 454 is still based on PCR, but it is much faster because the sequence is read as the PCR progresses. The trade-off for the increased speed is that each individual read is somewhat shorter, which makes assembly more challenging.
One 454 application that is becoming increasingly widespread is sequencing multiple strains of a single bacterial species. In this type of application, the genome of one strain is already known. Assembling the genome of each successive strain is made easier by using the original strain's sequence as a template.
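The idea of using a known genome as a template can be sketched in a few lines of Python. This is a simplified illustration, not the software 454 or anyone else actually uses: it assumes reads differ from the reference only by single-base substitutions (no insertions or deletions), it scans every position rather than using an index, and the names `best_position`, `consensus_from_reads`, and `max_mismatches` are invented for the example.

```python
from collections import Counter


def best_position(read, reference):
    """Return (offset, mismatches) for the ungapped placement of read
    on reference with the fewest mismatches, or None if read is longer."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def consensus_from_reads(reads, reference, max_mismatches=2):
    """Place each read on the reference, then take a majority vote per
    position. Positions covered by no read are reported as 'N'."""
    votes = [Counter() for _ in reference]
    for read in reads:
        placement = best_position(read, reference)
        if placement is None or placement[1] > max_mismatches:
            continue  # read does not fit the template well enough
        offset, _ = placement
        for k, base in enumerate(read):
            votes[offset + k][base] += 1
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)


# Made-up reference strain and reads from a second strain that differs
# from it by a single base (C -> T at index 8).
reference = "ATGGCGTACCGGATTC"
reads = ["ATGGCGTA", "GCGTATCG", "TATCGGAT", "CGGATTC"]
new_strain = consensus_from_reads(reads, reference)
print(new_strain)  # ATGGCGTATCGGATTC
```

Because each read only has to be matched against the template rather than against every other read, short reads are much less of a handicap here than in assembly from scratch.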
454 sequencing was used to determine James Watson's genome. The whole genome was assembled in two months at a cost of $1 million. The assembly was almost certainly accelerated by aligning short reads of Watson's DNA with the pre-existing human genome sequence.
Image from the tutorial on the 454 Life Sciences web page.