Rapid DNA sequencing: Technology addressing new problems, solving them, and handing back entire new visions
- Ernest Retzel, National Center for Genome Resources
DNA sequencing is the process of determining the content and order of the G, A, T and C "bases" in the genome of an organism, sometimes called "the code." So, what's the big deal about DNA sequencing? They have known about DNA since the '50s, and they have been doing DNA sequencing since the mid-'70s. It's even in textbooks!
Indeed, we have been doing sequencing at one level or another since I was a graduate student. There are over 300 billion bases of DNA sequence data in GenBank, the U.S. national archive of sequence information, representing species from bacteria and viruses to humans, from arachnids to zebras. Researchers thought that, if we just mined that data, we would probably have ten years of work ahead of us.
In the last two years, however, there have been developments in the technology of DNA sequencing that have changed everything. At one point in my graduate career, generating 140 bases of sequence data took six months with all the bench work that accompanied the preparation. DNA sequencing evolved with what seemed like amazing speed over the following 25 years of applying it to biological problems. Then everything changed: in the past two years, truly new technologies for performing sequencing have been developed. We can now generate 2.5 billion bases of DNA sequence data in less than a week, and a majority of that time is spent on computer analysis.
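To put those two figures side by side, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above, plus two assumptions of mine: that "six months" is about 26 weeks, and that "less than a week" is rounded up to one full week, which makes the result a conservative lower bound.

```python
# Throughput comparison using the figures quoted in the text.
OLD_BASES = 140              # bases generated during graduate work
OLD_WEEKS = 26               # ASSUMPTION: "six months" taken as 26 weeks
NEW_BASES = 2_500_000_000    # bases generated per run today
NEW_WEEKS = 1                # ASSUMPTION: "less than a week" rounded up to 1


def bases_per_week(bases, weeks):
    """Average number of bases produced per week."""
    return bases / weeks


old_rate = bases_per_week(OLD_BASES, OLD_WEEKS)
new_rate = bases_per_week(NEW_BASES, NEW_WEEKS)
speedup = new_rate / old_rate

print(f"Then: about {old_rate:.1f} bases per week")
print(f"Now:  about {new_rate:.1e} bases per week")
print(f"Speedup: roughly {speedup:.1e} times")
```

Under these assumptions the speedup comes out in the hundreds of millions, which is the kind of jump the rest of this article is about.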
There are many ideas in biology that "Changed Everything." This can be said a lot less frequently of technologies. DNA sequencing changed everything early on: genes and genomes became accessible. A technology known as the polymerase chain reaction (PCR) changed everything, allowing minuscule amounts of DNA to be amplified. TV shows like CSI demonstrated to the public how powerful it could be. But high-throughput DNA sequencing changed everything in a way that I have not seen in my career. Suddenly, a full genome is accessible to almost every researcher in not very much time and for a relatively small amount of money. The first human genome took 13 years, hundreds (if not thousands) of people, and hundreds of millions of dollars to complete. The work was beyond arduous; the complexity of the project was almost unimaginable.
By contrast, there is now a major program being pushed forward to develop the technology to deliver the sequence of a human genome for $1,000. Not waiting for that, but certainly hoping it will happen, there is the "1000 Genomes Project," which seeks to completely sequence a thousand individuals worldwide to understand everything from the evolution of humankind to the differences between each of us.
That is exciting, even mind-boggling, when you understand not only the possibilities, but also the scale of that data and the scope of the analysis. Our biology problems are suddenly looking like astrophysics problems in terms of scale. A sequencing run starts with a terabyte (TB) of raw data (1 TB = 1,000 gigabytes), is reduced to a few gigs of sequence data, and the analysis generates about 300 gigs of information. It doesn't fit well in a spreadsheet. Each machine we have in the lab generates that much data in two days. And we have six machines just at NCGR.
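For a feel of how quickly that adds up, here is a small sketch that totals the lab's output over a given number of days. The per-run figures come from the paragraph above. Two things are my own assumptions for illustration: "a few gigs" of sequence data is taken as 3 GB, and every machine is assumed to run back to back with no downtime. The function name `lab_output_gb` is hypothetical.

```python
# Back-of-the-envelope estimate of lab data volume.
RAW_GB_PER_RUN = 1_000       # 1 TB of raw data per run (1 TB = 1,000 GB)
ANALYSIS_GB_PER_RUN = 300    # analysis output per run
SEQUENCE_GB_PER_RUN = 3      # ASSUMPTION: "a few gigs" taken as 3 GB
DAYS_PER_RUN = 2             # each machine completes a run in two days
MACHINES = 6                 # machines at NCGR


def lab_output_gb(days, machines=MACHINES):
    """Total data in GB produced over `days`.

    ASSUMPTION: all machines run continuously with no downtime.
    """
    runs = machines * (days // DAYS_PER_RUN)
    return {
        "runs": runs,
        "raw_gb": runs * RAW_GB_PER_RUN,
        "sequence_gb": runs * SEQUENCE_GB_PER_RUN,
        "analysis_gb": runs * ANALYSIS_GB_PER_RUN,
    }


month = lab_output_gb(30)
print(f"Runs in 30 days: {month['runs']}")
print(f"Raw data:        {month['raw_gb'] / 1_000:.0f} TB")
print(f"Analysis output: {month['analysis_gb'] / 1_000:.0f} TB")
```

Under these assumptions, a single month of continuous operation yields 90 runs, about 90 TB of raw data and 27 TB of analysis output. Real numbers would be lower once maintenance and failed runs are counted.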
"But wait, there's more!!"
The idea of a "personal genome" is now within reach, and there is in fact the Personal Genome Project. As cool as that is, there is so much more we can do now. There are only 20,000-30,000 genes in most animals and plants, and a lot of the genome (generally 90% or more) is not accounted for in genes. We used to refer to this as "junk DNA." In the last couple of years, because of the deep sequencing we can obtain with the new technologies, we have found that over 95% of the human genome is biologically important and useful. We just don't know what all of it does yet, but we know it is doing something. Whole new classes of RNA molecules have been defined. It has been shown that a dynamic interplay between these newly discovered classes of molecules and the RNA molecules that code for proteins defines how things are controlled in a cell.
On a whole different topic, we can now take a tissue that is infected with a virus or a bacterium, and see what happens in the process of the infection: what host genes are turned on and when, what viral genes are turned on, and in what order. And we look at them ALL at the same time, in the same sequence-based snapshot of an infection. We have taken plants that have been studied for years, whose genome sequence has been explored in detail, and we have discovered areas that are only expressed in certain tissues at a very specific time in plant replication. In some areas, plants make excellent models even for humans. You might not think that plants have a lot in common with humans, but the replication process is similar in many respects, and we can study mutations made in plant genes without going to jail!
With this depth of potential understanding of the genome, I have noticed that my colleagues have begun talking not just about gene insertion or breeding but about engineering plants for extremely complex characteristics. Most recently, this has arisen from plant studies related to bioenergy and biofuels, where we talk about how to increase the levels of certain traits (sugars for fermentation and oils for biodiesel) while modifying the structural characteristics that sequester those products (reducing lignin in trees, for example).
Beyond this, there is an entirely new science of metagenomics. A bit of background: first, over 99.9% of the microbial life on the planet remains completely unidentified, largely because we are not able to grow these organisms in the laboratory; we have not identified their nutritional requirements well enough to mimic their growth environment. Second, these organisms frequently create what you might call a meta-organism: many organisms living in balance within an environment. That environment might be a soil sample, an intestinal tract or mouth, a hot spring or an ocean. Small changes in those environments cause shifts in the population; for example, shifting the temperature or the carbon dioxide level over a plot of earth can cause a shift in the representation of organisms that are present in the soil. The sensitivity and immense output of even our current sequencing technologies let us take a sample of those environments, and even though we can't culture those organisms, we can explore the families they likely belong to by sequencing their metagenome, that is, the aggregate DNA from the pool of organisms.
Everything has changed. The possibilities and questions are endless. There are important questions about the ethics and privacy of genomic information and about the genetic engineering of plants and animals that need to be resolved. Beyond those questions, though, is a goldmine of understanding of the natural world.
It has been a circuitous path to get where I have gotten, influenced perhaps more by serendipity than I should admit to. At times, I have made choices very deliberately, and at other times, not so much. In this bio, I will concentrate more on the part of my life closest to your lives, the choices through high school and college, and less about my later career. I grew up in Detroit, Michigan, a city not as well known for science as for automobiles and trade unions. And I did indeed grow up in the city, rather than the suburbs. At the time, Detroit was the fourth largest city in the country, and was largely a blue-collar world, vibrant in ways that are hard to define.
-
Albuquerque
Café
Oct 15
6:30 - 8:00 PM
Center for High Technology Materials Bldg. (CHTM)
Discovery
Oct 29
6:30 - 8:00 PM
Center for High Technology Materials Bldg. (CHTM)
-
Española/Pojoaque
Café
Oct 8
7:00 - 8:30 PM
NNMC AD 101/102
Discovery
Oct 29
7:00 - 8:30 PM
NNMC GE 204/205
Nov 12
7:00 - 8:30 PM
NNMC GE 204/205
RSVP to M Scher Dow
-
Los Alamos
Café
Oct 9
7:00 - 8:30 PM
Bradbury Science Museum
Discovery
Nov 6
7:00 - 8:30 PM
Bradbury Science Museum
-
Santa Fe
Café
Oct 16
7:00 - 8:30 PM
Santa Fe Complex
Discovery
Oct 30
7:00 - 8:30 PM
National Center for Genome Resources
RSVP to M Scher Dow
Nov 13
7:00 - 8:30 PM
RSVP to M Scher Dow
-
Genomes for All "Next-generation technologies that make reading DNA fast, cheap and widely accessible are coming in less than a decade. Their potential to revolutionize research and bring about the era of truly personalized medicine means the time to start preparing is now" - Church, George M., Scientific American January 2002: 46-54
