Explore the demand for more bioinformatics software and the improvement of core genetics technologies, including sequencing and genotyping. Learn about the historical costs and methods used for sequencing the human genome, as well as the latest advancements in next-generation sequencing. Discover the potential impact of sequencing 1000 human genomes and the future of medical genetics.
cheap sequencing for regular Joes and Janes: the demand for more bioinformatics software
Gane Ka-Shu Wong: iCORE Chair in BioSystems Informatics, The University of Alberta (Biological Sciences and Medicine), and Associate Director of the Beijing Genomics Institute
the essence of the human genome project and its offspring, the human HapMap project, was really about improving the core technologies of genetics: sequencing and genotyping (slide timeline: 2001, 2002, 2005, 2007)
historical costs to sequence the 3 billion bp of a human genome (chart labels: Gordon Moore; $3 billion; $300 million; two vendors; $300,000; competition)
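The three cost levels on this slide can be turned into a per-base price and fold reductions with plain arithmetic. This sketch uses only the slide's numbers and deliberately does not assume which year or vendor goes with each level.

```python
import math

# Genome size from the slide: 3 billion bp.
GENOME_BP = 3_000_000_000

# The three cost levels shown on the slide, in USD, highest first.
# No dates are attached to them here because the slide gives none.
COSTS_USD = [3_000_000_000, 300_000_000, 300_000]


def cost_per_bp(total_usd, genome_bp=GENOME_BP):
    """Cost of sequencing one base pair at a given whole-genome price."""
    return total_usd / genome_bp


def fold_reductions(costs):
    """Fold drop between each consecutive pair of cost levels."""
    return [earlier / later for earlier, later in zip(costs, costs[1:])]


def cost_halvings(first, last):
    """Number of times the price must halve to go from `first` to `last`."""
    return math.log2(first / last)


if __name__ == "__main__":
    for c in COSTS_USD:
        print(f"${c:,} per genome = ${cost_per_bp(c):.4f} per bp")
    print("fold reductions:", fold_reductions(COSTS_USD))
    print(f"halvings overall: {cost_halvings(COSTS_USD[0], COSTS_USD[-1]):.1f}")
```

The per-base price falls from $1 to $0.0001, a 10,000-fold drop, or roughly 13 halvings, which is the kind of curve the "Gordon Moore" label invites comparison with.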
Roche-454 (pyro)sequencing
1. fragment and denature
2. add adapters to both ends
3. one fragment per bead
4. emulsion PCR amplification
5. sequencing by synthesis
6. analyze image of bead array
chemiluminescent signal generation: dNTP incorporation releases PPi; sulfurylase converts PPi to ATP; luciferase converts ATP to visible light
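Because each dNTP is flowed in turn and the chemiluminescent signal scales with the number of nucleotides incorporated, raw pyrosequencing output is a "flowgram": one intensity per flow. The sketch below is an idealized, noise-free model; the flow order `TACG` and the helper names are assumptions for illustration, not Roche-454 software.

```python
# Idealized pyrosequencing: dNTPs are flowed one at a time in a fixed cyclic
# order, and the light emitted in each flow is proportional to how many bases
# were incorporated (a homopolymer run gives a proportionally larger signal).

FLOW_ORDER = "TACG"  # assumed flow order, for illustration only


def simulate_flowgram(synthesized, flow_order=FLOW_ORDER):
    """Return one integer signal per flow for a noise-free synthesized strand."""
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # All consecutive copies of this base are incorporated in one flow.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals, flow_order=FLOW_ORDER):
    """Invert the flowgram: repeat each flow's base by its signal."""
    return "".join(
        flow_order[i % len(flow_order)] * n for i, n in enumerate(signals)
    )


if __name__ == "__main__":
    flows = simulate_flowgram("TTAGGC")
    print(flows)              # [2, 1, 0, 2, 0, 0, 1]
    print(call_bases(flows))  # TTAGGC
```

Real signals are continuous and noisy, so deciding whether a bright flow means four or five identical bases is hard; homopolymer length is the classic error mode of this chemistry.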
Illumina-Solexa sequencing
1. fragment, denature, and add adapters
2. bind randomly to primer lawn, perform bridge amplification
3. sequencing by synthesis, four-color labeled dNTPs
4. computer analysis of lawn image
in contrast to Roche-454, the Illumina-Solexa technology generates multi-colored fluorescent signals on a randomly arrayed 2D surface
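With four-color labeled dNTPs, each cluster on the lawn yields four channel intensities per cycle, and the simplest base call is the brightest channel. This is a minimal sketch under stated assumptions: the A, C, G, T channel order and the "share of total intensity" confidence are hypothetical, and real Illumina base callers also correct for dye cross-talk and phasing, which this ignores.

```python
BASES = "ACGT"  # assumed channel order


def call_cluster(cycles):
    """Call one base per sequencing cycle for a single cluster.

    Each cycle is a list of four intensities (A, C, G, T channels).
    The base with the brightest channel is called, and a crude
    confidence is that channel's share of the total intensity.
    """
    calls = []
    confidences = []
    for intensities in cycles:
        if len(intensities) != 4:
            raise ValueError("expected four channel intensities per cycle")
        total = sum(intensities)
        if total <= 0:
            # No signal at all: emit an ambiguous call.
            calls.append("N")
            confidences.append(0.0)
            continue
        best = max(range(4), key=lambda i: intensities[i])
        calls.append(BASES[best])
        confidences.append(intensities[best] / total)
    return "".join(calls), confidences


if __name__ == "__main__":
    seq, conf = call_cluster([[900, 50, 30, 20], [100, 100, 700, 100]])
    print(seq, conf)
```

Unlike the 454 flowgram, every cycle here adds exactly one base, so homopolymers are read one base at a time rather than as a single brighter flash.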
capillaries versus next generation (massively parallel) DNA sequencing
excitement about Pacific Biosciences is based on read lengths of many kb, albeit with lower base pair accuracy
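One way to see why lower per-base accuracy can be tolerated in exchange for long reads is to note that independent random errors wash out when several reads are stacked over the same position. The sketch below assumes errors are independent across reads (an idealization) and uses a hypothetical 15% per-base error rate; it is not a model of any specific instrument.

```python
from math import comb


def expected_errors(read_len, error_rate):
    """Expected number of miscalled bases in one read."""
    return read_len * error_rate


def majority_consensus_error(error_rate, depth):
    """Chance the true base fails to win a strict majority at one position.

    Assumes each of `depth` reads is wrong independently with probability
    `error_rate`; failure means a strict majority of reads are wrong.
    """
    if depth < 1 or depth % 2 == 0:
        raise ValueError("use an odd depth >= 1 so a strict majority exists")
    need = depth // 2 + 1
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(need, depth + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(expected_errors(5000, 0.15))
    for depth in (1, 5, 11):
        print(depth, majority_consensus_error(0.15, depth))
```

A single long read carries many errors, but the per-position consensus error drops quickly with depth when errors are random, so the read length (useful for spanning repeats) is retained while accuracy is recovered by redundancy.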
competition for the personal genome heats up
• "BGI Offers Next-Gen Sequencing Service: Kicks Off 100-Genome Sequencing Project" [8 January 2008]
• "Knome, BGI Forge Sequencing Alliance; GATC Spins Off Personal Genomics Unit" [15 January 2008]
(slide labels: Google; BGI-Shenzhen; 580,000 SNPs; 1 million SNPs; whole genome)
BGI-Shenzhen and allies in the US and UK will be sequencing 1000 human genomes in the next 3 years
Science: 25 January 2008; Nature: 17 January 2008
1000 human genomes will turn the medical genetics world upside down
• PHENOTYPE TO GENOTYPE
  cystic fibrosis (CFTR gene): the disease affects less than a percent of the population
  breast cancer (BRCA1 and BRCA2 genes): these genes account for only a few percent of patients
• GENOTYPE TO PHENOTYPE
  functional polymorphisms identified in 1000 individuals, linked to disease by association studies
  information of value to policy makers in public health
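The "linked to disease by association studies" step usually starts with a simple test per polymorphism: do allele counts differ between cases and controls? Below is a minimal sketch of a Pearson chi-square allelic test on a 2x2 table; the function name and the example counts are hypothetical, and it omits the continuity correction and covariates that real studies may use.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and reference
    allele of one polymorphism. Returns (chi2, p_value) with 1 degree of
    freedom and no continuity correction.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts for one polymorphism.
    print(allelic_chi_square(60, 40, 40, 60))
```

When millions of polymorphisms are tested at once, as whole genomes would allow, the significance threshold has to be corrected for multiple testing, so a p-value like the one above would not be convincing on its own.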
after 19 years (and 1000 genes) we have not cured a genetic disease
8 September 1989:
Rommens JM, … Tsui L-C, Collins FS (1989). Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245: 1059-1065.
Riordan JR, … Collins FS, Tsui L-C (1989). Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245: 1066-1073.
Kerem B, … Tsui L-C (1989). Identification of the cystic fibrosis gene: genetic analysis. Science 245: 1073-1080.
Maynard, I just decided that I hate your generation. You made all those promises about the human genome sequence improving health care, but my generation will have to deliver.
Prof. Maynard Olson: That's right. One of these days one of you will have to actually cure something!
YanHuang and the panda genome (raising awareness for the new technologies)
The panda is a Chinese national treasure and the logo of the World Wildlife Fund. While not the first endangered species to be sequenced (the chimp was first), it will be the first sequenced with a conservation focus. Whole genome shotgun assembly is non-trivial for 45 bp reads, even with paired-end information and 50x redundancy. Emperors Yan and Huang were the first rulers of ancient China, so modern Chinese say that they are descendants of YanHuang.
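Why is 50x redundancy not enough on its own? The classic Lander-Waterman model estimates how many contigs random shotgun reads leave behind. This sketch applies it with the slide's 45 bp reads and 50x coverage; the 2.4 Gb genome size and the 20 bp minimum detectable overlap are assumptions for illustration only.

```python
import math


def lander_waterman_contigs(genome_bp, read_len, coverage, min_overlap):
    """Expected number of contigs under the Lander-Waterman model.

    N = coverage * genome_bp / read_len reads are placed uniformly at random;
    two reads join only if they overlap by at least `min_overlap` bp.
    Expected contigs = N * exp(-coverage * sigma),
    with sigma = 1 - min_overlap / read_len.
    """
    if not 0 <= min_overlap < read_len:
        raise ValueError("min_overlap must be in [0, read_len)")
    n_reads = coverage * genome_bp / read_len
    sigma = 1.0 - min_overlap / read_len
    return n_reads * math.exp(-coverage * sigma)


if __name__ == "__main__":
    # Hypothetical genome size and overlap threshold; read length and the
    # 50x figure come from the slide.
    for cov in (2, 10, 50):
        contigs = lander_waterman_contigs(2.4e9, 45, cov, 20)
        print(f"{cov}x coverage: about {contigs:.3g} expected contigs")
```

At 50x the model predicts far fewer than one gap. Since the model assumes uniform coverage, error-free reads, and no repeats, this suggests that the difficulty with 45 bp reads lies mainly in repeats longer than a read (and in errors), not in coverage, which is where the paired-end information comes in.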
aftermath of 12 May 2008 earthquake in Sichuan measuring 7.9 on the Richter scale
our plans for the panda genome (whole genome assembly using short reads)
• 50x of paired-end data using Solexa
• average read lengths 40 to 50 bp
• estimated scaffold sizes 10 to 100 kbp
• anchored by synteny to human
• graph-based and overlap-layout-based assembly
• first assembly by end of August 2008
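"Graph-based" assembly of short reads commonly means a de Bruijn graph: reads are cut into k-mers, and each k-mer links its (k-1)-mer prefix to its (k-1)-mer suffix, so high redundancy collapses into shared nodes instead of requiring all-against-all read overlaps. This is a toy sketch of that idea with hypothetical function names; it ignores reverse complements, sequencing errors, and paired-end scaffolding, and is not the assembler planned on the slide.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Every k-mer in a read contributes one edge prefix -> suffix. Using a set
    collapses the many identical k-mers that high redundancy (e.g. 50x) produces.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, start):
    """Extend from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


def assemble(reads, k):
    """Walk from every node with no incoming edge; returns a sorted list."""
    graph = de_bruijn(reads, k)
    indegree = defaultdict(int)
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    starts = sorted(n for n in graph if indegree[n] == 0)
    return [walk(graph, s) for s in starts]


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA", "ATGGCG"]  # toy reads, one duplicate
    print(assemble(reads, 4))
```

A repeat longer than k creates a node with several successors, where the walk stops; that is exactly the point at which paired-end links and synteny to human would be needed to order the pieces into scaffolds.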
Molecular censusing doubles giant panda population estimate in a key nature reserve. Zhan X, Li M, Zhang Z, Goossens B, Chen Y, Wang H, Bruford MW, Wei F. Curr Biol. 2006 Jun 20;16(12):R451-2.
plan: redo the experiments on a more comprehensive population from every panda reserve, and with 1536 SNPs rather than just 9 microsatellites
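Molecular censusing counts individuals by matching multilocus genotypes, so one relevant quantity is the probability of identity (PI): the chance that two unrelated individuals share a genotype by chance. The sketch uses the standard per-locus formula under Hardy-Weinberg proportions and independent loci. The allele frequencies for both panels are invented for illustration, and the slide does not say that PI is the reason for switching markers.

```python
import math
from itertools import combinations


def pi_locus(freqs):
    """Probability that two unrelated individuals match at one locus.

    Uses PI = sum(p_i^4) + sum_{i<j} (2 p_i p_j)^2, which assumes
    Hardy-Weinberg proportions.
    """
    if not math.isclose(sum(freqs), 1.0):
        raise ValueError("allele frequencies must sum to 1")
    homozygous = sum(p**4 for p in freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygous + heterozygous


def log10_pi_panel(panel):
    """log10 of the multilocus PI, assuming independent loci.

    Working in log space avoids floating-point underflow for large panels.
    """
    return sum(math.log10(pi_locus(freqs)) for freqs in panel)


if __name__ == "__main__":
    # Hypothetical panels: allele frequencies are invented for illustration.
    microsatellites = [[0.2] * 5] * 9  # 9 loci, 5 equally common alleles
    snps = [[0.5, 0.5]] * 1536         # 1536 loci, allele frequency 0.5
    print(f"9 microsatellites: PI ~ 10^{log10_pi_panel(microsatellites):.1f}")
    print(f"1536 SNPs:         PI ~ 10^{log10_pi_panel(snps):.1f}")
```

Each SNP is individually weak (PI of at least 0.375 per locus), but many loci multiply together; in practice related animals and genotyping errors in degraded samples matter as much as this idealized number.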
expressed gene sequences of 1000 medicinal plants for only $2 million
There are 96 plant species with more than 20,000 expressed sequence tags (ESTs), but most are crop plants. If we count only medicinal plants, generously defined to include makers of secondary metabolites with purported health benefits (such as lycopene in tomatoes and resveratrol in grapes), there are 16 plant species with more than 20,000 ESTs. Under a strict definition of medicinal, just 4 plant species have more than even a modest 5000 ESTs: Artemisia, Madagascar periwinkle, ginkgo, and ginseng.
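The headline figure works out per species as below. The first fraction is simple division of the slide's two numbers; the second assumes, hypothetically, that each plant receives 20,000 ESTs (the slide's own threshold), which the slide does not state as the plan.

```latex
\frac{\$2{,}000{,}000}{1000\ \text{plants}} = \$2{,}000\ \text{per plant},
\qquad
\frac{\$2{,}000\ \text{per plant}}{20{,}000\ \text{ESTs per plant}} = \$0.10\ \text{per EST}
```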
1/1000 of the proposed data has launched the field of phylogenomics
• 10 April 2008: 40 Mb total from ESTs in 29 animals
• 27 June 2008: 5.4 Mb total from the genomes of 169 birds
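Averaged per species, the two datasets on this slide sample very differently: many sequences from few taxa versus a small amount of sequence from many taxa. This is plain division of the slide's totals.

```latex
\frac{40\ \text{Mb}}{29\ \text{animals}} \approx 1.4\ \text{Mb per species},
\qquad
\frac{5.4\ \text{Mb}}{169\ \text{birds}} \approx 32\ \text{kb per species}
```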
artemisinin: poster child for the synthetic biology investment world
• most effective anti-malarial, found in the leaves of sweet wormwood
• synthesized by Jay Keasling
• $40M from the Gates Foundation
• Amyris is now into biofuels
• $600M to the University of California, Berkeley
• CYP71AV1 found by cross-species EST comparison
• FPP pathway
a proposal to crowd source the writing of bioinformatics software
• OPEN SOURCING: the classic example is Linux, sophisticated software developed by a small handful of talented programmers (comparable sophistication in bioinformatics would be whole genome shotgun assembly)
• CROWD SOURCING: the alternative is Wikipedia, with millions of contributors each writing a small article on a specific topic; this is similar to much (but not all) of bioinformatics, which does not require PhDs and can be done by students
who will work for free, and how would we incentivize them to do so?
• biologist with data to analyze
• technical specification of issues
• talented bioinformatics student
• contributions recorded on a website open to prospective employers
young people need a chance to prove themselves; we will provide a web-based mechanism for them to do so, on a high-profile international scale
Alberta and China: where is this happening and who is paying for it?
• BGI: Jian Wang, Jun Wang, Huanming Yang, Jun Yu
• UofA Biological Sciences: Michael Deyholos
• UofA Medicine: Andrew Mason, Richard Fedorak, Lorne Tyrrell
• UofA Computing Science: Paul Lu, Guohui Lin
Research funding from the Alberta Informatics Circle of Research Excellence and the Government of Shenzhen; additional support from UofA Biological Sciences