
Goals of the Human Genome Project (1990 ~) Map and sequence the 3,000 Mb human genome

Genomics Databases and Bioinformatics Applications. Wailap Victor Ng, Institute of Biotechnology in Medicine, Institute of Bioinformatics, Dept. of Biotechnology and Lab Science in Medicine, National Yang Ming University. wvng@ym.edu.tw. March 22, 2005.


Presentation Transcript


  1. Genomics Databases and Bioinformatics Applications • Wailap Victor Ng • Institute of Biotechnology in Medicine • Institute of Bioinformatics • Dept. of Biotechnology and Lab Science in Medicine • National Yang Ming University • wvng@ym.edu.tw • March 22, 2005

  2. Goals of the Human Genome Project (1990 ~) • Map and sequence the 3,000 Mb human genome • Map and sequence the genomes of model organisms • - The bacterium E. coli (4.6 Mb) • - The yeast S. cerevisiae (12 Mb) • - The roundworm C. elegans (100 Mb) • - The fruit fly D. melanogaster (180 Mb) • - The mouse M. musculus (3,000 Mb) • Collect and distribute data • Study the ethical, legal, and social implications of genetic research • Train researchers • Develop technologies • Transfer technology to the private sector • http://www.genome.gov/Pages/EducationKit/online.htm

  3. What is the total number of human genes? (Science, 2000)

  4. Milestones of Genome Projects • 1995 Haemophilus influenzae (1.83 Mb; 1,742 genes) • 1996 Saccharomyces cerevisiae (12 Mb; 6,000 genes) • 1998 Caenorhabditis elegans (97 Mb; 19,000 genes) • 2000 Arabidopsis thaliana (115/125 Mb; 25,000 genes) • 2000 Drosophila melanogaster (~120 Mb; 13,600 genes) • 2001 Homo sapiens (90%; 2,900 Mb; ~30k genes) • 2002 Mus musculus (96%; 2,500 Mb; ~30k genes) • 2002 Oryza sativa L. ssp. indica (92%; 466 Mb; 46-56k genes) • 2002 Fugu rubripes (95%; 365 Mb; 33,609 genes) • 2004 H. sapiens (99% euchromatin; 2,850 Mb; 20,000-25,000 genes)

  5. Homo sapiens • Number of cells: ~1 × 10^14 • Number of genes encoded by the genome: 20,000-25,000 • Number of expressed genes per cell type: 10,000-15,000

  6. Complexity • Genome: 25-30K genes (human) • Transcriptome (mRNAs): alternative splicing • Proteome (proteins): post-translational modifications

  7. Genome Sequencing Strategies • Top-down approach • - Clone large genomic DNA fragments into a special vector, e.g. BAC (bacterial artificial chromosome) • - Create an ordered array of BAC clones • - Carry out full-length BAC clone sequencing • - Assemble the BAC insert sequences • - Identify the next BAC for full-length sequencing (by hybridization or by searching a BAC end sequence library) • Bottom-up approach • - Whole genome shotgun sequencing

  8. Top-down genome sequencing method • Method I. Systematic sequencing of ordered clones • Construct a shotgun genomic library in a YAC (yeast artificial chromosome) or BAC vector • Use the YAC or BAC clone DNAs to construct a smaller-insert shotgun cosmid DNA library (~45 kb inserts) • Multiple Complete Digest (MCD) mapping of cosmid DNAs → ordered cosmid clone library • Choose the minimal overlap set of cosmid DNAs to construct shotgun libraries in M13 or plasmid vector → DNA sequencing → Assembly • [Figure: Genomic DNA → large insert library in BAC/YAC → medium insert library in cosmid → ordered cosmid library → small insert library in plasmid]
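The step "choose the minimal overlap set" can be read as picking a tiling path through the ordered cosmid map. The following is a minimal sketch of that idea, not the method used in the cited work: it assumes each clone has already been placed on the map as a (name, start, end) interval, and it uses a standard greedy interval cover to pick the fewest clones spanning a region. The function name, clone names, and coordinates are all hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones whose intervals cover [region_start, region_end].

    clones: list of (name, start, end) tuples in map coordinates (assumed input).
    Raises ValueError if the clones leave a gap inside the region.
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start position
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered edge,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical cosmid placements (kb) on a 120 kb BAC insert.
example_clones = [
    ("c01", 0, 45), ("c02", 10, 50), ("c03", 30, 80),
    ("c04", 40, 85), ("c05", 70, 120), ("c06", 84, 125),
]
print(minimal_tiling_path(example_clones, 0, 120))
```

A real map would also require a minimum overlap between neighbouring clones so that the assembled sequences can be joined with confidence; this sketch only requires that consecutive clones touch.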

  9. Multiple Complete Digest Mapping (YAC DNA) Flow chart of wet bench procedures for YAC → cosmid and BAC → cosmid MCD mapping. The main difference is that, while BAC DNA can readily be purified from bacterial chromosomal DNA, there is no good preparative method to separate YAC DNA from yeast chromosomal DNA. In the YAC case, the few percent of the cosmids that are derived from the YAC are identified by a hybridization-based colony-screening protocol. With BAC-derived cosmids, this step is unnecessary because the mapping software can readily eliminate the small number of cosmids that do not originate from the BAC. Proc Natl Acad Sci U S A. 94: 5225 (1997)

  10. Schematic representation of MCD mapping process. • (a) Gel image. • (b) List of fragment sizes for each enzyme domain in each clone. Lanes labeled with a number identify the clone as c01 or c02. Lanes labeled with the letter M identify size markers. • (c) Three single-enzyme maps are independently constructed (Right). Synchronization across enzyme domains results in a composite map (Left). Long tick marks indicate boundaries between ordered groups of fragments; short tick marks demarcate unordered fragments within a group, arbitrarily drawn in order of decreasing size. • Proc Natl Acad Sci U S A. 94: 5225 (1997)

  11. Gray scale image of a typical mapping gel post-stained with SYBR Green I. There are five marker lanes, at positions 1, 8, 15, 22, and 29. Two clones, each independently digested with EcoRI, HindIII, and NsiI (and loaded in that order), are placed between every pair of marker lanes. Proc Natl Acad Sci U S A. 94: 5225 (1997)

  12. Representative MCD map from chromosome 7 Proc Natl Acad Sci U S A. 94: 5225 (1997)

  13. Top-down genome sequencing method • Method II. BAC by BAC sequencing • Choose BAC clone seeds • Construct BAC shotgun library in plasmid vector • Sequence the shotgun plasmid DNAs • Assemble the shotgun reads • Look for adjacent BAC clones for sequencing: • - By colony array hybridization or • - BAC end sequence library • [Figure: Genomic DNA → large insert library in BAC → small insert shotgun library in plasmid → DNA sequencing and assembly → identify neighboring BAC clones for sequencing]

  14. BAC colony array hybridization assay • Genomic DNA → large insert library in BAC → small insert shotgun library in plasmid vector • E. coli transformants • Array E. coli on nylon membrane (25 × 25 cm²) and grow cells on agar plate • Lyse E. coli colonies on nylon membrane • Fix the DNA onto nylon membrane • Hybridize with PCR-amplified BAC end probes • Autoradiogram

  15. BAC colony array hybridization BAC clone genomic DNA insert (sequenced) PCR-1 PCR-2 Restriction fingerprinting

  16. How many reads are needed to determine a genome sequence? • Usually ~8X coverage of each base pair • # reads = (8 × genome_size) / (av._read_length) • e.g. Haloarcula marismortui (4,274,315 bp) • # reads = (8 × 4,274,315 bp) / (550 bp) = 62,172 sequencing reactions
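The read-count estimate on this slide can be checked with a few lines of Python. This is a small sketch of the same formula; the function name and the choice to round up to a whole read are assumptions made for illustration.

```python
import math


def reads_needed(genome_size_bp, avg_read_length_bp, coverage=8.0):
    """Reads required for the given average coverage, rounded up to a whole read."""
    return math.ceil(coverage * genome_size_bp / avg_read_length_bp)


# Haloarcula marismortui example from the slide: 4,274,315 bp, 550 bp reads, 8X.
print(reads_needed(4_274_315, 550))
```

The result is an average: because shotgun reads land at random positions, some bases will still be covered fewer than 8 times, which is one reason finishing steps follow the shotgun phase.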

  17. USB Principle of Sanger Dideoxy DNA sequencing http://genetics.nbii.gov/basic2.html

  18. Simple one-step fluorescent dye-terminator DNA cycle sequencing • Reaction mix: DNA, primer, dye terminators (ddATP, ddCTP, ddGTP, ddTTP), Taq DNA Pol, reaction buffer • Thermocycling → 2-propanol precipitation → DNA analyzer

  19. ABI 3730 xl GATCAGGGTTACATGCTACGGCTTCACACGTCGACCCATATTAC................... Electropherogram (chromatogram) Applied Biosystems Capillary DNA Sequencer

  20. > vtraceHM023_0188.y1_096.ab1

  21. phred • Function: base calling and quality assignment • chromat files (input) → phd files (output)

  22. Example of phd file • q value: numbers in middle column • q = -10 log10(P) • q, quality value • P, estimated error rate • q20 → 1 error in 100 bases (P = 0.01) • q40 → 1 error in 10,000 bases (P = 0.0001)
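The relation between the quality value and the estimated error rate can be sketched in Python as below. The three sample lines only imitate the "base, quality, peak position" layout described on the slide; they are made-up values, not taken from a real phd file, and the helper names are hypothetical.

```python
import math


def error_probability(q):
    """Estimated error rate P for a quality value q, from q = -10 log10(P)."""
    return 10 ** (-q / 10)


def quality_value(p):
    """Quality value q for an estimated error rate P."""
    return -10 * math.log10(p)


# Made-up base calls in "base quality peak_position" layout (assumption).
SAMPLE_LINES = """\
g 20 14
a 40 27
t 9 39
"""


def parse_base_calls(text):
    """Return (base, quality, position) tuples from three-column lines."""
    calls = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a base-call line
        base, q, pos = fields
        calls.append((base, int(q), int(pos)))
    return calls


for base, q, pos in parse_base_calls(SAMPLE_LINES):
    print(base, q, f"{error_probability(q):.4g}")
```

Bases with low quality values, such as the q = 9 call in the sample, are the ones an assembler would typically down-weight or trim.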

  23. Sequence Assembly Software • phredPhrap (Phil Green) • cap3 (Xiaoqiu Huang) • TIGRAssembler (TIGR) • ATLAS (BCM) • SPS phrap (Geospiza) • Genome Assembler (Paracel) • Celera Assembler (Celera) • BGI Assembler (BGI)

  24. Basic Functional Genomic Analysis • Gene Prediction (P: Prokaryotes; E: Eukaryotes) • - Glimmer (P) • - GeneMark (PE) • - Genscan (E) • - X-grail (E) • - Fgenes (E) • - est2genome (E; EST-driven prediction) • * others (http://www.cs.jhu.edu/~salzberg/appendixa.html#Gene_finders) • Gene Functional Analysis • - Blast searches • - Motif analysis • - Structure prediction and homology searches

  25. Sources of genomics databases and bioinformatics applications • Public Data Banks - NCBI, EMBL-EBI, and DDBJ • Genome Centers • - DOE Joint Genome Institute • - Baylor College of Med. Human Genome Sequencing Center • - The Wellcome Trust Sanger Institute • - Washington Univ. School of Med. Genome Sequencing Center • - Whitehead Institute/MIT Center for Genome Research • - Others (www.ornl.gov/sci/techresources/Human_Genome/research/centers.shtml)

  26. NCBI Genome Resources

  27. Human Genome Resources

  28. NCBI Map Viewer Human Genome Resources

  29. Human Genome Resources

  30. Human Genome Resources

  31. Human Genome Resources

  32. Human Genome Resources

  33. Human Genome Resources

  34. Human Genome Resources

  35. Human Genome Resources

  36. Human Genome Resources

  37. Human Genome Resources

  38. Human Genome Resources

  39. Human Genome Resources

  40. Human Genome Resources

  41. Human Genome Resources

  42. Human Genome Resources

  43. Human Genome Resources

  44. Human Genome Resources

  45. Human Genome Resources

  46. Full-length cDNA
