Hierarchical Assembly of Genome Sequence

dafydd

  1. Hierarchical Assembly of Genome Sequence: clone small inserts into plasmids; sequence the inserts.

  2. Transition from maps to continuous (contiguous) sequence. Hierarchical assembly: assemble BAC sequence, then integrate the sequence data using map information. Refine with new data and re-assemble. So how do we know we have the right BACs? Break the puzzle down into BAC-sized fragments.

  3. How do you order a set of BACs to sequence a genome? Sequence-tagged site (STS): the DNA sequence itself is the marker for overlap of DNA fragments. Compare sequence content by PCR.

  4. Ordering BACs • Sequence-tagged site (STS) • Important concept • BACs that share an STS must overlap • BACs that share two STSs share all DNA between those STSs • STSs can integrate linkage maps and physical maps • Correct positioning of each STS is therefore also very important!
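
A minimal sketch of the STS overlap rule on slide 4, assuming the PCR results have already been scored as a table of which STS markers amplify from which BAC. The clone names, marker names and the `sts_overlaps` helper are hypothetical and for illustration only.

```python
from itertools import combinations


def sts_overlaps(bac_to_sts):
    """Return {(bac_a, bac_b): shared STS set} for every BAC pair sharing at least one STS."""
    overlaps = {}
    for a, b in combinations(sorted(bac_to_sts), 2):
        shared = bac_to_sts[a] & bac_to_sts[b]
        if shared:  # a shared STS implies the two clones overlap
            overlaps[(a, b)] = shared
    return overlaps


# Hypothetical PCR scoring: BAC name -> set of STS markers that amplified
pcr_hits = {
    "BAC_A": {"sts1", "sts2"},
    "BAC_B": {"sts2", "sts3"},
    "BAC_C": {"sts3", "sts4"},
    "BAC_D": {"sts9"},
}

overlap_table = sts_overlaps(pcr_hits)
for pair, shared in overlap_table.items():
    print(pair, sorted(shared))
```

Here BAC_D shares no marker with any other clone, so it cannot be placed in the contig from STS data alone.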

  5. Polymerase Chain Reaction (PCR) • What? Sequential amplification of a specific region of the genome • Why? Specific yet extremely sensitive. Many uses in analysis of complex genomes and in detection of specific sequences in very small samples • How? (steps) • Denature DNA into single strands • Anneal primers to specific locations • Extend/amplify the specific region • Animation: http://www.dnalc.org/resources/animations/pcr.html
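
The anneal and extend steps on slide 5 can be mimicked in software ("in silico PCR"). The sketch below is a simplification under stated assumptions: primers must match exactly, only the top strand of the template is given, and the template, primers and function names are invented for illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def in_silico_pcr(template, fwd_primer, rev_primer):
    """Return the amplicon (primers included), or None if either primer has no exact site."""
    start = template.find(fwd_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the bottom strand, so on the top strand
    # its site reads as the primer's reverse complement.
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Hypothetical template and primer pair
template = "TTGACCTAGGCATCGATTACGGATCCAAGT"
amplicon = in_silico_pcr(template, "GACCTA", "GATCCG")
print(amplicon)
```

Scoring an STS as present or absent in a BAC then reduces to asking whether `in_silico_pcr` returns a product for that clone's sequence.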

  6. How do you order a set of BACs to sequence a genome? Sequence-tagged site (STS): the DNA sequence itself is the marker for overlap of DNA fragments. Compare sequence content by PCR.

  7. Genome sequencing methods: BAC ordering is critical! Ordering BACs: the concept of a "minimum tiling path" for efficient sequencing. (DNA Science, 2nd ed., Micklos and Freyer, 2003)
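
One way to picture a minimum tiling path is as an interval-cover problem: given each BAC's start and end position on the physical map, choose the fewest clones that still cover the region. The greedy sketch below assumes such coordinates are available; the clone names, coordinates and the `minimum_tiling_path` helper are hypothetical. In practice clones are chosen with some required overlap, which this sketch does not enforce.

```python
def minimum_tiling_path(bacs, region_start, region_end):
    """Greedy interval cover.

    bacs: list of (name, start, end) map positions.
    Returns the names of a smallest set of clones covering the region,
    or raises ValueError if the clones leave a gap.
    """
    ordered = sorted(bacs, key=lambda bac: bac[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Among clones that start at or before the covered position,
        # keep the one reaching furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clone positions on the physical map (arbitrary units)
clones = [("A", 0, 150), ("B", 50, 180), ("C", 100, 300),
          ("D", 160, 320), ("E", 290, 450), ("F", 310, 400)]
tiling = minimum_tiling_path(clones, 0, 450)
print(tiling)
```

Only three of the six clones need to be sequenced in this toy example, which is the efficiency argument behind the tiling-path idea.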

  8. Hierarchical versus Shotgun sequencing. Summary of the hierarchical or “map-based” approach to genome sequencing and assembly: • Genetic and physical maps • Generate STS from map loci • Align BACs using STS, identify minimum tiling path set of BACs • Subclone BACs into plasmids • Sequence plasmids • Assembly (like DNA Facility example) • Clean up/fill in gaps

  9. Alternative methods to order BACs • BAC fingerprinting • Requires excellent lab work • Can incorporate sequenced BAC data (virtual restriction maps) • BAC end sequencing • Enormous sequence effort required • Requires STS mapping to place contigs on chromosomes • Complementary: can be combined

  10. BAC fingerprinting to order BACs • Digest BAC with restriction enzyme • Precisely measure all fragment sizes by gel electrophoresis • Compare fragment content among BACs • BACs with multiple common fragments (patterns) overlap (DNA Science, 2nd ed., Micklos and Freyer, 2003)
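
The comparison step on slide 10 can be sketched as counting fragments of matching size between two digests. The 2% sizing tolerance, the fragment sizes and the `shared_fragments` helper are assumptions for illustration, not values from the slides; real fingerprinting software also scores how likely a given number of shared bands is by chance.

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.02):
    """Count fragments of A that pair with a distinct fragment of B of similar size.

    Two fragments match when their sizes differ by at most `tolerance`
    (as a fraction of the larger fragment). Uses a two-pointer sweep
    over the sorted size lists.
    """
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical fragment sizes (bp) read from a gel
bac1 = [1200, 2500, 4100, 7300, 9800]
bac2 = [1210, 2500, 5000, 7250, 12000]
bac3 = [800, 3300, 6000]

print(shared_fragments(bac1, bac2))  # several bands in common: likely overlap
print(shared_fragments(bac1, bac3))  # no bands in common
```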

  11. Analyzing cut DNA patterns using gel electrophoresis. How is DNA moved through the gel? Bromphenol blue/xylene cyanol migration dyes. Polyacrylamide gels.

  12. Restriction enzymes cut DNA. Why are restriction enzymes useful? How do they cut DNA? Specificity of recognition sequence. Predictability of cutting patterns.

  13. Restriction enzymes cut DNA. Specificity of recognition sequence. Predictability of cutting patterns. Naming of restriction enzymes. Calculation of cut frequency in DNA: 1/4^n, where n is the length of the recognition sequence in base pairs.
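
A short worked example of the 1/4^n rule, assuming all four bases are equally frequent and the sequence is random (real genomes deviate from both). The 180 kb insert size and the function names are illustrative assumptions.

```python
def expected_cut_spacing(site_length):
    """Average distance (bp) between sites for a recognition sequence of n bp: 4^n."""
    return 4 ** site_length


def expected_cut_count(dna_length_bp, site_length):
    """Expected number of sites in a stretch of DNA: length * (1 / 4^n)."""
    return dna_length_bp / 4 ** site_length


for n in (4, 6, 8):
    print(f"{n}-bp site: one cut every {expected_cut_spacing(n)} bp on average")

# Hypothetical 180 kb BAC insert digested with a 6-bp cutter
print(round(expected_cut_count(180_000, 6)))
```

Under these assumptions a 6-bp cutter yields a few dozen fragments from a BAC-sized insert.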

  14. Screening Clone Libraries. Once libraries of genomic clones are built, they need to be screened to find a specific fragment in a particular clone. Most screening methods rely on two facts: 1) even relatively short specific DNA sequences are unique in large collections of DNA fragments; 2) specific sequences can be identified by complementary base pairing, i.e. hybridization (or annealing), under stringent conditions. • Frequency of occurrence of specific DNA sequences • Relates to non-repetitive DNA only
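
Why "relatively short" sequences can be unique follows from the same random-sequence arithmetic: a specific n-mer is expected about G / 4^n times in a genome of G base pairs. The sketch below assumes equal base frequencies, a single strand, non-repetitive DNA, and a round 3 Gb genome size chosen for illustration.

```python
def expected_occurrences(genome_size_bp, probe_length):
    """Expected chance occurrences of one specific sequence of the given length."""
    return genome_size_bp / 4 ** probe_length


def min_unique_length(genome_size_bp):
    """Shortest length whose expected number of chance occurrences drops below 1."""
    n = 1
    while expected_occurrences(genome_size_bp, n) >= 1:
        n += 1
    return n


genome = 3_000_000_000  # assumed genome size in bp
print(min_unique_length(genome))
print(expected_occurrences(genome, 20))
```

Typical PCR primers and hybridization probes are longer than this threshold, which is why they can pick out a single locus outside repeats.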

  15. Coverage of the Genome: make “libraries” of clones. How to completely cover the genome with enzymes that cut at specific places? Use partial digestion. Also possible to use random shearing and end repair.

  16. DNA vectors for Sequencing

  17. Alternative methods to order BACs • BAC end sequencing to generate new STS and contigs • Enormous sequence effort required (10x genome!) (So why was this done? I.e., why was it “easy” to do? Hint: think about the STS concept…) • Use of known BAC size in a “double shotgun” approach • With enough BAC end sequence, can create new contiguous sequence (Figure labels: common BAC backbone sequence; new sequence)

  18. Connecting BAC assemblies to the genome… • BAC fingerprinting • BAC end sequencing • Both require STS mapping to place contigs on chromosomes; could also use FISH • Complementary: all methods can be combined, and are, to generate excellent physical “sequence-ready” maps HTR2A RB1

  19. How to put pieces together? • “Chromosome walking” to sequence a region of a chromosome • Example: sequence the region between two genes in the pig, HTR2A and RB1

  20. Hierarchical versus Shotgun sequencing. Text page 89 gives the shotgun sequence alignment criteria: “40 base pair overlap with no more than 6 bp different to align DNA fragments”. What are the assumptions here? Is an alignment that meets those criteria expected to be unique?
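
The quoted criterion (at least 40 bp of overlap, at most 6 differing bases) can be written as a simple suffix/prefix test. This is a sketch only: it counts substitutions and ignores insertions and deletions, and the test reads are generated from a random sequence purely for illustration. The names `find_overlap` and `mutate` are hypothetical.

```python
import random


def find_overlap(read_a, read_b, min_overlap=40, max_mismatches=6):
    """Length of the longest suffix of read_a matching a prefix of read_b
    with at most max_mismatches substitutions, or 0 if none reaches min_overlap."""
    longest = min(len(read_a), len(read_b))
    for length in range(longest, min_overlap - 1, -1):
        suffix, prefix = read_a[-length:], read_b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        if mismatches <= max_mismatches:
            return length
    return 0


def mutate(seq, positions):
    """Substitute the bases at the given positions (illustrative sequencing errors)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    return "".join(swap[b] if i in positions else b for i, b in enumerate(seq))


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200))
read_a = genome[0:100]    # covers positions 0-99
read_b = genome[60:160]   # covers positions 60-159, so the true overlap is 40 bp

print(find_overlap(read_a, read_b))
print(find_overlap(read_a, mutate(read_b, {0, 5, 10, 15, 20, 25})))
print(find_overlap(read_a, mutate(read_b, {0, 5, 10, 15, 20, 25, 30})))
```

In repetitive DNA the same test can pass for reads from different loci, which bears on the slide's question about uniqueness.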

  21. Shotgun sequence assembly: contigs and scaffolds. Tri-nucleotide repeat. End sequence of a 10 kbp clone defines spacing and helps assembly.

  22. DNA vectors for Sequencing

  23. Newest methods to sequence genomes • For bovine, horse, chicken and pig, the methods used are quite similar to the public method, as a good linkage map and a physical map of the BACs were used • But it is clear that a combination of the map-based approach and the WGS approach is optimal • Individual BACs are sequenced based on the minimum tiling path: this decreases sequencing effort but takes enormous effort up-front… • WGS data can help where BAC data is poor • Random clones are selected, so it is easy to start work

  24. Example: Pig Genome Sequence • Recently completed • Combination of methods • $10 million in sequencing costs • paid by USDA to Sanger in UK • sitting on top of large genetic and physical mapping efforts!! • Slides from Alan Archibald, one of main organizers of the project

  25. The Pig Genome Sequencing Project Alan Archibald The Roslin Institute and R(D)SVS University of Edinburgh

  26. A sequenced genome is a requirement for a 21st Century biomedical model organism

  27. Hybrid Shotgun Sequencing Strategy • Minimal set of overlapping BACs selected from physical map • Whole-genome shotgun reads • BAC shotgun reads • Sequence assembly • Assemble clone sequences to represent chromosomes and annotate using Ensembl automated pipeline • Combine overlapping whole-genome and BAC-derived reads

  28. Build 10 - Strategy. Diagram steps: Physical Map; BAC sequencing; Assemble BACs; Integrate with Illumina Assembly; Build AGPs (sort order & orientation); Annotation; Check SNP panel (Wageningen). Integration of community resources: transcriptomics, expression, diversity; BAC-ends, genetic markers, cDNA. Groups labelled on the diagram: WTSI, TGAC, BGI, INRIA. Schematic labels: SNP (G->A); Illumina contig.

  29. Build 10 - Strategy (workflow diagram repeated from slide 28)

  30. Sequence clone progress 21/12/09 • Clones sequenced cover 95.97% of the physical map • 16,707 sequenced clones with 15,286 at Improved status • Total sequence = 3.014 Gb (123.9 Mb Finished quality)

  31. Build 10 - Strategy (workflow diagram repeated from slide 28)

  32. Add WGS data (same Duroc individual as CHORI-242) • BGI: 66.5 Gb of sequence (24-fold); read length: 44 • WTSI: ~40 Gb of sequence (14-fold); read length: 108
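
The fold-coverage figures on slide 32 follow from coverage = total sequence / genome size. The sketch below back-calculates the genome size each figure implies and a rough read count, assuming the read lengths are in bases; the function names are illustrative.

```python
def implied_genome_size_gb(total_sequence_gb, fold_coverage):
    """Genome size (Gb) implied by total sequence and stated fold coverage."""
    return total_sequence_gb / fold_coverage


def approx_read_count(total_sequence_gb, read_length_bases):
    """Rough number of reads needed to produce the given amount of sequence."""
    return total_sequence_gb * 1e9 / read_length_bases


# Figures from slide 32
for centre, total_gb, fold, read_len in (("BGI", 66.5, 24, 44), ("WTSI", 40, 14, 108)):
    size = implied_genome_size_gb(total_gb, fold)
    reads = approx_read_count(total_gb, read_len)
    print(f"{centre}: ~{size:.2f} Gb genome implied, ~{reads:.2e} reads")
```

Both data sets imply a genome a little under 3 Gb, in line with the roughly 3 Gb of total clone sequence reported on slide 30.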

  33. Illumina Assembly – using contigs. Spanners: WGS contig, clone contig, clone contig. Hangers: WGS contig, clone contig.

  34. Build 10 - Strategy (workflow diagram repeated from slide 28)

  35. Order-orientation: Sequence vs RH maps

  36. Build 10 - Strategy (workflow diagram repeated from slide 28)

  37. Genomics-enabled tools

  38. High density SNP genotyping chip August 2009

  39. SNP genotyping • Traceability • Pigs • Clones • Cells • Inbred pigs • How inbred are they?
