Hierarchical Assembly of Genome Sequence. Clone small inserts into plasmids. Sequence inserts. Transition from maps to continuous (contiguous) sequence. Hierarchical assembly: assemble BAC sequence, then integrate the sequence data using map information. Refine with new data and re-assemble.
Transition from maps to continuous (contiguous) sequence
• Hierarchical assembly: assemble BAC sequence, then integrate the sequence data using map information
• Refine with new data and re-assemble
• So how do we know we have the right BACs? Break the puzzle down into BAC-sized fragments
How do you order a set of BACs to sequence a genome?
• Sequence-tagged site (STS): the DNA sequence itself serves as the marker for overlap of DNA fragments
• Compare sequence content by PCR
Ordering BACs
• Sequence-tagged site (STS): an important concept
• BACs that share an STS must overlap
• BACs that share two STSs share all the DNA between those STSs
• STSs can integrate linkage maps and physical maps
• Correct positioning of each STS is therefore also very important!
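The rule that BACs sharing an STS must overlap can be sketched in a few lines of Python. The BAC names and STS names below are hypothetical, and the sketch assumes each STS is single-copy and correctly scored by PCR.

```python
from itertools import combinations

# Hypothetical STS content per BAC: the set of STS markers that gave
# a PCR product on that clone (names are made up for illustration).
sts_content = {
    "BAC_A": {"sts1", "sts2"},
    "BAC_B": {"sts2", "sts3"},
    "BAC_C": {"sts3", "sts4"},
    "BAC_D": {"sts7"},
}

def overlapping_pairs(content):
    """Return {(bac1, bac2): shared STSs} for every pair sharing at least one STS."""
    pairs = {}
    for a, b in combinations(sorted(content), 2):
        shared = content[a] & content[b]
        if shared:  # a shared STS implies the two clones overlap
            pairs[(a, b)] = shared
    return pairs

overlaps = overlapping_pairs(sts_content)
for (a, b), shared in overlaps.items():
    print(a, "overlaps", b, "via", sorted(shared))
```

Here BAC_A, BAC_B and BAC_C chain into one contig, while BAC_D shares no STS with the others and cannot be placed from these data alone.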
Polymerase Chain Reaction (PCR)
• What? Sequential amplification of a specific region of the genome
• Why? Specific yet extremely sensitive. Many uses in the analysis of complex genomes and in the detection of specific sequences in very small samples
• How? (steps)
  • Denature DNA into single strands
  • Anneal primers to specific locations
  • Extend/amplify the specific region
• Animation: http://www.dnalc.org/resources/animations/pcr.html
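To see why a primer pair defines one specific region, the anneal and extend steps can be imitated on a sequence string. This is a minimal "in silico PCR" sketch, not a model of the chemistry: it assumes exact primer matches, a single template orientation, and made-up sequences.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def in_silico_pcr(template, fwd_primer, rev_primer):
    """Return the region bounded by both primers, or None if a primer has no exact site."""
    start = template.find(fwd_primer)          # forward primer anneals to the bottom strand
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)  # where the reverse primer anneals, read on the top strand
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]

# Hypothetical template and primers, chosen only for illustration.
template = "TTGACCGTAAGGCTTACGATCCGGATTACA"
fwd = "GACCGT"
rev = "ATCCGG"

amplicon = in_silico_pcr(template, fwd, rev)
print(amplicon)
```

Testing a BAC for an STS amounts to asking whether this function returns a product when the BAC sequence is the template.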
Genome sequencing methods
• BAC ordering is critical!
• Ordering BACs: the concept of a "minimum tiling path" for efficient sequencing
(Figure source: DNA Science, 2nd ed., Micklos and Freyer, 2003)
Hierarchical versus shotgun sequencing
Summary of the hierarchical, or “map-based”, approach to genome sequencing and assembly:
• Build genetic and physical maps
• Generate STSs from map loci
• Align BACs using STSs; identify the minimum tiling path set of BACs
• Subclone BACs into plasmids
• Sequence plasmids
• Assemble (like the DNA Facility example)
• Clean up and fill in gaps
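Once BACs have positions on the map, picking a minimum tiling path is an interval-covering problem. The sketch below uses the standard greedy approach (always extend with the clone that reaches furthest), which is one reasonable way to do it, not necessarily what any sequencing center used. Clone names and coordinates (in kb) are hypothetical.

```python
def minimum_tiling_path(bacs, region_start, region_end):
    """Greedy interval cover.

    bacs maps clone name -> (start, end) on the physical map.
    Returns the clone names, in order, of a smallest set covering the region.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(bacs.items(), key=lambda item: item[1][0])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best_name, best_end = None, covered_to
        # Among clones starting at or before the covered point, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1][0] <= covered_to:
            name, (_, end) = ordered[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best_name)
        covered_to = best_end
    return path

# Hypothetical BAC positions in kb.
bacs = {
    "A": (0, 150), "B": (50, 180), "C": (100, 300),
    "D": (140, 260), "E": (280, 450), "F": (250, 400),
}
tiling = minimum_tiling_path(bacs, 0, 450)
print(tiling)
```

Three of the six clones cover the whole region here, so only those three would need to be subcloned and sequenced.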
Alternative methods to order BACs
• BAC fingerprinting
  • Requires excellent lab work
  • Can incorporate sequenced BAC data (virtual restriction maps)
• BAC end sequencing
  • Enormous sequencing effort required
  • Requires STS mapping to place contigs on chromosomes
• The two are complementary and can be combined
BAC fingerprinting to order BACs
• Digest each BAC with a restriction enzyme
• Precisely measure all fragment sizes by gel electrophoresis
• Compare fragment content among BACs
• BACs with multiple common fragments (patterns) overlap
(Figure source: DNA Science, 2nd ed., Micklos and Freyer, 2003)
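The comparison step can be sketched as counting fragments whose sizes agree within a measurement tolerance. The 1% tolerance and the fragment sizes below are assumptions for illustration; real fingerprinting software also scores how likely the matches are by chance, which this sketch does not attempt.

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.01):
    """Count fragments (in bp) that match between two fingerprints.

    Two fragments match if their sizes differ by at most `tolerance`
    (relative). Each fragment is used in at most one match.
    """
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared

# Hypothetical restriction fragment sizes (bp) for three BACs.
bac1 = [1200, 2500, 3100, 4800, 7600]
bac2 = [2510, 3095, 4800, 5200, 9100]
bac3 = [800, 1500, 6400]

print(shared_fragments(bac1, bac2))  # several common fragments: likely overlap
print(shared_fragments(bac1, bac3))  # no common fragments: no evidence of overlap
```

The tolerance is why the slide stresses precise size measurement: a loose tolerance produces false overlaps, a tight one misses real ones.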
Analyzing cut DNA patterns using gel electrophoresis
• How is DNA moved through the gel?
• Bromophenol blue/xylene cyanol migration dyes
• Polyacrylamide gels
Restriction enzymes cut DNA
• Why are restriction enzymes useful?
• How do they cut DNA?
• Specificity of recognition sequence
• Predictability of cutting patterns
Restriction enzymes cut DNA
• Specificity of recognition sequence
• Predictability of cutting patterns
• Naming of restriction enzymes
• Calculation of cut frequency in DNA: 1/4^n, where n is the length of the recognition sequence
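The 1/4^n rule is easy to turn into numbers. The sketch below assumes random sequence with all four bases equally frequent, which real genomes only approximate; the 3 Gb genome size is an illustrative round number.

```python
def expected_cut_spacing(site_length):
    """Average distance (bp) between sites for a recognition sequence of n bases: 4^n."""
    return 4 ** site_length

def expected_fragments(genome_size_bp, site_length):
    """Approximate number of fragments from a complete digest of the genome."""
    return genome_size_bp / expected_cut_spacing(site_length)

for n in (4, 6, 8):
    print(n, "bp site: one cut every", expected_cut_spacing(n), "bp on average")

# Illustrative 3 Gb genome cut with a 6-bp cutter (EcoRI, GAATTC, is one such enzyme).
print(round(expected_fragments(3_000_000_000, 6)), "fragments expected")
```

A 6-bp cutter gives fragments of about 4 kb on average, far smaller than a BAC insert, which is why complete digestion is not how BAC libraries are made.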
Screening clone libraries
Once libraries of genomic clones are built, they need to be screened to find a specific fragment in a particular clone. Most screening methods rely on two facts: 1) even relatively short specific DNA sequences are unique in large collections of DNA fragments, and 2) specific sequences can be identified by complementary base pairing, that is, hybridization (or annealing), under stringent conditions.
• Frequency of occurrence of specific DNA sequences
• ***Relates to non-repetitive DNA only***
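How short is "relatively short"? A rough estimate follows from the same random-sequence reasoning: a sequence of n bases is expected about G/4^n times in a genome of G bases. The sketch assumes equal base frequencies, considers one strand only, and ignores repeats, which is exactly the caveat on the slide.

```python
def expected_occurrences(genome_size_bp, probe_length):
    """Expected number of exact matches to an n-base sequence in random DNA (one strand)."""
    return genome_size_bp / 4 ** probe_length

def min_unique_length(genome_size_bp):
    """Shortest length n at which a sequence is expected fewer than once."""
    n = 1
    while expected_occurrences(genome_size_bp, n) >= 1:
        n += 1
    return n

genome = 3_000_000_000  # illustrative 3 Gb genome
print(min_unique_length(genome))
print(expected_occurrences(genome, 20))
```

Under these assumptions a sequence in the high teens of bases is already expected to be unique in a mammalian-sized genome, which is why primers and probes of about 20 bases work for non-repetitive DNA.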
Coverage of the genome: make “libraries” of clones
• How can the genome be covered completely using enzymes that cut only at specific places? Use partial digestion
• Random shearing followed by end repair is also possible
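A related question is how many clones a library needs before any given locus is likely to be in it. The slides do not give a formula; the sketch below uses the standard Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is insert size divided by genome size. It assumes clones are sampled randomly and uniformly, and the genome and insert sizes are illustrative.

```python
import math

def clones_needed(genome_size_bp, insert_size_bp, probability=0.99):
    """Clones required so a given locus is in the library with the stated probability."""
    f = insert_size_bp / genome_size_bp  # fraction of the genome in one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - f))

genome = 3_000_000_000   # illustrative 3 Gb genome
insert = 150_000         # illustrative 150 kb BAC insert

n = clones_needed(genome, insert)
print(n, "clones, about", round(n * insert / genome, 1), "fold coverage")
```

The point of the estimate is that even with perfectly random fragments, several-fold redundancy is needed; biased cutting makes matters worse, hence partial digestion or shearing.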
Alternative methods to order BACs
• BAC end sequencing to generate new STSs and contigs
  • Enormous sequencing effort required (10x genome!) (So why was this done? I.e., why was it “easy” to do? Hint: think about the STS concept…)
  • Uses the known BAC size in a “double shotgun” approach
• With enough BAC end sequence, new contiguous sequence can be created
(Figure labels: common BAC backbone sequence; new sequence)
Connecting BAC assemblies to the genome…
• BAC fingerprinting
• BAC end sequencing
• Both require STS mapping to place contigs on chromosomes; FISH could also be used
• Complementary: all methods can be combined, and are, to generate excellent physical “sequence-ready” maps
(Figure labels: HTR2A, RB1)
How to put the pieces together?
• “Chromosome walking” to sequence a region of a chromosome
• Example: sequence the region between two genes in the pig
(Figure labels: HTR2A, RB1)
Hierarchical versus shotgun sequencing
• Text page 89 gives the shotgun sequence alignment criteria: “40 base pair overlap with no more than 6 bp different to align DNA fragments”
• What are the assumptions here?
• Is an alignment that meets those criteria expected to be unique?
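One way to explore these questions is to code the criterion and then ask how often two unrelated sequences would satisfy it by chance. The sketch below checks only an end-to-start overlap of exactly 40 bp without gaps, and the chance calculation assumes random sequence with equal base frequencies; both are simplifying assumptions, not the textbook's stated method.

```python
from math import comb

def overlaps(read_a, read_b, overlap_len=40, max_mismatches=6):
    """True if the last 40 bases of read_a match the first 40 of read_b with <= 6 differences."""
    if len(read_a) < overlap_len or len(read_b) < overlap_len:
        return False
    tail = read_a[-overlap_len:]
    head = read_b[:overlap_len]
    mismatches = sum(x != y for x, y in zip(tail, head))
    return mismatches <= max_mismatches

def chance_match_probability(overlap_len=40, max_mismatches=6):
    """Probability that two unrelated random sequences pass the criterion at one alignment."""
    return sum(
        comb(overlap_len, k) * 0.75 ** k * 0.25 ** (overlap_len - k)
        for k in range(max_mismatches + 1)
    )

p = chance_match_probability()
print(f"{p:.2e}")
```

For random sequence the chance of a spurious match is tiny, so the alignment would be expected to be unique. The hidden assumption is that the genome behaves like random sequence: repeated DNA passes the same test at many locations, which is where shotgun assembly runs into trouble.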
Shotgun sequence assembly: contigs and scaffolds
• Tri-nucleotide repeat
• End sequences of a 10 kbp clone define spacing, which helps assembly
Newest methods to sequence genomes
• For bovine, horse, chicken and pig, the methods used are quite similar to the public method, as a good linkage map and a physical map of the BACs were used
• But it is clear that a combination of the map-based approach and the WGS approach is optimal
• Individual BACs are sequenced based on the minimum tiling path: this decreases the sequencing effort but requires an enormous effort up front…
• WGS data can help where BAC data are poor
• Random clones are selected, so it is easy to start work
Example: Pig Genome Sequence
• Recently completed
• Combination of methods
• $10 million in sequencing costs, paid by USDA to Sanger in the UK
• Sitting on top of large genetic and physical mapping efforts!!
• Slides from Alan Archibald, one of the main organizers of the project
The Pig Genome Sequencing Project
Alan Archibald, The Roslin Institute and R(D)SVS, University of Edinburgh
A sequenced genome is a requirement for a 21st Century biomedical model organism
Hybrid Shotgun Sequencing Strategy
• Minimal set of overlapping BACs selected from physical map
• BAC shotgun reads
• Whole-genome shotgun reads
• Combine overlapping whole-genome and BAC-derived reads
• Sequence assembly
• Assemble clone sequences to represent chromosomes and annotate using Ensembl automated pipeline
Build 10 - Strategy
• Steps: Physical map, BAC sequencing, Assemble BACs, Integrate with Illumina assembly, Build AGPs (sort order & orientation), Annotation
• Check SNP panel (Wageningen)
• Integration of community resources: transcriptomics, expression, diversity, BAC-ends, genetic markers, cDNA
• Groups named on the slide: WTSI, TGAC, BGI, INRIA, Wageningen
(Diagram labels: Illumina contig and SNP (G->A) placed within runs of N)
Sequence clone progress 21/12/09
• Clones sequenced cover 95.97% of the physical map
• 16,707 sequenced clones, with 15,286 at Improved status
• Total sequence = 3.014 Gb (123.9 Mb Finished quality)
Add WGS data
Same Duroc individual as CHORI-242
• BGI
  • 66.5 Gb of sequence (24-fold)
  • Read length: 44
• WTSI
  • ~40 Gb of sequence (14-fold)
  • Read length: 108
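The fold-coverage figures can be sanity-checked with simple arithmetic: coverage is total sequence divided by genome size, so each data set implies a genome size. The sketch below uses only the numbers on the slide and assumes the read lengths are in bases.

```python
def implied_genome_size_gb(total_sequence_gb, fold_coverage):
    """Genome size implied by a data set: total sequence / fold coverage."""
    return total_sequence_gb / fold_coverage

def approx_read_count(total_sequence_gb, read_length_bp):
    """Approximate number of reads needed to produce the total sequence."""
    return total_sequence_gb * 1e9 / read_length_bp

print(round(implied_genome_size_gb(66.5, 24), 2), "Gb implied by the BGI data")
print(round(implied_genome_size_gb(40, 14), 2), "Gb implied by the WTSI data")
print(f"{approx_read_count(66.5, 44):.2e} reads of 44 bases")
```

Both data sets imply a genome a little under 3 Gb, in line with the roughly 3 Gb of clone sequence reported for the BAC-based assembly.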
Illumina Assembly – using contigs
• Spanners (diagram: WGS contig, clone contig, clone contig)
• Hangers (diagram: WGS contig, clone contig)
High density SNP genotyping chip August 2009
SNP genotyping
• Traceability
  • Pigs
  • Clones
  • Cells
• Inbred pigs
  • How inbred are they?