
International Tomato Finishing Workshop

Wellcome Trust Medical Photographic Library. International Tomato Finishing Workshop. Wellcome Trust Sanger Institute April 2007. Overview. Tomato Genome Finishing Standards Document on SGN. WTSI Finishing Strategy. WTSI Finishing Pipeline.


Presentation Transcript


  1. International Tomato Finishing Workshop. Wellcome Trust Sanger Institute, April 2007. (Image credit: Wellcome Trust Medical Photographic Library)

  2. Overview: Tomato Genome Finishing Standards Document on SGN; WTSI Finishing Strategy; WTSI Finishing Pipeline; Contiguous Finished Sequence to HTGS Phase 3; All bases above phred 30; Base error rate <1:10,000; Discussion on Day 2
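The two quality thresholds on this slide can be related through the standard phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. The sketch below is illustrative only: the function names are hypothetical, and treating "above phred 30" as "Q of at least 30" is an assumption, not something the slide states.

```python
import math

def phred_to_error_probability(q):
    """Convert a phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Convert an error probability back to a phred score."""
    return -10 * math.log10(p)

def low_quality_positions(quals, threshold=30):
    """Return 0-based positions whose phred score is below the threshold.

    Assumption: a base with Q exactly equal to the threshold passes.
    """
    return [i for i, q in enumerate(quals) if q < threshold]

def expected_errors(quals):
    """Sum of per-base error probabilities over a consensus."""
    return sum(phred_to_error_probability(q) for q in quals)
```

Under this definition phred 30 corresponds to one expected error in 1,000 bases at that position, while an overall rate of 1 in 10,000 corresponds to an average of phred 40, so the two criteria are separate checks: one per base, one over the whole consensus.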

  3. WTSI Finishing Pipeline: Established Clone Pipeline; Auto-Prefinishing; Shotgun Sequencing; Manual Finishing; QC; Final EMBL Submission (HTGS 3); Finishing Software

  4. Finishing Software • Main software tools used in the WTSI finishing process: • Sequence data viewed in Gap4 databases (Staden) • (assemblies created using phrap) • Read pair viewer – Orchid (Flowers) • Restriction Digest Viewer - Confirm (Attwood - WTSI) • Sequence plot viewer – Dotter (Sonnhammer) • (URLs on Handout) • Used throughout the finishing process and for final confirmation of assembly (and QC)

  5. WTSI Finishing Strategy: BAC Confirmation; Identify Region to be Finished; Contig Order and Orientation; Assessment of Gap Sizes and Type; Selection of Finishing Reactions; Improvement of Low Quality Sequence; Confirmation of Contiguous Assembly

  6. Finishing Strategy – Getting Started: BAC confirmation; Confirming BAC Clone Ends; Checking for overlapping BACs; Identifying Region to be Finished; Prevents overlaps being finished twice; Confirmation of clone placement on map; Resources available for BAC confirmation

  7. Identifying region to be finished in your BAC: whether overlapping BACs are available. No overlaps available: confirm BAC ends (BES); finish whole BAC insert. Overlapping BAC available: confirm ends of your BAC and overlapping BAC (BES or sequence if available); confirm status of overlap (shotgun, in finishing, finished); if overlap is not being finished by someone else, finish whole BAC insert; otherwise confirm region to be finished (who is finishing overlap) and finish 2Kb into overlap if already finished or being finished by someone else

  8. Resources available for BAC confirmation • SOL Genomics Network (SGN) • BES • Marker verification • Blast • repeats, unigenes, ESTs, markers, overlaps

  9. SGN Resources for BAC confirmation: BES search

  10. Aligning BES to BAC sequence data: BACs are clipped to the cloning vector cutsite, dependent on the library used: SL_MboI Library = GATC; LE_HBa Library = AAGCTT
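A quick way to check the clipping described on this slide is to confirm that a clipped BAC insert begins and ends with the expected cloning site. This is a minimal sketch: the function name and the dictionary are hypothetical, and it assumes the clipped sequence retains the full recognition site at each end. Both sites listed on the slide are palindromic, so the same string is expected at either end.

```python
# Cloning sites per library, taken from the slide above.
CUT_SITES = {"SL_MboI": "GATC", "LE_HBa": "AAGCTT"}

def insert_ends_match_cutsite(insert_seq, library):
    """Return (left_ok, right_ok) for a clipped BAC insert sequence.

    Assumption: clipping keeps the recognition site at both ends.
    """
    site = CUT_SITES[library]
    seq = insert_seq.upper()
    return seq.startswith(site), seq.endswith(site)
```

A result of `(True, False)` would suggest that the right end has not been clipped at the vector boundary, or that the wrong library was assumed.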

  11. Sequence Resources: Searching for Finished Overlaps. BLAST: match to self (bTH198L24); match to left overlap (bTH119A16); match to right overlap (bTH27G19)

  12. Aligning finished overlapping sequence

  13. Finishing 2Kb into existing finished overlapping BACs: overlapping sequence ends at 48435; finished region will begin at 46435 to give a 2Kb overlap
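The arithmetic on this slide is a simple subtraction, sketched here with a hypothetical helper. It mirrors the slide's numbers directly and assumes coordinates are counted from 1 in the consensus being finished.

```python
def finished_region_start(overlap_end, overlap_length=2000):
    """Position where finishing begins so that it extends
    overlap_length bases into an already finished neighbour."""
    return max(1, overlap_end - overlap_length)

start = finished_region_start(48435)
```

With the slide's values, `start` is 46435.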

  14. Making use of available overlaps: finished consensus spans 2 gaps, reducing the number of contigs to finish

  15. Sequence Verification: Searching for Expected Markers. Marker size would be 1284bp; matches predicted product size for S. lycopersicum on SGN
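A predicted marker size like the one on this slide can be approximated by locating both primers in the BAC consensus and measuring the span between them. The sketch below is not the SGN tool: the function names are hypothetical, it only finds exact matches, and it assumes the forward primer lies on the given strand with the reverse primer downstream on the opposite strand.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def predicted_product_size(template, fwd_primer, rev_primer):
    """Size of the product bounded by two primers, or None if either
    primer has no exact match in the expected orientation."""
    t = template.upper()
    start = t.find(fwd_primer.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = t.find(rev_site, start + 1)
    if end == -1:
        return None
    return end + len(rev_site) - start
```

The returned size can then be compared with the product size listed for the marker.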

  16. Finishing Strategy: BAC Confirmation; Identify Region to be Finished; Contig Order and Orientation. Use read pair information to order and orientate contigs. Plasmid inserts typically 4-6Kb. Both strands sequenced to give read pairs

  17. Contig Order and Orientation. Shotgun sequencing: double stranded sequencing vector (pUC) carrying inserted sequence (BAC), sequenced on the forward strand and reverse strand. Assembled sequence contigs between the left clone end and right clone end: forward and reverse read pairs 4-6Kb apart in assembly. Look at read pair information across gaps to order contigs (diagram shows a sequence gap with a good read pair link across it)
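The idea of using read pairs to order contigs can be sketched as counting how many pairs have their two reads in different contigs. This is only an illustration with hypothetical data structures, not how Gap4 or Orchid represent assemblies, and it ignores read orientation and insert size, which a real check would also use.

```python
from collections import Counter

def count_contig_links(placements, pairs):
    """Count read pairs whose two reads sit in different contigs.

    placements: dict mapping read name -> contig name (hypothetical format)
    pairs: iterable of (forward_read, reverse_read) names
    Returns a Counter keyed by frozenset({contig_a, contig_b}).
    """
    links = Counter()
    for fwd, rev in pairs:
        if fwd in placements and rev in placements:
            a, b = placements[fwd], placements[rev]
            if a != b:
                links[frozenset((a, b))] += 1
    return links
```

Contig pairs supported by several links are candidates for being adjacent across a gap. Reads without a placed partner are skipped here; the deck treats those separately when discussing unspanned gaps.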

  18. Contig Order and Orientation: Using Read Pair information to find Assembly Problems (diagrams show assembled sequence contigs and a sequence gap between the left clone end and right clone end)

  19. Orchid: Read Pair Visualisation Tool. Contiguous sequence with good read pair coverage

  20. Finishing Strategy: BAC Confirmation; Identify Region to be Finished; Contig Order and Orientation; Assessment of Gap Sizes and Type. Restriction Digest Data; Assess Sequence; Use all available information. Type of gap and size determines finishing approach

  21. Restriction Digest Data • Used in confirmation of finished contiguous assembly • Also used throughout finishing process • Sizing of gaps within BACs • use appropriate finishing strategy • Identifying assembly problems • caused by repeats • Sizing of repeats • confirming size of assembly of tandem repeats • sizing force joins made in repeats for tagging purposes

  22. Restriction Digests • Minimum of three restriction enzymes used to confirm the assembly • Selection depends on organism and the nature of the sequence • S. lycopersicum BACs are digested with • BamHI • EcoRI • HindIII • Comparison of real and virtual digest of entire BAC sequence
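A virtual digest of the kind mentioned here can be sketched by finding each recognition site and cutting at the enzyme's known position within it (BamHI G^GATCC, EcoRI G^AATTC, HindIII A^AGCTT). The sketch is illustrative rather than a description of the Gap4 or Confirm implementation: names are hypothetical, and the sequence is treated as a linear molecule without vector, which is an assumption.

```python
import re

# Recognition site and cut offset (bases from the start of the site).
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}

def virtual_digest(seq, enzyme):
    """Return fragment lengths, largest first, for a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # Lookahead so that every site start is reported.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)
```

Running this for each of the three enzymes gives three independent fragment lists to set against the gel.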

  23. Confirm: WTSI in-house digest visualisation tool

  24. In-house digest visualisation tool

  25. Gap4 - Restriction Digest Viewer: compare fragment lengths from the virtual digest in Gap4 to actual fragment sizes on the gel produced in the lab
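The comparison of virtual and real fragments can be sketched as matching each predicted size to an observed band within a tolerance. This is a hypothetical illustration: the function name, the greedy matching and the 5% tolerance are all assumptions, and real gel sizing error varies with fragment size.

```python
def compare_digests(virtual, observed, tolerance=0.05):
    """Match predicted fragment sizes to observed gel bands.

    Returns (missing, extra): predicted sizes with no band within the
    relative tolerance, and bands not explained by any prediction.
    """
    unmatched = sorted(observed)
    missing = []
    for size in sorted(virtual):
        match = next(
            (o for o in unmatched if abs(o - size) <= tolerance * size), None
        )
        if match is None:
            missing.append(size)
        else:
            unmatched.remove(match)
    return missing, unmatched
```

Predicted fragments with no band, or bands with no predicted fragment, point to the part of the assembly to inspect.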

  26. Using Restriction Digest Data to check for Assembly Problems • Identifying assembly problems from digests • mis-assemblies caused by repeats • direct repeats • inverted repeats • All digests showing similar amount of missing data or extra data at a particular position • Possible repeat with incorrect copy number represented • Certain digests show too much data, others have missing cutsites or data missing • Possible inverted repeat in wrong orientation • Possible E. coli transposon insertion

  27. Assessment of Sequence - Dotter • Sequence plot of BAC used throughout finishing process • Check for repeat sequences at gaps • Highlight any potential areas of mis-assembly • Also used to confirm sequence overlaps • Confirm unique sequence • Not false repeat matches • Used as final assembly check • Repeats • Cross reference sizes with restriction digests
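The principle behind a sequence plot can be shown with a simple exact word-match dot plot. This is not Dotter's algorithm, which scores windows rather than requiring exact words; the function name and word length are assumptions chosen for illustration. Comparing a sequence with itself gives the main diagonal plus off-diagonal points wherever a word recurs, which is how direct repeats show up.

```python
from collections import defaultdict

def dot_matches(seq_a, seq_b, word=8):
    """Return (i, j) pairs where seq_a[i:i+word] equals seq_b[j:j+word]."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    index = defaultdict(list)
    for j in range(len(seq_b) - word + 1):
        index[seq_b[j:j + word]].append(j)
    return [
        (i, j)
        for i in range(len(seq_a) - word + 1)
        for j in index.get(seq_a[i:i + word], [])
    ]
```

Inverted repeats could be detected in the same way by comparing the sequence against its own reverse complement.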

  28. Sequence plot: Overlap Confirmation

  29. Sequence Plot – Assembly Check: Repeat Examples (Inverted Repeat; Direct Repeat)

  30. Sequence Plot – Assembly Check: Repeat Example

  31. WTSI Finishing Strategy: BAC Confirmation; Identify Region to be Finished; Contig Order and Orientation; Assessment of Gap Sizes and Type; Selection of Finishing Reactions; Improvement of Low Quality Sequence

  32. Options for Gap Closure and Improving Sequence Quality. Depending on length of region or gap and associated sequence (repeat, structural problems): resequencing of subclones across region if appropriate read length, using alternative chemistries if possible; sequence any unpaired reads which may fall in low quality region or in gap; primer walking on subclones across region or gap; direct clone walks; PCR; SIL or TIL; manual editing; comment tag for EMBL submission

  33. Gap Closure in BACs – Gap Types: Un-spanned Gap; Spanned Gap; Re-sequencing (read pairs); Oligo walks; Direct clone walks; PCR; Small Insert Libraries, Transposon Libraries; Restriction Fragment Library; Repeats; Alternative Library Sizes

  34. Primer Walking into Spanned Gaps. Assembled sequence contigs between the left clone end and right clone end; forward and reverse read pairs 4-6Kb apart in assembly; look at read pair information across gaps to order contigs. Good read pair links across the sequence gap: Primer 1 and Primer 2; original shotgun templates; primer extended template; gap closed

  35. Primer Walking into Spanned Gaps: Assembled Sequence Contigs; Primer 1; Primer 2; Primer 3; Primer 4

  36. Small Insert Library (SIL): Assembled Sequence Contigs; spanning shotgun template, 4-6Kb insert; SIL templates average 300-500bp insert. Spanning subclone is shattered into smaller fragments to create a SIL. Smaller insert sizes can break up structural problems.

  37. Transposon Insertion Library (TIL): Double Stranded Sequencing Vector (pUC); Inserted sequence (BAC)

  38. Transposon Insertion Library (TIL): Double Stranded Sequencing Vector (pUC); Inserted sequence (BAC). Normal sequencing from either end of insert; read pairs ~4-6Kb apart

  39. Transposon Insertion Library (TIL): Double Stranded Sequencing Vector (pUC); Inserted sequence (BAC). Normal sequencing from either end of insert; read pairs ~4-6Kb apart. Transposon randomly inserts across entire plasmid. Sequence outwards from transposon insertion site

  40. Transposon Insertion Library (TIL): Double Stranded Sequencing Vector (pUC); Inserted sequence (BAC). Transposon randomly inserts across entire plasmid. Sequence outwards from transposon insertion site. TIL read pairs overlap by 9bp duplication site
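Because the transposon duplicates a 9bp target site, the two reads sequenced outwards from one insertion each begin with that 9bp, on opposite strands. The sketch below joins such a read pair back into the original sequence. It is a simplified illustration with hypothetical names, and it assumes both reads have already been trimmed so that they start exactly at the duplicated site with no transposon sequence remaining.

```python
DUPLICATION = 9  # length of the duplicated target site, from the slide

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def join_til_pair(read_left, read_right):
    """Merge two outward-facing reads from one transposon insertion.

    read_left runs leftwards (reverse strand), read_right rightwards.
    Raises ValueError if the reads do not share the duplicated site.
    """
    left = reverse_complement(read_left)
    right = read_right.upper()
    if left[-DUPLICATION:] != right[:DUPLICATION]:
        raise ValueError("reads do not overlap by the duplicated site")
    return left + right[DUPLICATION:]
```

A pair that fails the overlap check may come from two different insertions or may need further trimming.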

  41. Transposon Insertion Library

  42. Unspanned Gaps and gaps unresolved by walking on spanning subclones: Assembled Sequence Contigs. Resequence any unpaired reads that face into gap. Partner may fall in gap, reducing gap size, or may fall within other contig and span the gap.

  43. Unspanned Gaps and gaps unresolved by walking on spanning subclones: Assembled Sequence Contigs; Primer 1; Primer 2. No unpaired reads. Design oligo primers from each contig end to read into gap. Use for walking directly on BAC (clone/stock) DNA and PCR. Primer sequence needs to be unique: try to find unique sequence within BAC for oligo selection (sequence search facility in Gap4)
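The uniqueness requirement can be checked by counting exact matches of a candidate primer on both strands of the BAC consensus. This sketch stands in for, and is not, the Gap4 sequence search: the function names are hypothetical and near-matches, which can also cause mis-priming, are not considered.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_primer_sites(consensus, primer):
    """Count exact primer matches on either strand of the consensus."""
    consensus, primer = consensus.upper(), primer.upper()
    # A set avoids double counting when the primer is palindromic.
    patterns = {primer, reverse_complement(primer)}
    total = 0
    for pattern in patterns:
        start = consensus.find(pattern)
        while start != -1:
            total += 1
            start = consensus.find(pattern, start + 1)
    return total

def is_unique_primer(consensus, primer):
    """True when the primer has exactly one exact site in the consensus."""
    return count_primer_sites(consensus, primer) == 1
```

A candidate that matches more than once, on either strand, would be rejected in favour of sequence further from the repeat.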

  44. Direct Clone Walks: Assembled Sequence Contigs; Primer 1; Primer 2; Primer 3; Primer 4. Depending on gap size (from restriction digest data), the direct clone walks may close the gap. Alternatively they may extend into the gap, allowing further primers to be designed on the newly recovered sequence

  45. PCR: Assembled Sequence Contigs; Primer 1; Primer 2. The same principle applies to PCR. Design unique primers from each contig end to obtain a product that can be sequenced and extended with further primer walking. If confirmed to span the gap, a PCR product may be shattered into a SIL, but may skip out repetitive sequence.

  46. SIL from Restriction Fragment: Assembled Sequence Contigs; Cutsite 1; Cutsite 2; sequence gap known to be within this fragment; shatter this fragment of digested BAC DNA. Alternatively, a restriction fragment known to contain the missing data can be isolated from the digest gel and made into a SIL. The fragment of interest must be distinct from other fragments on the gel and be a suitable size.

  47. Gaps and Assembly Problems Caused by Repeats. Varying complexity of repeats depending on: repeat unit; size of unit; copy number; direct or inverted copies; how conserved? Lower complexity, e.g. di-nucleotide runs; higher complexity, e.g. LTRs. Importance of visualising repeat sequence to assess repeat type. Alter phrap parameters for more stringent assembly. Alternative library sizes if necessary. Discussion point for Tuesday

  48. Improving Sequence Quality - Summary. Depending on length of poor quality region and associated sequence (repeat, structural problems): resequencing of subclones across region if appropriate read length, using alternative chemistries if possible; sequence any unpaired reads which may fall in region; primer walking on subclones across region; direct clone walks; PCR; SIL or TIL; manual editing; comment tag for EMBL submission

  49. WTSI Finishing Strategy: BAC Confirmation; Identify Region to be Finished; Contig Order and Orientation; Assessment of Gap Sizes and Type; Selection of Finishing Reactions; Improvement of Low Quality Sequence; Confirmation of Contiguous Assembly

  50. Confirmation of contiguous sequence: contiguous sequence generated; no quality issues remain; all assembly checks completed (read pair coverage; dotplot; restriction digests); identify any regions to be tagged; QC check; final submission of finished sequence to EMBL as HTGS Phase 3
