
Second Tomato Finishing Workshop Chromosome 4 - PowerPoint PPT Presentation



Second Tomato Finishing Workshop Chromosome 4. Tomato Project Group Wellcome Trust Sanger Institute 25th April 2008. Chromosome 4 Introduction. Data Flow at WTSI Sequencing Method Used Finishing Strategies Use of Overlapping Data Chr4 Sequence Update Discussion points for Workshop

Presentation Transcript
Second Tomato Finishing Workshop: Chromosome 4

  • Tomato Project Group

  • Wellcome Trust Sanger Institute

  • 25th April 2008


Chromosome 4 Introduction

  • Data Flow at WTSI

  • Sequencing Method Used

  • Finishing Strategies

  • Use of Overlapping Data

  • Chr4 Sequence Update

  • Discussion points for Workshop

  • Unmapped BACs

  • Examples of Problem Clones

  • Dealing with Large Repeats


UK - Chromosome 4

  • Gene space estimate for Chromosome 4 is 19Mb

  • Mapping, sequencing and finishing at Wellcome Trust Sanger Institute (WTSI)

  • BAC by BAC sequencing approach

  • Approximately 200 BACs

  • Funding at WTSI ends October 31st 2008


Overview of WTSI Clone Pipeline

Mapping

  • Clone Selection and Verification

  • Clones entered into pipeline

BACs assigned to chr4 sequencing project on SGN BAC registry

Subcloning

  • Clone DNA Prep

  • Digest Confirmation

  • Library Construction (plasmid)

Shotgun Sequencing

  • Plasmid Prep

  • Sequencing & Processing

Sequence contigs >2Kb available on Sanger FTP site and public databases; “Sequencing in Progress”; HTGS Phase 1

Finishing

  • Sequence Improvement

  • Contig Orientation and Gap Closure

  • Confirmation of Assembly (QC)

HTGS Phase 2

Finished Sequence

  • Sequences uploaded to SGN

  • BAC Registry updated

Final EMBL submission; “Complete Sequence”; HTGS Phase 3


Clone Selection and Verification

  • BACs selected primarily from the HindIII (LE_HBa) and MboI (SL_MboI) libraries

  • Using seed BACs from SGN, end sequence alignment and FPC analysis

  • New BACs selected from in-house overgos for markers

  • Selected 5 clones from the fosmid library based on end sequence alignments and fingerprints


Plasmid Prep and Shotgun Sequencing

  • Optimised for 384 well prep and sequencing

  • Capillary Sequencing

  • AB3730s with AB Big Dye Terminator

  • pUC118 Double Stranded Sequencing Vector

  • 4-6Kb inserts, double end sequenced

BACs

  • Aim for 6x-8x coverage

  • Average insert ~100-150Kb (LE_HBa and SL_MboI libraries)

  • 2x or 3x 384 plates per BAC

  • ~750 paired end reads

  • ~1500 reads in total

  • Average 10-15 contigs

Fosmids

  • Average insert ~35Kb

  • 1x 384 plate
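
The plate counts above can be turned into read counts and a rough coverage estimate. This is a minimal sketch, not the WTSI pipeline's own calculation: the 700 bp average read length and the function names are assumptions introduced for illustration, and only the plate size, double end sequencing and insert sizes come from the slide.

```python
WELLS_PER_PLATE = 384  # from the slide: 384 well prep and sequencing


def reads_from_plates(plates: int, ends_per_subclone: int = 2) -> int:
    """Number of reads when every well is sequenced from both ends."""
    return plates * WELLS_PER_PLATE * ends_per_subclone


def estimated_coverage(reads: int, insert_bp: int, read_length_bp: int = 700) -> float:
    """Approximate fold coverage of a clone insert.

    read_length_bp is an ASSUMED average usable read length; the slides do
    not state one, so adjust it to your own data.
    """
    return reads * read_length_bp / insert_bp


bac_reads = reads_from_plates(2)  # two plates for a BAC
print(bac_reads, round(estimated_coverage(bac_reads, 125_000), 1))
```

Under these assumptions, two plates give 1536 reads, in line with the ~1500 reads quoted for a BAC, and land near the top of the 6x-8x target for a ~125Kb insert.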


Clone Finishing

Gap4 (Staden) used to view and manipulate sequence data

  • Sequence Improvement

Manual Finishing

QC Checking


Manual Finishing of BACs: BACs viewed in relation to map

BACs are viewed in relation to the mapped minimal tile path

Use in-house TPF visualisation tool, e.g. ctg503


Use of Overlapping Sequences

  • Along the minimal tile path, the region finished in each clone depends on the order in which the clones enter finishing

  • Finish unique sequence with a 2000bp overlap between clones

[Figure: tile path of BAC1, BAC2, BAC3 and BAC4 (BAC4 = gap closure); the legend distinguishes the total BAC insert from the finished region]

Final order and orientation of finished BACs are given in the AGP file

e.g. BAC1-BAC2-BAC4-BAC3
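
The dependence of each finished region on finishing order can be illustrated with a small interval calculation. This is a sketch under stated assumptions: the clone coordinates are hypothetical, and the rule applied (finish only sequence not already finished in another clone, plus a 2000 bp overlap into the neighbour) is one reading of the slide rather than the group's actual tooling.

```python
OVERLAP_BP = 2000  # from the slide: 2000bp overlap between clones


def finished_regions(clones, finishing_order):
    """Return {clone: [(start, end), ...]} of regions to finish.

    clones maps a clone name to its (start, end) position on the tile
    path (hypothetical coordinates). Clones are processed in the order
    they enter finishing.
    """
    done = []  # unique intervals already assigned to earlier clones
    result = {}
    for name in finishing_order:
        start, end = clones[name]
        pieces = [(start, end)]
        for d_start, d_end in done:
            remaining = []
            for p_start, p_end in pieces:
                if d_end <= p_start or d_start >= p_end:
                    remaining.append((p_start, p_end))  # no overlap
                    continue
                if p_start < d_start:
                    remaining.append((p_start, d_start))
                if d_end < p_end:
                    remaining.append((d_end, p_end))
            pieces = remaining
        # Extend each unique piece into its neighbours, staying inside the clone.
        result[name] = [
            (max(start, s - OVERLAP_BP), min(end, e + OVERLAP_BP)) for s, e in pieces
        ]
        done.extend(pieces)
    return result


# Hypothetical layout: BAC4 enters finishing last and closes the BAC2/BAC3 gap.
clones = {
    "BAC1": (0, 120_000),
    "BAC2": (100_000, 230_000),
    "BAC3": (250_000, 370_000),
    "BAC4": (220_000, 260_000),
}
regions = finished_regions(clones, ["BAC1", "BAC2", "BAC3", "BAC4"])
tile_order = sorted(regions, key=lambda n: regions[n][0][0])
print("-".join(tile_order))
```

With this layout only the gap-spanning part of BAC4 is finished, and sorting by position gives the same kind of order an AGP file would record.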


Summary of Clone Gap Closure Strategies

  • Make use of paired ends to order and orientate contigs

  • Identify whether gaps are spanned or unspanned – orchid example

  • Identify any repeats associated with gaps – dotter example

  • Estimate gap sizes using restriction digest data

  • This will determine appropriate strategy for gap closure e.g.

    • primer/oligo walking into regions of low quality or gaps spanned by paired end reads

    • PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps)

    • Use of alternative chemistries where appropriate

      • structural problems, mono- & di-nucleotide runs
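
The choice of strategy described above can be sketched as a small decision function. The `Gap` class, its field names and the problem labels are hypothetical, introduced only to make the logic concrete; the ordering of strategies follows the bullets on the slide.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    spanned: bool          # True if at least one read pair bridges the gap
    problem: str = "none"  # hypothetical labels: "structural", "mono_run", "di_run"


def closure_plan(gap: Gap) -> list[str]:
    """Ordered list of strategies to try for one gap."""
    plan = []
    if gap.spanned:
        # Spanned gaps: walk in with custom primers first.
        plan.append("primer/oligo walking")
    # Unspanned gaps, and spanned gaps that stay unresolved.
    plan.append("PCR and direct walking on BAC DNA")
    if gap.problem in {"structural", "mono_run", "di_run"}:
        plan.append("alternative chemistry")
    return plan


print(closure_plan(Gap(spanned=True)))
print(closure_plan(Gap(spanned=False, problem="di_run")))
```

In practice the gap size estimated from the restriction digest would also inform the choice; it is left out here because the slide gives no thresholds.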


Orchid: Read Pair Visualisation Tool

Contiguous sequence with good read pair coverage


Visualising Repeats associated with gaps

Inverted Repeat

Direct Repeat


Restriction Digests

  • Minimum of three restriction enzymes used to confirm the assembly

  • Selection depends on organism and the nature of the sequence

  • S. lycopersicum BACs are digested with

    • BamHI

    • EcoRI

    • HindIII

  • Comparison of real and virtual digest of entire BAC sequence
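
A virtual digest of the kind compared against the gel can be sketched in a few lines. This is an illustration, not the Confirm tool: it treats the sequence as a linear molecule and ignores the vector, and the 5% size tolerance and function names are assumptions. The recognition sites and cut positions for BamHI, EcoRI and HindIII are the standard ones.

```python
# Recognition site and cut offset on the top strand (all three cut after base 1).
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}


def virtual_digest(seq: str, enzyme: str) -> list[int]:
    """Fragment sizes (largest first) from cutting a LINEAR sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def fragments_agree(virtual: list[int], observed: list[int], tolerance: float = 0.05) -> bool:
    """True if both digests have the same number of bands of similar size.

    tolerance is an ASSUMED relative sizing error for gel bands.
    """
    if len(virtual) != len(observed):
        return False
    return all(abs(v - o) <= tolerance * v
               for v, o in zip(sorted(virtual), sorted(observed)))


toy = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
print(virtual_digest(toy, "EcoRI"))
```

Because all three sites are palindromic, searching one strand is sufficient.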


    Confirm: WTSI In-house Digest Visualisation Tool



    Clone Gap Closure Strategies

    • Make use of paired ends to order and orientate contigs

    • Identify whether gaps are spanned or unspanned – orchid

    • Identify any repeats associated with gaps – dotter

    • Estimate gap sizes using restriction digest

    • This will determine appropriate strategy for gap closure e.g.

      • primer/oligo walking into regions of low quality or gaps spanned by paired end reads

      • PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps)

      • Use of alternative chemistries where appropriate

        • structural problems, mono- & di-nucleotide runs


    Sequencing Chemistries and Additives used in Finishing

    • 4:1 mix ratio of AB Big Dye Terminator : AB dGTP Terminator

    • used for general finishing reactions, not problem specific

    • AB dGTP Terminator

    • used for di-nucleotide runs and inverted repeats

    • Additive A (SequenceRx Enhancer Solution A - Invitrogen)

    • Dimethyl sulfoxide (DMSO)

    • Additive A + DMSO + dGTP

    • used for mono-nucleotide runs, inverted repeats

    • Sequence Finishing Kit (SFK) (TempliPhi - Amersham)

    • used to increase DNA yield

    • useful for structural problems caused by inverted repeats
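
The pairing of problem types with chemistries can be written as a lookup table. This sketch reads each "used for" line as applying to the item directly above it, which is an interpretation of the slide layout; the dictionary keys and the function name are hypothetical.

```python
# Problem type -> chemistries listed for it on the slide (interpretation, see lead-in).
CHEMISTRY_BY_PROBLEM = {
    "general": ["4:1 Big Dye Terminator : dGTP Terminator"],
    "di-nucleotide run": ["dGTP Terminator"],
    "mono-nucleotide run": ["Additive A + DMSO + dGTP"],
    "inverted repeat": [
        "dGTP Terminator",
        "Additive A + DMSO + dGTP",
        "Sequence Finishing Kit (TempliPhi)",
    ],
}


def chemistries_for(problem: str) -> list[str]:
    """Chemistries to consider; falls back to the general 4:1 mix."""
    return CHEMISTRY_BY_PROBLEM.get(problem, CHEMISTRY_BY_PROBLEM["general"])


print(chemistries_for("inverted repeat"))
```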


    Alternative Gap Closure Strategies

    • Specialist Subcloning

      • Small Insert Libraries (SIL)

        Double Stranded pUC or Single Stranded M13

      • Large Insert Libraries (LIL)

      • Transposon Libraries (TIL)

      • Restriction Fragment SIL (RFSIL)

    • Alternative Strategies for dealing with large repeats

      • points for further discussion on Tuesday

      • what repeats have other chromosome groups found?


    Clone Gap Closure Strategies

    • Make use of paired ends to order and orientate contigs

    • Identify whether gaps are spanned or unspanned – orchid

    • Identify any repeats associated with gaps – dotter

    • Estimate gap sizes using restriction digest

    • This will determine appropriate strategy for gap closure e.g.

      • primer/oligo walking into regions of low quality or gaps spanned by paired end reads

      • PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps)

      • Use of alternative chemistries where appropriate

        • structural problems, mono- & di-nucleotide runs


    Use of Misc_Feature Tags in EMBL/GenBank/DDBJ

    • Used regularly on finished sequence to identify regions of:

      • uni-directional chemistry when dGTP only

      • single subclone regions

        • including SIL and TIL only regions

      • PCR only

      • single reads from direct walks on BAC DNA

      • data only from overlapping BACs

      • E. coli transposon insertion sites

      • SP6 and T7 ends of overlaps (tomato)

      • gap sizes of force joins in tandem repeats


    Misc_Feature Tag Example Clone End Tags

    Accession

    Length of sequence

    Whole Clone Finished

    Both ends of clone cited



    QC Check of Clone Assembly

    • Before submission to public databases as HTGS phase 3 complete, all assembled BACs undergo several QC checks:

      • all reasonable chemistry attempts have been made for any specific problem types

      • all bases are above phred30

      • orientation of paired end reads checked across assembly

      • assembly is confirmed by restriction digest data

      • correct misc_feature tags have been used to identify any regions where appropriate

    Ensures high quality contiguous sequence with a low error rate
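
The phred30 criterion has a direct numerical meaning: a phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 is one expected error in 1000 bases. The sketch below flags bases that fail the check; treating Q >= 30 as passing is an assumption, since the slide only says "above phred30", and the function names are illustrative.

```python
def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a phred quality score."""
    return 10 ** (-q / 10)


def low_quality_positions(quals: list[int], threshold: int = 30) -> list[int]:
    """0-based positions whose quality is below the threshold.

    ASSUMPTION: a base at exactly the threshold passes.
    """
    return [i for i, q in enumerate(quals) if q < threshold]


consensus_quals = [40, 29, 30, 12]  # made-up example values
print(low_quality_positions(consensus_quals))
print(phred_to_error_prob(30))
```

An empty list from `low_quality_positions` would correspond to the "all bases are above phred30" check passing.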


    Chromosome 4 Clone Pipeline

    An additional 15 BACs were finished that FISH shows are not on chromosome 4


    Unmapped BACs moved from chr4

    • bTH82D4 (LE_HBa082D04) moved to chr7 (on FISH map)

    • bTH91D14 (LE_HBa091D14) moved to chr5 (on FISH map)


    Points for Discussion at Workshop

    • What problematic sequence have other groups encountered?

    • Strategies for finishing repeats used by other chromosome groups?

    • Unmapped BACs: any from other chromosomes?


    Acknowledgements

    • Cornell University:

    • Lukas Mueller

    • Robert Buels

    • Jim Giovannoni

    • Steve Tanksley

    • Colorado State University:

    • Stephen Stack

    • Suzanne Royer

    • Song-Bin Chang

    • Arizona Genomics Institute:

    • Rod Wing

    • Seunghee Lee

    • MIPS/IBI Institute for Bioinformatics:

    • Klaus Mayer

    • Remy Bruggmann

    • Wageningen University:

    • Rene Klein Lankhorst

    • Hans de Jong

    • Dora Szinay

    • Wellcome Trust Sanger Institute:

    • Karen McLaren

    • Clare Riddle

    • Sean Humphray

    • Christine Nicholson

    • Carol Scott

    • Stuart McLaren

    • Matt Jones

    • Christine Lloyd

    • Sarah Sims

    • Karen Oliver

    • Jane Rogers

    • Imperial College London:

    • Gerard Bishop

    • Daniel Buchan

    • James Abbott

    • Sarah Butcher

    • University of Nottingham:

    • Graham Seymour

    • Scottish Crop Research Institute:

    • Glenn Bryan

    FUNDING

