Progress on the sequencing of tomato chromosome 6 • Roeland van Ham, Sander Peters, Taco Jesse, Hans de Jong, Erwin Datema, Rene Klein Lankhorst
Outline • Project overview • Results mapping and FISH • Sequencing status & planning • Annotation (Test BAC annotation)
Dutch tomato chr. 6 sequencing • Centre for BioSystems Genomics (CBSG) • developments 2004-2005: • only a limited number of initial seed BACs anchored (23) • anchoring problems revealed by FISH analysis • reduction in sequencing costs: more BACs can be sequenced with the same budget • novel SOL resources: • additional BAC libraries (MboI and EcoRI) • BAC-end sequences • new genome-wide selection of seed BACs from the Cornell F2.2000 genetic map • ~70 candidate seed BACs for chr. 6
CBSG tomato chr. 6 sequencing goals adjusted (originally: 10 Mb draft sequence): • anchor seed BACs over the entire euchromatic part • chromosome walking from seed BACs using an STC/AFLP fingerprinting approach • special regions of interest remain: Mi, Cf2/5, Ol1/3 loci • sequence the complete euchromatic part of chr. 6 (~20.5 Mb) • estimated number of BACs: ~215
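The two figures above (~20.5 Mb, ~215 BACs) imply an average net contribution per BAC to the tiling path. A quick check, using only those two numbers:

```latex
\frac{20.5\ \mathrm{Mb}}{215\ \mathrm{BACs}} \approx 0.095\ \mathrm{Mb} \approx 95\ \mathrm{kb\ of\ new\ sequence\ per\ BAC}
```

Because successive BACs in a walk overlap, each BAC's insert has to be somewhat longer than this net figure.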
Dutch tomato chr. 6 sequencing effort • Centre for BioSystems Genomics (CBSG) and associated projects • (EU-)SOL resources • InF1: infrastructure • TS1: mapping, BAC selection • TS3: FISH • TS2: sequencing • InD1: assembly & annotation technology development • bioinformatics projects
Mapping results • Selection of extension BACs: a chromosome walking pilot (Peters et al. Plant Phys, 2006)
Mapping results • Approach Peters et al. Plant Phys, 2006
Analysis of candidate extension BACs: AFLP and FPC • the candidate with the smallest overlap and largest extension goes into the sequencing pipeline
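The selection rule on this slide (smallest overlap, largest extension) can be sketched as follows. This is a minimal illustration, not the pilot's actual procedure: it assumes each candidate's position relative to the seed BAC is already known in bp (in the pilot, overlaps were inferred from AFLP/FPC fingerprints), and `Candidate`, `pick_extension` and `min_overlap` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    start: int  # hypothetical: left end relative to the seed BAC start (bp)
    end: int    # hypothetical: right end relative to the seed BAC start (bp)


def pick_extension(seed_len: int, candidates: list[Candidate],
                   min_overlap: int = 1) -> Optional[Candidate]:
    """Return the candidate extending furthest past the seed's right end while
    still overlapping the seed by at least min_overlap bp.
    Ties on extension are broken in favour of the smaller overlap."""
    best, best_key = None, None
    for c in candidates:
        overlap = min(c.end, seed_len) - max(c.start, 0)
        extension = c.end - seed_len
        if overlap < min_overlap or extension <= 0:
            continue  # no reliable link to the seed, or no new sequence
        key = (extension, -overlap)
        if best_key is None or key > best_key:
            best, best_key = c, key
    return best
```

Raising `min_overlap` trades extension length for confidence that the candidate really joins the seed BAC.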
FISH results • TS3 (Ludmila Khrustaleva, Hans de Jong) • 8 out of latest set of 12 seed BACs unambiguously positioned
FISH pipeline chr. 6 sequencing • list of new candidate seed BACs (AGI) analyzed for: • marker or marker in reliable contig • availability of good BAC-end sequences • presence of repeats in marker or in associated BAC-end sequence • 51 candidates analyzed • 24 new seed BACs selected • currently in FISH pipeline
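The three screening criteria on this slide can be expressed as a simple filter. This is a sketch under one assumption: "marker or marker in reliable contig" is read as "the marker hits the BAC itself, or lies in a reliable contig containing it". The record fields and `select_seed_bacs` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SeedCandidate:
    bac_id: str
    marker_on_bac: bool              # assumption: marker hits this BAC directly
    marker_in_reliable_contig: bool  # ...or lies in a reliable contig with it
    good_bac_ends: bool              # good BAC-end sequences available
    repeat_in_marker_or_bes: bool    # repeat in marker or associated BAC-end sequence


def select_seed_bacs(candidates: list[SeedCandidate]) -> list[str]:
    """Keep candidates that pass all three checks listed on the slide."""
    return [
        c.bac_id
        for c in candidates
        if (c.marker_on_bac or c.marker_in_reliable_contig)
        and c.good_bac_ends
        and not c.repeat_in_marker_or_bes
    ]
```

Repeat-containing candidates are dropped here, presumably because repeats make anchoring and FISH signals ambiguous.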
FISH pipeline chr. 6 sequencing • multi-FISH experiment in preparation • determine the relative physical position of seed BACs • are there "oceans" (large unbridged gaps) between seed BACs?
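Once multi-FISH gives relative positions, ordering the seed BACs and flagging large gaps between neighbours is straightforward. A minimal sketch, assuming each position is expressed as a fraction of chromosome length; `order_and_gaps` and the `max_gap` threshold are hypothetical.

```python
from __future__ import annotations


def order_and_gaps(positions: dict[str, float], max_gap: float):
    """positions: hypothetical FISH position of each seed BAC as a fraction of
    chromosome length (0 = one end, 1 = the other).
    Returns the seed BACs in physical order, plus every neighbouring pair whose
    spacing exceeds max_gap (a candidate "ocean")."""
    ordered = sorted(positions.items(), key=lambda kv: kv[1])
    gaps = [
        (a, b, round(pb - pa, 3))
        for (a, pa), (b, pb) in zip(ordered, ordered[1:])
        if pb - pa > max_gap
    ]
    return [name for name, _ in ordered], gaps
```

Fractional positions on a FISH image do not convert linearly to Mb, since chromatin condensation varies along the chromosome, so `max_gap` is only a rough flag.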
Sequencing status chr. 6 & planning • Results: • BACs finished to Phase 1-2 / Phase 3: 53 / 2 (14 ext. BACs) • BACs in sequencing pipeline: 5 (4 ext. BACs) • ready for sequencing: 3 • new seed BACs in FISH pipeline: 24 • Planning: • data release: 15 BACs at SGN, 24 to be released from August 1st (CBSG partner approval pending) • start phase 3 sequencing (gap closure) in Q4 2006 (EU-SOL) • 454 BAC sequencing pilot underway • extension BAC selection from seed BACs (SNaPshot FP)
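The status counts can be tallied against the ~215-BAC estimate from the adjusted project goals. This sketch reads the finished count as 53 BACs at phase 1-2 plus 2 at phase 3, which is an interpretation of the slide.

```python
# Status counts as read from the slide; the 53 / 2 split
# (phase 1-2 vs. phase 3) is an interpretation.
finished = {"phase_1_2": 53, "phase_3": 2}
in_pipeline = 5
ready_for_sequencing = 3
estimated_total = 215  # estimated BAC count for the euchromatic part

done = sum(finished.values())
committed = done + in_pipeline + ready_for_sequencing

print(f"finished: {done} of ~{estimated_total} BACs ({done / estimated_total:.1%})")
print(f"finished + in pipeline + ready: {committed} ({committed / estimated_total:.1%})")
```

This prints 55 BACs (25.6%) finished and 63 (29.3%) finished or underway. The finished share is close to the ~26% quoted in the conclusions.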
CBSG tomato chr. 6 sequencing: Annotation • project connections: InD1 assembly & annotation technology development • bioinformatics projects: • structural annotation and curation of chr. 6 • functional annotation: Bayesian gene function prediction • alternative splicing • miRNA prediction
Results chr. 6 sequencing • InD1: development of software • TOPAAS: genome assembly & extension BAC selection • Cyrille2: system for automated, high-throughput genome annotation • CBSG genome sequence database
Cyrille2: system overview (1) • end user • annotator & admin • databases • core software • cluster (Linux & Condor) • third-party tools, e.g. BLAST, InterPro, Genscan
Cyrille2: system overview (2) • end user; annotator & admin • user interface • scheduler • executor • pipeline database, status database, biological database • cluster (Linux & Condor) • third-party tools, e.g. BLAST, InterPro, Genscan
Cyrille2: data storage & transport • [Figure: example pipeline: upload sequence → gene prediction → BLAST] • BioMOBY • easy interaction with 3rd-party servers
    <moby:MOBY>
      <moby:mobyContent>
        <moby:mobyData moby:queryID='data'>
          <moby:Simple>
            <moby:GenericDnaSequence moby:id="073H08F00068">
              <moby:Length>2332</moby:Length>
              <moby:Sequence>
                AATCGACGATCTACGTA....
              </moby:Sequence>
            </moby:GenericDnaSequence>
            .....
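A client receiving a message like the BioMOBY fragment on this slide has to resolve the `moby:` prefix before it can read the sequence. The sketch below uses Python's standard `xml.etree.ElementTree`. Assumptions: the namespace URI (the slide omits the declaration), and a toy, fully closed message with a short 17 bp sequence in place of the truncated 2332 bp one. `read_sequences` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: the BioMOBY namespace URI; the slide fragment omits the declaration.
MOBY_NS = "http://www.biomoby.org/moby"
NS = {"moby": MOBY_NS}

# Toy message: same element names as the slide, but a short, complete sequence.
SAMPLE = f"""<moby:MOBY xmlns:moby="{MOBY_NS}">
  <moby:mobyContent>
    <moby:mobyData moby:queryID="data">
      <moby:Simple>
        <moby:GenericDnaSequence moby:id="073H08F00068">
          <moby:Length>17</moby:Length>
          <moby:Sequence>
            AATCGACGATCTACGTA
          </moby:Sequence>
        </moby:GenericDnaSequence>
      </moby:Simple>
    </moby:mobyData>
  </moby:mobyContent>
</moby:MOBY>"""


def read_sequences(xml_text: str) -> dict:
    """Map each GenericDnaSequence id to its sequence, checking the declared length."""
    root = ET.fromstring(xml_text)
    sequences = {}
    for node in root.iterfind(".//moby:GenericDnaSequence", NS):
        seq_id = node.get(f"{{{MOBY_NS}}}id")  # namespaced attribute moby:id
        declared = int(node.findtext("moby:Length", namespaces=NS))
        # Strip the whitespace/newlines that often surround the sequence text.
        bases = "".join(node.findtext("moby:Sequence", namespaces=NS).split())
        if len(bases) != declared:
            raise ValueError(f"{seq_id}: declared {declared} bp, got {len(bases)}")
        sequences[seq_id] = bases
    return sequences
```

`read_sequences(SAMPLE)` returns `{"073H08F00068": "AATCGACGATCTACGTA"}`. The length check catches truncated transfers.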
Cyrille2: job execution • components: Cyrille2 core, pipeline database (data pointers), status database, biological database • node: get from database → tool wrapper → tool (cluster / BioMOBY service) → store in db; pointer to the stored data passed back
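The job-execution flow above can be sketched as a single node function that passes pointers, not data, between steps. This is a simplified, in-memory illustration, not Cyrille2's implementation. `BiologicalDB`, `run_node` and `gc_fraction` are hypothetical, and plain dicts stand in for the pipeline and status databases.

```python
from __future__ import annotations

import itertools
from typing import Any, Callable


class BiologicalDB:
    """Toy stand-in for the biological database: stores data under integer pointers."""

    def __init__(self) -> None:
        self._rows: dict[int, Any] = {}
        self._next_id = itertools.count(1)

    def put(self, value: Any) -> int:
        ptr = next(self._next_id)
        self._rows[ptr] = value
        return ptr

    def get(self, ptr: int) -> Any:
        return self._rows[ptr]


def run_node(job_id: str, input_ptr: int, tool: Callable[[Any], Any],
             bio_db: BiologicalDB, status_db: dict, pipeline_db: dict) -> int:
    """One job: fetch input by pointer, run the wrapped tool, store the result,
    then record the output pointer and the job status."""
    status_db[job_id] = "running"
    try:
        data = bio_db.get(input_ptr)   # get from database
        result = tool(data)            # tool wrapper -> tool
        out_ptr = bio_db.put(result)   # store in db
    except Exception:
        status_db[job_id] = "failed"
        raise
    pipeline_db[job_id] = out_ptr      # pointer handed back for downstream steps
    status_db[job_id] = "done"
    return out_ptr


def gc_fraction(seq: str) -> float:
    """Example tool: fraction of G/C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Passing pointers keeps the scheduler lightweight: only small references move through the pipeline and status databases, while bulky sequence data stays in the biological database.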
Cyrille2: BAC annotation pipeline • Ab initio gene predictors • Genscan (Arabidopsis) • GlimmerHMM (Arabidopsis) • GeneId (Solanaceae) • SNAP (Arabidopsis) • under development: JIGSAW (consensus gene modelling) • Other feature predictors • Marscan (EMBOSS) • Tandem Repeats Finder • RepeatMasker (tomato-specific library) • miRNA • InterPro • under development: functional annotation • Transcript datasets (blastn -> Sim4) • SGN tomato UniGenes • SGN potato UniGenes • TIGR LeGI TCs • Kazusa microtom UniGenes • GenBank full-length cDNAs (filtered) • SGN Coffee UniGenes • Protein datasets (tblastn -> GeneWise) • Swiss-Prot Plant • Arabidopsis TAIR6 annotation
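To illustrate why several ab initio predictors are combined, here is a deliberately naive consensus: keep an exon only if at least `min_support` predictors report exactly the same coordinates. This is not JIGSAW, which weighs the evidence statistically. The predictor names come from the slide, but the coordinates and `consensus_exons` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Tuple

Exon = Tuple[int, int]  # (start, end) on the BAC sequence


def consensus_exons(predictions: dict[str, list[Exon]], min_support: int) -> list[Exon]:
    """Majority vote on exact exon coordinates across gene predictors."""
    votes: Counter = Counter()
    for exons in predictions.values():
        votes.update(set(exons))  # each predictor votes at most once per exon
    return sorted(exon for exon, n in votes.items() if n >= min_support)


# Hypothetical coordinates, for illustration only.
PREDICTIONS = {
    "Genscan": [(100, 200), (300, 400)],
    "GlimmerHMM": [(100, 200), (310, 400)],
    "GeneId": [(100, 200), (300, 400), (500, 600)],
    "SNAP": [(300, 400)],
}
```

With `min_support=2`, only `(100, 200)` and `(300, 400)` survive. Exact matching is the weak point: `(310, 400)` differs by a few bases and gets no credit, which is one reason dedicated combiners model boundaries and evidence more carefully.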
Cyrille2: pipeline programming • genome annotation pipeline • miRNA & target prediction pipeline
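A pipeline such as the genome annotation or miRNA & target prediction pipeline can be modelled as a dependency graph of steps. A sketch using the standard-library `graphlib` module (Python 3.9+); the step names are hypothetical and this is not Cyrille2's own pipeline definition format. `execution_order` yields a valid run order. `downstream` lists the steps to re-run when one step's output changes, in the spirit of the "automated updating" feature.

```python
from __future__ import annotations

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the steps it depends on.
PIPELINE = {
    "upload_sequence": set(),
    "gene_prediction": {"upload_sequence"},
    "blast": {"gene_prediction"},
    "mirna_prediction": {"upload_sequence"},
    "target_prediction": {"mirna_prediction"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """A valid order in which every step runs after its inputs exist."""
    return list(TopologicalSorter(graph).static_order())


def downstream(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Steps that must be re-run when the output of `changed` is updated."""
    children: dict[str, set[str]] = {step: set() for step in graph}
    for step, deps in graph.items():
        for dep in deps:
            children[dep].add(step)
    stale, stack = set(), [changed]
    while stack:
        for child in children[stack.pop()]:
            if child not in stale:
                stale.add(child)
                stack.append(child)
    return stale
```

`TopologicalSorter` raises `graphlib.CycleError` if a pipeline definition accidentally contains a loop.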
Cyrille2: summary • fully automated, high-throughput • generic bioinformatics workflow management • modular, extensible • generic tool wrapper module • open communication standard • BioMOBY, access to external services • iterative execution • background execution • automated updating • database independent (GGB / Ensembl) • independent GUI
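A "generic tool wrapper" lets one piece of code drive many third-party programs from a command template. A minimal sketch, assuming tools are invoked as command-line programs. `run_tool` and its `$placeholder` template convention are hypothetical. In practice a template might look like `["blastn", "-query", "$query", "-db", "$db"]`, while the runnable example below uses the Python interpreter as a stand-in tool.

```python
import subprocess
import sys
from string import Template


def run_tool(command: list[str], **params: str) -> str:
    """Generic wrapper: fill $placeholders in a command template, run it
    without a shell, and return stdout. Raises KeyError for a missing
    parameter and subprocess.CalledProcessError if the tool exits non-zero."""
    args = [Template(part).substitute(params) for part in command]
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Stand-in "tool": the current Python interpreter, so the example runs anywhere.
    print(run_tool([sys.executable, "-c", "print('$word'.upper())"], word="tomato"))
```

Passing an argument list instead of a shell string avoids quoting problems with file names and parameters.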
GGB visualization • tomato and potato genome annotations • storage, access, visualization • http://appliedbioinformatics.wur.nl/cbsg-site • Sept. 1st: public access to released data
Test BAC annotation • Erwin Datema
Conclusions • ~26% (5.5 Mb out of 20.5 Mb) of the euchromatic part draft-sequenced • BAC walking strategy successful; continue with SNaPshot FP • start closure sequencing of BACs in Q4 2006 (EU-SOL) • assess the physical distribution of the current set of seed BACs by FISH • improve, deepen and curate structural annotation • integrated in EU-SOL
Acknowledgements • Rene Klein Lankhorst • Yuling Bai, Song-Bin Chang, Erwin Datema, Mark Fiers, Mark van Haaren, Jan van Haarst, Marleen Henkens, Thamara Hesselink, Taco Jesse, Hans de Jong, Ludmilla Khrustaleva, Pim Lindhout, Bas te Lintel Hekkert, Fien Meijer, Sander Peters, Marjo van Staveren, Willem Stiekema • Keygene NV • PRI, Applied Bioinformatics/Greenomics • WU, Genetics • WU, Plant Breeding