Collective annotation of the Ixodes scapularis genome: VectorBase, MSCs and the tick community.


Presentation Transcript


  1. Collective annotation of the Ixodes scapularis genome: VectorBase, MSCs and the tick community. Daniel Lawson, VectorBase. BRC6, 28th October 2008

  2. Arthropod vectors of human pathogens: Lutzomyia, Phlebotomus, Glossina, Ixodes, Pediculus, Aedes, Culex, Rhodnius, Anopheles

  3. Deer tick Ixodes scapularis • Vector of Lyme disease (spirochete Borrelia burgdorferi) • Estimated genome size of 2.1 Gb • Sequenced strain: Wikel (12th generation from ticks sourced from New York, Oklahoma & Connecticut) • First chelicerate genome to be sequenced

  4. Genome annotation cycle (diagram) • ESTs, cDNAs • Assembly • Repeat library (TEs etc.) • Other genomes, gene sets • Manual annotations • Community annotations • Protein domains • Automatic gene build

  5. Generating sequence • Sequencing undertaken by established sequencing centres (e.g. Broad, JCVI) • Initial assembly annotated in collaboration with the sequencing centre(s) • 19,300,000 trace reads generated • Approx. 6x WGS coverage • 570K BAC end sequences • Assembly produced at JCVI • 194K EST sequences
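
As a rough consistency check on the figures on this slide, the sketch below derives the mean read length implied by 19,300,000 reads at roughly 6x coverage of the 2.1 Gb genome estimate given on slide 3. The function name is illustrative and not part of any VectorBase or JCVI tool, and the calculation assumes that coverage is simply total read bases divided by genome size, ignoring trimming and failed reads.

```python
def implied_read_length(genome_size_bp: float, coverage: float, n_reads: int) -> float:
    """Mean read length (bp) implied by coverage = n_reads * read_length / genome_size."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return genome_size_bp * coverage / n_reads


# Figures taken from slides 3 and 5; "approx. 6x" is treated as exactly 6 here.
GENOME_SIZE_BP = 2.1e9
WGS_COVERAGE = 6
TRACE_READS = 19_300_000

mean_len = implied_read_length(GENOME_SIZE_BP, WGS_COVERAGE, TRACE_READS)
print(f"Implied mean read length: {mean_len:.0f} bp")
```

Under these assumptions the implied mean read length is about 650 bp, so the read count and the stated coverage are mutually consistent for a genome of the estimated size.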

  6. Assembly statistics • This WGS project has the project accession ABJB000000000. The current version of the project (01) has the accession number ABJB010000000, and consists of 1,141,594 scaffolds (ABJB010000001-ABJB011141594). • Released assembly IscaW1: 570,637 contigs; 369,495 supercontigs; assembled coverage of 3.8x
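
The accession range quoted on this slide appears to follow the usual WGS layout of a four-letter project prefix, a two-digit assembly version and a serial number. A minimal sketch, assuming that layout, for checking that a range spans the stated number of records; the helper names are hypothetical and not taken from any GenBank client library.

```python
import re

# Assumed layout: 4 uppercase letters, 2-digit version, then a numeric serial.
WGS_ACCESSION = re.compile(r"^([A-Z]{4})(\d{2})(\d+)$")


def parse_wgs_accession(accession: str) -> tuple[str, int, int]:
    """Split a WGS accession into (project prefix, version, serial number)."""
    match = WGS_ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not a WGS-style accession: {accession!r}")
    prefix, version, serial = match.groups()
    return prefix, int(version), int(serial)


def records_in_range(first: str, last: str) -> int:
    """Number of records in an inclusive accession range of one project version."""
    prefix_a, version_a, serial_a = parse_wgs_accession(first)
    prefix_b, version_b, serial_b = parse_wgs_accession(last)
    if (prefix_a, version_a) != (prefix_b, version_b):
        raise ValueError("range endpoints belong to different projects or versions")
    if serial_b < serial_a:
        raise ValueError("range endpoints are out of order")
    return serial_b - serial_a + 1


print(records_in_range("ABJB010000001", "ABJB011141594"))  # 1141594
```

The result matches the 1,141,594 records quoted for version 01 of the project.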

  7. Preparing for gene build • Repeat masking: analyses to identify repeat elements (RepeatScout, RECON); standard tandem-repeat & low-complexity filtering • Collate data sets: transcripts (cDNA & EST data); peptides (taxonomic groupings, inc. Daphnia pulex) • Train gene predictors, mainly Augustus (JCVI)
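
The slide does not say how masked regions were represented. One common convention is soft-masking, where repeat intervals are written in lowercase so that gene predictors can down-weight them without losing the sequence. The sketch below illustrates that idea only; the function names and the 1-based inclusive interval format are assumptions, not the pipeline used for IscaW1.

```python
def soft_mask(sequence: str, repeat_intervals: list[tuple[int, int]]) -> str:
    """Lowercase every base inside the given 1-based, inclusive intervals."""
    bases = list(sequence.upper())
    for start, end in repeat_intervals:
        # Clamp each interval to the sequence so out-of-range hits are ignored.
        for index in range(max(start, 1) - 1, min(end, len(bases))):
            bases[index] = bases[index].lower()
    return "".join(bases)


def masked_fraction(masked_sequence: str) -> float:
    """Fraction of bases that are soft-masked (lowercase)."""
    if not masked_sequence:
        return 0.0
    return sum(base.islower() for base in masked_sequence) / len(masked_sequence)


# Toy example with two hypothetical repeat hits on a 10 bp sequence.
masked = soft_mask("ACGTACGTAC", [(3, 5), (9, 10)])
print(masked, masked_fraction(masked))
```

Overlapping hits from different repeat finders need no special handling here, because lowercasing a base twice has the same effect as lowercasing it once.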

  8. Annotation plan • First-pass gene prediction • Focused on protein-coding genes (CDSs) • Semi-automated approach • This is not manual curation • Involvement of community where possible • Timely delivery of gene set

  9. Gene Prediction • Each group/centre has its own gene prediction pipeline/protocol • Each group produces a 1st-pass ‘best guess’ set of predictions (0.5 sets, public release) • These sets are merged into a single set (1.0 set, not released) • Quality control activities (1.1 set, public release) • Which is annotated with protein features • ... and released to the wider world

  10. Merging gene predictions (flowchart) • Inputs: gene set #1, gene set #2 • Reduce to single predictions per locus • Compare exon/intron structures • Categories: identical structures, compatible structures, different structures, merge/split structures, complex, no map • Add isoform predictions based on EST/peptide data • Output: canonical gene set
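
The comparison step in this flowchart can be illustrated with a small sketch. The slide does not define the categories, so the definitions below are assumptions: two models are "identical" if their exon lists match, "compatible" if they agree on every intron within the region they share but differ in extent, and "different" otherwise. A model overlapping several models in the other set is called "merge/split", and one overlapping none is "no map". The "complex" category is not modelled. Exons are hypothetical (start, end) pairs, 1-based and inclusive, on a single strand.

```python
Exons = list[tuple[int, int]]


def span(exons: Exons) -> tuple[int, int]:
    """Outermost coordinates covered by a gene model."""
    return min(s for s, _ in exons), max(e for _, e in exons)


def clipped_introns(exons: Exons, lo: int, hi: int) -> set[tuple[int, int]]:
    """Introns of a model, trimmed to the window [lo, hi]."""
    ordered = sorted(exons)
    result = set()
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        start, end = max(prev_end + 1, lo), min(next_start - 1, hi)
        if start <= end:
            result.add((start, end))
    return result


def classify_pair(a: Exons, b: Exons) -> str:
    """Compare two overlapping single-transcript models."""
    if sorted(a) == sorted(b):
        return "identical"
    (a_lo, a_hi), (b_lo, b_hi) = span(a), span(b)
    lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
    if lo > hi:
        return "no map"
    if clipped_introns(a, lo, hi) == clipped_introns(b, lo, hi):
        return "compatible"
    return "different"


def classify_against_set(model: Exons, other_set: list[Exons]) -> str:
    """Classify one model against all models of the other gene set."""
    lo, hi = span(model)
    hits = [m for m in other_set if span(m)[0] <= hi and lo <= span(m)[1]]
    if not hits:
        return "no map"
    if len(hits) > 1:
        return "merge/split"
    return classify_pair(model, hits[0])


set_1_model = [(100, 200), (300, 400)]
set_2 = [[(150, 200), (300, 400), (500, 600)]]
print(classify_against_set(set_1_model, set_2))  # compatible
```

Comparing introns rather than exons keeps the check insensitive to differing UTR or terminal exon lengths, which is one plausible reading of "compatible structures"; a production merge would also need strand, phase and multi-isoform handling.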

  11. Merge annotation comparisons

  12. Examples: Isoform-compat, Isoform-diff

  13. Examples: Merge/Splits, Difficult

  14. GBrowse viewer

  15. VectorBase browser

  16. Final gene set (IscaW1.1) • 20,486 protein-coding genes • 48% have a Pfam domain • 40% have supporting EST evidence • 8,138 tRNAs • Over-prediction of Ser (4,425) and Thr (1,527) tRNAs • 301 ncRNAs • Submitted to GenBank last week, release to be coordinated in the next couple of weeks
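
To put the percentages and the tRNA over-prediction on this slide into absolute terms, the following sketch converts them into approximate counts. The variable and function names are illustrative; because the slide gives rounded percentages, the derived gene counts are estimates rather than reported figures.

```python
PROTEIN_CODING_GENES = 20_486
TRNA_PREDICTIONS = 8_138
SER_TRNAS, THR_TRNAS = 4_425, 1_527


def approx_count(total: int, percent: float) -> int:
    """Approximate count implied by a rounded percentage of a total."""
    return round(total * percent / 100)


genes_with_pfam = approx_count(PROTEIN_CODING_GENES, 48)
genes_with_est = approx_count(PROTEIN_CODING_GENES, 40)

# Share of tRNA predictions attributed to the two over-predicted isotypes.
ser_thr_trnas = SER_TRNAS + THR_TRNAS
other_trnas = TRNA_PREDICTIONS - ser_thr_trnas
ser_thr_percent = 100 * ser_thr_trnas / TRNA_PREDICTIONS

print(f"~{genes_with_pfam} genes with a Pfam domain, ~{genes_with_est} with EST support")
print(f"{ser_thr_trnas} Ser/Thr tRNAs ({ser_thr_percent:.1f}%), {other_trnas} other tRNAs")
```

On these numbers, Ser and Thr predictions make up roughly 73% of the tRNA set, leaving about 2,200 predictions for all other isotypes.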

  17. Genome annotation cycle (diagram) • ESTs, cDNAs • Assembly • Repeat library (TEs etc.) • Other genomes, gene sets • Manual annotations • Community annotations • Protein domains • Automatic gene build

  18. Community annotation • Workflow (diagram): Gene Build, GFF3, Web submission, CHADO, Researcher, Approval, Appraisal, Community representative • Total: 13,339 entries (An. gambiae 9,423; Cx. quinquefasciatus 2,598; Ae. aegypti 1,281; Ix. scapularis 37)
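
Since community annotations enter this workflow as GFF3, a basic sanity check on each submitted feature line can be sketched as follows. This assumes only the standard nine tab-separated GFF3 columns; the function name and the example record are hypothetical and do not reflect VectorBase's actual submission validator or CHADO loader.

```python
from urllib.parse import unquote

GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_feature(line: str) -> dict:
    """Parse one GFF3 feature line and apply a few basic checks."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"], record["end"] = int(record["start"]), int(record["end"])
    if record["start"] < 1 or record["start"] > record["end"]:
        raise ValueError("coordinates must be 1-based with start <= end")
    if record["strand"] not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {record['strand']!r}")
    # Column 9 holds ';'-separated tag=value pairs with URL-style escaping.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:
                tag, _, value = pair.partition("=")
                attributes[unquote(tag)] = unquote(value)
    record["attributes"] = attributes
    return record


# Hypothetical community-submitted gene on a made-up supercontig name.
example = "supercontig_1\tcommunity\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=example%20gene"
feature = parse_gff3_feature(example)
print(feature["type"], feature["start"], feature["end"], feature["attributes"]["Name"])
```

Checks like these catch malformed rows before the approval and appraisal steps; they do not assess whether the proposed gene structure is biologically sound.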

  19. Community annotation track in browser

  20. Lessons • Plan for sequencing and annotation of new genomes is well established (MSC & BRC) • Clearly defining the data release strategies (0.5, 1.0 & 1.1) • Monthly conference calls • Face-to-face meeting when merging 0.5 gene predictions • Coordinated release between MSC, VectorBase and GenBank

  21. But we can always improve • Agreement on project/public identifiers at the start of the project (primarily contigs and supercontigs) • Overall nomenclature applied as final step in annotation • More QC before the major milestones • Better communication

  22. Acknowledgements EMBL-EBI Harvard IMBB Imperial Notre Dame • Ewan Birney • Martin Hammond • Daniel Lawson • Karyn Megy • Bill Gelbart • Kathy Campbell • Kitsos Louis • Pantelis Topalis • Emmanuel Dialynas • Fotis Kafatos • George Christophides • Bob MacCallum • Seth Redmond • Frank Collins • Greg Madey • Scott Emrich • Ryan Butler • Katie Cybulski • Nate Konopinski • Rob Bruggner (alumni) • E.O. Stinson (alumni) Aedes Anopheles Culex Ixodes • Frank Collins • Neil Lobo • Peter Atkinson • Peter Arensburger • Catherine Hill • Jason Meyer • Dave Severson • Neil Lobo Colleagues Sequencers { JCVI & Broad Institute } BRCs { Pathema, ApiDB } Ensembl { Genebuilders, Web, Compara, Core, Outreach }
