
Welcome - webinar instructions


nwallace

Presentation Transcript


  1. Welcome - webinar instructions • The webinar will start soon • GoToTraining works best in Chrome or, on Linux, Firefox • All microphones will be muted while the trainer is speaking • If you have a question, please use the chat box at the bottom of the GoToTraining box • Please complete the feedback survey, which will launch at the end of the webinar • The webinar will be recorded and added to Train online

  2. An Introductory Webinar Wojtek Bazant & Faye Rodgers https://parasite.wormbase.org parasite-help@sanger.ac.uk

  3. Outline • Why WormBase ParaSite? • Our genomes • Data available • BioMart • Questions

  4. Why WormBase ParaSite? • Helminths (parasitic roundworms and flatworms) are the causative agents of many diseases of humans, animals and plants • Increasing amounts of genomic data are becoming available to the helminth research community • WormBase ParaSite processes and presents that data in a consistent and accessible way

  5. Genomes and primary annotation (from the community). Analyses run for all genomes: protein domain prediction, GO term annotation, repeat annotation, ncRNA annotation, alignment of publicly available RNA-Seq data, linking IDs to external databases. Comparative analysis: build gene trees incorporating all genomes in the release (plus comparators) to predict orthologues and paralogues. Website - browsing: • Gene and species pages • JBrowse. REST API. Website - tools: • BLAST • BioMart

  6. Structure and features of the front page

  7. Our genomes

  8. Genome and species descriptions

  9. Finding information related to your scientific question If you know the gene name or ID, it’s just a search task! Otherwise, it is more like research. Common avenues: • BLAST the sequence • Text search to try to match a gene description • Search through a protein feature or GO term • Navigate through an orthologous gene in other species

  10. Data available for each gene

  11. Transcript and protein pages

  12. Data available for each gene

  13. “Region in detail” - embedded genome browser

  14. Alternative genome browser – JBrowse: better for a workbench view with multiple tracks

  15. Data available for each gene

  16. Links and references - UniProt etc.

  17. Literature

  18. Comparative Genomics Gene trees are computed with every release, classifying genes into families. These are reconciled with the species tree to infer orthologous and paralogous relationships. Tree legend: speciation node, duplication node. Tree views can be configured for exploring the gene family. Method: https://www.ensembl.org/info/genome/compara/homology_method.html
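The reconciliation idea on this slide can be illustrated with a toy example. The sketch below is a simplification, not the pipeline described at the linked method page: it assumes a gene tree whose internal nodes are already labelled as speciation or duplication, and calls two genes orthologues if their last common ancestor is a speciation node and paralogues if it is a duplication node. The `Node` class, the `classify_pairs` function, and the gene names are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    kind: str                 # "speciation", "duplication", or "leaf"
    gene: str = ""            # gene name, only used for leaves
    children: list = field(default_factory=list)


def leaves(node):
    """Return the gene names found below (or at) this node."""
    if node.kind == "leaf":
        return [node.gene]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def classify_pairs(node, result=None):
    """Label each gene pair by the type of its last common ancestor node."""
    if result is None:
        result = {}
    if node.kind == "leaf":
        return result
    label = "orthologue" if node.kind == "speciation" else "paralogue"
    # Genes that sit in different child subtrees first meet at this node.
    child_leaves = [leaves(child) for child in node.children]
    for left, right in combinations(child_leaves, 2):
        for a in left:
            for b in right:
                result[frozenset((a, b))] = label
    for child in node.children:
        classify_pairs(child, result)
    return result


# Toy family: an ancient duplication, followed by a speciation in each copy.
# Species "spA" and "spB" and all gene names are made up for illustration.
tree = Node("duplication", children=[
    Node("speciation", children=[Node("leaf", "spA_g1"), Node("leaf", "spB_g1")]),
    Node("speciation", children=[Node("leaf", "spA_g2"), Node("leaf", "spB_g2")]),
])
pairs = classify_pairs(tree)
for pair, label in sorted(pairs.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), label)
```

In this toy tree, `spA_g1` and `spB_g1` come out as orthologues, while `spA_g1` and `spB_g2` come out as paralogues even though they are in different species, because their lineages separated at the duplication.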

  19. Comparative Genomics E.g., highlight all of the paralogues:

  20. Comparative Genomics Orthologues and paralogues are also available in tabular format: • Lists can be exported from BioMart • Full gene trees can be accessed programmatically via the API

  21. BioMart A very powerful tool for accessing data in bulk without any programming knowledge. Filters can be combined to build more complex queries. Filters: the data type you’re basing your query on, e.g.: • Genome • Genomic region • A list of gene IDs • All genes annotated with a protein domain or a GO term • All genes that have an orthologue in a species. Values: the actual data you’re basing your query on, e.g.: • Schistosoma mansoni PRJEA36577 • Schistosoma mansoni Sm_V7_1 • Smp_035270, Smp_010250, Smp_244010… • SignalP • Genes with an orthologue in Schistosoma haematobium. Attributes: the data you want, e.g.: • Protein stable IDs • cDNA sequences • UniProt IDs • Protein domains • Orthologue names, % identity
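For readers who prefer scripting, the filter/value/attribute model maps naturally onto the generic BioMart XML query format. The sketch below only builds the XML text; it does not contact any server. The dataset name `wbps_gene`, the filter names `species_id_1010` and `wbps_gene_id`, the species value, and the attribute names are assumptions used as placeholders, so check the real names in the BioMart interface before relying on them.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string from filters and attributes.

    filters: dict mapping filter name -> value (lists are comma-separated)
    attributes: list of attribute names to return as columns
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include a header row
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are placeholders (assumptions), not confirmed identifiers.
xml_query = build_biomart_query(
    dataset="wbps_gene",
    filters={
        "species_id_1010": "scmansprjea36577",                # genome filter and its value
        "wbps_gene_id": "Smp_035270,Smp_010250,Smp_244010",   # gene ID list from the slide
    },
    attributes=["wbps_gene_id", "external_gene_id"],
)
print(xml_query)
```

Each `Filter` element pairs a filter with its value, and each `Attribute` element names one output column, mirroring the three columns on the slide. Submitting the query to a BioMart service is left out here because the endpoint details are not given in the presentation.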

  22. BioMart Walk-through example: using BioMart to retrieve S. mansoni genes from the ZW chromosome that have an orthologue in S. japonicum and S. haematobium. We want to return the S. mansoni, S. haematobium and S. japonicum gene IDs.

  23. To access BioMart from the home page

  24. Add a species filter

  25. Add a region filter

  26. Add homology filters

  27. Count how many genes fulfil our filter criteria

  28. Select output attributes

  29. Previewing the results we get by default

  30. Add orthologues to output attributes

  31. Scroll down to find the species that we’re interested in

  32. View a preview of your output, and download full results.

  33. BioMart Other examples of questions that can be answered with BioMart: For a list of gene IDs: • Convert to other types of identifier (UniProt, RefSeq, NCBI) • Retrieve associated protein domains, GO terms • Retrieve their genomic coordinates • Generate FASTA files of protein, cDNA, UTR, flanking region sequences etc. Retrieve a list of genes that: • Have a given protein domain/GO term • Have/do not have orthologues in species X, Y, Z • Are on genomic region X. For R users, WormBase ParaSite BioMart supports the biomaRt R package: see our help and documentation pages to get started.

  34. Outline • Why WormBase ParaSite? • Our genomes • Data available • BioMart • Questions

  35. Outline • Why WormBase ParaSite? • Our genomes • Data available • BioMart • Questions If we don’t get to your question, email parasite-help@sanger.ac.uk

  36. Sample question: I need the sequences for a set of Schistosoma mansoni genes. I have the chromosome, start, and stop for each. The suggested option. Other, more creative approaches? • Download the GFF and the sequence files from the FTP, and write a program • Check the cases one by one • Use the API: first the "region" endpoint to get gene IDs, then the "sequence" endpoint • Email the helpdesk (it might work)
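As an illustration of the "write a program" route, here is a minimal sketch that slices sequences out of a FASTA file given chromosome, start, and stop. It assumes 1-based, inclusive coordinates (the convention used in GFF files) and ignores strand, so reverse-strand genes would still need reverse-complementing. The function names and the toy sequence are hypothetical; a real run would read the genome FASTA downloaded from the FTP site.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of sequence name -> sequence string."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def extract_region(seqs, chrom, start, stop):
    """Return the subsequence for 1-based, inclusive coordinates."""
    seq = seqs[chrom]
    if start < 1 or stop < start or stop > len(seq):
        raise ValueError(f"bad coordinates {chrom}:{start}-{stop}")
    return seq[start - 1:stop]


# Toy data standing in for a genome FASTA and a list of gene coordinates.
toy_fasta = """>toy_chr1 made-up example sequence
ACGTACGTAC
GGTTAACC
"""
regions = [("toy_chr1", 3, 8), ("toy_chr1", 9, 12)]

seqs = parse_fasta(toy_fasta)
for chrom, start, stop in regions:
    print(f">{chrom}:{start}-{stop}")
    print(extract_region(seqs, chrom, start, stop))
```

Holding the whole genome in memory is fine for a sketch; for many regions on a large assembly, an indexed FASTA reader would be a more practical choice.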

  37. BioMart Example 2 Using BioMart to generate a protein FASTA file from a list of gene IDs

  38. Select filter(s).

  39. Paste in gene IDs.

  40. In output attributes, select “Retrieve sequence”

  41. Select the type of sequence we’re interested in. Select the information we’d like in the FASTA header.

  42. Preview and download output.

  43. Upcoming webinars See the full list of upcoming webinars at https://www.ebi.ac.uk/training/webinars Don’t forget! Please fill in the survey that launches after the webinar – thanks!
