Welcome - webinar instructions • The webinar will start soon • GoToTraining works best in Chrome or, on Linux, in Firefox • All microphones will be muted while the trainer is speaking • If you have a question, please use the chat box at the bottom of the GoToTraining box • Please complete the feedback survey, which will launch at the end of the webinar • The webinar will be recorded and added to Train online
An Introductory Webinar Wojtek Bazant & Faye Rodgers https://parasite.wormbase.org parasite-help@sanger.ac.uk
Outline • Why WormBase ParaSite? • Our genomes • Data available • BioMart • Questions
Why WormBase ParaSite? • Helminths (parasitic roundworms and flatworms) are the causative agents of many diseases of humans, animals and plants • Increasing amounts of genomic data are becoming available to the helminth research community • WormBase ParaSite processes and presents that data in a consistent and accessible way
Genomes and primary annotation (from the community)
Analyses run for all genomes: protein domain prediction, GO term annotation, repeat annotation, ncRNA annotation, alignment of publicly available RNA-Seq data, linking IDs to external databases
Comparative analysis: build gene trees incorporating all genomes in the release (plus comparators) to predict orthologues and paralogues
Website - browsing
• Gene and species pages
• JBrowse
REST API
Website - tools
• BLAST
• BioMart
Finding information related to your scientific question If you know the gene name or ID, it’s just a search task! Otherwise, it is more like research. Common avenues: • BLAST the sequence • Text search to try to match a gene description • Search through a protein feature or GO term • Navigate through an orthologous gene in other species
Alternative genome browser – JBrowse Better for a workbench view with multiple tracks
Comparative Genomics Gene trees are computed with every release, classifying genes into families. These are reconciled with the species tree to infer orthologous and paralogous relationships. Tree legend: speciation node, duplication node. Tree views can be configured for exploring the gene family. Method: https://www.ensembl.org/info/genome/compara/homology_method.html
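To make the speciation/duplication distinction concrete, here is a minimal sketch of the usual rule of thumb: two genes are orthologues if their last common ancestor in the reconciled gene tree is a speciation node, and paralogues if it is a duplication node. The tree, the tuple layout and the gene names are all made up for illustration, and the real pipeline distinguishes more homology categories than this toy does.

```python
# A toy reconciled gene tree: internal nodes are (event, left, right) tuples,
# leaves are gene names. Every name here is hypothetical.
TREE = (
    "speciation",
    ("duplication", "speciesA_gene1", "speciesA_gene2"),
    "speciesB_gene1",
)


def leaves(node):
    """Return the list of gene names found under a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every pair of genes by the event at their last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    relation = "orthologue" if event == "speciation" else "paralogue"
    # Genes on opposite sides of this node have it as their last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


pairs = classify_pairs(TREE)
for pair, relation in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(" / ".join(sorted(pair)), "->", relation)
```

In this toy tree the two speciesA genes come out as paralogues of each other, and each of them is an orthologue of the speciesB gene.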
Comparative Genomics E.g., highlight all of the paralogues:
Comparative Genomics Orthologues and paralogues are also available in tabular format: • Lists can be exported from BioMart • Full gene trees can be accessed programmatically via the API
BioMart
A very powerful tool for accessing data in bulk without any programming knowledge. Filters can be combined to build more complex queries.
Filters (the data type you’re basing your query on) and values (the actual data you’re basing your query on), e.g.:
• Genome: Schistosoma mansoni PRJEA36577
• Genomic region: Schistosoma mansoni Sm_V7_1
• A list of gene IDs: Smp_035270, Smp_010250, Smp_244010…
• All genes annotated with a protein domain or a GO term: SignalP
• All genes that have an orthologue in a species: genes with an orthologue in Schistosoma haematobium
Attributes (the data you want), e.g.:
• Protein stable IDs
• cDNA sequences
• Uniprot IDs
• Protein domains
• Orthologue names, % identity
BioMart Walk-through example: using BioMart to retrieve S. mansoni genes from the ZW chromosome that have an orthologue in both S. japonicum and S. haematobium. We want to return the S. mansoni, S. haematobium and S. japonicum gene IDs.
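The same kind of query can also be written down as BioMart query XML, which is what BioMart web interfaces generally produce behind the scenes. The sketch below only builds the XML string and does not contact any server. The schema name `parasite_mart`, the dataset name `wbps_gene`, and every filter and attribute name are assumptions or outright placeholders; the real internal names should be taken from the XML that the BioMart web page generates for your query.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, value_filters, boolean_filters, attributes,
                        schema="parasite_mart"):
    """Build a BioMart query XML string (nothing is sent over the network)."""
    query = ET.Element("Query", {
        "virtualSchemaName": schema,  # assumed schema name
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters that take a value, e.g. a genome or a chromosome name.
    for name, value in value_filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Boolean filters: excluded="0" keeps only the genes that match.
    for name in boolean_filters:
        ET.SubElement(ds, "Filter", {"name": name, "excluded": "0"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All filter and attribute names below are HYPOTHETICAL placeholders.
xml_query = build_biomart_query(
    dataset="wbps_gene",  # assumed dataset name
    value_filters={
        "genome_filter": "Schistosoma mansoni PRJEA36577",
        "chromosome_filter": "NAME_OF_ZW_CHROMOSOME",  # look up the real name
    },
    boolean_filters=[
        "has_orthologue_in_s_japonicum",
        "has_orthologue_in_s_haematobium",
    ],
    attributes=[
        "gene_id",
        "s_japonicum_orthologue_gene_id",
        "s_haematobium_orthologue_gene_id",
    ],
)
print(xml_query)
```

Listing two boolean filters means both must hold, which matches the walk-through's requirement of an orthologue in both species.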
BioMart
Other examples of questions that can be answered with BioMart:
• For a list of gene IDs:
  • Convert to other types of identifier (Uniprot, RefSeq, NCBI)
  • Retrieve associated protein domains, GO terms
  • Retrieve their genomic coordinates
  • Generate FASTA files of protein, cDNA, UTR, flanking region sequences etc.
• Retrieve a list of genes that:
  • Have a given protein domain/GO term
  • Have/do not have orthologues in species X, Y, Z
  • Are on genomic region X
For R users, WormBase ParaSite BioMart supports the biomaRt R package: see our help and documentation pages to get started.
Questions If we don’t get to your question, email parasite-help@sanger.ac.uk
Sample question
I need the sequences for a set of Schistosoma mansoni genes. I have the chromosome, start, and stop for each.
The suggested option
Other, more creative approaches?
• Download the GFF and the sequence files from the FTP, and write a program
• Check the cases one by one
• Use the API: first the "region" endpoint to get gene IDs, then the "sequence" endpoint
• Email the helpdesk (it might work)
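As an illustration of the "write a program" approach, here is a minimal sketch that slices regions out of a genome FASTA file. It assumes the coordinates are 1-based and inclusive, as in GFF files, and it returns the forward strand only. The toy sequences and the function names are made up; a real script would open the genome file downloaded from the FTP site instead of the in-memory example.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a {sequence_name: sequence} dictionary."""
    sequences, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Use the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract(sequences, chrom, start, stop):
    """Return the forward-strand sequence for 1-based, inclusive coordinates."""
    seq = sequences[chrom]
    if not 1 <= start <= stop <= len(seq):
        raise ValueError(f"{chrom}:{start}-{stop} is outside the sequence")
    return seq[start - 1:stop]


# Toy genome standing in for the downloaded sequence file.
toy = io.StringIO(">chr1 toy sequence\nACGTAC\nGTTTGA\n>chr2\nGGGCCC\n")
genome = read_fasta(toy)

# (chromosome, start, stop) for each region of interest.
regions = [("chr1", 2, 5), ("chr1", 6, 8), ("chr2", 1, 3)]
results = {region: extract(genome, *region) for region in regions}
for (chrom, start, stop), seq in results.items():
    print(f">{chrom}:{start}-{stop}\n{seq}")
```

For genes on the reverse strand the extracted sequence would still need to be reverse-complemented, which this sketch leaves out.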
BioMart Example 2 Using BioMart to generate a protein FASTA file from a list of gene IDs
Select the type of sequence we’re interested in. Select the information we’d like in the FASTA header.
Upcoming webinars See the full list of upcoming webinars at https://www.ebi.ac.uk/training/webinars Don’t forget! Please fill in the survey that launches after the webinar – thanks!