Welcome - webinar instructions
• The webinar will start soon
• GoToTraining works best in Chrome or, on Linux, Firefox
• All microphones will be muted while the trainer is speaking
• If you have a question, please use the chat box at the bottom of the GoToTraining box
• Please complete the feedback survey, which will launch at the end of the webinar
• The webinar will be recorded and added to Train online
Variant submission and accessioning at the European Variation Archive Baron Koylass www.ebi.ac.uk/eva eva-helpdesk@ebi.ac.uk
Webinar roadmap
• European Variation Archive overview & demo
• Variant representation and submission
• dbSNP data exchange and release
• Take-away message and additional information
Baron Koylass @evarchive www.ebi.ac.uk/eva eva-helpdesk@ebi.ac.uk
European Variation Archive overview & demo https://www.ebi.ac.uk/ena/about/data-repositories
European Variation Archive overview & demo
900+ million variants, 63 species
Metadata
• Study
• Experiment/Analyses pipeline
• Samples
Annotation
• Effect on genes/transcripts
• Functional consequence
• Population frequencies
Variants
• Genetic variants from any species
• Within/across population, including subspecies
Variant submission and accessioning at the European Variation Archive EVA WEBSITE DEMO
European Variation Archive overview & demo
• Programmatic access through a web services API
• Results provided in JSON
• Easily parsed by Python, R, Java
• Web services for files, segments (regions), studies, variants, genes
• Full documentation at the EVA website: https://www.ebi.ac.uk/eva/?API
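As a rough illustration of what "JSON, easily parsed by Python" can look like in practice, here is a minimal sketch. The base URL, the `segments/.../variants` path layout, the `species` and `limit` parameters, the species code and the payload shape are all assumptions made for illustration; the API documentation linked on the slide is the authority. No network call is made: the sample payload is hypothetical and hard-coded.

```python
import json
from urllib.parse import quote, urlencode

# Assumed base URL and path layout; check https://www.ebi.ac.uk/eva/?API before use.
BASE_URL = "https://www.ebi.ac.uk/eva/webservices/rest/v1"


def segment_variants_url(region, species, limit=10):
    """Build a request URL for variants in a region such as '9:2453000-2454000'."""
    query = urlencode({"species": species, "limit": limit})
    return f"{BASE_URL}/segments/{quote(region, safe=':-,')}/variants?{query}"


def extract_results(payload):
    """Collect every item from the 'result' lists under the top-level 'response' key."""
    results = []
    for block in payload.get("response", []):
        results.extend(block.get("result", []))
    return results


# Hypothetical species code and hypothetical payload, for illustration only.
url = segment_variants_url("9:2453000-2454000", "btaurus_umd31")
sample_payload = json.loads(
    '{"response": [{"numResults": 1, "result": '
    '[{"chromosome": "9", "start": 2453564, "reference": "A", "alternate": "G"}]}]}'
)
variants = extract_results(sample_payload)
for v in variants:
    print(v["chromosome"], v["start"], v["reference"], v["alternate"])
```

In a real script the hard-coded payload would be replaced by the decoded body of an HTTP GET on `url`; the field names should be adjusted to whatever the live service returns.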
Variant representation and submission
[Submission workflow diagram: submission metadata, VCF files, data validation, BioSamples, ENA, EVA]
Project accession: PRJEB27789
Variant submission and accessioning at the European Variation Archive Metadata and Variant Call Format demo
Variant representation and submission Human Non-human
Variant representation and submission
Initial submitted variant, Remapped to assembly 2, Remapped to assembly 1
Chr9 2548987 T A (Btau_5.0.1 - GCA_000003205.6)
Chr9 2609956 A G (Bos_taurus_UMD_3.1.1 - GCA_000003055.5)
Chr9 2453564 A G (ARS-UCD1.2 - GCA_002263795.2)
• ss333
• ss111
• ss222
• rs123456789
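To make the coordinate notation on this slide concrete, the sketch below writes one of the example variants (Chr9 2453564 A G on GCA_002263795.2) as a bare VCF v4.3 data line. The helper names are hypothetical, and using the assembly accession as the `##reference` value is an assumption for illustration. The sketch leaves QUAL, FILTER and INFO as missing values and carries no sample columns, so a real submission would likely need more and should be checked with the EVA validation suite.

```python
VCF_COLUMNS = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"


def vcf_record(chrom, pos, ref, alt, variant_id="."):
    """Format one tab-separated VCF data line; '.' marks a missing value."""
    return "\t".join([chrom, str(pos), variant_id, ref, alt, ".", ".", "."])


def minimal_vcf(reference, records):
    """Assemble a bare-bones VCF text: two meta lines, the column header, then records."""
    header = ["##fileformat=VCFv4.3", f"##reference={reference}", VCF_COLUMNS]
    return "\n".join(header + list(records)) + "\n"


# Coordinates taken from the slide; the ID is left as '.' (no identifier assigned).
record = vcf_record("Chr9", 2453564, "A", "G")
vcf_text = minimal_vcf("GCA_002263795.2", [record])
print(vcf_text, end="")
```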
dbSNP data exchange and release
• Import of all non-human variant data from dbSNP.
• Variants will be available in the Variant Browser if they satisfy the EVA submission requirements.
• dbSNP variants that don't satisfy these requirements will still be imported, and will be searchable via a separate web view and API.
• We will work to make this experience as intuitive as possible, while keeping our commitment to make only high-quality variants part of the core EVA database.
• First release consists of a variety of files for each species/assembly...
dbSNP data exchange and release
• GCA_000002285.2_current_ids.vcf
• *species*_*taxID*_unmapped_ids.txt
• GCA_000002285.2_merged_ids.vcf
• GCA_000002285.2_deprecated_ids.txt
• GCA_000002285.2_merged_deprecated_ids.txt
GCA_000002285.2_current_ids.vcf
• Contains the active variants that satisfy the EVA submission requirements
• Contains the RS IDs which can be browsed from the EVA website
• The RS IDs were originally assigned by dbSNP; the variants have been validated and can be mapped back to the associated assembly
• Contig/chromosome name provided in header column
dbSNP data exchange and release
*species*_*taxID*_unmapped_ids.txt
• Contains RS IDs that couldn't be mapped against an assembly by dbSNP
• Flanking sequences are provided when possible
dbSNP data exchange and release
GCA_000002285.2_merged_ids.vcf
• Contains RS IDs that should NOT be used
• They have been merged into other active RS IDs that represent the same variants
• Searchable on the EVA website, where a link to the parent RS ID will be provided
dbSNP data exchange and release
GCA_000002285.2_deprecated_ids.txt
• Contains a list of RS IDs that should also NOT be used
• These RS IDs were deprecated (e.g. due to missing information)
dbSNP data exchange and release
GCA_000002285.2_merged_deprecated_ids.txt
• Contains RS IDs that should NOT be used
• They have been merged into an RS ID that was deprecated later on
dbSNP data exchange and release
• GCA_000002285.2_current_ids.vcf - Contains the RS IDs which can be browsed from the EVA website.
• *species*_*taxID*_unmapped_ids.txt - Contains RS IDs that couldn't be mapped against an assembly by dbSNP. Flanking sequences are provided when possible.
• GCA_000002285.2_merged_ids.vcf - Contains RS IDs that should NOT be used because they have been merged into other active RS IDs that represent the same variants.
• GCA_000002285.2_deprecated_ids.txt - Contains a list of RS IDs that should NOT be used since these RS IDs were deprecated (e.g. due to missing information).
• GCA_000002285.2_merged_deprecated_ids.txt - Contains RS IDs that should NOT be used because they have been merged into an RS ID that was deprecated later on.
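For anyone scripting against a release directory, the naming scheme summarised above can be turned into a small lookup. This is a sketch only: the suffixes and descriptions paraphrase the slides, while the function name and the `DO_NOT_USE` set are hypothetical helpers introduced here. Actual file names in a release may differ (for example, compressed files), so treat the matching as an assumption.

```python
# Suffix -> short description, paraphrasing the slides.
RELEASE_FILE_KINDS = {
    "_current_ids.vcf": "active RS IDs, browsable from the EVA website",
    "_unmapped_ids.txt": "RS IDs dbSNP could not map against an assembly",
    "_merged_ids.vcf": "RS IDs merged into other active RS IDs",
    "_deprecated_ids.txt": "deprecated RS IDs (e.g. missing information)",
    "_merged_deprecated_ids.txt": "RS IDs merged into an RS ID deprecated later on",
}

# Kinds the slides explicitly say should NOT be used.
DO_NOT_USE = {"_merged_ids.vcf", "_deprecated_ids.txt", "_merged_deprecated_ids.txt"}


def classify_release_file(filename):
    """Return (suffix, description) for a release file name, or raise ValueError."""
    # Longest suffix first: '_merged_deprecated_ids.txt' also ends with
    # '_deprecated_ids.txt', so a naive first-match lookup could mislabel it.
    for suffix in sorted(RELEASE_FILE_KINDS, key=len, reverse=True):
        if filename.endswith(suffix):
            return suffix, RELEASE_FILE_KINDS[suffix]
    raise ValueError(f"unrecognised release file name: {filename}")


for name in (
    "GCA_000002285.2_current_ids.vcf",
    "GCA_000002285.2_merged_deprecated_ids.txt",
):
    suffix, description = classify_release_file(name)
    flag = "do not use" if suffix in DO_NOT_USE else "usable"
    print(f"{name}: {description} ({flag})")
```

Checking the longest suffix first is the one design choice worth noting, since two of the file names share a common ending.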
Take-away message
• The EVA provides permanent archival and accessioning for genetic variation from any species; the data can be consumed via the study/variant browser and the API
• Variants are accepted in VCF format; various online tools help with generating VCF files, such as the EVA validation suite, the EVA VCF template and other third-party tools
• dbSNP non-human species RS release - useful for previous dbSNP users and those working with non-human species
The EVA team
Thomas Keane, Cristina Yenyxe Gonzalez, Andres Silva, Kirill Tsukanov, Sundararaman Venkataraman, Jose Miguel Mut Lopez, Baron Koylass
Funding
Additional information
• Validation suite for VCF files - https://github.com/EBIvariation/vcf-validator
• EVA help page for VCF file generation, including VCF file template - https://wwwdev.ebi.ac.uk/eva/?Help
• Official VCF specification - https://samtools.github.io/hts-specs/VCFv4.3.pdf
• EVA dbSNP import progress page - https://www.ebi.ac.uk/eva/?dbSNP-Import-Progress
Upcoming webinars
See the full list of upcoming webinars at https://www.ebi.ac.uk/training/webinars
Don’t forget! Please fill in the survey that launches after the webinar – thanks!