How to Access the Data
Laura Clarke
Data Types
• Sequence
  • Fastq
    • Pilot and Phase 1 represent a mix of technologies/read lengths
    • Final Phase represents >=70bp paired-end Illumina
• Alignment
  • BAM
• Variants
  • VCF
• Metadata and Reference Data Sets
  • bed
  • gff
  • fasta
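As a rough illustration of the list above, the sketch below maps a file name to one of the listed data types by its extension. The file names in the usage lines are hypothetical, and the handling of compression suffixes such as `.gz` is an assumption about how files are commonly stored, not something stated on the slide.

```python
from pathlib import PurePosixPath

# Extension-to-type mapping taken from the Data Types list.
DATA_TYPES = {
    ".fastq": "sequence",
    ".bam": "alignment",
    ".vcf": "variants",
    ".bed": "metadata/reference",
    ".gff": "metadata/reference",
    ".fasta": "metadata/reference",
}

# Assumption: files may carry a trailing compression suffix.
COMPRESSION_SUFFIXES = {".gz", ".bgz"}


def classify(filename: str) -> str:
    """Return the data type for a file name, or "unknown"."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    # Drop trailing compression suffixes so "x.vcf.gz" is seen as ".vcf".
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        return "unknown"
    return DATA_TYPES.get(suffixes[-1], "unknown")


# Hypothetical file names, for illustration only.
print(classify("example.filt.fastq.gz"))
print(classify("example.calls.vcf.gz"))
```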
Data Availability
• FTP site: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
  • Raw Data Files
• AWS Amazon Cloud: http://aws.amazon.com/1000genomes/
  • FTP mirror
• Web site: http://www.1000genomes.org
  • Release Announcements
  • Documentation
• Ensembl Style Browser: http://browser.1000genomes.org
  • Browse 1000 Genomes variants in Genomic Context
  • Variant Effect Predictor
  • Data Slicer
  • Other Tools
FTP Site
• Two mirrored FTP sites
  • ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp
  • ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp
• NCBI site is a direct mirror of the EBI site
  • Can be up to 24 hours out of date
• Both are also accessible using Aspera
  • http://asperasoft.com/
• EBI site has an HTTP mirror
  • http://ftp.1000genomes.ebi.ac.uk/vol1/ftp
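Because the mirrors share one directory layout, the same relative path should resolve on each of them. A minimal sketch, assuming that layout holds: the helper name `mirror_url` and the mirror keys are made up for illustration, and only the three root URLs come from the slide.

```python
# Root URLs as listed on the slide.
MIRRORS = {
    "ebi": "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp",
    "ebi_http": "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp",
    "ncbi": "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp",
}


def mirror_url(relative_path: str, mirror: str = "ebi") -> str:
    """Build the URL of a file on the chosen mirror (hypothetical helper)."""
    if mirror not in MIRRORS:
        raise ValueError(f"unknown mirror: {mirror!r}")
    # Strip a leading slash so the join never produces "//".
    return f"{MIRRORS[mirror]}/{relative_path.lstrip('/')}"


# current.tree sits at the root of the FTP site.
for name in MIRRORS:
    print(mirror_url("current.tree", name))
```

Since the NCBI copy can lag by up to 24 hours, a recently added file may resolve on the EBI mirror but not yet on NCBI.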
FTP Site
• ftp://ftp.1000genomes.ebi.ac.uk
• ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp
• Documentation
• Raw Data
• Phase 1 Data
• Pilot Data
• Release Data
• Technical Data
The FTP Site: Data
• Sample Level Files
  • sequence_read
  • alignment
  • cg_data
FTP Site: Technical
• Reference Data Sets
• Experimental Data
FTP Site: Phase 1
• Ancestry Deconvolution
• Functional Annotation
• Paper Files
• Integrated Call Sets
• Input Call Sets
• Experimental Validation
• Consensus Call Sets
• Supporting Info
Finding Data
• current.tree at the root of the FTP site provides a complete listing of all files on the site
• FTP search
  • Text search
  • Based on current.tree
  • Can provide md5s
  • Can exclude high-volume results
  • EBI or NCBI URLs
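The kind of text search described above can be approximated locally against a downloaded copy of current.tree. This is a sketch, not the FTP search tool itself. It assumes each line is tab-separated as path, type, size, modification time, md5, and that paths may start with an `ftp/` component. Both assumptions should be checked against the real file. The sample entries are invented for illustration.

```python
MIRROR_ROOTS = {
    "ebi": "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp",
    "ncbi": "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp",
}


def search_tree(tree_text, term, mirror="ebi", exclude=()):
    """Return (url, md5) pairs for file entries whose path contains term."""
    results = []
    for line in tree_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        path, kind, _size, _modified, md5 = fields[:5]
        if kind != "file" or term not in path:
            continue
        # Drop entries matching any exclusion pattern (e.g. high-volume data).
        if any(pattern in path for pattern in exclude):
            continue
        # Assumption: a leading "ftp/" duplicates the mirror root.
        if path.startswith("ftp/"):
            path = path[len("ftp/"):]
        results.append((f"{MIRROR_ROOTS[mirror]}/{path}", md5))
    return results


# Invented sample entries; real paths, sizes, and checksums will differ.
SAMPLE = (
    "ftp/data\tdirectory\t4096\t2012-01-01\t\n"
    "ftp/data/SAMPLE1/alignment/example.bam\tfile\t100\t2012-01-01\taaa111\n"
    "ftp/data/SAMPLE1/sequence_read/example.fastq.gz\tfile\t200\t2012-01-01\tbbb222\n"
)

for url, md5 in search_tree(SAMPLE, "SAMPLE1", mirror="ncbi", exclude=("sequence_read",)):
    print(url, md5)
```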
Browser Help
• ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/browser/1000genomes_browser_main_project_20110521/The_1000_Genomes_Browser_Tutorial.ensembl_65.doc
• http://www.ensembl.org/info/website/tutorials/index.html
• info@1000genomes.org
Tools
• http://browser.1000genomes.org/tools.html
• Data Slicer
• Variation Pattern Finder
• VCF to PED Converter
• Variant Effect Predictor
• Forge
Variant Effect Predictor (VEP)
• Predicts functional consequences of variants
  • SNPs
  • Indels
  • Structural variation
• Web and API based
• Can provide
  • SIFT and PolyPhen
  • HGVS
  • RefSeq gene name
• Offline mode
• Input format conversion
• http://www.ensembl.org/info/docs/tools/vep/index.html
Variation Annotation: functional consequences
[Figure: example variants with their Sequence Ontology consequence terms]
• SNP (coding): SO:0001583 : missense variant
• SNP (regulatory): SO:0001782 : TF binding site variant (increased binding affinity)
• Example SNP alleles: REF: TTCCGA, ALT: TTCCAA
• Short insertion: REF: AGTT--GCGAA, ALT: AGTTCCGCGAA; SO:0001589 : frameshift_variant
• Structural variant (deletion): SO:0001893 : transcript ablation
• Mutated protein: MLRKFAFSICNDAEGMFCVANAIQRMTIKCTAPHYEVAHIQAQWLIELDWADPQASRSL
• Phenotype
• VEP plugin
• Custom data
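The short-insertion example above is labelled a frameshift because the inserted length (2 bases) is not a multiple of 3. The sketch below shows only that arithmetic on a gapped REF/ALT pair. It is not how VEP is implemented, it assumes the variant lies inside a coding sequence, and the function names are made up for illustration.

```python
def indel_length(ref_aln: str, alt_aln: str, gap: str = "-") -> int:
    """Net length change (ALT minus REF) for two gapped, aligned strings."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned strings must have equal length")
    ref_bases = len(ref_aln.replace(gap, ""))
    alt_bases = len(alt_aln.replace(gap, ""))
    return alt_bases - ref_bases


def is_frameshift(ref_aln: str, alt_aln: str) -> bool:
    """True when the net length change is non-zero and not a multiple of 3.

    Assumption: the variant falls within a coding sequence.
    """
    delta = indel_length(ref_aln, alt_aln)
    return delta != 0 and delta % 3 != 0


# Alleles taken from the slide.
print(is_frameshift("AGTT--GCGAA", "AGTTCCGCGAA"))  # 2-base insertion
print(is_frameshift("TTCCGA", "TTCCAA"))            # SNP, no length change
```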
Announcements and Contact Info
• http://1000genomes.org
• 1000announce@1000genomes.org
• http://www.1000genomes.org/1000-genomes-annoucement-mailing-list
• http://www.1000genomes.org/announcements/rss.xml
• http://twitter.com/#!/1000genomes
• Please send questions to info@1000genomes.org
Acknowledgements
The 1000 Genomes Consortium, Ensembl Variation, Paul Flicek, Fiona Cunningham, Holly Zheng Bradley, Will McLaren, Bert Overduin, Laurent Gil, Emily Pritchard, Anja Thormann, Ian Streeter, Sarah Hunt, Avik Datta, The Rest of Ensembl, David Richardson, Forge, Ian Dunham