
How to Access the Data



Presentation Transcript


  1. How to Access the Data Laura Clarke

  2. Data Types • Sequence • Fastq • Pilot and Phase 1 represent a mix of technologies/read lengths • Final Phase represents >=70bp paired-end Illumina • Alignment • BAM • Variants • VCF • Metadata and Reference Data Sets • bed • gff • fasta
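As a rough illustration of the data types listed above, the following sketch maps a file name to its category by extension. The helper name `data_type` and the list of compression/index suffixes are assumptions made for this example and are not part of the presentation.

```python
# Extension-to-category table taken from the data types listed on the slide.
DATA_TYPES = {
    ".fastq": "sequence",
    ".bam": "alignment",
    ".vcf": "variants",
    ".bed": "metadata/reference",
    ".gff": "metadata/reference",
    ".fasta": "metadata/reference",
}

# Suffixes that may follow the format extension (an assumption for this sketch).
WRAPPERS = (".gz", ".bgz", ".tbi", ".bai")


def data_type(filename: str) -> str:
    """Return the slide's data category for a file name, or 'unknown'."""
    name = filename.lower()
    stripped = True
    while stripped:  # peel off suffixes such as .gz or .tbi, possibly stacked
        stripped = False
        for suffix in WRAPPERS:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                stripped = True
    for ext, category in DATA_TYPES.items():
        if name.endswith(ext):
            return category
    return "unknown"
```

For example, `data_type("calls.vcf.gz")` returns `"variants"`.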

  3. Data Availability • FTP site: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/ • Raw Data Files • AWS Amazon Cloud: http://aws.amazon.com/1000genomes/ • FTP mirror • Web site: http://www.1000genomes.org • Release Announcements • Documentation • Ensembl Style Browser: http://browser.1000genomes.org • Browse 1000 Genomes variants in Genomic Context • Variant Effect Predictor • Data Slicer • Other Tools

  4. FTP Site • Two mirrored ftp sites • ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp • ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp • NCBI site is direct mirror of EBI site • Can be up to 24 hours out of date • Both also accessible using aspera • http://asperasoft.com/ • EBI site has http mirror • http://ftp.1000genomes.ebi.ac.uk/vol1/ftp
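Because the mirrors listed above share the same directory layout, a path on one can be rewritten for another. The sketch below shows one way to do that using only the root URLs from the slide; the function names `mirror_url` and `switch_mirror` and the mirror keys are hypothetical.

```python
# Root URLs exactly as given on the slide.
MIRRORS = {
    "ebi": "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp",
    "ncbi": "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp",
    "ebi_http": "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp",
}


def mirror_url(relative_path: str, mirror: str = "ebi") -> str:
    """Build a full URL for a path relative to the mirror root."""
    return MIRRORS[mirror] + "/" + relative_path.lstrip("/")


def switch_mirror(url: str, target: str) -> str:
    """Rewrite a URL from any known mirror so it points at `target`."""
    for root in MIRRORS.values():
        if url == root or url.startswith(root + "/"):
            return MIRRORS[target] + url[len(root):]
    raise ValueError(f"URL is not under a known mirror root: {url}")
```

Since the NCBI mirror can be up to 24 hours out of date, a recently added file may resolve on the EBI site but not yet on the rewritten NCBI URL.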

  5. ftp://ftp.1000genomes.ebi.ac.uk • ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp • Documentation • Raw Data • Phase 1 Data • Pilot Data • Release Data • Technical Data

  6. The FTP Site: Data • Sample Level Files • sequence_read • alignment • cg_data

  7. FTP Site: Technical • Reference Data Sets • Experimental Data

  8. FTP Site: Phase 1 • Ancestry Deconvolution • Functional Annotation • Paper Files • Integrated Call Sets • Input Call Sets • Experimental Validation • Consensus Call Sets • Supporting Info

  9. Finding Data • current.tree at the root of the ftp provides a complete listing of all files on the ftp site • FTP search • Text search • Based on current.tree • Can provide md5s • Can exclude high volume results • EBI or NCBI urls
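A text search over current.tree, similar in spirit to the FTP search described above, might be sketched as follows. The column layout (tab-separated path, entry type, size, timestamp, md5) is an assumption about the file format, and `TreeEntry`, `parse_tree`, and `search` are hypothetical names for this example.

```python
from typing import Iterable, Iterator, NamedTuple


class TreeEntry(NamedTuple):
    path: str
    kind: str  # assumed to be "file" or "directory"
    size: int
    md5: str


def parse_tree(lines: Iterable[str]) -> Iterator[TreeEntry]:
    """Parse listing lines, assuming: path, type, size, timestamp, md5."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip lines that do not match the assumed layout
        md5 = fields[4] if len(fields) > 4 else ""
        yield TreeEntry(fields[0], fields[1], int(fields[2]), md5)


def search(lines: Iterable[str], text: str,
           exclude: Iterable[str] = ()) -> Iterator[TreeEntry]:
    """Yield file entries whose path contains `text` and no excluded term."""
    excluded = tuple(exclude)
    for entry in parse_tree(lines):
        if entry.kind != "file" or text not in entry.path:
            continue
        if any(term in entry.path for term in excluded):
            continue
        yield entry
```

Passing an open file handle for a downloaded copy of current.tree as `lines` streams the listing without loading it all into memory; the `exclude` argument plays the role of excluding high volume results, for instance by filtering out `sequence_read` paths.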

  10. Browser : Search Results

  11. Browser : SNP View

  12. Browser : SNP View

  13. Browser : SNP View

  14. Browser Help • ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/browser/1000genomes_browser_main_project_20110521/The_1000_Genomes_Browser_Tutorial.ensembl_65.doc • http://www.ensembl.org/info/website/tutorials/index.html • info@1000genomes.org

  15. Tools

  16. Tools • http://browser.1000genomes.org/tools.html • Data Slicer • Variation Pattern Finder • VCF to PED Converter • Variant Effect Predictor • Forge

  17. Variant Effect Predictor (VEP) • Predicts Functional Consequences of Variants • SNPs • Indels • Structural Variation • Web and API based • Can provide • SIFT and PolyPhen • HGVS • RefSeq gene name • Offline mode • Input format conversion • http://www.ensembl.org/info/docs/tools/vep/index.html

  18. Variation Annotation: functional consequences • SNP (regulatory) • SNP (coding) • REF: TTCCGA ALT: TTCCAA • SO:0001583: missense variant • SO:0001782: TF binding site variant (increased binding affinity) • Structural variant (deletion) • Short insertion • REF: AGTT--GCGAA ALT: AGTTCCGCGAA • SO:0001589: frameshift_variant • SO:0001893: transcript ablation • Mutated protein: MLRKFAFSICNDAEGMFCVANAIQRMTIKCTAPHYEVAHIQAQWLIELDWADPQASRSL • Phenotype • VEP plugin • Custom data
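The REF/ALT pairs on this slide can be used to illustrate, in a very simplified way, how an allele pair is classified before any consequence is assigned. This is a minimal sketch and not how VEP works internally; `classify` and `shifts_frame` are hypothetical helpers, and the frame check is only meaningful when the variant lies in coding sequence.

```python
def _clean(allele: str) -> str:
    """Drop alignment gap characters ('-') and normalise case."""
    return allele.replace("-", "").upper()


def classify(ref: str, alt: str) -> str:
    """Classify a REF/ALT pair by comparing allele lengths and bases."""
    ref, alt = _clean(ref), _clean(alt)
    if len(ref) == len(alt):
        mismatches = sum(r != a for r, a in zip(ref, alt))
        if mismatches == 0:
            return "no change"
        return "SNV" if mismatches == 1 else "substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


def shifts_frame(ref: str, alt: str) -> bool:
    """True if the length change is not a multiple of three bases."""
    return (len(_clean(alt)) - len(_clean(ref))) % 3 != 0
```

With the slide's examples, `classify("TTCCGA", "TTCCAA")` gives `"SNV"`, while `classify("AGTT--GCGAA", "AGTTCCGCGAA")` gives `"insertion"` and `shifts_frame` returns `True` for that pair, consistent with the frameshift_variant term shown.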

  19. Announcements and Contact Info • http://1000genomes.org • 1000announce@1000genomes.org • http://www.1000genomes.org/1000-genomes-annoucement-mailing-list • http://www.1000genomes.org/announcements/rss.xml • http://twitter.com/#!/1000genomes • Please send questions to info@1000genomes.org

  20. Acknowledgements • The 1000 Genomes Consortium • Ensembl Variation • Paul Flicek • Fiona Cunningham • Holly Zheng Bradley • Will McLaren • Bert Overduin • Laurent Gil • Emily Pritchard • Anja Thormann • Ian Streeter • Sarah Hunt • Avik Datta • The Rest of Ensembl • David Richardson • Forge • Ian Dunham
