
Bioinformatics tools and techniques Into the heart of darkness

Bioinformatics tools and techniques Into the heart of darkness. Elaine Kenny Colm O’Dushlaine 15/11/07. Summary. Simple overviews of some of the tools and methods used by EK and CO’D TK notebook get_hapmap_snps.pl: retrieve HM genotype information for a list of SNPs


Presentation Transcript


  1. Bioinformatics tools and techniques: Into the heart of darkness • Elaine Kenny, Colm O’Dushlaine • 15/11/07

  2. Summary • Simple overviews of some of the tools and methods used by EK and CO’D • TK notebook • get_hapmap_snps.pl: retrieve HM genotype information for a list of SNPs • GeneViewer.pl & cross_ref.pl: visualise e.g. SNPs in the context of other genomic landmarks. Score SNPs depending on how many of these landmarks they overlap with • ld_expander.pl: find SNPs in LD with SNPs of interest, based on user-specified r² and “LD window” (distance between SNPs) • STATA • VIM: command line text editor • Lab website

  3. TK notebook • Application for saving notes, to-do lists, daily logs, and any other kind of textual information in a place where you can find it all again, and where related information is easily found • Easy to edit and rapidly searchable • DEMO – editing • DEMO – search

  4. get_hapmap_snps.pl • Simple script to read in a 1-column list of SNPs and retrieve HapMap genotypes • Can select population and strand • DEMO • Retrieved data can be loaded into HaploView • DEMO
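The first step described on this slide, reading a 1-column list of SNPs, can be sketched as below. This is a Python illustration only, not the code of get_hapmap_snps.pl: the function name `read_snp_list`, the assumption that identifiers are rs-style IDs, and the example IDs are all hypothetical. The HapMap retrieval step itself is deliberately left out because the slide does not describe how it is done.

```python
import re

# Assumption: SNPs are given as dbSNP-style "rs" identifiers, one per line.
RS_ID = re.compile(r"^rs\d+$")

def read_snp_list(lines):
    """Return unique rs IDs in input order, plus any lines that failed validation."""
    seen, snps, rejected = set(), [], []
    for line in lines:
        snp = line.strip()
        if not snp:
            continue  # skip blank lines
        if not RS_ID.match(snp):
            rejected.append(snp)
        elif snp not in seen:
            seen.add(snp)
            snps.append(snp)
    return snps, rejected

# Made-up input standing in for the lines of a 1-column file.
snps, rejected = read_snp_list(["rs123\n", "rs456\n", "\n", "rs123\n", "chr1:1000\n"])
print(snps, rejected)
```

Cleaning the list before any lookup keeps duplicates and malformed IDs from being sent to a remote service.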

  5. cross_ref_scored.pl • Score SNPs based on how many putatively functional regions they overlap with, on a per gene / chromosome basis • Gene basis: type perl cross_ref_scored.pl file_A file_B file_C ... • where: file_A = 2-column file of SNPs (format: id, location) • file_B = 3-column file of EXONS (format: id/name, start, stop) • file_C ... = whatever you want (format: id/name, start, stop), i.e. other regions like CpGs, TFBS, clusters. Any order. …
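The scoring idea on this slide can be sketched as follows. This is a minimal Python illustration, not the actual cross_ref_scored.pl code: the function name `score_snps`, the rule of one point per feature file with at least one overlapping region, the inclusive coordinates, and all IDs and positions are assumptions made for the example.

```python
def score_snps(snps, feature_sets):
    """Score each SNP by the number of feature sets that overlap its position.

    snps: list of (snp_id, position), as in the 2-column SNP file.
    feature_sets: one list per region file, each holding (name, start, stop).
    """
    scores = {}
    for snp_id, pos in snps:
        score = 0
        for regions in feature_sets:
            # Assumption: start and stop are both inclusive.
            if any(start <= pos <= stop for _name, start, stop in regions):
                score += 1
        scores[snp_id] = score
    return scores

# Made-up data standing in for file_A (SNPs), file_B (exons) and file_C (CpGs).
snps = [("rs1", 150), ("rs2", 900), ("rs3", 5000)]
exons = [("exon1", 100, 200), ("exon2", 850, 950)]
cpgs = [("cpg1", 120, 180)]
scores = score_snps(snps, [exons, cpgs])
print(scores)
```

The linear scan over regions is fine for a single gene; for whole chromosomes a sorted or indexed lookup would be the more practical choice.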

  6. cross_ref_scored.pl example output: Can then be merged with HapMap / Perlegen to retrieve MAF data for SNPs

  7. Merge cross_ref_scored data with HapMap/ Perlegen data using merge_per_hap.pl • Type: perl merge_per_hap.pl perlegen.txt hapmap.txt overlapped_region_scored.txt • Where: hapmap.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq), perlegen.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq)
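The merge described on this slide is a join on rsid, which can be sketched as below. This is a Python illustration under stated assumptions, not the code of merge_per_hap.pl: tab-delimited files without header rows, biallelic SNPs (so MAF is the smaller of the reference allele frequency and one minus it), and the function names and example rows are all hypothetical.

```python
import csv
import io

def read_freqs(handle):
    """Read a 3-column file (rsid, ref_allele, ref_allele_freq) into a dict keyed by rsid."""
    freqs = {}
    for row in csv.reader(handle, delimiter="\t"):  # assumption: tab-delimited, no header
        if len(row) == 3:
            rsid, allele, freq = row
            freqs[rsid] = (allele, float(freq))
    return freqs

def maf(ref_allele_freq):
    """Minor allele frequency of a biallelic SNP from its reference allele frequency."""
    return min(ref_allele_freq, 1.0 - ref_allele_freq)

def merge(scored, perlegen, hapmap):
    """Attach Perlegen and HapMap MAFs to each scored SNP; None where a source lacks it."""
    merged = []
    for rsid, score in scored:
        row = {"rsid": rsid, "score": score}
        for source, table in (("perlegen", perlegen), ("hapmap", hapmap)):
            entry = table.get(rsid)
            row[source + "_maf"] = maf(entry[1]) if entry else None
        merged.append(row)
    return merged

# Made-up file contents; io.StringIO stands in for open("hapmap.txt") etc.
hapmap = read_freqs(io.StringIO("rs1\tA\t0.80\nrs2\tG\t0.35\n"))
perlegen = read_freqs(io.StringIO("rs1\tA\t0.75\n"))
merged = merge([("rs1", 2), ("rs2", 1)], perlegen, hapmap)
for row in merged:
    print(row)
```

Keeping the scored SNP list as the driving table means no scored SNP is dropped when one of the frequency sources has no entry for it.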

  8. cross_ref.pl applied to WGA data • cross_ref.pl: Scoring SNPs throughout genome • Data analysed on coding/non-coding basis • (coding) perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/coding_non_synon_SNPs_UCSC.clean=3 WGA_databases/coding_synon_SNPs_UCSC.clean=2 WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid=1 WGA_databases/Triplexes_may2006.bed=2 WGA_databases/splice_site_SNPs_UCSC.clean=2 > Overlapped_regions_scored.WTCCC.chr22.coding.log & (input-dependent, coding/non-coding dependent, arbitrary) • (noncoding) perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/TFBS.chr22=1 WGA_databases/CpG_islands_UCSC.uniqid=1 WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean=1 WGA_databases/promoters_knowngene_hg18.txt=1 WGA_databases/sno_or_miRNA_UCSC.uniqid=1 > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log &
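In the commands on this slide each database path carries an `=N` suffix. One plausible reading, and it is only an assumption here, is that N is the weight added to a SNP's score when it overlaps a region from that database. The Python sketch below illustrates that reading; it is not the cross_ref.pl implementation, and the function names and region coordinates are hypothetical.

```python
def parse_weighted_args(args):
    """Split 'path=weight' arguments into a {path: weight} dict."""
    weights = {}
    for arg in args:
        path, sep, weight = arg.rpartition("=")
        if not sep or not path:
            raise ValueError("expected path=weight, got %r" % arg)
        weights[path] = int(weight)
    return weights

def weighted_score(pos, regions_by_path, weights):
    """Sum the weights of every database with a region overlapping pos.

    Assumption: a database contributes its weight once, however many of its regions overlap.
    """
    total = 0
    for path, weight in weights.items():
        if any(start <= pos <= stop for start, stop in regions_by_path.get(path, [])):
            total += weight
    return total

# Two of the database arguments from the coding command on the slide.
weights = parse_weighted_args([
    "WGA_databases/coding_non_synon_SNPs_UCSC.clean=3",
    "WGA_databases/coding_synon_SNPs_UCSC.clean=2",
])
# Made-up regions (start, stop) for each database.
regions_by_path = {
    "WGA_databases/coding_non_synon_SNPs_UCSC.clean": [(1000, 1000)],
    "WGA_databases/coding_synon_SNPs_UCSC.clean": [(1000, 1000), (2000, 2000)],
}
print(weighted_score(1000, regions_by_path, weights))
print(weighted_score(2000, regions_by_path, weights))
```

Using `rpartition` splits on the last `=`, so a path that itself contains `=` would still parse.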

  9. cross_ref.pl • cross_ref.pl output: • Load into STATA. If SNPs have e.g. association p-values, calculate adjusted p-value (R. Anney) as -log10[P] + [cross_ref_score]
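The adjusted value on this slide, -log10[P] + [cross_ref_score], can be worked through in a few lines. The slide does this in STATA; the Python sketch below only illustrates the arithmetic, and the function name `adjusted_score` and the example inputs are hypothetical.

```python
import math

def adjusted_score(p_value, cross_ref_score):
    """Return -log10(P) + cross_ref_score, as given on the slide."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value) + cross_ref_score

# A SNP with P = 0.001 and a cross_ref score of 2 gets roughly 3 + 2 = 5.
example = adjusted_score(0.001, 2)
print(round(example, 3))
```

Because the score is added on the -log10 scale, each score point has the same effect as a tenfold smaller p-value.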

  10. GeneViewer.pl • GeneViewer.pl: Visualise overlapping features (e.g. exons, SNPs etc.) along e.g. your gene of interest (html output)

  11. ld_expander.pl • Find proxies (SNPs in LD) for a list of SNPs • User specifies the r² threshold and “LD window” (distance between SNPs) • Currently configured to obtain proxies from HM CEU • Result is a list of additional proxy SNPs that have been obtained by LD expansion • DEMO • Note: do not LD-expand more than 150,000 SNPs, or HapMap will ban you! CO’D has an alternative version that uses local pre-computed pairwise LD SNP files
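LD expansion against local pre-computed pairwise LD data, as mentioned in the note on this slide, can be sketched as a filter on r2 and distance. This is a Python illustration, not the code of ld_expander.pl or of CO’D's alternative version: the record layout (snp_a, pos_a, snp_b, pos_b, r2), the default thresholds, and the example records are all assumptions.

```python
def ld_expand(query_snps, ld_pairs, min_r2=0.8, max_distance=250_000):
    """Return proxies: SNPs paired with a query SNP at r2 >= min_r2 within max_distance bp.

    ld_pairs: iterable of (snp_a, pos_a, snp_b, pos_b, r2) records (hypothetical layout).
    """
    query = set(query_snps)
    proxies = set()
    for snp_a, pos_a, snp_b, pos_b, r2 in ld_pairs:
        if r2 < min_r2 or abs(pos_a - pos_b) > max_distance:
            continue  # fails the r2 threshold or falls outside the LD window
        if snp_a in query and snp_b not in query:
            proxies.add(snp_b)
        elif snp_b in query and snp_a not in query:
            proxies.add(snp_a)
    return sorted(proxies)

# Made-up pairwise LD records.
pairs = [
    ("rs1", 1000, "rs2", 5000, 0.95),    # kept: strong LD, close by
    ("rs1", 1000, "rs3", 900000, 0.90),  # dropped: outside the LD window
    ("rs4", 2000, "rs1", 1000, 0.50),    # dropped: r2 too low
    ("rs5", 3000, "rs6", 3500, 1.00),    # dropped: neither SNP is a query SNP
    ("rs7", 1500, "rs1", 1000, 0.85),    # kept: query SNP is in the second column
]
proxies = ld_expand(["rs1"], pairs)
print(proxies)
```

Checking both columns matters because a pairwise LD file usually lists each pair once, in arbitrary order.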

  12. STATA • Extremely powerful and flexible • >65k rows handled – shock horror! • Can write scripts to automate tasks, e.g. read in file, do analysis, save results • When you run commands from the GUI, they are echoed in the command window, so you can copy them into a do file • CO’D, EK and R. Anney strongly advocate this as a platform for both file manipulation and statistical analysis

  13. STATA example using WTCCC data (http://www.wtccc.org.uk/) • Bipolar Disorder, Coronary Artery Disease, Crohn's Disease, Hypertension, Rheumatoid Arthritis, Type 1 Diabetes, Type 2 Diabetes

  14. DATA FORMAT • 3 folders: • Basic: each case collection against the pooled control groups 58C and UKBS • Combined cases: combining other case collections as controls • Combined controls: combining phenotypically relevant case collections (e.g. RA/T1D, autoimmune) • Data are split by chromosome

  15. Questions • How do I get all of the chromosome data for my gene of interest into one file? • How do I easily search all of the SNP information for my gene(s) of interest? • Create a “.do” file for all manipulations that you want to carry out on the data • DEMO • Good starting resource: http://www.ats.ucla.edu/stat/stata/
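The two questions on this slide come down to stacking the per-chromosome tables and then filtering by position. The slide answers them with a STATA do file; the Python sketch below only illustrates the same two steps, and the column names (`snp`, `chr`, `pos`, `p`), the tab delimiter, and the example rows are hypothetical rather than the real WTCCC layout.

```python
import csv
import io

def combine_tables(handles):
    """Stack rows from several per-chromosome tables that share one header."""
    rows = []
    for handle in handles:
        rows.extend(csv.DictReader(handle, delimiter="\t"))
    return rows

def snps_in_region(rows, chrom, start, stop):
    """Keep rows on chromosome chrom whose position lies within [start, stop]."""
    return [r for r in rows if r["chr"] == chrom and start <= int(r["pos"]) <= stop]

# Made-up file contents; io.StringIO stands in for one open file per chromosome.
chr21 = io.StringIO("snp\tchr\tpos\tp\nrs10\t21\t100\t0.5\n")
chr22 = io.StringIO("snp\tchr\tpos\tp\nrs20\t22\t1500\t0.01\nrs21\t22\t9000\t0.2\n")
rows = combine_tables([chr21, chr22])
hits = snps_in_region(rows, "22", 1000, 2000)
print([r["snp"] for r in hits])
```

Saving the equivalent steps in a script, as the slide recommends for do files, makes the extraction repeatable for the next gene.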

  16. VIM • “Vi Improved”. Mainly UNIX but cross-platform text editor (available for Windows). • Full list of commands outside scope of this demonstration • Very fast and efficient, esp. with search and replace functions on large datasets • Regular expression pattern matching • DEMO • Integrates with Cygwin (www.cygwin.com – very useful UNIX emulator for Windows)
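As an illustration of regular-expression search and replace, the Vim command `:%s/^chr//` removes a leading "chr" from every line of a file. The slide does not show which substitutions were demonstrated, so this example and its data are hypothetical; the Python sketch below performs the same substitution so the effect can be checked outside the editor.

```python
import re

# Made-up two-column data with a "chr" prefix on the chromosome field.
text = "chr1\t1000\nchr22\t2000\n"

# Equivalent of Vim's :%s/^chr// (re.MULTILINE makes ^ match at each line start).
cleaned = re.sub(r"^chr", "", text, flags=re.MULTILINE)
print(cleaned)
```

Anchoring the pattern with `^` keeps the substitution from touching "chr" anywhere else on the line.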

  17. Group website • Some useful stuff up there! • Please send information about current projects etc. Good for our image as a group and minimal effort required on your part • DEMO

  18. Conclusions • Small summary of some things you can do • Slides and video demonstrations will be online at: http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/Protocols/ • CO’D & EK available for advice (Fridays, 9-9.02am) • These things will help you in your work!!
