
Topic #2 Introduction to Bioinformatic Databases

Topic #2: Introduction to Bioinformatic Databases. University of Wisconsin Genetic Analysis Workshop, June 2011. Goals: use bioinformatic databases to determine basic properties of genes and to identify common genetic variants in and around genes.

delora

Presentation Transcript


  1. Topic #2: Introduction to Bioinformatic Databases • University of Wisconsin Genetic Analysis Workshop, June 2011

  2. Goals • Use bioinformatic databases to: • Determine basic properties of genes • Identify common genetic variants in and around genes • Characterize genetic variants in terms of frequency and functionality

  3. Possible Stages in Candidate-Gene Study Design • Select a candidate system (e.g., dopaminergic) • Select candidate genes in the system • Select genetic variants in candidate genes • Guided by: knowledge of the biology of the phenotype • Expert opinion • Literature search • Pathway analysis • (Positional) • Literature search • Bioinformatic databases • SNP tagging

  4. Dopamine Candidate Genes • ANKK1, COMT, DRD1, DRD2, DRD3, DRD4, DRD5, SLC6A3, SLC18A2, TH • But what about DBH, MAO-A, MAO-B?

  5. Pathway Analysis • KEGG: http://www.genome.jp/kegg/ • BioCarta: http://www.biocarta.com/ • Gene Ontology: http://www.geneontology.org/
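The pathway resources above are mainly used through their web pages. KEGG also offers a REST interface, and a lookup such as "which pathways contain COMT?" can be scripted. The sketch below only builds the request URL and parses the two-column, tab-separated text that KEGG's `link` operation returns; it assumes the `https://rest.kegg.jp` base URL and the `hsa:` prefix followed by the Entrez Gene ID (1312 for COMT, as noted later in the talk). The sample response is a hypothetical placeholder, not real KEGG output.

```python
KEGG_REST = "https://rest.kegg.jp"  # assumed base URL of the KEGG REST service


def kegg_link_url(target_db: str, entry: str) -> str:
    """Build a KEGG 'link' request, e.g. pathways linked to one human gene."""
    return f"{KEGG_REST}/link/{target_db}/{entry}"


def parse_link_response(text: str) -> dict[str, list[str]]:
    """Group 'source<TAB>target' lines by their source entry."""
    links: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, target = line.split("\t")[:2]
        links.setdefault(source, []).append(target)
    return links


url = kegg_link_url("pathway", "hsa:1312")  # hsa: + Entrez Gene ID of COMT
print(url)

# Hypothetical response text, for illustration only (not real pathway IDs).
sample = "hsa:1312\tpath:hsaEXAMPLE1\nhsa:1312\tpath:hsaEXAMPLE2\n"
print(parse_link_response(sample))
```

No request is sent here; fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded body to `parse_link_response` would be the next step.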

  6. UCSC Genome Browser (http://genome.ucsc.edu/) • For a gene of interest (here COMT): • Determine basic properties: location, size, number of exons • Identify genetic variants: SNPs, indels, STRs • Optionally download these into a file
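Once a transcript record has been downloaded, the basic properties listed above (location, size, number of exons) can be computed directly from it. This is a minimal sketch assuming a record in UCSC's genePred-style layout: 0-based, half-open `txStart`/`txEnd` and comma-terminated `exonStarts`/`exonEnds` lists. The transcript name and all coordinates are hypothetical placeholders, not real COMT annotation.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    chrom: str
    strand: str
    tx_start: int  # 0-based, inclusive
    tx_end: int    # 0-based, exclusive (half-open)
    exon_starts: list[int]
    exon_ends: list[int]


def parse_coords(field: str) -> list[int]:
    """Parse a comma-terminated coordinate list such as '100,250,'."""
    return [int(x) for x in field.split(",") if x]


def summarize(t: Transcript) -> dict:
    """Return location, genomic size, exon count and total exonic length."""
    exonic = sum(e - s for s, e in zip(t.exon_starts, t.exon_ends))
    return {
        "location": f"{t.chrom}:{t.tx_start + 1}-{t.tx_end}",  # 1-based display
        "size_bp": t.tx_end - t.tx_start,
        "exon_count": len(t.exon_starts),
        "exonic_bp": exonic,
    }


# Hypothetical transcript record, for illustration only.
example = Transcript(
    name="TX_EXAMPLE", chrom="chr22", strand="+",
    tx_start=1000, tx_end=5000,
    exon_starts=parse_coords("1000,2000,4500,"),
    exon_ends=parse_coords("1200,2300,5000,"),
)
print(summarize(example))
```

Because the starts are 0-based and the ends exclusive, `end - start` gives lengths directly, and only the displayed location needs the `+ 1`.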

  7. Genome Browser Gateway • Search by position • Different builds • Search by gene

  8. Browser display: transcript location, alternative forms, SNPs • Default SNP colors: Red – non-synonymous coding • Green – synonymous coding • Red – splice site • Blue – UTR • Black – intron • Black – locus • Black – unknown
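The browser applies these colors itself; when working with a downloaded SNP list it can help to reproduce the legend as a lookup. The sketch below assumes simplified function labels (`missense`, `UTR-5` and so on, echoing the Table Browser output later in the talk) and an assumed precedence rule for SNPs carrying several annotations. Both the label mapping and the precedence are hypothetical simplifications, not the browser's own term list or logic.

```python
# Default colors from the legend above, keyed by simplified class names.
COLOR_BY_CLASS = {
    "ns-coding": "red",
    "splice-site": "red",
    "syn-coding": "green",
    "utr": "blue",
    "intron": "black",
    "locus": "black",
    "unknown": "black",
}

# Hypothetical mapping from function labels to the classes above.
CLASS_BY_FUNC = {
    "missense": "ns-coding",
    "nonsense": "ns-coding",
    "frameshift": "ns-coding",
    "coding-synon": "syn-coding",
    "splice-site": "splice-site",
    "UTR-5": "utr",
    "UTR-3": "utr",
    "intron": "intron",
    "locus": "locus",
}

# Assumed precedence when one SNP has several annotations (first wins).
PRIORITY = ["ns-coding", "splice-site", "syn-coding", "utr", "intron", "locus", "unknown"]


def snp_color(func_field: str) -> str:
    """Color for a comma-separated function field such as 'missense,UTR-5'."""
    classes = {CLASS_BY_FUNC.get(f, "unknown") for f in func_field.split(",") if f}
    if not classes:
        classes = {"unknown"}
    best = min(classes, key=PRIORITY.index)  # highest-priority class
    return COLOR_BY_CLASS[best]


print(snp_color("missense,UTR-5"))  # red
print(snp_color("UTR-5"))           # blue
print(snp_color("intron"))          # black
```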

  9. COMT Transcripts

  10. rs4680 (Val/Met)

  11. COMT Transcripts

  12. UCSC Table Browser

  13. SNP Validation • Validated by multiple, independent submissions to the refSNP cluster • Validated by frequency or genotype data: minor alleles observed in at least two chromosomes • Validated by submitter confirmation • All alleles observed in at least two chromosomes apiece • Genotyped by the HapMap project • Sequenced in the 1000 Genomes Project
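When SNPs are downloaded rather than browsed, the validation criteria can be applied as a filter. This sketch assumes the comma-separated `valid` column of UCSC's snp tables, with one value per criterion listed above; treat the value names as an assumption to check against the table schema for the build in use.

```python
# Assumed 'valid' values, each mapped to one criterion from the slide.
VALIDATION_METHODS = {
    "by-cluster": "multiple independent submissions to the refSNP cluster",
    "by-frequency": "frequency or genotype data",
    "by-submitter": "submitter confirmation",
    "by-2hit-2allele": "all alleles observed in at least two chromosomes",
    "by-hapmap": "genotyped by HapMap",
    "by-1000genomes": "sequenced in the 1000 Genomes Project",
}


def validation_methods(valid_field: str) -> list[str]:
    """Return the recognised validation methods in a comma-separated field."""
    return [v for v in valid_field.split(",") if v in VALIDATION_METHODS]


def passes_validation(valid_field: str, min_methods: int = 1) -> bool:
    """True if the SNP is validated by at least `min_methods` methods."""
    return len(validation_methods(valid_field)) >= min_methods


print(validation_methods("by-cluster,by-frequency,by-hapmap"))
print(passes_validation("unknown"))  # False: no recognised method
```

Raising `min_methods` gives a stricter screen than the single-method setting used for the COMT query.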

  14. UCSC Table Browser Output
     #chrom  chromEnd  name        class   func            alleles     alleleFreqs
     chr22   19929286  rs45593642  single  missense,UTR-5  A,C,        0.004167,0.995833,
     chr22   19950150  rs6270      single  missense,UTR-5  G,          1.000000,
     chr22   19950164  rs75012854  single  missense,UTR-5  G,A,        0.126316,0.873684,
     chr22   19950263  rs6267      single  missense        G,T,        0.972273,0.027727,
     chr22   19950323  rs13306281  single  missense        G,          1.000000,
     chr22   19950329  rs76452330  single  missense        G,A,        0.983333,0.016667,
     chr22   19951103  rs5031015   single  missense        G,A,        0.991059,0.008941,
     chr22   19951236  rs4986871   single  missense        T,C,        0.014286,0.985714,
     chr22   19951271  rs4680      single  missense        N,G,T,A,C,  0.000253,0.649012,0.002027,0.345160,0.003548,
     chr22   19951803  rs13306279  single  missense        T,C,        0.001706,0.998294,
     • Settings (COMT): • All SNPs (132), hg19 • Class = single; validated by 1 method • Function = missense, nonsense, or frameshift
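The downloaded output can be screened further before choosing variants to genotype. The sketch below parses the ten COMT rows shown above and computes a minor allele frequency (MAF) for each. Two choices are assumptions rather than settings from the slide: MAF is taken as the frequency of the second most common allele (ignoring `N`), and 0.05 is used as the "common variant" cutoff.

```python
TABLE = """\
#chrom chromEnd name class func alleles alleleFreqs
chr22 19929286 rs45593642 single missense,UTR-5 A,C, 0.004167,0.995833,
chr22 19950150 rs6270 single missense,UTR-5 G, 1.000000,
chr22 19950164 rs75012854 single missense,UTR-5 G,A, 0.126316,0.873684,
chr22 19950263 rs6267 single missense G,T, 0.972273,0.027727,
chr22 19950323 rs13306281 single missense G, 1.000000,
chr22 19950329 rs76452330 single missense G,A, 0.983333,0.016667,
chr22 19951103 rs5031015 single missense G,A, 0.991059,0.008941,
chr22 19951236 rs4986871 single missense T,C, 0.014286,0.985714,
chr22 19951271 rs4680 single missense N,G,T,A,C, 0.000253,0.649012,0.002027,0.345160,0.003548,
chr22 19951803 rs13306279 single missense T,C, 0.001706,0.998294,
"""


def parse_rows(text: str) -> list[dict]:
    """Parse whitespace-separated Table Browser rows, skipping the header."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, end, name, cls, func, alleles, freqs = line.split()
        allele_list = [a for a in alleles.split(",") if a]  # drop trailing comma
        freq_list = [float(f) for f in freqs.split(",") if f]
        rows.append({"chrom": chrom, "pos": int(end), "name": name,
                     "func": func.split(","),
                     "freqs": dict(zip(allele_list, freq_list))})
    return rows


def minor_allele_freq(freqs: dict[str, float]) -> float:
    """Second-highest allele frequency, ignoring 'N'; 0.0 if monomorphic."""
    observed = sorted((f for a, f in freqs.items() if a != "N"), reverse=True)
    return observed[1] if len(observed) > 1 else 0.0


rows = parse_rows(TABLE)
common = [r["name"] for r in rows if minor_allele_freq(r["freqs"]) >= 0.05]
print(common)  # ['rs75012854', 'rs4680']
```

Under these assumptions only two of the ten missense SNPs pass the cutoff, which is where the LD and MAF discussion flagged in the conclusions picks up.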

  15. nsSNPs in COMT (N = 10, all missense)

  16. Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery)

  17. Entrez Gene
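Entrez searches can also be scripted through NCBI's E-utilities. The sketch below only builds `esearch` and `esummary` request URLs and sends nothing. The `[sym]` and `[orgn]` field tags, and the use of the bare number 4680 as the dbSNP record ID for rs4680, are assumptions to check against NCBI's search help; 1312 is the Entrez Gene ID of COMT noted later in the talk.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str) -> str:
    """URL for an Entrez search of one database with a query term."""
    return f"{EUTILS}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"


def esummary_url(db: str, uid: str) -> str:
    """URL for the summary record of one Entrez ID."""
    return f"{EUTILS}/esummary.fcgi?{urlencode({'db': db, 'id': uid})}"


print(esearch_url("gene", "COMT[sym] AND human[orgn]"))
print(esummary_url("gene", "1312"))  # COMT
print(esummary_url("snp", "4680"))   # rs4680 (numeric ID assumed)
```

`urlencode` takes care of escaping spaces and brackets in the query term.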

  18. Chromosome 22 (hg19) • TXNRD2: 19,881,781 (3’) to 19,929,359 (5’) • COMT: 19,929,263 (5’) to 19,957,496 (3’) • ARVCF: 19,957,421 (3’) to 2,004,309 (5’)
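These coordinates show that COMT's 5’ end overlaps the 5’ end of TXNRD2, so a variant near the COMT start may also lie in a neighboring gene. The sketch below computes the overlap and COMT's span, assuming the displayed coordinates are 1-based and inclusive. Only TXNRD2 and COMT are used, because the ARVCF 5’ coordinate above appears truncated.

```python
def overlap_bp(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Overlap length of two 1-based, inclusive intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def span_bp(interval: tuple[int, int]) -> int:
    """Length of a 1-based, inclusive interval."""
    return interval[1] - interval[0] + 1


TXNRD2 = (19_881_781, 19_929_359)  # hg19, from the slide
COMT = (19_929_263, 19_957_496)    # hg19, from the slide

print(overlap_bp(TXNRD2, COMT))  # 97
print(span_bp(COMT))             # 28234
```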

  19. SNP GeneView: Where is rs4680?

  20. dbSNP Color Code

  21. dbSNP: rs4680

  22. dbSNP: rs4680

  23. Finding Genes in the Region of a SNP • Purcell et al. (2009) report that the strongest SNP effect in their GWAS is rs3130297 (p = 4.79 × 10⁻⁸)

  24. Capturing cis-regulatory polymorphisms • Extend the search in the 5’ and 3’ directions; the window is arbitrary, but 5 kb in the 5’ direction and 1 kb in the 3’ direction is sometimes used • Use functional elements databases (FESD, http://sysbio.kribb.re.kr:8080/fesd/index.jsp)
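Extending a gene "in the 5’ direction" depends on the strand: for a gene whose 5’ end is the lower coordinate (COMT on slide 18) the 5 kb goes below the start, while for one whose 3’ end is the lower coordinate (TXNRD2) it goes above the end. The sketch below applies the 5 kb / 1 kb window and lists the genes whose extended window contains a SNP, which also covers the "genes in the region of a SNP" question. It assumes 1-based inclusive coordinates and uses positions from the Table Browser output, where `chromEnd` of a single-base SNP equals its 1-based position.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # lower coordinate, 1-based inclusive
    end: int     # upper coordinate, 1-based inclusive
    strand: str  # "+" if the 5' end is the lower coordinate


def extended_window(g: Gene, up: int = 5000, down: int = 1000) -> tuple[int, int]:
    """Extend a gene by `up` bp on its 5' side and `down` bp on its 3' side."""
    if g.strand == "+":
        return (max(1, g.start - up), g.end + down)
    return (max(1, g.start - down), g.end + up)


def genes_near(pos: int, genes: list[Gene], up: int = 5000, down: int = 1000) -> list[str]:
    """Names of genes whose extended window contains `pos`."""
    hits = []
    for g in genes:
        lo, hi = extended_window(g, up, down)
        if lo <= pos <= hi:
            hits.append(g.name)
    return hits


genes = [
    Gene("TXNRD2", 19_881_781, 19_929_359, "-"),
    Gene("COMT", 19_929_263, 19_957_496, "+"),
]
print(extended_window(genes[1]))      # (19924263, 19958496)
print(genes_near(19_951_271, genes))  # rs4680 -> ['COMT']
print(genes_near(19_929_286, genes))  # rs45593642 -> ['TXNRD2', 'COMT']
```

The `up` and `down` parameters make it easy to try the wider upstream region (about 6 kb) examined for COMT later in the talk.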

  25. Functional Element SNPs Database II (http://sysbio.kribb.re.kr:8080/fesd/index.jsp) • Promoter • CpG islands – in or near promoters; methylation sites • 5’-UTR – may affect mRNA stability, etc. • 3’-UTR – mRNA properties • Start/stop codons • Splice sites • Introns • Exons • Polyadenylation signal sites – protect the 3’ end of the mRNA
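The element categories above are defined by where a SNP falls relative to the transcript structure. The sketch below classifies a position into a few of them (promoter, 5’-UTR, coding exon, intron, 3’-UTR) for a single transcript whose 5’ end is the lower coordinate. It is a simplification of what a database such as FESD annotates: the 2 kb promoter window and the gene model are hypothetical, and CpG islands, splice sites, start/stop codons and polyadenylation signals are not handled.

```python
def classify_position(pos: int, tx: tuple[int, int], cds: tuple[int, int],
                      exons: list[tuple[int, int]], promoter_bp: int = 2000) -> str:
    """Classify a 1-based position; all coordinates are 1-based, inclusive."""
    tx_start, tx_end = tx
    cds_start, cds_end = cds
    if tx_start - promoter_bp <= pos < tx_start:
        return "promoter"  # assumed fixed-width window upstream of the start
    if pos < tx_start or pos > tx_end:
        return "outside gene"
    if not any(s <= pos <= e for s, e in exons):
        return "intron"
    if pos < cds_start:
        return "5'-UTR"  # exonic but before the coding sequence
    if pos > cds_end:
        return "3'-UTR"  # exonic but after the coding sequence
    return "coding exon"


# Hypothetical gene model, for illustration only.
TX = (10_001, 20_000)
CDS = (10_201, 19_500)
EXONS = [(10_001, 10_500), (12_001, 12_300), (19_001, 20_000)]

for p in (9_000, 10_100, 10_300, 11_000, 19_800, 25_000):
    print(p, classify_position(p, TX, CDS, EXONS))
```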

  26. 1312 = COMT (Entrez Gene ID)

  27. ~6 kb upstream

  28. Conclusions • UCSC Browser and Entrez Gene: • Characteristics of gene transcripts • Characteristics of genetic variation in gene transcripts • Selection of genetic variants for genotyping: • Previous research • Functionality • Statistical properties (next: LD and MAF)
