Topic #2: Introduction to Bioinformatic Databases
University of Wisconsin Genetic Analysis Workshop, June 2011
Goals
• Use bioinformatic databases to:
  • Determine basic properties of genes
  • Identify common genetic variants in and around genes
  • Characterize genetic variants in terms of frequency and functionality
Possible Stages in Candidate-Gene Study Design
• Select a candidate system (e.g., dopaminergic), guided by knowledge of the biology of the phenotype, expert opinion, and literature search
• Select candidate genes in the system, using pathway analysis (or positional information)
• Select genetic variants in the candidate genes, using literature search, bioinformatic databases, and SNP tagging
Dopamine Candidate Genes
• ANKK1, COMT, DRD1, DRD2, DRD3, DRD4, DRD5, SLC6A3, SLC18A2, TH
• But what about DBH, MAO-A, and MAO-B?
Pathway Analysis
• KEGG: http://www.genome.jp/kegg/
• BioCarta: http://www.biocarta.com/
• Gene Ontology: http://www.geneontology.org/
UCSC Genome Browser (http://genome.ucsc.edu/)
For a gene of interest (here COMT):
• Determine basic properties:
  • Location
  • Size, number of exons
• Identify genetic variants:
  • SNPs, indels, STRs
  • Optionally download these into a file
Genome Browser Gateway
• Search by position
• Search by gene
• Select among different genome builds
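The position search accepts coordinates written as chromosome:start-end. As a minimal sketch, assuming that format (commas optional) and the 1-based, inclusive coordinates the browser displays, the Python below splits such a string and computes the region's size. `parse_position` and `region_length` are hypothetical helper names, not part of any UCSC tool; the example span is the hg19 COMT span listed on the chromosome 22 slide.

```python
import re

# Assumed format: "chrom:start-end", as typed into the "Search by Position"
# box; thousands separators (commas) are optional.
POSITION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_position(text: str) -> tuple[str, int, int]:
    """Split a position string into (chrom, start, end), 1-based inclusive."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom, start, end = match.groups()
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError("start must not exceed end")
    return chrom, start_i, end_i


def region_length(start: int, end: int) -> int:
    """Length in bases of a 1-based, inclusive interval."""
    return end - start + 1


if __name__ == "__main__":
    chrom, start, end = parse_position("chr22:19,929,263-19,957,496")
    print(chrom, start, end, region_length(start, end))
```

Run as a script, this prints `chr22 19929263 19957496 28234`, i.e. COMT spans roughly 28 kb in hg19.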
Browser display:
• Transcript location
• Alternative forms (transcripts)
• SNPs
Default SNP colors:
• Red – nonsynonymous coding
• Green – synonymous coding
• Red – splice site
• Blue – UTR
• Black – intron, locus, unknown
SNP Validation
• Validated by multiple, independent submissions to the refSNP cluster
• Validated by frequency or genotype data: minor alleles observed in at least two chromosomes
• Validated by submitter confirmation
• All alleles have been observed in at least two chromosomes apiece
• Genotyped by the HapMap project
• Sequenced in the 1000 Genomes Project
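When filtering SNPs by validation status, it can help to count how many of these methods support each SNP. The sketch below assumes the evidence arrives as a comma-separated list of tokens in the style of the `valid` column of UCSC's snp tables (`by-cluster`, `by-frequency`, and so on). Those token names are an assumption to check against the table schema for your build, and `validation_methods` and `is_validated` are hypothetical helpers.

```python
# Assumed token names, modeled on the `valid` column of UCSC snp tables.
# Each maps to one of the validation criteria listed above.
VALIDATION_METHODS = {
    "by-cluster": "multiple independent submissions to the refSNP cluster",
    "by-frequency": "frequency/genotype data; minor allele in >= 2 chromosomes",
    "by-submitter": "submitter confirmation",
    "by-2hit-2allele": "all alleles observed in >= 2 chromosomes apiece",
    "by-hapmap": "genotyped by the HapMap project",
    "by-1000genomes": "sequenced in the 1000 Genomes Project",
}


def validation_methods(valid_field: str) -> set[str]:
    """Return the recognized validation methods named in a `valid` field.

    Tokens not in VALIDATION_METHODS (e.g. "unknown") are ignored.
    """
    tokens = {t.strip() for t in valid_field.split(",") if t.strip()}
    return tokens.intersection(VALIDATION_METHODS)


def is_validated(valid_field: str, min_methods: int = 1) -> bool:
    """True if at least `min_methods` distinct methods support the SNP."""
    return len(validation_methods(valid_field)) >= min_methods


if __name__ == "__main__":
    print(sorted(validation_methods("by-cluster,by-frequency,")))
    print(is_validated("unknown"))
```

Raising `min_methods` above 1 is one way to trade the number of candidate SNPs for stronger evidence that each is real.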
UCSC Table Browser Output
#chrom  chromEnd  name        class   func            alleles     alleleFreqs
chr22   19929286  rs45593642  single  missense,UTR-5  A,C,        0.004167,0.995833,
chr22   19950150  rs6270      single  missense,UTR-5  G,          1.000000,
chr22   19950164  rs75012854  single  missense,UTR-5  G,A,        0.126316,0.873684,
chr22   19950263  rs6267      single  missense        G,T,        0.972273,0.027727,
chr22   19950323  rs13306281  single  missense        G,          1.000000,
chr22   19950329  rs76452330  single  missense        G,A,        0.983333,0.016667,
chr22   19951103  rs5031015   single  missense        G,A,        0.991059,0.008941,
chr22   19951236  rs4986871   single  missense        T,C,        0.014286,0.985714,
chr22   19951271  rs4680      single  missense        N,G,T,A,C,  0.000253,0.649012,0.002027,0.345160,0.003548,
chr22   19951803  rs13306279  single  missense        T,C,        0.001706,0.998294,
Settings (COMT):
• All SNPs (132) track, hg19
• Class = single; validated by 1 method
• Function = missense, nonsense, or frameshift
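A downloaded Table Browser file is plain tab-separated text, so Python's standard `csv` module can read it. This sketch assumes the columns shown above, that list-valued fields end with a trailing comma, and that for these single-base SNPs `chromEnd` equals the 1-based position. It takes the minor allele frequency to be the frequency of the second most common allele; for multi-allelic rows such as rs4680 (which also lists an `N` allele) that is one reasonable convention, not the only one. `SAMPLE` reproduces two rows of the output above, and the helper names are hypothetical.

```python
import csv
import io

# Two rows copied from the Table Browser output; a real download has the
# same tab-separated columns, and list fields end with a trailing comma.
SAMPLE = (
    "#chrom\tchromEnd\tname\tclass\tfunc\talleles\talleleFreqs\n"
    "chr22\t19950263\trs6267\tsingle\tmissense\tG,T,\t0.972273,0.027727,\n"
    "chr22\t19951271\trs4680\tsingle\tmissense\tN,G,T,A,C,\t"
    "0.000253,0.649012,0.002027,0.345160,0.003548,\n"
)


def split_list(field: str) -> list[str]:
    """Split a UCSC list field such as 'G,T,' into ['G', 'T']."""
    return [item for item in field.split(",") if item]


def minor_allele_frequency(freqs: list[float]) -> float:
    """Frequency of the second most common allele; 0.0 if monomorphic."""
    ranked = sorted(freqs, reverse=True)
    return ranked[1] if len(ranked) > 1 else 0.0


def read_snps(text: str) -> list[dict]:
    """Parse Table Browser text into dicts with name, position, alleles, MAF."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        freqs = [float(f) for f in split_list(row["alleleFreqs"])]
        rows.append({
            "name": row["name"],
            "position": int(row["chromEnd"]),  # 1-based for single-base SNPs
            "alleles": split_list(row["alleles"]),
            "maf": minor_allele_frequency(freqs),
        })
    return rows


if __name__ == "__main__":
    for snp in read_snps(SAMPLE):
        print(snp["name"], snp["position"], f"MAF={snp['maf']:.3f}")
```

With a real download, `read_snps(open(path).read())` gives the same structure, and SNPs can then be screened by a MAF threshold before genotyping.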
Chromosome 22 (hg19)
• TXNRD2: 19,881,781 (3’) to 19,929,359 (5’)
• COMT: 19,929,263 (5’) to 19,957,496 (3’)
• ARVCF: 19,957,421 (3’) to 20,004,309 (5’)
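Reading the (5’, 3’) pairs above as intervals makes strand and overlap explicit: when the 5’ coordinate is the smaller number the gene lies on the plus strand, otherwise on the minus strand. A minimal sketch, assuming 1-based, inclusive coordinates; `to_interval` and `overlap_length` are illustrative helpers. With the TXNRD2 and COMT values listed above, it reports a 97-base overlap where the two genes' 5’ ends meet.

```python
def to_interval(five_prime: int, three_prime: int) -> tuple[int, int, str]:
    """Convert (5' end, 3' end) coordinates to (low, high, strand)."""
    if five_prime <= three_prime:
        return five_prime, three_prime, "+"
    return three_prime, five_prime, "-"


def overlap_length(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of shared bases between two 1-based, inclusive intervals."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return max(shared, 0)


if __name__ == "__main__":
    txnrd2 = to_interval(19_929_359, 19_881_781)  # 5' given first
    comt = to_interval(19_929_263, 19_957_496)
    # Strands, then the overlap between the two genes' spans.
    print(txnrd2[2], comt[2], overlap_length(txnrd2[:2], comt[:2]))
```

This prints `- + 97`: the genes sit head to head on opposite strands, so a variant in the shared stretch belongs to both.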
SNP: GeneView
• Where is rs4680?
Finding Genes in the Region of a SNP
• Purcell et al. (2009) report that the strongest SNP effect in their GWAS is rs3130297, with p = 4.79 × 10⁻⁸
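Finding genes near a GWAS hit reduces to an interval query: which gene spans, padded by some window, contain the SNP position? The sketch below uses a hypothetical two-gene table built from the TXNRD2 and COMT coordinates above; a real search would cover every annotated gene, and the window size is a free choice. The position of rs3130297 is not given here, so the example uses rs45593642 and rs4680 from the Table Browser output (chromEnd 19929286 and 19951271).

```python
# Hypothetical gene table: hg19 spans from the chromosome 22 slide,
# stored as (chrom, low, high), 1-based and inclusive.
GENES = {
    "TXNRD2": ("chr22", 19_881_781, 19_929_359),
    "COMT": ("chr22", 19_929_263, 19_957_496),
}


def genes_near(chrom: str, pos: int, window: int = 0) -> list[str]:
    """Names of genes whose span, padded by `window` bases on each side,
    contains `pos`. window=0 means the SNP lies inside the gene."""
    hits = []
    for name, (g_chrom, low, high) in GENES.items():
        if g_chrom == chrom and low - window <= pos <= high + window:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(genes_near("chr22", 19_929_286))                 # rs45593642
    print(genes_near("chr22", 19_951_271))                 # rs4680
    print(genes_near("chr22", 19_951_271, window=25_000))  # wider search
```

rs45593642 falls where TXNRD2 and COMT overlap, so it returns both genes; rs4680 lies only in COMT, but a 25 kb window also pulls in TXNRD2.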
Capturing cis-Regulatory Polymorphisms
• Extend the search in the 5’ and 3’ directions. The window size is arbitrary; one choice sometimes used is 5 kb on the 5’ side and 1 kb on the 3’ side
• Use functional element databases (e.g., FESD, http://sysbio.kribb.re.kr:8080/fesd/index.jsp)
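Because 5’ and 3’ refer to the gene's own orientation, the padding has to respect strand: on the minus strand the 5’ end is the higher coordinate. A sketch under that reading, with the 5 kb / 1 kb values above as defaults; `extend_for_cis` is a hypothetical helper, and coordinates are taken as 1-based and inclusive.

```python
def extend_for_cis(low: int, high: int, strand: str,
                   upstream: int = 5_000,
                   downstream: int = 1_000) -> tuple[int, int]:
    """Pad a gene span by `upstream` bases on its 5' side and `downstream`
    bases on its 3' side. `low`/`high` are the smaller/larger coordinates."""
    if strand == "+":
        new_low, new_high = low - upstream, high + downstream
    elif strand == "-":
        # On the minus strand the 5' end is the high coordinate.
        new_low, new_high = low - downstream, high + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(new_low, 1), new_high  # do not run past the chromosome start


if __name__ == "__main__":
    print(extend_for_cis(19_929_263, 19_957_496, "+"))  # COMT
    print(extend_for_cis(19_881_781, 19_929_359, "-"))  # TXNRD2
```

For COMT (plus strand) this gives 19,924,263 to 19,958,496; for TXNRD2 (minus strand) it gives 19,880,781 to 19,934,359.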
Functional Element SNPs Database II (http://sysbio.kribb.re.kr:8080/fesd/index.jsp)
• Promoter
• CpG islands – in or near promoters; methylation sites
• 5’-UTR – may affect mRNA stability, etc.
• 3’-UTR – mRNA properties (e.g., stability, translation)
• Start/stop codons
• Splice sites
• Introns
• Exons
• Polyadenylation signal sites – direct addition of the poly(A) tail, which protects the mRNA 3’ end
Conclusions
• UCSC Browser and Entrez Gene
  • Characteristics of gene transcripts
  • Characteristics of genetic variation in gene transcripts
• Selection of genetic variants for genotyping
  • Previous research
  • Functionality
  • Statistical properties (LD and MAF; covered next)