Annotation
SPH 247 Statistical Analysis of Laboratory Data
Annotation
• Given that one has found one or more genes that are differentially expressed, there are a number of useful things to know:
  • What is the putative function?
  • What pathways are known to contain this gene?
  • What other proteins interact with the given protein?
  • etc.
Two-color array example
> alldata[1,]
 [1]  473  888  170 1137   86  290  109  226  370  659  359  484  102  293  174
[16]  324  196  638  102  293
> geneID[1,]
       Name                                       ID
1 NM_006182 discoidin domain receptor family, member
http://www.ncbi.nlm.nih.gov/genome/guide/human/resources.shtml
Official Symbol: DDR2 (provided by HGNC)
Official Full Name: discoidin domain receptor tyrosine kinase 2 (provided by HGNC)
Primary source: HGNC:2731
Locus tag: RP11-572K18.1
See related: Ensembl:ENSG00000162733; HPRD:01868; MIM:191311; Vega:OTTHUMG00000034423
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Homo sapiens
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Also known as: TKT; MIG20a; NTRKR3; TYRO10
Summary: Receptor tyrosine kinases (RTKs) play a key role in the communication of cells with their microenvironment. These molecules are involved in the regulation of cell growth, differentiation, and metabolism. In several cases the biochemical mechanism by which RTKs transduce signals across the membrane has been shown to be ligand induced receptor oligomerization and subsequent intracellular phosphorylation. This autophosphorylation leads to phosphorylation of cytosolic targets as well as association with other molecules, which are involved in pleiotropic effects of signal transduction. RTKs have a tripartite structure with extracellular, transmembrane, and cytoplasmic regions. This gene encodes a member of a novel subclass of RTKs and contains a distinct extracellular region encompassing a factor VIII-like domain. Alternative splicing in the 5' UTR results in multiple transcript variants encoding the same protein. [provided by RefSeq, Jul 2008]
Affy Example
> source("http://bioconductor.org/biocLite.R")
> biocLite("annaffy")
> biocLite("hgu95av2.db")
> library(annaffy)
> library(affy)
Loading required package: Biobase
Loading required package: tools
…
Loading required package: GO
Loading required package: KEGG
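The biocLite.R installer used on this slide has been retired in more recent Bioconductor releases, where packages are installed through the CRAN package BiocManager instead. A minimal sketch of the equivalent setup follows. The helper name `missing_packages` is hypothetical, and the install step is guarded by `interactive()` so that sourcing the script never starts a download on its own.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in the Affy example on this slide.
needed <- c("annaffy", "affy", "hgu95av2.db")
to_install <- missing_packages(needed)

# Only install when run by hand in an interactive session.
if (interactive() && length(to_install) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(to_install)
}
```

Checking first and installing only what is missing avoids reinstalling large annotation packages such as hgu95av2.db on every run.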
> probeids <- featureNames(eset)[pv2$Posterior.FDR < .05]
> probeids[1:5]
[1] "1005_at"   "1009_at"   "1034_at"   "1035_g_at" "1045_s_at"
> symbols <- aafSymbol(probeids,"hgu95av2.db")
Loading required package: hgu95av2
> symbols[1]
An object of class "aafList"
[[1]]
An object of class "aafSymbol"
[1] "DUSP1"
> getText(symbols[1])
[1] "DUSP1"
> descs <- aafDescription(probeids,"hgu95av2.db")[1]
> getText(descs)[1]
[1] "dual specificity phosphatase 1"
> gos <- aafGO(probeids,"hgu95av2.db")
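The first command on this slide selects probes by logical indexing on an FDR vector. One caveat worth a small illustration: if the FDR vector contains missing values, plain logical indexing returns NA entries, whereas wrapping the condition in `which()` drops them. The probe names and FDR values below are hypothetical and serve only to show the behavior.

```r
# Hypothetical probe IDs and FDR values, for illustration only.
ids <- c("probe_a", "probe_b", "probe_c", "probe_d")
fdr <- c(0.01, 0.20, NA, 0.04)

# Plain logical indexing keeps an NA placeholder for the missing FDR.
with_na <- ids[fdr < 0.05]

# which() returns only the positions where the condition is TRUE.
selected <- ids[which(fdr < 0.05)]
print(selected)
```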
> gos[1]
An object of class "aafList"
[[1]]
An object of class "aafGO"
[[1]][[1]]
An object of class "aafGOItem"
@id   "GO:0006470"
@name "protein amino acid dephosphorylation"
@type "Biological Process"
@evid "IEA"
[[1]][[2]]
An object of class "aafGOItem"
@id   "GO:0006979"
@name "response to oxidative stress"
@type "Biological Process"
@evid "TAS"
[[1]][[3]]
An object of class "aafGOItem"
@id   "GO:0007049"
@name "cell cycle"
@type "Biological Process"
@evid "IEA"
[[1]][[4]]
An object of class "aafGOItem"
@id   "GO:0004726"
@name "non-membrane spanning protein tyrosine phosphatase activity"
@type "Molecular Function"
@evid "TAS"
[[1]][[5]]
An object of class "aafGOItem"
@id   "GO:0005515"
@name "protein binding"
@type "Molecular Function"
@evid "IPI"
[[1]][[6]]
An object of class "aafGOItem"
@id   "GO:0016787"
@name "hydrolase activity"
@type "Molecular Function"
@evid "IEA"
[[1]][[7]]
An object of class "aafGOItem"
@id   "GO:0017017"
@name "MAP kinase tyrosine/serine/threonine phosphatase activity"
@type "Molecular Function"
@evid "IEA"
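The seven GO items printed for the first probe (DUSP1) can be easier to scan as a table. A minimal sketch in base R follows, with the IDs, types, and evidence codes copied by hand from that output. The data frame `go` and its column names are illustrative and are not an annaffy structure.

```r
# GO items for DUSP1, transcribed from the printed aafGO output.
go <- data.frame(
  id = c("GO:0006470", "GO:0006979", "GO:0007049", "GO:0004726",
         "GO:0005515", "GO:0016787", "GO:0017017"),
  type = c(rep("Biological Process", 3), rep("Molecular Function", 4)),
  evid = c("IEA", "TAS", "IEA", "TAS", "IPI", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ontology branch against evidence code.
tab <- table(go$type, go$evid)
print(tab)
```

The cross-tabulation shows at a glance how many annotations in each branch rest on each kind of evidence.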
GO Evidence Codes
• IEA = inferred from electronic annotation (e.g., BLAST); uncurated
• TAS = traceable author statement (i.e., someone said so)
• IDA = inferred from direct assay
• IEP = inferred from expression pattern
• IGI = inferred from genetic interaction
• IMP = inferred from mutant phenotype
• IPI = inferred from physical interaction
• ISS = inferred from sequence similarity
• NAS = non-traceable author statement
• ND = no biological data available
• NR = not recorded
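The evidence codes listed on these slides can be kept as a lookup table so that the `@evid` field of a GO item is decoded automatically. A minimal sketch in base R follows. The names `evidence_codes` and `describe_evidence` are hypothetical, the meanings are taken from the list above, and only IEA is flagged as uncurated because that is the only code the slides describe that way.

```r
# Lookup table built from the evidence codes listed on the slides.
evidence_codes <- c(
  IEA = "inferred from electronic annotation",
  TAS = "traceable author statement",
  IDA = "inferred from direct assay",
  IEP = "inferred from expression pattern",
  IGI = "inferred from genetic interaction",
  IMP = "inferred from mutant phenotype",
  IPI = "inferred from physical interaction",
  ISS = "inferred from sequence similarity",
  NAS = "non-traceable author statement",
  ND  = "no biological data available",
  NR  = "not recorded"
)

# Hypothetical helper: translate codes, marking any code not in the table.
describe_evidence <- function(codes) {
  labels <- unname(evidence_codes[codes])
  labels[is.na(labels)] <- "unknown code"
  labels
}

codes <- c("IEA", "TAS", "IPI", "XYZ")
decoded <- data.frame(
  code = codes,
  meaning = describe_evidence(codes),
  uncurated = codes == "IEA",
  stringsAsFactors = FALSE
)
print(decoded)
```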
Online Access
> gbs <- aafGenBank(probeids,"hgu95av2.db")
> getURL(gbs[[1]])
[1] "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=nucleotide&term=X68277%5BACCN%5D&doptcmdl=GenBank"
> lls <- aafLocusLink(probeids,"hgu95av2.db")
> getURL(lls[[1]])
[1] "http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=DetailsSearch&Term=1843"
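The URLs returned by `getURL()` on this slide use older NCBI query syntax. As a hedged alternative, the same records can be addressed with the shorter path-style URLs that the NCBI site currently serves. The sketch below assumes those URL patterns remain valid; the helper names `gene_url` and `nuccore_url` are hypothetical, and the gene ID and accession are the ones shown in the output above.

```r
# Hypothetical helpers; the URL patterns are assumptions about the
# current NCBI site layout, not something annaffy produces.
gene_url <- function(gene_id) {
  paste0("https://www.ncbi.nlm.nih.gov/gene/", gene_id)
}

nuccore_url <- function(accession) {
  paste0("https://www.ncbi.nlm.nih.gov/nuccore/", accession)
}

gene_link <- gene_url("1843")        # Entrez Gene ID from the slide
seq_link  <- nuccore_url("X68277")   # GenBank accession from the slide
print(c(gene_link, seq_link))

# Open in a browser only when run by hand.
if (interactive()) browseURL(gene_link)
```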
Abstracts
> pmids <- aafPubMed(probeids,"hgu95av2.db")
> pmids[[1]]
An object of class "aafPubMed"
 [1] 15111304 14764702 14500648 12935821 12477932  9659899  8977099  8796349  8682498  8622863  8390675  8302582
[13]  8226977  7848919  7789998  7774938
> pmids[1]
An object of class "aafPubMed"
 [1]  1406996  7535770  7806236  8106404  8168826  8221888  8389479  8390041
 [9]  9571625  9599409 10617468 11062068 11278799 12080474 12356755 12391149
[17] 12432554 12477932 12506119 12765304 12890671 12947325 12960255 14551204
[25] 14680833 14702039 14724291 15059515 15173070 15247770 15339908 15448190
[33] 15489334 15569826 15590693 15614136 15677475 16044158 16081065 16224818
[41] 16286470 16289033 16293973 16387640 17073741 17131384 17489738
> browseURL(getURL(lls[[1]]))
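When PubMed ID lists from several probes are pooled, the same article can appear more than once. A minimal sketch in base R of merging two lists without duplicates and building one link per article follows. The two short vectors reuse a few of the IDs printed above, their assignment to two separate probes is hypothetical, and the per-article URL pattern is an assumption about the current PubMed site.

```r
# A few PubMed IDs taken from the output above; the split into two
# probes is hypothetical and only illustrates the pooling step.
pmids_probe1 <- c(15111304L, 14764702L, 12477932L)
pmids_probe2 <- c(12432554L, 12477932L, 12506119L)

# union() keeps first-seen order and drops the repeated ID.
pooled <- union(pmids_probe1, pmids_probe2)

# Assumed per-article URL pattern on the current PubMed site.
pubmed_urls <- paste0("https://pubmed.ncbi.nlm.nih.gov/", pooled, "/")
print(pubmed_urls)
```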
Direct Browsing
> browseURL(getURL(lls[[1]]))
> browseURL(getURL(gbs[[1]]))
> browseURL(getURL(pmids[1]))
Top Genes
> probeids.ord <- featureNames(eset)[order(pv1$Posterior)]
> getText(aafSymbol(probeids.ord[1:10],"hgu95av2.db"))
[1] "" "PSPHP1" "" "COPA" "" "GM2A" "S100A2" "RPLP1" "" ""
> getText(aafDescription(probeids.ord[1:10],"hgu95av2.db"))
[1] ""                                "phosphoserine phosphatase pseudogene 1"
[3] ""                                "coatomer protein complex, subunit alpha"
[5] ""                                "GM2 ganglioside activator"
[7] "S100 calcium binding protein A2" "ribosomal protein, large, P1"
[9] ""                                ""
> aafGO(probeids.ord[7],"hgu95av2.db")
An object of class "aafList"
[[1]]
An object of class "aafGO"
[[1]][[1]]
An object of class "aafGOItem"
@id   "GO:0005509"
@name "calcium ion binding"
@type "Molecular Function"
@evid "NAS"
[[1]][[2]]
An object of class "aafGOItem"
@id   "GO:0005575"
@name "cellular_component"
@type "Cellular Component"
@evid "ND"
[[1]][[3]]
An object of class "aafGOItem"
@id   "GO:0043542"
@name "endothelial cell migration"
@type "Biological Process"
@evid "IMP"
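Half of the ten top-ranked probes on the Top Genes slide come back with an empty symbol, presumably because the annotation package has no gene symbol for them. A minimal sketch in base R of keeping only the annotated probes while preserving their original rank follows. The symbols are copied from the output on that slide and the variable names are illustrative.

```r
# Symbols for the ten top-ranked probes, copied from the slide output.
top_symbols <- c("", "PSPHP1", "", "COPA", "", "GM2A",
                 "S100A2", "RPLP1", "", "")

# nzchar() is TRUE for non-empty strings, i.e. probes with a symbol.
annotated <- nzchar(top_symbols)

ranked <- data.frame(
  rank = which(annotated),
  symbol = top_symbols[annotated],
  stringsAsFactors = FALSE
)
print(ranked)
```

Keeping the rank column makes it clear that S100A2, the gene whose GO terms are shown on the slide, is the seventh probe in the ordering.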