Annotation for Gene Expression Analysis with Reactome.db Package Utah State University – Spring 2012 STAT 6570: Statistical Bioinformatics Cody Tramp
References • Ligtenberg W. 2011. Reactome.db: How to use the reactome.db package. • www.reactome.org
Reactome.db Overview • “Open source, open access, manually curated, and peer-reviewed pathway database” – www.reactome.org • Reactome.db is an R interface that allows queries to the SQL database containing pathway information • Contains functions for converting between annotation IDs and names for GO, Entrez, and Reactome
Getting Help on Specific Reactome.db Functions
#Load the Reactome.db package
library(reactome.db)
#Check for main manual pages
?reactome.db #This won't get the actual manual
#List all reactome.db objects
ls("package:reactome.db")
# [1] "reactome" "reactome_dbconn" "reactome_dbfile"
# [4] "reactome_dbInfo" "reactome_dbschema" "reactomeEXTID2PATHID"
# [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS" "reactomePATHID2EXTID"
#[10] "reactomePATHID2NAME" "reactomePATHNAME2ID" "reactomeREACTOMEID2GO"
#Look up specific manual for an object
?reactome_dbInfo #Still not very useful – poor documentation
How IDs and names are stored in Reactome.db • The reactome.db package links to a SQL database • Functions are interfaces to the database • SQL databases are relational databases (think of Excel spreadsheets, but better) • Data is stored as key:value pairs
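The key:value idea can be tried out without the package installed. The following is a minimal sketch in base R, assuming a small hand-built data frame that imitates the two-column layout `toTable()` returns; the object names `mapping` and `by_gene` are made up for illustration, and the ID values are copied from the `toTable(reactomeEXTID2PATHID)` output shown elsewhere in these slides.

```r
# Toy stand-in for a reactome.db mapping table (NOT queried from the package).
# Columns follow the toTable() output shown in the slides.
mapping <- data.frame(
  reactome_id = c("168253", "168254", "168253", "168254"),
  gene_id     = c("10898", "10898", "8106", "8106"),
  stringsAsFactors = FALSE
)

# Key:value view: each key (gene_id) points to a vector of values (reactome_id).
by_gene <- split(mapping$reactome_id, mapping$gene_id)

# Look up every Reactome ID recorded for one Entrez gene.
by_gene[["10898"]]
```

One key can carry several values, which is why a long two-column table is often easier to filter than a nested list.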
Reactome.db Function Uses (NOTE: all return a key:value list)
Converting Between Entrez and Reactome
reactomeEXTID2PATHID = Entrez ID to Reactome.db ID
reactomePATHID2EXTID = Reactome.db ID to Entrez ID
> xx <- toTable(reactomeEXTID2PATHID)
> head(xx)
  reactome_id gene_id
1      168253   10898
2      168254   10898
3      168253    8106
4      168254    8106
5      168253    5610
6      168254    5610
Use toTable() instead of the as.list() shown in the manuals
Reactome.db Function Uses (NOTE: all return a key:value list)
Converting Between GO ID and Reactome ID
reactomeREACTOMEID2GO = Reactome.db ID to GO IDs
reactomeGO2REACTOMEID = GO ID to Reactome.db ID
> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003
Reactome.db Function Uses (NOTE: all return a key:value list)
Retrieving Pathway Names from Reactome IDs
reactomePATHNAME2ID = Reactome.db Name to Reactome.db ID
reactomePATHID2NAME = Reactome.db ID to Reactome.db Name
> xx <- toTable(reactomePATHID2NAME)
> head(xx)
  reactome_id path_name
1       15869 Homo sapiens: Metabolism of nucleotides
2       68616 Homo sapiens: Assembly of the ORC complex at the origin of replication
3       68689 Homo sapiens: CDC6 association with the ORC:origin complex
4       68827 Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
5       68867 Homo sapiens: Assembly of the pre-replicative complex
6       68874 Homo sapiens: M/G1 Transition
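Because the mapping tables share the `reactome_id` column, they can be joined to go from a gene straight to pathway names. This is only a sketch with toy data frames in base R: the pathway IDs and names are copied from the output above, while the gene IDs `gene_A` and `gene_B` and the object names are hypothetical placeholders, not real Entrez IDs or package objects.

```r
# Toy gene-to-pathway table; gene_A / gene_B are made-up placeholder IDs.
gene2path <- data.frame(
  reactome_id = c("15869", "68867", "68874"),
  gene_id     = c("gene_A", "gene_A", "gene_B"),
  stringsAsFactors = FALSE
)

# Toy pathway-name table; rows copied from the head(xx) output in the slides.
path2name <- data.frame(
  reactome_id = c("15869", "68867", "68874"),
  path_name   = c("Homo sapiens: Metabolism of nucleotides",
                  "Homo sapiens: Assembly of the pre-replicative complex",
                  "Homo sapiens: M/G1 Transition"),
  stringsAsFactors = FALSE
)

# Join on the shared key, then keep the rows for one gene.
named <- merge(gene2path, path2name, by = "reactome_id")
gene_A_paths <- named[named$gene_id == "gene_A", "path_name"]
gene_A_paths
```

With the real package, the two inputs would presumably come from `toTable(reactomeEXTID2PATHID)` and `toTable(reactomePATHID2NAME)`, assuming the column names match those shown in the slides.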
Reactome.db Function Uses (NOTE: all return a key:value list)
reactomeMAPCOUNTS = shows number of rows in each function’s relational database (not very useful unless error checking)
> xx <- as.list(reactomeMAPCOUNTS)
> xx
$reactomeEXTID2PATHID
[1] 28363
$reactomeGO2REACTOMEID
[1] 3217
$reactomePATHID2EXTID
[1] 8320
$reactomePATHID2NAME
[1] 13778
$reactomePATHNAME2ID
[1] 13876
$reactomeREACTOMEID2GO
[1] 47575
Ex: Find apoptosis induction-related ID (compare to Notes 6.1 slide 10)
# Get data.frame summarizing all reactome.db pathways including a certain string
xx <- toTable(reactomePATHNAME2ID)
all.pathways <- xx$path_name # get name of each reactome.db pathway
t <- grep('apoptosis', all.pathways) # get index where Term includes 'apoptosis'
#use agrep() for approximate term searching
reactome.Term <- unlist(all.pathways[t])
reactome.IDs <- unlist(xx$reactome_id[t])
reactome.frame <- data.frame(reactome.ID=reactome.IDs, reactome.Term=reactome.Term)
rownames(reactome.frame) <- 1:length(reactome.IDs)
reactome.frame # 13 terms
Ex. Pathway Term Search Function
##Define Function to search for pathways with given key word
##agrep.bool is indicator to use agrep (TRUE) or grep (FALSE)
searchPathways2REACTOMEID <- function(term, agrep.bool) {
  xx <- toTable(reactomePATHNAME2ID)
  all.pathways <- xx$path_name # get name of each reactome.db pathway
  #get index where Term is found
  t <- if (agrep.bool) agrep(term, all.pathways) else grep(term, all.pathways)
  unlist(xx$reactome_id[t])
}
apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE)
length(apop.IDs) #13 pathways matched
apop.IDs <- searchPathways2REACTOMEID("apoptosis", TRUE)
length(apop.IDs) #85 pathways matched
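The gap between the exact and approximate counts is easier to see on a few strings. Below is a small base R sketch, assuming three toy pathway names (the first two are invented for illustration, the third is taken from the slides); it contrasts case-sensitive `grep()`, `grep()` with `ignore.case = TRUE`, and `agrep()` with its default tolerance. It may partly explain why the approximate search matches many more pathways, but it does not reproduce the counts above.

```r
# Toy pathway names; only the third is copied from the slides.
paths <- c("Homo sapiens: Apoptosis",
           "Homo sapiens: Regulation of apoptosis",
           "Homo sapiens: M/G1 Transition")

# Exact, case-sensitive substring match: misses the capitalised "Apoptosis".
exact <- grep("apoptosis", paths)

# Same pattern, ignoring case: finds both apoptosis-related names.
nocase <- grep("apoptosis", paths, ignore.case = TRUE)

# Approximate match: tolerates small edits, so it can also pick up
# near-misses that an exact search would skip.
fuzzy <- agrep("apoptosis", paths)

exact
nocase
fuzzy
```

If case is the only concern, `ignore.case = TRUE` is the narrower tool; `agrep()` also admits spelling variants, at the cost of possible false hits.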
Getting GO Terms from single Reactome ID
##Get List of GO Terms from Reactome ID
xx <- toTable(reactomeGO2REACTOMEID)
t <- xx$reactome_id == "15869"
GOTerms <- xx$go_id[t]
> GOTerms
 [1] "GO:0055086" "GO:0006139" "GO:0044281"
 [4] "GO:0034641" "GO:0044238" "GO:0008152"
 [7] "GO:0006807" "GO:0044237" "GO:0008150"
[10] "GO:0009987"
> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003
Getting GO Terms from list of Reactome IDs
##Define Function to get all GO Terms for all Reactome IDs in a list
getGOTerms <- function(list_reactome) {
  listGO = list()
  xx <- toTable(reactomeGO2REACTOMEID)
  # seq_along() also handles an empty input list safely
  for (i in seq_along(list_reactome)) {
    t <- xx$reactome_id == list_reactome[i]
    temp_list = xx$go_id[t]
    listGO = c(listGO, temp_list)
  }
  unlist(listGO)
}
GOTerms.all <- getGOTerms(apop.IDs) #From slide 10
length(GOTerms.all) #136 GO Terms from 13 apop.IDs
Should have yielded 169 terms (Notes 4.1 slide 10) – reactome.db might not be complete
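The loop above can also be written as a single vectorised filter. The following is a sketch in base R on a toy table rather than the real database: the first four rows reuse ID pairs that appear in these slides, the last row (`99999`) is invented, and the names `go_table` and `get_go_terms` are hypothetical.

```r
# Toy stand-in for toTable(reactomeGO2REACTOMEID); last row is made up.
go_table <- data.frame(
  reactome_id = c("15869", "15869", "168276", "168276", "99999"),
  go_id       = c("GO:0055086", "GO:0006139", "GO:0019054",
                  "GO:0019048", "GO:0008150"),
  stringsAsFactors = FALSE
)

# Keep every GO ID whose Reactome ID is in the requested set.
get_go_terms <- function(ids, tbl) {
  tbl$go_id[tbl$reactome_id %in% ids]
}

terms <- get_go_terms(c("15869", "168276"), go_table)
terms

# GO terms shared by several pathways appear once per pathway;
# unique() collapses them if only distinct terms are wanted.
distinct_terms <- unique(terms)
```

Unlike the loop, `%in%` returns terms in table order rather than input order and does not repeat results when the input contains the same Reactome ID twice.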
Pathway Viewer on reactome.org http://www.reactome.org/userguide/Usersguide.html#Introduction
Pathway Viewer on reactome.org • Details Panel
Pathway Viewer on reactome.org http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
Reactome Pathway Symbols • Upregulation and participating proteins • Inhibition http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
Reactome Database Assignment Method • Genes seem to be assigned to pathways in a similar manner to the GO database • If a gene is up-regulated, it is included • Genes that are down-regulated in a condition are NOT mapped to the condition/pathway • Haven’t received an official response from reactome.org, but from general browsing this seems to be the case
Pathway Analysis Tool http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage
Summary • Reactome.db provides an interface to the SQL database containing IDs • Functions for converting between ID types • No functionality for gene testing through R • Online tools include pathway maps and ID lookup tables • Some limited expression testing (with unknown statistical methods)