EGAN Tutorial: Loading Network Data
October, 2009
Jesse Paquette
UCSF Helen Diller Family Comprehensive Cancer Center
jesse.paquette@cc.ucsf.edu
Preamble
• This document has many slides with multi-step animations
• Best viewed in Slide Show mode
• The EGAN graphical user interface is evolving
  • Icons may change
  • Menus may change
  • Button/widget placement may change
• This document probably won't change as quickly
• Please contact the developers if you notice major discrepancies between this document and EGAN
Loading network data: An overview
• The EGAN pre-collated network represents only a fraction of available data
• Additional data can be loaded as:
  • Gene sets/association nodes
    • Pathways, annotation terms, articles, transcription factor targets, miRNA targets, conserved domains, significant gene sets/clusters from experiments, etc.
  • Gene-gene edges
    • Protein-protein interactions, literature co-occurrence, expression correlation, sequence homology, transcription factor targets, kinase targets, etc.
• This document will outline the steps for loading additional gene sets and gene-gene edges into EGAN
Loading gene sets into EGAN: Gene set file formats
• Two possible tab-delimited text formats
  • GMT
    • All default pre-collated gene sets in EGAN are specified via GMT files
    • Each row represents a different gene set
  • GMX
    • Transposed GMT
    • Each column represents a different gene set
• The first two columns of a GMT file (or the first two rows of a GMX file) specify:
  • Gene set ID (first column; first row in GMX)
    • Can potentially be used to link out to the gene set's web page via URL
  • Gene set name (second column; second row in GMX)
    • Can be empty or the same as the ID
• Subsequent columns (rows in GMX) list the genes in each set
• Gene identifiers must be mappable to Entrez Gene IDs
  • EGAN provides a wide variety of mapping file options
    • Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
  • EGAN expects all gene identifiers within a file to be of the same type
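For readers who want to inspect or convert gene set files outside EGAN, the two layouts above can be sketched in Python. This is a minimal illustration, not EGAN's own loader: the function names `read_gmt` and `gmx_to_gmt_lines` are hypothetical, and the sketch assumes plain tab-separated text with no quoting.

```python
import io
from typing import Dict, Iterable, List, Tuple


def read_gmt(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Parse GMT text (one gene set per row): ID, name, then gene identifiers."""
    sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed rows
        set_id, name = fields[0], fields[1]
        # Drop empty cells, which spreadsheet exports often leave at row ends.
        sets[set_id] = (name, [g for g in fields[2:] if g])
    return sets


def gmx_to_gmt_lines(lines: Iterable[str]) -> List[str]:
    """Transpose GMX text (one gene set per column) into GMT-style lines."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.rstrip("\r\n")]
    width = max((len(r) for r in rows), default=0)
    # Pad short rows so every column index exists before transposing.
    padded = [r + [""] * (width - len(r)) for r in rows]
    gmt = []
    for column in zip(*padded):
        cells = list(column)
        # Keep the ID and name cells even if empty; drop empty gene cells.
        gmt.append("\t".join(cells[:2] + [c for c in cells[2:] if c]))
    return gmt


if __name__ == "__main__":
    print(read_gmt(io.StringIO("SET_A\tSet A\tTP53\tMDM2\n")))
    # SET_B has fewer genes than SET_A, so its last cell is empty.
    gmx = io.StringIO("SET_A\tSET_B\nSet A\tSet B\nTP53\tEGFR\nMDM2\t\n")
    print(gmx_to_gmt_lines(gmx))
```

The GMX conversion pads ragged columns before transposing because gene sets of different sizes leave empty cells at the bottom of shorter columns.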
Loading gene sets into EGAN: An example
[Screenshot of a GMT file: each row is a gene set; first column: gene set IDs; second column: gene set names; later columns: gene identifiers]
Loading gene sets into EGAN: An example
• Save the file as tab-delimited text
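If you build the gene set file programmatically instead of in a spreadsheet, a sketch like the following writes tab-delimited GMT text. The names `format_gmt`, `write_gmt` and the example set are hypothetical; UTF-8 encoding and Unix line endings are assumptions, since the slides only require tab-delimited text.

```python
from typing import Dict, List, Tuple

GeneSets = Dict[str, Tuple[str, List[str]]]


def format_gmt(sets: GeneSets) -> str:
    """Render gene sets as GMT text: ID, name, then one gene per column."""
    lines = []
    for set_id, (name, genes) in sets.items():
        cells = [set_id, name, *genes]
        # A tab or newline inside a cell would silently shift the columns.
        for cell in cells:
            if "\t" in cell or "\n" in cell:
                raise ValueError(f"tab or newline inside cell: {cell!r}")
        lines.append("\t".join(cells))
    return "".join(line + "\n" for line in lines)


def write_gmt(path: str, sets: GeneSets) -> None:
    """Write gene sets to `path` as tab-delimited GMT text."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        out.write(format_gmt(sets))


if __name__ == "__main__":
    # Hypothetical example set; the name column may be empty or repeat the ID.
    example = {"MY_SET": ("MY_SET", ["TP53", "MDM2", "CDKN1A"])}
    write_gmt("my_sets.gmt", example)
```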
Loading gene sets into EGAN: An example
• Download or construct a gene set file
  • This example uses c2.cgp.v2.5.symbols.gmt from MSigDB (download this file to follow along)
  • You'll have to log in with your email address to download MSigDB gene sets
• Launch EGAN (H. sapiens)
Loading gene sets into EGAN: An example
• Click on "7) Association Data"
  • Shown are the default pre-collated gene sets. We want to load a new one.
• Click "Browse…"
• Select your GMT file and click "Specify gene association set"
• This GMT file uses Gene Symbols for gene identifiers. Select "HUGO Gene Symbol" from the drop-down menu.
• Now specify that these gene sets are of type "MSigDB C2: chemical and genetic perturbations" by selecting that option from the drop-down menu.
  • This MSigDB type has been pre-defined for EGAN, which is why it exists in this menu.
• Finally, click "Add Set"
• When you are finished loading data, click "Finish – Launch EGAN"
Loading gene sets into EGAN: An example Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file. If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).
Loading gene sets into EGAN: An example When EGAN finishes loading, your new set(s) will be available for exploration
Loading gene-gene edges into EGAN: File formats
• Two possible tab-delimited text formats
  • SIF (Simple Interaction Format), commonly used in Cytoscape
    • .sif extension (required in EGAN)
    • Each line represents a gene-gene relationship
    • Three columns
      • First column is the first gene
      • Middle column (the interaction type in SIF) is ignored in EGAN
      • Third column is the second gene
  • EGAN interaction file format
    • .txt file extension
    • Three columns, like SIF
    • Middle column is a PubMed ID
• Gene identifiers must be mappable to Entrez Gene IDs
  • EGAN provides a wide variety of mapping file options
    • Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
  • EGAN expects all gene identifiers within a file to be of the same type
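The two edge formats differ only in how the middle column is treated, so one reader can cover both. This is a minimal Python sketch with a hypothetical `read_edges` function, not EGAN's own parser. It handles only the tab-delimited, exactly-three-column layout described above (Cytoscape's SIF also permits other layouts, such as several targets on one line, which this sketch skips), and it does not map identifiers to Entrez Gene IDs.

```python
from typing import Iterable, List, Optional, Tuple

# (first gene, second gene, PubMed ID or None)
Edge = Tuple[str, str, Optional[str]]


def read_edges(lines: Iterable[str], egan_txt: bool = False) -> List[Edge]:
    """Parse three-column, tab-delimited gene-gene edge text.

    SIF (egan_txt=False): the middle column is discarded, mirroring EGAN.
    EGAN .txt (egan_txt=True): the middle column is kept as a PubMed ID.
    """
    edges = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # this sketch skips anything that is not exactly 3 columns
        gene_a, middle, gene_b = fields
        edges.append((gene_a, gene_b, middle if egan_txt else None))
    return edges
```

For example, `with open("HPN.sif") as f: edges = read_edges(f)` would return a list of (first gene, second gene, None) tuples.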
Loading gene-gene edges into EGAN: An example
[Screenshot of a SIF file: each row is a gene-gene relationship; first column: first gene; third column: second gene]
Loading gene-gene edges into EGAN: An example
• Save the file as tab-delimited text
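Spreadsheet programs can silently save a file with commas, or with stray extra cells, instead of three tab-separated columns. A quick check before loading might look like this Python sketch; `find_bad_lines` is a hypothetical helper, and the default column count of 3 matches the SIF and EGAN .txt layouts described above.

```python
from typing import Iterable, List, Tuple


def find_bad_lines(lines: Iterable[str], expected_columns: int = 3) -> List[Tuple[int, int]]:
    """Return (line number, column count) for each non-blank line whose
    number of tab-separated columns differs from expected_columns."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue  # blank lines are ignored
        count = len(text.split("\t"))
        if count != expected_columns:
            problems.append((number, count))
    return problems
```

For example, `with open("HPN.sif") as f: print(find_bad_lines(f))` lists any offending lines; an empty list means every non-blank line has exactly three columns.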
Loading gene-gene edges into EGAN: An example
• Download or construct a gene-gene edge file
  • This example uses HPN.sif, a set of kinase-target relationships available via the ".sif Gzip-ed files" link at NetworKIN (download this file to follow along)
  • You'll have to accept the NetworKIN license in order to download data
• Launch EGAN (H. sapiens)
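The NetworKIN download is gzip-compressed, so the .sif file must be extracted before EGAN can load it. The slides do not say whether the download is a single compressed .sif file or a tarball of several, so this Python sketch handles both; `extract_sif` is a hypothetical helper and the archive layout is an assumption.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def extract_sif(archive: str, member_name: str, dest_dir: str = ".") -> Path:
    """Extract `member_name` (e.g. "HPN.sif") from a gzip download into dest_dir.

    Assumption: the download is either a gzip-compressed tarball holding
    several .sif files, or a single gzip-compressed .sif file.
    """
    dest = Path(dest_dir) / member_name
    if tarfile.is_tarfile(archive):
        with tarfile.open(archive, "r:*") as tar:
            # Match on the base name so folder paths inside the tarball don't matter.
            member = next((m for m in tar.getmembers()
                           if m.isfile() and Path(m.name).name == member_name), None)
            if member is None:
                raise FileNotFoundError(f"{member_name} not found in {archive}")
            source = tar.extractfile(member)
            with source, open(dest, "wb") as out:
                shutil.copyfileobj(source, out)
    else:
        # Not a tarball: treat the archive as one gzip-compressed file.
        with gzip.open(archive, "rb") as source, open(dest, "wb") as out:
            shutil.copyfileobj(source, out)
    return dest
```

Usage would look like `extract_sif("networkin_download.gz", "HPN.sif")`, where the archive file name is a placeholder for whatever the download is actually called.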
Loading gene-gene edges into EGAN: An example
• Click on "8) Gene Relationship Edges"
  • Shown are the default pre-collated gene-gene edge files. We want to load a new one.
• Click "Browse…"
• Select your SIF (or EGAN .txt) file and click "Specify gene-gene edge set"
• This SIF file uses Gene Symbols for gene identifiers. Select "HUGO Gene Symbol" from the drop-down menu.
• Now specify that these gene-gene edges are of type "NetworKIN" by selecting that option from the drop-down menu.
  • The NetworKIN type has been pre-defined for EGAN, which is why it exists in this menu.
• Finally, click "Add Set"
• When you are finished loading data, click "Finish – Launch EGAN"
Loading gene-gene edges into EGAN: An example Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file. If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).
Loading gene-gene edges into EGAN: An example When EGAN finishes loading, your new gene-gene edges will be available for exploration
Loading network data: Tips and hints
• Both the MSigDB and NetworKIN types were pre-defined in EGAN
  • This may not be the case for your new data
  • You can use the "Custom Node"/"Custom Edge" types as a default
  • You can specify your own type definitions in a Type Definition file
    • Give your added nodes and edges distinct colors and links
    • See item 4 in the Launch EGAN Wizard
    • Use this type definition file as a template: just add the appropriate lines for your new types
• You can specify gene set, gene-gene edge and mapping files via URL (or .jar file, but that's tricky)
  • Just type or paste the URL into the appropriate text field instead of clicking "Browse…"
• Potential issues to consider
  • Identifiers used in your gene set/gene-gene edge file might not be found in the mapping file
  • Genes in your mapping file might not be present in the network
  • These issues are written (rather crudely) to the Log
    • Inspect the log file if you notice unexpected behavior
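The first of these potential issues can be checked before launching EGAN. This Python sketch assumes you have already collected the identifiers your mapping file covers into a set; it does not read EGAN's mapping files, whose format the slides do not describe. `unmapped_identifiers` is a hypothetical helper, and its comparison is exact and case-sensitive, which may be stricter than EGAN's own matching.

```python
from typing import Dict, Iterable, List, Set


def unmapped_identifiers(gene_sets: Dict[str, Iterable[str]],
                         mappable: Set[str]) -> Dict[str, List[str]]:
    """Return, for each gene set, the sorted identifiers absent from `mappable`.

    `mappable` stands in for the identifiers covered by your mapping file
    (an assumption: reading EGAN's mapping files is not shown here).
    Matching is exact and case-sensitive.
    """
    report = {}
    for set_id, genes in gene_sets.items():
        missing = sorted({g for g in genes if g not in mappable})
        if missing:
            report[set_id] = missing
    return report


if __name__ == "__main__":
    # Hypothetical data: "NOT_A_GENE" stands in for an identifier the mapping lacks.
    sets = {"MY_SET": ["TP53", "MDM2", "NOT_A_GENE"]}
    print(unmapped_identifiers(sets, {"TP53", "MDM2"}))
```

The same check applies to a gene-gene edge file: collect both gene columns of each edge into one list and pass it in as a single entry.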
Questions/comments?
• Visit http://groups.google.com/group/ucsf-egan for downloads, documentation and discussion
  • Requires an account with Google Groups