Managing Data Modeling
GO Workshop 3-6 August 2010
Managing Data
• Functional modeling strategy
• Converting between database IDs
  • Ensembl BioMart
  • UniProt
  • DAVID
  • AgBase ArrayIDer
• Arrays
• Examples to work on
Types of data sets and modeling
• Commercial array data: more likely to already have ID mapping that supports functional modeling.
• Custom/USDA array data: you may need to do your own ID mapping (see the examples on the workshop page).
• Proteomics data
• RNA-Seq data sets: need computational pipelines to assign GO (GOanna is limited here; contact AgBase).
• Real-time data or quantitative proteomics data: suited to hypothesis testing.
Figure: Overview of Functional Modeling Strategy
• Microarray IDs: converted to protein/gene identifiers with ArrayIDer
• GO annotations: GORetriever; for genes/proteins with no GO annotations: GOanna
• Summarizing GO function: GOSlimViewer
• GO enrichment analysis: DAVID, EasyGO/AgriGO, Onto-Express, Onto-Express-to-go (OE2GO)
• Pathways and network analysis: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID
• Hypothesis testing: GOModeler
In the figure, yellow boxes represent AgBase tools; green/purple boxes are non-AgBase resources.
Functional Modeling Considerations
• Should I add my own GO?
  • use GOSlimViewer to see how much GO is available for your species
  • use GORetriever to see how much GO is available for your dataset
• Should I do GO analysis and pathway analysis and network analysis?
  • different functional modeling methods show different, complementary aspects of your data
  • is this type of data available for your species (or a close ortholog)?
• What tools should I use?
  • which tools have data for your species of interest?
  • what type of accessions are accepted?
  • availability (commercial or freely available)
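If you already have a GO annotation file (GAF) for your species, you can get a rough, local estimate of how much GO covers your dataset. The sketch below is not how GORetriever works; it assumes the standard GAF layout (tab-separated, "!" header lines, column 2 = object ID, column 4 = qualifier, column 5 = GO ID) and that your IDs use the same namespace as column 2 (e.g. UniProt accessions).

```python
def go_coverage(gaf_lines, dataset_ids):
    """Collect GO IDs for dataset IDs found in a GAF file and report coverage.

    Assumes the standard GAF layout: tab-separated, '!' header lines,
    column 2 = DB Object ID, column 4 = Qualifier, column 5 = GO ID.
    """
    wanted = set(dataset_ids)
    go_terms = {}  # dataset ID -> set of GO IDs
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed rows
        obj_id, qualifier, go_id = cols[1], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue  # a NOT annotation states the gene lacks this function
        if obj_id in wanted:
            go_terms.setdefault(obj_id, set()).add(go_id)
    fraction = len(go_terms) / len(wanted) if wanted else 0.0
    return go_terms, fraction
```

You could pass an open file handle as `gaf_lines`; `fraction` is the share of your IDs with at least one GO annotation.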
Example: impact of re-annotating a microarray
• structurally and functionally re-annotated a microarray
• quantified the impact of this re-annotation on the GO annotations and pathways represented on the array
• tested it using a previously published experiment that used this microarray
• re-annotation allows more comprehensive GO-based modeling and improves pathway coverage
• re-annotation produced a different model from the previously published findings
Converting accessions
• Depending on your data set and the tools you use, you will likely need to convert between database accessions to do your functional modeling:
  • UniProt database: ID mapping tab
  • Ensembl BioMart
  • Online analysis tools:
    • DAVID
    • g:Profiler
    • GORetriever
  • ArrayIDer: converts EST accessions for some species (by request)
Figure: ID mapping by data type
• Data types: commercial arrays, custom arrays, EST arrays, proteomics, RNA-Seq data
• ID conversion/mapping routes: commercial ID mapping (e.g. NetAffx), Ensembl BioMart, online tools (g:Convert, DAVID), ArrayIDer, UniProt ID mapping
Working on your own data:
• New to GO
  • GO browser tutorials to familiarize yourself with the GO
  • learn what GO is available for your species
• Your own data set
  • functional grouping to get an overview (e.g. GOSlimViewer)
  • GO enrichment analysis (tools available for your species)
  • pathway analysis
  • example data sets are available; use them as worked examples
Most of these tools (including pathway analysis tools) accept only certain database accessions, so you may need to convert accessions between databases.
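GO enrichment analysis asks whether a GO term turns up in your gene list more often than chance would predict, given a background such as all genes on the array. A minimal sketch of one common way to score a single term, a one-sided hypergeometric test, is below. It is an illustration only: DAVID, for example, uses its own modified Fisher's exact test (the EASE score), and real tools also correct for testing many terms at once.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) that a GO term is over-represented in a gene list.

    N: annotated genes in the background (e.g. all genes on the array)
    K: background genes annotated to the term
    n: genes in your list (e.g. differentially expressed genes)
    k: genes in your list annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes by chance.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

Smaller p-values mean the term is less likely to be that frequent in your list by chance alone.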
Example: ID conversion
• Ensembl Plants BioMart tool
  • currently covers a limited number of species, but Ensembl is adding more plants
  • BioMart allows sophisticated querying of genomic data
• DAVID ID conversion tool
  • allows users to convert IDs and do GO enrichment analysis
• UniProt ID conversion
  • highly annotated data
• ArrayIDer
  • links ESTs to public database IDs
1. Ensembl Plants BioMart
http://plants.ensembl.org/index.html
NOTE: Ensembl is adding new plant species…
Clicking on the BioMart panel headings lets you set up a search. Selecting FILTERS shows the available filtering options:
Expand GENE and check “ID list limit” to select a defined list of accessions. Enter your list of accessions.
Selecting ATTRIBUTES lets you choose what information is reported. Here, check the accessions from external databases (UniProt and RefSeq).
Clicking on RESULTS shows the output.
• Output can be displayed online and/or downloaded (text, Excel).
• Selecting FILTERS or ATTRIBUTES lets you go back and make changes.
• Limited to species represented in Ensembl.
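The FILTERS/ATTRIBUTES/RESULTS steps can also be written as a BioMart XML query and sent to BioMart's martservice web service, which helps when you need to repeat a conversion exactly. The minimal Python sketch below only builds the query and its URL; it does not send anything. The endpoint URL, the virtual schema name, and every dataset, filter, and attribute name are assumed placeholders. Copy the real ones from a query built in the web interface (BioMart's web interface can show the XML for the query you built).

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed endpoint for Ensembl Plants BioMart; check the current Ensembl docs.
MARTSERVICE_URL = "https://plants.ensembl.org/biomart/martservice"


def biomart_query_xml(dataset, id_filter, ids, attributes, virtual_schema="plants_mart"):
    """Build a BioMart XML query: one ID-list filter plus the chosen attributes.

    All names passed in (dataset, filter, attributes, virtual schema) are
    placeholders; copy the real ones from a query built in the web interface.
    """
    id_list = ",".join(ids)  # multiple filter values are given comma-separated
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        f'<Query virtualSchemaName={quoteattr(virtual_schema)} formatter="TSV" header="1" uniqueRows="1">',
        f'<Dataset name={quoteattr(dataset)} interface="default">',
        f"<Filter name={quoteattr(id_filter)} value={quoteattr(id_list)}/>",
    ]
    # One <Attribute> element per column you want in the output.
    parts += [f"<Attribute name={quoteattr(a)}/>" for a in attributes]
    parts += ["</Dataset>", "</Query>"]
    return "".join(parts)


def biomart_query_url(xml, base_url=MARTSERVICE_URL):
    """Encode the XML as the `query` parameter of a martservice GET request."""
    return f"{base_url}?{urlencode({'query': xml})}"
```

For example, `biomart_query_url(biomart_query_xml("DATASET", "ID_FILTER", ["ID1", "ID2"], ["ATTR1", "ATTR2"]))` gives a URL that you could fetch with any HTTP client. If the names are valid, it should return tab-separated results like the RESULTS page.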
2. Online analysis tools
Database for Annotation, Visualization and Integrated Discovery (DAVID)
http://david.abcc.ncifcrf.gov/conversion.jsp
This tool works for a wide range of species.
Paste in your accession list (you can also upload a file of accessions).
Select the accession type. NOTE: If you choose “Not Sure”, the tool will try to work out what type of accession you have.
Select “Gene List” as the list type, then submit the list.
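Before pasting or uploading a list into an online tool, it helps to strip blank entries and duplicates, and to split very long lists into batches. The UniProt ID mapping tool, for instance, may give errors with more than 1000 accessions. A minimal sketch follows. The accepted separators and the default batch size are assumptions, so adjust them to the tool you are using.

```python
import re


def prepare_id_list(raw_text, batch_size=1000):
    """Split pasted text into IDs, drop blanks and duplicates, and batch them.

    Separators (whitespace, commas, semicolons) and the default batch size
    are assumptions; adjust them to what the tool you are using accepts.
    """
    seen = set()
    ids = []
    for token in re.split(r"[\s,;]+", raw_text):
        if token and token not in seen:  # keep first occurrence, preserve order
            seen.add(token)
            ids.append(token)
    # Consecutive slices of at most batch_size IDs each.
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
```

Keeping the original order makes it easier to match converted IDs back to your input list later.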
3. UniProt ID mapping
Paste your accession list (more than 1000 accessions may cause errors).
COMMENT: Note the difference between UniProt accessions and UniProt IDs (entry names). UniProt accessions are short strings of letters and numerals, usually 6 characters long (newer accessions can be 10). UniProt IDs end in a suffix derived from the species name. E.g. cassava hydroxynitrilase: accession P52705, ID HNL_MANES.
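Because the mapping tool asks which type of identifier you have, a quick check of whether each entry looks like an accession or an entry name can save a failed run. A minimal sketch follows. The accession pattern is the one UniProt publishes in its documentation, as we understand it. The entry-name check is our own simplification (mnemonic, underscore, species code).

```python
import re

# Accession pattern as given in UniProt's documentation (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Simplified entry-name check (our assumption): mnemonic, underscore, species code.
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def uniprot_id_kind(token):
    """Classify a string as a UniProt accession, an entry name (ID), or unknown."""
    t = token.strip().upper()
    if ACCESSION_RE.match(t):
        return "accession"
    if ENTRY_NAME_RE.match(t):
        return "entry name"
    return "unknown"
```

With the cassava example, `uniprot_id_kind("P52705")` returns `"accession"` and `uniprot_id_kind("HNL_MANES")` returns `"entry name"`.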
Select the accession type you have and the accession type you want to convert to, then click MAP.
The mapping link displays a tab-separated file that can be opened in Excel:
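Once saved, the tab-separated mapping output can also be read programmatically to spot one-to-many mappings and IDs that did not map at all. A minimal sketch follows. It assumes that the first row is a header and that the first two columns hold the source and target IDs. Check the columns in your own download.

```python
import csv
import io
from collections import defaultdict


def read_mapping(tsv_text, query_ids):
    """Parse a two-column From/To mapping into {from_id: [to_ids]}.

    Assumes the first row is a header and the first two columns are the
    source and target IDs. Also returns query IDs that never appear.
    """
    mapping = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2 or not row[0]:
            continue  # skip blank or incomplete rows
        if row[1] not in mapping[row[0]]:
            mapping[row[0]].append(row[1])  # one source ID may map to several targets
    unmapped = [q for q in query_ids if q not in mapping]
    return dict(mapping), unmapped
```

IDs with more than one target, and the `unmapped` list, are the cases worth reviewing by hand before modeling.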
4. AgBase: ArrayIDer
ArrayIDer maps ESTs to gene/protein accessions. Contact AgBase to request additional species.
An email will be sent with a link to the results. Results are formatted as an Excel file.
For additional help with database accessions please contact AgBase.
Working on your own data:
NOTE:
• Always record which tool you used for the accession ID mapping/conversion, and its version or update date.
• Keep a copy of your original IDs and what they mapped to, so that you can refer back to them during your modeling.
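One simple way to follow both notes is to keep original IDs, mapped IDs, and the tool details together in one tab-separated file. A minimal sketch follows. The column names are hypothetical, and IDs that did not map are kept as rows with a blank mapped ID so they are not silently lost.

```python
import csv
import io
from datetime import date


def write_mapping_log(mapping, tool, version, run_date=None):
    """Return TSV text recording each original ID, what it mapped to, and provenance.

    `mapping` is {original_id: [mapped_ids]}; column names are hypothetical.
    """
    run_date = run_date or date.today().isoformat()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["original_id", "mapped_id", "tool", "tool_version", "date"])
    for original, targets in mapping.items():
        for target in targets or [""]:  # unmapped IDs get one row with a blank mapped_id
            writer.writerow([original, target, tool, version, run_date])
    return out.getvalue()
```

Save the returned text alongside your results so that every model can be traced back to the mapping it used.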
Tutorial 1: ID conversion
The AgriGO GO enrichment analysis tool accepts the following inputs for rice:
• GenBank ID: AAP50233.1
• DDBJ ID: BAB11514.1
• EMBL ID: CAA18188.1
• UniProt ID: Q9LYA9
• RefSeq Peptide ID: NP_564434
We will convert a list of rice Affy IDs to these IDs for use in the AgriGO tool.
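After conversion, it is worth checking that every ID you plan to upload looks like one of the accepted formats, so leftover probe IDs or blanks are caught early. A minimal sketch follows. The patterns are our own simplification, based on the example IDs above plus the general shape of each database's accessions. They are not AgriGO's validation rules. GenBank, DDBJ and EMBL share one protein ID format, so the sketch does not try to tell them apart.

```python
import re

# Simplified patterns (our assumption), not AgriGO's own validation rules.
ID_PATTERNS = [
    # RefSeq proteins: NP_, XP_, YP_, AP_ or WP_ prefix, digits, optional version
    ("RefSeq peptide", re.compile(r"^[ANWXY]P_\d+(?:\.\d+)?$")),
    # GenBank/DDBJ/EMBL protein IDs: 3 letters + 5 (or 7) digits, optional version
    ("GenBank/DDBJ/EMBL protein", re.compile(r"^[A-Z]{3}\d{5}(?:\d{2})?(?:\.\d+)?$")),
    # UniProt accession (6 or 10 characters)
    ("UniProt accession", re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
]


def agrigo_id_type(identifier):
    """Return the name of the first matching pattern, or None if nothing matches."""
    token = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.match(token):
            return name
    return None
```

IDs for which `agrigo_id_type` returns `None` can be set aside and looked at before the upload.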
Arrays: ID Mapping
• An “annotation” file shows which database accessions the probes were based on
• Array annotation files may include multiple database IDs
• Commercial arrays: may be updated regularly
• Custom/research arrays: not updated as often
• Always check when the ID mapping was last updated, as this data changes continually
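Annotation files that list several database IDs per probe often put them all in one cell. A minimal sketch for splitting such a column follows. It assumes a CSV with a header row, "#" comment lines before it, " /// " between values and "---" for empty cells. As far as we know, this is the convention in Affymetrix annotation files, but check your own file and change the defaults if needed.

```python
import csv


def probe_to_ids(annotation_text, probe_col, id_col, sep=" /// ", missing="---"):
    """Map each probe to the IDs listed in one column of an array annotation CSV.

    Assumptions: lines starting with '#' are comments, the next line is the
    header, multiple IDs in a cell are separated by `sep`, and `missing`
    marks an empty cell. Column names are whatever your file uses.
    """
    lines = [ln for ln in annotation_text.splitlines() if not ln.startswith("#")]
    result = {}
    for row in csv.DictReader(lines):
        cell = (row.get(id_col) or "").strip()
        ids = [v.strip() for v in cell.split(sep)]
        result[row[probe_col]] = [v for v in ids if v and v != missing]  # drop blanks/placeholders
    return result
```

Probes that end up with an empty list have no ID in that column and may need mapping through another route.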
Array annotation available:
• FHCRC chicken 13K (GPL2863)
• Agilent-015068 Chicken Gene Expression Microarray 4x44k (GPL8764)
• Avian Innate Immunity Microarray (AIIM) (GPL1461)
• Affymetrix Chicken Genome Array (GPL3213*)
• UIUC Bos taurus 13.2K 70-mer oligoarray (GPL2853)
• Affymetrix Bovine Genome Array (GPL2112)
• Agilent-015354 Bovine Oligo Microarray (4x44K)
• Equine Whole Genome Oligonucleotide (EWGO) array
Array annotation in progress:
• ARK-Genomics G. gallus 20K v1.0 (GPL5480)
• FHCRC Chicken 13K v2.0 (GPL1836)
• Chicken cDNA DDMET 1700 array version 1.0 (GPL3265)
Tutorial 1: ID conversion
Work through Tutorial 1 on the workshop website. Alternatively, work on your own data set during this time, using the tutorial as a guide.