Working with gene lists: Finding data using GEO & BioMart
June 5, 2014
Analyzing a gene list
• With hundreds of genes but a limited budget and limited lab personnel, you need to narrow the gene list down to candidate genes for follow-up
• Pick the ones that are “interesting”:
  • Known to be involved in other, related processes but not (yet) in your process of interest
  • Has protein features that suggest a function in your process, but has not been characterized
  • No known function or domain, but shows up in other, related high-throughput experiments, suggesting a key role in your process of interest
Our approach
Analyzing gene lists by:
• Finding overlap with other high-throughput experiments
• Finding additional information using BioMart
  • Mouse/human homologs
  • Protein domain content
  • GO classification
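Finding overlap between your gene list and a list from another high-throughput experiment comes down to a set intersection once the identifiers are comparable. The sketch below is one minimal way to do it, not the method used in class. Apart from C04F5.7, the gene names are placeholders, and the normalization (trimming whitespace, upper-casing) is an assumption that only helps when both lists already use the same ID type.

```python
def normalize(ids):
    """Trim whitespace and upper-case IDs so trivial differences do not hide matches."""
    return {g.strip().upper() for g in ids if g.strip()}


def overlap(my_genes, other_genes):
    """Compare two gene lists and report shared and list-specific IDs."""
    mine, theirs = normalize(my_genes), normalize(other_genes)
    shared = mine & theirs
    return {
        "shared": sorted(shared),
        "only_mine": sorted(mine - theirs),
        "fraction_of_mine": len(shared) / len(mine) if mine else 0.0,
    }


# Placeholder lists for illustration; only C04F5.7 comes from the slides.
my_list = ["C04F5.7", "geneA", "geneB ", "geneC"]
published = ["c04f5.7", "geneC", "geneD"]

result = overlap(my_list, published)
print(result["shared"])            # IDs present in both lists
print(result["fraction_of_mine"])  # share of my list that was also found elsewhere
```

If the two lists use different identifier types (for example WormBase IDs versus Entrez Gene IDs), convert them to a common type first; otherwise the intersection will be misleadingly small.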
GEO (Gene Expression Omnibus)
• GEO Datasets
  • Curated gene expression datasets
  • Because curation takes time, there is a backlog of experiments that haven’t made it into the database yet
  • Can search for experiments and run differential gene expression queries on some datasets
  • Can download datasets and do offline analyses
• GEO Profiles
  • Profiles of expression data for individual genes
Why search GEO?
• What other experiments have been done that are similar to yours?
  • GEO Datasets
• How do my genes of interest behave in other large-scale experiments?
  • GEO Profiles
GEO Profile search Search on a gene name (C04F5.7):
GEO Dataset search “C. elegans”: 4434
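The two searches above were run through the GEO web pages. For longer gene lists, the same kind of search can be scripted. The sketch below only builds the request URLs and assumes NCBI's E-utilities `esearch` endpoint with the database names `geoprofiles` (GEO Profiles) and `gds` (GEO Datasets). The helper name `geo_search_url` and the `retmax` default are illustrative choices, not part of the slides.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities esearch.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, db, retmax=20):
    """Build an esearch URL for a GEO database ('geoprofiles' or 'gds')."""
    if db not in ("gds", "geoprofiles"):
        raise ValueError("db must be 'gds' or 'geoprofiles'")
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Gene name and organism term taken from the slides.
profile_url = geo_search_url("C04F5.7", "geoprofiles")
dataset_url = geo_search_url("C. elegans", "gds")

print(profile_url)
print(dataset_url)
```

Opening either URL returns an XML list of matching record IDs. The hit count you get may differ from the one shown on the slide, since the database keeps growing.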
Once a dataset is identified
• Download the data
  • SOFT format: tab-delimited data
• Issues:
  • Data are not necessarily processed into experiment/control ratios
  • If starting from raw data, you may not be able to replicate exactly what the authors did, or you may lack the expertise/software to generate a list of differentially expressed (DE) genes
• Look for supplementary data from the publication
  • Usually these provide a list of all DE genes
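When a downloaded SOFT file has expression values but no experiment/control ratios, the ratios can be computed offline. The sketch below is a rough illustration under several assumptions: metadata lines start with `^`, `!`, or `#`; the first remaining line is the column header; the values are linear-scale intensities (for data that are already log-transformed you would subtract instead of divide). The sample column names and the tiny in-memory file are made up for the example.

```python
import csv
import io
import math


def read_soft_table(handle):
    """Return (header, rows) from SOFT-style text, skipping metadata lines."""
    data_lines = [ln for ln in handle if ln.strip() and ln[0] not in "^!#"]
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    return header, list(reader)


def log2_ratios(header, rows, control_cols, experiment_cols, id_col="ID_REF"):
    """Compute log2(mean experiment / mean control) for each row."""
    idx = {name: i for i, name in enumerate(header)}
    ratios = {}
    for row in rows:
        try:
            ctrl = [float(row[idx[c]]) for c in control_cols]
            expt = [float(row[idx[c]]) for c in experiment_cols]
        except (ValueError, IndexError):
            continue  # skip rows with missing or non-numeric values
        c_mean = sum(ctrl) / len(ctrl)
        e_mean = sum(expt) / len(expt)
        if c_mean > 0 and e_mean > 0:
            ratios[row[idx[id_col]]] = math.log2(e_mean / c_mean)
    return ratios


# Made-up miniature file; real files have many more rows and columns.
example = (
    "^DATASET = GDS_EXAMPLE\n"
    "!dataset_table_begin\n"
    "ID_REF\tIDENTIFIER\tctrl_1\tctrl_2\texpt_1\texpt_2\n"
    "probe_1\tC04F5.7\t100\t100\t400\t400\n"
    "probe_2\tgeneA\t50\tnull\t25\t25\n"
    "!dataset_table_end\n"
)

header, rows = read_soft_table(io.StringIO(example))
ratios = log2_ratios(header, rows, ["ctrl_1", "ctrl_2"], ["expt_1", "expt_2"])
print(ratios)
```

A simple ratio like this is not a statistical test for differential expression, which is one reason the authors' supplementary list of DE genes is usually the better starting point.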
Choice of dataset for comparison
• In-class demo
BioMart – EBI Ensembl
• Use a series of menus
  • Data source – organism (genes, variation, etc.)
  • Filters – reduce the number of results
  • Attributes – what data to return
• Can set up very precise and multilayered queries
• Can query across multiple organisms
• Simple query:
  • Given a list of gene IDs, you can obtain attributes or sequences for the entire list
• Tools
  • ID converter – very useful, easy to use
Two sites for BioMart access www.biomart.org
BioMart
• Filters
  • C. elegans genes with a human homolog
  • Only genes with at least a specified number of isoforms
  • Protein-coding genes with a transmembrane domain
• Attributes
  • Entrez Gene IDs, WormBase IDs, Affy IDs
  • Sequence data
    • Transcript, protein, UTRs, flanking regions, etc.
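The menu choices (data source, filters, attributes) correspond to the parts of a BioMart XML query, so a query built by clicking can also be written as a script. The sketch below only constructs the query and does not send it. The endpoint URL, the dataset name `celegans_gene_ensembl`, and the filter and attribute names are assumptions that should be checked against the names shown in the BioMart menus, and the gene IDs are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed Ensembl BioMart service endpoint.
MART_URL = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Build a BioMart XML query string from a dataset, filters, and attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include column names
        "uniqueRows": "1",    # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


# Placeholder IDs; substitute the gene IDs from your own list.
gene_ids = ["WBGene_PLACEHOLDER_1", "WBGene_PLACEHOLDER_2"]

xml_query = build_query(
    dataset="celegans_gene_ensembl",                  # assumed dataset name
    filters={"ensembl_gene_id": ",".join(gene_ids)},  # assumed filter name
    attributes=["ensembl_gene_id", "external_gene_name"],  # assumed attribute names
)
request_url = MART_URL + "?" + urlencode({"query": xml_query})
print(xml_query)
```

Fetching `request_url` (for example with `urllib.request.urlopen`) should return one tab-separated row per gene if the names are valid. The BioMart web interface can also display the XML for a query you built with the menus, which is a reliable way to get the exact names.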
BioMart
• In-class demo
Today’s exercise
• Compare the current dataset from the PLoS Pathogens paper to data from a different dataset
• Identify and retrieve additional information about C. elegans genes using BioMart