Mining publicly available microarray data Frances Turner Fsturner@ic.ac.uk
Introduction • Publicly available data • Method for data mining • Application to Tuberculosis and Campylobacter
Capsule synthesis in C. jejuni • In which dataset(s) do these genes show changed expression? • Identify useful data • Improve biological understanding
Publicly available data • Increasing volume of data • Different depositories • Different standards • Difficult to compare experiments
Publicly available data • Campylobacter: 18 experiments, 126 conditions • M. bovis/M. tuberculosis: 34 experiments, 539 conditions
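Identification of sets of differentially expressed genes • Gene set enrichment analysis (GSEA) commonly used (Subramanian et al., 2005) • Threshold independent: no cutoff is needed to call individual genes differentially expressed • Detects small but biologically significant changes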
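The threshold-independent idea can be illustrated with a minimal sketch of a GSEA-style running-sum enrichment score. This is a simplified illustration, not the implementation used in the talk: the function name `enrichment_score`, the weighting exponent `p`, and the toy gene names and scores are all assumptions made up for the example.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set in one ranked gene list.

    ranked_genes: gene names ordered from most up- to most down-regulated.
    scores: the ranking metric for each gene, in the same order.
    gene_set: the genes of interest (hypothetical toy names below).
    p: weighting exponent; p=0 gives the unweighted Kolmogorov-Smirnov-like form.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = in_set.count(False)
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses

    running, best = 0.0, 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up on a gene in the set, step down on any other gene.
        running += weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


# Toy data (made up): every gene contributes, no cutoff is applied.
genes = ["g1", "g2", "g3", "g4"]
metric = [4.0, 3.0, -1.0, -2.0]
es_top = enrichment_score(genes, metric, {"g1", "g2"})
es_bottom = enrichment_score(genes, metric, {"g3", "g4"})
```

A positive score indicates the set is concentrated among up-regulated genes and a negative score among down-regulated genes. Significance would normally be assessed separately, for example by permutation.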
GSEA applied to multiple expression datasets • [Figure: gene lists from several expression datasets; gene labels shown include Cj1099, Cj0812, Cj1494c, Cj1457c, Cj0434, Cj1307, Cj0028, Cj1294, Cj1393, Cj1303, Cj1368, Cj0597, Cj1309c, Cj0505c, Cj0172, Cj0741, Cj0145c, Cj0432]
GSEA applied to multiple expression datasets • Allows correction for multiple datasets • Not confounded by correlations between datasets
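The slides do not name the correction that was used. As one possible illustration, the sketch below applies the Holm step-down adjustment to one p-value per dataset; Holm's method controls the family-wise error rate under arbitrary dependence between tests, so it remains valid when datasets are correlated. The function name `holm_adjust` and the p-values are assumptions made up for the example.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Made-up p-values for the same gene set tested in three datasets.
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)
```

Datasets whose adjusted p-value falls below the chosen significance level would then be reported as showing changed expression of the gene set.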
Summary • Collect available microarray data • Put different datasets into comparable formats • GSEA-based analysis • Identification of experimental conditions of interest
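The overall workflow can be sketched in a few lines. This is a deliberately simplified stand-in: instead of GSEA it uses the mean percentile rank of the gene set, and the percentile-rank conversion is only one assumed way to make datasets comparable. The function names, condition names, gene names, and values are all hypothetical.

```python
def to_percentile_ranks(expression):
    """Map {gene: value} to {gene: rank in [0, 1]}; 1.0 is the most up-regulated.

    Ties are broken arbitrarily in this sketch.
    """
    ordered = sorted(expression, key=expression.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.5}
    return {gene: i / (n - 1) for i, gene in enumerate(ordered)}


def rank_conditions(conditions, gene_set):
    """Order conditions by how far the gene set's mean rank lies from 0.5."""
    results = []
    for name, expression in conditions.items():
        ranks = to_percentile_ranks(expression)
        members = [ranks[g] for g in gene_set if g in ranks]
        if not members:
            continue  # gene set not measured in this condition
        shift = sum(members) / len(members) - 0.5
        results.append((name, shift))
    # Largest absolute shift first: these are the conditions of interest.
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Made-up log ratios for two conditions from different experiments.
conditions = {
    "cond_A": {"g1": 2.0, "g2": 1.5, "g3": -0.5, "g4": -1.0},
    "cond_B": {"g1": 0.1, "g2": -0.2, "g3": 0.3, "g4": -0.1},
}
ranked = rank_conditions(conditions, {"g1", "g2"})
```

Working on ranks rather than raw values sidesteps differences in platform and normalisation between experiments, at the cost of discarding the magnitude of the changes.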
Work in progress • Collaboration with Chris Tomlison to create user interface • Host on CISBIC server • Allow users to test their own gene sets or expression datasets