A blind search for patterns. Unravelling low replicate data. ExSpec Pipeline. Data: Structure and variability. Structure Between 500-10,000+ features Each feature has an associated ion count for each sample aligned. Data is not normally distributed. Variability
A blind search for patterns: Unravelling low replicate data
Data: Structure and variability • Structure • Between 500-10,000+ features • Each feature has an associated ion count for each sample aligned. • Data is not normally distributed. • Variability • Up to 30% technical variability • Each feature is affected differently
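Technical variability of this kind is commonly summarised per feature as a coefficient of variation (standard deviation divided by mean) across technical replicates. The following is a minimal sketch under that assumption; the function name and the ion counts are hypothetical and are not taken from the pipeline.

```python
from statistics import mean, stdev

def coefficient_of_variation(counts):
    """Return sample standard deviation divided by mean for one feature."""
    m = mean(counts)
    if m == 0:
        return float("nan")  # CV is undefined when the mean ion count is zero
    return stdev(counts) / m

# Hypothetical ion counts for two features across three technical replicates.
features = {
    "feature_a": [1000.0, 1100.0, 900.0],
    "feature_b": [200.0, 400.0, 300.0],
}
for name, counts in features.items():
    print(name, round(coefficient_of_variation(counts), 3))
```

Because each feature is affected differently, the value is computed per feature rather than once for the whole data set.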
Data: Structure and variability The majority of features that are detected are singletons.
Low Replicate data • “Suck it and see” • One-off projects • Pump priming projects • Medical samples • Biopsy • Difficult to access • Ecological data • Resampling is difficult
Methods • Fingerprinting • PCA • Basic scoring • PDE model • Gradient search • Differential analysis
PCA • Very simple • Can be highly informative • Depends on the data • Used in pipeline • Data quality
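As an illustration of how simple the PCA step can be, here is a minimal sketch using a singular value decomposition in NumPy. It assumes a samples-by-features matrix of ion counts and a log transform before centring (chosen here because the counts are not normally distributed); the function name `pca_scores` and the example matrix are hypothetical, not the pipeline's own code or data.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)        # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical ion counts: 6 samples x 4 features; the last three samples
# differ from the first three mainly in the first feature.
counts = np.array([
    [1000.0, 500.0, 200.0, 50.0],
    [1050.0, 480.0, 210.0, 55.0],
    [ 980.0, 510.0, 190.0, 48.0],
    [4000.0, 490.0, 205.0, 52.0],
    [4200.0, 505.0, 195.0, 49.0],
    [3900.0, 495.0, 200.0, 51.0],
])
X = np.log(counts)

scores, components, explained = pca_scores(X)
print(scores[:, 0])   # PC1 score per sample
print(explained)      # variance explained by PC1 and PC2
```

Plotting the first two score columns against each other gives the usual PCA scatter, where tight clustering of replicates is one quick indication of data quality.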
Brno Project • Samples: • Human biopsy • Replication – biopsy cut into equal parts PCA Analysis
N group • Non-cancer biopsy • T group • Cancer biopsy PCA Analysis Using PCA clustering we are able to distinguish between healthy and sick patients
PCA Analysis PCA revealed profile similarity which correlated with biological evidence
PCA Analysis • Human Urine project • 22 patients sampled • 11 healthy and 11 sick patients • Sample labels dropped
PCA Analysis Ecological Data Large number of samples without clear replication.
PCA Analysis • Cluster pattern: • Find the features which hold the cluster pattern
PCA Analysis Using PCA and profile similarity analysis, a subset of features of interest was found
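One common way to find the features that hold a cluster pattern is to rank them by the size of their loading on the principal component that separates the clusters. The slides do not state how the subset was actually selected, so the sketch below is an assumption; `top_loading_features` and the example matrix are hypothetical.

```python
import numpy as np

def top_loading_features(X, feature_names, component=0, top_n=3):
    """Rank features by absolute loading on one principal component."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[component]
    order = np.argsort(-np.abs(loadings))[:top_n]  # largest magnitude first
    return [(feature_names[i], float(loadings[i])) for i in order]

# Hypothetical data: two clusters of three samples that differ only in "f2".
names = ["f0", "f1", "f2", "f3"]
X = np.array([
    [1.0, 5.0, 10.0, 3.0],
    [1.1, 5.0, 10.2, 3.0],
    [0.9, 5.1,  9.8, 3.1],
    [1.0, 5.0, 20.0, 3.0],
    [1.1, 5.1, 20.1, 3.0],
    [0.9, 5.0, 19.9, 3.1],
])
print(top_loading_features(X, names, top_n=2))
```

The sign of a loading depends on the arbitrary orientation of the component, so only its magnitude is used for ranking.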
Basic Scoring • Use Z-score to sort data • Use this to pull out important features. • Control – Exp • With a two-class problem we can use PDE modelling.
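The scoring step can be sketched as follows. The slide does not give the exact definition, so this assumes one abundance value per feature in each class, takes the difference Control – Exp, standardises it across all features, and sorts by absolute Z-score. The function name and values are hypothetical.

```python
from statistics import mean, stdev

def zscore_rank(control, experiment):
    """Rank features by the Z-score of their (control - experiment) difference.

    Needs at least two features whose differences are not all identical,
    otherwise the standard deviation is zero or undefined.
    """
    diffs = {f: control[f] - experiment[f] for f in control}
    values = list(diffs.values())
    mu = mean(values)
    sd = stdev(values)
    z = {f: (d - mu) / sd for f, d in diffs.items()}
    return sorted(z.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical abundances for four features.
control = {"f1": 100.0, "f2": 105.0, "f3": 98.0, "f4": 500.0}
experiment = {"f1": 102.0, "f2": 100.0, "f3": 101.0, "f4": 120.0}

for feature, z in zscore_rank(control, experiment):
    print(feature, round(z, 2))
```

Features at the top of the sorted list are the candidates to pull out for closer inspection.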
Basic Scoring: PDE modelling • Multi-class problem • Plants • Wild type • act ko mutant • Treatments • Normal light • High light
Gradient Analysis • Use rate of change of abundance to • Mine data for specific trends • Find features of interest • Use PDE modelling of rates
Gradient Analysis Mining for features which showed rapid increase due to a specific treatment
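A minimal sketch of this kind of mining, assuming each feature has an abundance profile measured at known time points after treatment (the slides do not specify the sampling design). Rates are plain finite differences; the function names, threshold, and profiles are hypothetical.

```python
def rates_of_change(times, abundances):
    """Finite-difference slope between each pair of consecutive time points."""
    return [
        (abundances[i + 1] - abundances[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]

def rapidly_increasing(profiles, times, min_rate):
    """Return features whose steepest rise is at least min_rate, steepest first."""
    peak = {name: max(rates_of_change(times, y)) for name, y in profiles.items()}
    hits = [(name, rate) for name, rate in peak.items() if rate >= min_rate]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)

# Hypothetical abundance profiles at four time points.
times = [0.0, 1.0, 2.0, 4.0]
profiles = {
    "flat": [100.0, 101.0, 99.0, 100.0],
    "slow_rise": [100.0, 110.0, 120.0, 140.0],
    "rapid_rise": [100.0, 105.0, 400.0, 420.0],
}
print(rapidly_increasing(profiles, times, min_rate=50.0))
```

Lowering `min_rate` widens the net: with a threshold of 10.0 the slowly rising feature is also returned, after the rapid one.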
Data Provided by: • Ecological data • Dave Hodgson • Nicole Goody • Gradient analysis • John Love • Data scoring • Nicholas Smirnoff • Mike Page • Brno • Ted Hupp • Rob O’Neill • Urine study • Steve Michell • John Mcgrath
Metabolomics and Proteomics Mass Spectrometry Facility @ The University of Exeter http://biosciences.exeter.ac.uk/facilities/spectrometry/ http://bio-massspeclocal.ex.ac.uk/ Nick Smirnoff (Director of Mass Spectrometry) N.Smirnoff@exeter.ac.uk Hannah Florance (MS Facility Manager) H.V.Florance@exeter.ac.uk Venura Perera (Bioinformatics and Mathematical Support) V.Perera@exeter.ac.uk
About me • Background • Applied Maths • Untargeted metabolite profiling • Research interests • Data driven modelling • Small molecule profiling • Gene regulatory network modelling • Application of mathematical methods • Metabolite identification using LC-MS/MS