Describing childhood diet with cluster analysis • Andrew Smith • Young Statisticians’ Meeting, 12th April 2011
Describing diet with cluster analysis • Pauline M. Emmett • P. Kirstin Newby • Kate Northstone • World Cancer Research Fund • MRC, Wellcome Trust, University of Bristol
Outline • Introductions • ALSPAC • Food frequency questionnaires • Dietary patterns • Cluster analysis • k-means cluster analysis • Results • 3 cluster solution • Associations with socio-demographic variables
ALSPAC • Avon Longitudinal Study of Parents and Children • Birth cohort study • 14,541 pregnant women and their children • www.bris.ac.uk/alspac
Dietary patterns • Examine diet as a whole • Analyse multivariate FFQ data • Use correlations between foods • PCA • Cluster analysis Image: Paul / FreeDigitalPhotos.net
Cluster analysis • Separate subjects into non-overlapping groups • Based on ‘distances’ between individuals • Unsupervised learning Image: Boaz Yiftach / FreeDigitalPhotos.net
k-means cluster analysis • Most widely used method for deriving dietary patterns • Number of clusters, k, is specified beforehand • Minimises the distance from each subject to his/her cluster mean • Summed over all subjects in each cluster • Summed over all clusters
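The quantity described on this slide can be written as a single formula. This is the textbook k-means objective, which uses squared Euclidean distance; the slides only say "distance", so the squaring and the symbols below ($x_i$ for the vector of food intakes of subject $i$, $C_j$ for the set of subjects in cluster $j$, $\bar{x}_j$ for the mean of cluster $j$) are assumptions rather than notation from the talk.

```latex
W(C_1, \dots, C_k) \;=\; \sum_{j=1}^{k} \; \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^{2},
\qquad
\bar{x}_j \;=\; \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i
```

The inner sum runs over all subjects in one cluster and the outer sum over all k clusters, matching the last two bullets of the slide.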
Problems with the standard algorithm • Short-sighted • Tends to find solutions that are only a local minimum • So run the algorithm 100 times and choose the solution with the lowest of all the minima
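The restart strategy on this slide might be sketched as follows. This is a minimal illustration in plain Python, not the software used in the study: the function names (`kmeans_once`, `best_of_restarts`), the random choice of starting means, the squared Euclidean distance and the toy data are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points (tuples of numbers)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of standard k-means from k randomly chosen starting means.

    Returns (objective, labels, means). The result may be a local minimum.
    """
    means = rng.sample(points, k)  # assumed start: k distinct subjects picked at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each subject joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no subject changed cluster, so the run has converged
        labels = new_labels
        # Update step: recompute each cluster mean from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old mean if a cluster happens to be empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    objective = sum(squared_distance(p, means[lab]) for p, lab in zip(points, labels))
    return objective, labels, means


def best_of_restarts(points, k, n_runs=100, seed=0):
    """Run k-means n_runs times and keep the run with the smallest objective."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[0])


# Toy data (made up): three well-separated groups of 20 points in 2 dimensions.
data_rng = random.Random(1)
centres = [(0, 0), (10, 0), (0, 10)]
toy = [
    (cx + data_rng.gauss(0, 1), cy + data_rng.gauss(0, 1))
    for cx, cy in centres
    for _ in range(20)
]
best_objective, best_labels, best_means = best_of_restarts(toy, k=3)
```

Any single run can stop at a poor local minimum if its starting means are unlucky; taking the minimum over many runs makes that much less likely, although it still does not guarantee the global minimum.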
Reliability of the cluster solution • Split sample in half • Perform separate analyses on each half • See how many children change clusters • Repeat 5 times • 32 out of 8,279 children changed cluster (0.4%)
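Counting how many children change cluster needs some care, because cluster numbering is arbitrary: cluster 1 in one analysis may be the same group as cluster 3 in another. The sketch below shows one way to do the count for a small k. The slides do not say how the two solutions were compared, so comparing two label lists for the same children, the helper name `children_changed` and the toy labels are all assumptions.

```python
from itertools import permutations


def children_changed(labels_a, labels_b, k=3):
    """Number of children whose cluster differs between two solutions.

    Tries every renumbering of the clusters in labels_b and returns the
    smallest number of disagreements, so arbitrary numbering does not count.
    """
    best = len(labels_a)
    for perm in permutations(range(k)):
        changed = sum(1 for a, b in zip(labels_a, labels_b) if a != perm[b])
        best = min(best, changed)
    return best


# Toy example (made up): the same six children in two analyses.
full_sample = [0, 0, 1, 1, 2, 2]
half_sample = [2, 2, 0, 0, 1, 0]  # same grouping, renumbered, except the last child
n_changed = children_changed(full_sample, half_sample)  # 1

# The figure reported on the slide: 32 of 8,279 children.
percent_changed = round(100 * 32 / 8279, 1)  # 0.4
```

Trying all k! renumberings is only practical for a small number of clusters, which is the case for the 3-cluster solution presented here.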
Processed • 4,177 children Image: Suat Eman, Rawich, Master Isolated Images / FreeDigitalPhotos.net
Plant-based • 2,065 children Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net
Traditional British • 2,037 children Image: Suat Eman, Filomena Scalise, Maggie Smith / FreeDigitalPhotos.net
Summary • Multivariate methods to compress FFQ data into dietary patterns • k-means cluster analysis is widespread but must be applied carefully • Processed, Plant-based and Traditional British clusters in 7-year-old children • Associated with various socio-demographic variables