Andrew Smith Describing childhood diet with cluster analysis 6th September 2012
Describing diet with cluster analysis • Kate Northstone • Pauline Emmett • PK Newby • World Cancer Research Fund • MRC, Wellcome Trust, University of Bristol
Outline • Introductions • ALSPAC • Food frequency questionnaires / diet diaries • Dietary patterns • Cluster analysis • k-means cluster analysis • Results • 4 cluster solution • Associations with socio-demographic variables
ALSPAC • Avon Longitudinal Study of Parents and Children • Birth cohort study • 14,541 pregnant women and their children • www.bris.ac.uk/alspac
Diet diaries • Record all food and drink consumed over a 3-day period • 2 weekdays and 1 weekend day • Parent completes at age 7 • Child completes at ages 10 and 13
Dietary patterns • Examine diet as a whole • Start with many variables (food group intakes) • Express as a small number of variables Image: Paul / FreeDigitalPhotos.net
Principal components analysis (PCA) • Examine diet as a whole • Start with many variables • Use correlations between foods • Express as a small number of components Image: Paul / FreeDigitalPhotos.net
Cluster analysis • Examine diet as a whole • Start with many variables • Use similarities between people • Express as a small number of clusters Image: Paul / FreeDigitalPhotos.net
Cluster analysis • Separate subjects into non-overlapping groups • Based on ‘distances’ between individuals • Unsupervised learning Image: Boaz Yiftach / FreeDigitalPhotos.net
k-means cluster analysis • Most widely used for dietary patterns • Number of clusters, k, is specified beforehand • Minimises the total within-cluster distance: • Distance from each subject to his/her cluster mean • Summed over all subjects in that cluster • Summed over all clusters
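The quantity being minimised can be written out in a few lines. This is a minimal sketch, not the analysis code behind these slides: it assumes the usual k-means convention of squared Euclidean distance (the slide says only "distance"), and the function names and the six two-variable "children" are hypothetical.

```python
from math import dist  # Python 3.8+


def cluster_means(data, labels, k):
    """Mean of each cluster, computed one variable at a time."""
    means = []
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        # An empty cluster yields an empty mean; no subject refers to it.
        means.append([sum(col) / len(members) for col in zip(*members)])
    return means


def kmeans_objective(data, labels, k):
    """Squared distance from each subject to his/her cluster mean,
    summed over all subjects and hence over all clusters."""
    means = cluster_means(data, labels, k)
    return sum(dist(x, means[lab]) ** 2 for x, lab in zip(data, labels))


# Hypothetical intakes of two food groups for six children.
data = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0),
        (10.0, 10.0), (10.0, 12.0), (12.0, 10.0)]
sensible = [0, 0, 0, 1, 1, 1]   # groups the nearby children together
scrambled = [0, 1, 0, 1, 0, 1]  # mixes the two groups

obj_sensible = kmeans_objective(data, sensible, 2)
obj_scrambled = kmeans_objective(data, scrambled, 2)
print(obj_sensible, obj_scrambled)
```

The sensible grouping gives the smaller value, which is the sense in which k-means prefers it.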
Problems with the standard algorithm • The algorithm for k-means cluster analysis is short-sighted • Tends to find solutions that are at a local minimum • So run the algorithm 100 times and choose the solution with the lowest minimum
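The restart strategy can be sketched as follows. This assumes the "standard algorithm" is Lloyd's algorithm with randomly chosen starting centres and a squared Euclidean objective; the slides do not name the software or the initialisation used, and every function name and data point here is hypothetical.

```python
import random
from math import dist  # Python 3.8+


def assign(data, centres):
    """Index of the nearest centre for each subject."""
    return [min(range(len(centres)), key=lambda c: dist(x, centres[c]))
            for x in data]


def lloyd(data, k, rng, max_iter=100):
    """One run from a random start; returns (objective, labels)."""
    centres = rng.sample(data, k)  # k distinct subjects as starting centres
    labels = assign(data, centres)
    for _ in range(max_iter):
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
        new_labels = assign(data, centres)
        if new_labels == labels:  # no subject moved: a (local) minimum
            break
        labels = new_labels
    objective = sum(dist(x, centres[lab]) ** 2
                    for x, lab in zip(data, labels))
    return objective, labels


def best_of_restarts(data, k, n_starts=100, seed=0):
    """Run the algorithm n_starts times and keep the lowest minimum."""
    rng = random.Random(seed)
    runs = [lloyd(data, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[0]), [obj for obj, _ in runs]


# Hypothetical data: three well-separated groups of four children.
children = [(x + dx, y + dy)
            for x, y in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
            for dx, dy in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]]

(best_obj, best_labels), all_objs = best_of_restarts(children, k=3)
print(best_obj, max(all_objs))
```

Comparing `best_obj` with `max(all_objs)` shows how far apart the individual runs can land, which is the reason a single run is not trusted.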
Reliability of the cluster solution • Split sample in half • Perform separate analyses on each half • See how many children change clusters • Repeat 5 times • 32 out of 8,279 children changed cluster (0.4%)
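One way to count children who change cluster is sketched below. The slide does not say how the half-sample solutions were compared, so this assumes each half-sample solution is compared with a full-sample solution, and that clusters are paired by the closeness of their means (cluster numbers are arbitrary, so they must be matched before counting). All names and numbers are hypothetical.

```python
from itertools import permutations
from math import dist  # Python 3.8+


def nearest(x, centres):
    """Index of the centre closest to subject x."""
    return min(range(len(centres)), key=lambda c: dist(x, centres[c]))


def match_clusters(full_centres, half_centres):
    """Pair each half-sample cluster with a full-sample cluster.

    Returns a tuple perm where half-sample cluster j corresponds to
    full-sample cluster perm[j]. Brute force over all pairings is
    fine for the small k used here.
    """
    k = len(full_centres)
    return min(permutations(range(k)),
               key=lambda p: sum(dist(half_centres[j], full_centres[p[j]])
                                 for j in range(k)))


def count_changed(children, full_centres, half_centres):
    """Number of children whose cluster differs between the two solutions."""
    perm = match_clusters(full_centres, half_centres)
    return sum(nearest(x, full_centres) != perm[nearest(x, half_centres)]
               for x in children)


# Hypothetical cluster means: the half-sample solution found nearly the
# same two clusters, but numbered them the other way round.
full_centres = [(0.0, 0.0), (10.0, 0.0)]
half_centres = [(10.5, 0.0), (0.5, 0.0)]
half_sample = [(0.0, 0.0), (1.0, 0.0), (5.2, 0.0), (9.0, 0.0), (10.0, 0.0)]

changed = count_changed(half_sample, full_centres, half_centres)
print(changed, "of", len(half_sample), "children changed cluster")
```

Only the child near the boundary between the two clusters moves; the swapped numbering alone does not count as a change.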
Results • Food frequency questionnaire (FFQ) data • Age 7 • 3 clusters • Diet diary data • Age 7, 10 and 13 • 4 clusters
30.2% of children Processed Image: Suat Eman, artemisphoto, -Marcus- / FreeDigitalPhotos.net
27.8% of children Plant-based (Healthy) Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net
21.3% of children Traditional British Image: Suat Eman, Maggie Smith, Simon Howden / FreeDigitalPhotos.net
20.6% of children Packed Lunch Image: Grant Cochrane, luigi diamanti, Rawich, Master Isolated Images / FreeDigitalPhotos.net
Summary • Multivariate methods to compress dietary data into dietary patterns • k-means cluster analysis is widespread but must be applied carefully • 3 clusters in FFQ data (Processed, Plant-based and Traditional British) • 4 clusters in diet diary data (+ Packed Lunch)
References • Northstone, AS et al. (2012) ‘Longitudinal comparisons of dietary patterns derived by cluster analysis in 7 to 13 year old children.’ British Journal of Nutrition, to appear. • AS et al. (2011) ‘A comparison of dietary patterns derived by cluster and principal components analysis in a UK cohort of children.’ European Journal of Clinical Nutrition 65, p1102-9.