
Andrew Smith

Andrew Smith. Describing childhood diet with cluster analysis. Young Statisticians’ meeting, 12th April 2011.


Presentation Transcript


  1. Describing childhood diet with cluster analysis • Andrew Smith • Young Statisticians’ meeting, 12th April 2011

  2. Describing diet with cluster analysis • Pauline M. Emmett • P. Kirstin Newby • Kate Northstone • World Cancer Research Fund • MRC, Wellcome Trust, University of Bristol

  3. Outline • Introductions: ALSPAC, food frequency questionnaires, dietary patterns • Cluster analysis: k-means cluster analysis • Results: 3-cluster solution, associations with socio-demographic variables

  4. ALSPAC • Avon Longitudinal Study of Parents and Children • Birth cohort study • 14,541 pregnant women and their children • www.bris.ac.uk/alspac

  5. Food frequency questionnaires

  6. Dietary patterns • Examine diet as a whole • Analyse multivariate food frequency questionnaire (FFQ) data • Use correlations between foods • Principal components analysis (PCA) • Cluster analysis (Image: Paul / FreeDigitalPhotos.net)

  7. Cluster analysis • Separate subjects into non-overlapping groups • Based on ‘distances’ between individuals • Unsupervised learning (Image: Boaz Yiftach / FreeDigitalPhotos.net)

  8. k-means cluster analysis • Most widely used method for deriving dietary patterns • Number of clusters, k, is specified beforehand • Minimises the distance from each subject to his/her cluster mean, summed over all subjects in each cluster and then over all clusters
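
Written out, the quantity being minimised is a within-cluster sum of distances. The sketch below assumes squared Euclidean distance, the usual k-means choice (the slides do not state the metric). The symbols are introduced here for illustration only: x_i is the vector of food intakes for subject i, C_j is the set of subjects in cluster j, and \bar{x}_j is the mean of cluster j.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \; \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^2,
\qquad
\bar{x}_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i
```

The inner sum runs over all subjects in one cluster and the outer sum over all k clusters, matching the order given on the slide.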

  9. k-means cluster analysis

  10. Problems with the standard algorithm • Short-sighted: tends to converge to a local minimum rather than the global minimum • So run the algorithm 100 times from different starting points and keep the solution with the smallest value among all the minima found
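
The restart strategy can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the talk. It assumes squared Euclidean distance and starting means drawn at random from the data. The function names (`kmeans_once`, `kmeans_restarts`) and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(data, k, rng, max_iter=100):
    """One run of the standard algorithm from randomly chosen starting means."""
    means = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every subject to its nearest cluster mean.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, means[j])) for p in data]
        if new_labels == labels:
            break  # assignments are stable: a (possibly local) minimum
        labels = new_labels
        # Recompute each cluster mean from its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    objective = sum(sq_dist(p, means[lab]) for p, lab in zip(data, labels))
    return objective, labels, means


def kmeans_restarts(data, k, n_starts=100, seed=0):
    """Run the algorithm n_starts times and keep the run with the smallest objective."""
    rng = random.Random(seed)
    return min((kmeans_once(data, k, rng) for _ in range(n_starts)), key=lambda r: r[0])


# Toy data (assumed): three loose groups of points in two dimensions.
toy_rng = random.Random(1)
centres = [(0, 0), (5, 5), (0, 5)]
toy = [[cx + toy_rng.gauss(0, 1), cy + toy_rng.gauss(0, 1)]
       for cx, cy in centres for _ in range(30)]

best_obj, best_labels, best_means = kmeans_restarts(toy, k=3, n_starts=100)
print(round(best_obj, 2))
```

Each run can only decrease its own objective, so different starting points may settle in different local minima. Taking the minimum over many runs makes it more likely, though not certain, that the reported solution is the global one.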

  11. Standardising the input variables
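
Only the title of this slide survives in the transcript, so the method actually used is not known here. As a generic illustration of the idea, the sketch below shows two common ways of putting variables on a comparable scale before computing distances: z-scores and division by the range. The function names and intake figures are hypothetical.

```python
import statistics


def z_scores(values):
    """Subtract the mean and divide by the standard deviation (assumes values are not all equal)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def range_scaled(values):
    """Subtract the minimum and divide by the range, giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical weekly consumption frequencies of one food for five children.
servings = [0, 2, 3, 7, 14]

print(z_scores(servings))
print(range_scaled(servings))
```

Without some such step, foods recorded on a wide numerical scale would dominate the distances between children, and so the clusters, simply because of their units.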

  12. Reliability of the cluster solution • Split sample in half • Perform separate analyses on each half • See how many children change clusters • Repeat 5 times • 32 out of 8,279 children changed cluster (0.4%)
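
One way to read the split-half check is sketched below. It rests on assumptions the slide does not spell out: each child's cluster in a half-sample solution is compared with that child's cluster in the full-sample solution, and clusters are matched across solutions by nearest cluster mean. All names are hypothetical, and the small k-means routine stands in for whatever software was actually used.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, means):
    """Index of the cluster mean closest to point."""
    return min(range(len(means)), key=lambda j: sq_dist(point, means[j]))


def kmeans(data, k, rng, n_starts=20, max_iter=100):
    """Best of several runs of the standard algorithm; returns the cluster means."""
    best = None
    for _start in range(n_starts):
        means = [list(p) for p in rng.sample(data, k)]
        for _step in range(max_iter):
            labels = [nearest(p, means) for p in data]
            new_means = []
            for j in range(k):
                members = [p for p, lab in zip(data, labels) if lab == j]
                new_means.append(
                    [sum(c) / len(members) for c in zip(*members)] if members else means[j]
                )
            if new_means == means:
                break
            means = new_means
        obj = sum(sq_dist(p, means[nearest(p, means)]) for p in data)
        if best is None or obj < best[0]:
            best = (obj, means)
    return best[1]


def split_half_changes(data, k, seed=0, n_starts=20):
    """Count subjects whose cluster differs between the full-sample and half-sample solutions."""
    rng = random.Random(seed)
    full_means = kmeans(data, k, rng, n_starts)
    order = list(range(len(data)))
    rng.shuffle(order)
    mid = len(order) // 2
    changed = 0
    for half in (order[:mid], order[mid:]):
        half_means = kmeans([data[i] for i in half], k, rng, n_starts)
        # Match each half-sample cluster to the closest full-sample cluster
        # (an assumption; this matching is not guaranteed to be one-to-one).
        mapping = [nearest(m, full_means) for m in half_means]
        for i in half:
            if mapping[nearest(data[i], half_means)] != nearest(data[i], full_means):
                changed += 1
    return changed
```

Calling `split_half_changes` with several different `seed` values mimics repeating the split, as the slide does five times. A small count relative to the sample size suggests the solution does not depend much on which children happen to be included.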

  13. Processed cluster: 4,177 children (Images: Suat Eman, Rawich, Master Isolated Images / FreeDigitalPhotos.net)

  14. Plant-based cluster: 2,065 children (Images: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net)

  15. Traditional British cluster: 2,037 children (Images: Suat Eman, Filomena Scalise, Maggie Smith / FreeDigitalPhotos.net)

  16-20. Associations with socio-demographic variables

  21. Summary • Multivariate methods to compress FFQ data into dietary patterns • k-means cluster analysis is widespread but must be applied carefully • Processed, Plant-based and Traditional British clusters in 7-year-old children • Associated with various socio-demographic variables
