Using large data sets to study factors associated with the incidence of multiple sclerosis. Tamah Fridman, David Glick, John Kidd
Multiple Sclerosis (MS) • A complex autoimmune disease with both acute and chronic phases. • Confounding factors include: • genetic background • viral infections including EBV and HSV • nutritional factors • environmental factors such as latitude and smoking
Multiple Sclerosis (MS) • More generally, this module could be used to explore the difference between correlation and causation. • For use in a course, the instructor will supply appropriate background information on the immune response as it applies to MS.
Multiple Sclerosis (MS) • There is a vast literature examining the effects of • geography • migration • infectious diseases • sunlight related to vitamin D levels • cigarette smoking • diet • hormones
Multiple Sclerosis (MS) • Over time a number of data sets have been published that explore relationships between environmental factors and MS. • Many of these are single studies that were later included in one or more “meta-analysis” articles. • In addition, there are incidence statistics available from a variety of sources such as CDC, World Life Expectancy.com, WHO, and others.
Multiple Sclerosis (MS) • To demonstrate the module’s potential, we have constructed several example analyses, using a variety of techniques, that link MS incidence to rainfall and viral diseases via: • A GIS plot • A scatter plot • 3-D Principal Component Analysis (PCA) • These are based on the same data to demonstrate that large data sets can be visualized and analyzed in a variety of ways.
Multiple Sclerosis (MS) • Link to interactive ArcGIS plot: • http://arcgis.com/explorer/?open=2e7723700ef942b7a5aa2f8cbd96a5fc&extent=37882315.9514645,2989772.13723539,44144037.3085845,6061929.17807238
Multiple Sclerosis (MS) • The Excel function “CORREL” was used to look for correlations between MS rates and each of a series of viral diseases and a “lifestyle” disease. • Hepatitis C: -0.0152 • Cervical cancer: -0.34991 • Liver cancer: -0.25501 • HIV: -0.1451 • Lung cancer: 0.547928
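Excel's CORREL returns the Pearson correlation coefficient, so students without Excel could reproduce this step in a few lines of code. The following is a minimal sketch rather than the authors' workflow: the function name `pearson_r` and the five-country sample values are hypothetical and are not taken from the 192-country spreadsheet.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the column means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical rates for five made-up countries (NOT the authors' data).
ms_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
lung_cancer_rate = [12.0, 15.0, 21.0, 24.0, 33.0]

r = pearson_r(ms_rate, lung_cancer_rate)
print(f"r = {r:.3f}")
```

A coefficient near +1 or -1 indicates a strong linear association and a value near 0 indicates little linear association. Neither case says anything about which variable, if either, causes the other.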
Multiple Sclerosis (MS) This slide shows a sample; the complete spreadsheet contains 192 countries.
Multiple Sclerosis (MS) • The spreadsheet data above were also used to construct scatter plots of MS vs. Hepatitis C (a viral disease) and MS vs. Lung Cancer (an environmental/lifestyle disease). These plots follow.
Multiple Sclerosis (MS) • The complete Excel spreadsheet was also used in Principal Component Analysis (PCA). • The data were saved in a tab-delimited format and then imported into the NIA Array Analysis Tool for Principal Component Analysis. • The results are password protected on this site: http://lgsun.grc.nia.nih.gov/ANOVA/index.html
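For readers who cannot access the password-protected results, the idea behind PCA can be sketched in plain Python. This is not the NIA Array Analysis Tool's implementation. It is an illustrative stand-in that standardizes each column, builds the correlation matrix, and extracts the leading components by power iteration with deflation. All function names and the six-row sample table are hypothetical.

```python
from math import sqrt


def standardize(rows):
    """Center each column on 0 and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance_matrix(centered_rows):
    """Sample covariance of already-centered rows (correlation if standardized)."""
    n, p = len(centered_rows), len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def power_iteration(matrix, iterations=500):
    """Approximate the largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    # Assumption: this start vector is not orthogonal to the leading eigenvector.
    vec = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # nothing left to extract (matrix is zero)
            return 0.0, vec
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vec


def principal_components(cov, k):
    """Top-k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    work = [row[:] for row in cov]
    found = []
    for _ in range(k):
        lam, vec = power_iteration(work)
        found.append((lam, vec))
        # Deflation: remove the component just found before looking for the next.
        for i in range(len(work)):
            for j in range(len(work)):
                work[i][j] -= lam * vec[i] * vec[j]
    return found


# Hypothetical table: columns are MS, hepatitis C, lung cancer (NOT the authors' data).
table = [
    [2.0, 1.5, 20.0],
    [4.0, 1.2, 35.0],
    [1.0, 2.0, 12.0],
    [6.0, 1.1, 40.0],
    [3.0, 2.5, 22.0],
    [5.0, 0.9, 38.0],
]

cov = covariance_matrix(standardize(table))
components = principal_components(cov, 3)
total_variance = sum(cov[i][i] for i in range(len(cov)))
for number, (lam, vec) in enumerate(components, start=1):
    print(f"PC{number}: {lam / total_variance:.1%} of variance, loadings {vec}")
```

Each component's loadings show how strongly each disease column contributes to that axis. Projecting every country onto the first three components yields the coordinates for a 3-D PCA plot. For a full data set, a dedicated linear algebra library would be a more robust choice than hand-written power iteration.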
Multiple Sclerosis (MS) • As a completely different approach, meta-analysis data were extracted into Excel and prepared for PGPLOT, and a Fortran program was written to analyze and display these data. • Fitting disparate data points into congruent categories proved very difficult, so the following graph is shown with some reservation. • However, students “inventing” their own analysis can be expected to encounter similar problems.
Multiple Sclerosis (MS) • We are deeply indebted to: • Ileana Betancourt and Colleen McLinn for help with GIS • Jeff Lutgen and Bruce Wiggins for help with Excel.