
Data analysis: Explore


cutler

Presentation Transcript


  1. GAP Toolkit 5: Training in basic drug abuse data management and analysis • Data analysis: Explore • Training session 9

  2. Objectives • To define a standard set of descriptive statistics used to analyse continuous variables • To examine the Explore facility in SPSS • To introduce the analysis of a continuous variable according to values of a categorical variable, an example of bivariate analysis • To introduce further SPSS Help options • To reinforce the use of SPSS syntax

  3. SPSS Descriptive Statistics • Analyse/Descriptive Statistics/Frequencies • Analyse/Descriptive Statistics/Explore • Analyse/Descriptive Statistics/Descriptives

  4. Exercise: continuous variable • Generate a set of standard summary statistics for the continuous variable Age

  5. Explore: Age

  6. Explore: Descriptive Statistics Descriptives

  7. Exercise: Help • What’s This? • Results Coach • Case Studies

  8. Measures of central tendency • Most commonly: • Mode • Median • Mean • 5 per cent trimmed mean

  9. The mode • The mode is the most frequently occurring value in a dataset • Suitable for nominal data and above • Example: • The mode of the variable "first most frequently used drug" is Alcohol, with 717 cases, approximately 46 per cent of valid responses
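As a minimal sketch of how a mode and its share of valid responses can be obtained, the following Python snippet counts categories and ignores missing answers. The response list is hypothetical and is not the training dataset, so the counts differ from the 717 cases quoted on the slide.

```python
from collections import Counter

# Hypothetical responses for "first most frequently used drug";
# illustrative values only, not the training dataset.
responses = ["Alcohol", "Cannabis", "Alcohol", "Heroin", None, "Alcohol", "Cannabis"]

# Drop missing responses so the percentage is based on valid cases only.
valid = [r for r in responses if r is not None]

counts = Counter(valid)
mode_value, mode_count = counts.most_common(1)[0]
valid_percent = 100 * mode_count / len(valid)

print(f"Mode: {mode_value} ({mode_count} of {len(valid)} valid cases, "
      f"{valid_percent:.0f} per cent)")
```

Because the mode only requires counting, the same approach works for nominal variables such as drug type.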

  10. Bimodal • Describes a distribution • Two categories have a large number of cases • Example: • The distribution of Employment is bimodal, employment and unemployment having a similar number of cases and more cases than the other categories

  11. The median • The middle value when the data are ordered from low to high is the median • Half the data values lie below the median and half above • The data have to be ordered so the median is not suitable for nominal data, but is suitable for ordinal levels of measurement and above

  12. Example: median • Seizures of opium in Germany, 1994-1998 (Kilograms) • Source: United Nations (2000). World Drug Report 2000 (United Nations publication, Sales No. GV.E.00.0.10).

  13. Sort the seizure data in ascending order • Ranked 1 to 5, the annual values are 15, 36, 42, 45 and 286 kilograms • The middle value is the median; the median annual seizure of opium in Germany between 1994 and 1998 was 42 kilograms
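The ranking step can be sketched in Python using the five seizure values listed in the mean example (slide 15). The variable names are illustrative, and no assumption is made about which value belongs to which year.

```python
# Annual opium seizures in kilograms, as listed in the mean example (slide 15).
seizures_kg = [36, 15, 45, 42, 286]

ranked = sorted(seizures_kg)  # ascending order
n = len(ranked)
middle = n // 2

# Odd n: take the single middle value.
# Even n: average the two middle values.
if n % 2 == 1:
    median = ranked[middle]
else:
    median = (ranked[middle - 1] + ranked[middle]) / 2

print(ranked, median)
```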

  14. The mean • Add the values in the data set and divide by the number of values • The mean is only truly applicable to interval and ratio data, as it involves adding the values • It is sometimes applied to ordinal data or ordinal scales constructed from a number of Likert scales, but this requires the assumption that the difference between adjacent values in the scale is the same, e.g. the difference between 1 and 2 is the same as that between 5 and 6

  15. Example: mean • Seizures of opium in Germany, 1994-1998 (Kilograms) • Sample size = 5 • Sum = 36 + 15 + 45 + 42 + 286 = 424 • Mean = 424/5 = 84.8 kilograms
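The arithmetic above can be checked with a short Python sketch. The comparison with the median and the recalculation without the largest value are additions for illustration, showing how strongly a single extreme value pulls the mean.

```python
import statistics

# Annual opium seizures in kilograms, from the example above.
seizures_kg = [36, 15, 45, 42, 286]

total = sum(seizures_kg)
mean = total / len(seizures_kg)
median = statistics.median(seizures_kg)

# Illustration only: the mean of the remaining values once the largest is set aside.
without_largest = sorted(seizures_kg)[:-1]
mean_without_largest = sum(without_largest) / len(without_largest)

print(total, mean, median, mean_without_largest)
```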

  16. The 5 per cent trimmed mean • The 5 per cent trimmed mean is the mean calculated on the data set with the top 5 per cent and bottom 5 per cent of values removed • An estimator that is more resistant to outliers than the mean
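A minimal sketch of a trimmed mean in Python follows. It assumes the simplest rule, dropping a whole number of cases (the integer part of 5 per cent of n) from each end; statistical packages may instead weight partial cases, so their results can differ slightly. The ages are hypothetical.

```python
import math

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping floor(n * proportion) values from each end."""
    ranked = sorted(values)
    k = math.floor(len(ranked) * proportion)
    kept = ranked[k:len(ranked) - k]
    return sum(kept) / len(kept)

# Hypothetical ages with one implausibly large value (an outlier).
ages = [18, 19, 20, 21, 21, 22, 23, 23, 24, 25,
        25, 26, 27, 28, 29, 30, 31, 33, 35, 99]

ordinary_mean = sum(ages) / len(ages)
robust_mean = trimmed_mean(ages)

print(ordinary_mean, robust_mean)
```

With 20 cases, one value is removed from each end, so the outlier no longer affects the estimate.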

  17. 95 per cent confidence interval for the mean • An indication of the expected error (precision) when estimating the population mean with the sample mean • In repeated sampling, the interval calculated around each sample mean will contain the population mean about 95 times out of 100
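As a hedged sketch, a 95 per cent confidence interval for the mean can be computed in Python as the sample mean plus or minus a critical value times the standard error. The example reuses the five seizure values from the mean example. The critical value 2.776 is the two-sided 95 per cent Student's t value for 4 degrees of freedom, hard-coded from standard tables rather than computed.

```python
import math
import statistics

# Annual opium seizures in kilograms, from the mean example.
seizures_kg = [36, 15, 45, 42, 286]

n = len(seizures_kg)
mean = statistics.mean(seizures_kg)

# Standard error of the mean: sample standard deviation / sqrt(n).
std_error = statistics.stdev(seizures_kg) / math.sqrt(n)

# Two-sided 95 per cent t critical value for n - 1 = 4 degrees of freedom
# (from standard t tables; an assumption hard-coded for this sample size).
T_CRIT_DF4 = 2.776

lower = mean - T_CRIT_DF4 * std_error
upper = mean + T_CRIT_DF4 * std_error

print(lower, mean, upper)
```

With only five values and one very large seizure, the interval is wide and its lower limit falls below zero, which signals that the sample mean is an imprecise estimate here.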

  18. Measures of dispersion • The range • The inter-quartile range • The variance • The standard deviation

  19. The range • A measure of the spread of the data • Range = maximum – minimum

  20. Quartiles • 1st quartile: 25 per cent of the values lie below the value of the 1st quartile and 75 per cent above • 2nd quartile: the median: 50 per cent of values below and 50 per cent of values above • 3rd quartile: 75 per cent of values below and 25 per cent of the values above

  21. Inter-quartile range • IQR = 3rd Quartile – 1st Quartile • The inter-quartile range measures the spread of the middle 50 per cent of the data • Requires an ordinal level of measurement or above
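The range, quartiles and inter-quartile range can be sketched with Python's standard `statistics` module, again using the seizure values from the mean example. Packages use different interpolation rules for quartiles, so the 1st and 3rd quartiles printed here may not match other software exactly; the median does not depend on that choice for this data.

```python
import statistics

# Annual opium seizures in kilograms, from the mean example.
seizures_kg = [36, 15, 45, 42, 286]

# Range = maximum - minimum.
data_range = max(seizures_kg) - min(seizures_kg)

# Three cut points dividing the ordered data into four parts.
# The 2nd quartile is the median.
q1, q2, q3 = statistics.quantiles(seizures_kg, n=4)

# Spread of the middle 50 per cent of the data.
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```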

  22. Variance • The average squared difference from the mean • Measured in units squared • Requires interval or ratio levels of measurement

  23. Standard deviation • The square root of the variance • Returns the units to those of the original variable

  24. Example: standard deviation and variance Seizures of opium in Germany, 1994-1998
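The calculation for this example can be sketched step by step in Python, using the seizure values from the mean example. The sketch assumes the sample form of the variance, which divides the sum of squared differences by n - 1 rather than n; this is what `statistics.variance` computes.

```python
import math
import statistics

# Annual opium seizures in kilograms, from the mean example.
seizures_kg = [36, 15, 45, 42, 286]

n = len(seizures_kg)
mean = sum(seizures_kg) / n

# Squared difference of each value from the mean (units: kilograms squared).
squared_diffs = [(x - mean) ** 2 for x in seizures_kg]

# Sample variance: divide by n - 1 (an assumption; dividing by n gives
# the population variance instead).
sample_variance = sum(squared_diffs) / (n - 1)

# The square root returns the units to kilograms.
sample_sd = math.sqrt(sample_variance)

print(sample_variance, sample_sd)
print(statistics.variance(seizures_kg), statistics.stdev(seizures_kg))
```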

  25. Distribution or shape of the data • The normal distribution • Skewness: • Positive or right-hand skewed • Negative or left-hand skewed • Kurtosis: • Platykurtic • Mesokurtic • Leptokurtic

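  26. The normal distribution • Symmetrical data: the mean, the median and the mode coincide • [Figure: f(X) plotted against X, with Mean, Median and Mode at the same point]

  27. Right-hand skew (+) • Right-hand skew: the extreme large values drag the mean towards them • [Figure: f(X) plotted against X, with Mode, Median and Mean marked from left to right]

  28. Left-hand skew (-) • Left-hand skew: the extreme small values drag the mean towards them • [Figure: f(X) plotted against X, with Mean, Median and Mode marked from left to right]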
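The direction of skew can be illustrated with a simple moment-based skewness coefficient in Python. This is a sketch under stated assumptions: the helper `moment_skewness` is hypothetical, and statistical packages typically report a sample-adjusted coefficient, so their values will differ. The sign is what matters here.

```python
import statistics

def moment_skewness(values):
    """Third central moment divided by the 1.5 power of the second."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]               # illustrative symmetric data
right_skewed = [36, 15, 45, 42, 286]      # seizure values from the mean example
left_skewed = [-x for x in right_skewed]  # mirror image, for illustration

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, statistics.fmean(data), statistics.median(data),
          moment_skewness(data))
```

For the right-skewed data the mean exceeds the median and the coefficient is positive; mirroring the data reverses both.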

  29. Bivariate analysis • Continuous Dependent Variable • Categorical Independent Variable
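The idea of summarising a continuous dependent variable within each category of an independent variable can be sketched in Python as follows. The records are hypothetical (they are not the training dataset), and the choice of summary statistics is illustrative.

```python
import statistics
from collections import defaultdict

# Hypothetical (gender, age) records; illustrative values only.
records = [("Male", 23), ("Female", 31), ("Male", 35), ("Female", 27),
           ("Male", 41), ("Female", 19), ("Male", 29)]

# Split the continuous variable (age) by the categorical variable (gender).
ages_by_gender = defaultdict(list)
for gender, age in records:
    ages_by_gender[gender].append(age)

# Compute the same descriptive statistics within each group.
summary = {
    gender: {
        "n": len(ages),
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        "sd": statistics.stdev(ages),
    }
    for gender, ages in ages_by_gender.items()
}

for gender, stats in summary.items():
    print(gender, stats)
```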

  30. Explore

  31. Explore: Options button

  32. Explore: Plots button

  33. Explore: Statistics button

  34. Descriptives

  35. [Figure: output panels for Male and Female]

  36. Boxplot of Age vs Gender • [Figure: boxplot of Age by Gender, with annotations marking an outlier, the median and the inter-quartile range]
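A common convention for flagging boxplot outliers is to mark values lying more than 1.5 inter-quartile ranges beyond the quartiles. The Python sketch below applies that rule to hypothetical ages. The helper name `boxplot_outliers` is illustrative, and the quartile rule used by `statistics.quantiles` may differ slightly from the hinges a statistical package uses for its plots.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Values more than k inter-quartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical ages with one implausibly large value.
ages = [18, 19, 20, 21, 21, 22, 23, 23, 24, 25,
        25, 26, 27, 28, 29, 30, 31, 33, 35, 99]

print(boxplot_outliers(ages))
```

Flagged values are candidates for checking against the source records rather than automatic deletions.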

  37. Syntax: Explore
      EXAMINE VARIABLES=age BY gender
        /ID=id
        /PLOT BOXPLOT HISTOGRAM
        /COMPARE GROUP
        /STATISTICS DESCRIPTIVES
        /CINTERVAL 95
        /MISSING LISTWISE
        /NOTOTAL.

  38. Summary • Measures of central tendency • Measures of variation • Quantiles • Measures of shape • Bivariate analysis for a categorical independent variable and continuous dependent variable • Histograms • Boxplots
