GAP Toolkit 5 Training in basic drug abuse data management and analysis. Data analysis: Explore. Training session 9
Objectives • To define a standard set of descriptive statistics used to analyse continuous variables • To examine the Explore facility in SPSS • To introduce the analysis of a continuous variable according to values of a categorical variable, an example of bivariate analysis • To introduce further SPSS Help options • To reinforce the use of SPSS syntax
SPSS Descriptive Statistics • Analyse/Descriptive Statistics/Frequencies • Analyse/Descriptive Statistics/Explore • Analyse/Descriptive Statistics/Descriptives
Exercise: continuous variable • Generate a set of standard summary statistics for the continuous variable Age
Explore: Descriptive Statistics [SPSS output: Descriptives table]
Exercise: Help • What’s This? • Results Coach • Case Studies
Measures of central tendency • Most commonly: • Mode • Median • Mean • 5 per cent trimmed mean
The mode • The mode is the most frequently occurring value in a dataset • Suitable for the nominal level of measurement and above • Example: • For the first most frequently used drug, the mode is Alcohol, with 717 cases (approximately 46 per cent of valid responses)
Bimodal • Describes a distribution in which two categories have a large number of cases • Example: • The distribution of Employment is bimodal: employment and unemployment have a similar number of cases, and more cases than the other categories
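As a minimal sketch, assuming a short list of hypothetical responses (not the toolkit's dataset), the mode can be found by counting categories with Python's `collections.Counter`. Listing the two most common categories also shows whether a second category rivals the first, as in a bimodal distribution.

```python
from collections import Counter

# Hypothetical responses for illustration only (NOT the toolkit's dataset)
most_used_drug = ["Alcohol", "Cannabis", "Alcohol", "Heroin", "Alcohol",
                  "Cannabis", "Cocaine", "Cannabis", "Alcohol"]

counts = Counter(most_used_drug)   # frequency table: category -> number of cases
valid_n = sum(counts.values())     # number of valid responses

# most_common(2) returns the two categories with the most cases, largest first
(mode_value, mode_count), (second_value, second_count) = counts.most_common(2)

print(f"Mode: {mode_value}, {mode_count} cases "
      f"({100 * mode_count / valid_n:.0f} per cent of valid responses)")
print(f"Next most common: {second_value}, {second_count} cases")
```

When the two counts are similar and both clearly exceed the other categories, the distribution may be described as bimodal; the slides give no numeric threshold, so that remains a judgement.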
The median • The median is the middle value when the data are ordered from low to high • Half the data values lie below the median and half above • Because the data have to be ordered, the median is not suitable for nominal data, but it is suitable for the ordinal level of measurement and above
Example: median • Seizures of opium in Germany, 1994-1998 (kilograms) • Source: United Nations (2000). World Drug Report 2000 (United Nations publication, Sales No. GV.E.00.0.10).
Sort the seizure data in ascending order • Ranked 1 to 5: 15, 36, 42, 45, 286 • The middle (3rd ranked) value is the median; the median annual seizure of opium for Germany between 1994 and 1998 was 42 kilograms
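A short sketch of the median calculation, using the five annual seizure values (kilograms) listed in the mean example. The even-length branch, which averages the two middle values, is the standard convention and is not needed for this five-year series.

```python
import statistics

# Opium seizures in Germany, 1994-1998 (kilograms), as listed in the mean example
seizures_kg = [36, 15, 45, 42, 286]

ranked = sorted(seizures_kg)          # ascending order: 15, 36, 42, 45, 286
n = len(ranked)
if n % 2 == 1:
    median = ranked[n // 2]           # odd n: the single middle value
else:
    # even n: average the two middle values
    median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2

# Cross-check against the standard library
assert median == statistics.median(seizures_kg)
print(ranked, median)
```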
The mean • Add the values in the data set and divide by the number of values • The mean is only truly applicable to interval and ratio data, as it involves adding the values • It is sometimes applied to ordinal data, or to ordinal scales constructed from a number of Likert-type items, but this requires the assumption that adjacent values in the scale are equally spaced, e.g. the difference between 1 and 2 is the same as the difference between 5 and 6
Example: mean • Seizures of opium in Germany, 1994-1998 (kilograms) • Sample size = 5 • 36 + 15 + 45 + 42 + 286 = 424 • Mean = 424/5 = 84.8 kilograms
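The same arithmetic as a small Python sketch; printing the median alongside the mean makes the effect of the single large value (286 kg) visible.

```python
import statistics

# Opium seizures in Germany, 1994-1998 (kilograms)
seizures_kg = [36, 15, 45, 42, 286]

total = sum(seizures_kg)              # 36 + 15 + 45 + 42 + 286
mean = total / len(seizures_kg)       # divide by the number of values

print(f"mean = {mean:.1f} kg, median = {statistics.median(seizures_kg)} kg")
```

With these values the mean (84.8 kg) is roughly twice the median (42 kg), because the one large year pulls the mean upwards while the median is unaffected by how extreme that value is.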
The 5 per cent trimmed mean • The 5 per cent trimmed mean is the mean calculated on the data set with the top 5 per cent and bottom 5 per cent of values removed • An estimator that is more resistant to outliers than the mean
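With only five values, 5 per cent from each end is a quarter of one observation, so simply dropping whole values would remove nothing. The sketch below instead uses fractional trimming: the lowest and highest values keep only part of their weight. I believe SPSS uses a weighting scheme of this kind, but its exact algorithm may differ, and the function name `trimmed_mean` is hypothetical; treat the output as illustrative.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after trimming `proportion` of the data from each end (fractional trimming).

    Each sorted value i (1-based) is treated as covering the interval [i - 1, i];
    only the part of that interval inside [g, n - g], with g = proportion * n,
    contributes to the weighted sum.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    g = proportion * n                      # amount trimmed from each end
    total = 0.0
    for i, value in enumerate(x, start=1):
        weight = max(0.0, min(i, n - g) - max(i - 1, g))
        total += weight * value
    return total / (n - 2 * g)              # total weight that remains


seizures_kg = [36, 15, 45, 42, 286]
print(trimmed_mean(seizures_kg))            # 15 and 286 keep 75% of their weight
```

For the seizure data this gives 77.5 kg, between the mean (84.8 kg) and the median (42 kg). With 20 values, the same function drops exactly one value from each end.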
95 per cent confidence interval for the mean • An indication of the expected error (precision) when the sample mean is used to estimate the population mean • In repeated sampling, 95 out of 100 intervals calculated in this way around the sample mean would be expected to contain the population mean
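A sketch of the usual t-based interval, mean ± t × s / √n, which as I understand it is the form SPSS Explore reports. Python's standard library has no t distribution, so the critical value is passed in: about 2.776 is the 97.5th percentile of t with 4 degrees of freedom (n = 5). The function name `mean_ci` is hypothetical.

```python
import math
import statistics


def mean_ci(values, t_critical):
    """Return (mean, lower, upper) for the interval mean +/- t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # standard error; stdev divides by n - 1
    half_width = t_critical * se
    return mean, mean - half_width, mean + half_width


seizures_kg = [36, 15, 45, 42, 286]
mean, low, high = mean_ci(seizures_kg, t_critical=2.776)   # 95%, 4 df
print(f"mean {mean:.1f} kg, 95% CI {low:.1f} to {high:.1f} kg")
```

For the seizure data the lower limit comes out negative, a reminder that with five values and one extreme year the sample mean is a very imprecise estimate.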
Measures of dispersion • The range • The inter-quartile range • The variance • The standard deviation
The range • A measure of the spread of the data • Range = maximum – minimum
Quartiles • 1st quartile: 25 per cent of the values lie below the value of the 1st quartile and 75 per cent above • 2nd quartile: the median: 50 per cent of values below and 50 per cent of values above • 3rd quartile: 75 per cent of values below and 25 per cent of the values above
Inter-quartile range • IQR = 3rd quartile – 1st quartile • The inter-quartile range measures the spread of the middle 50 per cent of the data • Suitable for the ordinal level of measurement and above
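A sketch of the range, quartiles and IQR for the seizure data. Quartile definitions differ between packages; this assumes the (n + 1)p weighted-average definition (`method="exclusive"` in `statistics.quantiles`), which I believe matches the SPSS Explore default. Tukey's hinges would give different values for such a small sample.

```python
import statistics

# Opium seizures in Germany, 1994-1998 (kilograms)
seizures_kg = [36, 15, 45, 42, 286]

data_range = max(seizures_kg) - min(seizures_kg)    # maximum - minimum
# Three cut points dividing the data into four groups: Q1, median, Q3
q1, q2, q3 = statistics.quantiles(seizures_kg, n=4, method="exclusive")
iqr = q3 - q1                                       # spread of the middle 50 per cent

print(f"range = {data_range}, Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Here the range is 271 kg and the IQR is 140 kg (Q1 = 25.5, Q3 = 165.5); the 286 kg year inflates both the range and the upper quartile.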
Variance • The average squared difference from the mean (for a sample, SPSS divides the sum of squared differences by n - 1 rather than n) • Measured in squared units • Requires interval or ratio levels of measurement
Standard deviation • The square root of the variance • Returns the units to those of the original variable
Example: standard deviation and variance Seizures of opium in Germany, 1994-1998
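For the five seizure values used in the mean example, the calculation can be sketched as follows, dividing by n - 1 as SPSS does for a sample. The standard library's `statistics.variance` uses the same denominator, so it serves as a cross-check.

```python
import math
import statistics

# Opium seizures in Germany, 1994-1998 (kilograms)
seizures_kg = [36, 15, 45, 42, 286]
n = len(seizures_kg)
mean = sum(seizures_kg) / n                          # 84.8 kg

sum_sq = sum((x - mean) ** 2 for x in seizures_kg)   # sum of squared deviations
variance = sum_sq / (n - 1)                          # sample variance, in kg squared
std_dev = math.sqrt(variance)                        # back in kilograms

assert math.isclose(variance, statistics.variance(seizures_kg))
print(f"variance = {variance:.1f} kg^2, standard deviation = {std_dev:.1f} kg")
```

This gives a variance of about 12,787.7 kg² and a standard deviation of about 113.1 kg, larger than the mean itself because of the single 286 kg year.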
Distribution or shape of the data • The normal distribution • Skewness: • Positive or right-hand skewed • Negative or left-hand skewed • Kurtosis: • Platykurtic • Mesokurtic • Leptokurtic
The normal distribution [Figure: density curve f(X) against X, with the mean, median and mode at the same point] • Symmetrical data: the mean, the median and the mode coincide
Right-hand skew (+) [Figure: f(X) against X; from left to right: mode, median, mean] • Right-hand skew: the extreme large values drag the mean towards them
Left-hand skew (-) [Figure: f(X) against X; from left to right: mean, median, mode] • Left-hand skew: the extreme small values drag the mean towards them
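The shape measures can be sketched with the small-sample adjusted coefficients G1 (skewness) and G2 (excess kurtosis). I believe these correspond to the skewness and kurtosis SPSS reports (the same formulas as spreadsheet SKEW and KURT), but that is an assumption, and the function name `skewness_kurtosis` is hypothetical. Excess kurtosis near 0 suggests a mesokurtic (normal-like) shape, above 0 leptokurtic, below 0 platykurtic.

```python
import math


def skewness_kurtosis(values):
    """Return (G1, G2): adjusted sample skewness and excess kurtosis."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n    # central moments (divisor n)
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are equal")
    g1 = m3 / m2 ** 1.5                              # moment skewness
    g2 = m4 / m2 ** 2 - 3                            # moment excess kurtosis
    # Small-sample adjustments
    G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return G1, G2


print(skewness_kurtosis([36, 15, 45, 42, 286]))      # opium seizures, kilograms
```

For the seizure data both values are positive: the single large year gives a right-hand (positive) skew and a peaked, heavy-tailed (leptokurtic) shape. Evenly spaced values such as 1 to 5 give zero skewness and negative kurtosis.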
Bivariate analysis • Continuous Dependent Variable • Categorical Independent Variable
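The idea behind this kind of bivariate analysis, summarising a continuous variable separately for each category of another, can be sketched in plain Python. The (age, gender) records are hypothetical, for illustration only.

```python
import statistics
from collections import defaultdict

# Hypothetical (age, gender) records for illustration only
records = [(23, "Male"), (31, "Female"), (19, "Male"), (45, "Male"),
           (27, "Female"), (35, "Female"), (22, "Male"), (29, "Female")]

ages_by_gender = defaultdict(list)
for age, gender in records:            # split the continuous variable by category
    ages_by_gender[gender].append(age)

for gender, ages in sorted(ages_by_gender.items()):
    print(f"{gender}: n={len(ages)}, mean={statistics.mean(ages):.1f}, "
          f"median={statistics.median(ages)}, sd={statistics.stdev(ages):.1f}")
```

Comparing the group means and medians is the numerical counterpart of the side-by-side boxplots that Explore draws.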
Boxplot of Age vs Gender [Figure: boxplots of Age for each gender, with an outlier, the median and the inter-quartile range labelled]
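The elements labelled on a boxplot can be sketched as follows. As I understand the SPSS convention, cases more than 1.5 box lengths (IQRs) beyond the edge of the box are plotted as outliers, and those more than 3 box lengths as extreme values; this sketch flags only the 1.5 × IQR outliers. It assumes the (n + 1)p quartile definition, whereas SPSS may draw the box from Tukey's hinges, so box edges can differ slightly. The function name and the ages are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, median, IQR and values beyond 1.5 x IQR from the box edges."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1                                   # length of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical ages for one gender group
print(boxplot_summary([19, 22, 23, 24, 25, 26, 27, 28, 62]))
```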
Syntax: Explore
EXAMINE VARIABLES=age BY gender
  /ID=id
  /PLOT BOXPLOT HISTOGRAM
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
Summary • Measures of central tendency • Measures of variation • Quantiles • Measures of shape • Bivariate analysis for a categorical independent variable and continuous dependent variable • Histograms • Boxplots