Topic 2. Summarising data / Levels of measurement / Introduction to SPSS. Main Issues for this session. Levels of measurement Data types: nominal, ordinal, interval, ratio Linking data types to statistical analyses Introduction to SPSS. Reading. Chapter 2 and Chapter 3
Topic 2 Summarising data / Levels of measurement / Introduction to SPSS
Main Issues for this session • Levels of measurement • Data types: nominal, ordinal, interval, ratio • Linking data types to statistical analyses • Introduction to SPSS
Reading • Chapter 2 and Chapter 3: Frequency Distributions and Graphic Representation • Fundamentals of Statistical Reasoning in Education, Coladarci et al.
Preparing a questionnaire and codebook • Example questionnaire: WB_Pupil_MP.doc • Example codebook: Pupil_codebooks.xls • Example codebooks: http://pisa2006.acer.edu.au/downloads.php
Codebook - 1 • A codebook should be prepared as a questionnaire is developed • The purposes of a codebook are • To facilitate data entry, with codes shown on the questionnaire if possible • To plan for analysis; to help with determining the types of analyses that are appropriate.
Codebook - 2 • Numeric codes are easier to enter than alphabetic codes • Consider the appropriate field width and range of answers. These can be useful feedback to questionnaire design as well. • Decide how to handle missing responses
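A codebook can also be written down in machine-readable form so that entered codes can be range-checked. The sketch below is illustrative only: the variable names PSEX and PHOMLANG come from the pupil data file, but every code, label, field width and missing-value code shown is an assumption, not taken from the actual codebook.

```python
# Hypothetical codebook entries: codes, labels and missing codes are assumed.
codebook = {
    "PSEX": {
        "label": "Sex of pupil",
        "values": {1: "Girl", 2: "Boy"},
        "missing": {9},      # assumed code for "no response"
        "width": 1,          # one digit is enough for codes 1, 2 and 9
    },
    "PHOMLANG": {
        "label": "Language spoken at home",
        "values": {1: "Test language", 2: "Other language"},
        "missing": {9},
        "width": 1,
    },
}


def is_valid_entry(variable, raw_code):
    """Return True if raw_code is a defined value or a declared missing code."""
    entry = codebook[variable]
    return raw_code in entry["values"] or raw_code in entry["missing"]


# Range check during data entry: 3 is not a defined code for PSEX.
for code in (1, 9, 3):
    print(code, is_valid_entry("PSEX", code))
```

Keeping the missing codes outside the range of real answers (here 9) makes entry errors easier to spot.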
Getting data into SPSS - 1 • The EXCEL file Pupil_data.xls contains the pupil questionnaire data • Import this data set into SPSS: • Start SPSS
Getting data into SPSS - 2 • Select from Menu • File -> Open -> Data
Getting data into SPSS - 3 • Find the folder where the EXCEL file is stored. • In the file open dialog box, make sure the file type is set to “xls”. • Select the file Pupil_data.xls.
Getting data into SPSS - 4 • Make sure the check box for “Read variable names from the first row of data” is checked. (The EXCEL file has variable names in the first row, and these will be read in as SPSS variable names as well.)
Toggle between data view and variable view • The tabs at the bottom left corner switch between data view and variable view.
Add Variable labels for variables 4 to 9 (PDOBDD to PHOMLANG) Variable label
Add Value labels for variable PSEX • (The column after Variable Labels). • Click in the value labels cell and a dialog box appears. • Enter each code together with its label
Add missing values for variable PSEX • (The column after Value Labels) • Click in the Missing values cell and a dialog box appears. • Enter values representing missing values
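The effect of value labels and missing values on later analyses can be sketched outside SPSS. In this minimal example the labels for PSEX and the missing codes 8 and 9 are assumptions chosen for illustration, and the data are made up.

```python
from collections import Counter

# Assumed value labels and missing-value codes for PSEX (illustrative only).
value_labels = {1: "Girl", 2: "Boy"}
missing_codes = {8, 9}

psex = [1, 2, 2, 9, 1, 8, 2]  # made-up raw codes as they might be entered


def to_label(code):
    """Map a raw code to its label; declared missing codes become None."""
    if code in missing_codes:
        return None
    return value_labels.get(code, f"unlabelled ({code})")


labelled = [to_label(code) for code in psex]
valid = [value for value in labelled if value is not None]

counts = Counter(valid)          # missing responses are excluded from counts
n_missing = labelled.count(None)

print(counts)
print("missing:", n_missing)
```

Declaring 8 and 9 as missing keeps them out of frequencies and averages, which is the same job the Missing column does in variable view.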
Practice for other variables • Set variable labels, value labels and missing values for some other variables • Value labels and missing values can be copied from one set of cells and pasted into other cells. • Make sure you save the file often!
Frequencies • For which types of variables is it appropriate to compute frequencies: nominal, ordinal, interval or ratio? • For which types of variables is it appropriate to compute averages: nominal, ordinal, interval or ratio?
Compute frequencies in SPSS -1 • Select from menu • Analyze -> Descriptive Statistics -> Frequencies
Compute frequencies in SPSS -2 • Select the variables in the left-hand box and move them to the right-hand box. • Press OK.
Compute frequencies in SPSS -3 • Explore the options under the Statistics and Charts buttons, and see what kinds of output you can produce. • Compute frequencies for other variables as a practice.
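What a frequency table contains can be sketched in a few lines of Python. This is not the SPSS procedure itself, only an illustration of the counts, valid percentages and cumulative percentages it reports. The data and the missing code 9 are made up.

```python
from collections import Counter


def frequency_table(values, missing=frozenset()):
    """Return rows of (value, count, valid percent, cumulative percent)."""
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    n_valid = len(valid)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / n_valid, 1),
            round(100 * running / n_valid, 1),
        ))
    return rows


# Made-up codes for an ordinal variable with categories 1 to 5; 9 = missing.
books = [1, 2, 2, 3, 3, 3, 4, 5, 9, 2]
table = frequency_table(books, missing={9})

for value, count, percent, cumulative in table:
    print(f"{value:>5} {count:>5} {percent:>7} {cumulative:>7}")
```

The cumulative percentage column is only meaningful when the categories have an order, that is, for ordinal variables and above.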
Constructs in a questionnaire - 1 • Sometimes we are interested in a measure that cannot be obtained or observed directly, unlike the answer to a question such as “are you a boy or a girl”. • For example, socio-economic status is something that we have an interest in, but it is a concept (like well-being) rather than something that we can see and directly measure. • Such concepts are often called constructs, or latent variables.
Constructs in a questionnaire - 2 • Sociologists and statisticians have developed methodologies to measure constructs (or latent variables). • Psychometrics is the science of the measurement of latent variables. • The field of psychometrics includes classical test theory (CTT) and item response theory (IRT)
Constructs in a questionnaire - 3 • To measure a construct, typically a number of observable indicators are collected (e.g., through a questionnaire). • The data from these indicators are aggregated in some way (e.g., to form a total score) to be used as a measure of the construct for each individual.
Constructs in a questionnaire - 4 • A simple way to aggregate the indicators into a measure for a construct is just to sum the scores for the set of questions for each student. • These sums (or measures of the constructs) can then be used as new variables as the basis of further statistical analysis. • There are more sophisticated ways to aggregate the indicator scores into a construct score (e.g., using item response theory models).
Constructs in a questionnaire - 5 • In SPSS, calculate sum scores for each construct you identified, for each student. • You can then use these new variables for further analyses. • Watch animated demo on how to compute sum scores. • HowToComputeSumScores_demo.swf
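The sum-score idea can be sketched as follows. The item names Q1 to Q3, the response codes and the missing code 9 are all hypothetical. Returning no score when any item is missing is one possible rule, chosen here as an assumption; other rules (such as averaging the answered items) are also used in practice.

```python
# Hypothetical items assumed to measure one construct.
construct_items = ["Q1", "Q2", "Q3"]
MISSING = {9}  # assumed missing-value code

students = [
    {"id": 1, "Q1": 3, "Q2": 4, "Q3": 2},
    {"id": 2, "Q1": 1, "Q2": 9, "Q3": 2},  # Q2 not answered
    {"id": 3, "Q1": 4, "Q2": 4, "Q3": 4},
]


def sum_score(student, items, missing=MISSING):
    """Sum the item responses; return None if any item is missing."""
    responses = [student[item] for item in items]
    if any(r in missing for r in responses):
        return None
    return sum(responses)


# Store the result as a new variable for each student.
for student in students:
    student["construct_sum"] = sum_score(student, construct_items)
    print(student["id"], student["construct_sum"])
```

Before summing, check that all items are coded in the same direction, so that a high code always means more of the construct.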
Outline • Categorical variables (ordinal and nominal) • Continuous variables (interval and ratio)
Download from subject website • Data file from TIMSS 2003 study for Australia TIMSS2003AUS.sav • Student Questionnaire from TIMSS 2003 study for Australia T03_Student_8.pdf
Categorical data • Nominal - numbers are used only as labels for different objects within a set. For example, • gender • idbook (there are 12 different test booklets) • Ordinal - numbers are used to reflect the rank order of objects within a set according to a specific criterion. For example, • bsbgbook (number of books in the home) • bsbgmfed (mother’s education level)
Summary of categorical variables • In general, summary of categorical variables addresses the questions: • How many categories? • How many cases are in each category, or what are the proportions of cases in each of the categories? • If a variable is ordinal, questions regarding trends and association can be considered. • Examples: • For data file TIMSS2003AUS.sav, the possible questions could be: • What are the proportions of female and male students in the study? • What are the levels of education of parents for the students surveyed? • Is there an association between levels of education of parents and number of books in the home?
Hands-on (1) • Are there more girls than boys? • Is there an association between Father’s education level and the number of books at home? • Follow animated demo • frequency_1_demo • frequency_2_demo • Explore_1_demo • Explore_1_output_demo
Hands-on (2) • Is there a difference between girls and boys in terms of whether they enjoy mathematics (variable bsbmtenj)? • Follow animated demo • Crosstab_1_demo • Crosstab_1_output_demo
Hands-on (3) • Is there a difference between girls and boys in terms of whether they enjoy SCIENCE (variable bsbstenj, var 67)?
Things to watch out for in comparing frequencies - 1 • Consider if you should compare raw frequencies or percentages. • For percentages, make sure the denominator (total) is the appropriate one to use. For example, check row total, column total, overall total. • Check the scale to make sure there is no exaggeration of differences
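The choice of denominator can be illustrated with a small cross-tabulation. The data below are made up, and the category names are assumptions for illustration. The same cell count gives three different percentages depending on whether it is divided by the row total, the column total or the overall total.

```python
from collections import Counter

# Made-up (sex, response) pairs for an "enjoys mathematics" style question.
pairs = (
    [("Girl", "Agree")] * 3
    + [("Girl", "Disagree")] * 1
    + [("Boy", "Agree")] * 2
    + [("Boy", "Disagree")] * 4
)

cell = Counter(pairs)
row_total = Counter(sex for sex, _ in pairs)
col_total = Counter(response for _, response in pairs)
n = len(pairs)


def percentages(row, col):
    """Return the cell count as a row, column and overall percentage."""
    count = cell[(row, col)]
    return {
        "row": 100 * count / row_total[row],
        "column": 100 * count / col_total[col],
        "total": 100 * count / n,
    }


p = percentages("Girl", "Agree")
print(p)
```

To compare girls with boys, the row percentage (within each sex) is the one to read: 75% of girls agree, against 2 out of 6 boys.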
Things to watch out for in comparing frequencies – Raw score or percentage? • Percentages are the better basis for comparison here because the two groups differ greatly in size: many more students speak the test language at home than do not.
Things to watch out for in comparing frequencies – Check magnitude of scale • The graph on the right shows large differences. But check the scale on the vertical axis. There are only a few students. We can’t say there is a great difference. • Beware of visual deception.
Continuous data • Interval - numbers reflect both the rank order of objects and the extent of the differences between them (e.g. temperature) • Ratio - scale has an absolute zero and hence a ratio of scores is independent of the units of the scale (e.g. height, weight, age)
Summary of continuous variables Example of Questions • What is the average score of the students surveyed? (mean) • What is the middle score? (median) • Which is the most frequent score? (mode) • What is the highest score? (max) • What is the lowest score? (min) • What is the range of students’ scores? (range) • To what extent are the scores close to the mean? (variance and standard deviation)
Mean and Median • Mean (average, expected value) • Sum of observations / number of observations • Median • 50% of subjects below and 50% of subjects above
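Each of these questions maps onto one summary statistic. The sketch below uses Python's standard `statistics` module on made-up scores. It uses the population forms of the variance and standard deviation (dividing by n); `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

scores = [52, 47, 61, 47, 58, 70, 47, 66]  # made-up test scores

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "max": max(scores),
    "min": min(scores),
    "range": max(scores) - min(scores),
    "variance": statistics.pvariance(scores),  # population form, divides by n
    "sd": statistics.pstdev(scores),           # square root of the variance
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```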
Variance and Standard deviation • (The formula shown on this slide is not reproduced in this transcript.) • In the formula, µ is the mean and n is the number of observations.
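For reference, a standard definition consistent with the symbols µ and n is given below. It is the population form, which divides by n; whether the original slide divided by n or by n - 1 is not known, and the symbol x_i for the i-th observation is an assumption.

```latex
\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^{2},
\qquad
\sigma = \sqrt{\sigma^{2}}
```

When the mean is estimated from a sample, the divisor n - 1 is commonly used in place of n.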
Normal Distribution • Many variables have a distribution shaped like a bell curve.
Example descriptive statistics • Variable 154 (bsmmat01) is an estimate of a student’s mathematics achievement. • Follow animated demo: • descriptive_1_demo
Histogram of continuous variable • Frequency analysis and bar charts may fail because there are too many categories. • Use histogram. • Variable 154 (bsmmat01) is an estimate of a student’s mathematics achievement. • Follow animated demo: • histogram_1_demo
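A histogram avoids the problem of too many categories by grouping values into intervals (bins) before counting. The sketch below shows the binning step on made-up achievement scores. The bin width of 50 and the starting point of 400 are arbitrary choices for illustration, not the settings SPSS would pick.

```python
from collections import Counter


def histogram(values, bin_width, start):
    """Count values in equal-width bins [lower, upper); keep empty bins."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    first, last = min(counts), max(counts)
    return [
        (start + k * bin_width, start + (k + 1) * bin_width, counts[k])
        for k in range(first, last + 1)
    ]


# Made-up scores on a continuous achievement scale.
scores = [412.3, 455.0, 478.9, 501.2, 515.7, 530.4, 549.9, 610.8]
bins = histogram(scores, bin_width=50, start=400)

for lower, upper, count in bins:
    print(f"{lower:>4}-{upper:<4} {'#' * count}")
```

Changing the bin width changes the shape of the picture, so it is worth trying more than one width before drawing conclusions.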
Compare histograms for groups • Compare mathematics achievement distributions between groups based on father’s education level. • Follow animated demo: • histogram_2_demo
Box-Plots • Box-plots are graphical representations of the data in a five-number summary with the addition of ‘cutoffs’ or ‘fences’ for the identification of possible outliers (individual data points are plotted beyond the fences if they occur)
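The numbers behind a box-plot can be computed directly. The sketch below uses made-up data and assumes the common convention of placing the fences 1.5 interquartile ranges beyond the quartiles; the slide does not state which multiplier is used. Quartile definitions also vary between packages, so the values from `statistics.quantiles` may differ slightly from those SPSS reports.

```python
import statistics

data = [47, 49, 52, 55, 56, 58, 60, 61, 63, 95]  # made-up scores

# Quartiles (default "exclusive" method of statistics.quantiles).
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

five_number = (min(data), q1, median, q3, max(data))
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", five_number)
print("fences:", lower_fence, upper_fence)
print("possible outliers:", outliers)
```

Points beyond the fences are flagged as possible outliers only; whether they are errors or genuine extreme scores has to be checked against the data.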
Box plot for mathematics achievement • Follow animated demo: • boxplot_1_demo • boxplot_2_demo
Output of Box-plot of mathematics scores by father’s education level
Parametric and Non-parametric • Mean and Median • Mean: average • Median: score at the 50th percentile (the middle value)
Mean and Median • If the distribution of scores is symmetrical, the mean and median will be close. • If the distribution is skewed, then the mean and median will be quite different. • Mean is sensitive to outliers • Median is not sensitive to outliers • Example: income distribution
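The income example can be made concrete with a few made-up figures (in thousands). Adding a single very high income moves the mean a long way but barely changes the median.

```python
import statistics

incomes = [30, 32, 35, 38, 40]        # made-up incomes, in thousands
with_outlier = incomes + [1000]       # one very high earner added

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"without outlier: mean={mean_before:.1f}, median={median_before:.1f}")
print(f"with outlier:    mean={mean_after:.1f}, median={median_after:.1f}")
```

For skewed variables such as income, the median is therefore often the more informative summary of a typical value.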