Learn how to navigate SPSS, create new data files, import data, recode variables, and produce plots and summaries. Download workshop materials for hands-on practice.
Getting started with SPSS
Dr Jenny Freeman
Mathematics & Statistics Help
University of Sheffield
Learning outcomes
By the end of this session you should understand:
• The different windows in SPSS
• The difference between Data View and Variable View
By the end of this session you should be able to:
• Open SPSS and create a new data file
• Read data in from existing files, both SPSS and Excel
• Recode data to create new variables
• Produce simple plots, data summaries and tables
• Edit plots
• Save files and output
Download the slides and data In your web browser, type in the following address and save the files to your computer: http://www.sheffield.ac.uk/mash/workshop_materials
Questionnaire Q1: Which of these colours do you like most? Q2: I love maths: Q3: How long did it take you to travel here today, in minutes? • What data types are these? • How do we put these data into SPSS?
Entering data into SPSS
• When entering information into SPSS, you need to code it numerically. You can then label the codes to make them understandable
• Note that for ‘Colour’ and ‘Love maths’, this does not make them numerical!
• For example, Colour: 5 = Blue; Love maths: 2 = Disagree
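To make the coding idea concrete, here is a small sketch in Python (used only for illustration; in SPSS this is done in Variable View). Only 5 = Blue and 2 = Disagree come from the slide; the other codes, the variable names and the data values are hypothetical.

```python
# Hypothetical coding scheme: only 5 = "Blue" and 2 = "Disagree" come from
# the slide; the other codes are made up for illustration.
colour_labels = {1: "Red", 2: "Green", 3: "Yellow", 4: "Purple", 5: "Blue"}
maths_labels = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neither agree nor disagree",
    4: "Agree",
    5: "Strongly agree",
}

# The data themselves are stored as numeric codes, one per respondent.
colour_codes = [5, 2, 5, 1]
maths_codes = [2, 4, 4, 5]


def decode(codes, labels):
    """Translate numeric codes into their value labels for display."""
    return [labels[c] for c in codes]


print(decode(colour_codes, colour_labels))  # ['Blue', 'Green', 'Blue', 'Red']
print(decode(maths_codes, maths_labels))
# Averaging colour_codes would run, but the result means nothing:
# the codes name categories, they are not quantities.
```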
Open SPSS
• Open IBM SPSS Statistics from ‘All Programs’
• Select ‘New Dataset’
Data View
• SPSS opens in the Data View window
• This is where you input your data
Variable View The Variable View window allows you to create variables, name them, select their type, code missing values and create labels for different values
Variable View
• Give variables names and labels; the label appears in the output
• Names must have no gaps (spaces) or symbols
• Use 0 decimals for integers
• ‘String’ means words (text)
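The "no gaps or symbols" advice can be expressed as a simple check. This Python sketch assumes a deliberately conservative rule (start with a letter, then only letters, digits or underscores); SPSS itself permits a few more characters, so treat the pattern as a safe subset rather than SPSS's exact rule. The function name `is_safe_name` is hypothetical.

```python
import re

# Assumed, deliberately conservative rule: start with a letter, then only
# letters, digits or underscores (no spaces or other symbols).
# SPSS allows a few more characters; this is a safe subset.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Keywords SPSS reserves, which cannot be used as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}


def is_safe_name(name: str) -> bool:
    """Return True if `name` follows the conservative naming rule above."""
    return (
        NAME_PATTERN.fullmatch(name) is not None
        and len(name) <= 64  # SPSS limits variable names to 64 bytes
        and name.upper() not in RESERVED
    )


for candidate in ["Love_maths", "Love maths", "Time(mins)", "1st_class", "BY"]:
    print(candidate, is_safe_name(candidate))
```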
Value labels • Give the numbers labels • Click on the blue box • Enter the value label for each number, then ‘Add’ until all numbers are labelled
Missing values
• You can give missing values a particular code, e.g. -99, and then label the codes in the Values column, e.g. -99 = did not finish, -98 = not applicable
• Declare these codes in the Missing column, in the same way as you enter value labels
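Declaring the codes as missing matters because otherwise they are treated as real values. This Python sketch uses the -99 and -98 codes from the slide with hypothetical travel times to show how an undeclared code distorts a mean.

```python
from statistics import mean

# User-defined missing-value codes, as on the slide:
# -99 = did not finish, -98 = not applicable
MISSING_CODES = {-99, -98}

# Hypothetical travel times in minutes, including two missing codes
times = [15, 30, -99, 45, -98, 20]

naive_mean = mean(times)  # wrong: treats the codes as real times
valid_times = [t for t in times if t not in MISSING_CODES]
valid_mean = mean(valid_times)  # excludes the missing codes

print(naive_mean)  # -14.5
print(valid_mean)  # 27.5
```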
Entering data into SPSS
• One variable per column
• One row per subject
Data types
• Some operations in SPSS can only be performed on certain data types
• Choose the right measurement level (Nominal, Ordinal or Scale) using the Measure column
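Which summaries make sense depends on the measurement level. The Python sketch below encodes a common teaching rule of thumb (not an SPSS feature), and assumes measurement levels for the questionnaire variables; the names `SUITABLE`, `MEASURE` and `can_summarise` are hypothetical.

```python
# A common rule of thumb (not an SPSS API): which summaries suit each level.
SUITABLE = {
    "Nominal": {"counts", "percentages", "mode"},
    "Ordinal": {"counts", "percentages", "mode", "median", "IQR"},
    "Scale": {"median", "IQR", "mean", "standard deviation"},
}

# Assumed measurement levels for the questionnaire variables
MEASURE = {"Colour": "Nominal", "Maths": "Ordinal", "Time": "Scale", "Gender": "Nominal"}


def can_summarise(variable: str, summary: str) -> bool:
    """Check whether a summary is conventionally appropriate for a variable."""
    return summary in SUITABLE[MEASURE[variable]]


print(can_summarise("Time", "mean"))    # True
print(can_summarise("Colour", "mean"))  # False
print(can_summarise("Maths", "median"))  # True
```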
Recoding variables: Automatic Recode
• You can use the ‘Automatic Recode’ function to recode string variables into numeric variables, such as the Gender variable you have created: Transform > Automatic Recode…
• It assigns numbers to the values in alphabetical order, e.g. F = 1; M = 2
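The idea behind Automatic Recode can be sketched in Python: collect the distinct values, sort them, and number them from 1. This is an illustration of the idea, not SPSS's implementation, and it assumes consistent capitalisation (Python sorts strings by character code, so "f" would sort after "M"). The function name `automatic_recode` is hypothetical.

```python
def automatic_recode(values):
    """Sort the distinct values and number them 1, 2, 3, ... in that order.

    Returns the recoded list and the value-to-code mapping."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [mapping[v] for v in values], mapping


gender = ["M", "F", "F", "M", "F"]
recoded, mapping = automatic_recode(gender)
print(mapping)  # {'F': 1, 'M': 2}
print(recoded)  # [2, 1, 1, 2, 1]
```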
File > Save As
Save the dataset as an SPSS file using File > Save As. SPSS data files end in .sav
Exercise 1: Entering data into SPSS • Open SPSS and select ‘New Dataset’ from the options • Go to ‘Variable View’ and create the dataset template for inputting the 5 variables on the handout: ID, Colour, Maths, Time, Gender • Input the data, creating numeric codes for ‘Favourite colour’ and ‘Love_maths’. Input ‘Gender’ as a string variable • Use Automatic recode to create a new Gender variable with numeric codes rather than the string variable
Opening an Excel file: Titanic data
• The ship Titanic sank in 1912 with the loss of most of its passengers
• Data are available on 1309 passengers on board the ship Titanic
Opening an Excel file
Open the Excel file ‘SPSS_workshop_data’: File > Open > Data. Select Excel from the ‘Files of type’ menu, then select the Excel file required and click ‘Open’
Opening an Excel file • SPSS only opens one sheet at a time so if there are multiple worksheets, make sure you are opening the correct sheet. Note: There must only be one header row in Excel • Give the variables and values suitable labels (see next slide), and choose the right data types
Opening an Excel file Suggested labels
Exercise 2: Titanic Which variables could be used to investigate whether ‘wealthy’ people were more likely to survive?
Summarising categorical data
• Percentages are preferable unless the frequencies are small
• Bar/pie charts for one categorical variable
• Contingency tables and stacked/multiple bar charts for assessing relationships between two categorical variables
• Research question: Who are the most dangerous drivers in Britain?
Exercise 3: Dangerous drivers Is there a relationship between age, gender and accidents? Could this data display be improved?
Frequencies
How many people were travelling in each class? Analyze > Descriptive Statistics > Frequencies. Move class from the left to the right-hand side using the middle arrow, then click ‘OK’
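A frequency table is just counts plus percentages of the total. This Python sketch reproduces that calculation on hypothetical class values for ten passengers (not the real Titanic data); `frequency_table` is a hypothetical helper, not part of SPSS.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, like a basic Frequencies table."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


# Hypothetical classes for ten passengers (not the real Titanic data)
travel_class = [3, 1, 3, 2, 3, 1, 3, 2, 3, 3]

for value, count, percent in frequency_table(travel_class):
    print(f"Class {value}: {count:>3}  {percent:5.1f}%")
```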
Output window Charts and tables appear in a separate window in SPSS
File > Save As
Output files are saved separately from the data (SPSS output files end in .spv). Give the output file a sensible name, e.g. Titanic summary statistics
Were class and survival related?
Analyze > Descriptive Statistics > Crosstabs. Move class to ‘Row(s)’ and survived to ‘Column(s)’ using the arrows
• Select ‘Cells’ to get the percentage options
• Choose ‘Row’ to get percentages across the rows
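Row percentages divide each cell count by its row total, so each class's row adds up to 100% and the classes can be compared directly. This Python sketch shows the calculation on hypothetical (class, survived) pairs, not the real data; `crosstab_row_percent` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical (class, survived) pairs, one per passenger (not the real data)
records = [(1, "Yes"), (1, "Yes"), (1, "No"),
           (2, "Yes"), (2, "No"),
           (3, "No"), (3, "No"), (3, "Yes"), (3, "No")]


def crosstab_row_percent(pairs):
    """Cross-tabulate rows by columns as row percentages (each row sums to 100)."""
    cells = Counter(pairs)                         # count in each (row, column) cell
    row_totals = Counter(row for row, _ in pairs)  # count in each row
    columns = sorted({col for _, col in pairs})
    return {
        row: {col: 100 * cells[(row, col)] / row_totals[row] for col in columns}
        for row in sorted(row_totals)
    }


for row, percents in crosstab_row_percent(records).items():
    print(row, {col: round(p, 1) for col, p in percents.items()})
```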
Crosstabulation Who was most likely to die?
Graph menu
Graphs > Legacy Dialogs
• Bar charts for categorical data
• Other charts for scale data
Stacked bar chart
Graphs > Legacy Dialogs > Bar, then choose ‘Stacked’
• Variable across the x-axis (category axis): Class, giving a different bar for each class
• Variable to split the bars (define stacks by): Survived
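In a stacked bar, each segment starts where the previous one ended. This Python sketch computes the segment boundaries for each class from hypothetical counts (not the real data) and draws a rough text version; `stacked_segments` is a hypothetical helper, and the text drawing only stands in for the SPSS chart.

```python
def stacked_segments(stack_values):
    """Return (label, bottom, top) for each segment of one stacked bar.

    Each segment starts where the previous one ended."""
    segments = []
    bottom = 0
    for label, value in stack_values.items():
        segments.append((label, bottom, bottom + value))
        bottom += value
    return segments


# Hypothetical counts of deaths ("No") and survivals ("Yes") for each class
bars = {
    "1st": {"No": 1, "Yes": 2},
    "2nd": {"No": 1, "Yes": 1},
    "3rd": {"No": 3, "Yes": 1},
}

for cls, stacks in bars.items():
    # Text version: '#' for died, '.' for survived, 4 characters per person
    text = "".join(("#" if label == "No" else ".") * (4 * (top - bottom))
                   for label, bottom, top in stacked_segments(stacks))
    print(f"{cls} {text}")
```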
Editing the chart Double click on the chart to open the editing window
Changing the font size
• The font is often too small, so change it to 12 on the axes and 14 for the title
• Click on the axis text at the bottom
• Select font size 12
• Repeat for the other labels
Changing the colours
Double-click on one of the bars, then single-click on the bar section whose colour you want to change
Changing the colours
Select Fill & Border from the Properties window, then:
• Change the fill colour of the bars
• Change the border colour
• Change the pattern
• Click Apply after each change
Further options for editing
• Scale the bars to 100%
• Add a title
• Add data labels
• Double-click on a bar to open the Properties window and change its colour
Changing the labels
• To edit the labels, click on them to access the Properties window and select ‘Data Value Labels’
• Move ‘Percent’ up (to display it) and ‘Count’ down (to hide it)
• Use the ‘Number Format’ tab to display 0 decimal places, and increase the font size to at least 12
• To change the axis label from ‘Count’ to ‘Percentage’, click on it twice, pausing between the clicks
Copying to Word • Close the editing window and return to the Output window • You can save the output file with everything in but it’s better to take charts to Word and write up as you go along • Right click on the chart and select Copy then paste as a picture in Word Did class affect survival?
Exercise 4: Survival of the pushiest?
• Are Americans more likely to survive when a boat sinks? Produce a suitable summary table and stacked bar chart to investigate this
http://www.independent.co.uk/news/world/australasia/more-britons-than-americans-died-on-titanic-because-they-queued-1452299.html
Exercise 5: Compare genders The number of haircuts a year for a sample of people was summarised: On average who gets more haircuts a year and which gender is more spread out? Do the means and medians look similar for each gender?
Summary statistics in SPSS
There are numerous options for summarising data through the Analyze menu in SPSS
Exercise 6: Use Explore to compare the cost of ticket by survival
• Use Explore to get summary statistics and histograms of Cost of ticket by Survival status
• Analyze > Descriptive Statistics > Explore
Exercise 6: Use Explore to compare the cost of ticket by survival
• Which summary statistics should be used?
• Interpret the output: how do the two groups (those who died and those who survived) compare?
• Use the histograms to decide which summary measures to use
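Explore reports statistics separately for each group. This Python sketch computes a few of them (n, mean, median, standard deviation, interquartile range) for hypothetical ticket costs, not the real data. The quartiles come from Python's `statistics.quantiles`, whose default method may give slightly different values from SPSS's output; `summarise` is a hypothetical helper.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical ticket costs (in pounds) by survival status; not the real data
fares = {
    "Died": [7, 8, 8, 9, 13, 26, 31],
    "Survived": [8, 13, 27, 30, 52, 79, 263],
}


def summarise(values):
    """A few of the statistics Explore reports, for one group."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; SPSS's method may differ slightly
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "iqr": q3 - q1,
    }


for group, values in fares.items():
    s = summarise(values)
    print(f"{group}: n={s['n']}, mean={s['mean']:.1f}, median={s['median']}, "
          f"SD={s['sd']:.1f}, IQR={s['iqr']}")
```

In the "Survived" group the single large value pulls the mean well above the median, which is the kind of pattern the histograms help to spot.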
Histograms
A histogram shows the distribution (shape and spread) of the data: Graphs > Legacy Dialogs > Histogram. To produce separate histograms for the two groups, move ‘Survived’ into ‘Rows’ to stack them on top of each other, or into ‘Columns’ to place them side by side
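A histogram counts how many values fall into each equal-width bin. This Python sketch bins hypothetical ticket costs (not the real data) for each group, using the same bins for both so the panels are comparable; the bin width of 50, the helper `histogram_counts` and the text bars are assumptions for illustration only.

```python
def histogram_counts(values, bin_width, n_bins):
    """Count non-negative values in bins [0, w), [w, 2w), ...

    Values beyond the last bin are counted in the last bin."""
    counts = [0] * n_bins
    for v in values:
        index = min(int(v // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical ticket costs (not the real data), in pounds
fares = {
    "Died": [7, 8, 8, 9, 13, 26, 31],
    "Survived": [8, 13, 27, 30, 52, 79, 263],
}

# Use the same bins for both groups so the histograms can be compared
BIN_WIDTH, N_BINS = 50, 6
for group, values in fares.items():
    print(group)
    for i, count in enumerate(histogram_counts(values, BIN_WIDTH, N_BINS)):
        low = i * BIN_WIDTH
        print(f"  {low:>3}-{low + BIN_WIDTH - 1:<3} {'*' * count}")
```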
Histograms
The data are very skewed: most people spent very little on their tickets, but a few spent a lot. Use the median and interquartile range to summarise skewed data
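Why prefer the median for skewed data? This Python sketch adds one very expensive ticket to a small set of hypothetical costs (not the real data): the mean roughly triples, while the median moves only from 9 to 11.

```python
from statistics import mean, median

# Hypothetical ticket costs, before and after adding one very expensive ticket
typical = [7, 8, 8, 9, 13, 26, 31]
with_outlier = typical + [263]

print(f"{mean(typical):.2f} {median(typical)}")            # 14.57 9
print(f"{mean(with_outlier):.3f} {median(with_outlier)}")  # 45.625 11.0
```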