How to Analyze Data? Aravinda Guntupalli
SPSS windows process • Data window • Variable view window • Output window • Chart editor window
How to use different file types? • Excel file • csv file • SPSS file
Types of variables • You can select type of variable • String • Numeric • You can also select format of variable • Categorical • Ordinal • Interval
Why does it matter? • Statistical computations and analyses assume that the variables have specific levels of measurement. • Can you compute the average of hair color? • Does it make sense to compute the average of educational experience? • An average requires a variable to be interval.
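As a rough illustration of the point above, the sketch below picks a summary that suits each level of measurement. The sample values and variable names are hypothetical and are not taken from the slides.

```python
from statistics import mean, median, mode

# Hypothetical sample data, not from the slides.
hair_color = ["brown", "black", "brown", "blond", "brown"]  # categorical
education_level = [1, 2, 2, 3, 1]  # ordinal codes: 1 = low, 2 = middle, 3 = high
age_years = [23, 35, 41, 29, 52]   # interval

most_common_hair = mode(hair_color)         # a mode works for categories
middle_education = median(education_level)  # a median only needs an ordering
average_age = mean(age_years)               # a mean needs interval data

print(most_common_hair, middle_education, average_age)
```

Calling mean() on the hair colors would raise an error, which is the software's way of saying that an average of categories has no meaning.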
Stock and flow variables • In data analysis it is useful to distinguish between stock and flow variables. • Stock variables are measured at a point in time and flow variables are measured over a period of time. • Cross-section data make comparisons at a given point or period in time, while time-series data depict evolution over time.
Compute new variable • You can calculate different variables from the existing variables. • For this you need to know the way to compute your target variable from the existing variables. • You can perform operations like addition, subtraction, division and multiplication of variables to create a new variable.
Example • Total output of food grains (addition of rice, wheat, maize and other grain output) • Income difference between males and females (male income - female income) • Age squared variable (age*age) • GDP per capita (Total GDP/Population)
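The four computed variables listed above can be sketched outside SPSS as follows. This is a minimal illustration, assuming one hypothetical record; the field names and figures are made up for the example.

```python
# Hypothetical record; field names and figures are illustrative only.
rows = [
    {"rice": 120.0, "wheat": 90.0, "maize": 30.0, "other_grain": 10.0,
     "male_income": 2500.0, "female_income": 2100.0,
     "age": 40, "gdp": 5.0e9, "population": 2.0e6},
]

for row in rows:
    # Addition: total output of food grains.
    row["total_grain"] = (row["rice"] + row["wheat"]
                          + row["maize"] + row["other_grain"])
    # Subtraction: income difference between males and females.
    row["income_gap"] = row["male_income"] - row["female_income"]
    # Multiplication: age squared.
    row["age_sq"] = row["age"] * row["age"]
    # Division: GDP per capita.
    row["gdp_per_capita"] = row["gdp"] / row["population"]

print(rows[0]["total_grain"], rows[0]["income_gap"],
      rows[0]["age_sq"], rows[0]["gdp_per_capita"])
```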
Recode variable • Using SPSS you can recode a variable into the same variable. How? • We have data on mothers' years of education, from 0 to 22 years, and need to do the analysis using only 3 categories: mothers who did not complete high school, mothers who completed high school, and mothers who completed college. How will you do this?
How to perform this? • Go to the Transform pull-down menu, then to Recode, then to Recode into Same Variables (if you want to replace the existing information). • Select education and move it into the numeric variable list. • Define values by clicking Old and New Values. • Enter the range 0-11 as 1, 12-15 as 2 and 16-22 as 3.
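The same recode rule can be written as a small function. This is only a sketch of the mapping, not of what SPSS does internally; the function name and the sample values are hypothetical, and values outside 0-22 are returned as None (treated as missing) by assumption.

```python
def recode_education(years):
    """Map years of education onto the three categories from the slide."""
    if 0 <= years <= 11:
        return 1  # did not complete high school
    if 12 <= years <= 15:
        return 2  # completed high school
    if 16 <= years <= 22:
        return 3  # completed college
    return None   # outside the recorded range: treated as missing (assumption)

# Hypothetical years of education for six mothers.
education = [0, 8, 12, 15, 16, 22]
education_recoded = [recode_education(y) for y in education]
print(education_recoded)
```

Overwriting the original list with the recoded one corresponds to recoding into the same variable; keeping both corresponds to recoding into a different variable.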
How to make a new data set? • We will now create a data set of our own. • Cross-sectional • Panel • Time series • Types of variables • String • Numeric
Replace missing values • Missing observations can be problematic in analysis, and some time series measures cannot be computed if there are missing values in the series. • Replace Missing Values creates new time series variables from existing ones, replacing missing values with estimates computed with one of several methods.
Also… • Default new variable names are the first six characters of the existing variable used to create it, followed by an underscore and a sequential number. • For example, for the variable PRICE, the new variable name would be PRICE_1. The new variables retain any defined value labels from the original variables. • Optionally, you can enter variable names to override the default new variable names.
To Replace Missing Values for Time Series Variables • From the pull down menu choose: Transform and then Replace Missing Values • You can then select the estimation method you want to use to replace missing values. • Select the variable for which you want to replace missing values. • Also you can enter variable names to override the default new variable names.
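To make the idea of an estimation method concrete, here is a minimal sketch of two common ones, the series mean and linear interpolation between neighbouring valid points. The function names and the price series are hypothetical, missing values are represented by None, and the code is not a reproduction of the SPSS procedure.

```python
def fill_series_mean(series):
    """Replace each missing value with the mean of all valid values."""
    valid = [v for v in series if v is not None]
    mean_value = sum(valid) / len(valid)
    return [mean_value if v is None else v for v in series]

def fill_linear_interpolation(series):
    """Replace missing values on a straight line between valid neighbours.

    Gaps at the very start or end of the series are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled

# Hypothetical PRICE series with three missing observations.
price = [10.0, None, 14.0, None, None, 20.0]
price_1_mean = fill_series_mean(price)
price_1_linear = fill_linear_interpolation(price)
print(price_1_mean)
print(price_1_linear)
```

Both functions return a new list and leave the original untouched, mirroring the way a new variable such as PRICE_1 is created alongside PRICE.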
Boxplot • A boxplot consists of a box and 2 tails. • The horizontal line inside the box marks the position of the median, and the upper and lower boundaries of the box are the upper and lower quartiles. • The tails run to the most extreme values. • In sum, a boxplot shows the structure of the data along with its skewness and spread.
Drawing a boxplot • Question: We have recorded the heights in cm of boys in a class as shown below. We will draw a boxplot for this data. • Heights (cm): 137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186 • Lower quartile (QL) = 158 • Median (Q2) = 171 • Upper quartile (QU) = 180 • [Figure: boxplot drawn on an axis running from 130 to 190 cm]
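The five numbers behind this boxplot can be checked with a short script. This is a sketch using Python's statistics.quantiles with its default method; other software may use slightly different quartile rules, so results can differ on other data sets.

```python
from statistics import quantiles

# Heights in cm from the slide.
heights_cm = [137, 148, 155, 158, 165, 166, 166, 171,
              171, 173, 175, 180, 184, 186, 186]

# Cut points that split the sorted data into four equal parts.
lower_quartile, median_height, upper_quartile = quantiles(heights_cm, n=4)

# The tails of the boxplot run to the most extreme values.
lowest, highest = min(heights_cm), max(heights_cm)

print(lowest, lower_quartile, median_height, upper_quartile, highest)
```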
How to make a boxplot? • From the menus, choose: Graphs and Boxplot • Select the icon for Simple and select Summaries for groups of cases. • Select Define. • Select the variable for which you want boxplots, and move it into the Variable box. • Select a variable for the category axis and move it into the Category Axis box. This variable may be numeric, string, or long string.
Histogram • A histogram is a graphical representation of a frequency distribution for continuous data. • The height of each bar is proportional to the frequency of that class.
How to make a histogram? • From the menus, choose: Graphs and Histogram • Select a numeric variable for Variable in the Histogram dialog. • Select Display normal curve to display a normal curve on the histogram.
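What a histogram counts can be shown with a text-only sketch. It groups the heights from the boxplot example into classes; the class width of 10 cm is an assumption made for this illustration.

```python
from collections import Counter

# Heights in cm from the boxplot example.
heights_cm = [137, 148, 155, 158, 165, 166, 166, 171,
              171, 173, 175, 180, 184, 186, 186]

class_width = 10  # assumed class width in cm

# Map each height to the lower bound of its class, then count per class.
class_counts = Counter((h // class_width) * class_width for h in heights_cm)

# One row per class; the bar length equals the class frequency.
for lower in sorted(class_counts):
    upper = lower + class_width - 1
    print(f"{lower}-{upper} cm | {'#' * class_counts[lower]}")
```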
Scatter plot (1) • We can use scatter plots to explore the relationship between two quantitative variables. • A scatter diagram plots the value of one economic variable against the value of another variable. • It can reveal whether a relationship exists and what type of relationship it is. • For example, a scatter plot can describe the relation between reading and writing scores.
Typical Patterns • Negative linear relationship • Positive linear relationship • No relationship • Negative nonlinear relationship • Nonlinear (concave) relationship
How to make scatter plots? • From the menus, choose: Graphs and Scatter • Select the icon for Simple. • Select Define. • You must select a variable for the Y-axis and a variable for the X-axis. These variables must be numeric, but should not be in date format. • You can select a variable and move it into the Set Markers by box. This variable may be numeric or string.
Descriptive statistics • Descriptive statistics tell you how many valid cases you have, along with the mean and standard deviation. • You can learn about the distribution of a variable using this command in SPSS. • How to do this? • Analyze • Descriptive Statistics • Frequencies/Descriptives/Explore/Crosstabs • Select the variables • Use the Shift or Ctrl key to select multiple variables
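The three figures named above (valid cases, mean, standard deviation) can be sketched as follows. The scores are hypothetical, missing values are marked with None, and the sample standard deviation (dividing by n - 1) is used by assumption.

```python
from statistics import mean, stdev

# Hypothetical test scores; None marks a missing value.
scores = [52, 61, None, 47, 58, None, 62]

# Only valid (non-missing) cases enter the calculations.
valid_scores = [s for s in scores if s is not None]

n_valid = len(valid_scores)
n_missing = len(scores) - n_valid
mean_score = mean(valid_scores)
sd_score = stdev(valid_scores)  # sample standard deviation (n - 1)

print(f"valid N = {n_valid}, missing = {n_missing}, "
      f"mean = {mean_score:.2f}, SD = {sd_score:.2f}")
```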
What is Correlation? • Research question: What is the relation between two variables? • Correlation is a measure of the direction and degree of linear association between 2 variables
Interpreting Correlation • Strength and value of r • Very weak: 0 - .19 • Weak: .20 - .39 • Moderate: .40 - .59 • Strong: .60 - .79 • Very strong: .80 - 1.00
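As a sketch of how the correlation coefficient and the strength bands fit together, the code below computes Pearson's r by hand and attaches a label. The reading and writing scores are hypothetical, and applying the bands to the absolute value of r (so that negative correlations get a label too) is an assumption of this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strength_label(r):
    """Label the size of r using the bands from the slide."""
    size = abs(r)  # assumption: the sign gives direction, the size gives strength
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

# Hypothetical reading and writing scores for five pupils.
reading = [40, 50, 60, 70, 80]
writing = [45, 48, 62, 66, 79]

r = pearson_r(reading, writing)
print(f"r = {r:.2f} ({strength_label(r)})")
```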
Relation between hourly pay and age • R Square values indicate the proportion of variance in the dependent variable (y) accounted for by variation in the independent variable (x).
Regression coefficients • hourly pay = 1.336 + .231*age + error
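To show where an intercept, a slope and an R Square value come from, here is a minimal least-squares sketch for one predictor. The ages and pay figures are hypothetical, so the fitted numbers will not match the coefficients on the slide.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variance in y accounted for by x.
    r_square = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_square

# Hypothetical data: age in years and hourly pay.
age = [20, 30, 40, 50, 60]
hourly_pay = [6.0, 8.5, 10.0, 13.5, 15.0]

intercept, slope, r_square = fit_simple_regression(age, hourly_pay)
print(f"hourly pay = {intercept:.3f} + {slope:.3f}*age, "
      f"R Square = {r_square:.3f}")
```

The slope is read as the expected change in hourly pay for each additional year of age.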
When do we use Multivariate Regression Analysis? • To find the relationship between a dependent variable and two or more independent variables • y = b0 + b1*x1 + b2*x2 + e • hours worked (y) • education (x1) • income (x2)
Simultaneous regression • hourly pay (£)= -8.773 + .622*education + 0.201*age
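The fitted equation above can be used to predict hourly pay for a given person. The sketch below plugs in the coefficients from the slide; the function name, the example person, and the assumption that both education and age are measured in years are mine.

```python
def predict_hourly_pay(education, age):
    """Predicted hourly pay in pounds from the fitted equation on the slide.

    Assumes education and age are both measured in years.
    """
    return -8.773 + 0.622 * education + 0.201 * age

# Hypothetical person: 16 years of education, 40 years old.
predicted_pay = predict_hourly_pay(education=16, age=40)
print(f"predicted hourly pay: {predicted_pay:.2f}")

# Holding age fixed, one extra year of education adds the education coefficient.
education_effect = predict_hourly_pay(17, 40) - predict_hourly_pay(16, 40)
print(f"effect of one more year of education: {education_effect:.3f}")
```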
What if… we have a dichotomous dependent variable? • Use a dummy dependent variable regression model • Logistic regression model • Unlike simple linear regression and multiple regression, in logistic regression the dependent variable is dichotomous (i.e. coded 0 or 1) • In logistic regression more than one independent variable can be used
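A logistic regression model turns a linear combination of the independent variables into a probability between 0 and 1. The sketch below shows only that prediction step; the coefficients, the variable names and the outcome (being employed) are hypothetical and are not estimated from any data.

```python
from math import exp

def logistic(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def predict_probability(intercept, coefficients, values):
    """Probability that the dependent variable equals 1."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return logistic(z)

# Hypothetical coefficients for education (years) and age (years).
intercept = -4.0
coefficients = [0.25, 0.02]

# Hypothetical person: 12 years of education, 30 years old.
p_employed = predict_probability(intercept, coefficients, [12, 30])
predicted_class = 1 if p_employed >= 0.5 else 0  # assumed cut-off of 0.5
print(f"P(y = 1) = {p_employed:.3f}, predicted class = {predicted_class}")
```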