How to Analyze Data?

Aravinda Guntupalli

Presentation Transcript


  1. How to Analyze Data? Aravinda Guntupalli

  2. SPSS windows process • Data window • Variable view window • Output window • Chart editor window

  3. How to use different file types? • Excel file • csv file • SPSS file
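
Outside SPSS, the same csv file can be read with a few lines of code. The sketch below uses Python's standard `csv` module; the file contents and the column names `name` and `age` are made up for illustration and are not from the slides.

```python
import csv
import io

# Hypothetical file contents; in practice this text would come from
# open("data.csv", newline="") instead of an in-memory string.
csv_text = "name,age\nAsha,34\nBen,29\n"

# DictReader uses the first row as the variable names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value is read as a string, so numeric variables must be converted.
for row in rows:
    row["age"] = int(row["age"])

ages = [row["age"] for row in rows]
print(ages)
```

The conversion step mirrors the String versus Numeric choice in the SPSS Variable view.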

  4. Types of variables • You can select the type of a variable • String • Numeric • You can also select the measurement level of a variable • Categorical • Ordinal • Interval

  5. Why does it matter? • Statistical computations and analyses assume that the variables have specific levels of measurement • Can you compute the average of hair color? • Does it make sense to compute the average of educational experience? • An average requires a variable to be interval.
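
A small sketch of the point above, using Python's standard `statistics` module. The hair color and age values are hypothetical. For a categorical variable only a count-based summary such as the mode is meaningful, while a mean needs numeric values with meaningful distances between them.

```python
from statistics import mean, mode

# Hypothetical example data (not from the slides).
hair_color = ["brown", "black", "brown", "blond", "brown"]  # categorical
age_years = [21, 25, 30, 34, 40]                             # interval

# The mode only counts occurrences, so it works for categories.
most_common_hair = mode(hair_color)

# The mean adds and divides, so it needs numeric values.
average_age = mean(age_years)

# Asking for the mean of a string variable fails outright.
try:
    mean(hair_color)
    mean_of_strings_works = True
except TypeError:
    mean_of_strings_works = False

print(most_common_hair, average_age, mean_of_strings_works)
```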

  6. Stock and flow variables • In data analysis it is useful to distinguish between stock and flow variables. • Stock variables are measured at a point in time and flow variables are measured over a period of time. • Cross-section data make comparisons at a given point or in a given period of time, while time-series data depict evolution over time.

  7. Manipulate existing data

  8. Compute new variable • You can calculate different variables from the existing variables. • For this you need to know the way to compute your target variable from the existing variables. • You can perform operations like addition, subtraction, division and multiplication of variables to create a new variable.

  9. Example • Total output of food grains (addition of rice, wheat, maize and other grain output) • Income difference between males and females (male income – female income) • Age square variable (age*age) • GDP per capita (Total GDP/Population)
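
The four examples above can be sketched in Python as follows. The variable names and all numbers are hypothetical; only the formulas come from the slide.

```python
# One hypothetical case; the values are invented for illustration.
case = {
    "rice": 120.0, "wheat": 80.0, "maize": 30.0, "other_grain": 20.0,
    "male_income": 500.0, "female_income": 420.0,
    "age": 30,
    "gdp": 2_000_000.0, "population": 1_000,
}


def compute_new_variables(row):
    """Return a copy of the case with four computed variables added."""
    new = dict(row)
    # Addition: total output of food grains.
    new["food_grain_total"] = (
        row["rice"] + row["wheat"] + row["maize"] + row["other_grain"]
    )
    # Subtraction: income difference between males and females.
    new["income_gap"] = row["male_income"] - row["female_income"]
    # Multiplication: age squared.
    new["age_sq"] = row["age"] * row["age"]
    # Division: GDP per capita.
    new["gdp_per_capita"] = row["gdp"] / row["population"]
    return new


result = compute_new_variables(case)
print(result["food_grain_total"], result["income_gap"],
      result["age_sq"], result["gdp_per_capita"])
```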

  10. Recode variable • Using SPSS you can recode a variable into the same variable. How? • Suppose we have data on mothers' years of education, from 0 to 22 years, and you need to do the analysis using only 3 categories: mothers who did not complete high school, mothers who completed high school and mothers who completed college. How will you do this?

  11. How to perform this? • Go to the Transform pull-down menu, then to Recode, then to Recode into Same Variables (if you want to replace the existing information) • Select education and move it into the numeric variable list. • Define values by clicking Old and New Values. • Enter the range 0-11 as 1, 12-15 as 2 and 16-22 as 3
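
The same recoding rule can be written as a short function. This is a minimal sketch in Python; the function name and the choice to treat out-of-range values as missing (`None`) are assumptions, while the three ranges are the ones given above.

```python
def recode_education(years):
    """Map years of education (0-22) to the three categories from the slide."""
    if 0 <= years <= 11:
        return 1  # did not complete high school
    if 12 <= years <= 15:
        return 2  # completed high school
    if 16 <= years <= 22:
        return 3  # completed college
    return None   # assumption: anything outside 0-22 is treated as missing


# Hypothetical values covering the boundary of every range.
education_years = [0, 11, 12, 15, 16, 22]
education_category = [recode_education(y) for y in education_years]
print(education_category)
```

Unlike Recode into Same Variables, this keeps the original values in `education_years`, which makes it easy to check the result.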

  12. How to make a new data set? • We will now create a data set of our own. • Cross-sectional • Panel • Time series • Types of variables • String • Numeric

  13. Replace missing values • Missing observations can be problematic in analysis, and some time series measures cannot be computed if there are missing values in the series. • Replace Missing Values creates new time series variables from existing ones, replacing missing values with estimates computed with one of several methods.

  14. Also… • Default new variable names are the first six characters of the existing variable used to create it, followed by an underscore and a sequential number. • For example, for the variable PRICE, the new variable name would be PRICE_1. The new variables retain any defined value labels from the original variables. • Optionally, you can enter variable names to override the default new variable names.

  15. To Replace Missing Values for Time Series Variables • From the pull down menu choose: Transform and then Replace Missing Values • You can then select the estimation method you want to use to replace missing values. • Select the variable for which you want to replace missing values. • Also you can enter variable names to override the default new variable names.
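
To make the idea of an estimation method concrete, here is a minimal Python sketch of two simple approaches: filling every gap with the series mean, and filling gaps by linear interpolation between the neighbouring valid values. The price series is hypothetical, missing values are represented as `None`, and the functions are illustrations rather than a reproduction of what SPSS computes internally.

```python
from statistics import mean


def replace_with_series_mean(series):
    """Fill every missing value with the mean of the valid values."""
    fill = mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]


def replace_with_linear_interpolation(series):
    """Fill gaps on a straight line between the surrounding valid values."""
    result = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = series[left] + step * (i - left)
    return result  # gaps before the first or after the last value stay None


# Hypothetical time series with three missing observations.
price = [10.0, None, 14.0, 16.0, None, None, 22.0]

# New variables, named after the default pattern described in the slides.
price_1 = replace_with_series_mean(price)
price_2 = replace_with_linear_interpolation(price)
print(price_1)
print(price_2)
```

The original `price` list is left unchanged, in the same way that Replace Missing Values creates new variables instead of overwriting the existing ones.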

  16. Graphs

  17. Boxplot • A boxplot consists of a box and 2 tails. • The horizontal line inside the box marks the position of the median, and the upper and lower boundaries of the box are the upper and lower quartiles. • The tails run to the most extreme values. • In sum, a boxplot shows the structure of the data along with its skewness and spread.

  18. Drawing a boxplot • Question: We have recorded the heights in cm of boys in a class as shown below. We will draw a boxplot for this data. • Heights (cm): 137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186 • Lower Quartile (QL) = 158 • Median (Q2) = 171 • Upper Quartile (QU) = 180 • [Figure: boxplot drawn on an axis from 130 to 190 cm]
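
The quartiles of the 15 heights can be checked with Python's standard `statistics.quantiles` function. This is only a sketch: the default "exclusive" method used here happens to reproduce the values on the slide, but other software, and other quartile conventions, can give slightly different results for the same data.

```python
from statistics import quantiles

# Heights in cm of the 15 boys, as listed on the slide.
heights_cm = [137, 148, 155, 158, 165, 166, 166, 171,
              171, 173, 175, 180, 184, 186, 186]

# n=4 splits the data into quarters and returns the three cut points.
lower_quartile, median, upper_quartile = quantiles(heights_cm, n=4)

# The tails of the boxplot run to the most extreme values.
minimum, maximum = min(heights_cm), max(heights_cm)

print(minimum, lower_quartile, median, upper_quartile, maximum)
```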

  19. Boxplot

  20. How to make a boxplot? • From the menus, choose: Graphs and Boxplot • Select the icon for Simple and select Summaries for groups of cases. • Select Define. • Select the variable for which you want boxplots, and move it into the Variable box. • Select a variable for the category axis and move it into the Category Axis box. This variable may be numeric, string, or long string.

  21. Histogram • A histogram is a graphical representation of a frequency distribution for continuous data. • The height of each bar is proportional to the frequency of that class.

  22. Histogram (2)

  23. How to make a histogram? • From the menus, choose: Graphs and Histogram • Select a numeric variable for Variable in the Histogram dialog. • Select Display normal curve to display a normal curve on the histogram.
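
What a histogram counts can be sketched without any plotting library. The Python example below groups the heights from the boxplot slide into classes 10 cm wide and prints one row of stars per class; the class width and the text layout are choices made for this illustration.

```python
from collections import Counter

# Heights in cm from the boxplot example.
heights_cm = [137, 148, 155, 158, 165, 166, 166, 171,
              171, 173, 175, 180, 184, 186, 186]


def class_frequencies(values, width):
    """Count how many values fall in each class [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


frequencies = class_frequencies(heights_cm, width=10)

# The length of each bar is proportional to the frequency of its class.
for start, count in frequencies.items():
    print(f"{start}-{start + 9} cm | {'*' * count}")
```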

  24. Scatter plot (1) • We can use scatter plots to examine the relationship between two quantitative variables. • A scatter diagram plots the value of one economic variable against the value of another variable. • It can be used to reveal whether a relationship exists and the type of relationship that exists. • For example, a scatter plot can describe the relation between reading and writing scores.

  25. Scatter plot (2)

  26. Typical Patterns • Negative linear relationship • Positive linear relationship • No relationship • Negative nonlinear relationship • Nonlinear (concave) relationship

  27. How to make scatter plots? • From the menus, choose: Graphs and Scatter • Select the icon for Simple. • Select Define. • You must select a variable for the Y-axis and a variable for the X-axis. These variables must be numeric, but should not be in date format. • You can select a variable and move it into the Set Markers by box. This variable may be numeric or string.

  28. Descriptive statistics

  29. Descriptive statistics • It tells you how many valid cases you have in the data, along with the mean and standard deviation. • You can learn about the distribution of a variable using this command in SPSS. • How to do this? • Analyse • Descriptive statistics • Frequencies/Descriptives/Explore/Crosstabs • Select the variables • Using the Shift or Ctrl key you can select multiple variables
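
The three numbers mentioned above (valid cases, mean and standard deviation) can be sketched in Python as follows. The scores are hypothetical and missing values are represented as `None`; `statistics.stdev` computes the sample standard deviation, with n - 1 in the denominator.

```python
from statistics import mean, stdev

# Hypothetical test scores; None marks a missing observation.
scores = [52, 61, None, 70, 47, None, 65]

# Only valid (non-missing) cases enter the computations.
valid_scores = [s for s in scores if s is not None]

valid_n = len(valid_scores)
score_mean = mean(valid_scores)
score_sd = stdev(valid_scores)  # sample standard deviation

print(f"Valid N = {valid_n}, mean = {score_mean}, sd = {score_sd:.2f}")
```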

  30. Correlation and regression

  31. What is Correlation? • Research question: What is the relation between two variables? • Correlation is a measure of the direction and degree of linear association between 2 variables

  32. Interpreting Correlation • Strength and value of r • very weak: 0 - .19 • weak: .20 - .39 • moderate: .40 - .59 • strong: .60 - .79 • very strong: .80 - 1.00
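
As an illustration, the Pearson correlation coefficient can be computed by hand and then labelled with the bands above. This is a sketch in Python: the reading and writing scores are hypothetical, and applying the bands to the absolute value of r (so that negative correlations get the same labels) is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled to lie in [-1, 1]."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength_label(r):
    """Label the strength of r using the bands from the slide."""
    size = abs(r)  # assumption: the sign gives direction, the size gives strength
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


# Hypothetical reading and writing scores for five students.
reading = [1, 2, 3, 4, 5]
writing = [2, 1, 4, 3, 5]

r = pearson_r(reading, writing)
print(round(r, 2), strength_label(r))
```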

  33. Relation between hourly pay and age • R Square values indicate the proportion of variance in the dependent variable (y) accounted for by variation in the independent variable (x)

  34. Regression coefficients • hourly pay = 1.336 + .231*age + error
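
How an intercept, a slope and R Square are obtained can be sketched with ordinary least squares in plain Python. The ages and pay values below are hypothetical and are not the data behind the slide, so the fitted coefficients differ from 1.336 and .231; the last lines simply plug an age into the equation reported above.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus R Square."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R Square: share of the variance in y accounted for by x.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    r_squared = 1 - ss_residual / ss_total
    return intercept, slope, r_squared


# Hypothetical data: age in years and hourly pay.
age = [20, 30, 40, 50, 60]
hourly_pay = [6.0, 8.5, 10.0, 13.5, 15.0]

intercept, slope, r_squared = fit_simple_regression(age, hourly_pay)
print(round(intercept, 3), round(slope, 3), round(r_squared, 3))

# Prediction from the equation reported on the slide, for a 30-year-old.
slide_prediction = 1.336 + 0.231 * 30
print(round(slide_prediction, 3))
```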

  35. Multivariate Regression Analysis

  36. When do we use Multivariate Regression Analysis? • To find the relationship between more than two variables • y = b0 + b1*x1 + b2*x2 + e • hours worked (y) • education (x1) • income (x2)

  37. Simultaneous regression • hourly pay (£)= -8.773 + .622*education + 0.201*age
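
Reading the equation above as a prediction rule, a short Python sketch shows how the two coefficients are used. The coefficients are the ones on the slide; the example person (16 years of education, age 30) is hypothetical, and the slide does not state the units of education, which are assumed here to be years.

```python
def predicted_hourly_pay(education, age):
    """Predicted hourly pay (£) from the regression equation on the slide."""
    return -8.773 + 0.622 * education + 0.201 * age


# Hypothetical person: 16 years of education, 30 years old.
pay = predicted_hourly_pay(education=16, age=30)
print(round(pay, 3))

# Each coefficient is the change in predicted pay for one extra unit of
# that variable while the other variable is held constant.
extra_education = predicted_hourly_pay(17, 30) - predicted_hourly_pay(16, 30)
print(round(extra_education, 3))
```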

  38. What if… we have a dichotomous dependent variable? • Use a dummy dependent variable regression model • Logistic regression model • Unlike simple linear regression and multiple regression, in logistic regression the dependent variable is dichotomous (i.e. 0 or 1) • In logistic regression more than one independent variable can be used
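
The core of a logistic regression model is the logistic function, which turns a linear combination of the independent variables into a probability between 0 and 1. The Python sketch below shows only that step; the coefficients `b0` and `b1` and the outcome (being employed, predicted from years of education) are invented for illustration and do not come from the slides.

```python
from math import exp


def logistic(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + exp(-z))


def probability_employed(education, b0=-4.0, b1=0.25):
    """P(y = 1) for a hypothetical model with made-up coefficients."""
    return logistic(b0 + b1 * education)


# Predicted probabilities for a few hypothetical education levels.
probabilities = [probability_employed(years) for years in (8, 16, 22)]
print([round(p, 3) for p in probabilities])
```

Because the output is a probability, it stays inside the 0 to 1 range that a dichotomous dependent variable requires, which a straight regression line does not guarantee.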

  39. Thank You
