Statistics for Social Sciences I (E563). Prof. Sudip Ranjan Basu, Ph.D, 25 September 2008. Think about these bar diagrams… « A statistical tie ». Measurement in Statistics. Concepts of measurement: Measurement: a very specific process of assigning numbers to a variable
Statistics for Social Sciences I (E563) Prof. Sudip Ranjan Basu, Ph.D 25 September 2008
Think about these bar diagrams… « A statistical tie » Lecture 2-Sudip R. Basu
Measurement in Statistics
• Concepts of measurement:
  • Measurement: a very specific process of assigning numbers to a variable
    • Assignment by category (categorical/qualitative attributes)
    • Assignment by amount
    • Assignment of a person to a particular category of a variable
  • Validity:
    • Describes the objective and accurately reflects the concept
    • Measured by a particular scale or index
    • Face validity/Content validity/Criterion validity/Construct validity
  • Reliability:
    • Consistency of the data collected
    • Likelihood that the scale is actually measuring what it is supposed to measure
    • Free of measurement errors
    • Split-half reliability/test-retest reliability
Forms of ‘variable’
• Variables: concepts that vary, or change, from one observation to another in a sample or population
• The measurement scale differs from one variable to another
• Different statistical methods apply to quantitative and qualitative variables
Scales of measurement
• Qualitative variable: unordered/nominal scale
  • Primary mode of transportation (bus, tram, bicycle, walk)
• Qualitative variable: ordered/ordinal scale
  • Involves a rank order or other ordering
  • Political philosophy (liberal, moderate, conservative)
Quantitative aspects of ordinal data
• Interval scale:
  • Class interval: an interval that indicates the space between two end points
  • Quantitative
  • Values vary in magnitude
• Nominal scale:
  • Qualitative
  • Categories vary in quality, not in quantity
• Ordinal scale:
  • Falls between quantitative and qualitative
  • Categories vary in quality, but each level has a greater or smaller magnitude than another
  • Can be treated as a numerical scale by assigning numerical scores to categories
  • Closer to interval than to nominal
  • Sensitivity analysis (checking whether results depend on the scores chosen)
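The idea of assigning numerical scores to ordered categories can be sketched in STATA with value labels. The data below are made up for illustration, and the variable name philosophy, the label name philosophy_lbl, and the 1-3 coding are all assumptions, not part of the course dataset.

```stata
* Hypothetical data: political philosophy coded 1-3 (coding is an assumption).
clear
input byte philosophy
1
2
3
2
1
2
end

* Attach category names to the numeric scores.
label define philosophy_lbl 1 "Liberal" 2 "Moderate" 3 "Conservative"
label values philosophy philosophy_lbl

* The frequency table lists the categories in score order, with their labels.
tabulate philosophy
```

Because the scores are arbitrary, a different coding (for example 1, 2, 4) would keep the same ordering but change any statistic that relies on the distances between scores.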
Discrete and Continuous
• Discrete: the possible values form a set of separate numbers, such as 0, 1, 2, …
  • The unit of measurement cannot be subdivided
  • Number of siblings
  • Number of visits to a physician last year
• Continuous: an infinite continuum of possible real number values
  • Any real number is possible between two values
  • Height
  • Weight
• Categorical variables: nominal or ordinal
• Quantitative variables: discrete (number of siblings) or continuous (age)
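As a rough illustration of why the distinction matters in practice, the sketch below uses auto.dta, an example dataset that ships with STATA (it is not the course dataset). A variable with a few separate values is naturally tabulated, while a variable measured on a fine scale is better summarized.

```stata
sysuse auto, clear

* rep78 (repair record) takes only the separate values 1 to 5: tabulate it.
tabulate rep78

* weight (in pounds) takes many different values: summarize it instead.
summarize weight
```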
Summarize types of variables
Describing data
• Categorical data:
  • Frequency: headcounts or tallies indicating the number of cases in a particular category, or the total number of cases measured (the number of observations)
  • Scores: numbers that are used to represent amounts or rankings
  • Relative frequency: the proportion (number of observations in a category divided by the total number of observations) or percentage (proportion multiplied by 100) of the observations that fall in that category
    • The proportions sum to 1.00
  • Frequency distribution: a tabulation that lists possible values for a variable, together with the number of observations at each level
  • Relative frequency distribution: a listing of possible values together with their proportions or percentages
• Quantitative data:
  • Frequency distribution
    • Intervals of values in frequency distributions are usually of equal width
    • Intervals are mutually exclusive
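The definitions above can be checked with a short STATA sketch. It uses auto.dta, an example dataset shipped with STATA, as a stand-in for the course data; the choice of the variable foreign is only for illustration.

```stata
sysuse auto, clear

* Frequency distribution with percentages for each category of foreign.
tabulate foreign

* The same relative frequency by hand:
* observations in the category divided by the total number of observations.
count if foreign == 1
display "Proportion foreign: " r(N)/_N
```

Here count stores the number of matching observations in r(N), and _N holds the total number of observations in the dataset.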
Bar graphs
Comparing groups
• Compare the same variable across different groups
• Relative frequency distributions
• Histograms
• Stem-and-leaf plots
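One possible way to produce each of these comparisons in STATA is sketched below, again with the shipped example dataset auto.dta rather than the course data. The grouping variable foreign and the compared variables are assumptions chosen for illustration.

```stata
sysuse auto, clear

* Relative frequency distributions: column percentages within each group.
tabulate rep78 foreign, column nofreq

* Histograms of the same variable, one panel per group.
histogram mpg, by(foreign)

* Stem-and-leaf plots, one per group.
stem mpg if foreign == 0
stem mpg if foreign == 1
```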
Population and sample distribution
• The sample distribution is a ‘blurry’ picture of the population distribution
• As the sample size increases, the sample proportion in any interval gets closer to the true population proportion
• Sample distribution → population distribution
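A small simulation can make this concrete. The sketch below is an illustration under stated assumptions: the population proportion is fixed at 0.30, the seed and sample sizes are arbitrary, and the variable name success is hypothetical.

```stata
clear
set seed 2008
set obs 10000

* Each observation is a "success" with probability 0.30 (assumed population value).
generate byte success = runiform() < 0.30

* The mean of a 0/1 variable is the sample proportion.
summarize success in 1/50
summarize success in 1/1000
summarize success
```

The three means are the sample proportions for n = 50, n = 1000, and n = 10000; the larger samples should typically land closer to 0.30.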
Shape of a distribution
• Shapes of distributions differ
  • Symmetric
  • Skewed
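To look at the shape of a distribution in STATA, one option is a histogram together with the detailed summary. The sketch uses the shipped example dataset auto.dta; the variable price is chosen only because its distribution has a long right tail.

```stata
sysuse auto, clear

* Draw the distribution to inspect its shape.
histogram price

* The detailed summary reports skewness; a positive value indicates a longer right tail.
summarize price, detail
display "Skewness of price: " r(skewness)
```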
SESSION 2 of Lecture 2
Working with STATA
stata@stata.com
http://www.stata.com
Getting started with STATA
• Four windows open automatically when you click the STATA icon:
  • The most visible window is the Results Window, which shows the results of commands you have typed in the Command Window.
  • The Command Window is below the Results Window; all your commands are typed there.
  • The Review Window lists all commands that have been entered from the Command Window. When you click on a command in the Review Window, it is pasted into the Command Window.
  • The Variables Window lists all working variables in the file. When you click on a variable, it appears in the Command Window.
STATA window
Simple commands
• The data editor allows you to enter, view, or edit your working data file. Caution: this window must be closed in order to run commands in STATA.
• The do-file editor allows you to write, edit, and save STATA commands. STATA commands can be run from the do-file editor. These files are called do-files because they have the file extension .do.
• Note: STATA treats lines that begin with an asterisk * or text between a pair of /* and */ as comments.
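A minimal do-file showing both comment styles might look like the sketch below. The file name example.do is hypothetical, and auto.dta is an example dataset shipped with STATA, used here in place of the course data.

```stata
* example.do: a hypothetical do-file (the file name is an assumption)
* A line that begins with an asterisk is a comment.

/* Text between these two delimiters
   is also treated as a comment. */

sysuse auto, clear
describe
summarize mpg
```

Saved as example.do in the working directory, it could be run from the do-file editor or by typing do example.do in the Command Window.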
Save-Close files
• Open, save, or close a data file using the icons at the top of the screen, the File menu, or commands in the Command Window.
• A STATA dataset is saved in the .dta format.
• You can use a separate programme called Stat Transfer to translate a dataset from its current format into STATA format.
• Researchers prefer this programme for large datasets. It retains any variable or value labels from the original file.
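The command versions of open and save can be sketched as follows. The file name auto_copy is hypothetical, and auto.dta is the example dataset shipped with STATA; the file is written to the current working directory.

```stata
* Load an example dataset that ships with STATA.
sysuse auto, clear

* Save it in .dta format; replace overwrites an existing file of the same name.
save auto_copy, replace

* Open the saved file again; clear drops the data currently in memory.
use auto_copy, clear
describe
```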
Help-Search
• Memory: allocating more memory allows you to handle large datasets. For example, you can set a memory size of 20m with the following command in the Command Window:
    . set memory 20m
  (STATA 12 and later manage memory automatically, so this step applies to earlier versions only.)
• Help: the Help/Search facilities in STATA let you look up any command. Type help followed by the command name in the Command Window, or use the drop-down Help menu, which opens a separate window. For example, if you want help with describe, type:
    . help describe
  You can also type findit followed by a command name or keyword for more information.
• Search: if you do not know the STATA command name, use the Search facility from the drop-down Help menu.
• STATA uses a simple command syntax. Almost all commands follow the structure:
    . command variable (variable variable…) , options
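The general structure can be seen by building one command up step by step. This sketch uses summarize on the shipped example dataset auto.dta; the variables are chosen only for illustration.

```stata
sysuse auto, clear

* Command only: applies to all variables.
summarize

* Command followed by a variable list.
summarize mpg weight

* Command, variable list, then options after the comma.
summarize mpg weight, detail
```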
Creating a new dataset
• The easy way to create a dataset is to type the values for each variable in columns in the Data Editor; STATA automatically names the columns var1, var2, etc. Thus, var1 contains the names of students; var2 contains statistics competency; and so forth.
• Rename and label:
    . rename var1 students
    . label variable students "Students in Statistics, 2008-2009"
• After typing in the information, close the window and save the data, say as stat2.dta:
    . save stat2
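The same steps can also be carried out with commands instead of the Data Editor. The sketch below is an illustration only: the student names, the 0/1 values, the variable name stat, and the file name stat2_example are made up, not taken from the course dataset.

```stata
* Type the data in directly; STATA-style default names var1 and var2 are used on purpose.
clear
input str10 var1 byte var2
"Anna"  1
"Ben"   0
"Chloe" 1
end

* Give the variables meaningful names and a label.
rename var1 students
rename var2 stat
label variable students "Students in Statistics, 2008-2009"

* Save in .dta format under a hypothetical file name.
save stat2_example, replace
```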
Working with Sample
• Specifying subsets of the data: you can restrict a command to a subset of the data by adding an in or if qualifier. For example, to use only the 1st through 25th observations, type
    . list in 1/25
    . sort origin
    . list origin program in 1/25
• The if qualifier also has broad applications; it selects observations based on specific variable values, such as
    . summarize if stat==1
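Since the course dataset is not reproduced here, the sketch below shows the same qualifiers on auto.dta, an example dataset shipped with STATA. The variables and cut-off values are assumptions chosen for illustration.

```stata
sysuse auto, clear

* in selects observations by position: here the first five rows.
list make mpg in 1/5

* After sorting, the same positions hold different observations.
sort mpg
list make mpg in 1/5

* if selects observations by value; conditions can be combined with &.
summarize mpg if foreign == 1
count if mpg > 30 & foreign == 0
```

Note that == tests equality inside an if qualifier, while a single = is used for assignment.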
Describing data
• Frequency tables and two-way cross-tabulations: you can tabulate categorical variables. Use the dataset stat to tabulate the categorical variable programme:
    . tabulate programme
• To cross-tabulate programme by stat, type
    . tabulate programme stat
• To get column percentages, type
    . tabulate programme stat, column
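A parallel sketch on the shipped example dataset auto.dta, used here in place of the course data, shows the one-way table, the cross-tabulation, and the percentage options. The variables rep78 and foreign are chosen only for illustration.

```stata
sysuse auto, clear

* One-way frequency table; missing adds a row for missing values.
tabulate rep78
tabulate rep78, missing

* Two-way cross-tabulation of counts.
tabulate rep78 foreign

* Column percentages added to the counts.
tabulate rep78 foreign, column

* Row percentages only, with the counts suppressed.
tabulate rep78 foreign, row nofreq
```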
Data tabulation
• Multiple tables and multi-way cross-tabulations: to produce one-way tables for several variables at once, type
    . tab1 origin programme stat
    . tab1 programme-education
• To get multiple two-way tables, that is, cross-tabulations of every two-way combination of the listed variables, type
    . tab2 origin programme stat
• To produce a table when you do not need percentages or statistical tests, type
    . table programme, contents(freq)
• To produce a two-way frequency table or cross-tabulation, type
    . table origin programme, contents(freq)
• To produce a more complicated table, type
    . table origin programme, contents(freq) by(stat)
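The tab1 and tab2 commands can be tried on the shipped example dataset auto.dta, as in the sketch below; the variables are chosen only for illustration. The table commands are left out of the sketch because their syntax was revised in later STATA releases, so the contents() form shown on the slide may need adjusting depending on the installed version.

```stata
sysuse auto, clear

* One-way tables for each listed variable.
tab1 foreign rep78

* A hyphen selects a range of variables in dataset order (rep78, headroom, trunk).
tab1 rep78-trunk

* Every two-way combination of the listed variables (three tables here).
tab2 foreign rep78 headroom
```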
GRAPHS with STATA
• To draw bar charts, type:
    . graph bar stat, over(programme) blabel(bar) bar(1, bcolor(gs10))
    . graph bar stat, over(programme) legend(label(1 "Frequency")) ytitle("Native Language Speakers") title("Bar diagram of native language speakers, E563") subtitle("by languages") note("Source: Statistics Class 1, SRBasu")
    . graph bar stat word, over(programme) blabel(bar) bar(1, bcolor(gs10)) bar(2, bcolor(gs7))
• To draw horizontal bar charts, type:
    . graph hbar stat, over(programme) blabel(bar) bar(1, bcolor(gs10))
    . graph hbar stat word, over(programme) blabel(bar)
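One point worth checking when adapting these commands: graph bar plots the mean of the listed variable unless another statistic is requested. The sketch below, on the shipped example dataset auto.dta, contrasts the default with an explicit count; the titles and variables are assumptions for illustration, and the /// line continuation works in do-files only.

```stata
sysuse auto, clear

* Default statistic: the mean of mpg within each category of foreign.
graph bar mpg, over(foreign) blabel(bar)

* Frequencies: request the count of nonmissing observations explicitly.
graph bar (count) mpg, over(foreign) blabel(bar) ytitle("Number of cars")

* Horizontal bars with a title and a source note.
graph hbar (mean) mpg, over(rep78) ///
    title("Mean mileage by repair record") ///
    note("Source: auto.dta, shipped with STATA")
```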
Working with datasets
See Week 2 web-course material
• Assignment_1
Datasets:
2) Week2_Students Profile
3) Week2_World Socio-economic data
Note: Week 3, 2 October
• Descriptive Statistics
• Measures of Central Tendency and Dispersion, Moments, Skewness, and Kurtosis
• Readings:
  • AF-Chapter 3 (p. 39-60)
  • MS-Chapter 4, MS-Chapter 5
• Assignment: Assignment 2
  • Each student should turn in his/her own paper in hard copy to the teaching assistant at Rigot Office No. 31 or in class on Thursday 9 October (Week 4).