RMTD 404 Lecture 2
Summation Notation • We need a succinct way to talk about the processes that occur in a statistical analysis • We use summation notation, $\sum_{i=1}^{N} X_i$ • Σ stands for “sum” • X stands for the variable we sum • i, referred to as the subscripting index, identifies the individual values of X • N stands for the highest value we sum across (usually the number of cases). N could be replaced by a number, but we usually use a letter like N to indicate that we’re summing across all values of X (i.e., there are N values of the X variable).
Summation Notation • Examples • $\sum_{i=1}^{N} X_i$ is read as the sum of the values of X for i ranging from 1 (the first unit/person) to N (the last unit/person) • Say X is the vector {1,2,3,4,5} • Using the above summation notation we get 1+2+3+4+5 = 15
Summation Notation • We can be more specific, e.g. $\sum_{i=1}^{4} X_i$ • In this case we are only interested in summing the first 4 integers: 1+2+3+4 = 10 • What do you think about these ones?
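The two sums above can be checked with a minimal Python sketch (Python is used here only for illustration; the variable names are assumptions). Note that Python indexes from 0, whereas summation notation starts at i = 1.

```python
# X is the example vector from the slides.
X = [1, 2, 3, 4, 5]
N = len(X)

# Sum over i = 1..N (all cases); X[i - 1] converts the 1-based index i to Python's 0-based index.
total_all = sum(X[i - 1] for i in range(1, N + 1))

# Sum over i = 1..4 (only the first four cases).
total_first_four = sum(X[i - 1] for i in range(1, 5))

print(total_all)         # 15
print(total_first_four)  # 10
```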
Summation Notation • X = {11,9,8,15,3} • If i = 2, $X_i$ = 9 • If i = N, $X_i$ = 3 (the Nth case value; N = 5) • What do we think about these? • Pay attention to the parentheses: solve those first, then exponentiate
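The expressions shown on this slide are not reproduced in the text. As an assumed but typical pair, the sketch below contrasts the sum of squares, ΣX², with the squared sum, (ΣX)², to show why the parentheses matter. The helper `x` is hypothetical and only mimics the 1-based subscript.

```python
X = [11, 9, 8, 15, 3]
N = len(X)

def x(i):
    """Return X_i using the 1-based index of summation notation."""
    return X[i - 1]

# No parentheses around the sum: square each value first, then add.
sum_of_squares = sum(v ** 2 for v in X)

# Parentheses around the sum: add first, then square the total.
square_of_sum = sum(X) ** 2

print(x(2), x(N))                     # 9 3
print(sum_of_squares, square_of_sum)  # 500 2116
```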
Summation Notation • Some rules • Adding a constant • Multiplying by a constant • Multiplying matched pairs (two vectors) • Difference between two vectors
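The formulas for these four rules are not reproduced in the text. The standard identities that match the bullet labels, with c an arbitrary constant and X and Y two vectors of the same length N, are sketched below; they may differ in form from what the original slide displayed.

```latex
\begin{align*}
\text{Adding a constant:}        \quad & \sum_{i=1}^{N} (X_i + c) = \sum_{i=1}^{N} X_i + Nc \\
\text{Multiplying by a constant:}\quad & \sum_{i=1}^{N} c X_i = c \sum_{i=1}^{N} X_i \\
\text{Matched pairs:}            \quad & \sum_{i=1}^{N} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \dots + X_N Y_N \\
\text{Difference:}               \quad & \sum_{i=1}^{N} (X_i - Y_i) = \sum_{i=1}^{N} X_i - \sum_{i=1}^{N} Y_i
\end{align*}
```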
Summation Notation • Don’t let summation notation scare you • All we’re doing here is summing across a vector of rows (I) and a vector of columns (J)
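A double sum over rows and columns can be sketched as follows. The 2 x 3 table of values is hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical table with I = 2 rows and J = 3 columns.
X = [
    [1, 2, 3],
    [4, 5, 6],
]
I = len(X)
J = len(X[0])

# Inner sum: for each row i, add across the columns j.
row_totals = [sum(X[i][j] for j in range(J)) for i in range(I)]

# Outer sum: add the row totals to get the sum over every cell.
grand_total = sum(row_totals)

print(row_totals)   # [6, 15]
print(grand_total)  # 21
```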
Measures of Central Tendency • To get at the “location” of the distributions we use measures of central tendency • We look at location shifts
Measures of Central Tendency • Mean • Median • Mode • X = {5,3,2,9,3,4,9,8,2} • Using R…
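The R output for this slide is not reproduced in the text. As a stand-in, here is a minimal sketch of the same three measures using Python's standard `statistics` module (not the lecture's R code). For this X, three values are tied for most frequent, so the data have more than one mode.

```python
import statistics

X = [5, 3, 2, 9, 3, 4, 9, 8, 2]

mean_x = statistics.mean(X)      # sum of the values divided by N (45 / 9)
median_x = statistics.median(X)  # middle value of the sorted data

# multimode returns every value tied for the highest frequency.
modes_x = sorted(statistics.multimode(X))

print(mean_x, median_x, modes_x)  # 5 4 [2, 3, 9]
```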
Distributions: Modality • Compare the following two graphics • The left graph shows evidence of a bimodal distribution (two distinct peaks) • [Figure: two distributions annotated with mean, median, and mode]
Distributions: Shape • When talking about shape, we are talking about kurtosis: the concentration of the data in the center, shoulders, and tails • [Figure: leptokurtic, mesokurtic, and platykurtic curves, with the center, shoulders, and tails labeled]
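Distributions: Skewness • The left distribution is negatively skewed while the right is positively skewed • When skewness is present, the mean, median, and mode no longer coincide, so the “center” of the distribution isn’t as obvious • [Figure: two skewed distributions with the mode, median, and mean marked]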
Measures of Variability • Range – the difference between the two most extreme points (maximum minus minimum) • Interquartile Range – the difference between the 75th and 25th percentiles • Variance – the average squared deviation from the mean • Standard deviation – the square root of the variance, expressed in the original units of the variable
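A minimal sketch of these four measures, reusing the X vector from the central tendency slide. It assumes the sample versions of the variance and standard deviation (dividing by N − 1), which is what Python's `statistics.variance` and `statistics.stdev` compute. Quartile conventions differ across software, so the interquartile range may not match R or SPSS exactly.

```python
import statistics

X = [5, 3, 2, 9, 3, 4, 9, 8, 2]

# Range: maximum minus minimum.
value_range = max(X) - min(X)

# Interquartile range: 75th percentile minus 25th percentile.
# statistics.quantiles uses its default "exclusive" method here (an assumption;
# other methods give slightly different quartiles).
q1, q2, q3 = statistics.quantiles(X, n=4)
iqr = q3 - q1

# Sample variance: sum of squared deviations from the mean, divided by N - 1.
variance = statistics.variance(X)

# Standard deviation: square root of the variance.
sd = statistics.stdev(X)

print(value_range, iqr, variance, sd)
```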
Measures of Variability • Coefficient of Variation – an index that rescales the standard deviation by the mean (CV = s / X̄), which is useful for comparing the variability of two groups that are measured on the same scale but have very different means.
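To illustrate, the sketch below computes the coefficient of variation for two hypothetical groups that have the same standard deviation but very different means. The data and the function name are assumptions made for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical groups: identical spread (sd = 2), very different means.
group_a = [8, 10, 12]
group_b = [98, 100, 102]

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

# Relative to its mean, group_a is far more variable than group_b.
print(cv_a, cv_b)
```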
SPSS & R • Using the NELS student data we can get the following output for the base-year math scores • Using SPSS • Using R
    summary(bytxmstd)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      30.28   43.17   51.45   51.71   59.66   71.22   30.00
Transformation • There are some solutions to skewed distributions • Linear transformations • Adding a constant to each case in the dataset shifts the mean of the distribution by that value • We can similarly multiply or divide each case by some constant
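The effect of these linear transformations on the mean and standard deviation can be sketched as follows. The constant c = 10 is hypothetical, and X is reused from the central tendency slide.

```python
import statistics

X = [5, 3, 2, 9, 3, 4, 9, 8, 2]
c = 10  # hypothetical constant

shifted = [x + c for x in X]  # add the constant to every case
scaled = [x * c for x in X]   # multiply every case by the constant

# Adding c shifts the mean by c and leaves the standard deviation unchanged.
print(statistics.mean(X), statistics.mean(shifted))
print(statistics.stdev(X), statistics.stdev(shifted))

# Multiplying by c multiplies both the mean and the standard deviation by c.
print(statistics.mean(scaled), statistics.stdev(scaled))
```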
Transformation • Standardization is a very common method • Z-scores express raw scores in standard deviation units (the resulting scores have a mean of 0 and an sd of 1) • For example, if someone has a GRE score of 620, and the mean is 500, and sd is 100 then…
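Working through the GRE example with the usual z-score definition, z = (X − mean) / sd, gives the sketch below. The function name is an assumption.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

# GRE example from the slide: score 620, mean 500, sd 100.
z = z_score(620, 500, 100)
print(z)  # 1.2, i.e. 1.2 standard deviations above the mean
```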
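Transformation • You can use the following formula to transform scores to have a mean and standard deviation of your choosing • $X' = s_{x'}\left(\frac{X - \bar{X}}{s_x}\right) + \bar{X}'$ • X’ is the transformed score, sx’ is the desired sd, and Xbar’ is the desired mean; Xbar and sx are the mean and sd of the original scores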
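One way to sketch this rescaling is to standardize each score, then multiply by the desired sd and add the desired mean. The target values (mean 50, sd 10) and the function name are hypothetical, and the sample standard deviation (N − 1) is assumed.

```python
import statistics

def rescale(scores, new_mean, new_sd):
    """Linearly transform scores to have the requested mean and sd."""
    old_mean = statistics.mean(scores)
    old_sd = statistics.stdev(scores)
    # Standardize each score, then stretch to new_sd and shift to new_mean.
    return [new_sd * (x - old_mean) / old_sd + new_mean for x in scores]

X = [5, 3, 2, 9, 3, 4, 9, 8, 2]
rescaled = rescale(X, new_mean=50, new_sd=10)

print(statistics.mean(rescaled), statistics.stdev(rescaled))
```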
Some Important Properties • Sufficiency • The statistic uses all of the information in the sample – think of the mean, median, and mode… • Unbiasedness • The average of the statistic over all possible samples equals the parameter of interest – the expected value is equal to the parameter • Efficiency • Across a large number of samples, the statistic varies less than another (related) statistic • Resistance • The statistic is not heavily influenced by outliers
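Unbiasedness can be illustrated by enumerating every possible sample from a tiny population. The sketch below assumes a hypothetical population {1, 2, 3} and samples of size 2 drawn with replacement. It compares the variance estimator that divides by n − 1 with the one that divides by n, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
n = 2                   # sample size

def mean(values):
    return Fraction(sum(values), len(values))

def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

# The parameter: population variance (divide by the population size).
pop_variance = sum_sq_dev(population) / len(population)

# Every possible sample of size n, drawn with replacement (9 samples).
samples = list(product(population, repeat=n))

# Average of each estimator over all possible samples.
avg_unbiased = mean([sum_sq_dev(s) / (n - 1) for s in samples])
avg_biased = mean([sum_sq_dev(s) / n for s in samples])

print(pop_variance, avg_unbiased, avg_biased)  # 2/3 2/3 1/3
```

Dividing by n − 1 reproduces the population variance on average, while dividing by n underestimates it.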
Introduction to R • Basic commands • Creating variables • Graphics • Importing data
Introduction to SPSS • Descriptives • Transformations • Graphics