PSYC 6130C UNIVARIATE ANALYSIS Prof. James Elder
What is (are) statistics?
• A branch of mathematics concerned with understanding and summarizing collections of numbers
• A collection of numerical facts
• Estimates of population parameters, derived from samples
PSYC 6130, PROF. J. ELDER
What is this course about?
• Applied statistics
• Emphasizes methods, not proofs
• Descriptive statistics
• Inferential statistics
Fall Term
Winter Term
Some Background (Howell Ch. 1)
Variables and Constants
• Constants are properties that never change (e.g., the speed of light in a vacuum, approximately 3 × 10⁸ m/s).
• Most physiological and psychological parameters of interest vary considerably:
  • Between individuals (e.g., intelligence quotient)
  • Within individuals (e.g., heart rate)
• Any variable whose variation is somewhat unpredictable is called a random variable (rv).
Scales of measurement
• Nominal scale: values are categories, having no meaningful correspondence to numbers.
Scales of measurement
• Ordinal scale: ordering is meaningful, but exact numerical values (if they exist) are not.
Scales of measurement
• Interval scale: values are numerically meaningful, and the interval between two values is meaningful.
  • Example: Celsius temperature scale. It takes approximately the same amount of energy to raise the temperature of a gram of water from 20 °C to 21 °C as it does to raise it from 30 °C to 31 °C.
• Ratio scale: the ratio of two values is also meaningful.
  • Example: Kelvin temperature scale. Idealizing (constant heat capacity, no phase change), a gram of H₂O at 300 K has twice the thermal energy of a gram of H₂O at 150 K.
• Ratio scales require a 0-point corresponding to a complete lack of the quantity being measured.
  • Example: in the classical picture, a gram of H₂O at 0 K has no heat (particles are motionless).
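The interval/ratio distinction can be checked numerically. The sketch below is an illustration only: the two temperatures are hypothetical values, and `celsius_to_kelvin` is a helper name introduced here, not something from the lecture.

```python
# Illustrative temperatures (hypothetical values chosen for this sketch).
KELVIN_OFFSET = 273.15

def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a Celsius reading to kelvins."""
    return temp_c + KELVIN_OFFSET

cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Intervals agree on both scales: a 10-degree difference either way.
interval_c = warm_c - cold_c
interval_k = warm_k - cold_k

# Ratios do not agree: the Celsius ratio depends on an arbitrary zero point.
ratio_c = warm_c / cold_c   # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = warm_k / cold_k   # close to 1.035, the ratio on the true-zero scale

print(f"interval: {interval_c:.2f} °C vs {interval_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

Differences survive the change of scale, while ratios are only meaningful on the scale with a true zero.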
Continuous vs Discrete Variables
• A continuous variable may assume any real value within some range.
Continuous vs Discrete Variables
• A discrete variable may assume only a countable number of values: intermediate values are not meaningful.
Independent vs Dependent Variables
• Experiments involve independent and dependent variables.
  • The independent variable is controlled by the experimenter.
  • The dependent variable is measured.
  • We seek to detect and model effects of the independent variable on the dependent variable.
• Example: In a visual search task, subjects are asked to find the odd-man-out in a display of discrete items (e.g., a horizontal bar amongst vertical bars).
  • The number of items in the display is an independent variable.
  • Reaction time is the main dependent variable.
  • Typically, we observe a roughly linear relationship between the number of items and the reaction time.
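A roughly linear relationship of this kind is usually summarized by a slope (ms per item) and an intercept. The sketch below fits a least-squares line to hypothetical, noise-free data invented for illustration; the numbers and the `fit_line` helper are assumptions, not results from the lecture.

```python
from statistics import mean

# Hypothetical, noise-free data for illustration only (not from the lecture):
# reaction time grows by 25 ms per item from a 400 ms baseline.
set_sizes = [2, 4, 8, 16, 32]                      # independent variable
reaction_ms = [400 + 25 * n for n in set_sizes]    # dependent variable

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(set_sizes, reaction_ms)
print(f"search slope: {slope:.1f} ms/item, intercept: {intercept:.1f} ms")
```

With real data the points scatter around the line, and the fitted slope is the quantity of interest: the estimated cost in reaction time of each additional display item.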
Experimental vs Correlational Research
• Experimental study:
  • The researcher controls the independent variable.
  • The goal is to detect effects on the dependent variable.
  • Direction of causation may be inferred (but the effect may be indirect).
• Correlational study:
  • There are no independent or dependent variables.
  • No variables are under the control of the researcher.
  • The goal is to find statistical relationships (dependencies) between variables.
  • Direction of causation cannot normally be inferred.
Correlational Studies: Examples
Populations vs Samples
• In human science, we typically want to characterize and make inferences not about a particular person (e.g., Uncle Bob) but about all people, or all people with a certain property (e.g., all people suffering from a bipolar disorder).
• These groups of interest are called populations.
• Typically, these populations are too large and inaccessible to study.
• Instead, we study a subset of the group, called a sample.
• In order to make reliable inferences about the population, samples are ideally randomly selected.
• The population properties of interest are called parameters.
• The corresponding measurements made on our samples are called statistics. Statistics are approximations (estimates) of parameters.
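The parameter/statistic distinction can be illustrated with a small simulation. Everything below is hypothetical: the population is simulated (mean 100, SD 15), and the seed and sample size are arbitrary choices made for this sketch.

```python
import random
from statistics import mean

random.seed(6130)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated scores (mean 100, SD 15).
population = [random.gauss(100, 15) for _ in range(100_000)]
mu = mean(population)            # parameter (normally unknown)

# A simple random sample of 1,000 cases drawn without replacement.
sample = random.sample(population, 1_000)
x_bar = mean(sample)             # statistic (our estimate of mu)

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

The sample mean lands close to the population mean but does not equal it. That gap is the sampling error that inferential statistics is designed to quantify.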
Different Types of Populations and Samples
• Outside of human science, populations do not necessarily refer to humans.
  • For example, populations may be of bees, algae, quarks, stock prices, pork belly futures, ozone levels, etc.
• In clinical and social psychology you will often be conducting large-n studies on human populations.
• In cognitive psychology, you will often be doing small-n within-subject studies involving repeated trials on the same subject.
  • Here, you may think of the ‘population’ as being the infinite set of responses you would obtain were you able to continue the experiment indefinitely.
  • The sample is the set of responses you were able to collect in a finite number of trials (e.g., 5000) on the same subject.
Summation Notation
Some Summation Rules
Summary
• What is (are) statistics
• Variables and constants
• Scales of measurement
• Continuous and discrete variables
• Independent and dependent variables
• Experimental and correlational research
• Populations and samples
• Summation notation
Frequency Tables
1991 U.S. General Social Survey: Number of Brothers and Sisters
Bar Graphs and Histograms
Grouped Frequency Distributions
Statistics Canada 2001 Census: Age of Respondent
• What are the apparent limits?
• What are the real limits?
Percentiles and Percentile Ranks
• Percentile: the score at or below which a given percentage of scores lie.
• Percentile rank: the percentage of scores at or below a given score.
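The two definitions are inverses of each other, which a short sketch on raw (ungrouped) scores makes concrete. The quiz scores and both function names are hypothetical, introduced only for illustration.

```python
def percentile_rank(scores, x):
    """Percentage of scores at or below x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100.0 * at_or_below / len(scores)

def percentile(scores, p):
    """Smallest observed score whose percentile rank is at least p."""
    for s in sorted(scores):
        if percentile_rank(scores, s) >= p:
            return s
    raise ValueError("p must not exceed 100")

# Hypothetical quiz scores, for illustration only.
quiz = [2, 3, 3, 5, 6, 7, 7, 8, 9, 10]
print(percentile_rank(quiz, 7))   # 7 of the 10 scores are <= 7
print(percentile(quiz, 50))       # lowest score with at least 50% at or below
```

This version only returns observed scores. Textbooks and software often interpolate between scores instead, so results can differ slightly by convention.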
Linear Interpolation to Compute Percentile Ranks
Statistics Canada 2001 Census: Age of Respondent
Linear Interpolation to Compute Percentiles
Statistics Canada 2001 Census: Age of Respondent
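The census table and worked figures from these slides are not preserved in this transcript. As a sketch of the general technique, the code below interpolates linearly within a grouped frequency distribution, assuming scores are spread evenly between each interval's real limits. The intervals and frequencies are hypothetical, not the census data.

```python
# Hypothetical grouped distribution (NOT the census table from the slide).
# Each entry: (lower real limit, upper real limit, frequency).
intervals = [
    (9.5, 19.5, 10),
    (19.5, 29.5, 30),
    (29.5, 39.5, 40),
    (39.5, 49.5, 20),
]
total = sum(freq for _, _, freq in intervals)

def percentile_rank(score):
    """Percentile rank of a score, interpolating linearly within its interval."""
    below = 0
    for lower, upper, freq in intervals:
        if score >= upper:
            below += freq                      # whole interval lies below
        elif score > lower:
            below += freq * (score - lower) / (upper - lower)
    return 100.0 * below / total

def percentile(p):
    """Score at the p-th percentile, the inverse of percentile_rank."""
    target = total * p / 100.0                 # cases that must fall below
    below = 0
    for lower, upper, freq in intervals:
        if below + freq >= target:
            return lower + (upper - lower) * (target - below) / freq
        below += freq
    raise ValueError("p must not exceed 100")

print(percentile_rank(34.5))   # halfway through the 29.5-39.5 interval
print(percentile(50))          # the median of the grouped data
```

Both functions rest on the same assumption of evenly spread scores within an interval, so applying one and then the other returns the starting value.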
Measures of Central Tendency
• The mode: applies to ratio, interval, ordinal or nominal scales
• The median: applies to ratio, interval and ordinal scales
• The mean: applies to ratio and interval scales
The Mode
• Defined as the most frequent value (the peak)
• Applies to ratio, interval, ordinal and nominal scales
• Sensitive to sampling error (noise)
• Distributions may be referred to as unimodal, bimodal or multimodal, depending upon the number of peaks
The Median
• Defined as the 50th percentile
• Applies to ratio, interval and ordinal scales
• Can be used for open-ended distributions
The Mean
• Applies only to ratio or interval scales
• Sensitive to outliers
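The different sensitivity of the three measures to outliers can be seen with Python's standard `statistics` module. The scores below are hypothetical, chosen only to make the effect visible.

```python
from statistics import mean, median, multimode

# Hypothetical scores, for illustration only.
scores = [1, 2, 2, 3, 3, 3, 4, 5, 6]
print(multimode(scores), median(scores), mean(scores))

# Replace the largest score with an extreme outlier.
with_outlier = scores[:-1] + [60]
print(multimode(with_outlier), median(with_outlier), mean(with_outlier))
```

The mode and median are unchanged by the outlier, while the mean is pulled strongly toward it.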
Properties of the Mean
Properties of the Mean (cont'd)
Measures of Variability (Dispersion)
• Range: applies to ratio, interval, ordinal scales
• Semi-interquartile range: applies to ratio, interval, ordinal scales
• Variance (standard deviation): applies to ratio, interval scales
Range
• Interval between lowest and highest values
• Generally unreliable: changing one value (highest or lowest) can cause a large change in range.
• Example (from the slide's figure): Range = 79 drinks
Semi-Interquartile Range
• The interquartile range is the interval between the first and third quartile, i.e., between the 25th and 75th percentile.
• The semi-interquartile range is half the interquartile range.
• Can be used with open-ended distributions
• Unaffected by extreme scores
• Example (from the slide's figure): SIQ = 2.5 drinks
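The contrast between the range and the semi-interquartile range can be sketched with `statistics.quantiles`. The drink counts below are hypothetical (they are not the data set behind the slide's figure), and the "inclusive" quartile method is just one of several conventions in use.

```python
from statistics import quantiles

# Hypothetical weekly drink counts, for illustration only
# (not the data set behind the slide's figure).
drinks = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 80]

data_range = max(drinks) - min(drinks)

# Quartile conventions differ between textbooks and software; this uses the
# "inclusive" method of statistics.quantiles, which is one choice among several.
q1, q2, q3 = quantiles(drinks, n=4, method="inclusive")
semi_iqr = (q3 - q1) / 2

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, SIQ = {semi_iqr}")

# Dropping the single extreme score changes the range a lot, the SIQ far less.
trimmed = drinks[:-1]
t1, _, t3 = quantiles(trimmed, n=4, method="inclusive")
trimmed_range = max(trimmed) - min(trimmed)
trimmed_siq = (t3 - t1) / 2
print(f"range = {trimmed_range}, SIQ = {trimmed_siq}")
```

One extreme score accounts for almost all of the range, whereas the semi-interquartile range depends only on the middle half of the data.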
Population Variance and Standard Deviation
Sample Variance and Standard Deviation
Degrees of Freedom
Computational Formulas for Variance
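The formulas on these four slides are not preserved in this transcript. The sketch below uses the standard textbook definitions, which are assumed here rather than taken from the slides: divide the sum of squared deviations by N for a population and by N − 1 (the degrees of freedom) for a sample. The scores are hypothetical.

```python
import math
from statistics import pvariance, variance

# Hypothetical scores, for illustration only.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
x_bar = sum(x) / n
ss = sum((xi - x_bar) ** 2 for xi in x)       # sum of squared deviations

pop_var = ss / n              # divide by N when x is the whole population
samp_var = ss / (n - 1)       # divide by N - 1 (degrees of freedom) for a sample

# Computational form: SS = sum(x^2) - (sum(x))^2 / N, no deviations needed.
ss_computational = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

print(pop_var, math.sqrt(pop_var))        # population variance and SD
print(samp_var, math.sqrt(samp_var))      # sample variance and SD
print(pvariance(x), variance(x))          # library cross-check
```

The computational form gives the same sum of squares as the definitional form and is easier to apply by hand, since it needs only the sum of the scores and the sum of their squares.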
Properties of the Standard Deviation
Properties of the Standard Deviation (cont'd)
Standard Deviation Example (cf. SIQ = 2.5 drinks, range = 79 drinks)
Skew
• The mean and median are identical for symmetric distributions.
• Skew tends to push the mean away from the median, toward the tail (but not always).
• Example (from the slide's figure): Median = 3, Mean = 6.7
Skewness
• Properties of skewness
  • Positive for positive skew (tail to the right)
  • Negative for negative skew (tail to the left)
  • Dimensionless
  • Invariant to shifting or scaling data (adding a constant or multiplying by a positive constant)
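The skewness formula from this slide is not preserved in the transcript. The sketch below assumes one common definition (the third central moment divided by the cube of the population standard deviation) and checks the listed properties on hypothetical data.

```python
def skewness(data):
    """Third central moment divided by the cubed (population) standard deviation."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data, for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 12]

base = skewness(data)
shifted = skewness([x + 100 for x in data])     # add a constant
scaled = skewness([2.5 * x for x in data])      # multiply by a positive constant
flipped = skewness([-x for x in data])          # multiply by -1

print(base, shifted, scaled, flipped)
```

Shifting and positive scaling leave the value unchanged up to rounding. Multiplying by a negative constant mirrors the distribution and reverses the sign.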
Dealing with Outliers
• Trimming:
  • Throw out the top and bottom k% of values (e.g., k = 5).
  • May be justified if there is evidence of a confounding process interfering with the dependent variable being studied.
  • Example: participant blinks during presentation of a visual stimulus.
  • Example: participant misunderstands a question on a questionnaire.
• Transforming:
  • Scores are transformed by some function (e.g., log, square root).
  • Often done to reduce or eliminate skewness.
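Trimming can be sketched in a few lines. The reaction times and the `trim` helper below are hypothetical, and rounding the cut down to a whole number of cases is one possible convention, assumed here.

```python
def trim(values, k_percent):
    """Drop the lowest and highest k percent of values (rounded down to whole cases)."""
    ordered = sorted(values)
    cut = int(len(ordered) * k_percent / 100)
    return ordered[cut:len(ordered) - cut]

# Hypothetical reaction times in ms; 2950 might reflect a blink or lapse.
rts = [212, 430, 445, 450, 452, 460, 461, 465, 470, 472,
       475, 480, 481, 485, 490, 495, 500, 510, 515, 2950]

kept = trim(rts, 5)      # 5% of 20 cases = 1 case from each end
print(len(kept), min(kept), max(kept))
```

Trimming is symmetric, so it removes the lowest score along with the suspect high one, whether or not the low score is itself an artifact.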
Log-Transforming Data
[Figure: two histogram panels, labelled skewness = 0.08 and skewness = 0.67]
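The effect of a log transform on skew can be sketched with hypothetical data. The scores below are powers of two, chosen so that their logarithms are evenly spaced, and the `skewness` helper assumes the moment-based definition (third central moment over the cubed population standard deviation). Neither comes from the slide.

```python
import math

def skewness(data):
    """Third central moment divided by the cubed (population) standard deviation."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed scores (powers of two), for illustration only.
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"raw skewness = {raw_skew:.2f}")
print(f"log skewness = {log_skew:.2f}")
```

The raw scores have a long right tail and clearly positive skewness. Their logarithms are symmetric about the middle value, so the skewness is essentially zero. Real data rarely transform this cleanly, and a log transform requires strictly positive scores.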
End of Lecture 1 Sept 10, 2008
Kurtosis
• kurtosis > 0: leptokurtic (Laplacian)
• kurtosis = 0: mesokurtic (Gaussian)
• kurtosis < 0: platykurtic
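The kurtosis formula itself is not preserved in this transcript. Since the slide assigns kurtosis = 0 to the Gaussian, the sketch below assumes the "excess" form: the fourth central moment divided by the squared variance, minus 3. The two data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Fourth central moment over squared variance, minus 3 (the Gaussian value)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

# Hypothetical data sets, for illustration only.
flat = [1, 2, 3, 4, 5]                            # evenly spread, light tails
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]        # sharp peak, heavy tails

print(excess_kurtosis(flat))     # negative: platykurtic
print(excess_kurtosis(peaked))   # positive: leptokurtic
```

In very small samples the estimate is unstable, so the sign is more informative here than the exact value.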