Statistics Loyola Law School – Spring 2008

Doug Stenstrom
email: stenstro@usc.edu
phone: (213)422-0909
http://www.psychwiki.com/wiki/Statistics_Spring_2008

Purpose of the Course
After completing this course, you will be able to:
• Theory: Be well-versed in statistical concepts and theorizing; understand the methodology behind statistical reasoning and problem solving
• Application: Analyze data with all the major statistical techniques, using the most popular software
• Evaluation: Critically examine and understand statistical claims; learn to interpret “Results” sections, statistical criteria, news articles, etc.
• Also… Write a “Results” section; know the vocabulary and terminology of statistics

Today’s objectives
• Overview of the course
• Help you map out your semester (e.g., time, tasks, etc.)
• Discuss the underlying THEORY behind statistics
• Answer questions you may have about the statistical process

Why statistical reasoning is important
• Information bombardment! Is there data behind the claim?
• If you cannot distinguish good reasoning from faulty reasoning, you are vulnerable to manipulation
• Applicable to all areas of life, such as health, business, sports, politics, personal relationships, etc.
• Almost 85% of lung cancers in men and 45% in women are tobacco-related.
• People tend to be more persuasive when they look others directly in the eye and speak loudly and quickly.
• A surprising new study shows that eating egg whites can increase one's lifespan.
• News today – “Justices Hear Arguments in Lethal Injection Case”
• http://www.cnn.com/ELECTION/2008/primaries/results/epolls/index.html#IADEM

Overview of the Research Process

Most people find statistics difficult to learn because of the way it is taught
• Old way = Bottom-up; this class = Top-down
• Old way = Disconnect between class and the real world; this class = Learn both theory and application
• Old way = Overwhelming complexity; this class = Read the textbook AFTER class is over

Syllabus
• I would suggest getting the textbook for three reasons…
• How to read the textbook…
• What will happen in class each day…
• What will be covered this semester…

Motivation!
• Statistics can be boring
• I would suggest having your own data…
• The class is structured so that you don’t need your own data
• The class is also structured so that we can analyze your data during class

THEORY (what is statistics?)
10 statements for understanding the theorizing behind statistics

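(1) In every study there is error.
• Error is the difference between your study (sample) and the true value (population)
• Measurement error: your measures do not capture the true scores exactly
• Sampling error: your sample does not perfectly represent the population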
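To make sampling error concrete, here is a small Python simulation (a sketch, not part of the course materials). It invents a hypothetical population of 10,000 happiness scores on a 1–7 scale, repeatedly draws random samples, and measures how far each sample mean lands from the true population mean. It illustrates sampling error only; measurement error is not modeled.

```python
import random

# Hypothetical population of 10,000 happiness scores on a 1-7 scale (invented for illustration).
rng = random.Random(42)
population = [rng.randint(1, 7) for _ in range(10_000)]
true_mean = sum(population) / len(population)  # the "true value" we normally never get to see


def average_sampling_error(sample_size: int, trials: int = 500) -> float:
    """Average absolute gap between a random sample's mean and the population mean."""
    total_gap = 0.0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        total_gap += abs(sum(sample) / sample_size - true_mean)
    return total_gap / trials


for n in (10, 100, 1000):
    print(f"n = {n:4d}: average sampling error = {average_sampling_error(n):.3f}")
```

Larger samples should show a smaller average gap, but the gap never disappears entirely.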

(2) Since there is error, there is always some doubt
• Imagine you were conducting a study about whether males or females are happier…
• Given the error, you may find a difference (sample) when one does not exist (population), called a Type I error;
• Or, you may find no difference (sample) when in fact one does exist (population), called a Type II error.
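Both mistakes can be simulated. This sketch assumes hypothetical happiness scores that are normally distributed (mean 5, SD 1) with 50 people per group, and it uses 1.96 as an approximate cutoff for |t| (a large-sample simplification). With no real difference, some studies still “find” one; with a small real difference (0.2 points), many studies miss it.

```python
import math
import random

rng = random.Random(7)


def mean_and_variance(xs):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def welch_t(a, b):
    """Difference in group means divided by its estimated standard error."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    return (mean_b - mean_a) / math.sqrt(var_a / len(a) + var_b / len(b))


def share_finding_difference(true_diff, n_per_group=50, trials=2000, cutoff=1.96):
    """Fraction of simulated studies whose |t| exceeds the cutoff, i.e. that 'find' a difference."""
    found = 0
    for _ in range(trials):
        males = [rng.gauss(5.0, 1.0) for _ in range(n_per_group)]
        females = [rng.gauss(5.0 + true_diff, 1.0) for _ in range(n_per_group)]
        if abs(welch_t(males, females)) > cutoff:
            found += 1
    return found / trials


# No real difference in the population: any difference "found" is an error.
print(f"false difference found: {share_finding_difference(0.0):.1%}")
# A small real difference (0.2 points): every study that misses it is also an error.
print(f"real difference missed: {1 - share_finding_difference(0.2):.1%}")
```

The first rate should land near 5%; the second is much larger because a 0.2-point difference is small relative to the noise with only 50 people per group.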

(3) Since there is some doubt, you are dealing with PROBABILITIES of being right/wrong
• For any outcome you obtain (sample), you need to determine how likely it is to reflect the truth (population), expressed as a level of confidence such as 90%, 80%, or 55%.
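One way to see what “90% confident” means is to simulate it. The sketch below assumes a hypothetical normal population whose mean (5.0) and SD (1.0) we know only because we made them up; it builds a confidence interval around each sample mean and counts how often the interval actually contains the true mean. Treating the SD as known (a z-interval) is a simplification.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2008)
TRUE_MEAN, SIGMA, N = 5.0, 1.0, 40  # hypothetical population values, known only because we simulate


def interval_contains_truth(confidence):
    """Draw one sample and check whether its confidence interval for the mean contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    half_width = z * SIGMA / math.sqrt(N)           # assumes SIGMA is known (a z-interval)
    sample_mean = sum(sample) / N
    return sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width


def coverage(confidence, trials=2000):
    """Share of repeated samples whose interval captured the true mean."""
    return sum(interval_contains_truth(confidence) for _ in range(trials)) / trials


for level in (0.90, 0.80, 0.55):
    print(f"{level:.0%} intervals contained the true mean {coverage(level):.1%} of the time")
```

Each confidence level should be matched closely by the share of intervals that were “right.”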

(4) You calculate probabilities using a “probability distribution”
• A probability distribution describes how likely each possible value (event) is
• Given any probability distribution, you can calculate the probability of any given score (or range of scores) occurring.
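A familiar probability distribution is the sum of two fair six-sided dice. This Python sketch (dice chosen purely as an illustration) builds the full distribution by counting all 36 equally likely outcomes, then reads off the probability of a single score and of a range of scores.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count every way two fair dice can land, then turn counts into probabilities.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
distribution = {total: Fraction(c, 36) for total, c in sorted(counts.items())}


def prob(predicate):
    """Probability that the sum satisfies the given condition."""
    return sum(p for total, p in distribution.items() if predicate(total))


print(distribution[7])          # 1/6: probability of rolling exactly 7
print(prob(lambda t: t >= 10))  # 1/6: probability of rolling 10, 11, or 12
```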

(5) But how do we know the “probability distribution” of the entire population? • We DON’T!

(6) Instead, we assume it approaches a “normal” distribution.
• A normal distribution is a symmetric, bell-shaped curve defined by two things: the mean (average) and the variance (variability).
• The idea behind statistics is that as sample size increases, the distribution of sample statistics (such as the mean) approaches normal.
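Because a normal distribution is fully described by its mean and its spread, Python's standard-library `statistics.NormalDist` can compute probabilities from just those two numbers. The mean of 4.5 and standard deviation of 1.0 below are hypothetical; note that `NormalDist` takes the standard deviation (the square root of the variance), not the variance itself.

```python
from statistics import NormalDist

# Hypothetical happiness scores: mean 4.5, standard deviation 1.0 (so variance 1.0).
scores = NormalDist(mu=4.5, sigma=1.0)

p_below_6 = scores.cdf(6.0)                       # P(score <= 6.0)
p_within_1sd = scores.cdf(5.5) - scores.cdf(3.5)  # P(3.5 <= score <= 5.5)

print(f"{p_below_6:.3f}")     # 0.933
print(f"{p_within_1sd:.3f}")  # 0.683
```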

(6) Instead, we assume it approaches a “normal” distribution (continued)
• In particular, the sampling distribution of the sample mean approximates normal as sample size grows, even if the population from which the sample is taken is not normal (the central limit theorem).
• http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
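The kind of simulation at the Rice link can be approximated in a few lines of Python. This sketch assumes a skewed (exponential) population with mean 1 and SD 1, draws 5,000 samples of 30, and checks whether the sample means behave like a normal distribution centered on 1 with spread 1/√30 ≈ 0.183.

```python
import math
import random
import statistics

rng = random.Random(1)
N = 30  # size of each sample


def one_sample_mean(n=N):
    """Mean of one sample from a right-skewed exponential population (mean 1, SD 1)."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    return sum(sample) / n


means = [one_sample_mean() for _ in range(5000)]
standard_error = 1.0 / math.sqrt(N)  # theory: population SD / sqrt(n), about 0.183

print(f"mean of sample means: {statistics.fmean(means):.3f}")  # close to the population mean 1.0
print(f"SD of sample means:   {statistics.stdev(means):.3f}")  # close to 0.183
# Under a normal curve, about 95% of values fall within 1.96 SDs of the center.
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in means) / len(means)
print(f"share within 1.96 SE: {within:.1%}")
```

Even though no individual score is normal here, the sample means cluster around 1.0 in roughly the 95% pattern a normal curve predicts.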

(7) Statisticians have already created distributions for all possible situations
• For every common type of statistical test (e.g., correlation, regression, t-test, ANOVA, etc.) and every type of situation (e.g., sample size, group size, degrees of freedom, etc.), statisticians have worked out probability distributions (such as the t, F, and chi-square distributions) to fit that situation.
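As one example of a distribution that changes with the situation, the t-distribution's two-tailed 5% cutoff depends on the degrees of freedom. Python's standard library has no t-distribution, so this sketch computes it from the textbook density using numerical integration (Simpson's rule) and bisection; in practice you would read a t table or use statistical software.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus the symmetry of the distribution."""
    h = abs(x) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(df, alpha=0.05):
    """Two-tailed cutoff: the t value beyond which alpha/2 of the area lies in each tail."""
    lo, hi = 0.0, 100.0
    for _ in range(40):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 10, 30, 98):
    print(f"df = {df:3d}: reject if |t| > {t_critical(df):.3f}")
```

Small samples need more extreme t values, and as df grows the cutoff approaches the normal-curve value of about 1.96.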

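(8) And they have determined “Rejection Regions” based upon how much doubt you are willing to live with
• Given the doubt involved in research, how much doubt are you willing to live with?
• The conventional answer is 5% (α = .05): the rejection regions are the extreme tail(s) of the distribution that together hold that 5%.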
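The doubt you are willing to accept (alpha) sets where the rejection regions begin. This sketch uses the standard normal distribution as the reference, an assumption that fits large samples; for a two-tailed test, alpha is split evenly between the two tails.

```python
from statistics import NormalDist


def two_tailed_cutoff(alpha):
    """|z| beyond which a result lands in the rejection region (alpha split across both tails)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: reject if |z| > {two_tailed_cutoff(alpha):.3f}")
```

It prints cutoffs of 1.645, 1.960, and 2.576: the less doubt you allow, the more extreme a result must be to count.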

(9) “Statistics” is about comparing your Sample (Test Statistic) to the Population (Distribution and Rejection Regions)
Three-step process:
• Step 1 – Calculate the Test Statistic (your sample)
• Step 2 – Obtain the Sampling Distribution and Rejection Regions (population)
• Step 3 – Compare your sample to the population.

EXAMPLE
• Step 1 – Calculate the Test Statistic (your sample)
• For every type of statistical test (e.g., correlation, regression, t-test, ANOVA, etc.), statisticians have created a formula that produces a test statistic: a measure of how large the effect in your sample is relative to the error expected by chance.
• In our “happiness” study, the mean for males = 4.2 and for females = 4.8. From our study we know each group's mean, standard deviation, and size. Plug those numbers into the t-test formula. Let's say the formula tells us that t = 3.2.
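Step 1 for the happiness example can be sketched with the pooled two-sample t-test formula. The slide gives only the means (4.2 and 4.8), so the standard deviations (0.9) and group sizes (50 per group) here are hypothetical, and the result (t ≈ 3.33) differs from the slide's illustrative t = 3.2.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error, n1 + n2 - 2


# Means from the slide; the SDs (0.9) and group sizes (50) are hypothetical.
t, df = pooled_t(mean1=4.2, sd1=0.9, n1=50, mean2=4.8, sd2=0.9, n2=50)
print(f"t = {t:.2f}, df = {df}")  # t = 3.33, df = 98
```

The statistic is the difference in means (signal) divided by its standard error (noise), so larger samples or less variable scores push t further from zero.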

EXAMPLE
• Step 2 – Obtain the Distribution and Rejection Regions
• (As indicated by statements #7 and #8) For every type of statistical test (e.g., correlation, regression, t-test, ANOVA, etc.) and every type of situation (e.g., sample size, group size, degrees of freedom, etc.), statisticians have created probability distributions and rejection regions to fit that situation.
• In our “happiness” study, the relevant distribution is the t-distribution for our degrees of freedom, with rejection regions in its tails.

EXAMPLE
• Step 3 – Compare your sample to the population.
• If your “Test Statistic” falls in the rejection region, a result that extreme would occur less than 5% of the time by chance alone (if there were no real difference), so we consider the outcome of the study significant
• In our example, 3.2 is in the rejection region
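The three steps can be put together in a short sketch. It takes the slide's t = 3.2 and, because the example's sample sizes are not given, approximates the t-distribution with the standard normal (reasonable only for large samples). It checks whether 3.2 falls in the two-tailed 5% rejection region and also reports the equivalent p-value.

```python
from statistics import NormalDist

t_statistic = 3.2  # Step 1: the test statistic from the slide's example
alpha = 0.05

# Step 2: rejection regions, using a normal approximation to the t-distribution (large samples).
standard_normal = NormalDist()
cutoff = standard_normal.inv_cdf(1 - alpha / 2)  # about 1.96

# Step 3: compare the sample to the distribution.
in_rejection_region = abs(t_statistic) > cutoff
p_value = 2 * (1 - standard_normal.cdf(abs(t_statistic)))  # two-tailed
print(in_rejection_region, round(p_value, 4))  # True 0.0014
```

The two checks always agree: a statistic lands in the rejection region exactly when its p-value is below alpha.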

(10) Thus, the theorizing behind conducting statistics is: anytime you conduct a statistical test, you calculate the probability of getting an outcome at least this extreme if there were no real effect in the population (the chance that the outcome is just error), and if that probability is 5% or less, you accept the outcome as real (called “significant”).

10 statements
(1) In every study there is error.
(2) Since there is error, there is always some doubt.
(3) Since there is doubt, you are dealing with probabilities of being right/wrong.
(4) You calculate probabilities using a “probability distribution”.
(5) But how do we know the “probability distribution” of the entire population?
(6) Instead, we assume it approaches a “normal” distribution.
(7) Statisticians have already created distributions for all possible situations.
(8) And they have determined “Rejection Regions” based upon 5% doubt.
(9) “Statistics” is about comparing your Sample (Test Statistic) to the Population (Distribution and Rejection Regions).
(10) Thus, the theorizing behind conducting statistics is: anytime you conduct a statistical test, you calculate the probability of getting an outcome at least this extreme if there were no real effect in the population (the chance that the outcome is just error), and if that probability is 5% or less, you accept the outcome as real (called “significant”).
