MGMT 276: Statistical Inference in Management
Spring, 2014
Welcome
Green sheets
Please click in
My last name starts with a letter somewhere between:
A. A – D
B. E – L
C. M – R
D. S – Z
Schedule of readings
Before next exam: February 18th
Please read Chapters 1 - 4 and Appendices D and E in Lind
Please read Chapters 1, 5, 6 and 13 in Plous
Chapter 1: Selective Perception
Chapter 5: Plasticity
Chapter 6: Effects of Question Wording and Framing
Chapter 13: Anchoring and Adjustment
Use this as your study guide
By the end of lecture today (2/6/14):
Correlational methodology
Strength of correlation versus direction
Positive vs. negative correlation
Strong vs. moderate vs. weak correlation
Characteristics of a distribution
Remember to hold onto homework until we have a chance to cover it
Homework due February 13th
On class website: please print and complete homework worksheet #5
Review of Homework Worksheet
Notice Gillian asked 1300 people: 130 + 104 + 325 + 455 + 286 = 1300

Count   Proportion   Percent   Out of 1,000,000
130     .10          10        100,000
104     .08          8         80,000
325     .25          25        250,000
455     .35          35        350,000
286     .22          22        220,000

Worked example for the first row: 130/1300 = .10; .10 x 100 = 10; .10 x 1,000,000 = 100,000
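The worksheet arithmetic can be checked with a few lines of Python. This is only a sketch: the counts and the 1,000,000 figure come from the slide, while the variable names (`counts`, `population`, `rows`) are labels chosen here for illustration.

```python
# Counts from the worksheet: respondents in each of the five categories.
counts = [130, 104, 325, 455, 286]
population = 1_000_000  # population size used on the slide for the projection

n = sum(counts)  # total number of people surveyed

rows = []
for count in counts:
    proportion = count / n                       # relative frequency
    percent = proportion * 100                   # same value as a percentage
    projected = round(proportion * population)   # scaled up to the population
    rows.append((count, proportion, percent, projected))
    print(f"{count:>4}  {proportion:.2f}  {percent:>3.0f}  {projected:>9,}")
```

Each proportion is simply the count divided by the total surveyed, so the proportions must add up to 1 and the projected values must add up to the population size.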
Review of Homework Worksheet
[Scatterplot: Dollars Spent (1 to 9) plotted against Age (10 to 50)]
Direction: down (negative)
Strength: strong
Estimated r: -.9
Review of Homework Worksheet
Direction: down (negative)
Strength: strong
Correlation: r = -0.9227
Excel: =correl(A2:A11,B2:B11) returns -0.9226648007
This shows a strong negative relationship (r = -0.92) between the amount spent on snacks and the age of the moviegoer.
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Correlation r (actual number)
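For anyone who wants to see what the Excel formula is doing, here is a minimal Python sketch of the Pearson correlation coefficient, the quantity that CORREL computes. The worksheet's actual ages and dollar amounts are not reproduced in these slides, so the two lists below are made-up illustration data and will not give -0.9227.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (NOT the worksheet values): spending falls as age rises.
ages = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
dollars = [9, 8, 8, 7, 6, 5, 5, 3, 2, 1]

r = pearson_r(ages, dollars)
print(round(r, 4))
```

A value close to -1 means the points fall near a downward-sloping line, which is what "strong negative" describes.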
Scatterplot: displays relationships between two continuous variables
Correlation: a measure of how two variables co-occur; it can also be used for prediction
Ranges between -1 and +1
The closer to zero, the weaker the relationship and the worse the prediction
Can be positive or negative
Correlation - How do numerical values change?
http://neyman.stat.uiuc.edu/~stat100/cuwu/Games.html
http://argyll.epsb.ca/jreed/math9/strand4/scatterPlot.htm
Let’s estimate the correlation coefficient for each of the following
[Five scatterplots: r = +.80, r = +1.0, r = -1.0, r = -.50, r = 0.0]
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)
Examples:
r = +0.97: This shows a strong positive relationship (r = 0.97) between the appraised price of the house and its eventual sales price
r = -0.48: This shows a moderate negative relationship (r = -0.48) between the amount of pectin in orange juice and its sweetness
r = -0.91: This shows a strong negative relationship (r = -0.91) between the distance that a golf ball is hit and the accuracy of the drive
r = +0.61: This shows a moderate positive relationship (r = 0.61) between the length of stay in a hospital and the number of services provided
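The description template above (both variables, strength, direction, actual number) can be sketched as a small Python function. Note the big assumption: the slides do not give numeric cutoffs for weak, moderate and strong. The cutoffs used here (0.7 and 0.3) are hypothetical, picked only because they agree with how the examples in lecture are labeled; the function name is also just an illustration.

```python
def describe_correlation(r, var_x, var_y, strong=0.7, moderate=0.3):
    """Build a one-sentence description of a correlation.

    The `strong` and `moderate` cutoffs are assumptions for illustration,
    not values given in the lecture.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    if r == 0:
        return f"This shows no linear relationship (r = 0.00) between {var_x} and {var_y}"
    size = abs(r)  # strength depends on distance from zero, not on sign
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"  # direction depends on sign
    return (f"This shows a {strength} {direction} relationship "
            f"(r = {r:.2f}) between {var_x} and {var_y}")

print(describe_correlation(-0.48, "the amount of pectin in orange juice", "its sweetness"))
```

The point of the sketch is that strength and direction are separate ideas: strength comes from the size of r, direction from its sign.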
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)
Scatterplot includes:
Variable name is listed clearly
Both axes have real numbers listed
Both axes and values are labeled
Example: This shows the strong positive (r = +0.8) relationship between the heights of daughters (in inches) and the heights of their mothers (in inches).
[Scatterplot: axes labeled Height of Mothers (in), 48 to 72, and Height of Daughters (inches), 48 to 76]
Break into groups of 2 or 3
Each person hands in their own worksheet. Be sure to list your name and the names of all others in your group
Use examples that are different from those in lecture
1. Describe one positive correlation. Draw a scatterplot (label axes)
2. Describe one negative correlation. Draw a scatterplot (label axes)
3. Describe one zero correlation. Draw a scatterplot (label axes)
4. Describe one perfect correlation (positive or negative). Draw a scatterplot (label axes)
5. Describe one curvilinear relationship. Draw a scatterplot (label axes)
Hand in your homework
Must be complete and must be stapled
You’ve completed constructing your questionnaire. What’s the best way to get responders?
Sample versus census: how is a census different from a sample?
Census: measures each person in the specific population
Sample: measures a subset of the population and infers about the population (a representative sample is good)
What’s better?
Use of existing survey data:
U.S. Census: family size, fertility, occupation
The General Social Survey: surveys a sample of US citizens; over 1,000 items; same questions asked each year
Population (census) versus sample
Parameter versus statistic
Parameter: measurement or characteristic of the population
Usually unknown (only estimated)
Usually represented by Greek letters, for example µ (written “mu”, pronounced “mew”)
Statistic: numerical value calculated from a sample
Usually represented by Roman letters, for example x̄ (pronounced “x bar”)
Simple random sampling: each person from the population has an equal probability of being included
Sample frame = how you define the population
Let’s take a sample ... a random sample
Question: average weight of a U of A football player
Sample frame: population of the U of A football team
Pick the 24th name on the list
Random number table: a list of random numbers
Or, you can use Excel to provide a number for the random sample: =RANDBETWEEN(1,115)
Pick the 64th name on the list (64 is just an example here)
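The same idea as =RANDBETWEEN(1,115) can be sketched in Python with the standard `random` module. The list size of 115 comes from the Excel formula on the slide; the sample size of 10 is a made-up number for illustration.

```python
import random

roster_size = 115  # number of names on the list, as in =RANDBETWEEN(1,115)

# One random position on the list; randint includes both endpoints,
# just like RANDBETWEEN.
position = random.randint(1, roster_size)
print(f"Pick name number {position} on the list")

# A simple random sample of several positions, without picking anyone twice.
sample_size = 10  # hypothetical sample size
sample_positions = random.sample(range(1, roster_size + 1), k=sample_size)
print(sorted(sample_positions))
```

Because every position is equally likely to be drawn, every person on the list has an equal probability of being included.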
Systematic random sampling: a probability sampling technique that involves selecting every kth person from a sampling frame
You pick the number
Other examples of systematic random sampling:
1) check every 2000th light bulb
2) survey every 10th voter
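Selecting every kth person is easy to sketch in Python with list slicing. Two things here are assumptions, not part of the slide: the voter list is invented, and the starting point is chosen at random within the first k names (a common convention; the slide only says that you pick k).

```python
import random

def systematic_sample(frame, k):
    """Return every k-th entry of the sampling frame, from a random start."""
    if k < 1:
        raise ValueError("k must be at least 1")
    start = random.randrange(k)  # random index among the first k entries
    return frame[start::k]       # then step through the list k at a time

# Hypothetical sampling frame of 100 voters.
voters = [f"voter_{i}" for i in range(1, 101)]

chosen = systematic_sample(voters, 10)  # survey every 10th voter
print(chosen)
```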
Stratified sampling: a sampling technique that involves dividing a population into subgroups (or strata) and then selecting samples from each of these groups. This technique can maintain the ratios for the different groups
Example: average number of speeding tickets
12% of sample is from California
7% of sample is from Texas
6% of sample is from Florida
6% from New York
4% from Illinois
4% from Ohio
4% from Pennsylvania
3% from Michigan
etc.
Example: average cost of textbooks for a semester
17.7% of sample are Pre-business majors
4.6% of sample are Psychology majors
2.8% of sample are Biology majors
2.4% of sample are Architecture majors
etc.
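Here is a minimal Python sketch of how a stratified sample can keep the same ratios as the population: split people into strata, then draw the same fraction from each stratum. The student roster, the group sizes, and the 10% sampling fraction are all invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(people, stratum_of, fraction):
    """Draw the same fraction at random from every stratum."""
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)  # group people by stratum
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)          # how many to take from this stratum
        sample.extend(random.sample(members, k))    # random draw within the stratum
    return sample

# Hypothetical roster of (student_id, major) pairs.
students = (
    [(i, "Pre-business") for i in range(60)]
    + [(i, "Psychology") for i in range(60, 90)]
    + [(i, "Biology") for i in range(90, 100)]
)

sample = stratified_sample(students, lambda s: s[1], 0.10)
print(sample)
```

With these made-up numbers the population is 60% / 30% / 10% across the three majors, and a 10% draw from each stratum keeps that same split in the sample. Very small strata can round down to zero people, which a real study would need to handle.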
Cluster sampling: a sampling technique that divides a population into subgroups (or clusters) by region or physical space. Can either measure everyone or select samples from each cluster
Textbook prices: Southwest schools, Midwest schools, Northwest schools, etc.
Average student income, surveyed by area: Old Main area, near McClelland, around Main Gate, etc.
Patient satisfaction for a hospital: 7th floor (near maternity ward), 5th floor (near physical rehab), 2nd floor (near trauma center), etc.
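The two options mentioned above (measure everyone in a cluster, or select a sample from each cluster) can be sketched in Python as follows. The floor names echo the hospital example, but the patient labels and cluster sizes are invented, and the function name is just an illustration.

```python
import random

# Hypothetical clusters: patients grouped by hospital floor.
clusters = {
    "7th floor": ["p1", "p2", "p3"],
    "5th floor": ["p4", "p5"],
    "2nd floor": ["p6", "p7", "p8", "p9"],
}

def cluster_sample(clusters, per_cluster=None):
    """Measure everyone in each cluster, or draw per_cluster people from each."""
    sample = {}
    for name, members in clusters.items():
        if per_cluster is None:
            sample[name] = list(members)             # option 1: measure everyone
        else:
            k = min(per_cluster, len(members))       # cannot take more than exist
            sample[name] = random.sample(members, k) # option 2: sample within cluster
    return sample

print(cluster_sample(clusters))                 # everyone on every floor
print(cluster_sample(clusters, per_cluster=2))  # two patients per floor
```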
Non-random sampling is vulnerable to bias
Convenience sampling: a sampling technique that involves sampling people nearby. A non-random sample and vulnerable to bias
Snowball sampling: a non-random technique in which one or more members of a population are located and used to lead the researcher to other members of the population. Used when we don’t have any other way of finding them; also vulnerable to bias
Judgment sampling: a sampling technique that involves sampling people who an expert says would be useful. A non-random sample and vulnerable to bias
Overview
Frequency distributions
The normal curve
Challenge yourself as we work through characteristics of distributions to try to categorize each concept as a measure of 1) central tendency, 2) dispersion, or 3) shape
Mean, Median, Mode, Trimmed Mean
Standard deviation, Variance, Range, Mean Absolute Deviation
Skewed right, skewed left, unimodal, bimodal, symmetric
Another example: How many kids in your family?
Number of kids in family: 1, 4, 3, 2, 1, 8, 4, 2, 2, 14
Measures of Central Tendency (Measures of location)
The mean, median and mode
Measures of “location”: where on the number line the scores tend to cluster
Mean: the balance point of a distribution. Found by adding up all observations and then dividing by the number of observations
Mean for a sample: x̄ = Σx / n
Mean for a population: µ (mu) = ΣX / N
Note: Σ = add up; x or X = scores; n or N = number of scores
Example: Number of kids in family: 1, 4, 3, 2, 1, 8, 4, 2, 2, 14
x̄ = Σx / n = 41 / 10 = 4.1
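The sample mean for the family-size data can be checked in Python. This is a minimal sketch using the ten values from the slide; the variable names are chosen here for illustration.

```python
# Number of kids in family, from the slide.
kids = [1, 4, 3, 2, 1, 8, 4, 2, 2, 14]

total = sum(kids)  # Σx: add up all the scores
n = len(kids)      # n: number of scores
mean = total / n   # x̄ = Σx / n

print(total, n, mean)  # 41 10 4.1
```

Notice that the mean (4.1) is larger than most of the scores: the one family with 14 kids pulls the balance point upward.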
How many kids are in your family? What is the most common family size?
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Median: the middle value when observations are ordered from least to most (or most to least)
Ordered: 1, 1, 2, 2, 2, 3, 4, 4, 8, 14
If there appear to be two medians, take the mean of the two: (2 + 3) / 2 = 2.5, so the median is 2.5
The median always has a percentile rank of 50%, regardless of the shape of the distribution
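The steps for finding the median (order the observations, then take the middle value, or the mean of the two middle values) can be sketched in Python using the family-size data from the slide. The cross-check at the end uses `statistics.median` from the standard library.

```python
import statistics

# Number of kids in family, from the slide.
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

ordered = sorted(kids)  # order from least to most
n = len(ordered)
mid = n // 2

if n % 2 == 1:
    median = ordered[mid]                           # odd count: single middle value
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

print(ordered)                   # [1, 1, 2, 2, 2, 3, 4, 4, 8, 14]
print(median)                    # 2.5
print(statistics.median(kids))   # 2.5, same answer from the standard library
```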
Mode: the value of the most frequent observation
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14

Score   f
1       2
2       3
3       1
4       2
5       0
6       0
7       0
8       1
9       0
10      0
11      0
12      0
13      0
14      1

Please note: The mode is “2” because it is the most frequently occurring score. It occurs “3” times. “3” is not the mode; it is just the frequency for the value that is the mode
Bimodal distribution: if there are two most frequent observations
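The frequency table and the mode can be reproduced with `collections.Counter` from the Python standard library. This sketch uses the family-size data from the slide and returns a list of modes, so a bimodal data set would show two values.

```python
from collections import Counter

# Number of kids in family, from the slide.
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

freq = Counter(kids)  # maps each score to its frequency f

# Print the frequency table; Counter returns 0 for scores that never occur.
for score in range(1, 15):
    print(score, freq[score])

highest_f = max(freq.values())  # the largest frequency (this is NOT the mode)
modes = sorted(score for score, f in freq.items() if f == highest_f)

print(modes, highest_f)  # [2] 3
```

The mode is the score (2), while 3 is only how often that score occurs.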
What about central tendency for qualitative data?
Mode is good for nominal or ordinal data
Median can be used with ordinal data
Mean can be used with interval or ratio data
A little more about frequency distributions An example of a normal distribution