Please start portfolios
MGMT 276: Statistical Inference in Management
Room: McClelland Hall, Room 129
Summer II, 2011
Welcome
http://www.youtube.com/watch?v=Ahg6qcgoay4&watch_response
Use this as your study guide
By the end of lecture today (7/13/11):
• Review homework
• Time series versus cross-sectional comparisons
• Descriptive or inferential
• Random vs non-random sampling
• Connecting intentions of studies with experimental methodologies
• Appropriate statistical analyses and appropriate graphs
• 7 most common analyses (confidence intervals, t-tests, ANOVA, 2-way ANOVA, correlation, regression, chi square)
Schedule of readings
Before next exam:
• Please read Chapters 1 - 4 in Lind
• Please read Chapters 1, 5, 6 and 13 in Plous
  Chapter 1: Selective Perception
  Chapter 5: Plasticity
  Chapter 6: Effects of Question Wording and Framing
  Chapter 13: Anchoring and Adjustment
Homework review
You are looking to see if “class standing” affects the “level of sales”.
Independent variable (IV): Class standing
Number of levels of IV (how many means?): 4
Quasi or true experiment: Quasi
Dependent variable: Level of sales
Between or within participant design: Between
In this study, what is the operational definition of “class standing”? Classification based on units earned
In this study, what is the operational definition of “level of sales”? Number of bags of peanuts sold
Homework review
You are looking to see whether “type of program” has an effect on “body transformation”. Please identify the following variables:
Independent variable (IV): Type of program
Number of levels of IV (how many means?): 2
Quasi or true experiment: True
Dependent variable: Body transformation
Between or within participant design: Between
What is the operational definition of “type of program”? Type of diet (regular versus programmatic diet)
What is the operational definition of “body transformation”? Number of pounds lost
Homework review
You are looking to see which driving choice is most efficient, so you ask each driver to drive each of the three routes and time how long each one takes. Please identify the following variables:
Independent variable (IV): Type of route
Number of levels of IV (how many means?): 3
Dependent variable: Driving efficiency
Between or within participant design: Within
What is the operational definition of “driving efficiency”? Travel time (measured in minutes)
What is the operational definition of “driving choice”? Route taken
Maggie wanted to compare the median salary of male and female lawyers. Using census data she found that the median yearly salary for male lawyers was $66,000 while the median yearly salary for female lawyers was $61,000. Please identify the following information:
• Independent variable (IV): __________________________
• Number of levels of IV: __________________________
• Quasi or true experiment: __________________________
• Dependent variable: __________________________
• Between or within participant design: ____________________
• Level of measurement for “Gender”: _____________________
• Level of measurement for “Salary”: ______________________
Time series versus cross-sectional comparisons: trends over time versus a snapshot comparison
Time series design: Each observation represents a measurement at a different point in time. Repeated measurements allow us to see trends.
Cross-sectional design: Each observation represents a measurement at the same point in time. Comparing across groups allows us to see differences.
Please note: Any one piece of data can often (not always) be used in either a time series comparison or a cross-sectional comparison. It depends on how you set up your question.
Traffic accidents
• Cross-sectional: Does Tucson or Albuquerque have more traffic accidents (they have similar population sizes)?
• Time series: Does Tucson have more traffic accidents as the year ends and winter approaches?
Time series versus cross-sectional comparisons: trends over time versus a snapshot comparison
Time series design: Each observation represents a measurement at a different point in time. Repeated measurements allow us to see trends.
Cross-sectional design: Each observation represents a measurement at the same point in time. Comparing across groups allows us to see differences.
Unemployment rate
• Is there an increase in workers calling in sick as the summer months approach?
• Do more young workers call in sick than older workers?
Grade point average (GPA)
• Does GPA tend to go up or down as students move from freshmen to sophomores to juniors to seniors?
• Is GPA higher or lower in Mr. Chen’s Freshman English class than in Mr. Frank’s?
Descriptive or inferential?
Descriptive statistics: organizing and summarizing data
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected
What is the average height of the basketball team?
• Measured all of the players and reported the average height
• Measured only a sample of the players and reported the average height for the team
In this class, what percentage of students support the death penalty?
• Measured all of the students in class and reported the percentage who said “yes”
• Measured only a sample of the students in class and reported the percentage who said “yes”
Based on the data collected from the students in this class, we can conclude that 60% of the students at this university support the death penalty
• Measured all of the students in class and reported the percentage who said “yes”
Descriptive or inferential?
Descriptive statistics: organizing and summarizing data
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected
Men are in general taller than women
• Measured all of the citizens of Arizona and reported heights
Shoe size is not a good predictor of intelligence
• Measured the shoe sizes and IQs of all students at 20 universities
Blondes have more fun
• Asked 500 actresses to complete a happiness survey
The average age of students at the U of A is 21
• Asked all students in the fraternities and sororities their age
Descriptive vs inferential statistics
Descriptive statistics: organizing and summarizing data
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected
To determine which one applies, we have to consider the methodologies used in collecting the data
Population (census) versus sample
Parameter versus statistic
Parameter: measurement or characteristic of the population. Usually unknown (only estimated). Usually represented by Greek letters, such as µ (pronounced “mu” or “mew”).
Statistic: numerical value calculated from a sample. Usually represented by Roman letters, such as x̄ (pronounced “x bar”).
Sample versus census
How is a census different from a sample?
• A census measures each person in the specific population
• A sample measures a subset of the population and infers about the population; a representative sample is good
What’s better?
Use of existing survey data:
• U.S. Census: family size, fertility, occupation
• The General Social Survey: surveys a sample of US citizens on over 1,000 items, with the same questions asked each year
Simple random sampling: each person from the population has an equal probability of being included
Sample frame = how you define the population
Let’s take a sample ... a random sample
Question: What is the average weight of a U of A football player?
Sample frame: population of the U of A football team
Random number table: a list of random numbers (for example, pick the 24th name on the list)
Or, you can use Excel to provide a number for a random sample: =RANDBETWEEN(1,110)
Pick the 64th name on the list (64 is just an example here)
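
The Excel step above can also be sketched in Python. This is only an illustration: the roster size of 110 is inferred from the RANDBETWEEN(1,110) formula, and the sample size of 10 and the seed are assumptions made for the example.

```python
import random

# Assumed sampling frame: positions 1..110 stand in for the names on the
# team list (110 is taken from the =RANDBETWEEN(1,110) formula).
ROSTER_SIZE = 110

rng = random.Random(2011)  # fixed seed (an arbitrary choice) so the sketch is repeatable

# Equivalent of Excel's =RANDBETWEEN(1,110): one position, both ends inclusive.
one_pick = rng.randint(1, ROSTER_SIZE)

# A simple random sample of 10 distinct positions, drawn without replacement,
# so every name has the same chance of being included.
sample_positions = sorted(rng.sample(range(1, ROSTER_SIZE + 1), k=10))

print("Single pick:", one_pick)
print("Sample of 10 positions:", sample_positions)
```

Drawing without replacement keeps the same player from being weighed twice.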
Systematic random sampling: a probability sampling technique that involves selecting every kth person from a sampling frame
You pick the number (k)
Other examples of systematic random sampling:
1) check every 2000th light bulb
2) survey every 10th voter
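
A minimal sketch of "every kth person", assuming a hypothetical frame of 100 voters and k = 10. The random starting point is a common refinement and is an assumption here, not something the slide specifies.

```python
import random

def systematic_sample(frame, k, rng):
    """Return every k-th element of frame, starting at a random offset in [0, k)."""
    start = rng.randrange(k)
    return frame[start::k]

voters = list(range(1, 101))  # hypothetical frame: voters numbered 1..100
rng = random.Random(7)        # arbitrary seed so the sketch is repeatable

picked = systematic_sample(voters, 10, rng)
print(picked)  # ten voters, each 10 positions apart
```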
Stratified sampling: a sampling technique that involves dividing a population into subgroups (or strata) and then selecting samples from each of these groups. The technique can maintain the population's ratios for the different groups.
Average number of speeding tickets:
• 12% of sample is from California
• 7% of sample is from Texas
• 6% of sample is from Florida
• 6% from New York
• 4% from Illinois
• 4% from Ohio
• 4% from Pennsylvania
• 3% from Michigan
• etc
Average cost for text books for a semester:
• 17.7% of sample are Pre-business majors
• 4.6% of sample are Psychology majors
• 2.8% of sample are Biology majors
• 2.4% of sample are Architecture majors
• etc
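
To show how the ratios are maintained, here is a small sketch of proportional allocation using the textbook-cost percentages. The total sample size of 1,000 is a made-up number for illustration, and only the four majors named on the slide are included.

```python
# Share of each major, taken from the slide. The remaining majors are
# elided there with "etc", so they are left out here as well.
population_share_pct = {
    "Pre-business": 17.7,
    "Psychology": 4.6,
    "Biology": 2.8,
    "Architecture": 2.4,
}

SAMPLE_SIZE = 1000  # hypothetical total sample size

# Proportional allocation: each stratum gets the same share of the sample
# that it has in the population.
allocation = {
    major: round(SAMPLE_SIZE * pct / 100)
    for major, pct in population_share_pct.items()
}

for major, n in allocation.items():
    print(f"{major}: sample {n} students")
```

Within each stratum, the students would then be picked by simple random sampling.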
Cluster sampling: a sampling technique that divides a population into subgroups (or clusters) by region or physical space. You can either measure everyone in a cluster or select samples from each cluster.
Textbook prices:
• Southwest schools
• Midwest schools
• Northwest schools
• etc
Average student income, survey by:
• Old Main area
• Near McClelland
• Around Main Gate
• etc
Patient satisfaction for hospital:
• 7th floor (near maternity ward)
• 5th floor (near physical rehab)
• 2nd floor (near trauma center)
• etc
Convenience sampling: a sampling technique that involves sampling people nearby. A non-random sample and vulnerable to bias.
Snowball sampling: a non-random technique in which one or more members of a population are located and used to lead the researcher to other members of the population. Used when we don’t have any other way of finding them. Also vulnerable to biases.
Judgment sampling: a sampling technique that involves sampling people who an expert says would be useful. A non-random sample and vulnerable to bias.
Focus group: members can be randomly or non-randomly selected. A moderator gathers opinions and information from the group. Information can be qualitative or quantitative.
Questionnaires use self-report items for measuring constructs. Constructs are operationally defined by the content of the items.
A questionnaire is a set of fixed-format, self-report items completed without supervision or time constraint.
Response rate and power of random sampling:
• Number of responders versus percentage of responders
• Random sampling vs non-random sampling
Really important regarding bias!
Wording, order, and balance can all affect results
Questionnaires use self-report items for measuring constructs. Constructs are operationally defined by the content of the items.
As “consumers” of questionnaire data, what should we ask?
• Number of responders versus percentage of responders
• Methodology of sampling
• Operational definitions of constructs
• Wording
As “composers” of questionnaire data, how should we ask?
• pilot, fix
• pilot, analyze, fix
• pilot all the way through your design
Connecting intentions of studies with:
• Experimental methodologies
• Appropriate statistical analyses
• Appropriate graphs
Today I want to present some “typical designs”. We will spend the next couple of weeks filling in the details. We’ll come back to these distinctions over and over again, and build on them for the rest of the semester. Let’s get this overview down well! Don’t worry about calculation details for now.
Writing Assignment
Think about this as we work through each type of study:
• Create an example of each type
• Identify the IV (one or two)
• Identify the DV (one or two)
• Draw a possible graph for each
Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA)
Study Type 4: Two-way Analysis of Variance (ANOVA)
Study Type 5: Correlation
Study Type 6: Simple and Multiple regression
Study Type 7: Chi Square
We’ll come back to these distinctions over and over again, and build on them for the rest of the class. Let’s get this overview down well! Don’t worry about calculation details for now.
Study Type 1: Confidence Intervals
Remember, this is just an introduction to the idea. Don’t worry about calculation details for now; we will get to those soon.
On average newborns weigh 7 pounds and are 20 inches long. My sister just had a baby: guess how much it weighs?
Guess the mean. Makes sense, right? On average you would be right most often if you always guessed the mean.
Point estimate versus confidence interval: guessing a single number versus a range of numbers
What if you really needed to be right? You could guess a range with the smallest and largest possible scores. (How wide a range would you need to be completely sure?)
Confidence interval: guessing a range (max and min) and assigning a level of confidence that the score falls in that range
Study Type 1: Confidence Intervals
Remember, this is just an introduction to the idea. Don’t worry about calculation details for now; we will get to those soon.
Confidence interval: a range of values that, with a known degree of certainty, includes an unknown population characteristic, such as a population mean
• 100% confidence interval: we can be 100% confident that our population mean falls between these two scores (guess absurdly large and small values)
• 99% confidence interval: we can be 99% confident that our population mean falls between these two scores
• 95% confidence interval: we can be 95% confident that our population mean falls between these two scores
Which has a wider interval relative to raw scores, 95% or 99%?
Study Type 1: Confidence Intervals
Remember, this is just an introduction to the idea. Don’t worry about calculation details for now; we will get to those soon.
Confidence interval: a range of values that, with a known degree of certainty, includes an unknown population characteristic, such as a population mean
You can use the mean of a sample to guess the:
• mean of the population
• mean of a smaller sample
• most likely score for an individual
Examples:
• In this sample of 10,000 newborns, the mean weight is 7 pounds. What do you think the minimum and maximum weights would be to capture 95% of all newborns?
• In this sample of 1000 flights, the mean number of empty seats is 12. What do you think the minimum and maximum number of empty seats are likely to be in the flights today, with a 95% level of certainty?
• This sample of 500 households produced a mean income of $35,000 a year. What do you think the minimum and maximum income levels are so that we are 95% confident that we captured Mabel’s?
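
As a preview of the calculation, here is a rough sketch of 95% and 99% confidence intervals for a population mean. The sample size and mean come from the newborn example; the standard deviation of 1.2 pounds is invented, and the normal approximation is a simplifying assumption. Note that this is an interval for the population mean, which is much narrower than a range meant to capture individual babies.

```python
from math import sqrt
from statistics import NormalDist

n = 10_000          # sample size from the newborn example
sample_mean = 7.0   # pounds, from the newborn example
sample_sd = 1.2     # pounds; hypothetical, the slides give no spread

standard_error = sample_sd / sqrt(n)

def mean_confidence_interval(level):
    """Normal-approximation interval for the population mean at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

ci95 = mean_confidence_interval(0.95)
ci99 = mean_confidence_interval(0.99)

print("95% interval:", ci95)
print("99% interval:", ci99)
print("99% is wider:", (ci99[1] - ci99[0]) > (ci95[1] - ci95[0]))
```

Asking for more confidence always costs a wider range, which is why a 100% interval needs absurdly large and small values.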
Study Type 1: Confidence Intervals
Study Type 2: t-test (we are looking to compare two means)
Study Type 2: t-test analysis
• Single Independent Variable (categorical) comparing two groups
• Single Dependent Variable (numerical/continuous)
• Used to test the effect of the IV on the DV
Andrea was interested in the effect of vacation time on productivity of the workers in her department. She randomly assigned workers into two groups: she allowed one group to go on vacation while the other group had no vacation. After the vacation she measured productivity for the two groups.
Identify: Independent Variable, Dependent Variable, Between or within, Quasi or true, Causal relationship?
[Graph: Productivity for two groups, Yes Vacation and No Vacation]
Andrea was interested in the effect of vacation time on productivity of the workers in her department. She randomly assigned workers into two groups: she allowed one group to go on vacation while the other group had no vacation. After the vacation she measured productivity for the two groups. This is an example of a true experiment.
• Dependent variable is always quantitative
• In a t-test, the independent variable is qualitative (with two groups)
• If “true” experiment (randomly assigned to groups), we can conclude that vacation had an effect: it increased productivity
• If “quasi” experiment (not randomly assigned to groups), we can conclude only that the data suggest that vacation may have had an effect; productivity increased for those who went on vacation, but we can’t rule out other explanations
Study Type 2: t-test analysis
• Single Independent Variable (categorical) comparing two groups
• Single Dependent Variable (numerical/continuous)
• Comparing two means (2 bars on graph)
• Used to test the effect of the IV on the DV
Please note: a t-test allows us to compare two means. If the means are statistically different:
• we say that there is a “real” difference that is not just due to chance
• we say there is a statistically significant difference, p < 0.05
p < 0.05 is the most common value; the “p value” can vary (p < 0.01, or p < 0.001)
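
To make "comparing two means" concrete, here is a sketch of a pooled-variance (Student's) t statistic for the vacation example. All productivity scores are invented for illustration, and the pooled version assumes the two groups have similar variances. The critical value 2.228 is the standard two-tailed table value for 10 degrees of freedom at alpha = 0.05.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical productivity scores; the slides give no data.
vacation = [82, 88, 91, 79, 85, 90]
no_vacation = [75, 80, 78, 72, 84, 77]

def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent groups."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df

t_stat, df = pooled_t(vacation, no_vacation)

T_CRITICAL = 2.228  # two-tailed critical t for df = 10 at alpha = 0.05
significant = abs(t_stat) > T_CRITICAL

print(f"t({df}) = {t_stat:.2f}, significant at p < 0.05: {significant}")
```

With these made-up numbers the difference between the two bars is larger than chance alone would be expected to produce.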
Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA) (comparing more than two means)
Study Type 3: One-way ANOVA
• Single Independent Variable comparing more than two groups
• Single Dependent Variable (numerical/continuous)
• Used to test the effect of the IV on the DV
Ian was interested in the effect of incentives for girl scouts on the number of cookies sold. He randomly assigned girl scouts into one of three groups. The three groups were given one of three incentives and he looked to see who sold more cookies. The 3 incentives were 1) Trip to Hawaii, 2) New Bike or 3) Nothing. This is an example of a true experiment. How could we make this a quasi-experiment?
Independent Variable: Type of incentive
Levels of Independent Variable: None, Bike, Trip to Hawaii
Dependent Variable: Number of cookies sold
Levels of Dependent Variable: 1, 2, 3 up to max sold
Between participant design
Causal relationship: Incentive had an effect; it increased sales
Study Type 3: One-way ANOVA
• Single Independent Variable comparing more than two groups
• Single Dependent Variable (numerical/continuous)
• Used to test the effect of the IV on the DV
Ian was interested in the effect of incentives for girl scouts on the number of cookies sold. He randomly assigned girl scouts into one of three groups. The three groups were given one of three incentives and he looked to see who sold more cookies. The 3 incentives were 1) Trip to Hawaii, 2) New Bike or 3) Nothing. This is an example of a true experiment.
Dependent variable is always quantitative
In an ANOVA, the independent variable is qualitative (and has more than two groups)
[Graphs: Sales for the None, New Bike, and Trip to Hawaii groups]
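
For a first look at how three means are compared at once, here is a sketch of the one-way ANOVA F ratio for the cookie example. The sales numbers are invented, with five scouts per group. The critical value 3.89 is the standard table value for F(2, 12) at alpha = 0.05.

```python
from statistics import mean

# Hypothetical boxes of cookies sold per scout; the slides give no data.
groups = {
    "None": [10, 12, 11, 9, 13],
    "New Bike": [14, 15, 13, 16, 12],
    "Trip to Hawaii": [18, 20, 17, 19, 21],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Variability of the group means around the grand mean.
ss_between = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2 for scores in groups.values()
)
# Variability of individual scouts around their own group mean.
ss_within = sum(
    (score - mean(scores)) ** 2 for scores in groups.values() for score in scores
)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRITICAL = 3.89  # critical F(2, 12) at alpha = 0.05
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, significant: {f_stat > F_CRITICAL}")
```

A large F says the group means differ by more than the scout-to-scout variation within groups would explain.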
Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA)
Study Type 4: Two-way Analysis of Variance (ANOVA) (comparing two independent variables; each one has multiple levels)
Study Type 4: Two-way ANOVA
“Two-way” = “Two IVs”
Ian was interested in the effect of incentives (and age) for girl scouts on the number of cookies sold. He randomly assigned girl scouts into one of three groups. The three groups were given one of three incentives and he looked to see who sold more cookies. The 3 incentives were: 1) Trip to Hawaii, 2) New Bike or 3) Nothing. He also measured the scouts’ ages.
Identify: Independent Variable #1, Independent Variable #2, Dependent Variable
Study Type 4: Two-way ANOVA
• Multiple Independent Variables (categorical), each variable comparing two or more groups
• Single Dependent Variable (numerical/continuous)
• Used to test the effect of two IVs on the DV
Independent Variable #1: Type of incentive
Levels of Independent Variable: None, Bike, Trip to Hawaii
Independent Variable #2: Age
Levels of Independent Variable: Elementary girls versus college
Dependent Variable: Number of cookies sold
Levels of Dependent Variable: 1, 2, 3 up to max sold
Between participant design
Results: Incentive had an effect; it increased sales. Data suggest age had an effect; older girls sold more.
Study Type 4: Two-way ANOVA
• Two Independent Variables (categorical)
• Single Dependent Variable (numerical/continuous)
• Used to test the effect of two IVs on the DV
Dependent variable is always quantitative
In a two-way ANOVA, both independent variables are qualitative (each with two or more groups)
[Graphs: Sales for the None, New Bike, and Trip to Hawaii groups, shown separately for Elementary and College]
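
The bars on a two-way graph are cell means, one for each combination of the two IVs. The sketch below computes those cell means plus the overall mean for each level of age and incentive. The data are invented, and this only organizes the means; it does not run the F tests of a full two-way ANOVA.

```python
from statistics import mean

# Hypothetical sales, keyed by (age group, incentive); three scouts per cell.
sales = {
    ("Elementary", "None"): [8, 10, 9],
    ("Elementary", "New Bike"): [12, 13, 11],
    ("Elementary", "Trip to Hawaii"): [15, 17, 16],
    ("College", "None"): [11, 13, 12],
    ("College", "New Bike"): [15, 16, 14],
    ("College", "Trip to Hawaii"): [20, 22, 21],
}

# One mean per bar on the graph.
cell_means = {cell: mean(values) for cell, values in sales.items()}

def marginal_mean(position, level):
    """Mean of all scores whose key has `level` at `position` (0 = age, 1 = incentive)."""
    scores = [s for cell, values in sales.items() if cell[position] == level for s in values]
    return mean(scores)

age_means = {age: marginal_mean(0, age) for age in ("Elementary", "College")}
incentive_means = {
    inc: marginal_mean(1, inc) for inc in ("None", "New Bike", "Trip to Hawaii")
}

print("Cell means:", cell_means)
print("Age means:", age_means)
print("Incentive means:", incentive_means)
```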
Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA)
Study Type 4: Two-way Analysis of Variance (ANOVA)
Study Type 5: Correlation
Study Type 5: Correlation
Plots the relationship between two continuous / quantitative variables
Pretty much all correlations are “quasi-experimental”
Neutral relative to causality, but especially useful for predictions
Example: relationship between amount of money spent on advertising and amount of money made in sales
In correlation, both variables are quantitative (the dependent variable is always quantitative)
[Scatterplot: Dollars spent on Advertising versus Dollars in Sales, showing a positive correlation]
Describe the strength and direction of the correlation: in this case positive/strong
Graph correlations with scatterplots (not bar graphs)
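
A short sketch of how strength and direction are summarized in one number, the Pearson correlation coefficient r. The advertising and sales figures are invented for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, in thousands of dollars; the slides give no numbers.
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 27]

def pearson_r(xs, ys):
    """Pearson correlation: ranges from -1 (strong negative) to +1 (strong positive)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariation = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = sum((x - x_bar) ** 2 for x in xs)
    spread_y = sum((y - y_bar) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

r = pearson_r(advertising, sales)
print(f"r = {r:.3f}")  # close to +1: a strong positive correlation
```

The sign of r gives the direction and its distance from 0 gives the strength; neither says anything about which variable causes the other.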
Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA)
Study Type 4: Two-way Analysis of Variance (ANOVA)
Study Type 5: Correlation
Study Type 6: Simple and Multiple regression
Study Type 6: Regression
Using the correlation to predict the value of one variable based on its relationship with the other variable
Multiple regression uses multiple independent variables to predict the dependent variable
The predicted variable goes on the “Y” axis and is called the dependent variable. The predictor variable goes on the “X” axis and is called the independent variable.
[Graph: Yearly Income (Y) versus Expenses per year (X)]
Dependent Variable (Predicted): You probably make this much
Independent Variable 1 (Predictor): If you spend this much
Independent Variable 2 (Predictor): If you save this much
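
Here is a sketch of simple (one-predictor) regression for the "if you spend this much, you probably make this much" idea, using the least-squares line. The expense and income figures are invented, and the function name predict_income is just a label chosen for this example.

```python
from statistics import mean

# Hypothetical data, in thousands of dollars per year.
expenses = [20, 30, 40, 50, 60]   # predictor (X, independent variable)
income = [30, 42, 49, 63, 71]     # predicted (Y, dependent variable)

x_bar, y_bar = mean(expenses), mean(income)

# Least-squares slope and intercept of the line Y = intercept + slope * X.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(expenses, income)) / sum(
    (x - x_bar) ** 2 for x in expenses
)
intercept = y_bar - slope * x_bar

def predict_income(expense):
    """Predicted yearly income for a given level of yearly expenses."""
    return intercept + slope * expense

print(f"income = {intercept:.2f} + {slope:.2f} * expenses")
print(f"If you spend 45, you probably make about {predict_income(45):.2f}")
```

Multiple regression follows the same idea with more than one predictor, such as expenses and savings together.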
Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA)
Study Type 4: Two-way Analysis of Variance (ANOVA)
Study Type 5: Correlation
Study Type 6: Simple and Multiple regression
Study Type 7: Chi Square
Study Type 7: Chi square
Chi square is used to evaluate nominal data (just count how many people, objects or events are in each category)
Variables are nominal or ordinal (so we are comparing frequencies, not means)
What is the most popular ride at Disneyland? Just count how many people ride each one.
a. Dumbo
b. Peter Pan
c. Space Mountain
d. Splash Mountain
e. Small World
We could gather this data using clickers
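
As a sketch of comparing frequencies rather than means, the code below runs a chi-square goodness-of-fit calculation on invented ride counts. It assumes the "no preference" expectation that all five rides are equally popular. The critical value 9.488 is the standard table value for 4 degrees of freedom at alpha = 0.05.

```python
# Hypothetical counts of how many of 200 people chose each ride.
observed = {
    "Dumbo": 30,
    "Peter Pan": 25,
    "Space Mountain": 60,
    "Splash Mountain": 55,
    "Small World": 30,
}

total = sum(observed.values())
expected = total / len(observed)  # equal popularity: 40 riders per ride

# Chi-square: squared gaps between observed and expected counts,
# each scaled by the expected count.
chi_sq = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

CHI_SQ_CRITICAL = 9.488  # critical chi-square for df = 4 at alpha = 0.05
print(f"chi-square({df}) = {chi_sq:.2f}, significant: {chi_sq > CHI_SQ_CRITICAL}")
```

A significant result here would mean the rides are not equally popular; the counts themselves show which ones are ahead.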
Thank you! See you next time!!