Please start your Daily Portfolio
Use this as your study guide. By the end of the next two lectures (7/10/13):
• Dot Plots
• Frequency Distributions - Frequency Histograms
• Frequency, relative frequency
• Guidelines for constructing frequency distributions
• Correlational methodology
• Positive, Negative and Zero correlation
• Characteristics of a distribution
• Central Tendency - Dispersion - Shape
• What are the three primary types of “measures of central tendency”?
• Mean - Median - Mode
• Measures of variability
• Range, Standard deviation and Variance
• Memorizing the four definitional formulae
• Estimating standard deviation from distribution
Introduction to Statistics for the Social Sciences
SBS200, COMM200, GEOG200, PA200, POL200, or SOC200
Lecture Section 001, Summer Session II, 2013
9:00 - 11:20am, Monday - Friday
Room 312 Social Sciences (Monday - Thursday)
Room 480 Marshall Building (Fridays)
Welcome
http://www.youtube.com/watch?v=oSQJP40PcGI
Please double check: all cell phones and other electronic devices are turned off and stowed away http://www.youtube.com/watch?v=oSQJP40PcGI
Homework due Thursday
On class website: please print and complete the homework #3 & 4 worksheet
Rubric available online
Peer review on Thursday
Important to bring homework totally done on Thursday!
Schedule of readings
Before Friday:
Please read chapters 1 - 2 in Ha & Ha textbook
Please read Appendix D, E & F online
Please read Chapters 1, 5, 6 and 13 in Plous
Chapter 1: Selective Perception
Chapter 5: Plasticity
Chapter 6: Effects of Question Wording and Framing
Chapter 13: Anchoring and Adjustment
Study guide is online
Please click in
My last name starts with a letter somewhere between
A. A - D
B. E - L
C. M - R
D. S - Z
Homework review
You are looking to see if “class standing” affects the “level of sales”.
Independent variable (IV): Class standing
Number of levels of IV (how many means?): 4
Quasi or True experiment: Quasi
Dependent variable: Level of sales
Between or within participant design: Between
In this study, what is the operational definition of “class standing”? Classification based on units earned
In this study, what is the operational definition of “level of sales”? Number of bags of peanuts sold
Homework review
You are looking to see whether “type of program” has an effect on “body transformation”. Please identify the following variables:
Independent variable (IV): Type of program
Number of levels of IV (how many means?): 2
Quasi or True experiment: True
Dependent variable: Body transformation
Between or within participant design: Between
What is the operational definition of “type of program”? Type of diet (regular versus programmatic diet)
What is the operational definition of “body transformation”? Number of pounds lost
Homework review
You are looking to see which driving choice is most efficient. So you ask each driver to drive each of the three routes and time themselves on how long it takes. Please identify the following variables:
Independent variable (IV): Type of route
Number of levels of IV (how many means?): 3
Dependent variable: Driving efficiency
Between or within participant design: Within
What is the operational definition of “driving efficiency”? Travel time (measured in minutes)
What is the operational definition of “driving choice”? Route taken
Homework review
Notice that the operational definition of each construct matters
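Homework review
Independent variable (IV): gender
Number of levels of IV: 2
Quasi or True experiment: quasi
Dependent variable: salary
Between or within participant design: between
Scale of measurement: nominal (IV), ratio (DV)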
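Homework review
Independent variable (IV): name of city
Number of levels of IV: 3
Quasi or True experiment: quasi-experiment
Dependent variable: temperature
Between or within participant design: between
Scale of measurement: nominal (IV), interval (DV)
Hand in your homework. It must be complete and must be stapled.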
Writing Assignment - Pop Quiz
Ari conducted a watermelon seed spitting experiment. She wanted to know if people can spit farther if they get a running start. She tested 100 people. She randomly assigned them into one of two groups. One group stood still on the starting line and spit their watermelon seeds as far as they could. The second group was allowed to run up to the starting line before they spit their watermelon seeds. She measured how far each person spit their watermelon seeds.
Please answer the following questions
1. What is the independent variable?
2. The independent variable: Is it continuous or discrete?
3. The independent variable: Is it nominal, ordinal, interval or ratio?
4. What is the dependent variable?
5. The dependent variable: Is it continuous or discrete?
6. The dependent variable: Is it nominal, ordinal, interval or ratio?
7. Is this a quasi or true experiment?
8. Is this a within or between participant design?
9. Is this a single blind, double blind or not at all blind experiment?
Questionnaire Homework
Questionnaire Homework: Variable label and scale values
Questionnaire Homework: What might you graph?
Designed our study / observation / questionnaire
Collected our data
Organize and present our results
Scatterplot: displays the relationship between two continuous variables
Correlation: a measure of how two variables co-occur; it can also be used for prediction
• Ranges between -1 and +1
• The closer to zero, the weaker the relationship and the worse the prediction
• Can be positive or negative
Correlation: ranges between -1 and +1
+1.00  perfect relationship = perfect predictor
+0.80  strong relationship = good predictor
+0.20  weak relationship = poor predictor
 0     no relationship = very poor predictor
-0.20  weak relationship = poor predictor
-0.80  strong relationship = good predictor
-1.00  perfect relationship = perfect predictor
Positive correlation: as values on one variable go up, so do values for the other variable
Negative correlation: as values on one variable go up, the values for the other variable go down
Figure: Height of Mothers by Height of Daughters (scatterplot, positive correlation)
Positive correlation: as values on one variable go up, so do values for the other variable
Negative correlation: as values on one variable go up, the values for the other variable go down
Figure: Brushing Teeth by Number of Cavities (scatterplot, negative correlation)
Perfect correlation = +1.00 or -1.00
One variable perfectly predicts the other
Positive correlation: height in inches and height in feet
Negative correlation: speed (mph) and time to finish race
Correlation
The more closely the dots approximate a straight line (the less spread out they are), the stronger the relationship is.
Perfect correlation = +1.00 or -1.00
• One variable perfectly predicts the other
• No variability in the scatterplot
• The dots fall on a straight line
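The slides do not give a formula for the correlation coefficient. As a sketch, here is the standard Pearson coefficient, which is one common way to put a number on how closely the dots follow a straight line. The function name `pearson_r` and all of the data are assumptions made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights, invented for illustration only
inches = [60, 62, 64, 66, 68]
feet = [h / 12 for h in inches]        # the same heights, in feet
daughters = [61, 61, 66, 65, 69]       # made-up daughters' heights

r_perfect = pearson_r(inches, feet)       # dots fall exactly on a line
r_strong = pearson_r(inches, daughters)   # dots are close to a line, about 0.92

print(round(r_perfect, 2), round(r_strong, 2))
```

Height in inches predicts height in feet with no scatter, so the coefficient comes out at +1. The made-up mother and daughter heights scatter a little around the line, so the coefficient is strong but below 1.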
Correlation does not imply causation
Is it possible that they are causally related? Yes, but the correlational analysis does not answer that question.
What if it’s a perfect correlation? Isn’t that causal? No. It feels more compelling, but it is neutral about causality.
Example: Number of Birthdays and Number of Birthday Cakes
Positive correlation: as values on one variable go up, so do values for the other variable
Negative correlation: as values on one variable go up, the values for the other variable go down
Example: number of bathrooms in a city and number of crimes committed (positive correlation)
Linear vs curvilinear relationship
A linear relationship is a relationship that can be described best with a straight line
A curvilinear relationship is a relationship that can be described best with a curved line
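A short sketch of why the distinction matters: the correlation coefficient summarizes straight-line relationships, so a strong curved pattern can still produce a coefficient of zero. The helper `pearson_r` (the standard Pearson formula, not something from the slides) and the data are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient (assumed helper)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]

line = [2 * x + 1 for x in xs]   # linear: best described by a straight line
curve = [x ** 2 for x in xs]     # curvilinear: a U-shaped curve

r_line = pearson_r(xs, line)     # +1: the line captures everything
r_curve = pearson_r(xs, curve)   # 0: no straight-line trend at all

print(r_line, r_curve)
```

In the curved case y is completely determined by x, yet the coefficient is zero because the downward half of the U cancels the upward half. This is one reason to look at the scatterplot before trusting the number.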
Correlation - How do numerical values change?
http://neyman.stat.uiuc.edu/~stat100/cuwu/Games.html
http://argyll.epsb.ca/jreed/math9/strand4/scatterPlot.htm
Let’s estimate the correlation coefficient for each of the following:
r = +.80
r = +1.0
r = -1.0
r = -.50
r = 0.0
This shows the strong positive (.8) relationship between the heights of daughters (measured in inches) and the heights of their mothers (measured in inches).
• Both variables are listed, as are direction and strength
• Both axes and values are labeled
Figure: scatterplot of Height of Mothers (in), axis values 48 to 72, by Height of Daughters (inches), axis values 48 to 76
Break into groups of 2 or 3
Each person hands in their own worksheet. Be sure to list your name and the names of all others in your group.
Use examples that are different from those in lecture.
1. Describe one positive correlation. Draw a scatterplot (label axes)
2. Describe one negative correlation. Draw a scatterplot (label axes)
3. Describe one zero correlation. Draw a scatterplot (label axes)
4. Describe one perfect correlation (positive or negative). Draw a scatterplot (label axes)
5. Describe one curvilinear relationship. Draw a scatterplot (label axes)
Describing Data Visually
Lists of numbers: too hard to see patterns
14 17 20 25 21 29 16 25 27 18 16 13 11 21 19 24 20 11 20 28 16 13 17 14 14 16 8 17 17 11 11 14 17 19 24 8 16 12 25 9 20 17 11 14 16 18 22 14 18 23 12 15 10 13 15 11 11
Organizing numbers helps (sorted, reading down each column)
 8 11 14 17 19 24
 8 12 14 17 20 25
 9 12 15 17 20 25
10 13 15 17 20 25
11 13 16 17 20 27
11 13 16 17 21 28
11 14 16 18 21 29
11 14 16 18 22
11 14 16 18 23
11 14 16 19 24
Graphical representation is even more clear: this is a dot plot
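As a rough sketch of how a dot plot is built, the snippet below counts how often each value occurs in the unsorted list from this slide and prints one dot per observation. It is a text-only stand-in for the figure, turned sideways (values run down the page); the function name `dot_plot` is an assumption.

```python
from collections import Counter

# The unsorted list from the slide (57 observations)
data = [14, 17, 20, 25, 21, 29, 16, 25, 27, 18, 16, 13, 11, 21, 19, 24,
        20, 11, 20, 28, 16, 13, 17, 14, 14, 16, 8, 17, 17, 11, 11, 14,
        17, 19, 24, 8, 16, 12, 25, 9, 20, 17, 11, 14, 16, 18, 22, 14,
        18, 23, 12, 15, 10, 13, 15, 11, 11]

counts = Counter(data)  # value -> number of times it occurs

def dot_plot(counts):
    """One text row per value from min to max, one 'o' per observation."""
    rows = []
    for value in range(min(counts), max(counts) + 1):
        rows.append(f"{value:>3} " + "o" * counts.get(value, 0))
    return "\n".join(rows)

print(dot_plot(counts))
```

Values that never occur (26, for example) still get a row, so the gaps in the data stay visible.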
Describing Data Visually
Data (sorted, reading down each column): 8 12 14 17 19 24 8 12 14 17 20 25 9 13 15 17 20 25 10 13 15 17 20 25 11 13 16 17 20 27 11 13 16 17 21 28 11 14 16 18 21 29 11 14 16 18 22 11 14 16 18 23 11 14 16 19 24
Measuring the “frequency of occurrence”:
We’ve got to put these data into groups (“bins”)
Then figure the “frequency of occurrence” for the bins
Frequency distributions
A frequency distribution is an organized list of observations and their frequency of occurrence
How many kids are in your family? What is the most common family size?
Another example: How many kids in your family?
Number of kids in family: 1 3 1 4 2 4 2 8 2 14
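A minimal sketch of an ungrouped frequency distribution for the ten family sizes on this slide, using Python's `collections.Counter`. The variable names are assumptions.

```python
from collections import Counter

# Number of kids in family, from the slide
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

freq = Counter(kids)

# Organized list: each observed value with its frequency of occurrence
table = sorted(freq.items())
for value, count in table:
    print(f"{value:>2} kids: {count}")

# The most common family size and how often it occurs
most_common_size, how_many = freq.most_common(1)[0]
print("Most common family size:", most_common_size)
```

With these ten families the most common size is 2 kids, which occurs three times.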
Frequency distributions
Number of kids in family: 1 3 1 4 2 4 2 8 2 14
How many kids are in your family? What is the most common family size?
Crucial guidelines for constructing frequency distributions:
1. Classes should be mutually exclusive: each observation should be represented only once (no overlap between classes)
Wrong: 0 - 5, 5 - 10, 10 - 15
Correct: 0 - 4, 5 - 9, 10 - 14
Correct: 0 - under 5, 5 - under 10, 10 - under 15
2. Set of classes should be exhaustive: should include all possible data values (no data points should fall outside range)
Correct: 0 - 3, 4 - 7, 8 - 11, 12 - 15
Wrong: 0 - 3, 4 - 7, 8 - 11 (no place for our family of 14!)
Frequency distributions
Number of kids in family: 1 3 1 4 2 4 2 8 2 14
How many kids are in your family? What is the most common family size?
Crucial guidelines for constructing frequency distributions:
3. All classes should have equal intervals (even if the frequency for that class is zero)
Correct: 0 - 4, 5 - 9, 10 - 14
Correct: 0 - under 5, 5 - under 10, 10 - under 15
Wrong: 0 - 4, 8 - 12, 14 - 19 (missing space for families of 5, 6, or 7)
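Guidelines 1 to 3 can be checked mechanically. The sketch below assumes classes are given as inclusive whole-number limits such as (0, 4), listed from lowest to highest; the function name `check_classes` and its messages are hypothetical, not part of the lecture.

```python
def check_classes(classes, low, high):
    """Return a list of guideline violations for inclusive integer classes.

    classes: list of (lower, upper) limits, sorted from lowest to highest
    low, high: smallest and largest data values the classes must cover
    """
    problems = []

    # Guideline 3: all classes should have equal intervals
    widths = {upper - lower + 1 for lower, upper in classes}
    if len(widths) > 1:
        problems.append("unequal intervals")

    # Guideline 1 (no overlap) and guideline 3 (no skipped values)
    for (lo1, up1), (lo2, up2) in zip(classes, classes[1:]):
        if lo2 <= up1:
            problems.append(f"overlap between {lo1}-{up1} and {lo2}-{up2}")
        elif lo2 > up1 + 1:
            problems.append(f"gap between {up1} and {lo2}")

    # Guideline 2: the set of classes should be exhaustive
    if classes[0][0] > low or classes[-1][1] < high:
        problems.append("not exhaustive")

    return problems

# Family sizes on the slides run from 1 to 14
print(check_classes([(0, 4), (5, 9), (10, 14)], 1, 14))     # a "Correct" example
print(check_classes([(0, 5), (5, 10), (10, 15)], 1, 14))    # overlapping classes
print(check_classes([(0, 3), (4, 7), (8, 11)], 1, 14))      # no place for 14
print(check_classes([(0, 4), (8, 12), (14, 19)], 1, 14))    # skips 5, 6, 7
```

Each "Wrong" example from the slides trips at least one check, and the "Correct" example returns an empty list.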
Data (sorted, reading down each column): 8 12 14 17 19 24 8 12 14 17 20 25 9 13 15 17 20 25 10 13 15 17 20 25 11 13 16 17 20 27 11 13 16 17 21 28 11 14 16 18 21 29 11 14 16 18 22 11 14 16 18 23 11 14 16 19 24
4. Selecting number of classes is subjective
Generally 5 - 15 classes will work
How about 6 classes (“bins”)?
How about 16 classes (“bins”)?
How about 8 classes (“bins”)?
Data (sorted, reading down each column): 8 12 14 17 19 24 8 12 14 17 20 25 9 13 15 17 20 25 10 13 15 17 20 25 11 13 16 17 20 27 11 13 16 17 21 28 11 14 16 18 21 29 11 14 16 18 22 11 14 16 18 23 11 14 16 19 24
5. Class width should be round (easy) numbers
Lower boundary can be a multiple of the interval size
Round numbers: 5, 10, 15, 20 etc. or 3, 6, 9, 12 etc.
Clear & Easy: 8 - 11, 12 - 15, 16 - 19, 20 - 23, 24 - 27, 28 - 31
Remember: This is all about helping readers understand quickly and clearly.
6. Try to avoid open-ended classes
For example:
• 10 and above
• Greater than 100
• Less than 50
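One way to follow guideline 5 is to start the lowest class at the largest multiple of the interval size that does not exceed the smallest score, then step up by the interval size. This is a sketch under that assumption; the function name `class_limits` is hypothetical.

```python
def class_limits(smallest, largest, width):
    """Inclusive class limits whose lower boundaries are multiples of width."""
    lower = (smallest // width) * width   # round down to a multiple of width
    limits = []
    while lower <= largest:
        limits.append((lower, lower + width - 1))
        lower += width
    return limits

# The data on this slide run from 8 to 29; try an interval size of 4
for lower, upper in class_limits(8, 29, 4):
    print(f"{lower} - {upper}")
```

For these data the result matches the "Clear & Easy" classes on the slide: 8 - 11 through 28 - 31.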
Let’s do one
Scores on an exam
Step 1: List scores
82 58 64 80 75 72 87 73 88 94 84 78 93 69 70 60 53 84 76 87 84 61 89 95 87 91 75 99
Step 2: List scores in order
53 58 60 61 64 69 70 72 73 75 75 76 78 80 82 84 84 84 87 87 87 88 89 91 93 94 95 99
Step 3: Decide whether grouped or ungrouped
If fewer than 10 groups, “ungrouped” is fine
If more than 10 groups, “grouped” might be better
How to figure how many values: largest number - smallest number + 1
99 - 53 + 1 = 47
Step 4: Generate number and size of intervals (or size of bins)
Sample size (n)    Number of classes
10 - 16            5
17 - 32            6
33 - 64            7
65 - 128           8
129 - 255          9
256 - 511          10
512 - 1,024        11
If we have 6 bins, we’d have intervals of 8
Let’s just try it and see which we prefer… Whaddya think? Would intervals of 5 be easier to read?
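The arithmetic in Steps 3 and 4 can be sketched as follows. The lookup table is copied from the slide; rounding the interval size up to a whole number is an assumption (the slide only states the result, 8), and the names `CLASS_TABLE` and `suggested_classes` are hypothetical.

```python
import math

scores = [82, 58, 64, 80, 75, 72, 87, 73, 88, 94, 84, 78, 93, 69,
          70, 60, 53, 84, 76, 87, 84, 61, 89, 95, 87, 91, 75, 99]

n = len(scores)

# Step 3: how many possible values? largest - smallest + 1
span = max(scores) - min(scores) + 1      # 99 - 53 + 1

# Step 4: table from the slide as (lowest n, highest n, number of classes)
CLASS_TABLE = [(10, 16, 5), (17, 32, 6), (33, 64, 7), (65, 128, 8),
               (129, 255, 9), (256, 511, 10), (512, 1024, 11)]

def suggested_classes(sample_size):
    """Look up the suggested number of classes for a sample size."""
    for low, high, classes in CLASS_TABLE:
        if low <= sample_size <= high:
            return classes
    raise ValueError("sample size is outside the table")

k = suggested_classes(n)

# Assumption: round the interval size up so the classes cover every value
interval = math.ceil(span / k)

print(n, span, k, interval)
```

With 28 scores the table suggests 6 classes, and 47 possible values spread over 6 classes gives an interval of 8, as on the slide.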
Scores on an exam
82 58 64 80 75 72 87 73 88 94 84 78 93 69 70 60 53 84 76 87 84 61 89 95 87 91 75 99
In order: 53 58 60 61 64 69 70 72 73 75 75 76 78 80 82 84 84 84 87 87 87 88 89 91 93 94 95 99
Let’s just try it and see which we prefer…
6 bins, interval of 8
Score       Frequency
93 - 100    4
85 - 92     6
77 - 84     6
69 - 76     7
61 - 68     2
53 - 60     3
10 bins, interval of 5
Score       Frequency
95 - 99     2
90 - 94     3
85 - 89     5
80 - 84     5
75 - 79     4
70 - 74     3
65 - 69     1
60 - 64     3
55 - 59     1
50 - 54     1
Remember: This is all about helping readers understand quickly and clearly.
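Both frequency tables can be reproduced by counting how many scores fall inside each class. This is a sketch; the function name `grouped_frequencies` and its parameters are assumptions, and the classes are listed from lowest to highest (the slide lists them highest first).

```python
scores = [82, 58, 64, 80, 75, 72, 87, 73, 88, 94, 84, 78, 93, 69,
          70, 60, 53, 84, 76, 87, 84, 61, 89, 95, 87, 91, 75, 99]

def grouped_frequencies(data, start, width, n_classes):
    """Count the observations in each inclusive class [lower, upper]."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1
        count = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, count))
    return table

six_bins = grouped_frequencies(scores, start=53, width=8, n_classes=6)
ten_bins = grouped_frequencies(scores, start=50, width=5, n_classes=10)

for lower, upper, count in ten_bins:
    print(f"{lower} - {upper}: {count}")
```

Either way, the frequencies add up to 28, which is a quick check that the classes are mutually exclusive and exhaustive for these scores.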
Scores on an exam
82 58 64 80 75 72 87 73 88 94 84 78 93 69 70 60 53 84 76 87 84 61 89 95 87 91 75 99
Score       Frequency
95 - 99     2
90 - 94     3
85 - 89     5
80 - 84     5
75 - 79     4
70 - 74     3
65 - 69     1
60 - 64     3
55 - 59     1
50 - 54     1
Let’s make a frequency histogram using 10 bins and bin width of 5!!
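As a stand-in for the drawn histogram, here is a text sketch built directly from the 10-bin frequency table on this slide: one bar per class, with bar length equal to the frequency. The bars run sideways rather than upward, and the function name `text_histogram` is an assumption.

```python
# Frequency table from the slide, listed from the lowest class up
table = [("50-54", 1), ("55-59", 1), ("60-64", 3), ("65-69", 1),
         ("70-74", 3), ("75-79", 4), ("80-84", 5), ("85-89", 5),
         ("90-94", 3), ("95-99", 2)]

def text_histogram(table):
    """One row per class: label, a bar of '#' marks, and the frequency."""
    rows = []
    for label, freq in table:
        rows.append(f"{label:>7} | {'#' * freq} ({freq})")
    return "\n".join(rows)

print(text_histogram(table))
```

Unlike the dot plot, each bar here stands for a whole class of width 5 rather than a single score.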
Scores on an exam
82 58 64 80 75 72 87 73 88 94 84 78 93 69 70 60 53 84 76 87 84 61 89 95 87 91 75 99
Step 6: Complete the Frequency Table (10 bins, interval of 5)
Score      Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
95 - 99    2           .0714                28                     1.0000
90 - 94    3           .1071                26                     .9286
85 - 89    5           .1786                23                     .8214
80 - 84    5           .1786                18                     .6429
75 - 79    4           .1429                13                     .4643
70 - 74    3           .1071                9                      .3214
65 - 69    1           .0357                6                      .2143
60 - 64    3           .1071                5                      .1786
55 - 59    1           .0357                2                      .0714
50 - 54    1           .0357                1                      .0357
Relative frequency: just dividing each frequency by total number to get a ratio (like a percent)
Please note: 1/28 = .0357, 3/28 = .1071, 4/28 = .1429
Cumulative frequency: just adding up the frequency data from the smallest to largest numbers
Cumulative relative frequency: just adding up the relative frequency data from the smallest to largest numbers
Please note: also just dividing cumulative frequency by total number: 1/28 = .0357, 2/28 = .0714, 5/28 = .1786
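The three extra columns can be computed from the frequency column alone. In this sketch the cumulative relative frequency is obtained by dividing each cumulative frequency by the total, which avoids the small drift that comes from adding up already-rounded relative frequencies. Variable names are assumptions, and the classes run from lowest to highest.

```python
from itertools import accumulate

# Frequencies for classes 50-54, 55-59, ..., 95-99 (lowest class first)
freqs = [1, 1, 3, 1, 3, 4, 5, 5, 3, 2]
total = sum(freqs)

# Relative frequency: each frequency divided by the total number
relative = [round(f / total, 4) for f in freqs]

# Cumulative frequency: running total from the smallest class up
cumulative = list(accumulate(freqs))

# Cumulative relative frequency: cumulative frequency divided by the total
cumulative_relative = [round(c / total, 4) for c in cumulative]

for f, r, c, cr in zip(freqs, relative, cumulative, cumulative_relative):
    print(f"{f:>2}  {r:.4f}  {c:>2}  {cr:.4f}")
```

The last cumulative frequency equals the sample size (28), and the last cumulative relative frequency equals 1.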
Scores on an exam
82 58 64 80 75 72 87 73 88 94 84 78 93 69 70 60 53 84 76 87 84 61 89 95 87 91 75 99
Where are we?
Score      Frequency   Relative Frequency   Cumulative Frequency   Cumulative Rel. Freq.
95 - 99    2           .0714                28                     1.0000
90 - 94    3           .1071                26                     .9286
85 - 89    5           .1786                23                     .8214
80 - 84    5           .1786                18                     .6429
75 - 79    4           .1429                13                     .4643
70 - 74    3           .1071                9                      .3214
65 - 69    1           .0357                6                      .2143
60 - 64    3           .1071                5                      .1786
55 - 59    1           .0357                2                      .0714
50 - 54    1           .0357                1                      .0357
Figures: Cumulative Frequency Data; Cumulative Frequency Histogram