Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) 02-250-01 Lecture 2
Sign Up for Participant Pool!! • see Psychology research firsthand! • earn up to 2 bonus points • HOW???? • sign up on the web (takes less than 5 minutes): • www.uwindsor.ca/psychology/signup • or access it through the psych homepage • You MUST sign up by May 19 to be included
Major Points Today • Types of Measurement • Summation Notation • Organizing Data • Stem and Leaf Displays • Graphs • Measures of Central Tendency
Types of Measurement • There are 4 types of measurement most often used in statistics: • Nominal • Ordinal • Interval • Ratio
Nominal Measurement • Nominal Measurement: the classification of observations into a set of categories • The numbers produced by nominal measurement are frequencies of occurrence in the categories (e.g., 22 ducks, 12 chickens, 2 geese, etc.)
Nominal Measurement cont. • A second example is gender – 2 categories, male and female • Nominal measurement applies to qualitative variables - elements are assigned to a category because they possess one characteristic or another • Nominal data is also termed qualitative data
Ordinal Measurement • Ordinal Measurement: the rank ordering of elements on a continuum • Ordinal measurement does not measure the amount of the variable; it represents the individual’s placement on a continuum (i.e., a ranking; e.g., the winner of a race is in “first place”)
Ordinal Measurement cont. • It is important to note that the difference in the amount of the variable between adjacent rank positions is not necessarily constant: the difference in talent between the 1st- and 2nd-place finishers in a race cannot be assumed to be the same as the difference in talent between the 5th- and 6th-place finishers • Ordinal data can tell you that the person in 1st place finished before the person in 3rd place, but not by how much
Interval Measurement • Interval Measurement: the assignment of numerical quantity to the variable in a way that: • the number assigned reflects the amount of the variable • the size of the measurement unit remains constant • and the zero point is defined arbitrarily and does not represent an absence of the property being measured
Interval Measurement cont. • The classic example is temperature in degrees Celsius • 40°C represents how hot something is (the amount of heat it has) • The unit of measurement (1°C) represents the same amount of heat regardless of where it occurs in the range of measurement (the change in temperature from 25°C to 26°C is the same as the change from 32°C to 33°C) • The zero point (0°C) is arbitrary: it is the point at which water freezes, not the absence of temperature
Interval Measurement cont. • Interval measurement can produce negative numbers (e.g., -10°C), whereas nominal and ordinal measurement do not
Ratio Measurement • Ratio Measurement: The assignment of numerical quantity to the variable in such a way that: • the number assigned reflects the amount of the variable • the size of the measurement unit remains constant • and the zero point represents an absence of the property being measured
Ratio Measurement cont. • Good examples are time and length • A ratio scale cannot produce negative numbers • For the statistical procedures covered in this course, interval and ratio measurement are treated the same way, so they are often referred to together as interval/ratio data
Summation Notation • We commonly use the letters “X” and “Y” to represent the variables we have measured • The upper-case Greek letter sigma (Σ) is known as the summation operator; it means “the sum of”
Example • Suppose we record, for 6 days, every time someone slips in the CAW Student Centre Cafeteria (represented by X). The data may look like this:
Data Example
Day     X
Mon     10
Tues    5
Weds    12
Thurs   11
Fri     21
Sat     28
ΣX • $\sum X$ means the sum of all the X scores, so that: • $\sum X = X_1 + X_2 + X_3 + \cdots + X_N = 10 + 5 + 12 + 11 + 21 + 28 = 87$ • Note: $X_1$ means the first X score and $X_N$ means the last X score
(ΣX)² • $(\sum X)^2$ means the square of the sum (total all the numbers within the parentheses, then square the total), so that: • $(\sum X)^2 = (X_1 + X_2 + X_3 + \cdots + X_N)^2 = (10 + 5 + 12 + 11 + 21 + 28)^2 = (87)(87) = 7569$
ΣX² • $\sum X^2$ means the sum of the squares (square each number, then sum), so that: • $\sum X^2 = X_1^2 + X_2^2 + X_3^2 + \cdots + X_N^2 = 10^2 + 5^2 + 12^2 + 11^2 + 21^2 + 28^2 = 100 + 25 + 144 + 121 + 441 + 784 = 1615$
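The three summation expressions above differ only in the order of operations (add then square, or square then add). A minimal Python sketch using the slip counts from the example makes the difference concrete; the variable names are illustrative, not part of the course notation.

```python
# Daily slip counts from the example above (Mon..Sat)
X = [10, 5, 12, 11, 21, 28]

sum_x = sum(X)                           # ΣX: add the scores
square_of_sum = sum(X) ** 2              # (ΣX)²: add first, then square the total
sum_of_squares = sum(x ** 2 for x in X)  # ΣX²: square each score, then add

print(sum_x, square_of_sum, sum_of_squares)  # 87 7569 1615
```

Note that (ΣX)² and ΣX² are very different numbers (7569 vs. 1615), which is why the parentheses matter.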
More Summation Notation • Suppose you also keep track of the number of pieces of garbage dropped on the floor of the CAW Student Centre for the same days as above (variable Y) and the data were as follows:
Example Data
Day     X     Y
Mon     10    210
Tues    5     160
Weds    12    245
Thurs   11    240
Fri     21    340
Sat     28    415
ΣXY • $\sum XY$ means the sum of the products: • $\sum XY = (X_1)(Y_1) + (X_2)(Y_2) + (X_3)(Y_3) + \cdots + (X_N)(Y_N) = (10)(210) + (5)(160) + (12)(245) + (11)(240) + (21)(340) + (28)(415) = 2100 + 800 + 2940 + 2640 + 7140 + 11620 = 27240$
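ΣXY pairs each X score with the Y score from the same day before adding. A short Python sketch of that pairing, using the slip and garbage counts from the example (variable names are illustrative):

```python
X = [10, 5, 12, 11, 21, 28]         # slips per day (Mon..Sat)
Y = [210, 160, 245, 240, 340, 415]  # pieces of garbage per day (same days)

# zip() pairs the lists position by position, so both lists must list the
# days in the same order and have the same length.
assert len(X) == len(Y)

# ΣXY: multiply each X by the Y from the same day, then add the products
sum_xy = sum(x * y for x, y in zip(X, Y))
print(sum_xy)  # 27240
```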
Organizing Data • Frequency Distributions: A frequency distribution is a table which shows the number of individuals or events that occurred at each measurement value • this is the most common form of organizing data
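Building a frequency distribution amounts to counting how many times each value occurs and listing the counts in order. A minimal Python sketch, assuming the raw scores are available as a list; the function name and the sample ages are hypothetical, invented only for illustration:

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (value, frequency) pairs, one per distinct value, in sorted order."""
    counts = Counter(scores)       # tally how many times each value occurs
    return sorted(counts.items())  # sort by value so the table reads top to bottom

# Hypothetical raw ages, invented for illustration only
ages = [19, 20, 19, 18, 21, 19, 20]
for age, freq in frequency_distribution(ages):
    print(age, freq)
```

The same function also works for nominal categories (e.g., a list of animal labels), in which case the rows come out in alphabetical order.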
Frequency Distributions • The following hypothetical frequency distribution shows the number of women in different majors at the University of Windsor:
Major         # of Women
Art           15
Biology       35
Chemistry     34
Music         85
Psychology    97
Frequency Distributions • This frequency distribution organizes the data into nominal categories (by major) • Frequency distributions can also organize data by points of measurement on a continuous variable, as follows:
Frequency Distributions
Age of Students in 02-250:
Age   Frequency
18    14
19    85
20    58
21    40
22    35
23    16
24    10
25    6
26    4
Frequency Distributions • As a rule of thumb, a frequency distribution should not exceed 15 to 20 lines, because the point is to summarize the data in a way that represents all the information concisely • When the data contain more distinct values than fit in about 20 lines, the scores can be grouped into ranges known as class intervals, as in this example:
Class Interval Example
Canada Population Estimates for the Year 2016 (in millions)
Age        Pop
0 - 4      2.05
5 - 9      2.07
10 - 14    2.12
15 - 19    2.19
20 - 24    2.38
25 - 29    2.48
30 - 34    2.54
35 - 39    2.53
40 - 44    2.51
45 - 49    2.57
50 - 54    2.79
55 - 59    2.69
60 - 64    2.31
65 - 69    1.97
70 - 74    1.42
75 - 79    0.99
80 - 84    0.71
85 - 89    0.47
90 +       0.33
Frequency Distributions cont. • Looking at the frequency distribution tells us: • The most frequently occurring ages are expected to fall in the 50-54 class interval, because it has the largest population estimate (2.79 million) • The age frequencies are expected to be fairly evenly distributed from 0 to 70 years old and then fall off • The expected distribution of ages is not symmetrical: very low (young) and very high (old) ages do not occur with equal likelihood
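Finding the class interval with the largest frequency (here, the largest population estimate) is a one-line lookup once the table is stored as a mapping. A small Python sketch, with the estimates copied from the 2016 table; the dictionary layout is just one convenient choice:

```python
# Population estimates (millions) by class interval, copied from the 2016 table
population = {
    "0 - 4": 2.05, "5 - 9": 2.07, "10 - 14": 2.12, "15 - 19": 2.19,
    "20 - 24": 2.38, "25 - 29": 2.48, "30 - 34": 2.54, "35 - 39": 2.53,
    "40 - 44": 2.51, "45 - 49": 2.57, "50 - 54": 2.79, "55 - 59": 2.69,
    "60 - 64": 2.31, "65 - 69": 1.97, "70 - 74": 1.42, "75 - 79": 0.99,
    "80 - 84": 0.71, "85 - 89": 0.47, "90 +": 0.33,
}

# max() over the keys, ranked by each key's population estimate
modal_interval = max(population, key=population.get)
print(modal_interval, population[modal_interval])  # 50 - 54 2.79
```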
Frequency Distributions cont. • Dividing the data into class intervals makes the data more accessible • Data that have been divided into class intervals are sometimes referred to as grouped data
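Grouping raw scores into class intervals can be sketched by mapping each score to the interval that contains it and then counting. This Python sketch assumes integer scores and equal-width intervals that start at multiples of the width (0 - 4, 5 - 9, ...), like the 5-year age groups in the population table; the function names and sample ages are hypothetical, and an open-ended top class such as "90 +" is not handled.

```python
from collections import Counter

def class_interval(score, width=5):
    """Return the (lower, upper) limits of the class interval holding `score`.

    Assumes integer scores and intervals starting at multiples of `width`.
    An open-ended top class such as "90 +" is not handled here."""
    lower = (score // width) * width
    return (lower, lower + width - 1)

def grouped_frequencies(scores, width=5):
    """Tally scores by class interval, returned in ascending order."""
    return sorted(Counter(class_interval(s, width) for s in scores).items())

# Hypothetical raw ages, invented for illustration only
ages = [3, 4, 17, 52, 53, 54, 55]
for (low, high), freq in grouped_frequencies(ages):
    print(f"{low} - {high}: {freq}")
```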
Cumulative Frequency Distributions • Frequency distributions can be made to contain more information, as when a column of cumulative frequencies is added • Cumulative Frequency Distribution: A table in which the frequency of individuals or events at each measurement value is added to the previous frequencies, so that each line gives the total frequency at that measurement value and all lower values
Cumulative Frequency Ex.
Age of Students in 02-250:
Age   Frequency   Cumulative Frequency
18    14          14
19    85          99
20    58          157
21    40          197
22    35          232
23    16          248
24    10          258
25    6           264
26    4           268
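The cumulative frequency column is a running total of the frequency column. A minimal Python sketch using the student-age frequencies from the table, with `itertools.accumulate` doing the running sum:

```python
from itertools import accumulate

ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]

# Running total: each entry is the frequency at that age plus all lower ages
cumulative = list(accumulate(freqs))

for age, f, cf in zip(ages, freqs, cumulative):
    print(age, f, cf)
```

A quick check on any cumulative column: its last entry must equal the total number of observations (here 268).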
More Frequency Distributions • Frequency distributions can also contain information about the percentages and cumulative percentages of observations at the various scores:
More Frequency Distributions
Age of Students in 02-250:
Age   Frequency   Cumulative Frequency   %       Cumulative %
18    14          14                     5.22    5.22
19    85          99                     31.72   36.94
20    58          157                    21.64   58.58
21    40          197                    14.93   73.51
22    35          232                    13.06   86.57
23    16          248                    5.97    92.54
24    10          258                    3.73    96.27
25    6           264                    2.24    98.51
26    4           268                    1.49    100.00
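The percentage columns divide each frequency (or cumulative frequency) by the total number of observations and multiply by 100. A short Python sketch reproducing the two columns from the student-age frequencies, rounded to two decimals as in the table:

```python
from itertools import accumulate

freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]  # ages 18..26, from the table
n = sum(freqs)                              # 268 students in total

percent = [round(100 * f / n, 2) for f in freqs]
# Compute cumulative % from the cumulative frequencies (not by adding up the
# rounded percentages), so rounding errors cannot build up down the column.
cumulative_percent = [round(100 * cf / n, 2) for cf in accumulate(freqs)]

print(percent[6], cumulative_percent[6])  # 3.73 96.27
```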
Exact Limits • All measurements are expressed in discrete units, such as seconds or centimeters • No matter how small the unit of measurement, it is always possible to imagine a finer one • e.g., a length recorded in centimeters could be recorded more precisely in millimeters (1 cm = 10 mm)
Exact Limits • So, for continuous variables, any measure should be viewed as representing a range of values • This range has a width equal to the unit of measurement used, and the boundaries of this range are the exact limits of the measure
Exact Limits • E.g., if we say an event lasted 12 seconds, we mean it is closer to 12 seconds than to 11 or 13 seconds. A score of 12 therefore represents a range of values: this range is one second wide (one unit of measurement) and extends from 11.5 to 12.5 seconds
Exact Limits • Exact limits identify the upper and lower ends of the range represented by the raw score and are the real boundaries of the measure in question
Exact Limits • Exact Limits: Values one-half unit of measurement above and below the score or class interval. Exact limits are the boundaries of the range of values represented by the measure • Some authors refer to exact limits as real limits
Exact Limits Examples
Measure   Exact Limits
52        51.5 - 52.5
51        50.5 - 51.5
52.2      52.15 - 52.25
52.1      52.05 - 52.15
Exact Limits Examples
Measure   Exact Limits
50.02     50.015 - 50.025
50.01     50.005 - 50.015

Class Interval   Exact Limits
50 - 54          49.5 - 54.5
55 - 59          54.5 - 59.5
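Since exact limits sit one-half unit of measurement below and above a score (or below the lower end and above the upper end of a class interval), they can be computed mechanically. This Python sketch makes one assumption that the slides only imply: the unit of measurement is inferred from the number of decimal places written (52 has unit 1, 52.2 has unit 0.1, 50.02 has unit 0.01). The function names are hypothetical; `Decimal` is used instead of `float` so the limits come out exact rather than with binary floating-point artifacts.

```python
from decimal import Decimal

def exact_limits(measure):
    """Exact (real) limits of a measure written as a string, e.g. "52.2".

    Assumption: the unit of measurement is inferred from the decimal places
    written (52 -> unit 1, 52.2 -> unit 0.1, 50.02 -> unit 0.01)."""
    value = Decimal(measure)
    unit = Decimal(1).scaleb(value.as_tuple().exponent)  # 1, 0.1, 0.01, ...
    half = unit / 2
    return value - half, value + half

def interval_exact_limits(lower, upper):
    """Exact limits of a class interval: half a unit below its lower end
    and half a unit above its upper end."""
    low_limit, _ = exact_limits(lower)
    _, high_limit = exact_limits(upper)
    return low_limit, high_limit

for m in ["52", "51", "52.2", "52.1", "50.02", "50.01"]:
    low, high = exact_limits(m)
    print(m, low, "-", high)
print("50 - 54", *interval_exact_limits("50", "54"))
```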
Stem-and-Leaf Displays • Stem-and-Leaf Display: partitions each score into a “stem” and a “leaf” and groups the scores according to common stems • The “Leaf” is the rightmost digit • The “Stem” is the digit (or digits) to the left of the leaf (the stem is 0 for 1 digit numbers)
Stem-and-Leaf E.g.,
Number   Stem   Leaf
4        0      4
54       5      4
123      12     3
1234     123    4
• The numbers 24 and 26 have different “leaves” (4 and 6) but the same stem (2)
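For non-negative whole numbers, splitting a score into stem and leaf is integer division by 10: the remainder is the rightmost digit (the leaf) and the quotient is everything to its left (the stem). A minimal Python sketch; the function name is hypothetical, and negative or decimal scores would need different handling:

```python
def stem_and_leaf(n):
    """Split a non-negative integer into (stem, leaf); the leaf is the rightmost digit."""
    return divmod(n, 10)  # quotient = stem, remainder = leaf

for n in [4, 54, 123, 1234, 24, 26]:
    print(n, *stem_and_leaf(n))
```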
Stem-and-Leaf • Consider this raw data and their stem-and-leaf plot:
Stem-and-Leaf
Stem | Leaf
   3 | 6
   4 | 477
   5 | 05899
   6 | 01225788
   7 | 24559
   8 | 578
   9 | 2
Data: 36, 44, 47, 47, 50, 55, 58, 59, 59, 60, 61, 62, 62, 65, 67, 68, 68, 72, 74, 75, 75, 79, 85, 87, 88, 92
Stem-and-Leaf • Or this example: • Data: 102, 104, 115, 116, 116, 125, 127, 128, 129, 129, 131, 136, 137, 145, 145
Stem | Leaf
  10 | 24
  11 | 566
  12 | 57899
  13 | 167
  14 | 55
Stem-and-Leaf • Unlike a frequency distribution table, a stem-and-leaf plot shows the overall shape of the distribution at a glance (e.g., evenly spread or bunched, symmetrical or nonsymmetrical) while keeping every individual score • Note: Make sure you include every instance of a given value; e.g., if 57 occurs 3 times in the data set, it should appear in the stem-and-leaf display as a stem of 5 with three 7s in the leaf.
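A full stem-and-leaf display can be built by sorting the scores, splitting each into stem and leaf, and collecting the leaves per stem; duplicates are kept, so a value that occurs three times contributes three leaves. This Python sketch assumes non-negative integer scores. Printing a row for stems that have no scores is a choice made here (so gaps stay visible), not something the slides specify, and the function name is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf_display(data):
    """Return the lines of a stem-and-leaf display for non-negative integers."""
    leaves = defaultdict(str)
    for value in sorted(data):         # sorting puts each stem's leaves in order
        stem, leaf = divmod(value, 10)
        leaves[stem] += str(leaf)      # every instance is kept, duplicates included
    width = len(str(max(leaves)))      # right-align stems of different lengths
    return [f"{stem:>{width}} | {leaves[stem]}".rstrip()
            for stem in range(min(leaves), max(leaves) + 1)]

data = [36, 44, 47, 47, 50, 55, 58, 59, 59, 60, 61, 62, 62, 65, 67, 68, 68,
        72, 74, 75, 75, 79, 85, 87, 88, 92]
print("\n".join(stem_and_leaf_display(data)))
```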
Graphs • Graph refers to all manner of pictorial, or graphic, representation of data • We will consider histograms and frequency polygons
Graphs • The horizontal axis (X axis) is labeled with units representing points of measurement, and the vertical axis (Y axis) is labeled with values representing frequency of occurrence • Histograms and frequency polygons are two-dimensional, graphical representations of frequency distributions
Histogram • Histogram: A graphic in which the horizontal axis identifies points of measurement, and the vertical axis represents frequency of occurrence • Solid bars are used to represent the frequency at each point of measurement (a histogram is a bar graph)
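To see how a frequency distribution turns into a histogram, here is a rough console sketch in Python: points of measurement run along the horizontal axis and bar height shows frequency. It is only a text approximation (real histograms are drawn with plotting software and have touching bars); the function name and the `scale` parameter are hypothetical, each row of the chart stands for up to `scale` observations (heights are rounded up so small frequencies still show), and labels are assumed to be at most two characters wide.

```python
import math

def text_histogram(points, freqs, scale=10):
    """Return the lines of a vertical text histogram.

    Each row stands for up to `scale` observations; heights are rounded up,
    so any nonzero frequency gets at least one row."""
    heights = [math.ceil(f / scale) for f in freqs]
    lines = []
    for level in range(max(heights), 0, -1):  # draw from the top down
        row = "".join("## " if h >= level else "   " for h in heights)
        lines.append(row.rstrip())
    lines.append("-" * (3 * len(points)))     # horizontal (X) axis
    lines.append("".join(f"{p:<3}" for p in points).rstrip())
    return lines

# Age of Students in 02-250, from the frequency distribution above
ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]
print("\n".join(text_histogram(ages, freqs)))
```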