PPA 501 – Analytical Methods in Administration. Lecture 5a - Counting and Charting Responses. Percentages and Proportions. Percentages and proportions supply a frame of reference for reporting research results by standardizing the raw data: percentages by base 100 and proportions by base 1.00.
PPA 501 – Analytical Methods in Administration Lecture 5a - Counting and Charting Responses
Percentages and Proportions • Percentages and proportions supply a frame of reference for reporting research results by standardizing the raw data: percentages by base 100 and proportions by base 1.00.
Percentages and Proportions • Example from IAEM-NEMA Survey, 2006.
Percentages and Proportions • Guidelines. • When working with a small number of cases, report the actual frequencies. • Always report the number of observations along with proportions and percentages. • Proportions and percentages can be used for any level of measurement.
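The standardization described above can be sketched in a few lines of Python. The category labels and counts below are hypothetical (they are not the IAEM-NEMA figures), and the helper names `proportion` and `percentage` are illustrative only. Following the guidelines, the number of observations is printed alongside each result.

```python
def proportion(f, n):
    """Category frequency divided by the total number of cases (base 1.00)."""
    return f / n


def percentage(f, n):
    """The same quantity expressed on a base of 100."""
    return 100 * f / n


# Hypothetical counts, for illustration only.
counts = {"Agree": 30, "Neutral": 12, "Disagree": 18}
n = sum(counts.values())

for category, f in counts.items():
    print(f"{category}: f={f}, proportion={proportion(f, n):.2f}, "
          f"percentage={percentage(f, n):.1f}% (N={n})")
```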
Ratios and Rates • We determine ratios by dividing the frequency of one category by another.
Ratios and Rates • The ratio of people who agree that the FEMA response was inadequate to those who disagree is (27+15)/(24+7) = 42/31 = 1.35 to 1. That is, for every 10 people who disagree, there are 13.5 who agree. • Rates are defined as the number of actual occurrences of some phenomenon divided by the number of possible occurrences per some unit of population.
Ratios and Rates • Example: In the IAEM-NEMA Survey (Local), I asked how many emergency managers would rank wildfires as the most likely source of catastrophic disaster in their jurisdiction. • The survey result indicated that eight out of 111 respondents believed this to be true. Expressed as a rate per 1,000 emergency managers, this would be (8/111)*1000, or 72.1 emergency managers per 1,000 who believe fires to be the most likely cause of catastrophic disasters in their jurisdiction.
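The two calculations above can be reproduced with a short Python sketch. The numbers are the ones quoted in the slides; the function names `ratio` and `rate` and the `per` parameter are assumptions made for illustration.

```python
def ratio(f1, f2):
    """Frequency of one category divided by the frequency of another."""
    return f1 / f2


def rate(actual, possible, per=1000):
    """Actual occurrences divided by possible occurrences, per unit of population."""
    return actual / possible * per


# FEMA response: (27 + 15) who agree versus (24 + 7) who disagree.
agree_to_disagree = ratio(27 + 15, 24 + 7)

# Wildfire question: 8 of 111 respondents, expressed per 1,000 emergency managers.
wildfire_rate = rate(8, 111, per=1000)

print(f"Ratio: {agree_to_disagree:.2f} to 1")
print(f"Rate: {wildfire_rate:.1f} per 1,000")
```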
Frequency Distributions • Tables that summarize the distribution of a variable by reporting the number of cases contained in each category of the variable. • Helpful and commonly used ways of organizing and working with data. • Almost always the first step in any statistical analysis. • The problem is that raw data rarely reveal any consistent pattern. Data must be grouped to identify patterns.
Frequency Distributions • The categories of the frequency distribution must be exhaustive and mutually exclusive. (Each case must be counted in one and only one category). • Frequency distributions must have a descriptive title, clearly labeled categories, percentages, cumulative percentages, and a report of the total number of cases.
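A minimal sketch of building such a table in Python follows. The responses are hypothetical, and `frequency_table` is an illustrative helper, not part of any library. It rejects values that fall outside the listed categories, which is one simple way to enforce the exhaustive and mutually exclusive requirement.

```python
from collections import Counter


def frequency_table(values, categories):
    """Return (rows, n); each row is (category, f, percent, cumulative percent)."""
    unknown = set(values) - set(categories)
    if unknown:
        # Categories must be exhaustive: every case belongs to exactly one.
        raise ValueError(f"uncategorized values: {sorted(unknown)}")
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for category in categories:
        f = counts[category]
        pct = 100 * f / n
        cumulative += pct
        rows.append((category, f, pct, cumulative))
    return rows, n


# Hypothetical ordinal responses, for illustration only.
responses = ["Disagree"] * 4 + ["Neutral"] * 6 + ["Agree"] * 10
rows, n = frequency_table(responses, ["Disagree", "Neutral", "Agree"])

print("Hypothetical agreement with a survey item")
for category, f, pct, cum in rows:
    print(f"{category:<10}{f:>4}{pct:>8.1f}{cum:>8.1f}")
print(f"Total N = {n}")
```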
Frequency Distributions - Nominal • Table 1. Type of organization worked for ADM 612, Leadership, student
Frequency Distributions - Ordinal • Table 2. Percentage of ADM 612 students agreeing that they or their supervisors were articulate.
Frequency Distributions – Grouped Interval • Table 3. Years of emergency management experience – IAEM survey respondents.
Charts and Graphs • Researchers use charts and graphs to present their data in ways that are visually more dramatic than frequency distributions. • Pie charts and bar charts are appropriate for discrete data at any level of measurement. • Histograms and line charts or frequency polygons are used for interval and ratio variables.
PPA 501 – Analytical Methods in Administration Lecture 5b – Measures of Central Tendency
Introduction • The benefit of frequency distributions, graphs, and charts is their ability to summarize the overall shape of a distribution.
Introduction • To completely summarize a distribution, however, you need two additional pieces of information: some idea of the typical or average case in the distribution and some idea about how much variety or heterogeneity there is in the distribution. • The typical case involves measures of central tendency.
Introduction • The three most common measures of central tendency are the mode, median, and the mean. • The mode is the most common score. • The median is the middle score. • The mean is the typical score. • If the distribution has a single peak and is perfectly symmetrical, all three are the same.
Mode • The value that occurs most frequently. • Best used when dealing with nominal level variables, although it can be used for higher levels of measurement. • Limitations: some distributions have no mode or too many modes. • For ordinal and interval-ratio data, the mode may not be central to the distribution.
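The limitations noted above are easy to see with Python's standard `statistics.multimode`, which returns every value tied for the highest frequency. The data below are hypothetical.

```python
from statistics import multimode

# Nominal data with one clear mode.
organizations = ["public", "public", "nonprofit", "private"]
print(multimode(organizations))

# Two values tie for most frequent: the distribution is bimodal.
print(multimode([1, 1, 2, 2, 3]))

# Every value occurs once, so all of them tie and there is effectively no mode.
print(multimode([1, 2, 3]))
```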
Median • Always represents the exact center of a distribution of scores. • The median is the score of the case where half of the cases are higher and half of the cases are lower. If the median family income is $30,000, half of the families make less than $30,000 and half make more.
Median • Before finding the median, the scores must be arranged in order from lowest to highest or highest to lowest. • When the number of cases is odd, the central case is the median [(N+1)/2 case].
Median • When the number of cases is even, the median is the arithmetic average of the two central cases [the mean of case N/2 and case (N/2+1)]. • The median can be calculated for ordinal and interval-ratio data.
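The odd and even rules above can be sketched as a small Python function. The name `median_score` and the sample scores are hypothetical; the standard library's `statistics.median` follows the same rules.

```python
def median_score(scores):
    """Median by the positional rules: sort first, then take the central case(s)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the (N+1)/2-th case (1-based), which is index N//2 (0-based).
        return ordered[mid]
    # Even N: the mean of case N/2 and case N/2 + 1 (1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_score([7, 3, 9, 1, 5]))  # odd number of cases
print(median_score([7, 3, 9, 1]))     # even number of cases
```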
Percentiles • The median is one member of a larger group of positional measures called percentiles. • The median is the 50th percentile (50% of the scores are lower). • The 25th percentile would mean that 25% of the scores are lower (and 75% higher).
Percentiles • Deciles divide the distribution into ten equal segments. The score at the first decile has 10% of the scores lower, the score at the second decile has 20% of the scores lower, etc. • Quartiles divide the distribution into quarters. • The second quartile, the fifth decile and the median are all the same value.
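As a quick check of the last point, the sketch below uses Python's standard `statistics.quantiles` on hypothetical scores. Textbooks and software use several slightly different interpolation rules for percentiles, so the other cut points may differ a little from hand calculations; the default method is assumed here.

```python
from statistics import median, quantiles

# Hypothetical scores, for illustration only.
scores = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 20]

quartiles = quantiles(scores, n=4)   # three cut points: Q1, Q2, Q3
deciles = quantiles(scores, n=10)    # nine cut points: D1 through D9

second_quartile = quartiles[1]
fifth_decile = deciles[4]

print(second_quartile, fifth_decile, median(scores))
```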
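Mean • The calculation of the mean is straightforward: add the scores and divide by the number of scores. • Mathematical formula: $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$, where $X_i$ is the score of case $i$ and $N$ is the number of scores.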
Characteristics of the Mean • The mean is the point around which all of the scores (Xi) cancel out: the deviations of the scores from the mean sum to zero. • The sum of the squared differences from the mean is smaller than the sum of the squared differences from any other point.
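Both characteristics can be checked numerically. The following is a minimal sketch with hypothetical scores; `mean` and `sum_squared_differences` are illustrative helper names.

```python
def mean(scores):
    """Add the scores and divide by the number of scores."""
    return sum(scores) / len(scores)


def sum_squared_differences(scores, point):
    """Sum of squared differences between each score and a chosen point."""
    return sum((x - point) ** 2 for x in scores)


# Hypothetical scores, for illustration only.
scores = [2, 4, 6, 8, 10]
x_bar = mean(scores)

# Characteristic 1: deviations from the mean cancel out.
deviation_sum = sum(x - x_bar for x in scores)

# Characteristic 2: squared differences are smallest around the mean.
at_mean = sum_squared_differences(scores, x_bar)
below = sum_squared_differences(scores, x_bar - 1)
above = sum_squared_differences(scores, x_bar + 1)

print(x_bar, deviation_sum, at_mean, below, above)
```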
Characteristics of the Mean • Every score in the distribution affects it. • Advantage: the mean utilizes all the available information. • Disadvantage: a few extreme cases can make the mean misleading. • Relative to the median, the mean is always pulled in the direction of extreme scores. • Positive skew: mean higher than the median. • Median income 1998: $46,737 • Mean income 1998: $59,589 • Jerry Seinfeld income 1998: $267,000,000 (Equivalent to median income of 5,713 families) • Negative skew: mean lower than the median.
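The pull of an extreme score can be illustrated with a small Python sketch. The incomes below are hypothetical round numbers, not the 1998 figures quoted above.

```python
from statistics import mean, median

# Hypothetical family incomes, symmetric around 40,000.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
print(mean(incomes), median(incomes))

# Add one extreme high earner: the mean jumps, the median barely moves.
with_outlier = incomes + [1_000_000]
print(mean(with_outlier), median(with_outlier))
```

With the outlier included the mean sits well above the median, which is the pattern described as positive skew.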
Rules for the Selection of Measures of Central Tendency • Use the mode when: • Variables are measured at the nominal level. • You want a quick and easy measure for ordinal or interval measures. • You want to report the most common score. • Use the median when: • Variables are measured at the ordinal level. • Variables measured at the interval-ratio level have highly skewed distributions. • You want to report the central score.
Rules for the Selection of Measures of Central Tendency • Use the mean when: • Variables are measured at the interval-ratio level (except for highly skewed distributions). • You want to report the most typical score. The mean is the fulcrum that exactly balances all scores. • You anticipate additional statistical analyses.
PPA 501 – Analytical Methods in Administration Lecture 5c – Measures of Dispersion
Introduction • By themselves, measures of central tendency cannot summarize data completely. • For a full description of a distribution of scores, measures of central tendency must be paired with measures of dispersion. • Measures of dispersion assess the variability of the data. Distributions can differ widely in variability even when they have the same measures of central tendency.
Introduction • Measures of dispersion discussed. • The range and interquartile range. • Standard deviation and variance.
Range and Interquartile Range • Range: the distance between the highest and lowest scores. • Only uses two scores. • Can be misleading if there are extreme values. • Interquartile range: Only examines the middle 50% of the distribution. Formally, it is the value at the 75th percentile minus the value at the 25th percentile.
Range and Interquartile Range • Problems: only based on two scores. Ignores remaining cases in the distribution.
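A minimal sketch of both measures in Python, using hypothetical scores with one extreme value (these are not the FEMA payout figures). The quartiles come from the standard `statistics.quantiles` with its default interpolation method, which is an assumption: other percentile conventions give slightly different quartile values.

```python
from statistics import quantiles

# Hypothetical scores with one extreme value, for illustration only.
scores = [1, 3, 4, 6, 7, 9, 10, 12, 50]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Interquartile range: 75th percentile minus 25th percentile.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print(f"Range = {score_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The extreme value of 50 inflates the range but leaves the interquartile range untouched, since the IQR only looks at the middle half of the distribution.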
Range and Interquartile Range: FEMA Disaster Payouts, 1953 to 2005
The Standard Deviation • The basic limitation of both the range and the IQR is their failure to use all the scores in the distribution • A good measure of dispersion should • Use all the scores in the distribution. • Describe the average or typical deviation of the scores. • Increase in value as the distribution of scores becomes more heterogeneous.
The Standard Deviation • One way to do this is to start with the distances between every point and some central value like the mean. • The distances between the scores and the mean (Xi - Mean X) are called deviation scores. • The greater the variability, the larger the deviation scores.
The Standard Deviation • One course of action is to sum the deviations and divide by the number of cases, but the sum of the deviations is always equal to zero. • The next solution is to make all deviations positive. • Absolute value – average deviation. • Squared deviations – standard deviation.
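The steps above can be sketched in Python as follows. The scores are hypothetical, and the sketch assumes the population form of the formulas (dividing by N, the number of cases); sample formulas that divide by N - 1 would give slightly larger values.

```python
import math

# Hypothetical scores, for illustration only.
scores = [10, 20, 30, 40, 50]
n = len(scores)
x_bar = sum(scores) / n

# Deviation scores: distance of each score from the mean.
deviations = [x - x_bar for x in scores]

# Raw deviations always sum to zero, so they cannot measure spread.
deviation_sum = sum(deviations)

# Fix 1: absolute values give the average deviation.
average_deviation = sum(abs(d) for d in deviations) / n

# Fix 2: squared deviations give the variance; its square root is the
# standard deviation (population form, dividing by N).
variance = sum(d ** 2 for d in deviations) / n
standard_deviation = math.sqrt(variance)

print(deviation_sum, average_deviation, variance, round(standard_deviation, 2))
```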