Fundamental statistical characteristics I: Measures of central tendency Chapter 3
Fundamental statistical characteristics
Group indexes:
• Central tendency
• Variability (dispersion)
• Skewness (asymmetry)
• Kurtosis (peakedness)
Individual indexes:
• Position: centiles (Ci), percentiles (Pi), quartiles (Qi)
• Raw scores (Xi)
• Differential (deviation) scores (xi)
• Standard scores (Zi)
Which value represents the whole set? Around which value are most of the data located?
How are the data arranged around the center of the distribution? How far apart or close together are the data?
Are the data arranged symmetrically? Are they piled up at one end?
What shape does the distribution have? Is it flat or peaked?
To describe a data distribution we need at least two statistics:
1. One that reflects central tendency: a value that represents the whole set, around which most of the data are located.
2. Another that reflects dispersion around that value: whether the data are far apart from or close to the central value.
Central tendency measure • It is a brief summary of a mass of data, usually obtained from a sample. • It serves to describe, indirectly, the population from which the sample was drawn. • If the sample is representative, its average tells us a great deal about the average we would obtain in the population it represents.
1. Which value is repeated most often? Mo
Mode (Mo) • “The most often repeated value”. • “The value most frequently observed in a sample or population”. • “The value of the variable with the highest absolute frequency”. • It is symbolized by Mo (Fechner and Pearson).
Type I distributions: small data set
a) Unimodal distribution: • Data: [8 – 8 – 11 – 11 – 15 – 15 – 15 – 15 – 15 – 17 – 17 – 17 – 19 – 19] Mo = 15
b) Amodal distribution: • Data: [8 – 8 – 8 – 11 – 11 – 11 – 15 – 15 – 15 – 17 – 17 – 17 – 19 – 19 – 19] No mode (every value appears equally often)
c) Bimodal distribution: • Data: [8 – 9 – 9 – 10 – 10 – 10 – 10 – 11 – 11 – 13 – 13 – 13 – 13 – 15] Mo1 = 10 Mo2 = 13
d) Multimodal distribution: • Data: [8 – 8 – 9 – 9 – 9 – 10 – 11 – 11 – 11 – 12 – 12 – 13 – 13 – 13 – 14 – 15 – 15] Mo1 = 9 Mo2 = 11 Mo3 = 13
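The four small-data-set cases above can be checked with a short sketch. It assumes the slides' convention that when every observed value has the same frequency the distribution is amodal (no mode); the function name `modes` is hypothetical, and `collections.Counter` does the frequency counting.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes; [] if every value is equally frequent (amodal)."""
    counts = Counter(data)                 # absolute frequency of each value
    highest = max(counts.values())
    # Amodal case (slide convention): all distinct values share the same frequency.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

unimodal = [8, 8, 11, 11, 15, 15, 15, 15, 15, 17, 17, 17, 19, 19]
amodal = [8, 8, 8, 11, 11, 11, 15, 15, 15, 17, 17, 17, 19, 19, 19]
bimodal = [8, 9, 9, 10, 10, 10, 10, 11, 11, 13, 13, 13, 13, 15]
multimodal = [8, 8, 9, 9, 9, 10, 11, 11, 11, 12, 12, 13, 13, 13, 14, 15, 15]

for name, data in [("unimodal", unimodal), ("amodal", amodal),
                   ("bimodal", bimodal), ("multimodal", multimodal)]:
    print(name, modes(data))
```

Returning a list rather than a single number lets the same function report one mode, several modes, or none.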
Type II distributions: large data set (frequency table)
a) Unimodal distribution: Mo = 14, the most often repeated value in the table.
b) Bimodal distribution: Mo1 = 2 and Mo2 = 6, the two most often repeated values.
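With a frequency table the mode is read directly from the frequency column: it is the value (or values) with the highest absolute frequency. A minimal sketch, assuming the table is stored as a `{value: frequency}` dictionary; the function name and the table are hypothetical (the original slide's table is not preserved), and this version does not apply the amodal rule.

```python
def modes_from_table(freq_table):
    """Modes of a frequency table given as {value: absolute frequency}."""
    highest = max(freq_table.values())
    return sorted(v for v, f in freq_table.items() if f == highest)

# Hypothetical table, not the one from the original slide.
table = {1: 4, 2: 9, 3: 5, 4: 3, 5: 6, 6: 9, 7: 2}
print(modes_from_table(table))  # [2, 6]
```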
Complete the table, given that the modes are -2, -1 and 5 and that f3 = f4.
2. What is the average motivation score?
Arithmetic mean • It is the most commonly used index of central tendency. • Definition: “It is the sum of all observed values divided by the number of values”.
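Written as a formula, with X_i the raw scores and n the number of scores, the definition reads:

```latex
\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i
```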
Type I distributions: small data set • Example: the following are the numbers of items recalled by 10 children in an immediate memory task • 6 – 5 – 4 – 7 – 5 – 7 – 8 – 6 – 7 – 8
[Figure: the ten scores placed on a number line (4 to 8), with the mean at 6.3]
X̄ = (6 + 5 + 4 + 7 + 5 + 7 + 8 + 6 + 7 + 8) / 10 = 63 / 10 = 6.3
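The definition translates directly into code: add all the observed values and divide by how many there are. A minimal sketch (the function name `arithmetic_mean` is hypothetical; Python's standard `statistics.mean` gives the same result):

```python
def arithmetic_mean(values):
    """Sum of all observed values divided by the number of values."""
    values = list(values)
    return sum(values) / len(values)

memory_scores = [6, 5, 4, 7, 5, 7, 8, 6, 7, 8]
print(arithmetic_mean(memory_scores))  # 6.3
```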
In the following series, the “center of gravity” (the mean) is: 3 – 10 – 8 – 4 – 7 – 6 – 9 – 12 – 2 – 4
[Figure: the series placed on a number line (2 to 12), with the center of gravity at 6.5]
X̄ = 65 / 10 = 6.5
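The “center of gravity” idea can be made concrete: the mean is the balance point because the signed deviations of the data from it add up to zero. A small check using the standard `statistics.mean` on the series above:

```python
from statistics import mean

series = [3, 10, 8, 4, 7, 6, 9, 12, 2, 4]
center = mean(series)                      # 6.5
deviations = [x - center for x in series]  # signed distances from the mean
print(center, sum(deviations))             # 6.5 0.0
```

Any other reference point leaves the deviations unbalanced; for example, the deviations from 6 add up to 65 - 60 = 5.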
Type II distributions: large data set. Possibility 1: mean from a frequency table
[Frequency table not preserved; the mean obtained in this example is 1.65]
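When the data come as a frequency table, each value X_i is weighted by its absolute frequency f_i: X̄ = Σ X_i·f_i / n, with n = Σ f_i. A minimal sketch, assuming a `{value: frequency}` dictionary; the function name and the table are hypothetical (the slide's own table is not preserved).

```python
def mean_from_table(freq_table):
    """Mean from {value: absolute frequency}: sum of X*f divided by n = sum of f."""
    n = sum(freq_table.values())
    return sum(x * f for x, f in freq_table.items()) / n

# Hypothetical table, not the one from the original slide.
table = {0: 2, 1: 3, 2: 5}
print(mean_from_table(table))  # 1.3
```

This gives the same result as expanding the table into raw data ([0, 0, 1, 1, 1, 2, 2, 2, 2, 2]) and averaging, without building the long list.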
3. Which value is exceeded by half of the subjects? Mdn
Median (Mdn) • Definitions: • It is the point that divides the distribution into 2 equal parts. • It is the value such that the number of observations smaller than it equals the number of observations larger than it. • It is the value that occupies the central position of an ordered series of data. • 50% of the values lie above it and the other 50% lie below it.
Graphic representation • It is defined as a point (a value), not as a particular datum or measurement. • This point does not necessarily match any observed value.
Type I distributions: small data set
ODD data set: [7 – 11 – 6 – 5 – 7 – 12 – 9 – 8 – 10 – 6 – 9]
1) Sort the data from lowest to highest: [5 – 6 – 6 – 7 – 7 – 8 – 9 – 9 – 10 – 11 – 12]
2) Take the central value: with n = 11, it occupies position (11 + 1)/2 = 6. Mdn = 8
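EVEN data set: [23 – 35 – 43 – 29 – 34 – 41 – 33 – 38 – 38 – 32]
1) Sort the data from lowest to highest: [23 – 29 – 32 – 33 – 34 – 35 – 38 – 38 – 41 – 43]
2) With n = 10 there are two central values, in positions 5 and 6 (34 and 35); the median is the point halfway between them: Mdn = (34 + 35)/2 = 34.5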
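The two-step procedure (sort, then take the central value, or the midpoint of the two central values when n is even) can be sketched as follows. The function name `median` is hypothetical; Python's standard `statistics.median` follows the same rule.

```python
def median(data):
    """Median of raw data: central value (odd n) or midpoint of the two central values (even n)."""
    ordered = sorted(data)            # step 1: sort from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # position (n + 1) / 2, counting from 1
    # Even n: positions n/2 and n/2 + 1 (counting from 1) are the two central data.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [7, 11, 6, 5, 7, 12, 9, 8, 10, 6, 9]
even_set = [23, 35, 43, 29, 34, 41, 33, 38, 38, 32]
print(median(odd_set), median(even_set))  # 8 34.5
```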
Type II distributions: large data set (frequency tables)
Example: • n = 36 • Since n is even, there are 2 central data: 36/2 = 18, so the central point lies between positions 18 and 19 (position 18.5). • x18 = x19 = 10, so x18.5 = 10 and Mdn = 10.
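With a frequency table, the datum in a given absolute position is found through the cumulative frequencies: it is the first value whose cumulative frequency reaches that position. A minimal sketch, assuming a `{value: frequency}` dictionary; the function names and the table are hypothetical (the slide's n = 36 table is not preserved).

```python
from itertools import accumulate

def value_at_position(freq_table, position):
    """Value of the datum in a 1-based absolute position of the ordered data."""
    values = sorted(freq_table)
    for value, cum_f in zip(values, accumulate(freq_table[v] for v in values)):
        if position <= cum_f:   # first value whose cumulative frequency reaches the position
            return value
    raise ValueError("position is larger than n")

def median_from_table(freq_table):
    """Median from {value: absolute frequency} using cumulative frequencies."""
    n = sum(freq_table.values())
    if n % 2 == 1:
        return value_at_position(freq_table, (n + 1) // 2)
    # Even n: two central data at positions n/2 and n/2 + 1; take the point halfway between them.
    lower = value_at_position(freq_table, n // 2)
    upper = value_at_position(freq_table, n // 2 + 1)
    return (lower + upper) / 2

# Hypothetical table (n = 14), not the one from the original slide.
table = {1: 3, 2: 4, 3: 5, 4: 2}
print(median_from_table(table))  # 2.5
```

Here the central positions are 7 and 8; the cumulative frequencies (3, 7, 12, 14) place them on the values 2 and 3. When both central data share the same value, as in the slide's example, the median is simply that value.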
Comparison between measures of central tendency
• Unless there are arguments against it, we always prefer the mean:
  • Other statistics are based on the mean.
  • It is the best estimator of its population parameter.
• We prefer the median:
  • When the variable is ordinal.
  • When there are very extreme values.
  • When there are open-ended intervals.
• We prefer the mode:
  • When the variable is qualitative or nominal.
  • When the median falls in an open-ended interval.
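The reason for preferring the median when there are very extreme values can be seen by altering the memory-task scores. In this sketch the last score is replaced by a hypothetical extreme value of 80; the standard `statistics.mean` and `statistics.median` are used.

```python
from statistics import mean, median

memory_scores = [6, 5, 4, 7, 5, 7, 8, 6, 7, 8]
with_outlier = memory_scores[:-1] + [80]   # hypothetical: last score replaced by 80

print(mean(memory_scores), median(memory_scores))  # 6.3 6.5
print(mean(with_outlier), median(with_outlier))    # 13.5 6.5
```

A single extreme value more than doubles the mean, while the median does not move, because it depends only on the order of the central data.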
Example: degree of agreement with considering "shouting" a sign of aggression (an ordinal variable)
• Central tendency measures: indicate the particular value around which a given data set is located. • Position measures: give information about the relative position of a case with respect to the data set to which it belongs. They are used to interpret individual data.
Quantiles: they divide the distribution into k parts containing the same amount of data.
• Mdn: divides the distribution into 2 parts (k = 2): i/k = 1/2
• Quartiles (Qi): divide the distribution into 4 parts: Q1, Q2, Q3 (k = 4): i/k = 1/4, 2/4, 3/4
• Deciles (Di): divide the distribution into 10 parts: D1, D2, ..., D9 (k = 10): i/k = 1/10, ..., 9/10
• Percentiles (Pi): divide the distribution into 100 parts: P1, P2, ..., P99 (k = 100): i/k = 1/100, ..., 99/100
Calculating the value that corresponds to a particular quantile • 1. Translate the position measure into an absolute position. • 2. Find the value of the datum that occupies that absolute position. • The question is: What value occupies position ...?
E.g., if the 7th decile corresponds to position 20: • What value does the datum in absolute position 20 take?
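The slides do not state which rule turns i/k into an absolute position, and several conventions exist (some interpolate between neighboring data). The sketch below assumes one common choice, the nearest-rank rule, position = ceil(i·n/k), so its results may differ from the method used in the course. Function names are hypothetical, and n = 28 in the decile example is an assumption, since the slide does not give n.

```python
def quantile_position(i, k, n):
    """Nearest-rank position of the i-th k-quantile among n ordered data: ceil(i * n / k)."""
    if not 0 < i < k:
        raise ValueError("need 0 < i < k")
    return (i * n + k - 1) // k   # integer ceiling of i*n/k, avoids float rounding

def quantile_value(data, i, k):
    """Step 1: translate i/k into an absolute position. Step 2: read the datum there."""
    ordered = sorted(data)
    position = quantile_position(i, k, len(ordered))
    return ordered[position - 1]  # positions are counted from 1

scores = [5, 6, 6, 7, 7, 8, 9, 9, 10, 11, 12]
print(quantile_value(scores, 1, 4))   # Q1 -> 6
print(quantile_value(scores, 2, 4))   # Q2 -> 8, the median of this odd set
print(quantile_position(7, 10, 28))   # D7 with n = 28 -> position 20
```

The two functions mirror the two steps listed above: first the absolute position, then the value that occupies it.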