Understanding Median and Mode in Data Analysis

The median of a data set is the value that divides the ordered set of values into two equal parts, such that the number of values greater than or equal to the median equals the number of values less than or equal to the median.
• In a sample of n values, the median is the ((n+1)/2)th value in the ordered sample.
• CAUTION: (n+1)/2 is the position of the median, not the median itself!
• When n is odd, the median is the middlemost value in the ordered data set.

If n = 11, the median is the 6th observation.
• If n = 12, the median is the value at position 6.5, i.e. the average of the 6th and 7th observations.
• Ex: Consider the "age at first childbirth" data: 22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24
• Arranged in increasing order: 21, 22, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 26, 26, 26

n = 15, so the position of the median is (15+1)/2 = 8.
• Median = 8th observation = 24
• The median age at first childbirth is 24 years.
• Ex: 34, 45, 67, 87, 39, 93, 56, 48 (ages of persons)
• n = 8, so the median is the 4.5th value.
• Ordered sample: 34, 39, 45, 48, 56, 67, 87, 93
• The median is the average of the 4th and 5th values.
• Median = (48 + 56)/2 = 104/2 = 52 years
• What is the mean age in this case?
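
A minimal Python sketch of the positional rule, assuming the data arrive as a plain list of numbers (the function name `median_by_position` is hypothetical). It sorts the data and takes the ((n+1)/2)th value, averaging the two middle values when n is even, which reproduces both worked examples above.

```python
def median_by_position(values):
    """Median of ungrouped data: the ((n+1)/2)th value of the ordered sample."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # (n+1)/2 is a whole number: 1-based position (n+1)/2 is index n // 2
        return ordered[n // 2]
    # (n+1)/2 ends in .5: average the values at positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


age_first_birth = [22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24]
ages = [34, 45, 67, 87, 39, 93, 56, 48]
print(median_by_position(age_first_birth))  # 24
print(median_by_position(ages))             # 52.0
```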

Compute the mean age for the sample.
• Mean age = 469/8 = 58.63 years
• The mean is larger than the median (52 years) because it is pulled upward by the large values 87 and 93.
• Hence the median is a better summary in such cases.
• Computation of the median in grouped data:
• For a discrete frequency table, use the definition: locate the value at position (n+1)/2 in the ordered sample by means of the cumulative frequencies.

Ex:
Score level (xi)   Frequency (fi)   Cum. freq.
0                  3                3
1                  12               15
2                  23               38
3                  46               84
4                  18               102
5                  6                108

n = 108, so (n+1)/2 = 54.5 is the position of the median: the median is the 54.5th value in the ordered sample. The cumulative frequency is 38 up to score 2 and 84 up to score 3, so the 54th and 55th observations both have the value 3. Hence the median score is 3.
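
The cumulative-frequency lookup can be sketched in Python as follows, assuming the table is given as two parallel lists of score levels and frequencies (the function name `discrete_median` is hypothetical). Each observation position is mapped to the first score whose cumulative frequency reaches it.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Median of a discrete frequency table via cumulative frequencies."""
    n = sum(freqs)
    cum = list(accumulate(freqs))  # running totals: 3, 15, 38, ...

    def value_at(k):
        # first value whose cumulative frequency reaches 1-based position k
        for x, cf in zip(values, cum):
            if cf >= k:
                return x
        raise ValueError(f"position {k} exceeds n = {n}")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # position (n+1)/2 ends in .5: average the two neighbouring observations
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


scores = [0, 1, 2, 3, 4, 5]
freqs = [3, 12, 23, 46, 18, 6]
print(discrete_median(scores, freqs))  # 3.0
```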

Activity: Find the mean and median of the following fasting blood sugar data (fbs, mg/dl):
94 86 180 252 143 168 77 270 134 86 99 285 200 99 118 117 101 149 137 84 139 145 108 126 108 69 264

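For the 27 values listed:
• Mean = 3838/27 ≈ 142.15 mg/dl
• 1st quartile = 99 (the 7th ordered value)
• Median = 126 (the 14th ordered value)
• 3rd quartile = 168 (the 21st ordered value)
• Mode: 86, 99 and 108 each occur twice, so the data are multimodal.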
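
To check the activity, the summary measures for the 27 listed values can be computed with Python's standard `statistics` module. This is a sketch, not the software that produced the original slide output; note that `statistics.quantiles` with its default `method="exclusive"` uses the (n+1)p positions, matching the quartile positions (n+1)/4 and 3(n+1)/4 used in these notes.

```python
import statistics

# fasting blood sugar values (mg/dl) as listed in the activity
fbs = [94, 86, 180, 252, 143, 168, 77, 270, 134, 86, 99, 285, 200, 99,
       118, 117, 101, 149, 137, 84, 139, 145, 108, 126, 108, 69, 264]

q1, q2, q3 = statistics.quantiles(fbs, n=4)  # default method="exclusive"

print("n      =", len(fbs))                        # 27
print("mean   =", round(statistics.mean(fbs), 2))  # 142.15
print("median =", statistics.median(fbs))          # 126
print("Q1, Q3 =", q1, q3)                          # 99.0 168.0
print("modes  =", statistics.multimode(fbs))       # [86, 99, 108]
```

`multimode` returns every value tied for the highest count, which makes the multimodal nature of this data visible instead of silently reporting a single value.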

For continuous grouped data:
• First compute the true class limits (if required) and the cumulative frequency of each class.
• Identify the first class interval whose cumulative frequency is greater than or equal to n/2. This class is the median class.
• Use the following formula to compute the median.

$$\text{Median} = L + \frac{i}{f}\left(\frac{n}{2} - c\right)$$
Here the notation has the following meaning:
• L = lower limit of the median class
• i = width of the median class
• f = frequency of the median class
• n = total number of observations (sample size)
• c = cumulative frequency of the class preceding the median class

Example

Here n = 60, so n/2 = 30.
• Hence the median class is 20–25, with L = 20 and i = 5.
• f = 12
• c = 27
• So median = 20 + (5/12)(30 − 27) = 20 + 1.25
• Median = 21.25
About 50% of the observations fall below 21.25 and about 50% are greater than or equal to this median value.
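
The grouped-median formula translates directly into a one-line function. This sketch assumes the median class has already been identified and its quantities read off the table; the function name `grouped_median` is hypothetical. The product i(n/2 − c) is divided by f last, which is algebraically the same as (i/f)(n/2 − c).

```python
def grouped_median(L, i, f, n, c):
    """Median = L + (i/f)(n/2 - c) for a continuous grouped frequency table.

    L: lower limit of the median class, i: class width,
    f: frequency of the median class, n: sample size,
    c: cumulative frequency of the class preceding the median class.
    """
    if f <= 0:
        raise ValueError("the median class must have a positive frequency")
    return L + i * (n / 2 - c) / f


print(grouped_median(L=20, i=5, f=12, n=60, c=27))  # 21.25
```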

HW: Compute the mean for the above data and compare it with the median. What do you observe?
• Properties of the median:
• It is unique.
• It is computed from the ordered data.
• It is a positional average identifying the central value.
• It is not influenced by extreme values (it is resistant to outliers).
• It can be computed even when the extreme class intervals are open-ended.

When to use the median instead of the mean?
• Ordinal scale data (rank scores, satisfaction scores)
• Highly skewed data (extreme values, e.g. income distributions)
• The median is a good measure of central tendency for simple practical reporting.
• Most statistical software reports the commonly used descriptive measures (mean, median and others) together.

The mode is the most commonly occurring value in a data set.
• For ungrouped data, the mode is the value that occurs the maximum number of times.
• Ex: Consider the data set: 3, 4, 3, 5, 5, 6, 4, 3, 5, 5, 7, 8, 8, 5, 8, 6, 9, 5
• Here 5 occurs 6 times, more often than any other value. Hence mode = 5.

Note: The mode is not necessarily unique; more than one value may occur the maximum number of times.
• Ex: Consider the data set: 3, 4, 3, 5, 5, 6, 4, 3, 5, 5, 3, 8, 8, 5, 3, 6, 3, 5
• Here 3 and 5 each occur 6 times. Hence both 3 and 5 are modal values.
• Question: If all values in a data set occur an equal number of times, what is the mode?

In a discrete frequency table, the mode is the value with the largest frequency.
Score level (xi)   Frequency (fi)
0                  3
1                  12
2                  23
3                  46
4                  18
5                  6
Mode = 3 (frequency 46)
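
Both ways of finding the mode can be sketched in Python, assuming raw data as a list and a discrete table as two parallel lists (the function names `modes` and `mode_from_table` are hypothetical). Returning a list rather than a single value makes ties, such as 3 and 5 in the second data set, explicit.

```python
from collections import Counter


def modes(data):
    """All values that occur the maximum number of times in ungrouped data."""
    counts = Counter(data)
    top = max(counts.values())
    # if every value has the same count, every value is returned; many
    # texts instead say that such a data set has no mode
    return sorted(v for v, k in counts.items() if k == top)


def mode_from_table(values, freqs):
    """Value(s) with the largest frequency in a discrete frequency table."""
    top = max(freqs)
    return [x for x, f in zip(values, freqs) if f == top]


set1 = [3, 4, 3, 5, 5, 6, 4, 3, 5, 5, 7, 8, 8, 5, 8, 6, 9, 5]
set2 = [3, 4, 3, 5, 5, 6, 4, 3, 5, 5, 3, 8, 8, 5, 3, 6, 3, 5]
print(modes(set1))                                                # [5]
print(modes(set2))                                                # [3, 5]
print(mode_from_table([0, 1, 2, 3, 4, 5], [3, 12, 23, 46, 18, 6]))  # [3]
```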

The mode is the value that has the highest frequency in a data set.
• For grouped data, the modal class is the class with the highest frequency.
• To find the mode for grouped data, use the following formula:
$$\text{Mode} = L + i \times \frac{\Delta_1}{\Delta_1 + \Delta_2}$$

Here L = lower limit/boundary of the modal class
• i = width of the modal class
• Δ1 = difference between the frequency of the modal class and the frequency of the class before the modal class
• Δ2 = difference between the frequency of the modal class and the frequency of the class after the modal class

The above formula can be written in the following form:
$$\text{Mode} = L + i \times \frac{f_1 - f_0}{2f_1 - f_0 - f_2}$$
where L = lower limit/boundary of the modal class, i = width of the modal class, f0 = frequency of the class preceding the modal class, f1 = frequency of the modal class, and f2 = frequency of the class next to the modal class.
Note: compared with the previous formula, Δ1 = f1 − f0 and Δ2 = f1 − f2.
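
A short Python sketch of both forms of the grouped-mode formula, showing that they agree. The function names and the frequencies in the usage line (L = 20, i = 5, f0 = 10, f1 = 16, f2 = 12) are hypothetical, chosen only for illustration; they do not come from a table in these notes.

```python
def grouped_mode(L, i, f0, f1, f2):
    """Mode = L + i * d1 / (d1 + d2), with d1 = f1 - f0 and d2 = f1 - f2."""
    d1 = f1 - f0
    d2 = f1 - f2
    if d1 + d2 == 0:
        raise ValueError("formula undefined: modal class not higher than its neighbours")
    return L + i * d1 / (d1 + d2)


def grouped_mode_alt(L, i, f0, f1, f2):
    """Equivalent form: Mode = L + i * (f1 - f0) / (2*f1 - f0 - f2)."""
    return L + i * (f1 - f0) / (2 * f1 - f0 - f2)


# hypothetical modal class 20-25 with neighbouring frequencies 10 and 12
print(grouped_mode(20, 5, 10, 16, 12))      # 23.0
print(grouped_mode_alt(20, 5, 10, 16, 12))  # 23.0
```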

When to use the mode?
• The mode is generally a weaker measure of central tendency than the mean or the median (for example, it may not be unique).
• It can be used to find the "most common" or "most commonly occurring" value.
• It can be used with nominal variables.
• It is not amenable to statistical inference.

Summary & comments
Descriptive statistics are summary figures or measures that summarise a data set using a single number. The measures of central tendency provide a summary statistic indicating how the data are concentrated around a central value. The mean, median and mode are the three important measures of central tendency. Use the mean for data on an interval or ratio scale and the median for ordinal data. The mode can be used with data on any scale, but it is not always well defined, since a data set may have several modes. While the mean is used in parametric inference, the median is an important measure in non-parametric inference.

The median can also be found graphically from the cumulative frequency curve (ogive): it is the value on the horizontal axis at which the less-than ogive reaches n/2.

Some remarks:
• When computing the mean, median and mode for continuous data from grouped frequency tables, we do not use the original sample values (this information is lost when we group the data).
• To compute the mean, we assume that the values in each class are concentrated at the class midpoint. Hence we get only an estimate of the mean.
• To compute the median and mode, we use interpolation to estimate the measures.
• Hence, for grouped data we only estimate the measures of central tendency.

Partition values are measures computed from ordered data. They are defined with reference to the position of the measure in the ordered data. The important measures are:
• Quartiles: these divide the ordered data into four equal parts.
• The first quartile, Q1, is the value that divides the ordered data set so that 25% of the observations fall below it and the remaining 75% lie above it. In ungrouped data, Q1 is the value at position (n+1)/4.
• The second quartile, Q2, is the median.
• The third quartile, Q3, is the value that divides the ordered data set so that 75% of the observations fall below it and the remaining 25% lie above it. In ungrouped data, Q3 is the value at position 3(n+1)/4.

Score level (xi)   Frequency (fi)   Cum. freq.
0                  3                3
1                  12               15
2                  23               38
3                  46               84
4                  18               102
5                  6                108

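Position of Q1 = (n+1)/4 = 109/4 = 27.25. The cumulative frequency is 15 up to score 1 and 38 up to score 2, so the 27th and 28th observations are both 2.
• Hence the first quartile is 2.
• Position of Q3 = 3(n+1)/4 = 81.75. The cumulative frequency is 38 up to score 2 and 84 up to score 3, so the 81st and 82nd observations are both 3.
• Hence the third quartile is 3.
• What is the median value?
• Work out more examples.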
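
The same cumulative-frequency lookup handles any position, so one helper can return Q1, the median and Q3. This is a sketch with a hypothetical function name; for fractional positions such as 27.25 it assumes linear interpolation between the two neighbouring ordered observations, which is one common convention. In this table the neighbours are equal, so the convention does not affect the answers.

```python
import math
from itertools import accumulate


def value_at_position(values, freqs, pos):
    """Value at a (possibly fractional) 1-based position in the ordered
    sample described by a discrete frequency table."""
    cum = list(accumulate(freqs))

    def value_at(k):
        # first value whose cumulative frequency reaches position k
        for x, cf in zip(values, cum):
            if cf >= k:
                return x
        raise ValueError(f"position {k} exceeds n = {cum[-1]}")

    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return value_at(lower)
    # assumed convention: interpolate linearly between neighbouring observations
    return value_at(lower) + frac * (value_at(lower + 1) - value_at(lower))


scores = [0, 1, 2, 3, 4, 5]
freqs = [3, 12, 23, 46, 18, 6]
n = sum(freqs)                                            # 108
q1 = value_at_position(scores, freqs, (n + 1) / 4)        # position 27.25
med = value_at_position(scores, freqs, (n + 1) / 2)       # position 54.5
q3 = value_at_position(scores, freqs, 3 * (n + 1) / 4)    # position 81.75
print(q1, med, q3)                                        # 2.0 3.0 3.0
```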

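For grouped continuous data, the quartiles are computed in the same way as the median, with n/4 (for Q1) or 3n/4 (for Q3) in place of n/2:
$$Q_1 = L + \frac{i}{f}\left(\frac{n}{4} - c\right), \qquad Q_3 = L + \frac{i}{f}\left(\frac{3n}{4} - c\right)$$
where L, i, f and c now refer to the quartile class, i.e. the first class whose cumulative frequency is at least n/4 (for Q1) or 3n/4 (for Q3).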

Here n/4 = 169/4 = 42.25.
• The first quartile class is 20–29.
• L = 19.5, i = 10, f = 66, c = 4
• Q1 = 19.5 + (10/66)(42.25 − 4) = 25.30
• 3n/4 = 126.75, so the Q3 class is 40–49.
• L = 39.5, i = 10, f = 36, c = 117
• Q3 = 39.5 + (10/36)(126.75 − 117) = 42.21
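
Since Q1, the median and Q3 differ only in the target position (n/4, n/2 or 3n/4), a single function covers all three. This sketch assumes the quartile class has already been identified; the name `grouped_partition_value` is hypothetical. It reproduces the two values computed above.

```python
def grouped_partition_value(L, i, f, c, target):
    """L + (i/f)(target - c), where target = n/4 for Q1, n/2 for the median,
    and 3n/4 for Q3; L, i, f, c describe the class containing the target."""
    return L + i * (target - c) / f


n = 169
q1 = grouped_partition_value(L=19.5, i=10, f=66, c=4, target=n / 4)
q3 = grouped_partition_value(L=39.5, i=10, f=36, c=117, target=3 * n / 4)
print(round(q1, 2), round(q3, 2))  # 25.3 42.21
```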

Self study
• Prepare a study report on percentiles: definition, computation, application and interpretation.
