SESSION 11 & 12 Last Update 3rd March 2011 Introduction to Statistics
Learning Objectives • (Cumulative Relative) Frequency tables revisited… • Catalogue of graphical representations at your disposal • Polygons and Ogives – Differentiation Use this presentation as a guide. The contents are all relevant to your examination unless specified to the contrary!
Raw Data • Determine the number of class intervals. Sample size n = 25. Sturges’ formula: k = 1 + log2(n) ≈ 1 + 3.322 · log10(25) ≈ 5.64, rounded up to 6 class intervals • Find the maximum and minimum observations:
Raw Data • Calculate the class width • Round the minimum observation down to the next lower integer: this is the starting value for the first class interval
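The steps so far can be sketched in a few lines of Python. This is only an illustration: the minimum and maximum below are hypothetical values (the raw data of the slides is not reproduced here), and the class width is assumed to be the range divided by the number of classes, rounded up to an integer.

```python
import math

# Hypothetical summary values; NOT the raw data from the slides.
n = 25
x_min, x_max = -6.4, 22.1

k = math.ceil(1 + math.log2(n))         # Sturges' rule, rounded up
width = math.ceil((x_max - x_min) / k)  # assumed: range / k, rounded up
start = math.floor(x_min)               # next lower integer below the minimum

print(k, width, start)  # 6 5 -7
```

With these assumed values the result agrees with the 6 class intervals used on the following slides.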
Class Intervals • Start with the lowest (integer) value = -7. Add the class width to obtain the upper bound. Together, the lower and upper bound define the class interval (don’t forget the inequality, so that intervals do not overlap). Continue in the same fashion until all required class intervals (here 6) are defined.
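A minimal sketch of this construction follows. The function name `class_intervals` is made up for illustration, the class width of 5 is inferred from the interval bounds on later slides, and the convention lower ≤ x < upper is an assumption about which inequality is meant.

```python
def class_intervals(start, width, k):
    """Return k (lower, upper) pairs; each class is read as lower <= x < upper."""
    return [(start + i * width, start + (i + 1) * width) for i in range(k)]

intervals = class_intervals(-7, 5, 6)
print(intervals)
# [(-7, -2), (-2, 3), (3, 8), (8, 13), (13, 18), (18, 23)]
```

Because each upper bound is excluded and reused as the next lower bound, no observation can fall into two classes.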
Midpoints • Calculate the midpoints of the class intervals: midpoint = (lower bound + upper bound) / 2
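As a quick check, the midpoints can be computed as the average of the two bounds. The intervals below assume a start of -7 and a class width of 5.

```python
# Class intervals assumed from a start of -7 and a class width of 5.
intervals = [(-7, -2), (-2, 3), (3, 8), (8, 13), (13, 18), (18, 23)]

# Midpoint = average of lower and upper bound.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]
print(midpoints)  # [-4.5, 0.5, 5.5, 10.5, 15.5, 20.5]
```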
Tally • Sort all (return) observations into the class intervals (or bins). You may use a designated tally column to do so manually or use the FREQUENCY function in Excel (the results are integer values)
Observed Frequencies • Convert Tally column to observed Frequencies
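The tally can also be done programmatically. The sketch below uses a made-up sample of 25 return observations (the original raw data is not available here) and assumes each class is read as lower ≤ x < upper.

```python
# Hypothetical return observations in percent (n = 25); NOT the slides' data.
returns = [-6.4, -3.1, -2.5, -1.8, -0.9, 0.4, 1.2, 1.9, 2.7, 3.3,
           4.1, 4.8, 5.6, 6.2, 7.5, 8.4, 9.1, 10.3, 11.7, 12.6,
           14.2, 15.9, 17.3, 19.8, 22.1]
start, width, k = -7, 5, 6

frequencies = [0] * k
for x in returns:
    index = int((x - start) // width)  # class with lower <= x < upper
    frequencies[index] += 1            # one tally mark

print(frequencies)  # [3, 6, 6, 5, 3, 2]
```

The frequencies are integer counts and sum to the sample size, which is a useful sanity check after a manual tally as well.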
Cumulative Frequencies • Calculate cumulative frequencies as the running subtotal of the frequency column
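The running subtotal corresponds to `itertools.accumulate` in Python. The frequency column below is hypothetical and serves only to illustrate the calculation.

```python
from itertools import accumulate

frequencies = [3, 6, 6, 5, 3, 2]  # hypothetical observed frequencies, n = 25

# Running subtotal of the frequency column.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 9, 15, 20, 23, 25]
```

The last cumulative frequency must equal the sample size.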
Relative Frequencies • Calculate the relative frequencies: relative frequency = observed frequency / n
Cumulative Relative Frequencies • Calculate cumulative relative frequencies as the running subtotal of the relative frequency column
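Both relative columns can be sketched as follows, again with hypothetical frequencies. Here the cumulative relative frequencies are obtained by dividing the cumulative counts by n, which is equivalent to the running subtotal of the relative column but avoids accumulating rounding errors.

```python
from itertools import accumulate

frequencies = [3, 6, 6, 5, 3, 2]  # hypothetical observed frequencies
n = sum(frequencies)              # sample size, here 25

relative = [f / n for f in frequencies]
cumulative_relative = [c / n for c in accumulate(frequencies)]

print(relative)             # [0.12, 0.24, 0.24, 0.2, 0.12, 0.08]
print(cumulative_relative)  # [0.12, 0.36, 0.6, 0.8, 0.92, 1.0]
```

The final cumulative relative frequency is always 1.0 (100% of the observations).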
Histogram – Data required • Select the class intervals as the horizontal axis (x-axis) and the observed frequencies as the vertical (y-axis). The height of the bars in the histogram should represent the observed frequencies for each class interval.
Frequency Polygon – Add Intervals • First, add two additional class intervals. These should have the same width as the other class intervals. Thus, they can be created by subtracting the class width from the lower bound of the first interval and adding the class width to the upper bound of the last class interval. Midpoints are calculated as before [(-12 + (-7))/2 = -9.5 and (23 + 28)/2 = 25.5]. The observed frequencies are zero for both new intervals, as all observations fall within the old intervals.
Frequency Polygon – Data required • Select the midpoints as the horizontal axis (x-axis) and the observed frequencies as the vertical (y-axis). Instead of bars use markers (x/y-coordinates). Draw a line through all markers.
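The coordinates of the polygon, including the two empty classes, could be assembled like this. The frequencies are hypothetical; the midpoints assume the intervals from -7 to 23 with a class width of 5.

```python
width = 5
midpoints = [-4.5, 0.5, 5.5, 10.5, 15.5, 20.5]
frequencies = [3, 6, 6, 5, 3, 2]  # hypothetical observed frequencies

# Pad with one empty class on each side so the polygon closes on the x-axis.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

points = list(zip(xs, ys))
print(points[0], points[-1])  # (-9.5, 0) (25.5, 0)
```

Connecting `points` in order with straight lines gives the frequency polygon.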
Frequency Polygon – continued • Occasionally, the data has a predefined minimum and maximum. Consider the following frequency table of class marks in statistics: Using the previous approach leads to midpoints (or results) and class intervals that are actually impossible. The logical maximum for class marks is 100 and the logical minimum is 0!
Frequency Polygon – Data required • The solution is to include the minimum and maximum as two additional points of your frequency polygon, each with a frequency of zero (x/y-coordinates: 0/0 and 100/0)
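A small sketch of this variant: instead of padding with impossible classes, the polygon is closed at the logical minimum and maximum. The function name and the class-mark table (classes of width 20 with made-up frequencies) are hypothetical.

```python
def bounded_polygon_points(midpoints, frequencies, minimum, maximum):
    """Close the polygon at the logical minimum and maximum (frequency 0)."""
    return [(minimum, 0)] + list(zip(midpoints, frequencies)) + [(maximum, 0)]

# Hypothetical class-mark table: classes 0-20, 20-40, ..., 80-100.
points = bounded_polygon_points([10, 30, 50, 70, 90], [2, 5, 9, 6, 3], 0, 100)
print(points)
# [(0, 0), (10, 2), (30, 5), (50, 9), (70, 6), (90, 3), (100, 0)]
```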
Cum. Freq. Graph – Data required • Select the class intervals as the horizontal axis (x-axis) and the cumulative frequencies as the vertical (y-axis).
less than Ogive – Add interval • For the less than Ogive graph, an additional data point is required. We can add an additional class interval "< -7". The observed frequency is zero for the new interval, as all observations fall within the old intervals.
less than Ogive – Data required • Select the upper bounds as the horizontal axis (x-axis) and the cumulative frequencies as the vertical (y-axis).
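The data points of the less than Ogive can be sketched as below. The frequencies are hypothetical; the extra starting point at -7 has a cumulative frequency of zero because no observation lies below the first lower bound.

```python
from itertools import accumulate

upper_bounds = [-2, 3, 8, 13, 18, 23]
frequencies = [3, 6, 6, 5, 3, 2]  # hypothetical observed frequencies

# Extra point: nothing lies below -7, so the ogive starts at (-7, 0).
xs = [-7] + upper_bounds
ys = [0] + list(accumulate(frequencies))

ogive_points = list(zip(xs, ys))
print(ogive_points)
# [(-7, 0), (-2, 3), (3, 9), (8, 15), (13, 20), (18, 23), (23, 25)]
```

Each point reads as "this many observations are less than this upper bound".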
Standardising Data • It may be desirable to express data in terms of relative frequencies. These were calculated before and are contained in the table below (both discrete as well as cumulative). All Graphs introduced so far can be based on relative frequency rather than observed frequency.
Relative Frequency Polygon – Data required • Select the midpoints as the horizontal axis (x-axis) and the relative frequencies as the vertical (y-axis). Instead of bars use markers (x/y-coordinates). Draw a line through all markers. The relative frequencies for the additional class intervals are 0 (since their observed frequencies are 0). Compared with the observed frequency polygon, only the scale of the y-axis changes. The shape of the function remains the same.
OR Pie Chart – Data required • Select the class intervals as the categories for the pie slices and the relative frequencies as their corresponding values. The size of each slice should be proportional to its relative frequency. Note that the additional categories have relative frequencies = 0.00. Thus, they may be omitted without altering the pie chart itself. Because pie charts are difficult to draw free-hand, they are not relevant to your examination!
Cumulative Relative Frequency Graph – Data required • Select the class intervals as the horizontal axis (x-axis) and the cumulative relative frequencies as the vertical (y-axis).
less than Ogive (relative Freq.) – Data required • Select the upper bounds as the horizontal axis (x-axis) and the cumulative relative frequencies as the vertical (y-axis). For the additional data point at -7, the associated cumulative relative frequency is 0.00 (since no observations fall below -7).
less than Ogive (relative Freq.) • Reading the ogive at x = 0% gives P(X < 0%), i.e. the probability of a negative performance. Here ≈ 0.31 or 31%
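Reading a value off the ogive amounts to linear interpolation between two neighbouring points. The sketch below assumes straight-line segments and uses hypothetical cumulative relative frequencies, so its result differs from the 0.31 read from the graph on the slide. The function name `ogive_value` is made up for illustration.

```python
from bisect import bisect_right

def ogive_value(x, xs, ys):
    """Linearly interpolate the less-than ogive at x."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical ogive: upper bounds and cumulative relative frequencies.
xs = [-7, -2, 3, 8, 13, 18, 23]
ys = [0.0, 0.12, 0.36, 0.60, 0.80, 0.92, 1.0]

print(round(ogive_value(0, xs, ys), 3))  # 0.216
```

For this made-up sample, about 21.6% of the observations are estimated to be negative.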
Why use Relative Frequencies? In order to compare the frequency distributions of two datasets (e.g. Investment A and Investment B), the frequencies need to be standardised. This is necessary since the sample sizes, class intervals and class widths may differ across samples.
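A short illustration of the point about sample sizes: the two frequency columns below are hypothetical, share the same class intervals for simplicity, and differ in sample size (25 versus 40). The raw counts are not directly comparable, but the relative frequencies are.

```python
def relative_frequencies(frequencies):
    """Divide each observed frequency by the sample size."""
    n = sum(frequencies)
    return [f / n for f in frequencies]

# Hypothetical observed frequencies over the same six class intervals.
investment_a = [3, 6, 6, 5, 3, 2]    # n = 25
investment_b = [4, 8, 12, 10, 4, 2]  # n = 40

rel_a = relative_frequencies(investment_a)
rel_b = relative_frequencies(investment_b)

for a, b in zip(rel_a, rel_b):
    print(f"{a:.2f}  {b:.2f}")
```

In the first class, for example, B has more observations than A (4 versus 3) but a smaller share of its sample (0.10 versus 0.12).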
Graphical Representations (overview diagram) • Frequencies can be observed or relative, and each can be discrete or cumulative • Graph types covered: histogram, polygon, ogive, pie chart