Numerical Methods for describing quantitative data

zuri

Presentation Transcript


  1. Numerical Methods for describing quantitative data

  2. Scales of Measurement from weakest to strongest - nominal scale - ordinal scale - interval scale - ratio scale

  3. 1. Nominal Scale • Numbers are labels of groups or classes • Simple codes assigned to objects as labels • For qualitative data, e.g. professional classification, geographic classification • e.g.- blonde: 1, brown: 2, red: 3, black: 4 (a person with red hair does not possess more "hairness" than a person with blonde hair) - female: 1, male: 2

  4. 2. Ordinal Scale • Data elements may be ordered according to their relative size or quality, the numbers assigned to objects or events represent the rank order (1st, 2nd, 3rd etc.) • e.g. top lists of companies

  5. 3. Interval Scale • Distances between any two observations are meaningful • The "zero point" is arbitrary • Negative values can be used • Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly • e.g. temperature on the Celsius scale

  6. 4. Ratio Scale • Strongest scale of measurement • Distances between observations and also the ratios of distances have a meaning • Contains a meaningful zero • e.g. mass, length, time • e.g. a salary of $50,000 is twice as large as a salary of $25,000

  7. Scales of Measurement (Book p168 E20) • Salary (HUF): Ratio • The ranking of horses in a race: Ordinal • Gender: Nominal • Grade: Ordinal • Temperature: Interval • Rate (%): Ratio • Number plate (cars): Nominal • Nationality: Nominal • Calendar date: Interval

  8. Statistical Rows & Columns • Classes • Frequencies

  9. Data Set • Mass of numerical data – discrete values • E.g.: 11.8, 3.6, 16.6, 13.5, 3.6, 8.3, 8.9, 9.1, 7.7, 2.3, 12.1, 6.1, 10.2, 8.0, 11.4, 6.8, 9.6, 19.5, 15.3, 12.3, 8.5, 15.9, 18.7, 11.7, 6.2, 11.2, 10.4, 7.2, 5.5, 14.5 • Frequency distribution: method of organising & presenting data • Score value • Interval of score values: classes • Statistical table: records the number of observations in each class

  10. Frequency table with score values • Class intervals • Approximate class width:
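
The class-width formula did not survive in this transcript. The sketch below groups the data set from slide 9 into equal-width classes under two assumptions that are common conventions, not taken from the slides: the number of classes k is the smallest integer with 2^k >= n, and the width is the range divided by k. The helper name `class_frequencies` is hypothetical.

```python
import math

# Data set from the "Data Set" slide (30 observations).
data = [11.8, 3.6, 16.6, 13.5, 3.6, 8.3, 8.9, 9.1, 7.7, 2.3,
        12.1, 6.1, 10.2, 8.0, 11.4, 6.8, 9.6, 19.5, 15.3, 12.3,
        8.5, 15.9, 18.7, 11.7, 6.2, 11.2, 10.4, 7.2, 5.5, 14.5]


def class_frequencies(values, k):
    """Split [min, max] into k equal-width classes and count the observations."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumed rule: range divided by number of classes
    counts = [0] * k
    for v in values:
        # The maximum lies on the upper edge, so clamp it into the last class.
        idx = min(int((v - lo) / width), k - 1)
        counts[idx] += 1
    return lo, width, counts


# Assumed rule of thumb: smallest k such that 2**k >= n.
k = math.ceil(math.log2(len(data)))
lo, width, counts = class_frequencies(data, k)
for i, f in enumerate(counts):
    print(f"{lo + i * width:5.2f} to {lo + (i + 1) * width:5.2f}: {f}")
```

In practice the width is usually rounded to a convenient value; the unrounded width is kept here so the classes cover the data exactly.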

  11. Types of Quantitative Rows • E.g. water consumption in village X • Frequency • Relative frequency • Cumulative frequency • Cumulative relative frequency

  12. Frequency (f): The number of times a value of the data occurs. • Cumulative Frequency (f’): The sum of the frequencies for all values that are less than or equal to the given value, e.g. for frequencies 5, 17, 15, 8, 5: 5; 5+17 = 22; 5+17+15 = 37; 5+17+15+8 = 45; 5+17+15+8+5 = 50

  13. Relative Frequency (g): The ratio of the number of times a value of the data occurs in the set of all outcomes to the number of all outcomes. • Cumulative Relative Frequency (g’): The term applies to an ordered set of observations from smallest to largest. The Cumulative Relative Frequency is the sum of the relative frequencies for all values that are less than or equal to the given value.
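
The four kinds of rows defined above can be sketched in a few lines. The frequencies 5, 17, 15, 8, 5 are the ones that appear in the running sums on slide 12; the variable names are illustrative only.

```python
from itertools import accumulate

freq = [5, 17, 15, 8, 5]                  # f: frequency of each class
n = sum(freq)                             # total number of observations

cum_freq = list(accumulate(freq))         # f': running total of f
rel_freq = [f / n for f in freq]          # g: share of each class
cum_rel_freq = [c / n for c in cum_freq]  # g': running total of g

for row in zip(freq, cum_freq, rel_freq, cum_rel_freq):
    print(row)
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 1, which is a quick check on any hand-built table.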

  14. Sum of Values (S) • xi: discrete value or midpoint of the class

  15. Relative Sum of Values (Z)
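
The formulas for S and Z are missing from the transcript. A minimal sketch, assuming the usual definitions (the sum of values of a class is its frequency times its value or midpoint, and the relative sum of values is that class's share of the grand total), with hypothetical midpoints and frequencies:

```python
# Hypothetical class midpoints (x_i) and frequencies (f_i); not from the slides.
midpoints = [10, 30, 50]
freq = [2, 5, 3]

# Assumed definition: S_i = f_i * x_i
S = [f * x for f, x in zip(freq, midpoints)]

# Assumed definition: Z_i = S_i / sum of all S_i
total = sum(S)
Z = [s / total for s in S]

print(S, total, Z)
```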

  16. Quantitative Rows I (Book p187 E67)

  17. Quantitative Rows II (Book p188 E68)

  18. Quantitative Rows III • The 200 customers spent 2,400,000 HUF in total.

  19. DESCRIPTIVE STATISTICS • Definition: Descriptive statistics is concerned only with collecting and describing data • Methods: - statistical tables and graphs - descriptive measures • Descriptive measure – a single number that provides information about a set of data

  20. Definition of a Population I. Central Tendency - mean - mode - median II. Percentiles, Quartiles III. Dispersion IV. Shape calculation location

  21. I.1. Means • Arithmetic mean (average) • Geometric mean – the ratio of any two consecutive numbers is constant • e.g. compound interest rate • Harmonic mean – units of measurement differ between the numerator and denominator • e.g. miles per hour • Quadratic mean • e.g. the form of the standard deviation

  22. Arithmetic Mean • Typically referred to as the mean. • The most common measure of central tendency. • It is the only common measure in which all the values play an equal role. • Symbol: $\bar{x}$, called X-bar • Raw Data Expression: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ • Frequency Distribution Expression: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
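
As a small illustration, the raw-data form and the frequency-distribution (weighted) form of the arithmetic mean give the same result when the frequency table is built from the same observations. The data and function names below are hypothetical.

```python
def mean_raw(values):
    """Arithmetic mean of raw data: sum of the values divided by their count."""
    return sum(values) / len(values)


def mean_grouped(xs, fs):
    """Arithmetic mean from a frequency table: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


raw = [2, 4, 4, 5, 5, 5, 10]   # hypothetical observations
xs = [2, 4, 5, 10]             # distinct values
fs = [1, 2, 3, 1]              # how often each value occurs

print(mean_raw(raw), mean_grouped(xs, fs))
```

With class intervals instead of distinct values, the midpoints stand in for x_i and the grouped mean becomes an approximation.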

  23. Properties of Mean • The sum of the differences from the mean is 0: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ • $\sum_{i=1}^{n} (x_i - a)^2$ is minimal if $a = \bar{x}$

  24. Properties of Mean 2. • If you add a constant ‘a’ to every $x_i$, the mean will be $a + \bar{x}$ • If you multiply every $x_i$ by a constant ‘b’, the mean will be $b \cdot \bar{x}$ • $x_1, x_2, ..., x_n \to \bar{x}$ • $y_1, y_2, ..., y_n \to \bar{y}$ • $x_1 + y_1; ...; x_n + y_n \to \bar{x} + \bar{y}$
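
The properties of the mean listed on the two slides above can be checked numerically. This is only a spot check on hypothetical data, not a proof; the helper `sum_sq` is an illustrative name.

```python
from math import isclose
from statistics import fmean

x = [3.0, 7.0, 8.0, 12.0]   # hypothetical data
y = [1.0, 2.0, 4.0, 5.0]
a, b = 10.0, 3.0
x_bar, y_bar = fmean(x), fmean(y)


def sum_sq(values, c):
    """Sum of squared differences between the values and a constant c."""
    return sum((v - c) ** 2 for v in values)


dev_sum = sum(v - x_bar for v in x)                    # should be 0
shifted_mean = fmean([v + a for v in x])               # should be a + x_bar
scaled_mean = fmean([b * v for v in x])                # should be b * x_bar
pair_mean = fmean([xi + yi for xi, yi in zip(x, y)])   # should be x_bar + y_bar

print(dev_sum, shifted_mean, scaled_mean, pair_mean)
# The squared-difference sum is smallest at the mean itself.
print(sum_sq(x, x_bar), sum_sq(x, x_bar + 0.1), sum_sq(x, x_bar - 0.1))
```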

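  25. Geometric Mean • The rate of change of a variable over time. • The nth root of the product of n values. • Raw Data Expression: $\bar{x}_g = \sqrt[n]{\prod_{i=1}^{n} x_i}$ • Frequency Distribution Expression: $\bar{x}_g = \sqrt[\sum f_i]{\prod x_i^{f_i}}$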

  26. GDP in Hungary (Source: HCSO) • Average growth rate:
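
The GDP table itself is not in the transcript, so the sketch below uses made-up index levels (not HCSO figures) to show how an average growth rate could be obtained as the geometric mean of the year-on-year growth factors.

```python
import math

# Hypothetical yearly levels of some indicator; NOT the HCSO GDP data.
levels = [100.0, 110.0, 121.0]

# Year-on-year growth factors (dynamic ratios): later / earlier.
factors = [later / earlier for earlier, later in zip(levels, levels[1:])]

# Geometric mean: n-th root of the product of the n factors.
avg_factor = math.prod(factors) ** (1 / len(factors))
avg_growth_pct = (avg_factor - 1) * 100

print(factors, avg_factor, avg_growth_pct)
```

Because the intermediate levels cancel in the product, the same average factor follows from the first and last level alone: (last / first) ** (1 / number of periods).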

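  27. Harmonic Mean • The harmonic mean of a set of n numbers is found by adding up the reciprocals of the numbers, and then dividing n by this sum. • Raw Data Expression: $\bar{x}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$ • Frequency Distribution Expression: $\bar{x}_h = \frac{\sum f_i}{\sum \frac{f_i}{x_i}}$, where $f_i$ is the frequency (weight) of $x_i$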
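
A short sketch of the verbal rule above (add up the reciprocals, then divide n by that sum), using the miles-per-hour setting mentioned earlier in the deck. The speeds are hypothetical, and the weighted variant assumes the weights play the role of frequencies.

```python
from math import isclose
from statistics import harmonic_mean


def harmonic(values):
    """n divided by the sum of reciprocals."""
    return len(values) / sum(1 / v for v in values)


def harmonic_weighted(xs, fs):
    """Weighted form: sum of weights divided by the sum of weight / value."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


# Hypothetical trip: the same distance driven once at 60 mph and once at 40 mph.
speeds = [60.0, 40.0]
avg_speed = harmonic(speeds)

print(avg_speed, harmonic_mean(speeds), harmonic_weighted(speeds, [1, 1]))
```

The average speed over the whole trip is 48 mph rather than the arithmetic mean of 50 mph, because more time is spent on the slower leg.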

  28. Relation between the Partitional Ratio and Dynamic Ratio

  29. Quadratic Mean

  30. I.2. Median • Statistic which has an equal number of variates above and below it • Raw Data Expression: the $\frac{n+1}{2}$-th ranked value • Independent of extreme values • Determined just from the data in order • The "middle term" • Frequency Distribution Expression: $Me = me + \frac{\frac{n}{2} - f'_{me-1}}{f_{me}} \cdot h$ • me = lower boundary of the median class • n = total number of variates in the frequency distribution • f’me-1 = cumulative frequency of the class below the median class • fme = frequency of the median class • h = class interval

  31. ranked value
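
Both ways of finding the median can be sketched as follows. The grouped version interpolates inside the median class using the quantities defined on slide 30 (me, n, f’me-1, fme, h). The class boundaries are hypothetical; the frequencies reuse 5, 17, 15, 8, 5 from slide 12.

```python
def raw_median(values):
    """Middle ranked value; for an even count, the mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def grouped_median(lower_bounds, freqs, h):
    """Interpolated median: me + (n/2 - f'_{me-1}) / f_me * h."""
    n = sum(freqs)
    cum = 0  # cumulative frequency of the classes below the current one
    for lo, f in zip(lower_bounds, freqs):
        if cum + f >= n / 2:          # first class whose cumulative frequency reaches n/2
            return lo + (n / 2 - cum) / f * h
        cum += f
    raise ValueError("empty frequency distribution")


lower_bounds = [0, 10, 20, 30, 40]   # hypothetical class lower boundaries
freqs = [5, 17, 15, 8, 5]
me_grouped = grouped_median(lower_bounds, freqs, h=10)

print(raw_median([7, 1, 3]), raw_median([1, 2, 3, 4]), me_grouped)
```

Here n/2 = 25 is reached in the third class (cumulative frequencies 5, 22, 37), so the median is 20 + (25 - 22) / 15 * 10 = 22.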

  32. I.3. Mode • The value that occurs most frequently • Typical value • Frequency Distribution Expression: $Mo = mo + \frac{k_1}{k_1 + k_2} \cdot h$ • mo = the lower class boundary of the mode’s class • k1 = the difference between the frequencies of the mode’s class and the previous class • k2 = the difference between the frequencies of the mode’s class and the next class • h = class interval
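
A minimal sketch of estimating the mode from a frequency table with the quantities mo, k1, k2 and h defined above. It assumes equal class widths and treats a missing neighbouring class as having frequency 0, which is a choice made for this example rather than something the slides state. Class boundaries are hypothetical.

```python
def grouped_mode(lower_bounds, freqs, h):
    """Interpolated mode: mo + k1 / (k1 + k2) * h, for equal class widths."""
    # Index of the (first) class with the highest frequency.
    i = max(range(len(freqs)), key=freqs.__getitem__)
    prev_f = freqs[i - 1] if i > 0 else 0               # assumed 0 if there is no previous class
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumed 0 if there is no next class
    k1 = freqs[i] - prev_f
    k2 = freqs[i] - next_f
    return lower_bounds[i] + k1 / (k1 + k2) * h


lower_bounds = [0, 10, 20, 30, 40]   # hypothetical class lower boundaries
freqs = [5, 17, 15, 8, 5]            # frequencies from slide 12
mo_grouped = grouped_mode(lower_bounds, freqs, h=10)

print(mo_grouped)
```

The modal class is 10 to 20 (frequency 17), with k1 = 12 and k2 = 2, so the estimate sits close to the upper boundary of that class.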

  33. II. Percentiles and Quartiles • The Pth percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group. • Q1 (lower quartile): The first quartile is the 25th percentile. It is the point below which lies ¼ of the data. • Q2 (middle quartile): The median is the point below which lies half the data. It is the 50th percentile. • Q3 (upper quartile): The third quartile is the 75th percentile. It is the point below which lie 75 percent of the data.

  34. ranked value • ranked value
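
The ranked-value formulas on this slide were lost in extraction. The sketch below assumes one widespread convention: the Pth percentile is the value at rank (n + 1) · P / 100, with linear interpolation between neighbouring ranked values. Other textbooks use slightly different rank rules, so treat this as one possible reading. The result is cross-checked against `statistics.quantiles`, whose default `exclusive` method uses the same rank rule.

```python
from statistics import quantiles


def percentile(values, p):
    """Value at rank (n + 1) * p / 100, linearly interpolated (assumed convention)."""
    s = sorted(values)
    pos = (len(s) + 1) * p / 100       # 1-based rank, may be fractional
    pos = min(max(pos, 1), len(s))     # keep the rank inside the data
    lower = int(pos)                   # rank of the value just below
    frac = pos - lower
    if lower >= len(s):
        return s[-1]
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])


data = [1, 2, 3, 4, 5, 6, 7]           # hypothetical observations
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))

print(q1, q2, q3, quantiles(data, n=4))
```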

  35. III. Measures of Dispersion • Range • Interquartile Range • Population and Sample Standard Deviation • Population and Sample Variance • Coefficient of Variation

  36. III.1. Range • The range of a set of observations is the difference between the largest observation and the smallest observation. • III.2. IQR • Interquartile range: the difference between the third and first quartiles (Q3 − Q1).
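
Both measures are one-liners once the quartiles are known. This sketch uses hypothetical data and the default quartile rule of `statistics.quantiles`; a different quartile convention can give a slightly different IQR.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 9, 12]        # hypothetical observations

data_range = max(data) - min(data)   # largest minus smallest observation
q1, q2, q3 = quantiles(data, n=4)    # quartiles (default "exclusive" method)
iqr = q3 - q1                        # spread of the middle half of the data

print(data_range, q1, q3, iqr)
```

Unlike the range, the IQR ignores the extreme quarters of the data, so a single outlier changes it far less.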

  37. III.3. Standard Deviation • The standard deviation is a measure of dispersion around the mean. • A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values. • In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within 2 standard deviations.

  38. Properties of Standard Deviation • $\sigma = 0$ if every $x_i$ is the same constant

  39. Properties of Standard Deviation • If you add a constant ‘a’ to every $x_i$, the standard deviation will be the same.

  40. Properties of Standard Deviation • If you multiply every $x_i$ by a constant ‘b’, the standard deviation will be $|b| \cdot \sigma$
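
The three properties above can be spot-checked with the population standard deviation from Python's `statistics` module. The data are hypothetical, and a negative multiplier is used on purpose to show that the standard deviation scales with the absolute value of the constant.

```python
from math import isclose
from statistics import pstdev

x = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data
a, b = 10, -3

sigma = pstdev(x)                            # population standard deviation
sigma_const = pstdev([7, 7, 7, 7])           # all values equal
sigma_shifted = pstdev([v + a for v in x])   # add a constant to every value
sigma_scaled = pstdev([b * v for v in x])    # multiply every value by a constant

print(sigma, sigma_const, sigma_shifted, sigma_scaled)
```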

  41. III.4. Variance • Variance of a set of observations: the average squared deviation of the data points from their mean. • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ • Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ • III.5. Coefficient of Variation • The measure of dispersion around the mean in %.
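
A short sketch contrasting population and sample variance, followed by the coefficient of variation. The slide gives no formula for the latter, so the usual definition is assumed here: standard deviation divided by the mean, expressed in percent. Data are hypothetical.

```python
from math import isclose
from statistics import fmean, pstdev, pvariance, variance

x = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, mean 5

pop_var = pvariance(x)          # divides by N
sample_var = variance(x)        # divides by n - 1

# Assumed definition of the coefficient of variation, in percent.
cv_pct = pstdev(x) / fmean(x) * 100

print(pop_var, sample_var, cv_pct)
```

The sample variance is always the larger of the two for the same data, since the same sum of squared deviations is divided by n - 1 instead of N.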

  42. IV. Measures of Shape • Skewness is a measure of the degree of asymmetry of a frequency distribution. • Kurtosis is a measure of the flatness (versus peakedness) of a frequency distribution.

  43. IV.1. Kurtosis • The measure of the extent to which observations cluster around the central point. • Positive – cluster more and have longer tails • For a normal distribution, the value of the kurtosis statistic is zero. • Negative – cluster less and have shorter tails

  44. IV.2. Skewness • Symmetry: A = 0 • Skewed to the right (long right tail): A > 0 • Skewed to the left (long left tail): A < 0
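
The transcript does not show how the shape measures are calculated. As an illustration only, the sketch below uses the moment-based coefficients (third and fourth central moments scaled by powers of the second), which may differ from the statistic A used on the slides. With this definition a long right tail gives a positive skewness, and the normal distribution has an excess kurtosis of zero.

```python
def central_moment(values, r):
    """r-th central moment: mean of (x - mean) ** r."""
    m = sum(values) / len(values)
    return sum((v - m) ** r for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (assumed definition)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so that a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical data sets
right_tail = [1, 1, 1, 2, 10]
left_tail = [-v for v in right_tail]

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
print(excess_kurtosis([-1, 1, -1, 1]))
```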

  45. Box Plot • The box plot displays a set of five summary measures of the distribution of the data: - the median of the data - the lower quartile - the upper quartile - the smallest observation - the largest observation • It also shows asymmetry
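
The five summary measures behind a box plot can be collected as below. The data and the function name `five_number_summary` are hypothetical, and the quartiles follow the default rule of `statistics.quantiles`; plotting libraries may use a different quartile convention.

```python
from statistics import quantiles


def five_number_summary(values):
    """Smallest observation, lower quartile, median, upper quartile, largest observation."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


data = [2, 4, 4, 5, 7, 9, 12]   # hypothetical observations
summary = five_number_summary(data)

print(summary)
```

In this example the median (5) is closer to the lower quartile (4) than to the upper quartile (9), which is how a box plot reveals a longer right tail.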

  46. Box & Whiskers (Source: Aczel [1996])
