
10b. Univariate Analysis Part 2

CSCI N207 Data Analysis Using Spreadsheet. 10b. Univariate Analysis Part 2. Lingma Acheson linglu@iupui.edu. Department of Computer and Information Science, IUPUI. The Range. Difference between minimum and maximum values in a data set


Presentation Transcript


  1. CSCI N207 Data Analysis Using Spreadsheet 10b. Univariate Analysis Part 2 Lingma Acheson linglu@iupui.edu Department of Computer and Information Science, IUPUI

  2. The Range
     • The difference between the minimum and maximum values in a data set.
     • A larger range usually (but not always) indicates a larger spread or deviation in the values of the data set.
       E.g. (73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100)
       Range: 100 - 45 = 55
     • A single extreme low or high value can throw off the range.
       E.g. (20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100)
       Range: 100 - 20 = 80
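The two range calculations on this slide can be reproduced with a short Python sketch. The function name `value_range` and the variable names are illustrative, not part of the original slides.

```python
# Data sets copied from the slide.
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]
with_outlier = [20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100]


def value_range(data):
    """Return the difference between the largest and smallest value."""
    return max(data) - min(data)


print(value_range(scores))        # 100 - 45 = 55
print(value_range(with_outlier))  # 100 - 20 = 80

# Dropping the single extreme low value (20) shrinks the range sharply,
# which shows how sensitive the range is to one outlier.
print(value_range(with_outlier[1:]))  # 100 - 76 = 24
```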

  3. Variance
     • One measure of the dispersion (deviation from the mean) of a data set: how far away is each datum from the mean?
     • Variance is the average squared distance to the mean.
     • The larger the variance, the greater the average deviation of each datum from the mean (more numbers lie far from the mean).
     • E.g. 73, 67, 70, 67, 49, 60, 81, 71, 78, 62, 53, 87, 72, 65, 74, 50, 84, 45, 62, 100
       Variance = ((73 - 68.5)^2 + (67 - 68.5)^2 + (70 - 68.5)^2 + … + (100 - 68.5)^2) / 20
       where 68.5 is the average value of the data set.
     • Excel functions:
       VARP(): variance for the whole population (the data set is complete); divides by n.
       VAR(): variance estimated from a sample (the data set is a sample); divides by n - 1.
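As a rough cross-check of the variance formula, the sketch below computes it step by step in Python and compares the result with the standard library's `statistics.pvariance` and `statistics.variance`. Treating these as the counterparts of Excel's VARP() and VAR() is an assumption based on their divide-by-n and divide-by-(n - 1) definitions, and the variable names are illustrative.

```python
import statistics

# Data set copied from the slide.
data = [73, 67, 70, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 72, 65, 74, 50, 84, 45, 62, 100]

mean = sum(data) / len(data)  # 68.5, the average used on the slide

# Squared distance of every value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Whole population: divide by n (the slide divides by 20).
population_variance = sum(squared_deviations) / len(data)

# Sample: divide by n - 1 instead.
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(mean)                 # 68.5
print(population_variance)  # 179.05
print(sample_variance)      # slightly larger, since the divisor is 19

# The standard library should agree with the manual calculation.
print(statistics.pvariance(data), statistics.variance(data))
```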

  4. Standard Deviation
     • The square root of the variance, which undoes the squaring of the distances used in the variance.
     • Its magnitude is therefore more in line with the values in the data set.
     • Can be thought of, roughly, as the average deviation from the mean of a data set.
     • Standard Deviation = sqrt(Variance)
     • Excel functions:
       STDEVP(): use this when the data set is complete.
       STDEV(): use this when the data set is a sample.
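The relationship between variance and standard deviation can be sketched in Python as follows. The data set is reused from the variance slide, and pairing `statistics.pstdev` and `statistics.stdev` with STDEVP() and STDEV() is an assumption based on their population and sample definitions.

```python
import math
import statistics

# Data set reused from the variance slide.
data = [73, 67, 70, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 72, 65, 74, 50, 84, 45, 62, 100]

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(statistics.pvariance(data))  # complete data set
sample_sd = math.sqrt(statistics.variance(data))       # data set is a sample

# The dedicated functions should give the same answers.
print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

The population value comes out at roughly 13.4, which is on the same scale as the scores themselves, while the variance (179.05) is not.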

  5. Frequency Tables • Use a frequency table to observe the distribution of a data set. • E.g. consider the following data set: {45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100} • First decide how to group the data into bins.
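One possible way to build such a frequency table is sketched below in Python. The bin width of 10 is an assumption (the slide does not say which bins were used), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Data set copied from the slide.
data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

BIN_WIDTH = 10  # assumption: the slide does not fix a bin width


def frequency_table(values, bin_width):
    """Count how many integer values fall into each bin.

    Bins are labelled (low, high), e.g. (60, 69). Empty bins between the
    lowest and highest occupied bin are kept with a count of 0.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    lows = range(min(counts), max(counts) + bin_width, bin_width)
    return {(low, low + bin_width - 1): counts.get(low, 0) for low in lows}


table = frequency_table(data, BIN_WIDTH)

# Print the table with a crude text bar per bin.
for (low, high), count in table.items():
    print(f"{low:>3}-{high:<3} {count:>2} {'#' * count}")
```

With these bins, the 60-69 group holds the most values (7) and the 90-99 group is empty. A different bin width would give a differently shaped table.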

  6. Histogram • A histogram is simply a column chart of the frequency table.

  7. Data Distribution

  8. Normal Distributions • The Bell curve • Symmetrical • Mean ≈ Median

  9. Skewed Distributions • Most of the time, distributions are skewed. • Positively skewed distribution (long tail toward high values): mean > median • Negatively skewed distribution (long tail toward low values): mean < median
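The mean-versus-median comparison can be tried on the data sets from the other slides. The Python sketch below applies only that rule of thumb; it is not a formal skewness statistic, and `skew_direction` is a hypothetical helper name.

```python
import statistics

# Data sets copied from the range and data-distribution slides.
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]
low_outlier = [20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100]


def skew_direction(values):
    """Label the skew using only the mean/median comparison."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none"


print(skew_direction(scores))       # mean 68.6 > median 68
print(skew_direction(low_outlier))  # the low value 20 pulls the mean below the median
```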

  10. Data Distribution
     {45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100}
     Average: 68.6; Median: 68; Mode: 74
     -1 SD: 55.14; +1 SD: 82.06
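The figures on this slide can be reproduced with Python's `statistics` module. The sketch assumes the slide used the population standard deviation, since that is the version that yields 55.14 and 82.06.

```python
import statistics

# Data set copied from the slide.
data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

mean = statistics.mean(data)      # 68.6
median = statistics.median(data)  # 68.0 (midpoint of 67 and 69)
mode = statistics.mode(data)      # 74, the only repeated value
sd = statistics.pstdev(data)      # population standard deviation (assumption)

lower = mean - sd  # one standard deviation below the mean
upper = mean + sd  # one standard deviation above the mean

print(mean, median, mode)
print(round(lower, 2), round(upper, 2))  # 55.14 82.06
```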

  11. Standard Deviation
     With a normal distribution:
     mean ± 1*SD covers about 68% of the data
     mean ± 2*SD covers about 95% of the data
     mean ± 3*SD covers about 99.7% of the data
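To see how closely a real data set follows these percentages, the Python sketch below counts the share of values within k standard deviations of the mean. It reuses the 20-value data set from the previous slide and assumes the population standard deviation; `share_within` is a hypothetical helper name.

```python
import statistics

# Data set reused from the data-distribution slide.
data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]


def share_within(values, k):
    """Return the fraction of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (assumption)
    inside = [v for v in values if abs(v - mean) <= k * sd]
    return len(inside) / len(values)


for k in (1, 2, 3):
    print(k, share_within(data, k))
```

For this small sample the shares are 65%, 95%, and 100%, close to the 68%, 95%, and 99.7% expected for a normal distribution but not identical, since 20 values only approximate a bell curve.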
