
Raoul LePage Professor STATISTICS AND PROBABILITY stt.msu/~lepage click on STT315_F06

Raoul LePage Professor STATISTICS AND PROBABILITY www.stt.msu.edu/~lepage click on STT315_F06. Week 11-13-06. WEEK PLAN: Probability Histograms (Sec. 1-5) Data Smoothing - not in your text - Bonus Computer Project.


Presentation Transcript


  1. Raoul LePage Professor STATISTICS AND PROBABILITY www.stt.msu.edu/~lepage click on STT315_F06 Week 11-13-06

  2. WEEK PLAN: Probability Histograms (Sec. 1-5) Data Smoothing - not in your text - Bonus Computer Project

  3. Smoothing data, so you can see it. Plot the average height of normal densities placed at each data value, e.g. {10, 14}. It is like smearing each sample value, as if it were a drop of paint, according to the thickness of a normal density. Each normal density integrates to one, as does their average, the “Sample Density Estimate” shown in dark. [Figure: normal densities at data {10, 14} and their average, the density.]
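The construction on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the course's own software: the function names `normal_pdf` and `sample_density` are made up for the example, and the bandwidth `h = 1.0` is an assumption, since the slide does not state the sd of its normal curves.

```python
import math

def normal_pdf(x, mean, sd):
    """Height at x of the normal density with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sample_density(x, data, h):
    """Average height at x of normal densities (sd = h) centered at each data value."""
    return sum(normal_pdf(x, xi, h) for xi in data) / len(data)

data = [10, 14]   # the two data values from the slide
h = 1.0           # assumed sd of each normal curve (not given on the slide)

# Riemann-sum check that the averaged curve still integrates to about one.
step = 0.01
grid = [i * step for i in range(0, 2401)]   # 0.00, 0.01, ..., 24.00
area = sum(sample_density(x, data, h) for x in grid) * step

print(round(sample_density(10, data, h), 4))   # height of the estimate at x = 10
print(round(area, 4))                          # should be close to 1
```

Evaluating `sample_density` at a handful of points and sketching the result is essentially the "by hand" procedure for two data values.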

  4. The mean of a sample density estimate is equal to the sample mean of its data.
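A short derivation shows why this holds. Assume, as on the previous slide, that the estimate averages normal densities with a common sd $h$ centered at the data values $x_1, \dots, x_n$; the symbols $\hat f$, $\varphi_h$, and $h$ are notation introduced here, not taken from the slides.

```latex
\hat f(x) = \frac{1}{n}\sum_{i=1}^{n} \varphi_h(x - x_i),
\qquad
\int_{-\infty}^{\infty} x\,\hat f(x)\,dx
  = \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} x\,\varphi_h(x - x_i)\,dx
  = \frac{1}{n}\sum_{i=1}^{n} x_i
  = \bar{x}.
```

Each inner integral is the mean of a normal density centered at $x_i$, which equals $x_i$ whatever the value of $h$.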

  5. Making the densities narrower isolates different parts of the data and reveals more detail. NARROWER TENTS = MORE DETAIL

  6. Closer view of the density by itself, with narrow normal curves. [Figure: density.]

  7. DENSITY OR HISTOGRAM? Histograms lump data into categories (the black boxes), which works less well for continuous data. [Figure: density and histogram.]

  8. Form of each rectangle comprising a Probability Histogram. Example: a sample of n = 40 finds three data values that are at least 30 but less than 35 (interval [30, 35)). The bin width is w = 35 - 30 = 5. The area of the rectangle is the fraction of the data in the bin, area = w × height = 3/40, so height = 3/(40 × 5) = 0.015. Histograms may radically change their shape in response to minor changes of bin locations or widths. [Figure: three data values, marked *, in the bin from 30 to 35.]
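The height calculation can be checked with a small Python sketch. The helper names `bin_count` and `histogram_height` are hypothetical, and the 40 data values are invented so that exactly three fall in [30, 35); the slide gives only the counts, not the data.

```python
def histogram_height(count, n, width):
    """Height of a probability-histogram bar: its area count/n divided by its width."""
    return count / (n * width)

def bin_count(data, left, right):
    """Number of data values in the half-open interval [left, right)."""
    return sum(1 for x in data if left <= x < right)

# Hypothetical sample of n = 40 values, made up so that exactly three
# fall in [30, 35); the other 37 values run from 20.0 to 29.0.
data = [31.2, 33.0, 34.9] + [20.0 + 0.25 * i for i in range(37)]

n = len(data)
count = bin_count(data, 30, 35)
height = histogram_height(count, n, 35 - 30)
area = height * (35 - 30)   # recovers the fraction of data in the bin

print(n, count, height, area)
```

Because the bar's area, not its height, carries the probability, the bar areas over all bins add up to one.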

  9. DENSITY FOR {12, 21, 42, 8, 9}. Plot of the average height of 5 tents placed at data {12, 21, 42, 8, 9}. [Figure: a normal density smear at each data value, and the resulting density.]

  10. IS DETAIL ILLUSORY? Narrower tents operate at higher resolution, but they may bring out features that are illusory. Which do we trust? [Figure: a kinkier density and a smoother one.]
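One way to see the effect of tent width is to count the peaks of the density at two bandwidths. The sketch below uses the five data values {12, 21, 42, 8, 9} from the earlier slide; the bandwidths 0.3 and 30, the grid limits, and the helper names `sample_density` and `count_peaks` are all assumptions chosen for illustration.

```python
import math

def sample_density(x, data, h):
    """Average height at x of normal densities with sd h centered at the data."""
    total = 0.0
    for xi in data:
        z = (x - xi) / h
        total += math.exp(-0.5 * z * z)
    return total / (len(data) * h * math.sqrt(2.0 * math.pi))

def count_peaks(data, h, lo=-20.0, hi=70.0, steps=9000):
    """Count strict local maxima of the density on an evenly spaced grid."""
    dx = (hi - lo) / steps
    ys = [sample_density(lo + i * dx, data, h) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

data = [12, 21, 42, 8, 9]                  # data values from the slides
peaks_narrow = count_peaks(data, h=0.3)    # assumed "narrow tent" bandwidth
peaks_wide = count_peaks(data, h=30.0)     # assumed "wide tent" bandwidth

print(peaks_narrow, peaks_wide)
```

With these assumed settings the narrow tents give a separate peak at every data value, while the wide tents merge everything into one broad hump. Neither picture is automatically the right one; the bandwidth decides which features are visible.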

  11. BEWARE OVER-FINE RESOLUTION. Population of N = 500 compared with two samples of n = 30 each. POP mean = 32.02.

  12. BEWARE OVER-FINE RESOLUTION. Population of N = 500 compared with two samples of n = 30 each. SAM1 mean = 33.03, SAM2 mean = 30.60, POP mean = 32.02. The sample means are close, but the densities are not good at fine resolution.

  13. WE DO BETTER AT COARSE RESOLUTION. The same two samples of n = 30 each from the population of 500. SAM1 mean = 33.03, SAM2 mean = 30.60, POP mean = 32.02. How about coarse resolution?

  14. WE DO BETTER AT COARSE RESOLUTION. The same two samples of n = 30 each from the population of 500. SAM1 mean = 33.03, SAM2 mean = 30.60, POP mean = 32.02. The densities are good at coarse resolution.

  15. HOW ABOUT MEDIUM RESOLUTION? The same two samples of n = 30 each from the population of 500. SAM1 mean = 33.03, SAM2 mean = 30.60, POP mean = 32.02. Medium resolution?

  16. HOW ABOUT MEDIUM RESOLUTION? The same two samples of n = 30 each from the population of 500. SAM1 mean = 33.03, SAM2 mean = 30.60, POP mean = 32.02. The densities are not good at medium resolution.
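The comparison on slides 11 to 16 can be imitated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the slides: the population is synthetic (500 draws from a normal distribution with mean 32 and sd 10, which the slides do not specify), the bandwidths 0.5 and 8 stand in for "fine" and "coarse", and the gap measure (integrated absolute difference between the two densities) is a choice made here.

```python
import math
import random

def density_on_grid(data, h, grid):
    """Average of normal densities (sd = h) centered at the data, evaluated on grid."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm
            for x in grid]

def l1_gap(pop, sam, h, grid, dx):
    """Approximate integral of |population density - sample density|."""
    fp = density_on_grid(pop, h, grid)
    fs = density_on_grid(sam, h, grid)
    return sum(abs(a - b) for a, b in zip(fp, fs)) * dx

rng = random.Random(0)                                     # fixed seed for repeatability
population = [rng.gauss(32.0, 10.0) for _ in range(500)]   # synthetic population, N = 500
sample = rng.sample(population, 30)                        # one sample of n = 30

dx = 0.25
grid = [-40.0 + i * dx for i in range(577)]                # -40.0 to 104.0
gap_fine = l1_gap(population, sample, 0.5, grid, dx)       # assumed fine bandwidth
gap_coarse = l1_gap(population, sample, 8.0, grid, dx)     # assumed coarse bandwidth

print(round(gap_fine, 3), round(gap_coarse, 3))
```

A smaller gap means the sample density tracks the population density more closely. Changing the sample size from 30 to a few hundred, with a correspondingly larger synthetic population, is a simple way to explore the later slides on larger samples.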

  17. SAMPLING ONLY 600 FROM 500 MILLION? A sample of only n = 600 from a population of N = 500 million (medium resolution). POP mean = 32.02. Large sample of n = 600? Medium resolution? [Figure label: population of N = 500,000 with a sample of n = 600.]

  18. SAMPLING ONLY 600 FROM 500 MILLION? A sample of only n = 600 from a population of N = 500 million (medium resolution). Sample mean = 32.84, POP mean = 32.02. The mean is very close and the densities are close. [Figure label: population of N = 500,000 with a sample of n = 600.]

  19. SAMPLING ONLY 600 FROM 500 MILLION? A sample of only n = 600 from a population of N = 500 million (fine resolution). Sample mean = 32.84, POP mean = 32.02. At fine resolution the densities are very close. [Figure label: population of N = 500,000 with a sample of n = 600.]

  20. TALKING POINTS
      A density is controlled by the sd, referred to as bandwidth, of the normal densities used to make it.
      1a. You have to be content with the information revealed by the population density at your chosen bandwidth.
      1b. Small samples zero in on coarse densities, i.e. those made at large bandwidth, fairly well.
      1c. Samples in the hundreds may perform remarkably well, even at fine resolution, i.e. small bandwidth.
      2. Histograms are notorious for being unstable for some data. Yet they remain popular. Learn to make them by hand.
      3. Learn to make a density for 2 to 4 data values by hand.
