
Stat 2411 Statistical Methods

Chapter 2: Summarizing data. Data are collected to answer some questions. The analysis of the data includes thinking and statistical methods. Example: 8 lb test fishing line. Question: Which type(s) of line are strongest?


Presentation Transcript


  1. Stat 2411 Statistical Methods. Chapter 2: Summarizing data

  2. Summarizing Data. Data are collected to answer some questions. The analysis of the data includes thinking and statistical methods. Example: 8 lb test fishing line. Question: Which type(s) of line are strongest?

  3. 2.1 Listing numerical data
     • Trilene XL: 11.5 11.3 11.7 11.6 11.7 11.4 11.5 11.5 11.6 11.4
     • Trilene XT: 11.6 11.8 11.7 11.7 11.5 11.6 11.6 11.8 11.4 11.7
     • Stren: 11.1 11.1 11.2 11.0 11.1 11.3 11.2 10.9 11.0 11.1

  4. Plotting of the data: Dot diagram
     When analyzing data, always plot the data! A dot diagram:
             XL      XT      Stren
     11.8            **
     11.7    **      ***
     11.6    **      ***
     11.5    ***     *
     11.4    **      *
     11.3    *               *
     11.2                    **
     11.1                    ****
     11.0                    **
     10.9                    *
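
A minimal sketch of how such a dot diagram can be tallied: count how often each value occurs in each sample and print one row of stars per value. The helper name `dot_diagram` is hypothetical, and the sixth Trilene XT value is assumed to be 11.6, which agrees with the number of dots shown at 11.6.

```python
from collections import Counter

# Fishing line data from slide 3 (sixth XT value assumed to be 11.6).
samples = {
    "XL": [11.5, 11.3, 11.7, 11.6, 11.7, 11.4, 11.5, 11.5, 11.6, 11.4],
    "XT": [11.6, 11.8, 11.7, 11.7, 11.5, 11.6, 11.6, 11.8, 11.4, 11.7],
    "Stren": [11.1, 11.1, 11.2, 11.0, 11.1, 11.3, 11.2, 10.9, 11.0, 11.1],
}

def dot_diagram(samples):
    """Return text rows: a header, then one row per distinct value (largest first)."""
    counts = {name: Counter(values) for name, values in samples.items()}
    levels = sorted({v for values in samples.values() for v in values}, reverse=True)
    rows = ["      " + "".join(f"{name:<8}" for name in samples)]
    for level in levels:
        # One column of stars per sample, padded so the columns line up.
        cells = "".join(f"{'*' * counts[name][level]:<8}" for name in samples)
        rows.append(f"{level:<6}{cells}".rstrip())
    return rows

for row in dot_diagram(samples):
    print(row)
```

Keeping one column per brand makes the comparison between the three lines visible at a glance, which is the point of plotting before any further analysis.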

  5. Plotting of the data: Bar chart
     A bar chart of the Trilene XL data (axis values 11.3, 11.4, 11.5, 11.6, 11.7).

  6. 2.2 Stem and Leaf Diagram
     • Separate each observation into 2 parts
       • Stem: everything but the rightmost digit
       • Leaf: the final digit
     • Write the stems in a vertical column, then draw a vertical line next to them
     • Write each leaf in a row to the right of its stem

  7. Systolic bp data
     108 134 100 108 112 112 112 122 116 116 120 108 108
      96 114 108 128 114 112 124  90 102 106 124 130 116
     Stem and leaf plot after the first six observations:
       9 |
      10 | 8 0 8
      11 | 2 2
      12 |
      13 | 4

  8. Completed stem and leaf plot
       9 | 0 6
      10 | 0 2 6 8 8 8 8 8
      11 | 2 2 2 2 4 4 6 6 6
      12 | 0 2 4 4 8
      13 | 0 4
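
The stem and leaf procedure can be sketched in a few lines of Python. This is an illustrative sketch that assumes non-negative integer data; the function name `stem_and_leaf` is hypothetical. It splits each observation into a stem (everything but the rightmost digit) and a leaf (the final digit), then prints the leaves in sorted order for the systolic bp data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its sorted leaves (last digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    # Include empty stems so that gaps in the data stay visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

systolic_bp = [108, 134, 100, 108, 112, 112, 112, 122, 116, 116, 120, 108, 108,
               96, 114, 108, 128, 114, 112, 124, 90, 102, 106, 124, 130, 116]

plot = stem_and_leaf(systolic_bp)
for stem, leaf_list in plot.items():
    print(f"{stem:>3} | {''.join(str(d) for d in leaf_list)}")
```

For data with one decimal place, such as the cardiac output exercise, the same function could be applied after multiplying each value by 10 and rounding to an integer.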

  9. Stem and Leaf Diagram Exercise
     Cardiac output in middle aged runners. (Journal of Sports Medicine)
     20.9 17.9 19.9 16.0 12.8 23.2 21.2 21.0 20.9
     15.0 22.2 22.2 18.3 19.8 21.0 15.8 23.6 20.6
     Tip: Stem = ones, Leaves = tenths
      12 | 8
      13 |
      14 |
      15 | 0 8
      16 | 0
      17 | 9
      18 | 3
      19 | 8 9
      20 | 6 9 9
      21 | 0 0 2
      22 | 2 2
      23 | 2 6

  10. 2.3 Frequency Distributions
      With larger data sets it helps to count the number of values falling in different summary classes, usually 5-15 classes.
      E.g., suspended solids in agricultural watersheds. (Water Resources Bulletin)
      Suspended Solids (ppm)   Frequency
      30-39                     8
      40-49                     7
      50-59                     5
      60-69                    11
      70-79                     6
      80-89                     1
      90-99                     2

  11. Frequency Distributions
      Look at the book for:
      • Class limits
      • Upper class limits
      • Lower class limits
      • Class marks
      • Class intervals
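
As a rough illustration of these terms, the sketch below works through the suspended-solids table. The slide defers to the book for definitions, so the ones used here are the usual textbook conventions and are assumptions: the class mark is the midpoint of the lower and upper class limits, and the class interval is the distance between successive lower class limits. The third class is taken to be 50-59.

```python
# (lower class limit, upper class limit, frequency) from the suspended-solids table;
# the third class is assumed to be 50-59, matching the histogram classes.
classes = [(30, 39, 8), (40, 49, 7), (50, 59, 5), (60, 69, 11),
           (70, 79, 6), (80, 89, 1), (90, 99, 2)]

# Total number of observations in the frequency distribution.
total = sum(freq for _, _, freq in classes)

# Class interval: distance between successive lower class limits (assumed definition).
interval = classes[1][0] - classes[0][0]

# Class mark: midpoint of the lower and upper class limits (assumed definition).
marks = [(lower + upper) / 2 for lower, upper, _ in classes]

for (lower, upper, freq), mark in zip(classes, marks):
    print(f"{lower}-{upper}  class mark = {mark:4.1f}  frequency = {freq:2d}")
print("class interval:", interval, " total observations:", total)
```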

  12. 2.4 Graphical Representations
      • A histogram represents a frequency distribution with bars.
      [Histogram of the suspended solids data: classes 30-39 through 90-99 on the horizontal axis, with bar heights 8, 7, 5, 11, 6, 1, 2.]

  13. Pie Chart (degrees = 360 x %)
      Tree     #     %        Degrees
      Oak      50    62.5%    225
      Maple    20    25%       90
      Ash      10    12.5%     45
      Total    80             360
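
The pie chart arithmetic is simple enough to check directly. The following sketch (the function name `pie_slices` is hypothetical) converts the tree counts into percentages and slice angles using degrees = 360 x proportion.

```python
tree_counts = {"Oak": 50, "Maple": 20, "Ash": 10}

def pie_slices(counts):
    """Return {category: (percent, degrees)} for a pie chart."""
    total = sum(counts.values())
    return {name: (100 * n / total, 360 * n / total) for name, n in counts.items()}

slices = pie_slices(tree_counts)
for name, (percent, degrees) in slices.items():
    print(f"{name:<6}{tree_counts[name]:>4}{percent:>7.1f}%{degrees:>6.0f}")
```

The angles always sum to 360 because the proportions sum to 1.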

  14. 2.5 Two Variable Data: Scattergram
      Cma     Chromosome Abnormal %
      0.11     2
      0.19     5
      0.51    13
      0.53    15
      1.08    25
      1.62    28
      1.73    36
      2.36    45
      2.72    56
      3.12    59
      3.88    63
      4.18    60
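
A scattergram of these pairs can be roughed out in plain text. The sketch below is only illustrative: the function name `text_scatter` and the grid size are arbitrary choices, the column names are taken as they appear on the slide, and the code assumes each variable takes at least two distinct values.

```python
cma = [0.11, 0.19, 0.51, 0.53, 1.08, 1.62, 1.73, 2.36, 2.72, 3.12, 3.88, 4.18]
abnormal_pct = [2, 5, 13, 15, 25, 28, 36, 45, 56, 59, 63, 60]

def text_scatter(xs, ys, width=40, height=12):
    """Place one '*' per (x, y) pair on a width-by-height character grid."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 of the grid is the top of the plot
    return ["".join(line).rstrip() for line in grid]

for line in text_scatter(cma, abnormal_pct):
    print("|" + line)
print("+" + "-" * 40)
```

Even at this coarse resolution, the percentage of abnormal chromosomes rises with Cma and levels off at the largest values.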

  15. Plotting Original Data
      • Always plot original data points.
      • This is the first thing to do when analyzing data.
      • This is very important!

  16. Plotting Cancer Study Results
      • The following plots are from a study by Dr. Terry Rose-Hellekant in the Medical School Duluth
      • Treatments
        • Tamoxifen
        • Placebo
      • Some mice develop breast cancer

  17. The data are RT-PCR expressions corresponding to particular genes
      • In RT-PCR the values are roughly a log base 2 scale of the RNA content.
      • PUM1 is a “housekeeping” gene
        • Accounts for RNA quality in the sample
        • For example, time since death in a study of schizophrenia on deceased patients’ brains

  18. Two groups can be compared with back-to-back stem and leaf diagrams.
      E.g., stopping distances of bikes (Treaded tire, Smooth tire):
      34  1 8 9
      35  5 5
      36  6 4
      37  5
      38  1
      39  1 2 0
      40
      Or dot diagrams: one for Treaded and one for Smooth, drawn on a common axis from 340 to 400.

  19. When there are associations between sets of data values, plot the data accordingly.
      E.g., snowfall for Duluth and White Bear Lake, 1972-2000.
      A not very good way to plot the data:
      [Side-by-side dot diagrams for WB Lake and Duluth on a common snowfall axis running from 20 to 130.]

  20. Duluth White Bear

  21. A study of trace metals in South Indian River
      T = top water zinc concentration (mg/L)
      B = bottom water zinc concentration (mg/L)
      Site      1      2      3      4      5      6
      Top       0.415  0.238  0.390  0.410  0.605  0.609
      Bottom    0.430  0.266  0.567  0.531  0.707  0.716

  22. One of the first things to do when analyzing data is to PLOT the data
      • This is not a useful way to plot the data. There is not a clear distinction between bottom water and top water zinc,
      • even though Bottom > Top at all 6 locations.
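
The claim that Bottom > Top at all 6 locations, and the reason a plot that ignores the pairing hides it, can be checked with a short sketch using the zinc values from the table. The variable names are illustrative.

```python
sites = [1, 2, 3, 4, 5, 6]
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]

# Paired differences: bottom minus top at the same site.
differences = [round(b - t, 3) for t, b in zip(top, bottom)]
for site, d in zip(sites, differences):
    print(f"site {site}: bottom - top = {d:+.3f} mg/L")

# Every within-site difference is positive...
all_bottom_higher = all(d > 0 for d in differences)
# ...yet the two groups overlap when the pairing is ignored.
ranges_overlap = min(bottom) < max(top)

print("bottom > top at every site:", all_bottom_higher)
print("top and bottom ranges overlap:", ranges_overlap)
```

The site-to-site variation is larger than the top-to-bottom difference, so the pattern only shows up when each site's two measurements are compared with each other.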

  23. A better way: connect points in the same pair (Top, Bottom).

  24. A better way (with the line Bottom = Top)

  25. The following plot would imply a natural ordering of sites from 1 to 6. It would not be the best way to plot the data unless sites 1-6 correspond to a natural ordering, such as distance downstream of a factory.
