Stat 2411 Statistical Methods Chapter 2: Summarizing data
Summarizing Data
Data are collected to answer some questions. The analysis of the data includes thinking and statistical methods.
Example: 8 lb test fishing line
Question: Which type(s) of line are strongest?
2.1 Listing numerical data
• Trilene XL: 11.5 11.3 11.7 11.6 11.7 11.4 11.5 11.5 11.6 11.4
• Trilene XT: 11.6 11.8 11.7 11.7 11.5 11.6 11.6 11.8 11.4 11.7
• Stren: 11.1 11.1 11.2 11.0 11.1 11.3 11.2 10.9 11.0 11.1
Plotting of the data: Dot diagram
When analyzing data, always plot the data!
A dot diagram:

        XL      XT      Stren
11.8            **
11.7    **      ***
11.6    **      ***
11.5    ***     *
11.4    **      *
11.3    *               *
11.2                    **
11.1                    ****
11.0                    **
10.9                    *
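A dot diagram like this one can be produced by counting how often each value occurs in each group. The following is a minimal sketch, not the method used to draw the slide: the function name `dot_diagram` and the column width are illustrative choices, and the sixth Trilene XT value is taken to be 11.6.

```python
from collections import Counter

# Breaking strengths from Section 2.1 (sixth XT value taken to be 11.6).
lines = {
    "XL": [11.5, 11.3, 11.7, 11.6, 11.7, 11.4, 11.5, 11.5, 11.6, 11.4],
    "XT": [11.6, 11.8, 11.7, 11.7, 11.5, 11.6, 11.6, 11.8, 11.4, 11.7],
    "Stren": [11.1, 11.1, 11.2, 11.0, 11.1, 11.3, 11.2, 10.9, 11.0, 11.1],
}

def dot_diagram(groups, width=8):
    """Return text rows: one row per observed value (largest first),
    with one column of stars per group."""
    counts = {name: Counter(values) for name, values in groups.items()}
    levels = sorted({v for values in groups.values() for v in values}, reverse=True)
    rows = ["      " + "".join(name.ljust(width) for name in groups)]
    for level in levels:
        stars = "".join(("*" * counts[name][level]).ljust(width) for name in groups)
        rows.append(f"{level:<6}" + stars.rstrip())
    return rows

print("\n".join(dot_diagram(lines)))
```

Each star is one observation, so the three columns can be compared at a glance: the Stren values sit below almost all of the Trilene values.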
Plotting of the data: Bar chart
[Bar chart of the Trilene XL data; axis values 11.3, 11.4, 11.5, 11.6, 11.7]
2.2 Stem and Leaf Diagram
• Separate each observation into 2 parts
  • Stem: everything but the rightmost digit
  • Leaf: the final digit
• Write the stems in a vertical column, then draw a vertical line next to them
• Write each leaf in a row to the right of its stem
Systolic bp data
108 134 100 108 112 112 112 122 116 116 120 108 108 96 114 108 128 114 112 124 90 102 106 124 130 116

Stem and leaf plot (in progress, after the first six observations)
 9 |
10 | 8 0 8
11 | 2 2
12 |
13 | 4
Completed stem and leaf plot
 9 | 0 6
10 | 0 2 6 8 8 8 8 8
11 | 2 2 2 2 4 4 6 6 6
12 | 0 2 4 4 8
13 | 0 4
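The completed plot can be checked with a few lines of code. This is a minimal sketch for whole-number data such as the systolic bp values; the function name `stem_and_leaf` is an illustrative choice, and the leaves are sorted within each stem as in the completed plot.

```python
from collections import defaultdict

# Systolic blood pressure values from the slide above.
bp = [108, 134, 100, 108, 112, 112, 112, 122, 116, 116, 120, 108, 108,
      96, 114, 108, 128, 114, 112, 124, 90, 102, 106, 124, 130, 116]

def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    # Include every stem between the smallest and largest, even with no leaves.
    return {s: sorted(plot[s]) for s in range(min(plot), max(plot) + 1)}

for stem, leaves in stem_and_leaf(bp).items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```

For data with one decimal place, such as the cardiac output exercise, the same idea applies after multiplying each value by 10 and rounding to an integer.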
Stem and Leaf Diagram Exercise
Cardiac output in middle aged runners. (Journal of Sports Medicine)
20.9 17.9 19.9 16.0 12.8 23.2 21.2 21.0 20.9 15.0 22.2 22.2 18.3 19.8 21.0 15.8 23.6 20.6
Tip: Stem: ones; Leaf: tenths

12 | 8
13 |
14 |
15 | 0 8
16 | 0
17 | 9
18 | 3
19 | 8 9
20 | 6 9 9
21 | 0 0 2
22 | 2 2
23 | 2 6
2.3 Frequency Distributions
With larger data sets it helps to count the number of values in different summary classes, usually 5-15 classes.
E.g., suspended solids in agricultural watersheds. (Water Resources Bulletin)

Suspended Solids (ppm)    Frequency
30-39                     8
40-49                     7
50-59                     5
60-69                     11
70-79                     6
80-89                     1
90-99                     2
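Counting values into classes is easy to automate. The raw suspended-solids readings are not listed on the slide, so this sketch instead uses the systolic bp values from Section 2.2 with classes of width 10; the function name `frequency_distribution` and the choice of class width are assumptions made for illustration.

```python
from collections import Counter

# Systolic blood pressure values from Section 2.2.
bp = [108, 134, 100, 108, 112, 112, 112, 122, 116, 116, 120, 108, 108,
      96, 114, 108, 128, 114, 112, 124, 90, 102, 106, 124, 130, 116]

def frequency_distribution(values, width=10):
    """Count integer values in classes lo .. lo + width - 1, where lo is a multiple of width."""
    counts = Counter((v // width) * width for v in values)
    lowers = range(min(counts), max(counts) + width, width)
    # Classes with no observations are kept, with frequency 0.
    return [(f"{lo}-{lo + width - 1}", counts[lo]) for lo in lowers]

for label, freq in frequency_distribution(bp):
    print(f"{label:<10}{freq}")
```

With these data the five classes 90-99 through 130-139 have frequencies 2, 8, 9, 5 and 2, which matches the number of leaves on each stem of the completed stem and leaf plot.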
Frequency Distributions
Look at the book for:
• Class limits
• Upper class limits
• Lower class limits
• Class marks
• Class intervals
2.4 Graphical Representations
• A histogram represents a frequency distribution with bars.
[Histogram of the suspended solids data: classes 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99 with bar heights equal to the class frequencies]
Pie Chart
Degrees = 360 x %

Tree     #     %        Degrees
Oak      50    62.5%    225
Maple    20    25%      90
Ash      10    12.5%    45
Total    80             360
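The angle for each slice is its share of the total multiplied by 360 degrees. A minimal sketch of that arithmetic for the tree counts follows; the names `tree_counts` and `pie_angles` are illustrative.

```python
# Tree counts from the pie chart table.
tree_counts = {"Oak": 50, "Maple": 20, "Ash": 10}

def pie_angles(counts):
    """Return the slice angle in degrees for each category: 360 * count / total."""
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}

angles = pie_angles(tree_counts)
total = sum(tree_counts.values())
for name, n in tree_counts.items():
    percent = 100 * n / total
    print(f"{name:<6}{n:>4}{percent:>7.1f}%{angles[name]:>7.0f}")
```

The angles always add up to 360, which is a quick check that no category was left out.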
2.5 Two Variable Data
Scattergram

Cma     Chromosome Abnormal %
0.11    2
0.19    5
0.51    13
0.53    15
1.08    25
1.62    28
1.73    36
2.36    45
2.72    56
3.12    59
3.88    63
4.18    60
Plotting Original Data
• Always plot original data points.
• This is the first thing to do when analyzing data.
• This is very important!
Plotting Cancer Study Results
• The following plots are from a study by Dr. Terry Rose-Hellekant in the Medical School Duluth
• Treatments
  • Tamoxifen
  • Placebo
• Some mice develop breast cancer
The data are RT-PCR expressions corresponding to particular genes
• In RT-PCR the values are roughly on a log base 2 scale of the RNA content.
• PUM1 is a “housekeeping” gene
  • It accounts for RNA quality in the sample
  • For example, time since death in a study of schizophrenia using deceased patients’ brains
Two groups can be compared with back to back stem and leaf diagrams
E.g., stopping distances of bikes

Treaded tire |    | Smooth tire
             | 34 | 1 8 9
             | 35 | 5
           5 | 36 |
         6 4 | 37 | 5
             | 38 |
           1 | 39 | 1
         2 0 | 40 |

Or dot diagrams

Treaded   |    |    | *  | ** |    | *  | **
          340  350  360  370  380  390  400
Smooth    |*** | *  |    | *  |    | *  |
When there are associations between sets of data values, plot the data accordingly.
E.g., snowfall for Duluth and White Bear Lake, 1972-2000
A not very good way to plot the data:
[Dot diagram: separate columns of points for WB Lake and Duluth against a snowfall axis running from 20 to 130]
[Plot of the paired snowfall values; labels: Duluth, White Bear]
A study of trace metals in South Indian River
T = top water zinc concentration (mg/L)
B = bottom water zinc concentration (mg/L)

Site      1      2      3      4      5      6
Top       0.415  0.238  0.390  0.410  0.605  0.609
Bottom    0.430  0.266  0.567  0.531  0.707  0.716
One of the first things to do when analyzing data is to PLOT the data
• This is not a useful way to plot the data. There is not a clear distinction between bottom water and top water zinc, even though Bottom > Top at all 6 locations.
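The claim that the bottom concentration exceeds the top concentration at every site can be verified by looking at the paired differences. This is a minimal sketch using the six pairs from the zinc table; the variable names are illustrative.

```python
# Zinc concentrations (mg/L) at sites 1-6, from the table in the study slide.
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]

# Difference within each pair (same site), rounded to the precision of the data.
differences = [round(b - t, 3) for t, b in zip(top, bottom)]

for site, d in enumerate(differences, start=1):
    print(f"site {site}: bottom - top = {d:+.3f} mg/L")

print("Bottom > Top at every site:", all(d > 0 for d in differences))
```

Because the variation between sites is much larger than the difference within a site, a plot that ignores the pairing hides a pattern that the differences make obvious.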
A better way
Connect points in the same pair.
[Plot of the Top and Bottom values, with a line joining the two points from each site]
A better way
[Plot with the reference line Bottom = Top]
The following plot would imply a natural ordering of sites from 1 to 6. It would not be the best way to plot the data unless sites 1-6 correspond to a natural ordering, such as distance downstream of a factory.