1.06k likes | 1.32k Views
Looking at Data. Clinical Data Example . 1. Kline et al. (2002) The researchers analyzed data from 934 emergency room patients with suspected pulmonary embolism (PE). Only about 1 in 5 actually had PE. The researchers wanted to know what clinical factors predicted PE.
E N D
Clinical Data Example • 1. Kline et al. (2002) • The researchers analyzed data from 934 emergency room patients with suspected pulmonary embolism (PE). Only about 1 in 5 actually had PE. The researchers wanted to know what clinical factors predicted PE. • I will use four variables from their dataset today: • Pulmonary embolism (yes/no) • Age (years) • Shock index = heart rate/systolic BP • Shock index categories = take shock index and divide it into 10 groups (lowest to highest shock index)
Types of Variables: Overview Categorical Quantitative binary nominal ordinal discrete continuous 2 categories + more categories + order matters + numerical + uninterrupted
Categorical Variables • Also known as “qualitative.” • Categories. • treatment groups • exposure groups • disease status
Categorical Variables • Dichotomous (binary)– two levels • Dead/alive • Treatment/placebo • Disease/no disease • Exposed/Unexposed • Heads/Tails • Pulmonary Embolism (yes/no) • Male/female
Categorical Variables • Nominal variables– Named categories Order doesn’t matter! • The blood type of a patient (O, A, B, AB) • Marital status • Occupation
Categorical Variables • Ordinal variable– Ordered categories. Order matters! • Staging in breast cancer as I, II, III, or IV • Birth order—1st, 2nd, 3rd, etc. • Letter grades (A, B, C, D, F) • Ratings on a scale from 1-5 • Ratings on: always; usually; many times; once in a while; almost never; never • Age in categories (10-20, 20-30, etc.) • Shock index categories (Kline et al.)
Quantitative Variables • Numerical variables; may be arithmetically manipulated. • Counts • Time • Age • Height
Quantitative Variables • Discrete Numbers– a limited set of distinct values, such as whole numbers. • Number of new AIDS cases in CA in a year (counts) • Years of school completed • The number of children in the family (cannot have a half a child!) • The number of deaths in a defined time period (cannot have a partial death!) • Roll of a die
Quantitative Variables • Continuous Variables - Can take on any number within a defined range. • Time-to-event (survival time) • Age • Blood pressure • Serum insulin • Speed of a car • Income • Shock index (Kline et al.)
Looking at Data • ü How are the data distributed? • Where is the center? • What is the range? • What’s the shape of the distribution (e.g., Gaussian, binomial, exponential, skewed)? • ü Are there “outliers”? • ü Are there data points that don’t make sense?
The first rule of statistics: USE COMMON SENSE!90% of the information is contained in the graph.
Frequency Plots (univariate) Categorical variables • Bar Chart Continuous variables • Box Plot • Histogram
Bar Chart • Used for categorical variables to show frequency or proportion in each category. • Translate the data from frequency tables into a pictorial representation…
Bar Chart for SI categories 200.0 183.3 166.7 150.0 133.3 116.7 Number of Patients 100.0 83.3 66.7 50.0 33.3 16.7 0.0 1 2 3 4 5 6 7 8 9 10 Shock Index Category Much easier to extract information from a bar chart than from a table!
Box plot and histograms: for continuous variables • To show the distribution (shape, center, range, variation) of continuous variables.
Box Plot: Shock Index 2.0 1.3 Q3 + 1.5IQR = .8+1.5(.25)=1.175 maximum (1.7) minimum (or Q1-1.5IQR) Outliers median (.66) Shock Index Units “whisker” 75th percentile (0.8) interquartile range (IQR) = .8-.55 = .25 0.7 25th percentile (0.55) 0.0 SI
Histogram of SI 25.0 16.7 Percent 8.3 0.0 0.0 0.7 1.3 2.0 SI Bins of size 0.1 Note the “right skew”
Box Plot: Shock Index 2.0 1.3 Shock Index Units 0.7 0.0 SI Also shows the “right skew”
maximum minimum median 75th percentile interquartile range 25th percentile Box Plot: Age 100.0 More symmetric 66.7 Years 33.3 0.0 AGE Variables
14.0 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Histogram: Age Not skewed, but not bell-shaped either…
Some histograms from your class (n=24) Starting with politics…
Measures of central tendency • Mean • Median • Mode
Central Tendency • Mean– the average; the balancing point calculation: the sum of values divided by the sample size In math shorthand:
Mean: example Some data: Age of participants: 17 19 21 22 23 23 23 38
14.0 Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 Mean of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49 • 556.9546
14.0 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 The balancing point Mean of age in Kline’s data
Mean of Pulmonary Embolism? (Binary variable?) 80.56% (750) 19.44% (181)
Mean • The mean is affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 Mean = 4 • Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Central Tendency • Median– the exact middle value Calculation: • If there are an odd number of observations, find the middle value • If there are an even number of observations, find the middle two values and average them.
Median: example Some data: Age of participants: 17 19 21 22 23 23 23 38 Median = (22+23)/2 = 22.5
14.0 Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Median of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49
14.0 50% of mass 50% of mass 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 Median of age in Kline’s data
Does PE have a median? • Yes, if you line up the 0’s and 1’s, the middle number is 0.
Median • The median is not affected by extreme values (outliers). 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Median = 3 Median = 3 • SSlide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Central Tendency • Mode– the value that occurs most frequently
Mode: example Some data: Age of participants: 17 19 21 22 23 23 23 38 Mode = 23 (occurs 3 times)
Mode of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49
Mode of PE? • 0 appears more than 1, so 0 is the mode.
Measures of Variation/Dispersion • Range • Percentiles/quartiles • Interquartile range • Standard deviation/Variance