
Descriptive Statistics

This text provides an overview of descriptive statistics and various graphical displays, including frequency distributions, histograms, stem and leaf plots, and more. It also covers comparing groups, sample and population distributions, measures of central tendency and variation, and the concept of dependent and independent variables. Suitable for beginners in data analysis.


Presentation Transcript


  1. Descriptive Statistics • Tabular and Graphical Displays • Frequency Distribution - List of intervals of values for a variable, and the number of occurrences per interval • Relative Frequency - Proportion (often reported as a percentage) of observations falling in the interval • Histogram/Bar Chart - Graphical representation of a Relative Frequency distribution • Stem and Leaf Plot - Horizontal tabular display of data, based on 2 digits (stem/leaf)
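The displays on this slide can be sketched in a few lines of Python. This is only an illustration: the `scores` data, the interval width of 10, and the helper names `frequency_table` and `stem_and_leaf` are assumptions, not part of the presentation.

```python
from collections import Counter

def frequency_table(values, width):
    """Return (lower, upper, count, relative frequency) per interval [lower, upper)."""
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    return [(lo, lo + width, counts[lo], counts[lo] / n) for lo in sorted(counts)]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = {}
    for v in sorted(values):
        plot.setdefault(v // 10, []).append(v % 10)
    return plot

# Hypothetical two-digit measurements, invented for this sketch.
scores = [52, 58, 61, 64, 67, 67, 70, 73, 75, 78, 81, 89]

for lo, hi, count, rel in frequency_table(scores, 10):
    print(f"[{lo}, {hi}): count={count}  relative={rel:.1%}")

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

The relative frequencies sum to 1, and a histogram would simply draw one bar per row of the table.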

  2. Comparing Groups • Side-by-side bar charts • 3-dimensional histograms • Back-to-back stem and leaf plots • Goal: Compare 2 (or more) groups with respect to the variable(s) being measured • Do measurements tend to differ among groups?

  3. Sample & Population Distributions • Distributions of Samples and Populations - As samples get larger, the sample distribution gets smoother and looks more like the population distribution • U-shaped - Measurements tend to be large or small, with fewer in the middle range of values • Bell-shaped - Measurements tend to cluster around the middle with few extremes (symmetric) • Skewed Right - A few extremely large values (long right tail) • Skewed Left - A few extremely small values (long left tail)
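The first bullet, that larger samples look more like the population, can be checked with a small simulation. The sketch below assumes a hypothetical population (a fair six-sided die, where every face has proportion 1/6) and compares a sample of 20 rolls with a sample of 100,000 rolls; the helper `max_gap` and the sample sizes are choices made for this example.

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Assumed population: a fair die, so each face has population proportion 1/6.
faces = range(1, 7)
population_prop = 1 / 6

def max_gap(n):
    """Largest gap between sample and population proportions over n rolls."""
    counts = Counter(rng.randint(1, 6) for _ in range(n))
    return max(abs(counts[f] / n - population_prop) for f in faces)

small_gap = max_gap(20)
large_gap = max_gap(100_000)
print(f"n=20:      largest gap = {small_gap:.3f}")
print(f"n=100000:  largest gap = {large_gap:.3f}")
```

The larger sample's relative frequencies should sit much closer to 1/6 than the smaller sample's.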

  4. Measures of Central Tendency • Mean - Sum of all measurements divided by the number of observations (the value each case would have if the total were distributed evenly among cases). Can be highly influenced by extreme values. • Notation: Sample measurements labeled $Y_1, \ldots, Y_n$; sample mean $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
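A short sketch of the mean and its sensitivity to extreme values. The `incomes` figures are invented for illustration, and `sample_mean` is a hypothetical helper name.

```python
def sample_mean(ys):
    """Sum of the measurements divided by the number of observations."""
    return sum(ys) / len(ys)

incomes = [30, 32, 35, 38, 40]   # hypothetical incomes, in $1000s
with_extreme = incomes + [500]   # add one extreme value

print(sample_mean(incomes))       # 35.0
print(sample_mean(with_extreme))  # 112.5
```

One extreme observation moves the mean from 35 to 112.5, a value larger than every other observation in the sample.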

  5. Median, Percentiles, Mode • Median - Middle measurement after data have been ordered from smallest to largest. Appropriate for interval and ordinal scales • Pth percentile - Value where P% of measurements fall below and (100-P)% lie above. Lower quartile (25th), Median (50th), and Upper quartile (75th) are often reported • Mode - Most frequently occurring outcome. Typically reported for ordinal and nominal data.
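These summaries are available in Python's standard `statistics` module. The sketch below uses invented data; note that software packages interpolate percentiles in slightly different ways, so the quartiles here follow the module's default ("exclusive") method and may differ a little from a hand calculation.

```python
import statistics

ratings = [1, 2, 2, 3, 4, 5, 7, 8, 9]  # hypothetical measurements

median = statistics.median(ratings)
# n=4 returns the three cut points: lower quartile, median, upper quartile.
q1, q2, q3 = statistics.quantiles(ratings, n=4)
modes = statistics.multimode(ratings)  # list of the most frequent value(s)

print(median)      # 4
print(q1, q2, q3)  # 2.0 4.0 7.5
print(modes)       # [2]
```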

  6. Measures of Variation • Measures of how similar or different individuals’ measurements are • Range -- Largest minus smallest observation • Deviation -- Difference between the ith individual’s outcome and the sample mean: $Y_i - \bar{Y}$ • Variance of n observations $Y_1, \ldots, Y_n$ is the “average” squared deviation: $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}$

  7. Measures of Variation • Standard Deviation - Positive square root of the variance (measured in original units): $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}$ • Properties of the standard deviation: • s ≥ 0, and equals 0 only if all observations are equal • s increases with the amount of variation around the mean • Division by n-1 (not n) is due to technical reasons (later) • s depends on the units of the data (e.g. $1000s vs $)
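The range, variance, and standard deviation from the two slides above can be sketched directly from their definitions. The data and the helper names `sample_variance` and `sample_sd` are assumptions made for this example.

```python
import math

def sample_variance(ys):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def sample_sd(ys):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(ys))

data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical, in $1000s
in_dollars = [y * 1000 for y in data]     # same data, in $

print(max(data) - min(data))              # range: 7
print(sample_variance(data))              # 32 / 7
print(sample_sd(data), sample_sd(in_dollars))
print(sample_sd([3, 3, 3, 3]))            # 0.0 when all observations are equal
```

Changing units from $1000s to $ multiplies s by 1000, which illustrates the last property on the slide.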

  8. Empirical Rule • If the histogram of the data is approximately bell-shaped, then: • Approximately 68% of measurements lie within 1 standard deviation of the mean. • Approximately 95% of measurements lie within 2 standard deviations of the mean. • Virtually all of the measurements lie within 3 standard deviations of the mean.
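The rule can be checked on simulated bell-shaped data. The sketch below draws hypothetical measurements from a normal distribution (mean 100, standard deviation 15, both arbitrary choices) and counts the share within 1, 2, and 3 sample standard deviations of the sample mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [rng.gauss(100, 15) for _ in range(10_000)]  # hypothetical bell-shaped data

ybar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

def share_within(k):
    """Proportion of measurements within k standard deviations of the mean."""
    return sum(abs(y - ybar) <= k * s for y in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```

The printed shares should land near 68%, 95%, and just under 100%. For skewed or U-shaped data the rule need not hold.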

  9. Other Measures and Plots • Interquartile Range (IQR) -- 75th percentile minus 25th percentile (measures the spread in the middle 50% of data) • Box Plots - Display a box containing the middle 50% of measurements, with a line at the median and lines extending from the box. Breaks data into four quarters • Outliers - Observations falling more than 1.5×IQR above the upper quartile or below the lower quartile
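A minimal sketch of the IQR and the 1.5×IQR outlier rule, using invented data. The quartiles come from `statistics.quantiles` with its default method; other software may place the quartiles slightly differently, which can shift the fences.

```python
import statistics

values = [1, 2, 2, 3, 4, 5, 7, 8, 9, 30]  # hypothetical data with one extreme value

q1, _, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(outliers)  # [30]
```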

  10. Dependent and Independent Variables • Dependent variables are outcomes of interest to investigators. Also referred to as Responses or Endpoints • Independent variables are Factors that are often hypothesized to affect the outcomes (levels of dependent variables). Also referred to as Predictor or Explanatory Variables • Research question: Does I.V. → D.V.?

  11. Example - Clinical Trials of Cialis • Clinical trials conducted worldwide to study efficacy and safety of Cialis (Tadalafil) for ED • Patients randomized to Placebo, 10mg, and 20mg • Co-Primary outcomes: • Change from baseline in the erectile function domain of the International Index of Erectile Function (Numeric) • Response to: “Were you able to insert your P… into your partner’s V…?” (Nominal: Yes/No) • Response to: “Did your erection last long enough for you to have successful intercourse?” (Nominal: Yes/No) Source: Carson, et al. (2004).

  12. Example - Clinical Trials of Cialis • Population: All adult males suffering from erectile dysfunction • Sample: 2102 men with mild-to-severe ED in 11 randomized clinical trials • Dependent Variable(s): Co-primary outcomes listed on previous slide • Independent Variable: Cialis Dose: (0, 10, 20 mg) • Research Questions: Does use of Cialis improve erectile function?

  13. Sample Statistics/Population Parameters • The sample mean and standard deviation are the most commonly reported summaries of sample data. They are random variables since they will change from one sample to another. • The population mean (μ) and standard deviation (σ), computed from a population of measurements, are fixed (unknown in practice) values called parameters.
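The distinction between statistics and parameters can be illustrated by drawing several samples from one fixed population. In the sketch below the population is simulated (100,000 hypothetical measurements), so its mean and standard deviation are known here, unlike in practice; the sample size of 25 and the number of samples are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of measurements; its parameters are fixed numbers.
population = [rng.gauss(50, 10) for _ in range(100_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population formula (divides by N)
print(f"population: mean={mu:.2f}, sd={sigma:.2f}")

sample_means = []
for i in range(5):
    sample = rng.sample(population, 25)  # draw 25 without replacement
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)         # sample formula (divides by n - 1)
    sample_means.append(ybar)
    print(f"sample {i + 1}: mean={ybar:.2f}, sd={s:.2f}")
```

Each sample gives a different mean and standard deviation, while the population values printed on the first line stay the same.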
