1 / 28

Descriptive Statistics – Part 1

Descriptive Statistics – Part 1. Nina Gunnes August 29, 2019. Get to know your data. Data sources The six w’s: w ho, w hat, w hen, w here, w hy, and ho w Data types Categorical data and numerical data (count data and continuous data) Units of measurement Previous data cleaning

mildredw
Download Presentation

Descriptive Statistics – Part 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Descriptive Statistics – Part 1 Nina Gunnes August 29, 2019 Lecture 1

  2. Get to know your data • Data sources • The six w’s: who, what, when, where, why, and how • Data types • Categorical data and numerical data (count data and continuous data) • Units of measurement • Previous data cleaning • Incomplete, incorrect, or inaccurate observations identified and corrected? • Potential problems and peculiarities Lecture 1

  3. Descriptive statistics • Involves describing and summarizing data • Frequency tables • Charts and plots • Measures of central tendency • Measures of dispersion (or variability) • Usually non-parametric • No assumption made about the underlying population from which the sample is drawn Lecture 1

  4. Absolute frequencies • (Absolute) frequency: • Number of observations in the sample with the th ordered value • Cumulative (absolute) frequency: • Number of observations in the sample with values less than, or equal to, the th ordered value Lecture 1

  5. Relative frequencies • Relative frequency: • Proportion of observations in the sample with th ordered value • Cumulative relative frequency: • Proportion of observations in the sample with values less than, or equal to, the th ordered value Lecture 1

  6. Frequency table • Display of frequencies corresponding to specific data values • Including both absolute and relative frequencies • Numerical data sorted in ascending order of magnitude • Number of occurrences counted for each data value (or range of data values) • Women with parity 0, parity 1, parity 2, etc. • Subjects aged 0–9 years, 10–19 years, 20–29 years, etc. Lecture 1

  7. Frequency table, cont. Data from sbp.txt (Laake et al., 2007) Lecture 1

  8. Graphical display of data • Bar chart • Pie chart • Histogram • Dot plot • Time-series plot • Box plot • Scatter plot Lecture 1

  9. Bar chart • Also known as bar graph • Displaying frequencies of a categorical variable • Number of observations within a category represented by the height of a bar • Suitable for comparing categories • May include clusters of categories • Can be plotted vertically or horizontally Lecture 1

  10. Bar chart, cont. Lecture 1

  11. Pie chart • Displaying frequencies of a categorical variable • Number of observations within a category represented by a slice of the pie • The whole pie representing the overall 100% • Allowing distinct comparison between categories at a glance • Most effective for a limited number of categories • The more categories, the thinner the slices (hard to tell them apart!) Lecture 1

  12. Pie chart, cont. Data source: Statistics Norway Data source: Statistics Norway Lecture 1

  13. Histogram • Displaying the distribution of a continuous variable • Data classified into intervals • Number of observations within an interval represented by the height of a bin • Width of bins – a matter of judgement • Depends on the situation and data at hand • Not too narrow (random variation making the histogram irregular) • Not too wide (hiding important features of the distribution) Lecture 1

  14. Histogram, cont. Data from sbp.txt (Laake et al., 2007) Data from sbp.txt (Laake et al., 2007) Lecture 1

  15. Dot plot • Displaying data points of a continuous variable on a scale • An alternative to histogram • Marking every single observation • Suitable for small to moderate sized data sets • Too cluttered for large data sets • Suitable for highlighting clusters and gaps and identifying outliers • Useful for comparing groups Lecture 1

  16. Dot plot, cont. Data from sbp.txt (Laake et al., 2007) Data from sbp.txt (Laake et al., 2007) Lecture 1

  17. Time-series plot • Displaying numerical data with respect to time • Some measure of time (hours, days, weeks, months, years, etc.) as the unit of the time axis (usually the horizontal axis) • Data points plotted and generally connected with straight lines • Showing development over time • Revealing trends (i.e., changes occurring in a general direction) • E.g., increasing or decreasing incidence of a disease Lecture 1

  18. Time-series plot, cont. Data source: Statistics Norway Lecture 1

  19. Box plot • Also called box-and-whisker plot • Displaying a numerical variable through its quartiles • Suitable for comparing groups • Box containing the 50% middle observations • Lower limit corresponding to the first (lower) quartile • Upper limit corresponding to the third (upper) quartile • A line inside the box indicating the median (i.e., second quartile) Lecture 1

  20. Box plot, cont. • Interquartile range: • Outliers (definition may vary) • Values less than • Values greater than • Extreme outliers (definition may vary) • Values less than • Values greater than Lecture 1

  21. Box plot, cont. • Dispersion beyond the quartiles indicated by “whiskers” • Lines extending from either side of the box • Spacing between different parts of the plot indicating the degree of dispersion (spread) and skewness in the data Lecture 1

  22. Box plot, cont. Lower outer fence () Lower inner fence () Lower quartile () Upper quartile () Upper inner fence () Upper outer fence () Median () ○ ○ ○ Lecture 1

  23. Box plot, cont. • Interquartile range • Lower inner fence • Upper inner fence • Lower outer fence • Upper outer fence Data from framingham-kapittel3.txt (Laake et al., 2007) Lecture 1

  24. Box plot, cont. • Interquartile range • Lower inner fence • Upper inner fence • Lower outer fence • Upper outer fence Data from framingham-kapittel3.txt (Laake et al., 2007) Lecture 1

  25. Box plot, cont. Data from framingham-kapittel3.txt (Laake et al., 2007) Lecture 1

  26. Scatter plot • Displaying the relation between two numerical variables • Plotting data pairs • May indicate a possible linear relation (i.e., correlation) between two variables • Correlation not necessarily implying a causation, or causal relation • Showing variation in the relation by the scatter of the data Lecture 1

  27. Scatter plot, cont. Data from sbp.txt (Laake et al., 2007) Lecture 1

  28. References • Aalen OO, Frigessi A, Moger TA, Scheel I, Skovlund E, Veierød MB. 2006. Statistiskemetoderimedisinoghelsefag. Oslo: Gyldendal akademisk. • Laake P, Hjartåker A, Thelle DS, Veierød MB. 2007. Epidemiologiskeogkliniskeforskningsmetoder. Oslo: Gyldendal akademisk. https://www.med.uio.no/imb/forskning/publikasjoner/boker/2007/epidemiolgiske-kliniske-forskningsmetoder.html. Lecture 1

More Related