1 / 45

Biostatistics

Asst. Prof. NIRUWAN TURNBULL, PhD Faculty of Public Health Mahasarakham University. Biostatistics. Biostatistics “ Bio- Sadistics ”. In God we trust. All others must bring data.

audreyw
Download Presentation

Biostatistics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Asst. Prof. NIRUWAN TURNBULL, PhD Faculty of Public Health Mahasarakham University Biostatistics

  2. Biostatistics “Bio-Sadistics”

  3. In God we trust. All others must bring data.

  4. “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” H.G. Wells

  5. Statistical ideas can be intimidating and difficult • Thus: • Statistical results are often “skipped-over” when reading scientific literature • Data is often mis-interpreted Challenges

  6. “On average, my class is doing well. Half of my students think that 2+2=3, the other half thinks that 2+2=5.” Mis-Interpretation of Data

  7. A Bar Chart is a map of the locations of the nearest taverns A p-value is the result of a urinalysis Martingale residuals are the droppings of a rare bird A t-test is a blinded taste test between black and green tea You may think that:

  8. Spends Saturday nights in the library Favorite Movie: Revenge of the Nerds Doesn’t play golf A necessary evil SMART! Statistician’s Image

  9. “A statistician is a person that is good with numbers but that lacks the personality to become an accountant.”

  10. The science of collecting, monitoring, analyzing, summarizing, and interpreting data • This includes study design Statistics

  11. Statistics applied to biological (life) problems, including: • Public health • Medicine • Ecological and environmental • Much more statistics than biology, however biostatisticians must learn the biology as well Biostatistics

  12. Identify and develop treatments for disease and estimate their effects. Identify risk factors for diseases. Design, monitor, analyze, interpret, and report results of clinical studies. Develop statistical methodologies to address questions arising from medical/public health data. Biostatistician Roles

  13. Combines rigors of mathematics with uncertainties of the real world. Can make contribution to advancement of science, statistics, medicine, and public health. Can study diseases/health problems in which you may have an interest (cancers, HIV, reproductive health, …). Why Can it be Interesting?

  14. Much of life is composed of a systematic component (i.e., signal) and a random component (i.e., error or noise) • Example: • Smoking is associated with lung cancer. • Yet not everyone that smokes, gets lung cancer, and not everyone that gets lung cancer, smokes • Yet we know that there is an association (a systematic component) Challenge

  15. Our challenge • Identify the systematic component (separate it from the random component), estimate it, and perhaps make inferences with it A Challenge

  16. Population • A group of individuals that we would like to know something about • Parameter • A characteristic of the population in which we have a particular interest • Often denoted with Greek letters (μ, σ, ρ) • Examples: • The proportion of the population that would respond to a certain drug • The association between a risk factor and a disease in a population Populations and Parameters

  17. คือ ค่าต่างๆ ที่รวบรวมมาจากประชากรหรือคำนวณได้จากประชากร ใช้อักษรกรีกเป็นสัญลักษณ์ ได้แก่ • แทนค่า ค่าเฉลี่ยเลขคณิต • แทนค่า ส่วนเบี่ยงเบนมาตรฐาน • 2แทนค่า ความแปรปรวน • แทนค่า สัมประสิทธิ์สหสัมพันธ์ Parameter

  18. คือค่าต่างๆ ที่รวบรวมมาจากกลุ่มตัวอย่างหรือคำนวณได้จากกลุ่มตัวอย่าง ใช้ตัวภาษาอังกฤษเป็นสัญลักษณ์ ได้แก่ แทนค่า ค่าเฉลี่ยเลขคณิต s แทนค่า ส่วนเบี่ยงเบนมาตรฐาน s2 แทนค่า ความแปรปรวน r แทนค่า สัมประสิทธิ์สหสัมพันธ์ Statistic

  19. Sample • A subset of a population (hopefully representative) • Statistic • A characteristic of the sample • Examples: • The observed proportion of the sample that responds to treatment • The observed association between a risk factor and a disease in this sample Samples and Statistics

  20. Studying populations is too expensive and time-consuming, and thus impractical • If a sample is representative of the population, then by observing the sample we can learn something about the population • And thus by looking at the characteristics of the sample (statistics), we may learn something about the characteristics of the population (parameters) Populations and Samples

  21. Populations and Samples The Big Picture Population Parameters μ, σ, σ2 Sample / Statistics x, s, s2

  22. Two steps • Descriptive Statistics • Describe the sample • Inferential Statistics • Make inferences about the population using what is observed in the sample • Primarily performed in two ways: • Hypothesis testing • Estimation Statistical Analyses

  23. Samples are random • If we had chosen a different sample, then we would obtain different statistics (sampling variation or random variation) • However, note that we are trying to estimate the same (constant) population parameters Issues

  24. Describe the Sample • Begin one variable at a time • Describe important variables in your analyses (e.g., endpoints, demographics, confounders, etc.) Step I – Descriptive Statistics

  25. Several types of data • Nominal • Ordinal • Discrete • Continuous • Time-to-event with censoring • The type of data influences the analysis methods to be employed Types of Data

  26. Types of Variables

  27. Mutually exclusive unordered categories • Examples • Sex (male, female) • Race/ethnicity (white, black, latino, asian, native american, etc.) • Site • Can summarize in: • Tables – using counts and percentages • Bar chart/graph Nominal Data

  28. Ordered Categories • Examples • Adverse events • Mild, moderate, severe, life-threatening, death • Income • Low, medium, high Ordinal Data

  29. Often only integer numbers are possible • If there are many different discrete values, then discrete data is often treated as continuous • Examples: CD4 count, HIV viral load • If there are very few discrete values, then discrete data is often treated as ordinal Discrete Data

  30. Any value on the continuum is possible (even fractions or decimals) • Examples: • Height • Weight • Many “discrete” variables are often treated as continuous • Examples: CD4 count, viral load Continuous Data

  31. Time to an event (continuous variable) • The event does not have to be survival • Concept of “Censoring” • If we follow a person until the event, then the survival time is clear • If we follow someone for a length of time but the event does not occur, the the time is censored (but we still have partial information; namely that the event did not occur during the follow up period) • Examples: time to progression (cancer), time to response, time to relapse, time to death Survival Data

  32. Think of data as a rectangular matrix of rows and columns • Simplest structure • Rows represent the “experimental unit” (e.g., person) • Each row is an independent observation • Columns represent “variables” measured on the experimental unit • More complex structures • Multiple rows per person (e.g., multiple timepoints) Dataset Structure

  33. It is ALWAYS a good idea to summarize your data (at least for important variables) • You become familiar with the data and the characteristics of the sample that you are studying • You can also identify problems with data collection or errors in the data (data management issues) • Range checks for illogical values Data Summaries

  34. Some visual ways to summarize data (one variable at a time): • Tables • Graphs • Bar charts • Histograms • Box plots Visual Data Summaries

  35. Summarizes a variable with counts and percentages The variable is categorical (e.g., nominal or ordinal) Frequency Tables

  36. Frequency Table – Cause of Death

  37. Note that you can take a continuous variable and create categories with it • How do you create categories for a continuous variable? • Choose cutoffs that are biologically meaningful • Natural breaks in the data • Precedent from prior research Frequency Tables

  38. How to choose the categories? • Talk to physician about risk categories • May look at National Cholesterol Education Program (NCEP) guidelines and categories Example: Serum Cholesterol Levels

  39. Bar Graphs • Nominal data • No order to horizontal axis • Histograms • Continuous or ordinal data on horizontal axis • Box Plots • Continuous data Graphical Summaries

  40. The width of the plot has no meaning 25% of the data <13 25% of the data 13-16 25% of the data 16-18 25% of the data >18 Box Plot

  41. http://www.medpagetoday.com/lib/content/Medpage-Guide-to-Biostatistics.pdfhttp://www.medpagetoday.com/lib/content/Medpage-Guide-to-Biostatistics.pdf http://faculty.franklin.uga.edu/dhall/sites/faculty.franklin.uga.edu.dhall/files/lec1.pdf http://www.cartercenter.org/resources/pdfs/health/ephti/library/lecture_notes/env_health_science_students/ln_biostat_hss_final.pdf http://www.meduniwien.ac.at/msi/biometrie/Lehre/phdpropedeutics/Skripten/Medical%20Biostatistics%20I%20version%202009-10.pdf http://www.hstathome.com/tjziyuan/biostatistical%20method.pdf Book should have and read!

  42. Read through Biostatistics book from http://www.hstathome.com/tjziyuan/biostatistical%20method.pdf, then • We need four groups to solve out these problems, then present it, what are they talking about? • Statistical Methods for Assessing Biomarkers (P.81-109) • Power and Sample Size Considerations in Molecular Biology (P.111-130) • Multiple Tests for Genetic Effects in Association Studies (P. 143-168) • Statistical Considerations in Assessing Molecular Markers for Cancer Prognosis and Treatment Efficacy (P. 169-189) LET DO THE WORK!

More Related