MD 5108 Biostatistics for Basic Research
Lecturer: Dr K. Mukherjee
Office: S16-06-100
Tel: 874 2764
Email: stamk@nus.edu.sg
Objectives
To train practitioners of the biomedical sciences in the use and interpretation of statistical data analysis.
• explore and present data using tables, charts and graphs
• do simple statistical calculations with a calculator
• carry out data analysis using a statistical package such as SPSS
• pick the right procedure for analysing a set of data
• interpret results correctly and report findings
• avoid misuse and abuse of statistics
• understand the statistical content of papers in medical journals
• judge claims and statements critically
• discuss and communicate ideas in a quantitative manner
Teaching approach
• nonmathematical introduction
• explanation of concepts rather than proofs
• emphasis on methodology and procedures
• emphasis on use of a statistical package rather than manual calculation
• emphasis on choosing the right procedure
• emphasis on correct interpretation of results
• examples from clinical research literature
Topic 1: What is statistics?
“A branch of mathematics dealing with the analysis and interpretation of masses of numerical data” Merriam-Webster Dictionary
“The field of study that involves the collection and analysis of numerical facts or data of any kind” Oxford Dictionary
“The study of how information should be employed to reflect on, and give guidance for action, in a practical situation involving uncertainty” Vic Barnett
Biostatistics: application of statistical methods to the biological, medical and health sciences
Why the need for statistics in biomedicine?
Two main reasons:
• Variation
  · attributes differ not only among individuals but also within the same individual over time
• Sampling
  · biomedical research projects are mostly carried out on small numbers of study subjects
  · it is a challenging problem to project results from small-sample studies to individuals at large
Biological variation necessitates the use of statistical methods in biomedicine, to put numerical data into a context in which we can better judge their meaning.
From sample to population
Statistical methods are used to produce statistical inferences about a population based on information from a sample derived from that population.
[Diagram: sample → inductive statistical methods → population]
Altman (1991) Practical Statistics for Medical Research, Chapman and Hall.
Bailar & Mosteller (1986) Medical Uses of Statistics, NEJM Books.
Many studies have been done on misuse of statistics in medicine
Schor and Karten (1966, J. Am. Med. Assoc.):
• 149 papers classed as “analytical studies” in 3 issues of 11 most frequently read medical journals
• assessment criteria: validity with respect to
  · design of experiment
  · type of analysis performed
  · applicability of statistical test used
Findings of Schor and Karten:
• 28% of papers acceptable
• 68% deficient but acceptable if reviewed
• 4% unsalvageable
Lesson: CARE must be exercised when reading scientific papers in biomedical journals! Knowledge of basic biostatistics is required.
“There are three kinds of lies: lies, damned lies and statistics” Benjamin Disraeli
“It is easy to lie with statistics, but it is easier to lie without them” Frederick Mosteller
“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” H.G. Wells
Types of statistical methods
1. Descriptive statistical methods
· data collection and organization
· summarizing data and describing its characteristics
· presentation and publication
2. Exploratory data analysis
· play around and get a feel for the data
· preliminary analysis, often graphical
· looking for patterns and possible relationships
· are assumptions satisfied?
· which model and procedure to use?
3. Inductive (inferential) statistical methods
Statistical inferences about a population based on information from a sample derived from that population
• estimation, confidence intervals
• hypothesis testing
• prediction, forecasting
• classification
[Diagram: sample → inductive statistical methods → population]
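As a rough illustration of estimation, the sketch below computes an approximate 95% confidence interval for a population mean from a sample. The readings are hypothetical, and the interval uses the normal approximation; for a sample this small a t-based interval from a statistical package would normally be preferred.

```python
import math
import statistics

def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for a population mean.

    Uses the normal approximation, which is reasonable only for fairly
    large samples; small samples call for a t-based interval.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

# Hypothetical systolic blood pressure readings (mmHg) from a sample of patients.
readings = [118, 125, 130, 121, 135, 128, 122, 140, 119, 127,
            133, 124, 129, 126, 131, 120, 137, 123, 132, 128]

low, high = mean_confidence_interval(readings)
print(f"sample mean = {statistics.mean(readings):.1f}")
print(f"approximate 95% CI = ({low:.1f}, {high:.1f})")
```

The interval is centred on the sample mean and narrows as the sample size grows, which is the sense in which a sample is used to say something about the population.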
Topic 2: Types of data
Sources of data, the raw materials of statistics
· Routinely kept records, e.g., hospital medical records
· Surveys
· Experiments
· Clinical trials
· Databases
· Published reports
Any characteristic that can be measured or classified into categories is called a variable.
Types of variables
(1) Qualitative variables
· cannot be measured numerically
· categorical in nature, e.g., gender
· categories must not overlap and must cover all possibilities
• Nominal variables (no inherent ordering of categories)
  · M/F, Yes/No
  · Blood group (A, B, AB, O)
  · Ethnic group (Chinese, Malay, Indian, Others)
• Ordinal variables (categories are ordered in some sense)
  · response to treatment: unimproved, improved, much improved
  · pain severity: no pain, slight pain, moderate pain, severe pain
(2) Quantitative variables
· can be measured numerically, e.g., weight, height, concentration
· can be continuous or discrete
• a continuous variable can take on any value (subject to the precision of the measuring instrument) within some range or interval, e.g., weight, height, blood pressure, cholesterol level
• a discrete variable is usually a count of something and hence takes on integer values only, e.g., number of admissions to NUH
Variable types and measurement types
· have implications for how data should be displayed or summarized
· determine the kind of statistical procedures that should be used
SUMMARY
Variable
• Types of variables: qualitative (categorical) or quantitative
• Measurement scales:
  · Qualitative or categorical
    Nominal (not ordered), e.g. ethnic group
    Ordinal (ordered), e.g. response to treatment
  · Quantitative
    Discrete (count data), e.g. number of admissions
    Continuous (real-valued), e.g. height
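To make concrete how the variable type drives the choice of summary, here is a minimal sketch. The function name `summarise`, the choice of summaries and the data are illustrative assumptions, not a prescribed procedure; ordinal values are assumed to be coded as ranks.

```python
from collections import Counter
import statistics

def summarise(values, kind):
    """Return a simple summary suited to the variable type.

    kind is one of "nominal", "ordinal", "discrete", "continuous".
    Ordinal values must be given as ranks (1 = lowest category).
    """
    if kind == "nominal":
        return dict(Counter(values))  # frequencies only: categories have no order
    if kind == "ordinal":
        # The median uses only the ordering, not distances between categories.
        return {"median_rank": statistics.median_low(values)}
    if kind in ("discrete", "continuous"):
        return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    raise ValueError(f"unknown variable type: {kind}")

# Hypothetical data for illustration only.
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]   # nominal
pain_ranks = [1, 2, 2, 3, 4]                          # ordinal: 1 = no pain ... 4 = severe
admissions = [3, 5, 4, 6, 2]                          # discrete counts

print(summarise(blood_groups, "nominal"))
print(summarise(pain_ranks, "ordinal"))
print(summarise(admissions, "discrete"))
```

A mean of blood groups has no meaning, while a frequency table of a continuous variable such as height is rarely useful without first grouping the values.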
Topic 3: Presenting data graphically
Advantages of graphical data display
· Let the data speak for itself
· Get a good feel for the data before formal analysis
· Graphs and plots are easier to understand and interpret
· Reveal patterns in the data which may shed light on the appropriate model/analysis to use, e.g.,
  skewed or symmetric distribution
  multiple peaks / modes
  are there any outliers?
  relationship between variables
Comparison of methods
· Bar charts can be read more accurately and offer better distinction between close-together values
· Pie charts are especially useful for showing percentage distribution
· Pie charts can display large and small percentages simultaneously without a scale break
· A single bar chart is preferable to a single segmented bar chart
· A series of segmented bar charts is easier to read than a series of pie charts or ordinary bar charts
Plotting by sector rather than by profession
· Look at the data from a different angle
· Highlight different aspects of the data
A back-to-back bar chart
Source: JAMA, 1978, vol 239, no 21
Comparison of methods
· A stacked bar chart also serves as a bar chart of the combined data
· Some of the bars in a stacked bar chart are not aligned
· Bars in clustered bar charts are aligned, but it is harder to visualize how the component bars would stack up
· Back-to-back bar charts apply only when there are 2 groups; the aggregated bars are not aligned
· A series of stacked or segmented bar charts is useful for showing a time trend
Time trend
Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person.
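The size of this visual exaggeration can be worked out directly. In the sketch below the two values (9 and 11 prescriptions per person) are hypothetical and are not read from the chart; only the baseline of 8 comes from the slide.

```python
def apparent_ratio(first, last, baseline):
    """Ratio of drawn bar heights when the vertical axis starts at `baseline`."""
    return (last - baseline) / (first - baseline)

# Hypothetical values for the first and last year (not taken from the chart).
first_year, last_year = 9.0, 11.0

true_ratio = apparent_ratio(first_year, last_year, baseline=0)
drawn_ratio = apparent_ratio(first_year, last_year, baseline=8)

print(f"axis from 0: last bar is {true_ratio:.2f} times the first")
print(f"axis from 8: last bar is {drawn_ratio:.2f} times the first")
```

With these numbers a real increase of about 22% is drawn as a bar three times as tall.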
Stacked bar chart of yearly mortality rate per 1000 births
Source: Pagano & Gauvreau (1999) Principles of Biostatistics, Duxbury.
Response under two treatments

             Response to treatment
Treatment    None    Partial    Complete    Total
A               3         15           9       27
B               2         22          30       54
A misleading bar chart
By design, twice as many patients receive treatment B, so a bar chart of raw counts makes every response look more common under B.
The percentages of each response type can be compared between the two treatments.
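The percentages can be obtained by dividing each count by its treatment total. A short sketch using the counts from the response table (the variable and function names are illustrative):

```python
# Counts from the table "Response under two treatments".
counts = {
    "A": {"None": 3, "Partial": 15, "Complete": 9},
    "B": {"None": 2, "Partial": 22, "Complete": 30},
}

def row_percentages(row):
    """Convert one treatment's counts to percentages of that treatment's total."""
    total = sum(row.values())
    return {response: 100 * n / total for response, n in row.items()}

percentages = {treatment: row_percentages(row) for treatment, row in counts.items()}

for treatment, row in percentages.items():
    cells = ", ".join(f"{response}: {p:.1f}%" for response, p in row.items())
    print(f"Treatment {treatment}: {cells}")
```

On the percentage scale, complete response is about 33.3% under A and 55.6% under B, a comparison that is not distorted by the unequal group sizes.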
Graphs for quantitative data
· Histogram
· Frequency polygon
· Box plot
Histogram
· Divide the range of the data into a suitably chosen number of intervals/bins, all of the same width
· Plot the number of observations that fall within each interval
Relative frequency histogram
· Plot the proportions of observations that fall within the class intervals
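The binning step behind both kinds of histogram can be sketched as follows. The cholesterol values and the choice of 5 bins are hypothetical; placing the maximum value in the last bin is one common convention, assumed here.

```python
def histogram(data, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data."""
    low, high = min(data), max(data)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((x - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical cholesterol levels (mg/dL), for illustration only.
levels = [150, 160, 165, 170, 172, 175, 180, 185, 190, 200]

edges, counts = histogram(levels, n_bins=5)
relative = [c / len(levels) for c in counts]  # heights of the relative frequency histogram

for left, right, c, r in zip(edges, edges[1:], counts, relative):
    print(f"{left:.0f} to {right:.0f}: {'#' * c}  (count {c}, proportion {r:.1f})")
```

The counts give the ordinary histogram and the proportions give the relative frequency histogram; the shape is the same, only the vertical scale differs.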
Comparison of methods
Histogram
· good at revealing distributional shape such as symmetry, skewness, number of peaks etc.
· difficult to superimpose or draw side by side
Frequency polygons
· can be superimposed for easy comparison
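A frequency polygon joins the points (bin midpoint, frequency), so two groups binned on the same intervals can be overlaid on one set of axes. The sketch below computes those points for two hypothetical groups; it assumes every observation lies within the shared bin edges, and it uses relative frequencies so that groups of different sizes remain comparable.

```python
def polygon_points(data, edges):
    """Return (bin midpoint, relative frequency) pairs for the given bin edges.

    Assumes every observation lies between edges[0] and edges[-1].
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in data:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            # Intervals are closed on the left; the last one also includes its right edge.
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return [((edges[i] + edges[i + 1]) / 2, counts[i] / len(data))
            for i in range(n_bins)]

# Hypothetical measurements for two groups, binned on the same intervals.
edges = [150, 160, 170, 180, 190, 200]
group_a = [152, 158, 163, 171, 175, 188]
group_b = [165, 172, 178, 181, 186, 195, 199, 200]

points_a = polygon_points(group_a, edges)
points_b = polygon_points(group_b, edges)

for (mid, freq_a), (_, freq_b) in zip(points_a, points_b):
    print(f"midpoint {mid:.0f}: group A {freq_a:.2f}, group B {freq_b:.2f}")
```

Plotting each list of points as a connected line gives the two superimposed polygons.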