Effective Data Presentation Techniques in Thesis Writing

How to present data or results in a thesis? Dr. Leeberk Raja, MBBS, MD, Consultant, Division of Community Health, Bangalore Baptist Hospital

Tables and charts • The kind of figure one uses depends on the type of information to be conveyed and the type of variable • Tables: better for presenting data • Charts: better for presenting the message

Types of Variables

Nominal • Individual observation which is usually a word • Categories that are mutually exclusive • No order • When categorized into two, it is known as a dichotomous or binary variable • Eg: Gender, Religion

Ordinal • Rank ordered, but the distance between variable points is not equal • The difference between moderate and severe is not necessarily the same as the difference between mild and moderate • We cannot say that severe is twice as much as mild • No arithmetic operations can be done • Eg: • I agree, strongly agree, very strongly agree • Mild, moderate, severe

Numerical – discrete • A natural order exists among possible values. Both ordering and magnitude are important • Eg: Parity, number of TB cases in a year

Numerical – Continuous data • Data represent measurable quantities that are not restricted to taking on certain specified values • Eg: Serum cholesterol, height, weight • Fractional values are possible • Arithmetic operations are possible • Can be transformed into a discrete/ordinal/binary variable by grouping the values into meaningful categories
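The last point, grouping a continuous variable into categories, can be sketched as follows. The values, cut-offs and category labels are hypothetical and chosen only for illustration; they are not clinical recommendations.

```python
# Hypothetical serum cholesterol values in mg/dL (illustrative only).
cholesterol = [172.5, 198.0, 215.3, 241.8, 189.9, 260.4]


def to_ordinal(value, low=200.0, high=240.0):
    """Map a continuous value to an ordinal category using assumed cut-offs."""
    if value < low:
        return "normal"
    if value < high:
        return "borderline"
    return "high"


def to_binary(value, cutoff=200.0):
    """Dichotomize a continuous value at an assumed cut-off."""
    return "raised" if value >= cutoff else "not raised"


ordinal = [to_ordinal(v) for v in cholesterol]
binary = [to_binary(v) for v in cholesterol]
print(ordinal)
print(binary)
```

Grouping loses information: once the values are categorized, the original measurements cannot be recovered from the categories.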

Dependent and independent variables • Independent variable (predictor): a variable that is thought to influence another variable (the dependent variable) • Dependent variable (outcome variable): a variable that depends on an independent variable • Eg: Smoking (independent) and lung cancer (dependent) • When we draw a graph, the dependent variable is on the Y-axis and the independent variable on the X-axis

Organizing data • When data are collected in original form, they are called raw data • When the raw data is organized into a frequency distribution, the frequency will be the number of values in a specific class of the distribution • Frequency distribution: number of times characteristic of a variable is observed in the sample • Presented in one way table or charts or graphs

Presenting the frequency distribution • Categorical data – nominal, ordinal or grouped

Continuous data Presentation

Ungrouped frequency distribution • Ungrouped frequency distribution – can be used for discrete data when the range of values in the data set is not large. • Examples – number of miles patients have to travel from home to health services
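A minimal sketch of an ungrouped frequency table, assuming a small made-up data set of travel distances (the numbers are hypothetical):

```python
from collections import Counter

# Hypothetical raw data: miles travelled by 12 patients (illustrative only).
miles = [2, 5, 3, 2, 4, 5, 2, 3, 1, 2, 4, 3]

# Count how many times each distinct value occurs.
frequency = Counter(miles)
total = len(miles)

print(f"{'Miles':>5} {'Frequency':>9} {'Percent':>7}")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>5} {count:>9} {round(100 * count / total):>7}")
print(f"{'Total':>5} {total:>9}")
```

Each distinct value gets its own row, which is only practical when the range of values is small.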

Table – Frequency table (ungrouped)

Grouped frequency distributions • Grouped frequency distributions can be used when the range of values in the continuous/discrete data set is very large. The data must be grouped into classes that are more than one unit in width • Examples – weight, height, age group

Birth weights of babies born in a hospital

Guidelines for constructing a Frequency distribution • The classes must be mutually exclusive • The classes must be continuous • The classes must be equal in width
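The three guidelines can be illustrated with a short sketch. The birth weights, the starting point and the class width below are hypothetical; half-open classes of the form [lower, upper) are one assumed way of keeping the classes mutually exclusive and continuous.

```python
# Hypothetical birth weights in grams (illustrative only).
weights = [2450, 3100, 2980, 3520, 2760, 4010, 3300, 1990, 3650, 2890]


def grouped_frequency(values, start, width, n_classes):
    """Count values in n_classes equal-width classes [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    # Consecutive half-open intervals: no overlaps and no gaps.
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


table = grouped_frequency(weights, start=1500, width=500, n_classes=6)
for lower, upper, count in table:
    print(f"{lower}-{upper - 1} g: {count}")
```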

Two way tables for combination of variables
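A two-way table cross-tabulates two categorical variables. The sketch below assumes a small made-up set of records (socio-economic status and nutritional status are used only as example variable names) and computes cell counts with row and column totals.

```python
from collections import Counter

# Hypothetical (SES, nutritional status) pairs, one per subject.
records = [
    ("low", "undernourished"), ("low", "normal"), ("low", "undernourished"),
    ("high", "normal"), ("high", "normal"), ("high", "undernourished"),
    ("low", "undernourished"), ("high", "normal"),
]

cells = Counter(records)                  # count per (row, column) cell
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}

print("columns:", cols)
for r in rows:
    print(r, [cells[(r, c)] for c in cols], "row total:", row_totals[r])
print("column totals:", [col_totals[c] for c in cols], "n =", len(records))
```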

Guidelines for Tables • Present information in rows and columns ( orderly arranged) • Organize data in meaningful way • Clearly label rows and columns (no abbreviations) • Note units clearly • Show percents – round to nearest whole numbers • Show total numbers • Table no. and title • Identify the source of data

Graphs/Charts • Frequency distribution: Nominal/categorical data – pie and bar charts; Continuous data – histogram, frequency polygon • Comparing variables: Nominal/categorical data – grouped (clustered) bar, stacked/component bar; Continuous data – polygon, scatter plot, line graph

Bar Chart • Graphical way to organize data • Easier to read than tables, but gives less detail • The most informative graphs are relatively simple and self-explanatory • The x-axis gives the independent variable and the y-axis depicts the values of the dependent variable • Use few bars (maximum of 6)

Bar charts • Simple bar charts: To display the frequency distribution of a nominal or ordinal variable • Clustered (grouped) bar charts: Results of a cross tabulation (eg: SES Vs nutritional status) can be well presented by clustered bars, but as there will be an unequal number of people in each category, it is difficult to compare bar lengths across the categories • Stacked or component bar chart: To represent the proportion of people in each category (SES Vs nutritional status). The bars of a clustered chart are stacked one on top of the other.
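Because group sizes usually differ, a stacked bar chart is typically drawn from within-group percentages rather than raw counts. A minimal sketch of that conversion, using hypothetical counts:

```python
# Hypothetical cross tabulation: nutritional status counts within each SES group.
group_counts = {
    "low SES": {"undernourished": 30, "normal": 20},
    "high SES": {"undernourished": 10, "normal": 40},
}


def to_percentages(group):
    """Convert one group's counts to percentages of that group's total."""
    total = sum(group.values())
    return {category: 100 * n / total for category, n in group.items()}


stacked = {name: to_percentages(group) for name, group in group_counts.items()}
for name, shares in stacked.items():
    print(name, shares)
```

Each stacked bar then has the same total height (100%), so the segments can be compared across groups.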

Simple bar chart • Figure: Distribution of study population by religion (n=200); x-axis: Religion • Bar length shows frequency or % • Equal bar widths • Zero point

Grouped bar chart • Figure: Meningococcal disease by quarter, Bangalore (1994-1996) (n=535); y-axis: Number of cases; x-axis: Quarter

Stacked or component bar chart • Figure: Meningococcal disease by quarter, Bangalore (1994-1996) (n=535); y-axis: Proportion of cases; x-axis: Quarter

Pie charts • Pie charts are used to represent the distribution of a categorical variable • Rules for pie charts: • Use pie charts for data that add up to some meaningful total • Never use three-dimensional pie charts • Avoid forcing comparisons across more than one pie chart • Bar charts are better than pie charts

Distribution of study population by religion (n=200) • Shows breakdown of a total quantity into categories • Useful for showing relative differences • Proportions should add up to 100% • Used for a small number of categories. Best for 6 or fewer. Max 10. • Angle size = (percent/100) × 360; eg: for a 10% slice, 0.10 × 360 = 36 degrees
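The angle of each slice is the category's share of the total multiplied by 360 degrees. A short sketch with hypothetical category counts (the labels A to D are placeholders, not the study's data):

```python
# Hypothetical category counts for a sample of 200 (illustrative only).
category_counts = {"A": 120, "B": 40, "C": 20, "D": 20}

n = sum(category_counts.values())

# Slice angle = (count / total) * 360 degrees.
angles = {name: 360 * count / n for name, count in category_counts.items()}
percents = {name: 100 * count / n for name, count in category_counts.items()}

for name in category_counts:
    print(f"{name}: {percents[name]:.0f}% -> {angles[name]:.0f} degrees")
```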

Histogram • Displays data by using vertical bars of various heights to represent frequencies • Frequency distribution of continuous data • Area of each column represents number of cases • No spaces between columns • No scale breaks • Equal class intervals

Histogram

Comparison

Frequency polygon • Graph that displays data by using lines that connect points plotted for the frequencies at the midpoints of the classes • The height of each point represents the frequency of its class
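The points of a frequency polygon can be computed directly from a grouped frequency table. The sketch below uses a hypothetical table; closing the polygon on the x-axis with an empty class at each end is a common convention that is assumed here, not stated on the slide.

```python
# Hypothetical grouped frequency table: (lower limit, upper limit, frequency).
classes = [(1500, 2000, 1), (2000, 2500, 1), (2500, 3000, 3),
           (3000, 3500, 2), (3500, 4000, 2), (4000, 4500, 1)]

# Each class contributes one point: (class midpoint, frequency).
points = [((lower + upper) / 2, freq) for lower, upper, freq in classes]

# Assumed convention: add a zero-frequency point one class width beyond each end.
width = classes[0][1] - classes[0][0]
polygon = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for midpoint, freq in polygon:
    print(midpoint, freq)
```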

Frequency polygon

Box and Whiskers plot • A boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence the name), which goes from the first quartile (Q1) to the third quartile (Q3). • Within the box, a vertical line is drawn at Q2, the median of the data set. Two horizontal lines, called whiskers, extend from the front and back of the box. The front whisker goes from Q1 to the smallest non-outlier in the data set, and the back whisker goes from Q3 to the largest non-outlier.

If the data set includes one or more outliers, they are plotted separately as points on the chart. In the boxplot below, two outliers follow the second whisker
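The quantities a boxplot displays can be computed as in the sketch below. Two assumptions are made that the slides do not state: outliers are defined by the common 1.5 × IQR rule, and quartiles use the "inclusive" interpolation method of Python's `statistics.quantiles` (other software may give slightly different quartiles). The data are made up.

```python
import statistics

# Hypothetical measurements (illustrative only).
data = [12, 15, 17, 18, 20, 21, 23, 24, 26, 60]

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: beyond 1.5 * IQR from the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]

# Whiskers end at the smallest and largest non-outliers.
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, median, Q3:", q1, q2, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```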

Scatter plot • A scatterplot is a graphic tool used to display the relationship between two quantitative variables • A scatterplot consists of an X axis (the horizontal axis), a Y axis (the vertical axis), and a series of dots. Each dot on the scatterplot represents one observation from a data set. Variables have to be on continuous scale.

Scatter plot • Slope refers to the direction of change in variable Y when variable X gets bigger. If variable Y also gets bigger, the slope is positive; but if variable Y gets smaller, the slope is negative. • Strength refers to the degree of "scatter" in the plot. If the dots are widely spread, the relationship between variables is weak. If the dots are concentrated around a line, the relationship is strong
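Slope and strength can also be put into numbers. The sketch below uses the least-squares slope for direction and Pearson's correlation coefficient r for strength; choosing these two measures is an assumption, since the slides describe the ideas only visually. The data are hypothetical.

```python
import math

# Hypothetical paired observations (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squares and cross-products around the means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

slope = sxy / sxx                 # sign gives the direction of the relationship
r = sxy / math.sqrt(sxx * syy)    # closer to +1 or -1 means less scatter

print(f"slope = {slope:.2f}, r = {r:.2f}")
```

A positive slope with r near 1 corresponds to dots concentrated around a rising line; r near 0 corresponds to widely scattered dots.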

Types of Scatterplots

How to present? Association between age at onset of drinking and relapse • Show row totals • If the p value is displayed as 0.00, write it as <0.05

t-tests • One sample ‘t’ test • ‘t’ test for two independent samples • ‘t’ test for two paired samples

Comparison of two independent means (Student’s t-test/unpaired t-test) • This test is used when we wish to compare the means of two independent groups
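A minimal sketch of the unpaired t statistic, assuming equal variances in the two groups (the pooled-variance form of Student's t-test) and using made-up measurements. It computes only t and the degrees of freedom; the p value would then be read from the t distribution, for example with statistical software.

```python
import math
import statistics

# Hypothetical measurements in two independent groups (illustrative only).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.5, 5.0, 4.0]


def students_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance unpaired t-test."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = statistics.mean(a), statistics.mean(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


t_stat, df = students_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df}")
```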

How to present?

Paired ‘t’ test • Same individuals are studied more than once in different circumstances E.g: Measurement made on the same people before and after intervention • Outcome variable should be continuous • The difference between pre-post measurements should be normally distributed.
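The paired t statistic is computed from the within-person differences rather than from the two sets of measurements separately. A minimal sketch with hypothetical before/after values; it returns t and the degrees of freedom only, and the p value would be obtained from the t distribution.

```python
import math
import statistics

# Hypothetical measurements on the same six people (illustrative only).
before = [140, 152, 138, 147, 160, 155]
after = [132, 148, 135, 140, 151, 150]

# One difference per person; the test is applied to these differences.
diffs = [b - a for b, a in zip(before, after)]

n_pairs = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)            # sample standard deviation
standard_error = sd_diff / math.sqrt(n_pairs)

t_paired = mean_diff / standard_error
df_paired = n_pairs - 1

print(f"mean difference = {mean_diff}, t = {t_paired:.2f}, df = {df_paired}")
```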

How to present?

Examples • Objectives • To determine the prevalence of erectile dysfunction (ED) among patients with Diabetes Mellitus visiting the outpatient and primary care facilities of a teaching hospital in Bangalore • To assess the determinants of ED in the same population

Descriptive statistics • Age distribution (mean, SD, range) – Categories (tables, bar chart) • Sex distribution (pie chart) • Occupation (tables, bar chart) • No. of years with DM • Peripheral neuropathy

Factors Contributing to ED • Age Vs ED (Chi square, Student’s ‘t’ test) • Sex Vs ED • Duration Vs ED • Peripheral neuropathy Vs ED

Questions?

Thank you!
