STAT 250.3: Introduction to Biostatistics Instructor: Efi Antoniou

Introduction

Highlights from the Syllabus
• The course website will contain important information, so plan to access it frequently: http://www.stat.psu.edu/~antoniou/stat250.3
• Help is available! Don’t be afraid to ask for it.

Breakdown of Grades
• Homework: Assigned every two weeks and due at the beginning of Wednesday’s lecture. Approximately 7 homework assignments. [20%]
• Quizzes: One 15-minute multiple-choice quiz every two weeks. Approximately 6 quizzes. [25%]
• Midterm exams: Two midterm exams. [30%]
• Final: Comprehensive final exam. [25%]
• Dates of quizzes and exams are on the course outline on the website.
NOTE: Homework problems and solutions, and study guides, will be provided on the course website.

Contact Information
• Instructor: Efi Antoniou, 330b Thomas Bldg. Office Hours: 10:00-11:00 MT. Email: antoniou@stat.psu.edu
• TA: Shu-Min Liao, 301 Thomas Bldg. Office Hours: 1:00-3:00 R. Email: sxl340@psu.edu

What is Statistics?
• Statistics is a collection of procedures and principles for gathering data and analyzing information when faced with uncertainty.
When we have a question that needs answering, we use statistics as a method to find the answer. Statistics helps us ask the right question, collect the right data, and draw the correct conclusion.
Statistics is not just about number crunching! Critical thinking skills and common sense are far more important than mathematical ability.

Statistics in a Nutshell
• We use statistics to make conclusions about populations from samples.
1. Draw a representative SAMPLE from the POPULATION.
2. Describe the SAMPLE.
3. Use rules of probability and statistics to make conclusions about the POPULATION from the SAMPLE.
Example formulas: $T = (x - \mu)/\sigma$ and $P(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$
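To give a feel for the kind of rules used in step 3, here is a minimal sketch in Python. It assumes the two formulas shown on the slide are a standardized score and the binomial probability formula; the function names and the numbers are hypothetical and chosen only for illustration.

```python
from math import comb


def standardize(x, mu, sigma):
    """How far x lies from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p (assumed reading of the slide's P(x))."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical inputs, not taken from the course.
z = standardize(x=72, mu=66, sigma=3)
prob = binomial_pmf(x=2, n=4, p=0.5)
print(z, prob)
```

Both functions are direct translations of the formulas; later material in a course like this would explain when each one applies.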

Population, Sample & Data
• Consider the following hypothesis of interest: Smoking, weight, and parents’ disease status are associated with heart disease in people 18 years of age and older.
• Population: All observations that are of interest to the researcher. e.g. Persons 18 years of age and older with heart disease.
• Sample: The observations that are actually obtained by the researcher; a subset of the population. e.g. 50 people 18+ years old with heart disease.
• Sample size: Usually denoted by n. e.g. n = 50.
• Raw data: A term used for numbers and category labels that have been collected but not yet processed. e.g. 50 sets of values for “smoking”, “weight”, and “parents’ disease status”, one for each individual.

Why do we use Samples?
• Because populations are usually too large to measure every unit conveniently.
• Because the measuring process might destroy the unit.
• Consider the following research questions:
1. How much do PSU students spend on books per semester?
2. What percentage of firecrackers produced by ACME Fire Cracker Company are defective?

Choosing the Sample
• We want a sample that is representative of the population of interest!
• Considering the previous two questions, would the following be appropriate samples?
1. 300 full-time PSU students answered this question at the bookstore.
2. 200 firecrackers taken from one batch on June 15th, 2002.
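One common way to aim for a representative sample is a simple random sample, in which every unit in the population has the same chance of being selected. The sketch below is only an illustration: it assumes a hypothetical sampling frame of 40,000 student ID numbers, which the slides do not provide.

```python
import random

# Hypothetical sampling frame: one ID number per enrolled student.
population_ids = list(range(1, 40001))

# A fixed seed makes the draw reproducible; any seed would do.
rng = random.Random(250)

# Draw 300 distinct students, each equally likely to be chosen.
sample_ids = rng.sample(population_ids, k=300)

n = len(sample_ids)  # sample size
print(n, sample_ids[:5])
```

Unlike asking whoever happens to be at the bookstore, this draw does not favor students who buy their books in person.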

How to Describe the Sample…
It is often difficult to draw conclusions from raw data. It is helpful to summarize raw data in the form of tables or graphs for easier interpretation.
Consider data from a survey given to STAT 200 students that asked their sex, their preference for Coke or Pepsi, and the fastest speed they had ever driven. The raw data would look something like this:
Based on this data sheet, it would be difficult to compare males and females in their choice of cola and fastest speed ever driven.

Descriptive Statistics
We use descriptive statistics to summarize raw data into tables, graphs, and numerical summaries.
Consider the previous data after the variables have been summarized using descriptive statistics. It is now easier to see differences between the sexes.
Variables: Cola by Sex
Variables: Fastest Speed by Sex
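A minimal sketch of how such summaries could be computed in Python. The six records below are hypothetical stand-ins, not the actual survey responses, and the variable names are chosen only for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical raw data: (sex, cola preference, fastest speed in mph).
raw = [
    ("F", "Coke", 85),
    ("M", "Pepsi", 110),
    ("F", "Pepsi", 90),
    ("M", "Coke", 100),
    ("F", "Coke", 95),
    ("M", "Coke", 120),
]

# Table of counts for two categorical variables: cola preference by sex.
cola_by_sex = Counter((sex, cola) for sex, cola, _ in raw)

# Numerical summary of a quantitative variable: fastest speed by sex.
speeds = {}
for sex, _, speed in raw:
    speeds.setdefault(sex, []).append(speed)
summary = {sex: (mean(values), median(values)) for sex, values in speeds.items()}

print(cola_by_sex)
print(summary)  # (mean, median) of fastest speed for each sex
```

Counts suit the categorical variables (sex, cola), while the mean and median suit the quantitative one (speed).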

Inferential Statistics: Making Conclusions about the Population
• Now that we have obtained an appropriate sample and described the sample data, we want to be able to make conclusions about the population.
Consider the following example: Sarah conducts an experiment to determine whether a new type of high-fat diet is better than the standard low-fat diet. In her sample of 100 subjects (50 on the new diet, 50 on the old diet), she finds that those on the new diet lose an average of 10 lbs and those on the old diet lose an average of 8 lbs.
Clearly, in the sample the new diet was more effective than the old diet, but can we make this conclusion about the population? Is there enough evidence in the sample to suggest that the new diet is better for everyone, or could Sarah’s results have just been by chance (a lucky result)?
In this course we will learn statistical methods to answer Sarah’s question, and rules that help us draw conclusions about populations (what we are really interested in!) based on data from the sample.
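One way to ask whether a difference like Sarah’s could be "just by chance" is a permutation test: shuffle the group labels many times and see how often a difference at least as large appears. This is only a sketch of one possible method, not necessarily the one taught in this course, and the weight-loss values are hypothetical (the slide gives only the two group averages, not the individual data).

```python
import random
from statistics import mean

# Hypothetical weight losses in lbs; NOT Sarah's data.
new_diet = [12, 9, 11, 10, 8, 13, 7, 10]
old_diet = [8, 6, 9, 7, 10, 8, 5, 11]

# Observed difference in sample means (new minus old).
observed = mean(new_diet) - mean(old_diet)

rng = random.Random(0)  # fixed seed for reproducibility
pooled = new_diet + old_diet
n_new = len(new_diet)
n_shuffles = 5000
at_least_as_large = 0

for _ in range(n_shuffles):
    # If diet made no difference, any split of the pooled values is equally likely.
    rng.shuffle(pooled)
    diff = mean(pooled[:n_new]) - mean(pooled[n_new:])
    if diff >= observed:
        at_least_as_large += 1

# Estimated chance of a difference this large arising from shuffling alone.
p_value = at_least_as_large / n_shuffles
print(observed, p_value)
```

A small estimated probability would suggest the sample difference is hard to explain by luck alone; a large one would suggest it could easily be a chance result.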

Variables & Data
• Variable: A characteristic that varies from one individual to the next.
• Raw Data (Data): All of the information we gather on the subjects. It includes one or more variables.

Quantitative and Categorical Data
• Raw data from quantitative variables consist of numerical values taken on each individual. Examples: height, number of siblings, IQ score.
• Raw data from categorical variables consist of group or category names that don’t necessarily have a logical ordering. Examples: eye color, country of residence, t-shirt size.

For tomorrow…
• Skim over Chapter 1 and Sections 2.1, 2.2, 2.3, 6.1, and 6.5.
