
Data Analysis

Data Analysis. E3: Lecture 8. Lecture Outline: Processing and Visualizing Data; Why do we do this?; Processing the Luria-Delbruck Data; Processing the Public Goods Data; Analyzing Data (using Excel); Difference in means (t-test); Difference in distributions (χ² test).


Presentation Transcript


  1. Data Analysis E3: Lecture 8

  2. Data Analysis • Lecture Outline • Processing and Visualizing Data • Why do we do this? • Processing the Luria-Delbruck Data • Processing the Public Goods Data • Analyzing Data (using Excel) • Difference in means (t-test) • Difference in distributions (χ² test)

  3. Data Analysis • Lecture Outline • Processing and Visualizing Data • Why do we do this? • Processing the Luria-Delbruck Data • Processing the Public Goods Data • Analyzing Data (using Excel) • Difference in means (t-test) • Difference in distributions (χ² test)

  4. Handling Data [Figure labels: Massaging? Dressing-up? Focusing; 3 P colonies, 4 R colonies] • After a laboratory experiment or time out in the field, you will have several data points. • How should one process this (potentially voluminous) data? • Organize it (spreadsheet programs, like Excel, can help) • Process it: investigate portions of the data set; look at relevant descriptive statistics; transform data points in a well-defined way; combine data points in a well-defined way • Visualize it • Subject it to an appropriate statistical test

  5. Picture = Words × 1000 • We are visual animals and often can see patterns when data is presented visually • Examples: • Pie chart illustrates the distribution of values of a single variable • X-Y plot illustrates the form of the relationship between two variables • Paired histograms illustrate the relationship between the distributions of two variables. • The most appropriate picture will often depend on the data: • Categorical or quantitative? • Frequencies, counts or measurements? • Relationship between data points?

  6. Data Analysis • Lecture Outline • Processing and Visualizing Data • Why do we do this? • Processing the Luria-Delbruck Data • Processing the Public Goods Data • Analyzing Data (using Excel) • Difference in means (t-test) • Difference in distributions (χ² test)

  7. The Data • Go to our class website: • http://depts.washington.edu/kerrpost/Bio481/HomePage • On the DATA link, download the following Excel (xls) files: • “E3_LD_Processed_Data” • “E3_PG_Processed_Data” • Take care as you process and visualize the class data; the product of your efforts can be used directly in your first two lab reports.

  8. Processing the Luria-Delbruck Data [Figure: protocol timeline from Day 1 (Tuesday) to Day 3 (Thursday); labels: ×24, ×3, 48 hours at 37°C, COUNT] • We’ll start by computing some useful statistics: • Mean number of colonies on a rifampicin plate. • Variance in number of colonies on a rifampicin plate. • Total number of rifampicin plates (number of replicates in the class). • Next we will compile the full distribution of rifampicin plate counts: • Actual distribution (COUNTIF function will be useful) • Expected distribution (get ready to write a complicated function!) • Let’s plot these distributions. • Finally, let’s compute the density of cells in the original wells.
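The statistics and distributions listed on this slide can be sketched outside Excel as follows. This is a minimal illustration, not the class spreadsheet: the colony counts are made up, and the expected distribution is assumed to be Poisson with the observed mean, which is the usual prediction when mutations arise only in response to the selective agent (the slide does not spell out its "complicated function"). The Excel function names in the comments are rough counterparts only.

```python
import math
import statistics
from collections import Counter

# Hypothetical colony counts, one per rifampicin plate (NOT the class data).
counts = [0, 1, 1, 2, 0, 5, 1, 0]

n_plates = len(counts)                   # roughly Excel's COUNT
mean = statistics.mean(counts)           # roughly Excel's AVERAGE
variance = statistics.variance(counts)   # sample variance, roughly Excel's VAR

# Observed distribution: number of plates showing exactly k colonies.
# This plays the role of one COUNTIF per value of k.
observed = Counter(counts)


def poisson_pmf(k, lam):
    """Probability of exactly k colonies under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def expected_plates(k, lam, n):
    """Expected number of plates (out of n) with exactly k colonies (Poisson assumption)."""
    return n * poisson_pmf(k, lam)


print("mean:", mean, "variance:", variance, "plates:", n_plates)
for k in range(max(counts) + 1):
    print(k, observed.get(k, 0), round(expected_plates(k, mean, n_plates), 2))
```

A variance much larger than the mean is itself a hint that the counts are not Poisson distributed, since a Poisson distribution has equal mean and variance.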

  9. Data Analysis • Lecture Outline • Processing and Visualizing Data • Why do we do this? • Processing the Luria-Delbruck Data • Processing the Public Goods Data • Analyzing Data (using Excel) • Difference in means (t-test) • Difference in distributions (χ² test)

  10. Processing the Public Goods Data [Figure: protocol timeline across Day 1 (Tuesday), Day 2 (Wednesday) and Day 3 (Thursday); upper half shows monocultures (BK26+pBR alone, BK27 alone), lower half shows competitions (Init. Mix, Liquid comp., Agar comp.); each step is labeled 24 hours at 37°C followed by COUNT of BK26+pBR and BK27; one label reads “amp”] • We’ll start by computing the densities when alone. • Next, let’s compute the relative fitnesses (BK26+pBR relative to BK27) & plot these.
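The two computations on this slide might look like the sketch below. Both formulas are assumptions, since the slide does not state them: density is taken as colony count times dilution factor divided by volume plated, and relative fitness as the ratio of the two strains' Malthusian parameters (log fold-change over the competition), a common convention in microbial competition assays. The function and parameter names are hypothetical.

```python
import math


def cfu_per_ml(colonies, dilution_factor, volume_plated_ml):
    """Cell density (colony-forming units per mL) in the undiluted culture.

    Assumes every colony arose from a single cell of the diluted sample.
    """
    return colonies * dilution_factor / volume_plated_ml


def relative_fitness(a_initial, a_final, b_initial, b_final):
    """Fitness of strain A relative to strain B (assumed definition).

    Ratio of Malthusian parameters: ln(A_final / A_initial) / ln(B_final / B_initial).
    Densities must be in the same units and strain B must have changed in density.
    """
    return math.log(a_final / a_initial) / math.log(b_final / b_initial)


# Hypothetical numbers, NOT the class data: 50 colonies from 0.1 mL of a 10^4-fold dilution.
print(cfu_per_ml(50, 1e4, 0.1))
# Strain A grows 100-fold while strain B grows 10-fold in the same competition.
print(relative_fitness(1e5, 1e7, 1e5, 1e6))
```

Under this definition a value above 1 means BK26+pBR outgrew BK27 during the competition, and a value below 1 means the reverse.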

  11. Save Your Work • Save your work by renaming the data files: • “E3_LD_Processed_Data_YOUR_INITIALS” • “E3_PG_Processed_Data_YOUR_INITIALS” • Save these files on a thumb drive or email them to yourself. • We will continue to work on these during class today.

  12. Data Analysis • Lecture Outline • Processing and Visualizing Data • Why do we do this? • Processing the Luria-Delbruck Data • Processing the Public Goods Data • Analyzing Data (using Excel) • Difference in means (t-test) • Difference in distributions (χ² test)

  13. How do we statistically evaluate data? [Figures: table of number of flips vs. number of “heads” (4 flips, 1 head; 12 flips, 3 heads; 100 flips, 25 heads); paired Control vs. High-protein histograms; frequency plot labeled ♀a ♂a and ♀e ♂e] • When you were a child, your father told you he would let you stay up late if a coin he flipped came up heads. • Suppose the coin comes up heads 25% of the time. • Is your dad using a fair coin? How would you evaluate this? • Suppose you hear that a high-protein diet during puberty leads to increased height as an adult. • The mean height in a high-protein treatment was 5’11” and the mean height in a control treatment was 5’5”. • What would you feed your kids? How do you gauge this? • The New York Times has just published an exposé about sexism in graduate admissions in a famous department of mathematics. • While the numbers of male and female applicants were equal, the number of males admitted was greater. • Should a formal inquiry take place? How do you evaluate the data?
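The coin example shows why the same 25% can be unremarkable or damning depending on sample size. The sketch below is one possible way to evaluate it, an exact two-sided binomial test against a fair coin; the slide itself does not name a method, and the function name is hypothetical. It uses the three flip counts from the slide's table.

```python
from math import comb


def fair_coin_p_value(n_flips, n_heads):
    """Exact two-sided p-value for seeing n_heads in n_flips of a fair coin.

    Sums the probabilities of every outcome that is no more likely than the
    observed one. For a fair coin each outcome k has probability
    comb(n, k) / 2**n, so the comparison can be done on integers.
    """
    observed_ways = comb(n_flips, n_heads)
    as_extreme = sum(
        comb(n_flips, k)
        for k in range(n_flips + 1)
        if comb(n_flips, k) <= observed_ways
    )
    return as_extreme / 2**n_flips


# The three cases from the slide: each is exactly 25% heads.
for n_flips, n_heads in [(4, 1), (12, 3), (100, 25)]:
    print(n_flips, n_heads, fair_coin_p_value(n_flips, n_heads))
```

With 4 flips the p-value is 0.625 and with 12 flips it is about 0.146, so neither result is surprising for a fair coin; with 100 flips the p-value falls far below 0.05.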

  14. The Data • Go to your email or thumb drive and download your processed data files: • “E3_LD_Processed_Data_YOUR_INITIALS” • “E3_PG_Processed_Data_YOUR_INITIALS” • Take care as you analyze the class data; the product of your efforts can be used directly in your first two lab reports.

  15. Student’s t-test [Photo: William Sealy Gosset] • Using a pseudonym, “Student,” Gosset described a test for a difference between the means of two data sets. • The t-test uses the statistics from two groups of data (means and s.d.) to generate a third statistic (the t statistic). • If the two groups of data come from populations with the same mean, the t statistic has a characteristic distribution itself (note the shape will depend on the sample sizes). • If the computed t is extreme, it is unlikely that the two groups come from populations with equal means; the p-value of the test gives the probability of a t at least this extreme if the means were equal. By convention, the means are called significantly different if p < 0.05. • Assumptions • Each datum is independent • Data are normally distributed • DEMO: Performing a t-test • Computing a p-value from a t-test • Distinguish the different types of t-tests: • Paired versus Unpaired data • Equal versus Unequal variance • One-tailed versus Two-tailed tests • How can we use t-tests for the Public Goods Data? What type of t-test is appropriate? How do you report it?
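To make the steps of the test concrete, here is a minimal sketch of one variant: an unpaired, unequal-variance (Welch), two-tailed t-test. The choice of variant is an assumption for illustration and is not necessarily the right one for the class data. The fitness values are made up, the function names are hypothetical, and the p-value is obtained by numerically integrating the t density rather than by any method from the lecture. In Excel, `T.TEST(array1, array2, 2, 3)` should correspond to the same variant.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unpaired, unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: probability of |T| >= |t| when the true means are equal.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical relative fitness values, NOT the class data.
liquid = [1.02, 0.98, 1.05, 1.01, 0.99]
agar = [1.21, 1.15, 1.30, 1.18, 1.26]
t_stat, dof = welch_t(liquid, agar)
print("t =", t_stat, "df =", dof, "p =", two_tailed_p(t_stat, dof))
```

A report of such a test would normally state the variant used, the t statistic, the degrees of freedom and the p-value.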

  16. Data Analysis • Lecture Outline • Processing and Visualizing Data • Why do we do this? • Processing the Luria-Delbruck Data • Processing the Public Goods Data • Analyzing Data (using Excel) • Difference in means (t-test) • Difference in distributions (χ² test)

  17. Pearson’s χ² Test [Photo: Karl Pearson] • Karl Pearson introduced a test of whether an observed set of frequencies differs from a specified frequency distribution. • The χ² test uses frequency data to generate a statistic (χ²). • If the observed frequencies come from a population with the specified frequencies, the χ² statistic has a characteristic distribution (the shape will depend on the number of classes). • If the computed χ² is extreme, it is unlikely that the observed frequencies derive from the specified distribution; the p-value of the test gives the probability of a χ² at least this extreme if they did. By convention, the observed frequencies are called significantly different if p < 0.05. • Assumptions • Frequencies are derived from independent sampling • There are not several frequencies that are very small • DEMO: Performing a chi-square test • Computing a p-value from a chi-square test • Perform a χ² test to see if the Luria-Delbruck Data from the class differs from the frequencies expected under directed mutation. What can you conclude?
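The mechanics of the test can be sketched as below. The statistic is the standard sum of (observed − expected)² / expected over classes. The p-value routine is one possible implementation (a series expansion of the incomplete gamma function), not the lecture's Excel demo, and the counts and function names are hypothetical. The degrees of freedom are taken here as the number of classes minus one; if the expected frequencies are built from a parameter estimated from the same data (for example a Poisson mean), one more degree of freedom is usually subtracted.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(statistic, df):
    """Probability of a chi-square value at least this large with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(a, z)
    with a = df / 2 and z = statistic / 2, then returns 1 - P. The subtraction
    loses precision for extremely small p-values, which is fine for a sketch.
    """
    a, z = df / 2, statistic / 2
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


# Hypothetical plate counts per class, NOT the class data.
observed = [10, 20, 30]
expected = [20, 20, 20]
chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1
print("chi-square =", chi2, "df =", dof, "p =", chi_square_p_value(chi2, dof))
```

Classes with very small expected counts are commonly pooled before the statistic is computed, which addresses the second assumption on the slide.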
