
The W’s of Data


effie

Presentation Transcript


  1. The W’s of Data

  2. Data • Does it have to be numbers? • It can be, but it doesn’t have to be. • Without context, it’s useless! • Consider 17, 21, 44, and 76 • Are those data?

  3. Data Handout

  4. The Five W’s of Data • Answering the Five W’s of Data provides the context of the data. • Who • What • When • Where • Why • And if possible How

  5. Who • Rows of data correspond to individual cases about whom (or which if not people) we record some characteristics • Respondents – individuals who answer a survey • Subjects or participants – people on whom we experiment • Experimental units – inanimate subjects for experiments • Data values may also be called observations without being clear about the Who

  6. From the data sheet • Who?

  7. What • Variables – the characteristics recorded about each individual • Variables are usually recorded in the columns of a data table • Variables identify What has been measured • They may seem simple, but think! • For categorical variables, it’s natural to count how many cases belong in each category. • Quantitative variables have measurement units. • The units tell how each value has been measured (scale)

  8. Variables • Categorical variables – name categories and answer questions about how cases fall into those categories. Also called qualitative variables • Ex. Gender, year in school, nationality, etc. • Quantitative variables – answer questions about the quantity of what is measured • Ex. Height, weight, income, etc. • Just because the data are numbers does not make them quantitative • Ex. Zip codes
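
As a rough illustration of the distinction above, here is a small Python sketch. The student records, the field names, and the `VARIABLE_TYPES` labels are all made up for this example and are not from the data handout. Each row is one case (the Who) and each key is one variable (the What). The zip code is stored as text because it is made of digits but is not a quantity.

```python
from statistics import mean

# Hypothetical data table: each dict is one case (the Who),
# each key is one variable (the What).
students = [
    {"name": "Ana", "year": "Junior", "height_in": 64, "zip_code": "30301"},
    {"name": "Ben", "year": "Senior", "height_in": 70, "zip_code": "30305"},
    {"name": "Cara", "year": "Junior", "height_in": 67, "zip_code": "30301"},
]

# Our own labeling of how each variable is used (an assumption, not a rule).
VARIABLE_TYPES = {
    "year": "categorical",
    "height_in": "quantitative",  # units: inches
    "zip_code": "categorical",    # digits, but not a quantity
}


def average_height(rows):
    """Averaging makes sense for a quantitative variable with units."""
    return mean(row["height_in"] for row in rows)


def quantitative_variables(types):
    """List the variables we chose to treat as quantitative."""
    return [name for name, kind in types.items() if kind == "quantitative"]


print(average_height(students))
print(quantitative_variables(VARIABLE_TYPES))
```

An average height in inches is meaningful. An "average zip code" would not be, which is why the zip code is labeled categorical here even though it looks numeric.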

  9. From the data sheet • What?

  10. Why • It’s the questions we ask of a variable that shape how we think about it. • Ex. An end-of-class survey asks “How valuable do you think this course will be to you?” • 1 = worthless 2 = slightly 3 = middling • 4 = reasonably 5 = invaluable • Is the educational value categorical or quantitative?
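
One way to see how the Why changes the treatment is to summarize the same ratings twice. The sketch below uses invented responses (an assumption for illustration only) on the 1 to 5 scale from the slide: first as a categorical variable, by counting answers, and then as a quantitative variable, by averaging.

```python
from collections import Counter
from statistics import mean

# The 1-5 scale from the slide.
LABELS = {1: "worthless", 2: "slightly", 3: "middling",
          4: "reasonably", 5: "invaluable"}

# Hypothetical survey responses (made up for this example).
responses = [4, 5, 3, 4, 2, 4, 5]

# Treated as categorical: how many students chose each answer?
counts_by_label = Counter(LABELS[r] for r in responses)

# Treated as quantitative: what is the average rating?
# This assumes the steps between 1, 2, 3, 4, 5 are equal, which is debatable.
average_rating = mean(responses)

print(counts_by_label)
print(round(average_rating, 2))
```

A teacher asking "how many students found the course worthless?" wants the counts. A department comparing courses might want the average, while accepting the equal-spacing assumption noted in the comment.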

  11. From the data sheet • Are variables qualitative or quantitative? • Why?

  12. Counts count • When Amazon offers free shipping, they might first analyze how purchases are shipped. • Counting summarizes the categorical variable, shipping method. • We also use counts to measure quantities such as the number of classes you are taking or how many songs you own. • Two ways to use counts: • Count the cases in each category of a categorical variable; the category labels are the What and the individuals counted are the Who • The counts themselves are not data, but they are something we summarize about the data
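
The first use of counts can be sketched in a few lines of Python. The orders and shipping methods below are invented for illustration and are not Amazon data. Each order is a case (the Who), the shipping method is the categorical variable (the What), and the counts are a summary of the data rather than data themselves.

```python
from collections import Counter

# Hypothetical orders: one dict per case.
orders = [
    {"order_id": "A1", "shipping": "ground"},
    {"order_id": "A2", "shipping": "ground"},
    {"order_id": "A3", "shipping": "two-day"},
    {"order_id": "A4", "shipping": "ground"},
    {"order_id": "A5", "shipping": "overnight"},
]

# Count the cases in each category of the categorical variable.
shipping_counts = Counter(order["shipping"] for order in orders)

# Express each count as a share of all orders.
total = sum(shipping_counts.values())
shares = {method: n / total for method, n in shipping_counts.items()}

print(shipping_counts)
print(shares)
```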

  13. Example • Back to Amazon’s shipping • What is the categorical variable? • What? • Who? • Why?

  14. The second way is when the focus is on the number of something, which is measured by counting. • Ex. Amazon might track the growth in the number of teenage customers each month to forecast CD sales. • What? • Who? • Units? • Why? • Is teen a category? Is it a quantitative variable?

  15. Identifiers • Is your student ID number a quantitative variable? • Why? • Other examples of identifiers include UPS tracking numbers, social security numbers, and driver’s license numbers • Identifier variables do not tell us anything useful about their categories because there is exactly one individual in each. • They are used to: • Combine data from different sources • Protect confidentiality • Provide unique labels
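
To illustrate the "combine data from different sources" use, the sketch below matches two small tables on a student ID. The IDs, grades, majors, and the `combine` helper are hypothetical and exist only for this example. Note that the ID is used purely as a label: nothing is ever added, averaged, or counted by category.

```python
# Two hypothetical sources keyed by student ID (all values are made up).
grades = {"S1001": 88, "S1002": 93, "S1003": 75}
majors = {"S1001": "Biology", "S1003": "History", "S1004": "Math"}


def combine(left, right):
    """Match records from two sources on the IDs present in both."""
    shared_ids = sorted(left.keys() & right.keys())
    return {
        sid: {"grade": left[sid], "major": right[sid]}
        for sid in shared_ids
    }


combined = combine(grades, majors)
print(combined)
```

Because each ID belongs to exactly one individual, the match is unambiguous. Students who appear in only one source (here `S1002` and `S1004`) are left out of the combined table.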

  16. We need more information… • We must know Who, What, and Why to analyze data, but to understand more we would also like to know When, Where, and How. • When can make a difference in the data. • Example: the number of women with jobs outside the home in 1900 and the number of women with jobs outside the home in 2000. • Where can make a difference in the data. • Example: the number of high school students participating in ice hockey in Florida and the number participating in ice hockey in Minnesota

  17. How data is collected matters • Survey, interviews, observation, etc. • How could surveys be flawed, especially internet surveys?

  18. Example • Medical researchers at a large city hospital investigating the impact of prenatal care on newborn health collected data from 882 births during 1998-2000. They kept track of the mother’s age, the number of weeks the pregnancy lasted, the type of birth (cesarean, induced, natural), the level of prenatal care the mother had (none, minimal, adequate), the birth weight and sex of the baby, and whether the baby exhibited health problems (none, minor, major). • Identify the W’s, name the variables, specify for each variable whether its use indicates it should be treated as categorical or quantitative, and identify the units in which it was measured or note that they were not provided.

  19. Homework p. 16 2-12 even
