Learn about statistics, the science of data, and how it helps make sense of the world by finding patterns and relationships. Discover the value of data and how it is used in various scenarios, such as personalized advertising and analyzing driver behavior. Explore the importance of context in interpreting data and the different types of variables: identifiers, categorical, and quantitative.
Statistics: the science of data • Data: a collection of numbers, characters, images, or other items, along with their context, that provides information about something What are Statistics and Data?
Facebook: If you have a Facebook account you have probably noticed that the ads you see online tend to match your interests and activities. Much of your personal information has been sold to marketing or tracking companies. Your data are valuable! A company can find out your age, sex, education level, job, hobbies and activities. Examples
Target stores make customer profiles by collecting data about people using credit cards. Patterns the company discovers across similar customer profiles enable it to send you advertising and coupons that promote items you may be interested in purchasing. Examples
How dangerous is texting while driving? • Researchers compared the reaction times of sober drivers, drunk drivers, and texting drivers. The results were striking: the texting drivers responded more slowly and were more dangerous than drivers who were above the legal limit for alcohol. Examples
Data vary because we don’t see everything and because even what we do see and measure, we measure imperfectly. • Example: Ask different people the same question and you will get lots of different answers • Statistics helps us make sense of the world by seeing past the underlying variation to find patterns and relationships. Statistics is about variation
Let’s start with an example: Amazon.com • Background: Amazon started as a bookstore in 1995. By 1997 Amazon had sold 2.5 million books to more than 1.5 million customers in 150 countries. In 2010, sales reached $34.2 billion, and the company now sells nearly everything, including a $400,000 necklace, yak cheese from Tibet, and the largest book in the world. What are Data?
So how did they do it? How do they track their customers? • The answer is data! What are Data?
Your name and address? Yes, but they are not numbers. • Numbers only? The amount of your last purchase. • Zip code? This is a number, but is it used for analysis such as finding an average? What are Data?
Think of some data points that Amazon may collect: • Try to guess what each column represents. What are Data?
Why is this hard? • Because there is no context. If we don’t know whom or what the values describe and what is measured about them, the values are meaningless. • We can make the meaning clear if we organize them in a data table: What are Data?
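As a rough illustration of how a data table supplies context, the Python sketch below attaches column names to bare values. The rows and the column names (`order_number`, `customer`, `state`, `price_usd`) are hypothetical and are not the values shown on the original slide.

```python
# Hypothetical raw values with no context: hard to interpret on their own.
raw_rows = [
    ("A-1001", "Customer 1", "OH", 10.99),
    ("A-1002", "Customer 2", "TX", 16.50),
    ("A-1003", "Customer 3", "CA", 8.25),
]

# Column names (assumed for this sketch) say WHAT was recorded about each case.
column_names = ["order_number", "customer", "state", "price_usd"]

# Each row becomes one case (the WHO); each key is one variable (the WHAT).
table = [dict(zip(column_names, row)) for row in raw_rows]

for record in table:
    print(record)
```

With the names attached, a value such as 10.99 can be read as the price of one order in dollars rather than as an unexplained number.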
Data must have context to be meaningful. • Without context data cannot be interpreted. • What information provides good context? • Who • What • Where • Why • When • How Context
Are the numbers listed above data? • Data must have context to be meaningful. The numbers listed above could be test scores, ages of a group of golfers, or the uniform numbers of the starting backfield on the football team. • Without context data cannot be interpreted. 17, 21, 44, 76
How the data are collected can make the difference between insight and nonsense. For example, data that come from a voluntary survey on the Internet are almost always worthless. The How
The When • Time frame – Data recorded in 1803 mean something very different from data recorded now.
The Where • Place – Data measured in India may be different from data measured in Mexico. • More specific – indoors/outdoors, house/office
In general, the rows of a data table correspond to the individual cases about whom (or about which) the data were collected, but cases go by different names depending on the situation: • Individuals who answer a survey are called respondents • People on whom we experiment are called subjects or participants • In a database, the rows are called records • Otherwise we call them what they are: customers, economic quarters, companies, etc. The Who
Characteristics recorded about each individual are called variables; they are usually the columns of a data table. • Variables can be broken into three categories: • Identifiers • Categorical • Quantitative The What
Identifiers are useful but not typically used for analysis. • Each case has its own unique identifier. Identifiers keep cases from being confused with one another, but they do not need to be analyzed. • Examples: Student ID numbers, driver license numbers, social security numbers The What
Categorical Variables: Tell the group/category each individual belongs to. • Usually text values, not numbers. Any descriptive responses are usually categorical. • Examples: Male/Female, pierced/not, eye color, state, country • Numerical examples: zip code, area code The What
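To see why a numeric-looking variable such as zip code is treated as categorical, the short Python sketch below averages a few zip codes and then counts them. The zip codes are hypothetical values chosen for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical zip codes for four customers (numbers used as labels).
zip_codes = [10001, 94105, 60614, 10001]

# The average can be computed, but it does not describe any customer or place.
average_zip = mean(zip_codes)

# Counting how many customers fall in each zip code is the meaningful summary.
counts = Counter(zip_codes)
most_common_zip, n_customers = counts.most_common(1)[0]

print("Average zip code (not meaningful):", average_zip)
print("Most common zip code:", most_common_zip, "with", n_customers, "customers")
```

Counting cases per category works for any categorical variable, whether its values are text or numbers. In practice zip codes are often stored as text so that leading zeros are kept.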
Quantitative Variables: Contain measured numerical values, usually with units, for which it makes sense to find an average. • The units give the values meaning and a scale, so we know how far apart two values are. • Examples: Cost, life span, distance, degrees The What
Either/or: Some variables with numeric values can be either categorical or quantitative, depending on what we want to know • Example: Age • Quantitative – Amazon wants to know the average age of the customers who visit its site after 3 a.m. • Categorical – When deciding which album to feature when you visit the site, Amazon may group customers into categories such as child, teen, adult, senior. The What
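The two uses of age can be sketched in Python as follows. The ages and the cutoffs between child, teen, adult, and senior are assumptions made for this example; the slides do not specify them.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer ages in years.
ages = [9, 15, 16, 34, 41, 70]

def age_group(age):
    """Map an age in years to a category (cutoffs are assumed, not from the slides)."""
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    if age < 65:
        return "adult"
    return "senior"

# Quantitative use: the average age is a meaningful number with units (years).
average_age = mean(ages)

# Categorical use: each age becomes a group label, and we count cases per group.
group_counts = Counter(age_group(a) for a in ages)

print("Average age (years):", round(average_age, 1))
print("Customers per age group:", dict(group_counts))
```

The same column of numbers supports both analyses; what changes is the question being asked.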
Example – Identify each variable as categorical or quantitative. • A Consumer Reports article about 25 tablet computers lists each tablet’s manufacturer, cost, battery life (hours), operating system (iOS/Android), and overall performance score. • Manufacturer – Categorical • Cost – Quantitative • Battery life – Quantitative • Operating system – Categorical • Performance score – Either The What
Suppose a Consumer Reports article (published in June 2005) on energy bars gave the brand name, flavor, price, number of calories, and grams of protein and fat. Identify the following: • Who: • What: • When: • Where: • How: • Why: • Categorical variables: • Quantitative variables (with units): Example
Popular magazines and websites rank colleges and universities on their “academic quality” in serving undergraduate students. Describe two categorical variables and two quantitative variables that you might record for each institution. Exit Slip