
Looking at data

Looking at data. Visualization tools. Goals for sections 1.1-1.2. Organize data and break it down into manageable pieces. Learn the terminology for discussing data. Describe patterns and identify peculiarities. Use graphs and numerical summaries. Individuals.

tino

Presentation Transcript


  1. Looking at data: Visualization tools

  2. Goals for sections 1.1-1.2 • Organize data and break it down into manageable pieces. • Learn the terminology for discussing data. • Describe patterns and identify peculiarities. • Use graphs and numerical summaries.

  3. Individuals • A data set contains information about individuals (people, objects, experimental subjects, etc.). • In a typical grading spreadsheet, the students in the class are the individuals; each student is typically assigned one row in the spreadsheet.

  4. Variables • A variable describes some characteristic of an individual. • Each variable has its own column in our spreadsheet. • Variables could include: name, class year, number of absences, homework grade average, midterm test score, etc.
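The row-per-individual, column-per-variable layout described above can be sketched in a few lines of Python. The student names, variable names, and values here are hypothetical and only illustrate the structure; they do not come from the slides.

```python
# Hypothetical grading spreadsheet: one row (dict) per individual,
# one key per variable. All values are made up for illustration.
gradebook = [
    {"name": "Student A", "class_year": "soph.", "absences": 2, "midterm": 81},
    {"name": "Student B", "class_year": "jr.", "absences": 0, "midterm": 92},
    {"name": "Student C", "class_year": "fresh.", "absences": 5, "midterm": 67},
]

# How many individuals are described? One per row.
n_individuals = len(gradebook)

# Which variables are recorded? One per column (dict key).
variables = list(gradebook[0].keys())

# Reading one variable across all individuals gives that variable's column.
midterm_column = [row["midterm"] for row in gradebook]
```

Counting rows and listing keys answers the first two "questions to ask yourself" for any data set stored this way.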

  5. Questions to ask yourself • What individuals are described? How many are there? • Exactly what do the variables describe, and in what units? How many variables are there? • How and by whom were the data collected? Are the variables appropriate for answering questions of interest?

  6. Descriptive statistics • Statistics that we see in the media and other everyday sources are usually descriptive statistics. • Descriptive statistics are summaries of data. • They include charts, graphs and summary statistics (mean, standard deviation, etc.).
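As a minimal sketch of the summary statistics mentioned above, the mean and standard deviation can be computed with Python's standard `statistics` module. The scores are hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1) is the one intended.

```python
import statistics

# Hypothetical midterm scores, made up for illustration.
midterm_scores = [67, 72, 81, 85, 92]

# The mean summarizes the center of the data.
mean_score = statistics.mean(midterm_scores)

# The sample standard deviation (n - 1 in the denominator) summarizes the spread.
sd_score = statistics.stdev(midterm_scores)

print(f"mean = {mean_score:.1f}, standard deviation = {sd_score:.2f}")
```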

  7. Categorical (Qualitative) Variables • Record which category an individual belongs to. • Qualitative variables can be nominal or ordinal. • Nominal example: gender • Ordinal example: class (fresh., soph., jr., sr.) • Arithmetic operations cannot be performed on these values in a way that makes sense.

  8. Quantitative variables • Take on numerical values. • Quantitative variables can be continuous or discrete. • Continuous example: Body weight • Discrete example: Size of family • Arithmetic operations can be performed on these values in a way that makes sense.

  9. Distributions • A distribution tells which values a variable can take and how often it takes each value. • The distribution of age groups in the U.S. population (based on 1999 data): Under 18: 25.7%; 18-64 years: 61.6%; 65 and over: 12.7%. • The distribution of gender in the U.S. population (based on 2000 data): Female: 143.4 million; Male: 138.1 million.
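A distribution can be reported as counts or as percentages. The sketch below converts the gender counts quoted above into percentages and checks that the quoted age-group percentages account for the whole population. The variable names are illustrative.

```python
# Counts in millions, as quoted in the slide (2000 data).
gender_counts = {"Female": 143.4, "Male": 138.1}

# Convert each count to a percentage of the total.
total = sum(gender_counts.values())
gender_percent = {
    category: round(100 * count / total, 1)
    for category, count in gender_counts.items()
}

# Age-group percentages, as quoted in the slide (1999 data).
age_percent = {"Under 18": 25.7, "18-64": 61.6, "65 and over": 12.7}

# A complete distribution should account for every individual.
age_total = sum(age_percent.values())

print(gender_percent)
print(f"age groups sum to {age_total:.1f}%")
```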

  10. Graphs for qualitative variables • Pie chart: A circle (“pie”) represents all the individuals. “Slices” represent the number or percentage of individuals in each category. • Bar graph/chart: Number or percentage of individuals in each category represented by bars of differing heights.

  11. Graphs for quantitative variables • Histogram • Bars represent the number or percentage of individuals in a certain numerical range of the variable. • Stemplot (or “stem-and-leaf” plot) • Similar to a sideways histogram; easily done with pencil and paper.
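The pencil-and-paper stemplot can also be sketched in code. This is one simple convention, assumed here for illustration: values are non-negative integers, the stem is the tens part, and the leaf is the units digit. The function name `stemplot` and the scores are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf lines for a non-empty list of non-negative integers.

    Assumed convention: stem = value // 10, leaf = value % 10.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)

    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines

# Hypothetical test scores, made up for illustration.
scores = [67, 72, 75, 81, 85, 88, 92]
plot_lines = stemplot(scores)
print("\n".join(plot_lines))
```

Read top to bottom, each row acts like one bar of a histogram turned on its side, with the individual values still recoverable from the leaves.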

  12. Describing distributions using graphs • Get an idea of the overall pattern and notice which observations, if any, deviate from it (these are called outliers). • Unimodal (1 peak)? Bimodal (2 peaks)? • Symmetric? Right-skewed (right tail longer than left)? Left-skewed (left tail longer than right)? • Note the center and spread of the distribution.

  13. Dealing with more than one quantitative variable • Time plot (or “time series” plot) • Values are plotted on the y axis, with the time of observation on the x axis. • Back-to-back stemplot • Easy to do with pencil and paper, but Minitab won’t make them. • Stacked histograms • Use relative frequencies rather than counts, and the same axes, to facilitate comparisons.
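To see why relative frequencies make stacked histograms easier to compare, consider two groups of different sizes binned on the same axis. The sketch below is an illustration with hypothetical data; the function name, bin edges, and section scores are assumptions, not taken from the slides.

```python
def relative_frequencies(values, bin_edges):
    """Fraction of values falling in each half-open bin [edge_i, edge_{i+1}).

    Values outside every bin are not counted in any bin but still count
    toward the total, so choose edges that cover the data.
    """
    counts = [0] * (len(bin_edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                counts[i] += 1
                break
    return [count / len(values) for count in counts]

# Same bin edges for both groups, so the histograms share an x axis.
bin_edges = [60, 70, 80, 90, 100]

# Hypothetical scores for two class sections of different sizes.
section_a = [62, 71, 75, 78, 83, 88, 91, 95]  # 8 students
section_b = [65, 68, 74, 86]                  # 4 students

rel_a = relative_frequencies(section_a, bin_edges)
rel_b = relative_frequencies(section_b, bin_edges)
print(rel_a)
print(rel_b)
```

With raw counts, the larger section would have taller bars in most bins simply because it has more students; with relative frequencies, each set of bars sums to 1 and the shapes can be compared directly.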
