Looking at data Visualization tools
Goals for sections 1.1-1.2 • Organize data and break it down into manageable pieces. • Learn the terminology for discussing data. • Describe patterns and identify peculiarities. • Use graphs and numerical summaries.
Individuals • A data set contains information about individuals (people, objects, experimental subjects, etc.). • In a typical grading spreadsheet, the students in the class are the individuals; each student occupies one row of the spreadsheet.
Variables • A variable describes some characteristic of an individual. • Each variable has its own column in our spreadsheet. • Variables could include: name, class year, number of absences, homework grade average, midterm test score, etc.
Questions to ask yourself • What individuals are described? How many are there? • Exactly what do the variables describe, and in what units? How many variables are there? • How and by whom were the data collected? Are the variables appropriate for answering questions of interest?
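The first two questions can be answered mechanically once the data are stored as rows and columns. The sketch below assumes a hypothetical grading spreadsheet held as a Python list of dictionaries; the student names, variable names, and values are invented for illustration only.

```python
# A hypothetical grading spreadsheet: each dict is one individual (a row),
# each key is one variable (a column). Names and values are invented.
gradebook = [
    {"name": "Ana", "class_year": "soph.", "absences": 2, "midterm": 88},
    {"name": "Ben", "class_year": "fresh.", "absences": 0, "midterm": 74},
    {"name": "Cai", "class_year": "sr.", "absences": 5, "midterm": 91},
]

# How many individuals? One per row.
n_individuals = len(gradebook)

# How many variables? One per column (read from the first row's keys).
variables = list(gradebook[0])
n_variables = len(variables)

print(f"{n_individuals} individuals, {n_variables} variables: {variables}")
```

Counting is the easy part; the remaining questions (units, who collected the data and how) still have to be answered from the data's documentation.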
Descriptive statistics • Statistics that we see in the media and other everyday sources are usually descriptive statistics. • Descriptive statistics are summaries of data. • They include charts, graphs and summary statistics (mean, standard deviation, etc.).
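As a small illustration of summary statistics, the sketch below computes a mean and a standard deviation with Python's standard `statistics` module. The scores are hypothetical, and it assumes the sample standard deviation (dividing by n - 1) is wanted; `statistics.pstdev` would divide by n instead.

```python
import statistics

# Hypothetical midterm scores for five students.
midterm_scores = [88, 74, 91, 65, 82]

mean = statistics.mean(midterm_scores)
sd = statistics.stdev(midterm_scores)  # sample standard deviation (divides by n - 1)

print(f"mean = {mean:.1f}, standard deviation = {sd:.2f}")
```

For these five scores the mean is 80 and the deviations from it are 8, -6, 11, -15 and 2, so the sample standard deviation is the square root of 450 / 4, about 10.61.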
Categorical (Qualitative) Variables • Records which category an individual belongs to. • Qualitative variables can be nominal (categories have no natural order) or ordinal (categories have a natural order). • Nominal example: gender • Ordinal example: class (fresh., soph., jr., sr.) • Arithmetic operations cannot be performed on these values in a way that makes sense.
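What does make sense for a categorical variable is counting how many individuals fall in each category, and, for an ordinal variable, listing the categories in their natural order. A minimal sketch with invented class-year responses:

```python
from collections import Counter

# Hypothetical class-year responses (an ordinal categorical variable).
class_years = ["soph.", "fresh.", "sr.", "soph.", "jr.", "fresh.", "soph."]

# Counting individuals per category is meaningful for any categorical variable.
counts = Counter(class_years)

# Because class year is ordinal, the categories have a natural order to report in.
ORDER = ["fresh.", "soph.", "jr.", "sr."]
for year in ORDER:
    print(f"{year:<7}{counts[year]}")
```

For a nominal variable such as gender the counts are just as meaningful, but there is no natural order in which to list the categories, and averaging the labels themselves would be meaningless for either kind.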
Quantitative variables • Take on numerical values. • Quantitative variables can be continuous or discrete. • Continuous example: Body weight • Discrete example: Size of family • Arithmetic operations can be performed on these values in a way that makes sense.
Distributions • Tell which values a variable takes and how often it takes each value. • The distribution of age groups in the U.S. population is (based on 1999 data): Under 18: 25.7%; 18-64 years: 61.6%; 65 and over: 12.7% • The distribution of gender in the U.S. population is (based on 2000 data): Female: 143.4 million; Male: 138.1 million
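A distribution can be stated as counts or as percentages. The sketch below converts the gender counts from the slide into percentages, and checks that the age-group percentages add up to 100%, as the percentages of a complete distribution should.

```python
# Gender counts from the slide (U.S. population, 2000 data), in millions.
gender_counts = {"Female": 143.4, "Male": 138.1}

# Percentage of the total in each category.
total = sum(gender_counts.values())
percentages = {g: 100 * c / total for g, c in gender_counts.items()}
for g, pct in percentages.items():
    print(f"{g}: {pct:.1f}%")

# Age-group percentages from the slide (1999 data): they should total 100%.
age_percentages = {"Under 18": 25.7, "18-64": 61.6, "65 and over": 12.7}
print(f"Age groups total: {sum(age_percentages.values()):.1f}%")
```

With a total of 281.5 million, the gender distribution works out to about 50.9% female and 49.1% male.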
Graphs for qualitative variables • Pie chart: A circle (“pie”) represents all the individuals. “Slices” represent the number or percentage of individuals in each category. • Bar graph/chart: Bars of differing heights represent the number or percentage of individuals in each category.
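The idea behind a bar graph (bar length proportional to each category's percentage) can be shown without any plotting library. The sketch below draws a plain-text bar graph of the 1999 age-group percentages from the Distributions slide; the helper name `text_bar_graph` and the scale of one `#` per two percentage points are arbitrary choices for illustration. In practice a statistics package or a plotting library such as matplotlib would draw the bars.

```python
# Age-group percentages from the Distributions slide (1999 U.S. data).
age_distribution = {"Under 18": 25.7, "18-64": 61.6, "65 and over": 12.7}

def text_bar_graph(dist, scale=0.5):
    """Return one text line per category, with bar length proportional to its value.

    `scale` is the number of '#' characters per percentage point (an arbitrary choice).
    """
    width = max(len(label) for label in dist)  # align the bars
    lines = []
    for label, pct in dist.items():
        bar = "#" * round(pct * scale)
        lines.append(f"{label:<{width}} | {bar} {pct}%")
    return lines

print("\n".join(text_bar_graph(age_distribution)))
```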
Graphs for quantitative variables • Histogram • Bars represent the number or percentage of individuals whose values fall in each numerical range of the variable • Stemplot (or “stem-and-leaf” plot) • Resembles a sideways histogram and is easy to draw with pencil and paper
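Both graphs start from the same step: grouping the observations by numerical range. The sketch below computes equal-width histogram bin counts and a text stemplot for invented midterm scores. It assumes whole-number data, uses the tens digit as the stem and the ones digit as the leaf, and the function names are hypothetical helpers, not part of any statistics package.

```python
from collections import defaultdict

# Hypothetical midterm scores (whole numbers).
scores = [65, 72, 74, 78, 81, 82, 85, 88, 88, 91, 94, 99]

def histogram_counts(data, start, width):
    """Count observations in equal-width bins [start, start + width), [start + width, ...)."""
    counts = defaultdict(int)
    for x in data:
        bin_start = start + ((x - start) // width) * width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

def stemplot(data):
    """Stem = tens digit, leaf = ones digit (assumes non-negative whole numbers)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems visible
        lines.append(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves[stem])}")
    return lines

WIDTH = 10
for bin_start, count in histogram_counts(scores, start=60, width=WIDTH).items():
    print(f"{bin_start}-{bin_start + WIDTH - 1}: {count}")
print("\n".join(stemplot(scores)))
```

Read sideways, the stemplot rows (one 60s score, three 70s, five 80s, three 90s) match the histogram bin counts, which is why a stemplot is described as a sideways histogram.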
Describing distributions using graphs • Get an idea of the overall pattern and notice which observations, if any, deviate from it (such observations are called outliers). • Unimodal (1 peak)? Bimodal (2 peaks)? • Symmetric? Right-skewed (right tail longer than left)? Left-skewed (left tail longer than right)? • Note the center and spread of the distribution.
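Judging shape is normally done by eye, but two rough numerical checks can support it. The sketch below counts peaks in a list of histogram bar heights and compares the mean with the median as a skewness hint. Both are assumptions added for illustration, not methods from these slides: peak counts depend on the bin width, and the mean-versus-median rule of thumb can fail for some distributions.

```python
import statistics

def count_peaks(heights):
    """Count bars that are strictly taller than both neighbours.

    Bars beyond the ends are treated as height 0. Flat-topped peaks
    (two equal tallest bars side by side) are not counted, and the
    result can change with the bin width, so treat it as a hint only.
    """
    padded = [0, *heights, 0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

def skew_hint(data):
    """Rule of thumb: a mean pulled above the median suggests a longer right tail."""
    mean, median = statistics.mean(data), statistics.median(data)
    if mean > median:
        return "possibly right-skewed"
    if mean < median:
        return "possibly left-skewed"
    return "no skew indicated by mean vs. median"

print(count_peaks([1, 4, 6, 3, 1]))          # one peak: unimodal
print(count_peaks([2, 6, 2, 1, 5, 7, 3]))    # two peaks: bimodal
print(skew_hint([1, 2, 2, 3, 3, 3, 4, 10]))  # the value 10 stretches the right tail
```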
Dealing with more than one quantitative variable • Time plot (or “time series” plot) • Values are plotted on the y axis, with the time of observation on the x axis. • Back-to-back stemplot • Easy to do with pencil and paper, but Minitab won’t make them. • Stacked histograms • Use relative frequencies rather than counts, and the same axes, to facilitate comparisons.
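Relative frequencies matter when the groups differ in size. The sketch below assumes two hypothetical class sections of 20 and 40 students, binned with the same score ranges so the histograms share an axis, and converts each section's counts into proportions of its own total.

```python
def relative_frequencies(counts):
    """Convert bin counts into proportions of the group's total."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical counts per score bin (60s, 70s, 80s, 90s), same bins for both groups.
section_a = [2, 6, 8, 4]    # 20 students
section_b = [5, 15, 15, 5]  # 40 students

rel_a = relative_frequencies(section_a)
rel_b = relative_frequencies(section_b)
for label, a, b in zip(["60s", "70s", "80s", "90s"], rel_a, rel_b):
    print(f"{label}: A {a:.3f}  B {b:.3f}")
```

Section B has more students in the 90s bin by count (5 versus 4), but as a proportion of each section Section A has more (0.200 versus 0.125), a difference that raw counts would hide.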