Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal Email: yuppal@ysu.edu
Lecture 1 — Schedule • Goals of the course • Data and statistics • Tabular methods for summarizing data • Graphical methods for summarizing data
Why use Statistics? • To make sense of large amounts of data: • What are the demographics of Youngstown in 2000? • Have U.S. wages increased since 1975? • To test hypotheses: • Is the demand curve downward sloping? • Are GDP and the saving rate positively correlated? • To make predictions: • What might happen to savings behavior after a large tax cut?
Data: Basic Definitions • Data: a set of measurements • Dataset: all data collected for one study • Element, or unit: an entity on which data are collected • Variable: a property or attribute of each unit • Observation: the values of all variables for one unit
Data: Basic Definitions • Data set (the first column holds the element names, each remaining column is a variable, and each row is an observation):
Company | Stock Exchange | Annual Sales ($M) | Earn/Share ($)
Dataram | AMEX | 73.10 | 0.86
EnergySouth | OTC | 74.00 | 1.67
Keystone | NYSE | 365.70 | 0.86
LandCare | NYSE | 111.40 | 0.33
Psychemedics | AMEX | 17.60 | 0.13
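The definitions above can be made concrete with a minimal Python sketch of the data set on this slide. It assumes the values pair with the companies in the order they are listed; the dictionary keys (`company`, `exchange`, `annual_sales_musd`, `eps_usd`) are names chosen here for illustration.

```python
# Each dict is one observation: the values of all variables for one element.
# Assumption: values are matched to companies in the order they appear on the slide.
dataset = [
    {"company": "Dataram", "exchange": "AMEX", "annual_sales_musd": 73.10, "eps_usd": 0.86},
    {"company": "EnergySouth", "exchange": "OTC", "annual_sales_musd": 74.00, "eps_usd": 1.67},
    {"company": "Keystone", "exchange": "NYSE", "annual_sales_musd": 365.70, "eps_usd": 0.86},
    {"company": "LandCare", "exchange": "NYSE", "annual_sales_musd": 111.40, "eps_usd": 0.33},
    {"company": "Psychemedics", "exchange": "AMEX", "annual_sales_musd": 17.60, "eps_usd": 0.13},
]

# Elements are the entities on which data are collected (here, companies).
elements = [row["company"] for row in dataset]

# Variables are the attributes recorded for every element.
variables = [name for name in dataset[0] if name != "company"]

print(len(elements), "elements:", elements)
print(len(variables), "variables:", variables)
print("Observation for", elements[0], "->", dataset[0])
```

Here `exchange` is a qualitative variable, while the other two are quantitative.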
Data: Scales of Measurement • Four scales of measurement: • Nominal, ordinal, interval, and ratio scales • Scale determines which methods of summarization and analysis are appropriate for any given variable
Data: Scales of Measurement • Characteristic • Nominal, like a label or name for a characteristic • e.g., color: red, green, blue • race: black, Hispanic, white, Asian • binary: (male, female), (yes, no), (0, 1) • Ordinal, still a characteristic, but having a natural order • e.g., how was service?: poor, average, good
Data: Scales of Measurement • Numeric • Interval scale • Numeric data with the properties of ordinal data, where the interval between two values is also meaningful • e.g., SAT scores, Fahrenheit temperature • Ratio scale • Ordered, numeric data with a true zero, so that ratios of values are meaningful • e.g., income, distance, price, quantity • http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/node5.html
Data: Other Classifications • Qualitative, or categorical: measures a quality • Quantitative: numeric values that indicate how much or how many • Cross-sectional: data collected at one point in time • Time series: data collected over several time periods • Panel or longitudinal: combination of cross-sectional and time series
Data: Summary of Definitions • Qualitative data • Numerical: nominal or ordinal scale • Nonnumerical: nominal or ordinal scale • Quantitative data • Numerical: interval or ratio scale
Statistical Inference: Definitions • Population: the set of all elements of interest in a study • Sample: a subset of the population • Statistical Inference: the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Statistical Inference: Process 1. Population consists of all tune-ups. Average cost of parts is unknown. 2. A sample of 50 engine tune-ups is examined. 3. The sample data provide a sample average parts cost of $79 per tune-up. 4. The sample average is used to estimate the population average.
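The four steps above can be sketched in Python. The population below is simulated purely for illustration (the mean of 80, the spread of 15, and the population size of 10,000 are assumptions, not figures from the lecture); only the sample size of 50 comes from the slide.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Step 1: a hypothetical population of parts costs (in dollars) for all tune-ups.
# These values are simulated; in practice the population average is unknown.
population = [random.gauss(80, 15) for _ in range(10_000)]

# Step 2: examine a sample of 50 tune-ups, drawn without replacement.
sample = random.sample(population, 50)

# Step 3: compute the sample average parts cost.
sample_mean = statistics.mean(sample)

# Step 4: use the sample average as an estimate of the population average.
# The population mean is computed here only to show how close the estimate is.
population_mean = statistics.mean(population)

print(f"sample mean (estimate): {sample_mean:.2f}")
print(f"population mean:        {population_mean:.2f}")
```

The two numbers will generally differ a little; that gap is the sampling error that statistical inference has to account for.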
Descriptive Statistics: Definition • Descriptive statistics are the tabular, graphical, and numerical methods used to summarize data
Descriptive Statistics: Common Methods • Some common methods: • Tabular • Frequency table (for one variable) • Crosstabulation, or crosstab (for more than one variable) • Graphical • Bar graph (for categorical variables) • Histogram (for interval- or ratio-scaled variables) • Scatterplot (for two variables) • Numerical • Mean (arithmetic average)
Summarizing Qualitative Data • Frequency distribution • Relative frequency distribution • Bar graph • Pie chart • Objective is to provide insights about the data that cannot be quickly obtained by looking at the original data
Distribution Tables • Frequency distribution is a tabular summary of the data showing the frequency (or number) of items in each of several non-overlapping classes • Relative frequency distribution looks the same, but contains proportion of items in each class
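As a minimal sketch, the two tables can be built for a qualitative variable with Python's `collections.Counter`. The example uses the stock exchange column from the five-company data set shown earlier in the lecture; the variable names are chosen here for illustration.

```python
from collections import Counter

# Stock exchange of each of the five companies in the earlier data set.
exchanges = ["AMEX", "OTC", "NYSE", "NYSE", "AMEX"]

# Frequency distribution: number of items in each non-overlapping class.
frequency = Counter(exchanges)

# Relative frequency distribution: proportion of items in each class.
total = sum(frequency.values())
relative_frequency = {cls: count / total for cls, count in frequency.items()}

print("Class  Freq  RelFreq")
for cls in frequency:
    print(f"{cls:<6}{frequency[cls]:>5}{relative_frequency[cls]:>9.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no item was dropped or counted twice.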
Summarizing Quantitative Data • Frequency Distribution • Relative Frequency Distribution • Dot Plot • Histogram • Cumulative Distributions
Example 2: Rental Market in Youngstown • Suppose you were moving to Youngstown, and you wanted to get an idea of what the rental market for an apartment (having more than 1 room) is like • I have the following sample of rental prices
Example: Rental Market in Youngstown • Sample of 28 rental listings from craigslist:
Frequency Distribution • To deal with large datasets • Divide the data into different classes • Select a width for the classes
Frequency Distribution (Cont’d) • Guidelines for Selecting Number of Classes • Use between 5 and 20 classes • Datasets with a larger number of elements usually require a larger number of classes • Smaller datasets usually require fewer classes
Frequency Distribution • Guidelines for Selecting Width of Classes • Use classes of equal width • Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Frequency Distribution • For our rental data, if we choose six classes: • Class Width = (750-330)/6 = 70
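The class-width arithmetic can be sketched as follows. The function names are hypothetical, and the choice to start the first class at the smallest value and to list each class as a (lower, upper) pair is an assumption; the lecture's own class limits may be drawn slightly differently.

```python
def class_width(smallest, largest, num_classes):
    """Approximate class width: the range of the data divided by the number of classes."""
    return (largest - smallest) / num_classes


def class_limits(smallest, width, num_classes):
    """Equal-width classes as (lower, upper) pairs, starting at the smallest value."""
    return [
        (smallest + i * width, smallest + (i + 1) * width)
        for i in range(num_classes)
    ]


# Rental data from the slide: smallest rent 330, largest rent 750, six classes.
width = class_width(330, 750, 6)
limits = class_limits(330, width, 6)

print("class width:", width)
for lower, upper in limits:
    print(f"{lower:.0f} to {upper:.0f}")
```

In practice the computed width is often rounded to a convenient number, and a rule is needed for values that fall exactly on a boundary so that the classes do not overlap.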
Relative Frequency • To calculate the relative frequency of a class, divide the class frequency by the total frequency (the number of observations)
Relative Frequency • Insights gained from Relative Frequency Distribution: • 32% of rents are between $539 and $609 • Only 7% of rents are above $680
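A short sketch of the arithmetic behind these percentages. The sample size of 28 comes from the lecture; the class counts of 9 and 2 are assumptions, picked only because they reproduce the quoted 32% and 7% after rounding.

```python
n_listings = 28  # sample size stated in the lecture

# Assumed class counts (not given in the slides); they are consistent with
# the quoted percentages once rounded to whole percents.
assumed_counts = {"$539 to $609": 9, "above $680": 2}

# Relative frequency = class frequency / total frequency.
relative = {label: count / n_listings for label, count in assumed_counts.items()}

for label, value in relative.items():
    print(f"{label}: {value:.0%}")
```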
Histogram • [Figure: Histogram of Youngstown Rental Prices]
Describing a Histogram • Symmetric • Left tail is the mirror image of the right tail • Example: heights and weights of people • [Figure: histogram, vertical axis: relative frequency]
Describing a Histogram • Moderately Left or Negatively Skewed • A longer tail to the left • Example: exam scores • [Figure: histogram, vertical axis: relative frequency]
Describing a Histogram • Moderately Right or Positively Skewed • A longer tail to the right • Example: hourly wages • [Figure: histogram, vertical axis: relative frequency]
Describing a Histogram • Highly Right or Positively Skewed • A very long tail to the right • Example: executive salaries • [Figure: histogram, vertical axis: relative frequency]
Cumulative Distributions • Cumulative frequency distribution: • shows the number of items with values less than or equal to a particular value (or the upper limit of each class when we divide the data into classes) • Cumulative relative frequency distribution: • shows the proportion of items with values less than or equal to a particular value (or the upper limit of each class when we divide the data into classes) • Usually only used with quantitative data!
Cumulative Distributions • Youngstown Rental Prices
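A minimal sketch of how the cumulative columns are computed from class frequencies. The upper limits and frequencies below are hypothetical: they sum to 28 to match the sample size, but they are not the lecture's actual table.

```python
from itertools import accumulate

# Hypothetical upper class limits and class frequencies (six classes, 28 listings).
# These are illustrative values, not the counts from the lecture's table.
upper_limits = [400, 470, 540, 610, 680, 750]
frequencies = [3, 4, 6, 9, 4, 2]

total = sum(frequencies)

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running total divided by the number of items.
cumulative_relative = [c / total for c in cumulative]

print("Rent <=  CumFreq  CumRelFreq")
for limit, c, cr in zip(upper_limits, cumulative, cumulative_relative):
    print(f"{limit:>7}{c:>9}{cr:>12.2f}")
```

The last cumulative frequency always equals the number of items, and the last cumulative relative frequency always equals 1.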
Crosstabulations and Scatter Diagrams • So far, we have focused on methods that are used to summarize data for one variable at a time • Often, we are really interested in the relationship between two variables • Crosstabs and scatter diagrams are two methods for summarizing data for two (or more) variables simultaneously
Crosstabs • A crosstab is a tabular summary of data for two variables • Crosstabs can be used with any combination of qualitative and quantitative variables • The left and top margins define the classes for the two variables
Example: Data on MLB Teams • Data from the 2002 Major League Baseball season • Two variables: • Number of wins • Average stadium attendance
Crosstab • [Table not shown: the row and column totals give the frequency distribution for the wins variable and the frequency distribution for the attendance variable]
Crosstabs: Row or Column Percentages • Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables
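A small sketch of a crosstab with row percentages, built with `collections.Counter`. The team records below are hypothetical and the class labels are invented for illustration; they are not the 2002 MLB data.

```python
from collections import Counter

# Hypothetical (wins class, attendance class) pairs, one per team.
teams = [
    ("under 81 wins", "low attendance"),
    ("under 81 wins", "low attendance"),
    ("under 81 wins", "high attendance"),
    ("81+ wins", "low attendance"),
    ("81+ wins", "high attendance"),
    ("81+ wins", "high attendance"),
    ("81+ wins", "high attendance"),
]

# Each crosstab cell counts the teams that fall in one row class and one column class.
cell_counts = Counter(teams)
row_labels = sorted({wins for wins, _ in teams})
col_labels = sorted({attendance for _, attendance in teams})

# Row totals form the frequency distribution of the row variable.
row_totals = {r: sum(cell_counts[(r, c)] for c in col_labels) for r in row_labels}

# Row percentages: each cell as a percentage of its row total.
row_percentages = {
    r: {c: 100 * cell_counts[(r, c)] / row_totals[r] for c in col_labels}
    for r in row_labels
}

for r in row_labels:
    cells = ", ".join(f"{c}: {row_percentages[r][c]:.0f}%" for c in col_labels)
    print(f"{r} (n={row_totals[r]}): {cells}")
```

Column percentages follow the same pattern with the roles of rows and columns swapped.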
Crosstab: Simpson’s Paradox • Data in two or more crosstabulations are often aggregated to produce a summary crosstab • We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstab • Simpson’s Paradox: • In some cases, the conclusions based upon an aggregated crosstab can be completely reversed if we look at the unaggregated data
Crosstab: Simpson’s Paradox • [Table not shown: the row and column totals give the frequency distribution for the wins variable and the frequency distribution for the attendance variable]
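The reversal described by Simpson's paradox can be checked numerically. The counts below are illustrative only (they are not the lecture's MLB data); the labels `A`, `B`, `group 1`, and `group 2` are placeholders.

```python
# Illustrative (successes, trials) counts for two options, split into two subgroups.
results = {
    "A": {"group 1": (81, 87), "group 2": (192, 263)},
    "B": {"group 1": (234, 270), "group 2": (55, 80)},
}


def rate(successes, trials):
    """Success rate within a single crosstab cell."""
    return successes / trials


def aggregated_rate(groups):
    """Success rate after adding the subgroups together."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


for group in ("group 1", "group 2"):
    a = rate(*results["A"][group])
    b = rate(*results["B"][group])
    print(f"{group}: A = {a:.1%}, B = {b:.1%}")

agg_a = aggregated_rate(results["A"])
agg_b = aggregated_rate(results["B"])
print(f"aggregated: A = {agg_a:.1%}, B = {agg_b:.1%}")
```

With these counts, A has the higher rate inside each subgroup, yet B has the higher rate in the aggregated table, because the two options are spread very unevenly across the subgroups.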
Scatter Diagram and Trendline • A scatter diagram, or scatter plot, is a graphical presentation of the relationship between two quantitative variables • One variable is shown on the horizontal axis and the other is shown on the vertical axis • The general pattern of the plotted points suggests the overall relationship between the variables • A trendline is an approximation of the relationship
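Scatter Diagram • A Positive Relationship • [Figure: scatter plot of y against x]
Scatter Diagram • A Negative Relationship • [Figure: scatter plot of y against x]
Scatter Diagram • No Apparent Relationship • [Figure: scatter plot of y against x]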
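One common way to obtain a trendline is an ordinary least-squares line, sketched below in plain Python. The slides do not say how the trendline is fitted, so least squares is an assumption here, and the (x, y) points are made up for illustration.

```python
def least_squares_trendline(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data points, not taken from the lecture.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_trendline(xs, ys)
print(f"trendline: y = {intercept:.2f} + {slope:.2f} * x")

# The sign of the slope matches the three cases on the slides:
# positive slope, negative slope, or a slope near zero (no apparent relationship).
if slope > 0:
    print("positive relationship")
elif slope < 0:
    print("negative relationship")
else:
    print("no apparent relationship")
```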