CHAPTER 2
2.1 - Basic Definitions and Properties
• Population Characteristics = “Parameters”
• Sample Characteristics = “Statistics”
• Random Variables (Numerical vs. Categorical)
2.2, 2.3 - Exploratory Data Analysis
• Graphical Displays
• Descriptive Statistics
• Measures of Center (mode, median, mean)
• Measures of Spread (range, variance, standard deviation)
POPULATION – composed of “units” (people, rocks, toasters,...)
Important Fact: To make certain calculations simpler, we assume that populations are “arbitrarily large” (or indeed, infinite).
What do we want to know about this population?
“Random Variable” X = any numerical value that can be assigned to each unit of a population.
“Random” refers to the notion that this value is unknown until actually observed (usually as part of an outcome of an experiment to test a specific hypothesis). [Contrast this with the idea of a “nonrandom” variable with no empirical error, e.g., X = # cards in a deck = 52.]
There are two general types: Quantitative and Qualitative.
Quantitative [measurement]
• length
• mass
• temperature
• pulse rate
• # puppies
• shoe size (10, 10½, 11)
Quantitative variables are either:
• CONTINUOUS (can take their values at any point in a continuous interval), e.g., length, mass, temperature
• DISCRETE (only take their values in disconnected jumps), e.g., pulse rate, # puppies, shoe size
Qualitative [categorical]
ORDINAL, RANKED
• video game levels (1, 2, 3,...)
• income level (1 = low, 2 = mid, 3 = high)
NOMINAL
• zip code
• ID #
• color (Red, Green, Blue)
IMPORTANT CASE: Binary (or Dichotomous), coded X = 1 (“Success”) or X = 0 (“Failure”)
• Gender (Male / Female)
• “Pregnant?” (Yes / No)
• Coin toss (Heads / Tails)
• Treatment (Drug / Placebo)
Example: Excel file of patient blood types.
Another way to define a categorical X is with “indicator variables”: for each category k, let Ik = 1 if the unit falls in category k, and Ik = 0 otherwise. With three categories, note that I1 + I2 + I3 = 1, because each unit falls in exactly one category.
In the blood-type file, note that each patient row sums to 1, i.e., O + A + B + AB = 1.
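The indicator-variable idea can be sketched in a few lines of Python. This is only an illustration: the patient list below is made up, and the names BLOOD_TYPES and indicators are hypothetical, not taken from the Excel file mentioned in the slides.

```python
# Categories of the nominal variable X = blood type, in a fixed column order.
BLOOD_TYPES = ["O", "A", "B", "AB"]


def indicators(blood_type):
    """Return one 0/1 indicator per category, in the order of BLOOD_TYPES."""
    if blood_type not in BLOOD_TYPES:
        raise ValueError(f"unknown blood type: {blood_type!r}")
    return [1 if blood_type == t else 0 for t in BLOOD_TYPES]


# Made-up sample of five patients (assumption, for illustration only).
patients = ["A", "O", "AB", "O", "B"]
rows = [indicators(p) for p in patients]

for p, row in zip(patients, rows):
    # Exactly one indicator is 1, so every row sums to 1.
    print(p, row, "row sum =", sum(row))

# Column totals count how many patients fall in each category.
counts = [sum(col) for col in zip(*rows)]
print(dict(zip(BLOOD_TYPES, counts)))
```

Summing a column of indicators gives the number of patients in that category, which is one reason this coding is convenient for nominal data.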
[Figure: “Population Distribution of X” (somewhat idealized), plotted against X]
The population mean μ (“mu”) and the population standard deviation σ (“sigma”) are examples of parameters: nonrandom “population characteristics” whose exact values cannot be directly measured, but can (hopefully) be estimated from known “sample characteristics”, i.e., statistics.
Random variable X (Example: X = Age)
How do we infer information about the population variable X? Take a SAMPLE of size n:
• x1 = value of X for 1st individual
• x2 = value of X for 2nd individual
• …etc….
• xn = value of X for nth individual
“Parameter Estimation” and “Statistical Inference”
Sample mean (an example of a statistic), computed from a SAMPLE of size n:
x̄ = (x1 + x2 + x3 + x4 + x5 + x6 + … + xn) / n
There are many potential random samples of a fixed size n, each with its own estimate of µ. It will eventually become important to understand the structure of their variability.
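A small simulation can make this sample-to-sample variability concrete. The sketch below rests on assumptions: a finite list of 100,000 simulated ages stands in for the "arbitrarily large" population, and the uniform age range, the sample size n = 50, and the number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical finite stand-in for the population of ages (assumption).
population = [random.uniform(18, 90) for _ in range(100_000)]
mu = statistics.fmean(population)  # "population mean" of the stand-in


def sample_mean(n):
    """Draw a random sample of size n without replacement; return its mean."""
    sample = random.sample(population, n)
    return sum(sample) / n


n = 50
# Each repetition is a different random sample, hence a different estimate.
estimates = [sample_mean(n) for _ in range(1000)]

print("population mean:", round(mu, 2))
print("first five sample means:", [round(e, 2) for e in estimates[:5]])
print("std. dev. of the sample means:", round(statistics.stdev(estimates), 2))
```

The sample means scatter around the population mean, and they vary much less than the individual ages do.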
• Statistics are numerical values that are culled from a random sample of measurements taken from a specific population, in an effort to “summarize” its overall distribution, and estimate certain parameters (i.e., numerical characteristics) of that population.
• Statistics, as a discipline, consists of a collection of formal testing procedures, designed to infer a conclusion regarding a specific hypothesis about the population, based on the sample data.
• Statistics is sometimes referred to as the “search for sources of random variation” in a system. How much of a signal is genuinely significant information to be detected, and how much is random “noise”?
• The “classical scientific method” provides a general framework for conducting formal statistical analysis.