Presentation Transcript


  1. Contact Information Dr. Daniel Simons Vancouver Island University Faculty of Management Building 250 - Room 416 Office Hours: T/TH 11:30 – 12:30, W 12:00 – 13:00 simonsd@viu.ca

  2. Suggestions for Best Individual Performance • Attend all classes • Take notes: the course covers a lot of material and your notes are essential • Complete all assignments (not for grade) • Read the book • Participate, enrich class discussion, provide feedback and ask questions • Revise materials between classes, integrate concepts, make sure you understand the tools and their application • Don’t hesitate to contact me if necessary

  3. Evaluation Method Tests have a mix of problems that evaluate • Concepts • Problem sets (assignments) • Class applications • Readings • New applications • Closed book time constrained tests to reward knowledge and speed • Each test covers slides, assignments, and required readings. • Evaluation system may not be perfect but it works

  4. STATISTICAL PRINCIPLES A review of the basic principles of statistics used in business settings. REVIEW OF QUME 232

  5. The probability framework for statistical inference • Estimation • Hypothesis Testing • Confidence intervals

  6. Probability • A random variable X is a variable whose numerical value is determined by chance, the outcome of a random phenomenon • A discrete random variable has a countable number of possible values, such as 0, 1, and 2 • A continuous random variable, such as time and distance, can take on any value in an interval • A probability distribution P[Xi] for a discrete random variable X assigns probabilities to the possible values X1, X2, and so on • For example, when a fair six-sided die is rolled, there are six equally likely outcomes, each with a 1/6 probability of occurring • Figure 17.1 shows this probability distribution
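
A minimal sketch (not from the slides) of the fair-die example in Python: a discrete random variable represented as a mapping from possible values to probabilities, checked to sum to 1 and then sampled once.

```python
# Hypothetical illustration: the fair six-sided die as a discrete
# probability distribution P[Xi], using only the standard library.
import random

die_distribution = {face: 1 / 6 for face in range(1, 7)}  # six equally likely outcomes

# The probabilities of a valid distribution sum to 1.
assert abs(sum(die_distribution.values()) - 1.0) < 1e-12

# Simulate the random phenomenon: the numerical value is determined by chance.
roll = random.choice(list(die_distribution))
print("One roll:", roll)
```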

  7. Probability Distribution for a Six-Sided Die

  8. Mean, Variance, and Standard Deviation • The expected value (or mean) of a discrete random variable X is a weighted average of all possible values of X, using the probability of each X value as weights: μ = E[X] = Σ Xi P[Xi] • The variance of a discrete random variable X is a weighted average, for all possible values of X, of the squared difference between X and its expected value, using the probability of each X value as weights: σ² = Σ (Xi − μ)² P[Xi] • The standard deviation σ is the square root of the variance
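
As a quick illustration (my own sketch, not from the deck), the weighted averages above can be computed directly for the fair die, where each value 1–6 carries weight 1/6.

```python
# Expected value, variance, and standard deviation of a fair six-sided die,
# computed as probability-weighted averages.
probabilities = {face: 1 / 6 for face in range(1, 7)}

mean = sum(x * p for x, p in probabilities.items())                    # E[X] = 3.5
variance = sum((x - mean) ** 2 * p for x, p in probabilities.items())  # ≈ 2.9167
std_dev = variance ** 0.5                                              # ≈ 1.7078

print(mean, variance, std_dev)
```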

  9. Continuous Random Variables • Our examples to this point have involved discrete random variables, for which we can count the number of possible outcomes: • The coin can be heads or tails; the die can be 1, 2, 3, 4, 5, or 6 • For continuous random variables, however, the outcome can be any value in a given interval • For example, Figure 17.2 shows a spinner for randomly selecting a point on a circle • A continuous probability density curve shows the probability that the outcome is in a specified interval as the corresponding area under the curve • This is illustrated for the case of the spinner in Figure 17.3
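
A small sketch under an assumption the slide does not state: if the spinner selects a point uniformly on [0, 1), the probability of landing in an interval is simply the interval's length, i.e. the area under a density that is constant at 1.

```python
# Continuous uniform "spinner": compare the theoretical area under the
# density with the fraction of simulated spins that land in the interval.
import random

def spinner_probability(a: float, b: float) -> float:
    """P(a <= X <= b) for X uniform on [0, 1): the interval length."""
    return b - a

spins = [random.random() for _ in range(100_000)]
empirical = sum(0.25 <= x <= 0.75 for x in spins) / len(spins)
print(spinner_probability(0.25, 0.75), empirical)  # 0.5 vs. roughly 0.5
```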

  10. Pick a Number, Any Number

  11. Figure 17.3 A Continuous Probability Distribution for the Spinner

  12. Standardized Variables • To standardize a random variable X, we subtract its mean μ and then divide by its standard deviation σ: Z = (X − μ)/σ (17.3) • No matter what the initial units of X, the standardized random variable Z has a mean of 0 and a standard deviation of 1 • The standardized variable Z measures how many standard deviations X is above or below its mean: • If X is equal to its mean, Z is equal to 0 • If X is one standard deviation above its mean, Z is equal to 1 • If X is two standard deviations below its mean, Z is equal to –2 • Figures 17.4 and 17.5 illustrate this for the case of dice and fair coin flips, respectively
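
A brief sketch of equation (17.3) in code, reusing the die distribution from the earlier examples: after standardizing, Z has mean 0 and variance (hence standard deviation) 1.

```python
# Standardize the die outcomes: Z = (X - mu) / sigma.
probabilities = {face: 1 / 6 for face in range(1, 7)}
mu = sum(x * p for x, p in probabilities.items())
sigma = sum((x - mu) ** 2 * p for x, p in probabilities.items()) ** 0.5

z_values = {x: (x - mu) / sigma for x in probabilities}
z_mean = sum(z * probabilities[x] for x, z in z_values.items())
z_var = sum((z - z_mean) ** 2 * probabilities[x] for x, z in z_values.items())
print(round(z_mean, 10), round(z_var, 10))  # 0.0 and 1.0
```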

  13. Probability Distribution for Six-Sided Dice, Using Standardized Z

  14. Probability Distribution for Six-Sided Dice, Using Standardized Z

  15. Probability Distribution for Six-Sided Dice, Using Standardized Z

  16. Figure 17.5a Probability Distribution for Fair Coin Flips, Using Standardized Z

  17. Figure 17.5b Probability Distribution for Fair Coin Flips, Using Standardized Z

  18. Figure 17.5c Probability Distribution for Fair Coin Flips, Using Standardized Z

  19. The Normal Distribution • The density curve for the normal distribution is graphed below • The probability that the value of Z will be in a specified interval is given by the corresponding area under this curve • These areas can be determined by consulting statistical software or a table, such as Table B-7 in Appendix B • Many things follow the normal distribution (at least approximately): • the weights of humans, dogs, and tomatoes • The lengths of thumbs, widths of shoulders, and breadths of skulls • Scores on IQ, SAT, and GRE tests • The number of kernels on ears of corn, ridges on scallop shells, hairs on cats, and leaves on trees

  20. The Normal Distribution

  21. The Normal Distribution (cont.) • The central limit theorem is a very strong result for empirical analysis that builds on the normal distribution • The central limit theorem states that: • if Z is a standardized sum of N independent, identically distributed (discrete or continuous) random variables with a finite, nonzero standard deviation, then the probability distribution of Z approaches the normal distribution as N increases
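
To make the theorem concrete, here is an illustrative simulation (my own, not from the slides): standardized sums of N independent die rolls have mean near 0 and standard deviation near 1, and their distribution looks increasingly bell-shaped as N grows.

```python
# Central limit theorem sketch: standardized sums of N die rolls.
import random
import statistics

def standardized_sum(n_terms: int) -> float:
    mu, sigma = 3.5, (35 / 12) ** 0.5          # mean and sd of a single die roll
    total = sum(random.randint(1, 6) for _ in range(n_terms))
    return (total - n_terms * mu) / (sigma * n_terms ** 0.5)

for n in (1, 2, 10, 50):
    draws = [standardized_sum(n) for _ in range(20_000)]
    # Mean stays near 0 and sd near 1; the shape approaches the bell curve.
    print(n, round(statistics.mean(draws), 2), round(statistics.stdev(draws), 2))
```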

  22. Sampling First, let’s define some key terms: • Population: the entire group of items that interests us • Sample: the part of this population that we actually observe • Statistical inference involves using the sample to draw conclusions about the characteristics of the population from which the sample came

  23. Selection Bias • Any sample that differs systematically from the population that it is intended to represent is called a biased sample • One of the most common causes of biased samples is selection bias, which occurs when the selection of the sample systematically excludes or underrepresents certain groups • Selection bias often happens when we use a convenience sample consisting of data that are readily available • Self-selection bias can occur when we examine data for a group of people who have chosen to be in that group

  24. Survivor and Nonresponse Bias • A retrospective study looks at past data for a contemporaneously selected sample • for example, an examination of the lifetime medical records of 65-year-olds • A prospective study, in contrast, selects a sample and then tracks the members over time • By their very design, retrospective studies suffer from survivor bias: we necessarily exclude members of the past population who are no longer around! • Nonresponse bias: the systematic refusal of some groups to participate in an experiment or to respond to a poll

  25. The Power of Random Selection • In a simple random sample of size N from a given population: • each member of the population is equally likely to be included in the sample • every possible sample of size N from this population has an equal chance of being selected • How do we actually make random selections? • We would like a procedure that is equivalent to the following: • put the name of each member of the population on its own slip of paper • drop these slips into a box • mix thoroughly • pick members out randomly • In practice, random sampling is usually done through some sort of numerical identification combined with a computerized random selection of numbers
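
A minimal sketch of the numerical-identification approach described above, with a hypothetical population of 1,000 members: each member gets an ID number and the computer draws a simple random sample without replacement.

```python
# Simple random sampling by computerized selection of ID numbers.
import random

population_ids = list(range(1, 1001))  # hypothetical population of 1,000 members
N = 25

sample_ids = random.sample(population_ids, N)  # every sample of size N is equally likely
print(sorted(sample_ids))
```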

  26. Estimation First, some terminology: • Parameter: a characteristic of the population whose value is unknown but can be estimated • Estimator: a sample statistic that will be used to estimate the value of the population parameter • Estimate: the specific value of the estimator that is obtained in one particular sample • Sampling variation: the notion that, because samples are chosen randomly, the sample average will vary from sample to sample, sometimes being larger than the population mean and sometimes smaller

  27. Sampling Distributions • The sampling distribution of a statistic is the probability distribution or density curve that describes the population of all possible values of this statistic • For example, it can be shown mathematically that if the individual observations are drawn from a normal distribution, then the sampling distribution for the sample mean is also normal • Even if the population does not have a normal distribution, the sampling distribution of the sample mean will approach a normal distribution as the sample size increases • It can be shown mathematically that the sampling distribution for the sample mean has mean μ and standard deviation σ/√N (17.5)
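
An illustrative check of equation (17.5), again using die rolls as the population distribution: across many repeated samples, the average of the sample means is close to μ = 3.5 and their standard deviation is close to σ/√N.

```python
# Simulate the sampling distribution of the sample mean for N die rolls.
import random
import statistics

N = 30
sample_means = [
    statistics.mean(random.randint(1, 6) for _ in range(N))
    for _ in range(10_000)
]

sigma = (35 / 12) ** 0.5
print(statistics.mean(sample_means), 3.5)                # ≈ population mean
print(statistics.stdev(sample_means), sigma / N ** 0.5)  # ≈ sigma / sqrt(N)
```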

  28. The Mean of the Sampling Distribution A sample statistic is an unbiased estimator of a population parameter if the mean of the sampling distribution of this statistic is equal to the value of the population parameter. Because the mean of the sampling distribution of the sample mean X̄ is μ, X̄ is an unbiased estimator of μ

  29. The Standard Deviation of the Sampling Distribution • One way of gauging the accuracy of an estimator is with its standard deviation: • If an estimator has a large standard deviation, there is a substantial probability that an estimate will be far from its mean • If an estimator has a small standard deviation, there is a high probability that an estimate will be close to its mean

  30. The t-Distribution • When the mean of a sample from a normal distribution is standardized by subtracting the mean of its sampling distribution and dividing by the standard deviation of its sampling distribution, the resulting Z variable has a normal distribution • W.S. Gosset determined (in 1908) the sampling distribution of the variable that is created when the mean of a sample from a normal distribution is standardized by subtracting its mean and dividing by its standard error (≡ the standard deviation of an estimator): t = (X̄ − μ)/(s/√N)

  31. The t-Distribution (cont.) • The exact distribution of t depends on the sample size: • as the sample size increases, we are increasingly confident of the accuracy of the estimated standard deviation • Table B-1 at the end of the textbook shows some probabilities for various t-distributions that are identified by the number of degrees of freedom: degrees of freedom = # observations - # estimated parameters
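
A short sketch (with made-up observations) of the standardization Gosset studied: the t statistic divides by the estimated standard error s/√N rather than the true σ, and its degrees of freedom are the number of observations minus the one estimated parameter.

```python
# t statistic for a sample mean against a hypothesized value mu_0.
import statistics

sample = [21, 19, 24, 22, 20, 23, 25, 18]  # hypothetical observations
mu_0 = 20                                  # hypothesized population mean

n = len(sample)
x_bar = statistics.mean(sample)
standard_error = statistics.stdev(sample) / n ** 0.5

t_stat = (x_bar - mu_0) / standard_error
degrees_of_freedom = n - 1                 # observations minus estimated parameters
print(t_stat, degrees_of_freedom)
```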

  32. Hypothesis Testing The hypothesis testing problem (for the mean): make a provisional decision, based on the evidence at hand, whether a null hypothesis is true, or instead that some alternative hypothesis is true. That is, test H0: E(Y) = μY,0 vs. H1: E(Y) > μY,0 (1-sided, >) H0: E(Y) = μY,0 vs. H1: E(Y) < μY,0 (1-sided, <) H0: E(Y) = μY,0 vs. H1: E(Y) ≠ μY,0 (2-sided)

  33. Some terminology for testing statistical hypotheses: p-value = probability of drawing a statistic (e.g., Ȳ) at least as adverse to the null as the value actually computed with your data, assuming that the null hypothesis is true. The significance level of a test is a pre-specified probability of incorrectly rejecting the null, when the null is true.
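
As a sketch of how these terms fit together (assuming SciPy is available; the sample values are hypothetical): the p-value of a two-sided t-test is compared with the pre-specified significance level to decide whether to reject the null.

```python
# p-value for H0: E(Y) = 20 against the two-sided alternative.
from scipy import stats

sample = [21, 19, 24, 22, 20, 23, 25, 18]  # hypothetical observations
result = stats.ttest_1samp(sample, popmean=20)
print(result.statistic, result.pvalue)

# Reject the null at the 5% significance level when p-value < 0.05.
print("reject H0" if result.pvalue < 0.05 else "do not reject H0")
```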

  34. What is the link between the p-value and the significance level?

  35. Confidence Intervals A 95% confidence interval for μY is an interval that contains the true value of μY in 95% of repeated samples. Digression: What is random here? The values of Y1, …, Yn and thus any functions of them, including the confidence interval. The confidence interval will differ from one sample to the next. The population parameter, μY, is not random; we just don’t know it.

  36. Confidence Intervals • A confidence interval measures the reliability of a given statistic such as the sample mean X̄ • The general procedure for determining a confidence interval for a population mean can be summarized as: 1. Calculate the sample average X̄ 2. Calculate the standard error of X̄ by dividing the sample standard deviation s by the square root of the sample size N 3. Select a confidence level (such as 95 percent) and look in Table B-1 with N-1 degrees of freedom to determine the t-value t* that corresponds to this probability 4. A confidence interval for the population mean is then given by: X̄ ± t* s/√N
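
A sketch of the four-step procedure in code, assuming SciPy is available to supply the t-value that the slides read from Table B-1 (the sample values are hypothetical).

```python
# 95% confidence interval for a population mean.
import statistics
from scipy import stats

sample = [21, 19, 24, 22, 20, 23, 25, 18]                # hypothetical observations
confidence = 0.95

n = len(sample)
x_bar = statistics.mean(sample)                          # step 1: sample average
standard_error = statistics.stdev(sample) / n ** 0.5     # step 2: s / sqrt(N)
t_star = stats.t.ppf((1 + confidence) / 2, df=n - 1)     # step 3: t-value with N-1 df
margin = t_star * standard_error                         # step 4: half-width

print(x_bar - margin, x_bar + margin)
```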

  37. Meanwhile, be careful about the interpretation of confidence intervals. If a 95% CI for the mean age of VIU students is 18.38 – 25.62, the correct interpretation is that, if the data collection were repeated (on students, sampled the same way as in the original sample) and a confidence interval were computed each time, we would expect 95% of those intervals to contain the true mean age µ.

  38. Sampling from Finite Populations • Notably, a confidence interval does not depend on the size of the population • This may at first seem surprising: if we are trying to estimate a characteristic of a large population, wouldn’t we also need a large sample? • The reason the size of the population doesn’t matter is that the chance that the luck of the draw will yield a sample whose mean differs substantially from the population mean depends on the size of the sample and on the chances of selecting items that are far from the population mean • That is, it depends not on how many items there are in the population

  39. Key Terms • Random variable • Probability distribution • Expected value • Mean • Variance • Standard deviation • Standardized random variable • Population • Sample • Selection, survivor, and nonresponse bias • Sampling distribution • Population mean • Sample mean • Population standard deviation • Sample standard deviation • Degrees of freedom • Confidence interval
