
Probability Concepts in Statistics: Introduction and Applications

This refresher lecture provides an overview of probability concepts in statistics, including probability theory, data collection methods, and types of data. Learn how to describe data numerically and graphically, and understand the different types of variables. Explore the importance of samples in making estimates and inferences about the population. Discover the organization and description of qualitative and quantitative data, as well as the different distributions for discrete and continuous variables. Gain a deeper understanding of probability and its applications in statistical analysis.

scottandrew




Presentation Transcript


  1. TR 555 Statistics “Refresher” Lecture 1: Probability Concepts • References: • Penn State University, Dept. of Statistics • Statistical Education Resource Kit • A collection of resources used by faculty in Penn State's Department of Statistics in teaching introductory statistics courses. • Page maintained by Laura J. Simon, Sept. 2003 • Statistics: Making Sense of Data (MIT) • William Stout, John Marden and Kenneth Travers • http://www.introductorystatistics.com/ Sept. 2003 • Tom Maze, stat course prepared for KDOT, 2003

  2. Outline • Overview of statistics • Types of data • Describing data numerically and graphically • Probability and random variables

  3. Probability and Statistics • Probability is the likelihood of an event occurring relative to all other events • Example: • If a coin is flipped, what is the probability of getting heads? • 0.5 • Given that the last flip was heads, what is the probability that the next will be heads? • 0.5 • Statistics is the measurement and modeling of random variables • Example: • If our state averages 200 fatal crashes per year, what is the probability of having exactly one fatal crash today? • Poisson distribution, where k = average per time period: k = 200/365 = 0.55 per day • $P(X = x) = \frac{(kt)^x}{x!} e^{-kt}$, so $P(X = 1) = \frac{(0.55 \cdot 1)^1}{1!} e^{-0.55(1)} = 0.32$
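
The crash example on this slide can be checked numerically. The following is a minimal Python sketch, not part of the original lecture; the function name `poisson_pmf` and the variable names are illustrative assumptions.

```python
import math

def poisson_pmf(x: int, k: float, t: float = 1.0) -> float:
    """Probability of exactly x events in a period of length t,
    given an average of k events per unit of time."""
    mean = k * t
    return (mean ** x) / math.factorial(x) * math.exp(-mean)

crashes_per_year = 200            # state average from the slide
k = crashes_per_year / 365        # average fatal crashes per day
p_one_today = poisson_pmf(1, k, t=1)

print(f"k = {k:.2f} per day, P(X = 1) = {p_one_today:.2f}")
```

Using the unrounded rate 200/365 instead of 0.55 changes the result only in the third decimal place, so both round to 0.32.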

  4. Data Collection • Designing experiments • Does aspirin help reduce the risk of heart attacks? • Observational studies • Polls - Clinton’s approval rating

  5. Variable Types • Deterministic • Assume away variation and randomness • Known with certainty • One to one mapping of independent variable to dependent variable • [Figure: relationship between X1 and Y1]

  6. Variable Types Continued • Random or Stochastic • Recognized uncertainty of an event • One to one distribution mapping of independent variable to dependent variable • [Figure: distribution of possible values, labeled “Less Likely”, “Most Likely”, “Less Likely”: probability that it could be any of these values]

  7. Population The set of data (numerical or otherwise) corresponding to the entire collection of units about which information is sought

  8. Sample A subset of the population data that are actually collected in the course of a study.

  9. WHO CARES? In most studies, it is difficult to obtain information from the entire population. We rely on samples to make estimates or inferences related to the population.

  10. Organization and Description of Data • Qualitative vs. Quantitative data • Discrete vs. Continuous Data • Graphical Displays • Measures of Center • Measures of Variation

  11. Qualitative (Categorical) Data: The raw (unsummarized) data are merely labels or categories • Quantitative (Numerical) Data: The raw (unsummarized) data are numerical

  12. Qualitative Data Examples • Class Standing (Fr, So, Ju, Sr) • Section # (1,2,3,4,5,6) • Automobile Make (Ford, Chevrolet, Nissan) • Questionnaire response (disagree, neutral, agree)

  13. Quantitative Data Examples (measures) • Voltage • Height • Weight • SAT Score • Number of students arriving late for class • Time to complete a task

  14. Discrete Data: Only certain values are possible (there are gaps between the possible values) • Continuous Data: Theoretically, any value within an interval is possible with a fine enough measuring device

  15. Discrete Data Examples • Number of students late for class • Number of crimes reported to SC police • Number of times the word number is used (generally, discrete data are counts)

  16. Discrete Variable Model: Poisson Distribution • $P(X = x) = \frac{(0.55t)^x}{x!} e^{-0.55t}$
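
To see what this discrete model looks like, the sketch below tabulates the Poisson probabilities for several crash counts. It is an illustrative Python example rather than the lecture's own material; the names `RATE_PER_DAY` and `poisson_pmf` and the choice of a 7-day period are assumptions.

```python
import math

RATE_PER_DAY = 0.55  # rounded daily rate used on the slide (200 / 365)

def poisson_pmf(x: int, t: float, k: float = RATE_PER_DAY) -> float:
    """P(X = x): probability of exactly x fatal crashes in t days."""
    mean = k * t
    return mean ** x / math.factorial(x) * math.exp(-mean)

# Probabilities of 0, 1, 2, 3, 4 crashes in a single day.
for x in range(5):
    print(x, round(poisson_pmf(x, t=1), 3))

# Over any period, the probabilities of all possible counts sum to 1.
total_week = sum(poisson_pmf(x, t=7) for x in range(100))
print(round(total_week, 6))
```

Only whole-number counts receive probability, which is what makes the variable discrete.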

  17. Continuous Data Examples • Voltage • Height • Weight • Time to complete a homework assignment

  18. Continuous Variable Model: Exponential Distribution • Probability density of the first fatal crash at time t: $f(t) = k e^{-kt}$

  19. Continuous Probability Function • Cumulative probability that the first fatal crash occurs by time t: $F(t) = 1 - e^{-kt}$
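
The exponential density and its cumulative form can be evaluated directly. This is a small Python sketch under the assumption that k = 0.55 crashes per day, as in the earlier Poisson example; the function names `exp_pdf` and `exp_cdf` are illustrative.

```python
import math

def exp_pdf(t: float, k: float) -> float:
    """Exponential density k * e^(-k t) for the time until the first event."""
    return k * math.exp(-k * t)

def exp_cdf(t: float, k: float) -> float:
    """Cumulative probability 1 - e^(-k t) that the first event occurs by time t."""
    return 1.0 - math.exp(-k * t)

k = 0.55  # assumed average fatal crashes per day

p_within_one_day = exp_cdf(1.0, k)
p_none_in_one_day = math.exp(-k * 1.0)  # Poisson probability of zero crashes in a day

print(round(exp_pdf(1.0, k), 3), round(p_within_one_day, 3))
```

The two models agree: the probability that the first crash happens within a day equals one minus the Poisson probability of zero crashes that day.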

  20. Nominal Data • A type of categorical data in which objects fall into unordered categories, for example: • Hair color • blonde, brown, red, black, etc. • Race • Caucasian, African-American, Asian, etc. • Smoking status • smoker, non-smoker

  21. Ordinal Data • A type of categorical data in which order is important. For example … • Class • freshman, sophomore, junior, senior, super senior • Degree of illness • none, mild, moderate, severe, …, going, going, gone • Opinion of students about riots • ticked off, neutral, happy

  22. Binary Data • A type of categorical data in which there are only two categories. • Binary data can either be nominal or ordinal, for example … • Smoking status • smoker, non-smoker • Attendance • present, absent • Class • lower classman, upper classman

  23. Interval and Ratio Data • Interval • Interval is important, but no meaningful zero • e.g., temperature in Fahrenheit • Ratio • Has a meaningful zero value • e.g., temperature in Kelvin, crash rate
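
One way to see why a meaningful zero matters: ratios of Fahrenheit readings are not physically meaningful, while ratios of Kelvin readings are. The Python sketch below illustrates this with two hypothetical temperatures (40 °F and 80 °F, chosen only for illustration).

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert a Fahrenheit temperature to Kelvin."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

cool_f, warm_f = 40.0, 80.0  # hypothetical readings

difference_f = warm_f - cool_f          # differences are meaningful on an interval scale
ratio_fahrenheit = warm_f / cool_f      # 2.0, but 80 F is not "twice as hot" as 40 F
ratio_kelvin = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(difference_f, ratio_fahrenheit, round(ratio_kelvin, 3))
```

On the Kelvin scale, which has a true zero, the warmer reading is only about 8% higher than the cooler one.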

  24. Who Cares? The type(s) of data collected in a study determine the type of statistical analysis used.

  25. Proportions • Categorical data are commonly summarized using “percentages” (or “proportions”). • 11% of students have a tattoo • 2%, 33%, 39%, and 26% of the students in class are, respectively, freshmen, sophomores, juniors, and seniors
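
Proportions are simply category counts divided by the total. The Python sketch below assumes a hypothetical class of 100 students whose counts reproduce the class-standing percentages on this slide; the class size is an assumption, not a figure from the lecture.

```python
from collections import Counter

# Hypothetical class of 100 students matching the slide's percentages.
standings = ["Fr"] * 2 + ["So"] * 33 + ["Ju"] * 39 + ["Sr"] * 26

counts = Counter(standings)
n = len(standings)
proportions = {label: count / n for label, count in counts.items()}

for label, p in proportions.items():
    print(f"{label}: {p:.0%}")
```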

  26. Averages • Measurement data are typically summarized using “averages” (or “means”). • Average number of siblings Fall 1998 Stat 250 students have is 1.9. • Average weight of male Fall 1998 Stat 250 students is 173 pounds. • Average weight of female Fall 1998 Stat 250 students is 138 pounds.

  27. Descriptive statistics Describing data with numbers: measures of location

  28. Mean • Another name for average. • If describing a population, denoted as $\mu$, the Greek letter “mu”. • If describing a sample, denoted as $\bar{x}$, called “x-bar”. • Appropriate for describing measurement data. • Seriously affected by unusual values called “outliers”.

  29. Calculating Sample Mean • Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ • That is, add up all of the data points and divide by the number of data points. • Data (# of classes skipped): 2 8 3 4 1 • Sample Mean = (2+8+3+4+1)/5 = 3.6 • Do not round! Mean need not be a whole number.
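
The same calculation in Python, as a minimal sketch using the slide's data; the variable names are illustrative.

```python
import statistics

skipped = [2, 8, 3, 4, 1]  # number of classes skipped, from the slide

# Add up all of the data points and divide by the number of data points.
sample_mean = sum(skipped) / len(skipped)

print(sample_mean)                 # manual calculation
print(statistics.mean(skipped))    # standard-library equivalent
```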

  30. Population Mean • The mean of a random variable X is called the population mean and is denoted $\mu$. It is also called the expected value of X or the expectation of X and is denoted E(X).

  31. Median • Another name for 50th percentile. • Appropriate for describing measurement data. • “Robust to outliers,” that is, not affected much by unusual values.

  32. Calculating Sample Median • Order data from smallest to largest. • If odd number of data points, the median is the middle value. • Data (# of classes skipped): 2 8 3 4 1 • Ordered Data: 1 2 3 4 8 • Median = 3

  33. Calculating Sample Median • Order data from smallest to largest. • If even number of data points, the median is the average of the two middle values. • Data (# of classes skipped): 2 8 3 4 1 8 • Ordered Data: 1 2 3 4 8 8 • Median = (3+4)/2 = 3.5
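
The odd and even cases from the two median slides can be written as one short function. This is an illustrative Python sketch; `sample_median` is a name chosen here, and the standard library's `statistics.median` follows the same rule.

```python
import statistics

def sample_median(values):
    """Middle value of the ordered data, or the average of the two
    middle values when the number of data points is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_data = [2, 8, 3, 4, 1]       # five data points
even_data = [2, 8, 3, 4, 1, 8]   # six data points

print(sample_median(odd_data), sample_median(even_data))
```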

  34. Mode • The value that occurs most frequently. • One data set can have many modes. • Appropriate for all types of data, but most useful for categorical data or discrete data with only a few possible values.
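
A short Python sketch of finding modes, including a data set with more than one. The automobile list is hypothetical, built from the makes named earlier in the lecture; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

skipped = [2, 8, 3, 4, 1, 8]  # classes skipped, from the median slide
makes = ["Ford", "Chevrolet", "Ford", "Nissan", "Chevrolet"]  # hypothetical sample

# multimode returns every value tied for the highest frequency.
modes_skipped = statistics.multimode(skipped)
modes_makes = statistics.multimode(makes)

print(modes_skipped)
print(modes_makes)
```

The second list has two modes, which shows why the mode works for categorical data where a mean or median would not be defined.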

  35. Most appropriate measure of location • Depends on whether or not data are “symmetric” or “skewed”. • Depends on whether or not data have one (“unimodal”) or more (“multimodal”) modes.

  36. Symmetric and Unimodal

  37. Symmetric and Bimodal

  38. Skewed Right

  39. Skewed Left

  40. Choosing Appropriate Measure of Location • If data are symmetric, the mean, median, and mode will be approximately the same. • If data are multimodal, report the mean, median and/or mode for each subgroup. • If data are skewed, report the median.
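
The advice to report the median for skewed data can be illustrated by adding one extreme value to the classes-skipped data. In this Python sketch the outlier of 40 is a hypothetical value chosen for illustration.

```python
import statistics

skipped = [2, 8, 3, 4, 1]       # data from the sample mean slide
with_outlier = skipped + [40]   # hypothetical extreme value added

mean_before = statistics.mean(skipped)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(skipped)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

The mean is pulled strongly toward the outlier, while the median barely moves.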

  41. Descriptive statistics Describing data with numbers: measures of variability

  42. Range • The difference between largest and smallest data point. • Highly affected by outliers. • Best for symmetric data with no outliers.

  43. Interquartile range • The difference between the “third quartile” (75th percentile) and the “first quartile” (25th percentile). So, the “middle-half” of the values. • IQR = Q3-Q1 • Robust to outliers or extreme observations. • Works well for skewed data.
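
The range and interquartile range can be computed as follows. This Python sketch uses the six-point classes-skipped data; note that several conventions exist for computing quartiles, and the "inclusive" method of `statistics.quantiles` used here is an assumption, so other software may report slightly different Q1 and Q3 values.

```python
import statistics

skipped = [2, 8, 3, 4, 1, 8]

# Range: difference between the largest and smallest data point.
data_range = max(skipped) - min(skipped)

# Quartiles: cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(skipped, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```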

  44. Variance 1. Find difference between each data point and mean. 2. Square the differences, and add them up. 3. Divide by one less than the number of data points.
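
The three steps above can be sketched in Python using the classes-skipped data from the sample mean slide. The function name `sample_variance` is illustrative; the standard library's `statistics.variance` gives the same result.

```python
import statistics

def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    # Step 1: difference between each data point and the mean.
    deviations = [x - mean for x in values]
    # Step 2: square the differences and add them up.
    sum_of_squares = sum(d ** 2 for d in deviations)
    # Step 3: divide by one less than the number of data points.
    return sum_of_squares / (n - 1)

skipped = [2, 8, 3, 4, 1]
s_squared = sample_variance(skipped)

print(round(s_squared, 2), round(statistics.variance(skipped), 2))
```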

  45. Variance • If measuring variance of population, denoted by $\sigma^2$ (“sigma-squared”). • If measuring variance of sample, denoted by $s^2$ (“s-squared”). • Measures average squared deviation of data points from their mean. • Highly affected by outliers. Best for symmetric data. • Problem is units are squared.

  46. Population Variance • The variance of a random variable X is called the population variance and is denoted $\sigma^2$.

  47. Standard deviation • Sample standard deviation is square root of sample variance, and so is denoted by s. • Units are the original units. • Measures average deviation of data points from their mean. • Also, highly affected by outliers.
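
Taking the square root returns the measure of spread to the original units. A minimal Python sketch, again using the classes-skipped data as an illustration:

```python
import math
import statistics

skipped = [2, 8, 3, 4, 1]  # units: classes

s_squared = statistics.variance(skipped)  # units: classes squared
s = math.sqrt(s_squared)                  # units: classes

print(round(s_squared, 2), round(s, 2))
```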

  48. Population Standard Deviation • The population standard deviation is the square root of the population variance and is denoted $\sigma$.

  49. What is the variance or standard deviation? (MPH)

  50. Variance or standard deviation

      Sex      N     Mean     Median   TrMean   StDev   SE Mean
      female   126    91.23    90.00    90.83   11.32   1.01
      male     100   106.79   110.00   105.62   17.39   1.74

      Sex      Minimum   Maximum   Q1      Q3
      female   65.00     120.00    85.00    98.25
      male     75.00     162.00    95.00   118.75

      Females: s = 11.32 mph and $s^2 = 11.32^2 = 128.1$ mph$^2$
      Males: s = 17.39 mph and $s^2 = 17.39^2 = 302.4$ mph$^2$
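
The summary values on this slide can be cross-checked from the reported N and StDev. The Python sketch below squares each standard deviation to get the variance, and also recomputes the SE Mean column using the usual formula s / sqrt(N); that formula is standard but is not stated in the lecture, so treat its use here as an assumption.

```python
import math

# N and StDev (mph) as reported in the summary table.
groups = {
    "female": {"n": 126, "stdev": 11.32},
    "male": {"n": 100, "stdev": 17.39},
}

results = {}
for sex, stats in groups.items():
    variance = stats["stdev"] ** 2                     # units: mph squared
    se_mean = stats["stdev"] / math.sqrt(stats["n"])   # assumed formula: s / sqrt(N)
    results[sex] = (variance, se_mean)
    print(f"{sex}: s^2 = {variance:.1f} mph^2, SE mean = {se_mean:.2f} mph")
```

Squaring the rounded value 17.39 gives about 302.4; a figure computed from an unrounded standard deviation may differ slightly in the last digit.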
