Chapter 1. The Nature of Probability and Statistics



Presentation Transcript


  1. Chapter 1. The Nature of Probability and Statistics. Zheng Chen @SUNO

  2. What is Probability?
     Probability is the chance that something will happen.
     Example: A bag holds 5 colored balls; 3 of them are red and 2 of them are blue. If we pick 1 ball at random, what is the chance that it is red?
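The bag example can be worked out with a few lines of Python. This is only a minimal sketch: it assumes every ball is equally likely to be picked, and the variable names are illustrative rather than part of the original slides.

```python
from fractions import Fraction

# Bag from the slide: 3 red balls and 2 blue balls.
red_balls = 3
blue_balls = 2

# With equally likely balls, P(red) = favorable outcomes / total outcomes.
p_red = Fraction(red_balls, red_balls + blue_balls)

print(p_red)         # 3/5
print(float(p_red))  # 0.6
```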

  3. What is Statistics?
     Statistics is the science of conducting studies to do the following things:
     • collect data;
     • organize data;
     • summarize data;
     • analyze data;
     • draw conclusions from data, such as predictions.

  4. Statistics Examples
     • Nearly one in seven US families is struggling with bills from medical expenses even though they have medical insurance.
     • The average credit card debt per household in 2003 was $9,250.
     • About 15% of men and 9% of women in the US are left-handed.
     • Average credit card debt among indebted young adults increased by 55% between 1992 and 2001, to $4,088.
     • 41% of college students have a credit card. Of the students with cards, about 65% pay their bills in full every month, which is a higher share than in the general adult population.

  5. Data and Variable
     • Data are the values studied in statistics.
     • Data are either measurements or observations. Measurement data come from measuring, such as the average height of US boys at age 18. Observation data come from observing, such as how many flights were delayed at New York's LGA Airport during the 2008 Christmas holiday.
     • A collection of data values is called a data set; each value in a data set is called a data value or datum.
     • A variable is a characteristic or attribute that can assume different values.

  6. Descriptive Statistics
     Descriptive statistics consists of:
     • collecting data;
     • organizing data;
     • summarizing data;
     • presenting data.
     Simply put, descriptive statistics describes the data that have been collected.

  7. Inferential Statistics
     Inferential statistics means deriving conclusions from the given data, such as predictions or forecasts. One important area of inferential statistics is hypothesis testing, which is a kind of decision making. For example, a car manufacturer wants to introduce a new car model to the market. Before doing so, it must know whether there is market demand for the new model.

  8. Classify the following as Descriptive or Inferential Statistics
     • In September 2008, ABC News predicted that Barack Obama would win the 2008 US presidential election.
     • Nine of ten on-the-job fatalities are men.
     • The median household income in Detroit in 2008 is $28,000.
     • About 85% of all lung cancers are in people who smoke or who have smoked.
     • Smoking cigarettes can increase the chance of getting lung cancer by 400%.
     • The US national average annual expenditure on medicine per person is $1052.
     • In Fall 2009, SUNO student enrolment will reach more than 4000.

  9. Data Types
     • Qualitative data: qualitative data can be placed into distinct categories according to some characteristic or attribute. Examples: male and female; banana, apple and orange; English, French and Spanish; names of students.
     • Quantitative data: quantitative data are numerical values that can be counted or measured. Examples: SUNO student enrolment in the Spring semesters of 2010 and 2011; income of US families; temperature of a region.

  10. Questions
     • Are ZIP codes qualitative or quantitative data?
     • Are grades A, B, C, D, and F qualitative or quantitative data?

  11. Answers
     • Are ZIP codes qualitative or quantitative? ZIP codes are written with digits. However, what is the difference between ZIP codes 77024 and 70001? Does the inequality 77024 > 70001 mean anything? It does not; the digits only label a region. Conclusion: ZIP codes are qualitative data.
     • Are grades A, B, C, D, F qualitative or quantitative? Grades can be compared and ranked: A is better than B, B is better than C, C is better than D, and D is better than F. However, they are categories rather than counts or measurements, and the difference between two grades is not a precise amount. Conclusion: grades are qualitative data that can be ranked (ordinal level).

  12. Classify the following as Qualitative or Quantitative data
     • Number of cars sold in New Orleans in 2008.
     • Colors of cars used by SUNO students and faculty members.
     • Total minutes needed to drive from New Orleans to Baton Rouge.
     • Classification (infant, toddler, preschool) of children in a day-care center.
     • The total weight of crawfish caught by Louisiana fishermen in 2008.
     • The marital status of SUNO students.
     • The original nationalities of SUNO faculty members.
     • Multiple choice test answers (A, B, C, D, E).
     • The survey choices (agree, not agree, neutral, no response) for the survey about merging SUNO with the University of New Orleans.

  13. Homework
     Are television channel numbers qualitative or quantitative data? Explain your reasons.

  14. Discrete or Continuous Data
     Quantitative data can be divided into discrete data and continuous data.
     Discrete data usually take integer values, for example: How many students registered in Spring 2009? How many days are needed to finish a given task?
     Continuous data usually take decimal values, for example: students' heights, the temperature in a region, US family incomes.

  15. Classify the following as Discrete or Continuous Data
     • Number of pizzas sold each day by a Pizza Hut.
     • Water temperatures in all lakes near New Orleans.
     • Weight of SUNO students.
     • Football scores of the LSU football team in this regular season.
     • Lifetime (in hours) of 12 flashlight batteries.
     • Number of sandwiches sold every day in a McDonald's near SUNO.
     • Capacity (in gallons) of the reservoirs in Jefferson County.
     • GPA of students in a large university.

  16. Boundary of Continuous Data
     Continuous data usually take decimal values. However, we cannot record infinitely long decimals, so we have to round the recorded data at some position. If we keep only one decimal place, then 17.23 is rounded to 17.2 and 17.26 is rounded to 17.3. Therefore, a recorded weight of 86 pounds means a weight from 85.5 pounds up to, but not including, 86.5 pounds. We can write this as 85.5 ≤ weight < 86.5, or simply 85.5-86.5 pounds. Here 85.5-86.5 is called the boundary of the weight data.
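The boundary rule can be sketched in Python as "recorded value plus or minus half a unit of the last recorded decimal place". The function name `boundaries` is hypothetical, and the sketch assumes the value is given as a plain decimal string such as "86" or "17.2".

```python
from decimal import Decimal

def boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) boundaries for a value recorded as a decimal string."""
    value = Decimal(recorded)
    # The exponent tells how many decimal places were recorded: "86" -> 0, "17.2" -> -1.
    exponent = value.as_tuple().exponent
    # Half a unit of the last recorded place: 0.5 for "86", 0.05 for "17.2".
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

print(boundaries("86"))    # (Decimal('85.5'), Decimal('86.5'))
print(boundaries("17.2"))  # (Decimal('17.15'), Decimal('17.25'))
```

Using `Decimal` instead of `float` keeps the number of recorded decimal places, which a float such as 86.0 would lose.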

  17. Give the boundary of each data value
     • 42.8 miles,
     • 1.6 millimeters,
     • 5.36 gallons,
     • 15 tons,
     • 93.8 ounces,
     • 40 inches.

  18. Measurement Scales
     There are four types of scales: nominal, ordinal, interval, and ratio.
     • Nominal level measurement applies to qualitative data whose categories are mutually exclusive, with no ranking or ordering among them. For example, SUNO instructors can be categorized as math instructors, biology instructors, chemistry instructors, etc.
     • Ordinal level measurement. Ordinal level data can be ranked, but precise differences between the ranks do not exist. For example, final grades in Math250 are A, B, C, D and F; there may be variation among the students who receive an A. As another example, a speaker can be ranked as superior, average, or poor.

  19. Measurement Scales (continued)
     • Interval level measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero. For example: the temperature in New Orleans in each month of 2008. Obviously, we can compare the temperatures 72°F and 73°F, and the difference of 1°F has the same meaning everywhere on the scale. However, there is no true zero in interval measurement: 0°F does not mean that there is no temperature; it is just another temperature.
     • Ratio level measurement has all the properties of interval measurement and, in addition, a true zero, so ratios between two different members of the population are meaningful. For example, salary, height, and age are all ratio level data. It makes sense to say that one person's salary is double the salary of another person.
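The difference between interval and ratio levels can be illustrated with a short Python sketch. It assumes two made-up temperatures (80°F and 40°F) and two made-up salaries; the conversion to kelvins, a scale whose zero is a true zero, is added here only for illustration and is not part of the original slides.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert degrees Fahrenheit to kelvins."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

# Interval level: the ratio of two Fahrenheit readings depends on where the
# arbitrary zero sits, so "80°F is twice as hot as 40°F" is not meaningful.
ratio_fahrenheit = 80.0 / 40.0
ratio_kelvin = fahrenheit_to_kelvin(80.0) / fahrenheit_to_kelvin(40.0)

# Ratio level: a salary of zero really means "no salary", so the ratio is meaningful.
ratio_salary = 60000 / 30000

print(ratio_fahrenheit)        # 2.0
print(round(ratio_kelvin, 2))  # 1.08
print(ratio_salary)            # 2.0
```

The same pair of temperatures gives a ratio of 2.0 on one scale and about 1.08 on another, while the salary ratio stays 2.0 in any currency unit.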

  20. Examples of Measurement Scales

  21. Classify the following data levels
     • Pages in the New Orleans yellow pages telephone book,
     • Ranking of world golf players,
     • Weight of a GM refrigerator,
     • Bandwidth of an Internet connection in kbps (1000 bits per second),
     • BTU rating of an air conditioner (one BTU is the amount of heat needed to raise the temperature of one pound of water by 1 degree Fahrenheit),
     • Age of students in a classroom,
     • Temperatures of 8 refrigerators.

  22. Population and Sample
     • A population is the whole collection of objects studied in a statistical study. For example: nearly one in seven US families is struggling with bills from medical expenses even though they have medical insurance. In this example, the population is all US families.
     • A sample is a part of the population selected for study. In the above example, 100 families arbitrarily selected from New Orleans form a sample.

  23. What is a random number?
     A random number is a number determined only by chance. For example, we write the numbers 1 to 100 on 100 ping pong balls, one number per ball, and put the 100 balls in a box. If we pick one ball out of the box without looking, which number will be on the ball? Any number from 1 to 100 is possible, and every number has the same chance. The number drawn is a random integer from 1 to 100.

  24. How to select a random object?
     One way to choose a person at random from SUNO: assign each person at SUNO a number, with different people getting different numbers, and then choose one of these numbers at random. The random choice can be made by a computer or by hand.
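The procedure of numbering people and then drawing one number can be sketched in Python with the standard `random` module. The roster below is hypothetical; a real roster would list every person at SUNO.

```python
import random

# Hypothetical roster; in practice this would be the full list of people at SUNO.
people = ["Alice", "Brian", "Carla", "David", "Elena"]

# Assign each person a distinct number, starting from 1.
numbered = {number: name for number, name in enumerate(people, start=1)}

# Draw one of the assigned numbers; every number has the same chance.
chosen_number = random.randint(1, len(people))
chosen_person = numbered[chosen_number]

print(chosen_number, chosen_person)
```

Note that `random.randint(a, b)` includes both endpoints, so every assigned number from 1 to `len(people)` can be drawn.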

  25. Generate random number tables
     Many websites offer tools to generate random number tables, such as http://stattrek.com/Tables/Random.aspx
     52 58 41 90 19 05 74 39 44 83 38 47 74 67 34 60 09 09 61 78 02 56 94 54 100 34 84 35 56 45 80 48 91 60 81 92 85 07 40 68 83 86 04 49 39 96 100 62 74 11 27 07 68 58 59 56 66 37 50 64 96 47 46 77 57 63 47 95 50 10 06 71 49 88 70 52 79 99 65 65 40 40 93 09 33 87 25 85 32 66 89 66 62 50 85 79 23 92 12 24 16 39 71 73 15 17 35 81 44 01 31 94 06 43 59 38 99 90 91 88 97 69 55 69 27 79 51 82 89 94 78 26 82 41 11 02 80 20 01 84 10 30 97 96 72 45 98 41 65 92 57 90 63 71 21 98 93 81 17 11 55 97 44 55 48 70 76 05 46 49 67 86 76 33 63 99 38 75 64 43 04 22 23 19 29 100 87 01 59 84 82 13 21 26 10 58 87 73 42 07 12 52 06 15 42 62 02 28 03 77 29 72 70 24 88 22 95 03 53 03 25 13 48 43 86 25

  26. Use Excel to make a random number table
     The Excel random function is RAND(), which returns a decimal number that is at least 0 and less than 1. So 100*RAND() gives a decimal number from 0 up to, but not including, 100. Since we only need integers, INT(100*RAND()) gives random integers from 0 to 99. To get random integers from 1 to 100, add 1: INT(100*RAND())+1. In the same way, INT(85*RAND())+1 gives random integers from 1 to 85, which is what we need to make a random number table from 1 to 85.
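For readers without Excel, the same scale-truncate-shift pattern can be sketched in Python, where `random.random()` plays the role of a uniform decimal that is at least 0 and less than 1. The function names `random_integer` and `random_table` are hypothetical and are used only in this illustration.

```python
import random

def random_integer(low: int, high: int) -> int:
    """Scale a uniform decimal in [0, 1), truncate it, then shift it to start at low."""
    n = high - low + 1                       # how many integers lie in low..high
    return int(n * random.random()) + low    # int(...) is 0..n-1, so the result is low..high

def random_table(rows: int, cols: int, low: int, high: int) -> list[list[int]]:
    """Build a table of random integers, one list per row."""
    return [[random_integer(low, high) for _ in range(cols)] for _ in range(rows)]

# A small table of random integers from 1 to 85.
table = random_table(rows=3, cols=5, low=1, high=85)
for row in table:
    print(" ".join(f"{value:2d}" for value in row))
```

In everyday Python code `random.randint(1, 85)` does the same job directly; the longer form is shown only because it mirrors the spreadsheet formula step by step.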

  27. Homework
     1. Use Excel to make a random number table with numbers from 1 to 85 in 10 rows and 200 columns.
     2. Use this table to get 20 random numbers from 1 to 37.

  28. Select a sample by the random method
     Nearly one in seven US families is struggling with bills from medical expenses even though they have medical insurance. Suppose we arbitrarily select 100 families from the New Orleans region. Is this a good sample for the population of US families? Certainly not. The medical expense situation in Virginia, for example, is very different from that in the New Orleans region. To get a representative sample for this study, we must use a random method to choose the families. We can pick 100 families at random from each state, or we can pick families at random so that the number taken from each state is proportional to the population of that state. This method is called the random method, and such a sample is called a random sample.
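Drawing a random sample without repetition can be sketched in Python with `random.sample`. The list of 1000 family labels is made up for illustration, and `simple_random_sample` is a hypothetical helper name.

```python
import random

def simple_random_sample(population, size, seed=None):
    """Pick `size` distinct members so that every member has the same chance."""
    rng = random.Random(seed)   # a seed makes the draw repeatable
    return rng.sample(population, size)

# Hypothetical list of 1000 families standing in for one state's families.
families = [f"family_{i}" for i in range(1, 1001)]

sample = simple_random_sample(families, size=100, seed=1)
print(sample[:5])
```

To take a number of families proportional to each state's population, the same helper could be called once per state with a different `size`.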

  29. Select a sample by the systematic method
     If all objects of the population are already arranged in a random order, then we can pick the first object at random and then pick each following object at a fixed gap of some integer k. For example, on a product assembly line, after choosing the first item at random, we can choose every tenth item after it.
     Notice: the gap should not be too small for a big population. If we choose numbers from a telephone book by the systematic method with a small gap, all the telephone numbers chosen could come from the same region. It is also better to use a prime number as the gap. For example, if a population has about 100 objects and we need to choose 20 of them, we can use the gap 37, wrapping around to the beginning of the list whenever we pass the end.
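The systematic method with a gap of 37 over 100 objects can be sketched in Python as follows. The wrap-around past the end of the list is an assumption made here so that 20 picks fit into 100 objects; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(items, sample_size, gap, seed=None):
    """Pick a random starting position, then take every `gap`-th item, wrapping around."""
    rng = random.Random(seed)
    start = rng.randrange(len(items))
    # The modulo wraps positions past the end back to the beginning of the list.
    positions = [(start + i * gap) % len(items) for i in range(sample_size)]
    return [items[p] for p in positions]

# Population of 100 objects labeled 1 to 100, as in the slide's example.
population = list(range(1, 101))

sample = systematic_sample(population, sample_size=20, gap=37, seed=0)
print(sample)
```

Because 37 shares no common factor with 100, the wrapped positions do not repeat until all 100 objects have been visited, so the 20 picks are guaranteed to be distinct.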

  30. Select a sample by the stratified method
     Sometimes we can divide the whole population into mutually exclusive groups and then pick a sample from each group. For example, in a 2008 presidential election poll, we can use age, race, education level, and gender to divide all possible voters into several groups, and then pick a sample from each group to survey.
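A stratified sample can be sketched in Python by first grouping the population and then sampling inside every group. The voter records and the three age groups below are made up for illustration, and `stratified_sample` is a hypothetical helper name.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_group, seed=None):
    """Split records into groups by `key`, then draw a random sample from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    # Take `per_group` members from every group (or the whole group if it is smaller).
    return {
        name: rng.sample(members, min(per_group, len(members)))
        for name, members in groups.items()
    }

# Hypothetical voters, each tagged with one of three age groups.
age_groups = ("18-29", "30-49", "50+")
voters = [{"id": i, "age_group": age_groups[i % 3]} for i in range(90)]

sample = stratified_sample(voters, key=lambda v: v["age_group"], per_group=5, seed=0)
for group, members in sample.items():
    print(group, [m["id"] for m in members])
```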

  31. Select a sample by the cluster method
     Divide the whole population into several groups, called clusters, by some criterion such as geographic region or department. Then randomly pick one or several clusters as the sample to survey. For example, suppose we want to know the medical bill situation of families in the New Orleans area. We can treat each parish as a cluster, randomly pick some clusters (parishes) as the sample, and survey every family in the chosen clusters.
     Notice the difference: the cluster method picks several whole groups, while the stratified method picks part of each group.
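In contrast to the stratified method, a cluster sample picks whole groups at random and keeps every member of each chosen group. A minimal Python sketch, with made-up parish names and family labels and a hypothetical helper `cluster_sample`:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen_names}

# Hypothetical parishes, each mapped to made-up family labels.
parishes = {
    "Parish A": ["A1", "A2", "A3"],
    "Parish B": ["B1", "B2"],
    "Parish C": ["C1", "C2", "C3", "C4"],
    "Parish D": ["D1", "D2"],
}

surveyed = cluster_sample(parishes, n_clusters=2, seed=0)
for parish, families in surveyed.items():
    print(parish, families)
```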

  32. Convenience sampling
     In addition to the basic sampling methods above, another common sampling method is convenience sampling. For example, suppose we want to survey the presidential voting preference of local residents. We can stay in a mall and interview one customer coming into the mall every 30 minutes. This is called convenience sampling. A survey conducted on the street is also convenience sampling.

  33. Classify the sampling method
     • In a large school, all faculty members in two buildings were interviewed with questions about students' homework performance.
     • Every 10th customer entering the mall was asked to name his/her favorite color.
     • The winning lottery numbers of last week.
     • Every 100th hamburger manufactured was checked by an FDA officer.
     • Mail carriers of a large city are divided into four groups according to gender (male or female) and according to whether they walk or ride on their route. Then 10 are selected from each group to determine whether they have been bitten by a dog in the last year.
     • Interviewing people on a street in the French Quarter.

  34. "We have polled the entire population, Your Majesty, and we have come out with exactly the results you ordered!"

  35. Homework
     Select 37 SUNO telephone numbers
     1) by the random sampling method;
     2) by the systematic sampling method.
     Explain the details of the selection process.

  36. Observational Statistical Study
     In an observational statistical study, the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on those observations. For example, we can collect SUNO students' Math250 grades from this year and from a year before Katrina and compare them.

  37. Experimental Statistical Study
     In an experimental statistical study, the researcher manipulates one or more variables to see whether the manipulation influences the outcome. For example, a farmer wants to find the relation between crop production and the use of two different fertilizers. So he designs an experiment on two plots of land, planting the same crop at the same time with the same amount of fertilizer, one brand on each plot.

  38. Identify the following as Observational or Experimental
     • Subjects were randomly assigned to two groups; one group was given an herb and the other group a placebo, and the effects were compared after 6 months.
     • A researcher stood at a busy intersection to record the colors of the vehicles that ran the red light.
     • A researcher finds that people who are more hostile have higher total cholesterol levels than those who are less hostile.
     • Subjects are randomly divided into four groups. Each group is placed on one of four special diets. After 6 months, the blood pressures of the groups are compared to see if diet has any effect on blood pressure.

  39. Suspect Samples
     • If the statistics of a sample are far from the statistics of the population, then the sample is called a suspect sample.
     • If the sample size is too small, then the sample may be suspect. Suppose we want to know the average GPA of SUNO students, and there are about 3000 SUNO students right now. If we randomly pick only 10 students, we may get a GPA that is much too high or much too low. If we randomly pick 50 students, the result will be better.
     • A sample from a mail survey can also be suspect, because many people may not respond at all. Only those who are strongly in favor or strongly against tend to respond.
     • A convenience sample can be suspect too. If a teacher surveys only the students in his/her own class, the result may not represent the students of the whole school.
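The effect of sample size can be illustrated with a small Python simulation. Everything about the data is an assumption: the 3000 GPAs are randomly generated stand-ins, not real SUNO records, and the helper name `spread_of_sample_means` is hypothetical. The sketch repeats the sampling many times and measures how much the sample average jumps around for samples of 10 and of 50 students.

```python
import random
import statistics

rng = random.Random(2009)  # fixed seed so the simulation is repeatable

# Made-up GPAs for 3000 students, clipped to the 0.0-4.0 range.
gpas = [min(4.0, max(0.0, rng.gauss(2.8, 0.6))) for _ in range(3000)]
population_mean = statistics.mean(gpas)

def spread_of_sample_means(sample_size, trials=1000):
    """Standard deviation of the sample mean over many repeated random samples."""
    means = [statistics.mean(rng.sample(gpas, sample_size)) for _ in range(trials)]
    return statistics.stdev(means)

spread_10 = spread_of_sample_means(10)
spread_50 = spread_of_sample_means(50)

print(f"population mean GPA: {population_mean:.2f}")
print(f"spread of the sample mean, 10 students: {spread_10:.3f}")
print(f"spread of the sample mean, 50 students: {spread_50:.3f}")
```

A smaller spread for the 50-student samples means their averages stay closer to the population average, which is why the larger sample is less likely to be suspect.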
