Wonderful World of Statistics
A focus on Sampling and Sampling Methods
Menu
• Statistics
• Sampling Methods
• Definitions
• Measures of Centre
• Assessment Tips
• Measures of Spread
• Practice Tasks
• On Your Calculator
For clarification, click on any step you do not understand to see that element broken down.
The example used throughout this presentation is finding the mean height of WBHS pupils.
Sampling Methods
In this presentation you will see a number of sampling methods, with their benefits and drawbacks.
• Simple Random Sample
• Cluster Sampling
• Systematic Sampling
• Stratified Sampling
Note: For more detailed instructions on any of the examples, click on the step you do not understand.
Measures of Central Tendency
In this presentation you will learn how to calculate a number of measures of average or centre, as well as their benefits and drawbacks.
• Mean
• Median
• Mode
Note: For more detailed instructions on any of the examples, click on the step you do not understand.
Measures of Spread
In this presentation you will learn how to find a number of measures of spread, as well as their advantages and drawbacks. You will also need to decide which measure of spread and which measure of centre go together.
• Standard Deviation
• Interquartile Range
• Range
Note: For more detailed instructions on any of the examples, click on the step you do not understand.
Simple Random Sample
The simplest unbiased sample.
1. Number the entire population.
2. Generate random numbers.
3. Proceed until you have as many as you need, ignoring any repeats.
Example (Heights of WBHS students)
• Get a copy of the School Roll.
• Number every person.
• Generate random numbers from 1 to the maximum you need.
• Proceed until you have the desired sample size, ignoring repeats.
Simple Random Sample
Disadvantages
• May not represent strata
• Needs an entire population list
Advantages
• Cheap
• Easy to carry out
• Unbiased
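The simple random sample steps can be sketched in Python. This is only an illustration and not part of the original slides: the function name `simple_random_sample` is made up, the roll size of 529 comes from the school example used in this presentation, and the sample size of 30 is the one used in its later worked examples.

```python
import random

def simple_random_sample(population_size, sample_size, rng=None):
    """Return sample_size distinct roll numbers between 1 and population_size."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = rng or random.Random()
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)  # generate a random roll number
        if number not in chosen:                  # ignore any repeats
            chosen.append(number)
    return chosen

# Assumed figures: 529 students on the roll, sample of 30.
sample = simple_random_sample(529, 30)
print(sorted(sample))
```

`random.sample(range(1, 530), 30)` does the same job in one call; the loop is written out here so that it mirrors the numbered steps.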
Cluster Sampling
The easiest unbiased sample.
1. Sort your data into clusters based on location.
2. Randomly choose a cluster.
3. Perform a simple random sample on the chosen cluster.
Example (Heights of WBHS students)
• Get a copy of the School Roll.
• Sort into clusters, e.g. year levels.
• Randomly select a cluster.
• Randomly generate a sample from the chosen cluster.
• Take care when choosing clusters: Juniors are much shorter than Seniors, so a single year level may not represent the whole school.
Cluster Sampling
Disadvantages
• Needs an entire population list
• Can be biased if clusters strongly affect the statistics
Advantages
• Very cheap
• Very easy to carry out
• Unbiased
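A minimal Python sketch of cluster sampling follows. The year-level names and sizes are assumptions made up for illustration; only the 115 Year 10 students and the total of 529 come from this presentation. The helper names `build_roll` and `cluster_sample` are also hypothetical.

```python
import random

# Year-level sizes are made up for illustration, except that the slides
# give 115 Year 10 students and 529 students in total.
YEAR_SIZES = {"Year 9": 120, "Year 10": 115, "Year 11": 110,
              "Year 12": 100, "Year 13": 84}

def build_roll(year_sizes):
    """Number every student 1..N and group the numbers by year level."""
    roll, next_number = {}, 1
    for year, size in year_sizes.items():
        roll[year] = list(range(next_number, next_number + size))
        next_number += size
    return roll

def cluster_sample(clusters, sample_size, rng=None):
    """Randomly choose one cluster, then take a simple random sample from it."""
    rng = rng or random.Random()
    chosen = rng.choice(list(clusters))          # randomly select the cluster
    return chosen, rng.sample(clusters[chosen], sample_size)

roll = build_roll(YEAR_SIZES)
year, students = cluster_sample(roll, 30)
print(year, sorted(students))
```

Every student in the result comes from one year level, which is why the choice of clusters matters so much for heights.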
Systematic Sampling
A relatively quick way to pick an unbiased sample.
1. List the entire population.
2. Decide on your step size (Total ÷ Sample size = n).
3. Randomly generate a starting point.
4. Step to every nth data point until you have your sample.
Example (Heights of WBHS students)
• Get an alphabetical copy of the School Roll.
• Step size = Total ÷ Sample size.
• Randomly generate a starting point.
• Starting from that point, use the step size to pick the rest of the sample.
Systematic Sampling
Disadvantages
• Needs an entire population list
• If the population list is ordered then the sample can become biased
Advantages
• Cheap
• Easy to choose sample
• Unbiased
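The systematic sampling steps might be sketched as follows. The function name `systematic_sample` is hypothetical, and the figures (529 students, sample of 30) are taken from the worked examples in this presentation. The step size is rounded down, and stepping wraps back to the start of the roll when it runs off the end.

```python
import random

def systematic_sample(population_size, sample_size, rng=None):
    """Pick every nth roll number from a random start, wrapping at the end."""
    rng = rng or random.Random()
    step = population_size // sample_size      # 529 // 30 = 17 (rounded down)
    position = rng.randint(1, population_size) # random starting point
    picks = []
    for _ in range(sample_size):
        picks.append(position)
        position += step
        if position > population_size:         # past the end of the roll:
            position -= population_size        # start again at the beginning
    return step, picks

step, picks = systematic_sample(529, 30)
print("step size:", step)
print(picks)
```

Because 30 steps of 17 cover only 510 places, the stepping never comes back around to a student who has already been chosen.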
Stratified Sampling
The most reliable sampling method.
1. Sort the data into strata based on information you already know.
2. Calculate the proportion for each stratum.
3. Perform a Simple Random Sample on each of the strata.
Example (Heights of WBHS students)
• Get a copy of the School Roll separated into year levels.
• Calculate the sample size for each year group (stratum).
• Perform a simple random sample on each year group, using that group's specific sample size.
Stratified Sampling
Disadvantages
• Needs an entire population list
• Information about the entire population needs to be known beforehand
• Time consuming
Advantages
• Unbiased
• Completely representative of each of the strata
• Most reliable estimates
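A minimal sketch of stratified sampling in Python is given below. The year-level names and sizes are assumptions invented for illustration (only the 115 Year 10 students and the total of 529 appear in this presentation), and `stratified_sample` is a hypothetical helper name.

```python
import random

# Made-up year-level sizes; only Year 10 (115) and the total (529)
# come from the slides.
YEAR_SIZES = {"Year 9": 120, "Year 10": 115, "Year 11": 110,
              "Year 12": 100, "Year 13": 84}

def stratified_sample(year_sizes, total_sample_size, rng=None):
    """Take a simple random sample from each stratum, sized by its proportion."""
    rng = rng or random.Random()
    population_size = sum(year_sizes.values())
    sample = {}
    for year, size in year_sizes.items():
        # proportion of the population in this stratum x total sample size
        quota = round(size / population_size * total_sample_size)
        # simple random sample within the stratum (students numbered 1..size)
        sample[year] = rng.sample(range(1, size + 1), quota)
    return sample

sample = stratified_sample(YEAR_SIZES, 30)
for year, students in sample.items():
    print(year, len(students), sorted(students))
```

With these made-up year sizes the rounded quotas are 7, 7, 6, 6 and 5, which add to 31 rather than 30. Rounding each stratum separately does not always hit the target exactly, so one quota may need adjusting by hand.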
Generate a Random Number
• Decide on the starting number (in this case 1).
• Decide how many you need (in the case of the school, 529 students).
• Choose your calculator: Casio FX-82, Casio Graphic, Texas
Random Number on a Casio Graphics Calculator
• Decide on the starting number (in this case 1).
• Decide how many you need (in the case of the school, 529 students).
• In Run Mode:
  Intg: OPTN – F6 – F4 – F5
  Ran#: OPTN – F6 – F3 – F4
• On screen: Intg(529 × Ran# + 1)
  Here 529 is the population size or strata size, and 1 is the starting value.
Random Number on a Casio FX-82
• Decide on the starting number (in this case 1).
• Decide how many you need (in the case of the school, 529 students).
• Ran# is the 2nd function (shift) of the · key.
• On screen: Ran# × 529 + 1 =
  Here 529 is the population size or strata size, and 1 is the starting value.
Note: Ignore any decimal in the answer.
Random Number on a Texas
• Decide on the starting number (in this case 1).
• Decide how many you need (in the case of the school, 529 students).
• RANDI: PRB → RANDI
• The comma is the 2nd function of the ) key.
• On screen: RANDI(1, 529)
  Here 1 is the starting value and 529 is the end value.
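For comparison, the same two calculator recipes can be mirrored in Python. This is a sketch rather than anything from the original slides: `calculator_style` imitates Intg(529 × Ran# + 1) and `randi_style` imitates RANDI(1, 529); both function names are made up.

```python
import random

POPULATION_SIZE = 529   # population or strata size (from the school example)
START = 1               # starting value

def calculator_style():
    """Mirror Intg(529 x Ran# + 1): scale a random decimal, then drop the decimals."""
    return int(POPULATION_SIZE * random.random() + START)

def randi_style():
    """Mirror RANDI(1, 529): a random whole number from START to the end value."""
    return random.randint(START, START + POPULATION_SIZE - 1)

print(calculator_style(), randi_style())
```

Adding the starting value before dropping the decimals is what shifts the result from the range 0 to 528 up to the range 1 to 529.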
Simple Random Sample
The simplest unbiased sample.
1. Number the entire population.
2. Generate random numbers.
3. Proceed until you have as many as you need, ignoring any repeats.
Example (Heights of WBHS students)
• Get a copy of the School Roll.
• Number every person from 1 (to 529).
• Generate random numbers from 1 to the maximum you need (529).
• Proceed until you have the desired sample size, ignoring repeats.
Strata Proportions
1. Divide the number of people in the stratum by the total in the population.
2. Multiply by the number of people wanted in the total sample.
Example (Heights of WBHS students)
• 529 people on School Roll.
• 115 Year 10s.
• Sample size of 30.
• So the Year 10 sample size is 115 ÷ 529 × 30 = 6.52.
• So take 7 Year 10 students.
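The Year 10 calculation can be checked with a couple of lines of Python. The function name `stratum_sample_size` is hypothetical; the numbers are the ones from the example.

```python
def stratum_sample_size(stratum_size, population_size, total_sample_size):
    """Return the exact and rounded sample size for one stratum."""
    exact = stratum_size / population_size * total_sample_size
    return exact, round(exact)

# 115 Year 10 students, 529 on the roll, total sample of 30.
exact, rounded = stratum_sample_size(115, 529, 30)
print(round(exact, 2), rounded)
```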
Systematic Step Sizes
Step size = number of people in the population divided by the sample size.
Example (Heights of WBHS students)
• 529 people on School Roll.
• Sample size of 30.
• So the step size is 529 ÷ 30 = 17.63333.
• Round down, so take every 17th student from the starting position.
Systematic Stepping
Starting at the random start point, step out until you get the desired sample size.
Example (Heights of WBHS students)
• Random starting point 803, step size 29.
• The 803rd student on the alphabetical list is where we start.
• Then the 832nd student, then the 861st student.
• The next step, 890, runs past the end of the roll (875 students in this example), so start again at the beginning: 890 − 875 = 15th student, then the 44th student…
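The wrap-around in this example can be traced with a short Python sketch. The roll size of 875 is an inference from the slide's "890 = 15th student" and is not stated outright; `step_through_roll` is a made-up helper name.

```python
def step_through_roll(start, step, roll_size, count):
    """List the first `count` positions, wrapping to the start of the roll."""
    positions = []
    position = start
    for _ in range(count):
        positions.append(position)
        position += step
        if position > roll_size:     # ran past the end of the roll
            position -= roll_size    # carry on from the beginning
    return positions

# Start 803, step 29, roll of 875 (inferred from 890 -> 15).
print(step_through_roll(803, 29, 875, 5))  # [803, 832, 861, 15, 44]
```

Stepping 29 on from the 15th student lands on the 44th.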
Mean
1. Add up all of the values in the sample.
2. Divide by the sample size.
Can also be found using the calculator method.
Advantages
• Easy to calculate for large samples
• Accurate and well understood
Disadvantages
• Affected by outliers
Median
1. List all the values in order.
2. Find the central value.
Advantages
• Accurate
• Not affected much by outliers
Disadvantages
• Not so widely known as an average
• Time consuming to list a large sample in order
Mode
1. List all the values.
2. Find the most common item.
Advantages
• Can calculate the mode for data that is not numeric or ordered
• Not affected much by outliers
• Very easy to calculate
Disadvantages
• Can be inaccurate for numeric data or data that can be ordered
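The three measures of centre can be compared using Python's standard `statistics` module. The heights below are invented for illustration and are not from the slides; the 250 cm value stands in for an outlier such as a data-entry error.

```python
import statistics

# Hypothetical heights in cm, made up for this sketch.
heights = [160, 165, 165, 170, 180]
with_outlier = heights + [250]      # one extreme value added

for label, data in (("original", heights), ("with outlier", with_outlier)):
    print(label,
          "mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))
```

For these values the mean moves from 168 to about 181.67 when the outlier is added, while the median only moves from 165 to 167.5 and the mode stays at 165.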
Statistics on a Calculator
Choose your calculator: Casio FX-82, Casio Graphic, Texas
Statistics on a Casio Graphics Calculator
• In Stat Mode, enter all data values in List 1.
• In List 2 enter their frequencies.
• Press F2 (CALC), then F6 (SET). The settings should read:
  1Var XList :List1
  1Var Freq :List2
  2Var XList :List3
  2Var YList :List4
  2Var Freq :List5
• Press EXIT, then F1 (1VAR).
• All statistics are listed: x̄ is the mean, xσn is the standard deviation.
Entering Data on Casio Graphics Calculator
• Enter each data value in List 1, followed by EXE.
• Enter the frequency of each data value in List 2, followed by EXE.
Note: If all of the frequencies are 1 then you don’t need to enter the frequencies. In the Set menu change 1Var Freq to 1 instead of List 2.
Statistics on a Casio FX-82 Calculator
• Put your calculator into statistics mode: Mode 2
• Clear the statistics memory: Shift Mode 1 (Scl)
• Enter the data carefully, e.g. 180 M+
• Calculate desired statistics: Shift 2
  x̄ mean
  xσn standard deviation
Entering Data on Casio FX-82 Calculator
• Enter each data value followed by M+.
• The screen shows n = 1 after the first entry; ‘n’ is the number of data values that you have entered.
Note: Be very careful entering the data values, as you cannot review them later to make sure that they are correct.
Statistics on a Texas Calculator
• Put your calculator into statistics mode: 2nd Function DATA, then choose 1-VAR.
• Enter the data carefully: DATA
• Calculate desired statistics: STATVAR
• Shift between statistics with the arrow keys. Shown on screen: n, x̄, Sx, σx
  n number of data values
  x̄ mean
  σx standard deviation
Entering Data on a Texas Calculator
• Press the DATA key to begin.
• Begin entering data. X1 is the data value (for example X1 = 180), followed by the down arrow.
• Freq1 is that data value's frequency, followed by the down arrow.
• X2 is next, then Freq2.
• To check data use the up arrow.
Definitions
• Population: The entire list of those people or things that you wish to sample.
• Census: A survey of an entire population.
• Sample: A small group taken from a population.
• Parameters: Facts about an entire population gained from a census (notation: mean ‘μ’, standard deviation ‘σ’).
• Statistics: Estimates of population parameters calculated from a sample (notation: mean ‘x̄’, standard deviation ‘s’).
• Representative: A sample that appears to represent all elements of the population in the correct proportions.
• Bias: Present when a sampling method does not give every element of the population an equal chance of selection.
Standard Deviation
This is a measure of the typical difference between the data values and the mean (the square root of the average squared difference). This measure of spread applies to the mean.
• Use a calculator to calculate
• Use a table to calculate
Advantages
• Easy to calculate for large samples on a calculator
• Accurate
• Very useful for certain types of data
Disadvantages
• Affected by outliers
• Possibly not so well understood
Interquartile Range
1. Calculate the upper and lower quartiles.
2. Upper quartile minus lower quartile.
This measure of spread applies to the median.
Advantages
• Well understood
• Unaffected by outliers
Disadvantages
• Time consuming to calculate for large samples
Range
1. Find the highest and lowest value.
2. Highest value minus the lowest value.
This measure of spread applies to all measures of centre.
Advantages
• Well understood
• Easy to calculate, even for large samples
Disadvantages
• Strongly affected by outliers, since it uses only the two most extreme values
Standard Deviation by Table

          x      x̄     x − x̄    (x − x̄)²
          180    165     15       225
          150    165    −15       225
          165    165      0         0
          170    165      5        25
          160    165     −5        25
Total     825             0       500
Mean      165                     100

• x: Data values, from your sample or census.
• x̄: Mean, calculated as usual; it doesn’t change.
• x − x̄: Data values minus the mean.
• (x − x̄)²: Square of each of the values to the left.
The final standard deviation is the square root of the mean of the squares (100), so s = 10.
This can also be calculated on a calculator.
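The table calculation can be reproduced column by column in Python. This sketch uses the five data values from the table and, like the table, divides by the number of values (the xσn statistic on the calculators).

```python
import math

data = [180, 150, 165, 170, 160]         # data values from the table

mean = sum(data) / len(data)             # 825 / 5 = 165
deviations = [x - mean for x in data]    # data value minus the mean
squares = [d ** 2 for d in deviations]   # square of each deviation
mean_square = sum(squares) / len(data)   # 500 / 5 = 100
sd = math.sqrt(mean_square)              # square root of 100 = 10

for x, d, sq in zip(data, deviations, squares):
    print(x, mean, d, sq)
print("standard deviation:", sd)
```

Dividing by one less than the number of values instead (500 ÷ 4 = 125) would give a slightly larger result, about 11.18, which is what the Sx statistic on some calculators reports.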
Calculating Quartiles
1. List all the values in order.
2. Find the central value (the median).
3. Discard that central value and find the central value of each of the remaining two halves.
These 2 numbers are the upper and lower quartiles.
Example (Heights of WBHS students)
1. Data values: 165, 170, 173, 180, 182, 183, 191, 192
2. The central value is the middle of 180 and 182, so the median is 181.
3. Discard 181 and calculate the middle of each half: 165, 170, 173, 180 // 182, 183, 191, 192
• Lower quartile: middle of 170 and 173 = 171.5
• Upper quartile: middle of 183 and 191 = 187
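The quartile method can be sketched in Python as follows. The helper names `middle_value` and `quartiles` are made up, and the code follows the split-into-halves method described here; other software may use a different quartile rule and give slightly different answers.

```python
def middle_value(ordered):
    """Central value of an ordered list (mean of the two middle values if even)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (lower quartile, median, upper quartile) by halving the data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # With an odd count the median is a data value and is discarded.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return middle_value(lower_half), middle_value(ordered), middle_value(upper_half)

heights = [165, 170, 173, 180, 182, 183, 191, 192]
lq, median, uq = quartiles(heights)
print(lq, median, uq, "IQR:", uq - lq)
```

For the eight heights in the example this gives a lower quartile of 171.5, a median of 181 and an upper quartile of 187, so the interquartile range is 15.5.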
Things to Consider
Is my sample representative of the population?
• Consider whether any strata present in the data are represented in approximately the correct proportions.
• Consider the presence of any apparent outliers in the sample chosen, and the effect they will have on estimates of population parameters.
Things to Consider
Is my sample representative of the population?
• Estimates are more reliable when taken from a large sample, as the effects of outliers are lessened.
• Consider the size of the s.d. A larger value of s suggests considerable variation in the data values, so taking another sample could produce quite different statistics.
• Ask yourself, “If I were to repeat this sampling process, would I get the same results?”
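One way to explore the "repeat the sampling process" question is a small simulation. Everything here is an assumption made for illustration: the population of 529 heights is randomly generated, not real WBHS data, and `spread_of_sample_means` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population: 529 heights in cm, invented for illustration only.
population = [round(rng.gauss(165, 10)) for _ in range(529)]

def spread_of_sample_means(sample_size, repeats=200):
    """Take many simple random samples and measure how much their means vary."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(5)
large = spread_of_sample_means(30)
print("spread of means, samples of 5: ", round(small, 2))
print("spread of means, samples of 30:", round(large, 2))
```

The sample means from the larger samples should vary noticeably less from one repeat to the next, which is the sense in which estimates from a large sample are more reliable.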
Things to Consider
How could I improve my sampling method?
• Choose a sampling method which eliminates bias, and which gives the best chance of choosing a representative sample. (Bias exists when some of the population members have a greater or lesser chance of being included in the sample.)
• Discuss which statistics would give the best estimates of population parameters, including the effect of outliers.
Things to Consider
Would I get the same or similar results if I repeated the same process?
• Are there outliers or extreme values that may affect the sample statistics? If so, then I probably wouldn’t get similar results.
• Is the standard deviation (or other measure of spread) large when compared to the mean? If it is, then getting the same results again is unlikely.
Things to Consider
When answering questions or stating conclusions:
• Answers need to be precise and refer to actual data values present in the sample and/or population.
• Strata must be clearly defined.
• Answers cannot be vague or rote-learnt without referring specifically to the context of the assessment.
• Students must be very clear that the sample statistics are ESTIMATES of the population parameters.
• They must NOT state that the population mean is … unless they have taken a census of the whole population!
Practice Tasks Real Estate Stats
On Your Calculator
In this part of the presentation you can check exactly how to use your calculator effectively to help with Statistics.
• Generating Random Numbers
• Entering Data
• Calculating Statistics
Note: For more detailed instructions on any of the examples, click on the step you do not understand.
Entering Data on a Calculator
Choose your calculator: Casio FX-82, Casio Graphic, Texas