Engr/Math/Physics 25, Chp7 Statistics-1
Bruce Mayer, PE, Licensed Electrical & Mechanical Engineer, BMayer@ChabotCollege.edu
Learning Goals
• Use MATLAB to solve Problems in
  • Statistics
  • Probability
• Use Monte Carlo (random) Methods to Simulate Random processes
• Properly Apply Interpolation or Extrapolation to Estimate values between or outside of known data points
Histogram
• Histograms are COLUMN Plots that show the Distribution of Data
• Height Represents Data Frequency
• Some General Characteristics
  • Used to represent continuous grouped, or BINNED, data
  • BIN ≡ SubRange within the Data
  • Usually Does not have any gaps between bars
  • Areas represent %-of-Total Data
HistoGram ≡ Frequency Chart
• A HistoGram shows how OFTEN some event Occurs
• Histograms are often constructed using Frequency Tables
Histograms In MATLAB
• MATLAB has 6 Forms of the Histogram Cmd
• The Simplest: hist(y)
  • Generates a Histogram with 10 bins
• Example: Max Temp at Oakland AirPort in Jul-Aug08
>> TmaxOAK = [70, 75, 63, 64, 65, 66, 65, 65, 67, 78, 75, 73, 79, 71, 72, 67, 69, 69, 70, 74, 71, 72, 71, 74, 77, 77, 86, 90, 90, 70, 71, 66, 66, 72, 68, 73, 72, 82, 91, 82, 76, 75, 72, 72, 69, 70, 68, 65, 67, 65, 63, 64, 72, 70, 68, 71, 77, 65, 63, 69, 69, 67];
• The Plot Statement
>> hist(TmaxOAK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland Airport - Jul-Aug08')
hist Result for Oakland
• It was COLD in Summer 08
• Bin Width = (91-63)/10 = 2.8 °F
Histograms In MATLAB
• hist(y) Generates a Histogram with 10 bins
• Next Example: Max Temp at Stockton AirPort in Jul-Aug08
>> TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];
• The Plot Statement
>> hist(TmaxSTK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton Airport - Jul-Aug08')
hist Result for Stockton
• It was HOT in Summer 08
• Bin Width = (107-81)/10 = 2.6 °F
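The two bin widths quoted above can be checked from the data themselves. This is a minimal sketch, assuming only what the slides state, namely that hist(y) splits the range min(y) to max(y) into 10 equal bins; the helper name binWidth is ours and does not appear in the deck.

```matlab
% Oakland and Stockton Jul-Aug08 max temps, copied from the slides
TmaxOAK = [70, 75, 63, 64, 65, 66, 65, 65, 67, 78, 75, 73, 79, 71, 72, 67, 69, 69, 70, 74, 71, 72, 71, 74, 77, 77, 86, 90, 90, 70, 71, 66, 66, 72, 68, 73, 72, 82, 91, 82, 76, 75, 72, 72, 69, 70, 68, 65, 67, 65, 63, 64, 72, 70, 68, 71, 77, 65, 63, 69, 69, 67];
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];

% Hypothetical helper: width of each of nBins equal bins spanning the data
binWidth = @(y, nBins) (max(y) - min(y)) / nBins;

wOAK = binWidth(TmaxOAK, 10);    % (91-63)/10
wSTK = binWidth(TmaxSTK, 10);    % (107-81)/10

% Cross-check against the bin centers that hist itself returns
[nOAK, cOAK] = hist(TmaxOAK);
spacingOAK = cOAK(2) - cOAK(1);  % center-to-center spacing equals the bin width

fprintf('Oakland bin width  = %.1f degF\n', wOAK)
fprintf('Stockton bin width = %.1f degF\n', wSTK)
```

The second output of hist holds the bin centers, so the spacing between neighboring centers gives the same width without any hand arithmetic.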
hist Command Refinements
• Adjust The number and width of the bins using
  hist(y,N)
  hist(y,x)
• Where
  • N ≡ an integer specifying the NUMBER of Bins
  • x ≡ A vector that Specs the CENTERs of the Bins
• Consider Summer 08 Max-Temp Data from Oakland and Stockton
• Make 2 Histograms
  • 17 bins
  • 60 °F → 110 °F by 2.5’s
hist Plots 17 Bins
>> hist(TmaxOAK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
>> hist(TmaxSTK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
hist Plots Same Scale
>> x = [60:2.5:110];
>> hist(TmaxOAK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
>> hist(TmaxSTK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
hist Numerical Output
• hist can also provide numerical Data about the Histogram
  n = hist(y)
• Gives the number of values in each of the (default) 10 Bins
• For the Stockton data
>> k = hist(TmaxSTK)
k = 2 5 1 10 16 7 9 2 7 3
• We can also spec the number and/or Width of Bins
>> k13 = hist(TmaxSTK,13)
k13 = 2 2 4 4 6 10 10 7 5 2 6 2 2
>> k2_5s = hist(TmaxOAK,x)
hist Numerical Output
• Bin-Count and Bin-Locations (Frequency Table) for the Oakland Data
>> [u, v] = hist(TmaxOAK,x)
u = 0 3 11 7 15 9 6 4 1 2 1 0 3 0 0 0 0 0 0 0 0
v = 60.0000 62.5000 65.0000 67.5000 70.0000 72.5000 75.0000 77.5000 80.0000 82.5000 85.0000 87.5000 90.0000 92.5000 95.0000 97.5000 100.0000 102.5000 105.0000 107.5000 110.0000
Data Statistics Tool - 1
• Make Line-Plot of Temp Data for Stockton, CA
• Use the Tools Menu to find the Data Statistics Tool
• Time for LIVE Demo
Data Statistics Tool - 2
• Use the Tool to Add Plot Lines for The Mean ± StdDev
Data Statistics Tool - 3
• The Result
• Quite a Nice Tool, Actually
• The Avg Max Temp Was 96.97 °F
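The same mean and ± StdDev lines can also be drawn from the command line. What follows is a rough sketch, not the tool's own procedure: to stay short it uses only the first ten Stockton readings from the deck, and the variable names (T, day, Tavg, Tstd) are our own.

```matlab
% First ten Stockton readings from the slides (excerpt, for brevity only)
T = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94];
day = 1:numel(T);                 % day index, assumed to be 1, 2, 3, ...

Tavg = mean(T);                   % sample mean
Tstd = std(T);                    % sample standard deviation (N-1 form)

plot(day, T, '-o'), hold on
plot(day([1 end]), [Tavg Tavg], 'k-')            % mean line
plot(day([1 end]), [Tavg Tavg] + Tstd, 'k--')    % mean + 1 StdDev
plot(day([1 end]), [Tavg Tavg] - Tstd, 'k--')    % mean - 1 StdDev
hold off
xlabel('Day No.'), ylabel('Max. Temp (°F)')
legend('data', 'mean', 'mean + std', 'mean - std')
```

Swapping in the full temperature vector for T reproduces the kind of plot the Data Statistics Tool builds interactively.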
Probability
• Probability ≡ The LIKELIHOOD that a Specified OutCome Will be Realized
• The “Odds” Run from 0% to 100%
• Class Question: What are the Odds of winning the California MEGA-MILLIONS Lottery?
  • Exactly 175 711 536 : 1
175 711 536 ... EXACTLY???!!!
• To Win the MegaMillions Lottery
  • Pick five numbers from 1 to 56
  • Pick a MEGA number from 1 to 46
• The Odds for the 1st ping-pong Ball = 5 out of 56
• The Odds for the 2nd ping-pong Ball = 4 out of 55, and so On
• The Odds for the MEGA are 1 out of 46
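175 711 536 ... Calculated
• Calc the OverAll Odds as the PRODUCT of each of the Individual OutComes
  $\frac{5}{56}\cdot\frac{4}{55}\cdot\frac{3}{54}\cdot\frac{2}{53}\cdot\frac{1}{52}\cdot\frac{1}{46} = \frac{1}{175\,711\,536}$
• This is Technically a COMBINATION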
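The 175 711 536 figure can be reproduced two ways in MATLAB. The sketch below uses the built-in nchoosek for the number of unordered 5-ball picks from 56; the variable names are ours.

```matlab
% Number of ways to pick 5 of 56 balls when ORDER does not matter
nWays5 = nchoosek(56, 5);            % 3,819,816 combinations
nMega  = 46;                         % possible MEGA numbers
totalOutcomes = nWays5 * nMega;      % all distinct tickets

% Same answer from the product of the individual ball-by-ball odds
pWin = (5/56)*(4/55)*(3/54)*(2/53)*(1/52)*(1/46);

fprintf('Odds of winning: 1 in %d\n', totalOutcomes)
fprintf('1/pWin         = %.0f\n', 1/pWin)
```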
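175 711 536 ... is a DEAL!
• The ORDER in Which the Ping-Pong Balls are Drawn Does NOT affect the Winning Odds
• If we Had to Match the Pull-Order:
  $\frac{1}{56}\cdot\frac{1}{55}\cdot\frac{1}{54}\cdot\frac{1}{53}\cdot\frac{1}{52}\cdot\frac{1}{46} = \frac{1}{21\,085\,384\,320}$
• This is a PERMUTATION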
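To see how much of a "deal" the unordered game is, the ordered (permutation) count can be compared with the unordered (combination) count. A small sketch with our own variable names:

```matlab
% ORDER matters: 56 choices, then 55, ... then 52, times 46 MEGA numbers
nOrdered   = prod(52:56) * 46;

% ORDER ignored: combinations of 5 from 56, times 46 MEGA numbers
nUnordered = nchoosek(56, 5) * 46;

% Each unordered pick corresponds to 5! orderings
ratio = nOrdered / nUnordered;

fprintf('Ordered   : 1 in %.0f\n', nOrdered)
fprintf('Unordered : 1 in %.0f\n', nUnordered)
fprintf('Ratio     : %d (= 5!)\n', ratio)
```

Matching the pull order would make winning 120 times less likely, since the five white balls can be arranged in 5! = 120 ways.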
Normal Distribution - 1
• Consider Data on the Height of a sample group of 20 year old Men
• We can Plot this Frequency Data using bar
>> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> bar(xbins, y_abs), ylabel('No.'), xlabel('Height (Inches)'), title('Height of 20 Yr-Old Men')
Normal Distribution - 2
• We can also SCALE the Bar/Hist such that the AREA UNDER the CURVE equals 1.00, exactly
• The Game Plan for Scaling
  • Calc the Total Area under the bars = [Bin Width] x [Σ(individual counts)]
  • The individual Bar Area = [Bin Width] x [individual count]
  • %-Area of any one bar → [Bar Area]/[Total Area]
Normal Distribution - 3
• We can Use bar to Plot the Scaled-Area Hist.
>> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> TotalArea = sum(0.5*y_abs)
>> y_scale = 100*y_abs/TotalArea;
>> bar(xbins, y_scale), ylabel('Fraction (%/inch)'), xlabel('Height (inches)'), title('Height of 20 Yr-Old Men')
Normal Distribution - 4
• This is a Good Time for a UNITS Check
• Remember, our GOAL → the Area Under the Curve = 1
• Recall From the Plot the UNITS for the y-axis → %/inch (?)
• The Units come from these MATLAB Statements
>> TotalArea = sum(0.5*y_abs)
  • The 0.5 is the Bin Width in INCHES
  • So TotalArea is in inches•No.
• Now y_scale
>> y_scale = 100*y_abs/TotalArea;
• Cont. on Next Slide
Normal Distribution - 5
• The Units Analysis for y_scale
>> y_scale = 100*y_abs/TotalArea;
  $\frac{\%\cdot\text{No.}}{\text{inch}\cdot\text{No.}} = \frac{\%}{\text{inch}}$
• Recall From MTH1 that for y = f(x) displayed in BAR Form the Area Under the Curve is $\sum_k y(x_k)\,\Delta x$
Normal Distribution - 6
• In this Case
  • y(x) → y_scale in %/inch
  • Δx → Bin Width = 0.5 in inches
• Then The Units Analysis for Our “integration”: (%/inch) × inch = %
• Check the integration Example
Normal Distribution - 7
• Example 71”
• The 71” Bar Area = Hgt•Width: 14 %/inch × 0.5 inch = 7%
• Alternatively from the Absolute values
  • The Total Abs Area = 50 No.•inch
  • The 71” Bar: (7 No. × 0.5 inch)/(50 No.•inch) = 7%
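The units check and the 71” example can be confirmed numerically. This sketch reuses the height data and scaling statements from the earlier slides; binWidth, areaAll, k71 and area71 are names we introduce here.

```matlab
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];  % counts (No.)
xbins = 64:0.5:75;                    % bin centers (inch)
binWidth = 0.5;                       % inch

TotalArea = sum(binWidth*y_abs);      % No.*inch
y_scale   = 100*y_abs/TotalArea;      % %/inch

% "Integrate" the scaled bars: sum of height*width, in %
areaAll = sum(y_scale*binWidth);

% Area of the single bar centered at 71 inches
k71    = find(xbins == 71);
area71 = y_scale(k71)*binWidth;            % from the scaled plot, in %
area71abs = 100*y_abs(k71)*binWidth/TotalArea;  % from the absolute counts, in %

fprintf('Total abs area = %g No.*inch\n', TotalArea)
fprintf('Scaled area    = %g %%\n', areaAll)
fprintf('71-inch bar    = %g %%\n', area71)
```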
Probability Distribution Fcn (PDF)
• Because the Area Under the Scaled Plot is 1.00, exactly, The FRACTIONAL Area under any bar, or set-of-bars, gives the probability that any randomly Selected 20 yr-old man will be that height
• e.g., from the Plot we Find
  • 67.5 in → 8 %/in
  • 68 in → 16 %/in
  • 68.5 in → 22 %/in
  • Summing → 46 %/in
• Multiply by the Uniform BinWidth of 0.5 in → 23% of 20 yr-old men are 67.25-68.75 inches tall
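The 23% figure can be read off the data rather than the plot. A minimal sketch, assuming the same y_abs and xbins vectors used for the height histogram; inRange and pRange are our own names.

```matlab
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
xbins = 64:0.5:75;                       % bin centers (inch)
binWidth = 0.5;                          % inch
y_scale = 100*y_abs/sum(binWidth*y_abs); % %/inch

% Select the bars centered at 67.5, 68 and 68.5 inches
inRange = (xbins >= 67.5) & (xbins <= 68.5);

heights = y_scale(inRange);              % bar heights in %/inch
pRange  = sum(heights)*binWidth;         % probability in %

disp(heights)
fprintf('P(67.25 in < height < 68.75 in) = %g %%\n', pRange)
```

Logical indexing keeps the sketch general: changing the two limits in inRange gives the probability for any other set of adjacent bars.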
Random Variable
• A random variable x takes on a defined set of values with different probabilities; e.g.,
  • If you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with equal probability of one-sixth.
  • If you poll people about their voting preferences, the percentage of the sample that responds “Yes on Proposition 101” is also a random variable
    • the %-age will be slightly different every time you poll.
• Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (“frequentist” view)
Random variables can be Discrete or Continuous
• Discrete random variables have a countable number of outcomes
  • Examples: Dead/Alive, Red/Black, Heads/Tails, dice, counts, etc.
• Continuous random variables have an infinite continuum of possible values.
  • Examples: blood pressure, weight, Air Temperature, the speed of a car, the real numbers from 1 to 6.
Probability Distribution Functions
• A Probability Distribution Function (PDF) maps the possible values of x against their respective probabilities of occurrence, p(x)
• p(x) is a number from 0 to 1.0, or alternatively, from 0% to 100%.
• The area under a probability distribution function curve is always 1 (or 100%).
Discrete Example: Roll The Die
x    p(x)
1    p(x=1) = 1/6
2    p(x=2) = 1/6
3    p(x=3) = 1/6
4    p(x=4) = 1/6
5    p(x=5) = 1/6
6    p(x=6) = 1/6
[Figure: bar plot of p(x) vs. x = 1 to 6, every bar at height 1/6]
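In the spirit of the Monte Carlo learning goal, the 1/6 table can be approached by simulation. The following is only a sketch: the number of rolls N is our choice, and randi is assumed to be available (it is in current MATLAB releases).

```matlab
N = 1e5;                         % number of simulated rolls (our choice)
rolls = randi(6, 1, N);          % uniform random integers from 1 to 6

counts = hist(rolls, 1:6);       % tally each face, bins centered on 1..6
pEst   = counts / N;             % relative frequency of each face

% Frequency table: face value, estimated probability
disp([1:6; pEst]')

bar(1:6, pEst), xlabel('x'), ylabel('p(x)'), title('Simulated Die Rolls')
```

Each run gives slightly different numbers, but every entry of pEst should sit close to 1/6 ≈ 0.1667, and the estimates tighten as N grows.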
Continuous Case
• The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.
• The Probabilities associated with continuous functions are just areas under a Region of the curve (→ Definite Integrals)
• Probabilities are given for a range of values, rather than a particular value
  • e.g., the probability of getting a math SAT score between 700 and 800 is 2%.
Continuous Case PDF Example
• Recall the negative exponential function (in probability, this is called an “exponential distribution”): $p(x) = e^{-x}$, $x \ge 0$
• This Function Integrates to 1 from zero to infinity, as required for all PDF’s: $\int_0^\infty e^{-x}\,dx = 1$
Continuous Case PDF Example
• The probability that x is any exact value (e.g.: 1.9976) is 0
  • There is NO Area Under a LINE
• we can ONLY assign Probabilities to possible RANGES of x
• For example, the probability of x falling within 1 to 2: $\int_1^2 e^{-x}\,dx = e^{-1} - e^{-2} \approx 0.23$
[Figures: p(x) = e^{-x} vs. x, one with the region from x = 1 to x = 2 shaded, one with a single vertical line at an exact x value]
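Both areas for p(x) = e^{-x} can be checked numerically. This sketch assumes a MATLAB release that provides the integral function (older releases offered quad for finite limits); the variable names are ours.

```matlab
p = @(x) exp(-x);                    % exponential PDF for x >= 0

areaTotal = integral(p, 0, Inf);     % whole curve, should be 1
p12       = integral(p, 1, 2);       % probability that 1 <= x <= 2
p12exact  = exp(-1) - exp(-2);       % closed-form antiderivative result

fprintf('Total area      = %.4f\n', areaTotal)
fprintf('P(1 <= x <= 2)  = %.4f (numerical)\n', p12)
fprintf('P(1 <= x <= 2)  = %.4f (exact)\n', p12exact)
```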
Gaussian Curve
• The Man-Height HistoGram had some Limited, and thus DISCRETE, Data
• If we were to Measure 10,000 (or more) young men we would obtain a HistoGram like this
• As We increase the number and fineness of the measurements The PDF approaches a CONTINUOUS Curve
Gaussian Distribution
• A Distribution that Describes Many Physical Processes is called the GAUSSIAN or NORMAL Distribution
• Gaussian → famous “bell-shaped curve”
• Describes IQ scores, how fast horses can run, the no. of Bees in a hive, wear profile on old stone stairs...
• All these are cases where:
  • deviation from mean is equally probable in either direction
  • Variable is continuous (or large enough integer to look continuous)
Normal Distribution
• Real-valued PDF: f(x) → −∞ < x < +∞
• 2 independent fitting parameters: µ, σ (central location and width)
• Properties:
  • Symmetrical about Mode at µ
  • Median = Mean = Mode
  • Inflection points at µ ± σ
• Area (probability of observing event) within:
  • ± 1σ = 0.683
  • ± 2σ = 0.955
• For larger σ, bell shaped curve becomes wider and lower (since area = 1 for any σ)
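Normal Distribution
• Mathematically
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
• Where
  • σ² = Variance
  • µ = Mean
• The Area Under the Curve
  $\int_{-\infty}^{+\infty} f(x)\,dx = 1$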
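The standard Gaussian density, f(x) = exp(−(x−µ)²/(2σ²))/(σ√(2π)), is easy to code as an anonymous function and check for unit area. This is a sketch only: the function name gaussPDF and the values of µ and σ are illustrative choices of ours, and trapz over ±8σ stands in for the infinite integral.

```matlab
% Hypothetical helper implementing the normal PDF
gaussPDF = @(x, mu, sigma) exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));

mu = 70; sigma = 2;                      % illustrative values only
x  = linspace(mu - 8*sigma, mu + 8*sigma, 2001);
f  = gaussPDF(x, mu, sigma);

area  = trapz(x, f);                     % numerical area, should be ~1
fPeak = gaussPDF(mu, mu, sigma);         % peak height = 1/(sigma*sqrt(2*pi))

plot(x, f), xlabel('x'), ylabel('f(x)'), title('Normal PDF')
fprintf('Area under curve = %.6f\n', area)
fprintf('Peak height      = %.4f\n', fPeak)
```

Doubling sigma halves the peak height while the area stays at 1, which is the "wider and lower" behavior of the bell curve.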
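68-95-99.7 Rule for Normal Dist
• 68% of the data lie within ±1σ of the mean
• 95% of the data lie within ±2σ of the mean
• 99.7% of the data lie within ±3σ of the mean
68-95-99.7 Rule in Math terms…
• Using Definite-Integral Calculus, with f(x) the normal PDF
  $\int_{\mu-\sigma}^{\mu+\sigma} f(x)\,dx \approx 0.68$
  $\int_{\mu-2\sigma}^{\mu+2\sigma} f(x)\,dx \approx 0.95$
  $\int_{\mu-3\sigma}^{\mu+3\sigma} f(x)\,dx \approx 0.997$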
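The three percentages follow from the error function: for a normal distribution the area within ±kσ of the mean is erf(k/√2), independent of µ and σ. A short sketch using MATLAB's built-in erf; the variable names are ours.

```matlab
k = 1:3;                          % number of standard deviations
pWithin = erf(k / sqrt(2));       % area of the normal PDF within +/- k*sigma

for i = 1:numel(k)
    fprintf('Within +/-%d sigma: %.2f %%\n', k(i), 100*pWithin(i))
end
```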
How Good is the Rule for Real?
• Check some example data:
  • The mean, µ, of the weight of a large group of women Cross Country Runners = 127.8 lbs
  • The standard deviation (σ) for this Group = 15.5 lbs
• Within ±1σ (112.3 to 143.3 lbs): 68% of 120 = .68 x 120 = ~82 runners
  • In fact, 79 runners fall within 1σ (15.5 lbs) of the mean (127.8)
• Within ±2σ (96.8 to 158.8 lbs): 95% of 120 = .95 x 120 = ~114 runners
  • In fact, 115 runners fall within 2σ of the mean (127.8)
• Within ±3σ (81.3 to 174.3 lbs): 99.7% of 120 = .997 x 120 = 119.6 runners
  • In fact, all 120 runners fall within 3σ of the mean (127.8)
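The limits and expected head-counts for the runners can be tabulated in a few lines. The sketch takes µ, σ, the group size of 120 and the observed counts straight from the slides; everything else (variable names, table layout) is ours.

```matlab
mu = 127.8; sigma = 15.5;         % lbs, from the slides
N  = 120;                         % number of runners
k  = 1:3;                         % +/- k standard deviations

lo = mu - k*sigma;                % lower weight limits (lbs)
hi = mu + k*sigma;                % upper weight limits (lbs)

expected = [0.68 0.95 0.997]*N;   % counts predicted by the 68-95-99.7 rule
observed = [79 115 120];          % counts reported on the slides

% Columns: k, low limit, high limit, expected count, observed count
disp([k; lo; hi; expected; observed]')
```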
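Estimating µ & σ (1)
• The Location & Width Parameters, µ & σ, are Calculated from the ENTIRE POPULATION of N values
  • Mean, µ: $\mu = \frac{1}{N}\sum_{k=1}^{N} x_k$
  • Variance, σ²: $\sigma^2 = \frac{1}{N}\sum_{k=1}^{N} (x_k - \mu)^2$
  • Standard Deviation, σ: $\sigma = \sqrt{\sigma^2}$
• For LARGE Populations it is usually impractical to measure all the $x_k$
  • In this case we take a Finite SAMPLE to ESTIMATE µ & σ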
Estimating µ & σ (2)
• Say we want to characterize Miles/Yr driven by Every Licensed Driver in the USA
• We assume that this is Normally Distributed, so we take a Sample of N = 1013 Drivers
• We Take the Mean of the SAMPLE: $\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$
• Use the SAMPLE-Mean to Estimate the POPULATION-Mean: $\mu \approx \bar{x}$
Estimating µ & σ (3)
• Now Calc the SAMPLE Variance & StdDev
  $S^2 = \frac{1}{N-1}\sum_{k=1}^{N} (x_k - \bar{x})^2$, $S = \sqrt{S^2}$
• Estimate $\sigma^2 \approx S^2$ and $\sigma \approx S$
• standard deviation: positive square root of the variance
  • small std dev: observations are clustered tightly around a central value
  • large std dev: observations are scattered widely about the mean
• Number decreased from N to (N – 1) To Account for case where N = 1
  • In this case x-bar = x1, and the S² result is meaningless
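The sample estimates map directly onto MATLAB's mean, var and std, which use the (N − 1) divisor by default. In the sketch below the five mileage values are made-up illustration data, not from the slides, and the variable names are ours.

```matlab
% HYPOTHETICAL sample of miles/yr (thousands); illustration data only
x = [12.1, 9.8, 15.3, 11.0, 13.4];
N = numel(x);

xbar = sum(x)/N;                      % sample mean, estimates mu
S2   = sum((x - xbar).^2)/(N - 1);    % sample variance, estimates sigma^2
S    = sqrt(S2);                      % sample std dev, estimates sigma

% Built-ins give the same numbers (default normalization is N-1)
xbarBuiltin = mean(x);
S2builtin   = var(x);
Sbuiltin    = std(x);

% The N = 1 case with the hand formula: (x1 - x1)^2/(1 - 1) = 0/0
x1 = 12.1;
S2single = sum((x1 - x1).^2)/(1 - 1); % NaN, i.e. no usable spread estimate

fprintf('xbar = %.3f, S2 = %.3f, S = %.3f\n', xbar, S2, S)
```

Passing 1 as the second argument, as in std(x,1) or var(x,1), switches to the N divisor used in the population formulas.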