Bruce Mayer, PE Licensed Electrical & Mechanical Engineer BMayer@ChabotCollege

Engr/Math/Physics 25
Chp7 Statistics-1
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
BMayer@ChabotCollege.edu

Learning Goals
• Use MATLAB to solve Problems in
  • Statistics
  • Probability
• Use Monte Carlo (random) Methods to Simulate Random processes
• Properly Apply Interpolation or Extrapolation to Estimate values between or outside of known data points

Histogram
• Histograms are COLUMN Plots that show the Distribution of Data
• Height Represents Data Frequency
• Some General Characteristics
  • Used to represent continuous grouped, or BINNED, data
  • BIN ≡ SubRange within the Data
  • Usually Does not have any gaps between bars
  • Areas represent %-of-Total Data

HistoGram ≡ Frequency Chart
• A HistoGram shows how OFTEN some event Occurs
• Histograms are often constructed using Frequency Tables

Histograms In MATLAB
• MATLAB has 6 Forms of the Histogram Cmd
• The Simplest: hist(y)
  • Generates a Histogram with 10 bins
• Example: Max Temp at Oakland AirPort in Jul-Aug08
>> TmaxOAK = [70, 75, 63, 64, 65, 66, 65, 65, 67, 78, 75, 73, 79, 71, 72, 67, 69, 69, 70, 74, 71, 72, 71, 74, 77, 77, 86, 90, 90, 70, 71, 66, 66, 72, 68, 73, 72, 82, 91, 82, 76, 75, 72, 72, 69, 70, 68, 65, 67, 65, 63, 64, 72, 70, 68, 71, 77, 65, 63, 69, 69, 67];
• The Plot Statement
>> hist(TmaxOAK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland Airport - Jul-Aug08')

hist Result for Oakland
• It was COLD in Summer 08
• Bin Width = (91-63)/10 = 2.8 °F
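The bin-width arithmetic above can be checked numerically. This is a minimal sketch, assuming the Oakland data from the slides and that hist(y) splits the data range into 10 equal-width bins; the variable names nBins, binWidth, counts, and centers are illustrative only.

```matlab
% Max temps (deg F) at Oakland Airport, Jul-Aug08 (data copied from the slides)
TmaxOAK = [70, 75, 63, 64, 65, 66, 65, 65, 67, 78, 75, 73, 79, 71, 72, 67, ...
           69, 69, 70, 74, 71, 72, 71, 74, 77, 77, 86, 90, 90, 70, 71, 66, ...
           66, 72, 68, 73, 72, 82, 91, 82, 76, 75, 72, 72, 69, 70, 68, 65, ...
           67, 65, 63, 64, 72, 70, 68, 71, 77, 65, 63, 69, 69, 67];

nBins    = 10;                                        % default bin count for hist(y)
binWidth = (max(TmaxOAK) - min(TmaxOAK))/nBins;       % (91-63)/10

% With output arguments hist returns numbers instead of drawing a plot
[counts, centers] = hist(TmaxOAK);                    % counts per bin, bin centers

fprintf('Bin width          = %.1f degF\n', binWidth);
fprintf('Center spacing     = %.1f degF\n', centers(2) - centers(1));
fprintf('Total days counted = %d\n', sum(counts));
```

The spacing between adjacent bin centers should match the hand-calculated bin width, and the counts should add up to the number of days in the data set.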

Histograms In MATLAB
• Next Example: Max Temp at Stockton AirPort in Jul-Aug08
>> TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];
• hist(y) Generates a Histogram with 10 bins
• The Plot Statement
>> hist(TmaxSTK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton Airport - Jul-Aug08')

hist Result for Stockton
• It was HOT in Summer 08
• Bin Width = (107-81)/10 = 2.6 °F

hist Command Refinements
• Adjust the number and width of the bins using
  hist(y,N)
  hist(y,x)
• Where
  • N ≡ an integer specifying the NUMBER of Bins
  • x ≡ A vector that Specs the CENTERs of the Bins
• Consider Summer 08 Max-Temp Data from Oakland and Stockton
• Make 2 Histograms
  • 17 bins
  • 60 °F → 110 °F by 2.5's

hist Plots → 17 Bins
>> hist(TmaxOAK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
>> hist(TmaxSTK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')

hist Plots → Same Scale
>> x = [60:2.5:110];
>> hist(TmaxOAK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
>> hist(TmaxSTK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')

hist Numerical Output
• hist can also provide numerical Data about the Histogram
  n = hist(y)
• Gives the number of values in each of the (default) 10 Bins
• For the Stockton data
>> k = hist(TmaxSTK)
k = 2 5 1 10 16 7 9 2 7 3
• We can also spec the number and/or Width of Bins
>> k13 = hist(TmaxSTK,13)
k13 = 2 2 4 4 6 10 10 7 5 2 6 2 2
>> k2_5s = hist(TmaxOAK,x)

hist Numerical Output
• Bin-Count and Bin-Locations (Frequency Table) for the Oakland Data
>> [u, v] = hist(TmaxOAK,x)
u = 0 3 11 7 15 9 6 4 1 2 1 0 3 0 0 0 0 0 0 0 0
v = 60.0000 62.5000 65.0000 67.5000 70.0000 72.5000 75.0000 77.5000 80.0000 82.5000 85.0000 87.5000 90.0000 92.5000 95.0000 97.5000 100.0000 102.5000 105.0000 107.5000 110.0000
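The two output vectors can be lined up as a frequency table. The following is a sketch only, assuming the Oakland data and the 60:2.5:110 bin centers used earlier; the loop that prints just the non-empty bins is an illustrative choice, not part of the original slides.

```matlab
% Max temps (deg F) at Oakland Airport, Jul-Aug08 (data copied from the slides)
TmaxOAK = [70, 75, 63, 64, 65, 66, 65, 65, 67, 78, 75, 73, 79, 71, 72, 67, ...
           69, 69, 70, 74, 71, 72, 71, 74, 77, 77, 86, 90, 90, 70, 71, 66, ...
           66, 72, 68, 73, 72, 82, 91, 82, 76, 75, 72, 72, 69, 70, 68, 65, ...
           67, 65, 63, 64, 72, 70, 68, 71, 77, 65, 63, 69, 69, 67];

x = 60:2.5:110;                    % bin CENTERS: 60 to 110 degF by 2.5's
[u, v] = hist(TmaxOAK, x);         % u = counts per bin, v = bin centers

% Print the non-empty bins as a two-column frequency table
fprintf('Center (degF)   Days\n');
for k = find(u > 0)
    fprintf('%12.1f   %4d\n', v(k), u(k));
end
fprintf('Total days: %d\n', sum(u));
```

Because every data point lands in exactly one bin, the counts sum to the 62 days in the record.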

Histogram Commands - 1

Histogram Commands - 2

Data Statistics Tool - 1
• Make Line-Plot of Temp Data for Stockton, CA
• Use the Tools Menu to find the Data Statistics Tool
• Time for LIVE Demo

Data Statistics Tool - 2
• Use the Tool to Add Plot Lines for The Mean ± StdDev

Data Statistics Tool - 3
• The Result
• Quite a Nice Tool, Actually
• The Avg Max Temp Was 96.97 °F

Probability
• Probability ≡ The LIKELIHOOD that a Specified OutCome Will be Realized
• The “Odds” Run from 0% to 100%
• Class Question: What are the Odds of winning the California MEGA-MILLIONS Lottery?
• Exactly! 175 711 536 : 1

175 711 536 ... EXACTLY???!!!
• To Win the MegaMillions Lottery
  • Pick five numbers from 1 to 56
  • Pick a MEGA number from 1 to 46
• The Odds for the 1st ping-pong Ball = 5 out of 56
• The Odds for the 2nd ping-pong Ball = 4 out of 55, and so On
• The Odds for the MEGA are 1 out of 46

175 711 536 ... Calculated
• Calc the OverAll Odds as the PRODUCT of each of the Individual OutComes
• This is Technically a COMBINATION
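The product of the individual odds can be evaluated directly and compared against a combination count. This is a hedged sketch: the per-ball odds are those listed on the previous slide, and the variable names are made up for illustration.

```matlab
% Odds that each successive ball matches one of the still-unmatched picks
ballOdds = [5/56, 4/55, 3/54, 2/53, 1/52];
megaOdds = 1/46;                       % one MEGA number out of 46

pWin        = prod(ballOdds)*megaOdds; % probability of matching everything
oddsProduct = 1/pWin;                  % "1 in ..." form

% Same result from the number of 5-ball COMBINATIONS out of 56
oddsCombo = nchoosek(56, 5)*46;

fprintf('Product of odds : 1 in %.0f\n', oddsProduct);
fprintf('nchoosek(56,5)*46: 1 in %d\n', oddsCombo);
```

Both routes give 175 711 536, which is why the product of odds is "technically a combination": nchoosek(56,5) = 56!/(5!·51!) = 3 819 816 possible five-ball sets, times 46 MEGA numbers.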

175 711 536 ... is a DEAL!
• The ORDER in Which the Ping-Pong Balls are Drawn Does NOT affect the Winning Odds
• If we Had to Match the Pull-Order:
• This is a PERMUTATION
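To see how much of a "deal" the unordered draw is, the ordered (permutation) count can be compared with the combination count. A minimal sketch, assuming the same 5-of-56 plus 1-of-46 game; nOrdered, oddsPerm, and ratio are illustrative names.

```matlab
% Ordered draws: 56 choices for the 1st ball, 55 for the 2nd, ... 52 for the 5th
nOrdered = prod(52:56);              % 56!/(56-5)!  (a PERMUTATION count)
oddsPerm = nOrdered*46;              % times the 46 MEGA numbers

% Unordered draws (the actual game)
oddsCombo = nchoosek(56, 5)*46;

ratio = oddsPerm/oddsCombo;          % how many times harder order-matching is

fprintf('Order matters  : 1 in %.0f\n', oddsPerm);
fprintf('Order ignored  : 1 in %.0f\n', oddsCombo);
fprintf('Ratio          : %.0f (= 5! orderings of 5 balls)\n', ratio);
```

The ratio is 5! = 120, the number of ways to order any given set of five balls.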

Normal Distribution - 1
• Consider Data on the Height of a sample group of 20 year old Men
• We can Plot this Frequency Data using bar
>> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> bar(xbins, y_abs), ylabel('No.'), xlabel('Height (Inches)'), title('Height of 20 Yr-Old Men')

Normal Distribution - 2
• We can also SCALE the Bar/Hist such that the AREA UNDER the CURVE equals 1.00, exactly
• The Game Plan for Scaling
  • Use the Height of Each Bar To Get the Total Area = [Bin Width] x [Σ(individual counts)]
  • The individual Bar Area = [Bin Width] x [individual count]
  • %-Area of any one bar → [Bar Area]/[Total Area]

Normal Distribution - 3
• We can Use bar to Plot the Scaled-Area Hist.
>> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> TotalArea = sum(0.5*y_abs)
>> y_scale = 100*y_abs/TotalArea;
>> bar(xbins, y_scale), ylabel('Fraction (%/inch)'), xlabel('Height (inches)'), title('Height of 20 Yr-Old Men')

Normal Distribution - 4
• This is a Good Time for a UNITS Check
• Remember, our GOAL → the Area Under the Curve = 1
• Recall From the Plot the UNITS for the y-axis → %/inch (?)
• The Units come from these MATLAB Statements
  TotalArea = sum(0.5*y_abs)
  • 0.5 is the Bin Width in INCHES
  • So TotalArea is in inches•No.
• Now y_scale
  y_scale = 100*y_abs/TotalArea;
• Cont. on Next Slide

Normal Distribution - 5
• The Units Analysis for y_scale
  y_scale = 100*y_abs/TotalArea;
• Recall From MTH1 that for y = f(x) displayed in BAR Form the Area Under the Curve

Normal Distribution - 6
• In this Case
  • y(x) → y_scale in %/inch
  • Δx → Bin Width = 0.5 in inches
• Then The Units Analysis for Our “integration”
• Check the integration Example
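The equations for this units analysis did not survive in the slide text. A reconstruction consistent with the MATLAB statements above (bar heights y_k in %/inch, uniform bin width Δx = 0.5 inch, counts summing to 100, TotalArea = 50 No.•inch) might read as follows; it is a sketch under those assumptions, not the original slide content.

```latex
% Area under a bar chart = sum of (bar height) x (bar width)
\[
A \;=\; \sum_{k} y_k \,\Delta x
\qquad\Rightarrow\qquad
\left[\frac{\%}{\mathrm{inch}}\right]\times\left[\mathrm{inch}\right] \;=\; [\%]
\]
% Substituting y_scale = 100*y_abs/TotalArea with TotalArea = 0.5 * sum(y_abs) = 50
\[
A \;=\; \sum_{k} \frac{100\, y_{\mathrm{abs},k}}{\mathrm{TotalArea}}\,\Delta x
  \;=\; \frac{100 \times \left(0.5 \times 100\right)}{50}
  \;=\; 100\%
\]
```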

Normal Distribution - 7
• Example → 71”
• The 71” Bar Area = Hgt•Width:
• Alternatively from the Absolute values
• The Total Abs Area = 50 No.•inch

Probability Distribution Fcn (PDF)
• Because the Area Under the Scaled Plot is 1.00, exactly, the FRACTIONAL Area under any bar, or set-of-bars, gives the probability that any randomly Selected 20 yr-old man will be that height
• e.g., from the Plot we Find
  • 67.5 in → 8 %/in
  • 68 in → 16 %/in
  • 68.5 in → 22 %/in
  • Summing → 46 %/in
• Multiply by the Uniform BinWidth of 0.5 in → 23% of 20 yr-old men are 67.25-68.75 inches tall
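The scaled-area numbers quoted on these slides can be reproduced from the height data. This is a sketch, assuming the y_abs and xbins vectors from the earlier slides; binWidth, areaAll, area71, inRange, and pRange are illustrative names.

```matlab
% Height counts for 20 yr-old men, bins centered at 64:0.5:75 inches (from the slides)
y_abs    = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
xbins    = 64:0.5:75;
binWidth = 0.5;                                  % inches

TotalArea = sum(binWidth*y_abs);                 % No.*inch
y_scale   = 100*y_abs/TotalArea;                 % %/inch

areaAll = sum(y_scale*binWidth);                 % whole plot, in %
area71  = y_scale(xbins == 71)*binWidth;         % the 71-inch bar alone, in %

% Bars centered at 67.5, 68, 68.5 in cover heights from 67.25 to 68.75 in
inRange = (xbins >= 67.5) & (xbins <= 68.5);
pRange  = sum(y_scale(inRange))*binWidth;        % probability, in %

fprintf('Total abs area   = %g No.*inch\n', TotalArea);
fprintf('Scaled area      = %g %%\n', areaAll);
fprintf('71 in bar        = %g %%\n', area71);
fprintf('67.25-68.75 in   = %g %%\n', pRange);
```

The exact comparison xbins == 71 is safe here because every bin center is a multiple of 0.5, which floating point represents exactly; for arbitrary bin centers a tolerance test would be the more careful choice.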

Random Variable
• A random variable x takes on a defined set of values with different probabilities; e.g.:
  • If you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with equal probability of one-sixth.
  • If you poll people about their voting preferences, the percentage of the sample that responds “Yes on Proposition 101” is also a random variable
    • the %-age will be slightly different every time you poll.
• Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (“frequentist” view)

Random variables can be Discrete or Continuous
• Discrete random variables have a countable number of outcomes
  • Examples: Dead/Alive, Red/Black, Heads/Tails, dice, counts, etc.
• Continuous random variables have an infinite continuum of possible values.
  • Examples: blood pressure, weight, Air Temperature, the speed of a car, the real numbers from 1 to 6.

Probability Distribution Functions
• A Probability Distribution Function (PDF) maps the possible values of x against their respective probabilities of occurrence, p(x)
• p(x) is a number from 0 to 1.0, or alternatively, from 0% to 100%.
• The area under a probability distribution function curve is always 1 (or 100%).

Discrete Example: Roll The Die
  x     p(x)
  1     p(x=1) = 1/6
  2     p(x=2) = 1/6
  3     p(x=3) = 1/6
  4     p(x=4) = 1/6
  5     p(x=5) = 1/6
  6     p(x=6) = 1/6
[Plot: p(x) vs. x; six bars of equal height 1/6 at x = 1 to 6]
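The frequentist reading of this table lends itself to a small Monte Carlo check: roll a simulated die many times and compare the observed fractions with 1/6. A minimal sketch; the number of rolls N is an arbitrary choice, and the results vary slightly from run to run because no random seed is set.

```matlab
N     = 60000;                          % number of simulated rolls (arbitrary)
rolls = randi(6, N, 1);                 % uniform random integers 1..6

% Count how many times each face came up, then convert to fractions
counts = accumarray(rolls, 1, [6 1]);
freq   = counts/N;

fprintf('Face   Observed   Expected\n');
for face = 1:6
    fprintf('%4d   %8.4f   %8.4f\n', face, freq(face), 1/6);
end
```

As N grows, the observed fractions settle toward 1/6, which is the "repeat the experiment over and over" idea in numerical form.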

Continuous Case
• The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.
• The Probabilities associated with continuous functions are just areas under a Region of the curve (→ Definite Integrals)
• Probabilities are given for a range of values, rather than a particular value
  • e.g., the probability of getting a math SAT score between 700 and 800 is 2%.

Continuous Case PDF Example
• Recall the negative exponential function (in probability, this is called an “exponential distribution”): p(x) = e^(-x)
• This Function Integrates to 1 from zero to infinity, as required for all PDF’s
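The integration claim can be written out in one line. This is a reconstruction using the p(x) = e^(-x) form that appears on the following slide, with x taken to run from 0 to infinity.

```latex
\[
p(x) = e^{-x}, \quad x \ge 0
\qquad\Rightarrow\qquad
\int_{0}^{\infty} e^{-x}\,dx
  \;=\; \Bigl[-e^{-x}\Bigr]_{0}^{\infty}
  \;=\; 0 - (-1)
  \;=\; 1
\]
```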

Continuous Case PDF Example
• The probability that x is any exact value (e.g.: 1.9976) is 0
  • NO Area Under a LINE
• We can ONLY assign Probabilities to possible RANGES of x
• For example, the probability of x falling within 1 to 2:
[Plots: p(x) = e^(-x) vs. x]
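The range probability can be evaluated as a definite integral. The sketch below assumes p(x) = e^(-x) as on the slide and uses MATLAB's integral function for the numerical result; P12exact is the closed form e^(-1) - e^(-2) for comparison.

```matlab
p = @(x) exp(-x);                    % exponential PDF from the slide

P12      = integral(p, 1, 2);        % numerical area between x = 1 and x = 2
P12exact = exp(-1) - exp(-2);        % closed-form value of the same integral

Ptotal   = integral(p, 0, Inf);      % total area, should be 1

Ppoint   = integral(p, 1.9976, 1.9976);  % zero-width "range": no area under a line

fprintf('P(1 <= x <= 2) = %.4f (numerical), %.4f (exact)\n', P12, P12exact);
fprintf('Total area     = %.4f\n', Ptotal);
fprintf('P(x = 1.9976)  = %.4f\n', Ppoint);
```

The probability of landing between 1 and 2 comes out to about 0.2325, or roughly 23%.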

Gaussian Curve
• The Man-Height HistoGram had some Limited, and thus DISCRETE, Data
• If we were to Measure 10,000 (or more) young men we would obtain a HistoGram like this
• As We increase the number and fineness of the measurements The PDF approaches a CONTINUOUS Curve

Gaussian Distribution
• A Distribution that Describes Many Physical Processes is called the GAUSSIAN or NORMAL Distribution
• Gaussian → famous “bell-shaped curve”
• Describes IQ scores, how fast horses can run, the no. of Bees in a hive, wear profile on old stone stairs...
• All these are cases where:
  • deviation from mean is equally probable in either direction
  • Variable is continuous (or large enough integer to look continuous)

Normal Distribution
• Real-valued PDF: f(x) → −∞ < x < +∞
• 2 independent fitting parameters: µ, σ (central location and width)
• Properties:
  • Symmetrical about Mode at µ
  • Median = Mean = Mode
  • Inflection points at µ ± σ
• Area (probability of observing event) within:
  • ± 1σ = 0.683
  • ± 2σ = 0.955
• For larger σ, bell shaped curve becomes wider and lower (since area = 1 for any σ)

Normal Distribution
• Mathematically
• Where
  • σ² = Variance
  • µ = Mean
• The Area Under the Curve
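The formulas for this slide were lost in extraction. The standard normal-distribution PDF and its unit-area property, written with the µ and σ defined above, are:

```latex
% Normal (Gaussian) PDF with mean \mu and variance \sigma^2
\[
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\,
           \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad -\infty < x < +\infty
\]
% The area under the curve is 1 for any \mu and \sigma
\[
\int_{-\infty}^{+\infty} f(x)\,dx \;=\; 1
\]
```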

68-95-99.7 Rule for Normal Dist
• 68% of the data lie within ±1σ of the mean
• 95% of the data lie within ±2σ of the mean
• 99.7% of the data lie within ±3σ of the mean

68-95-99.7 Rule in Math terms… • Using Definite-Integral Calculus
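The three definite integrals behind the rule can be evaluated numerically. This sketch assumes the standard result that the area of a normal PDF within ±kσ of the mean equals erf(k/√2), and cross-checks it by integrating the standard normal PDF (µ = 0, σ = 1) directly; the names f, Perf, and Pint are illustrative.

```matlab
k = 1:3;                                   % number of std deviations

% Closed form: area within +/- k*sigma of the mean
Perf = erf(k/sqrt(2));

% Cross-check by integrating the standard normal PDF from -k to +k
f    = @(x) exp(-x.^2/2)/sqrt(2*pi);
Pint = zeros(size(k));
for m = 1:numel(k)
    Pint(m) = integral(f, -k(m), k(m));
end

for m = 1:numel(k)
    fprintf('+/- %d sigma: %.4f (erf), %.4f (integral)\n', k(m), Perf(m), Pint(m));
end
```

The results, 0.6827, 0.9545, and 0.9973, round to the 68%, 95%, and 99.7% of the rule.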

How Good is the Rule for Real?
• Check some example data:
  • The mean, µ, of the weight of a large group of women Cross Country Runners = 127.8 lbs
  • The standard deviation (σ) for this Group = 15.5 lbs

Within ±1σ of the mean (112.3 to 143.3 lbs)
• 68% of 120 = .68 x 120 = ~82 runners
• In fact, 79 runners fall within 1σ (15.5 lbs) of the mean (127.8 lbs)

Within ±2σ of the mean (96.8 to 158.8 lbs)
• 95% of 120 = .95 x 120 = ~114 runners
• In fact, 115 runners fall within 2σ of the mean (127.8 lbs)

Within ±3σ of the mean (81.3 to 174.3 lbs)
• 99.7% of 120 = .997 x 120 = 119.6 runners
• In fact, all 120 runners fall within 3σ of the mean (127.8 lbs)
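The arithmetic on the last three slides can be collected into one short script. This is a sketch using only the numbers quoted in the slides (µ = 127.8 lbs, σ = 15.5 lbs, 120 runners, and the observed counts 79, 115, 120); the raw runner weights are not available here, so the observed counts are entered by hand.

```matlab
mu    = 127.8;                 % mean weight, lbs (from the slides)
sigma = 15.5;                  % std deviation, lbs (from the slides)
N     = 120;                   % number of runners (from the slides)

k        = 1:3;                % 1, 2, 3 std deviations
lo       = mu - k*sigma;       % lower limits, lbs
hi       = mu + k*sigma;       % upper limits, lbs
ruleFrac = [0.68 0.95 0.997];  % the 68-95-99.7 rule
expected = ruleFrac*N;         % runners predicted by the rule
observed = [79 115 120];       % runners actually counted (from the slides)

fprintf('k   Range (lbs)      Expected   Observed\n');
for m = 1:numel(k)
    fprintf('%d   %5.1f - %5.1f   %8.1f   %8d\n', ...
            k(m), lo(m), hi(m), expected(m), observed(m));
end
```

The rule's predictions differ from the observed counts by about three runners at most, which is reasonable agreement for a sample of 120.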

Estimating µ & σ (1)
• The Location & Width Parameters, µ & σ, are Calculated from the ENTIRE POPULATION
  • Mean, µ
  • Variance, σ²
  • Standard Deviation, σ
• For LARGE Populations it is usually impractical to measure all the x_k
• In this case we take a Finite SAMPLE to ESTIMATE µ & σ
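The population formulas themselves did not survive in the slide text. The standard definitions, for a population of N values x_k, are given below as a reconstruction.

```latex
% Population mean, variance, and standard deviation over all N members x_k
\[
\mu \;=\; \frac{1}{N}\sum_{k=1}^{N} x_k
\qquad
\sigma^{2} \;=\; \frac{1}{N}\sum_{k=1}^{N}\left(x_k-\mu\right)^{2}
\qquad
\sigma \;=\; \sqrt{\sigma^{2}}
\]
```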

Estimating µ & σ (2)
• Say we want to characterize Miles/Yr driven by Every Licensed Driver in the USA
• We assume that this is Normally Distributed, so we take a Sample of N = 1013 Drivers
• We Take the Mean of the SAMPLE
• Use the SAMPLE-Mean to Estimate the POPULATION-Mean

Estimating µ & σ (3)
• Now Calc the SAMPLE Variance & StdDev
• Estimate
• standard deviation: positive square root of the variance
  • small std dev: observations are clustered tightly around a central value
  • large std dev: observations are scattered widely about the mean
• Number decreased from N to (N - 1) To Account for case where N = 1
  • In this case x-bar = x1, and the S² result is meaningless
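The sample estimates can be tried out on simulated data. In this sketch the "true" population values muTrue and sigmaTrue are hypothetical numbers invented purely for illustration (the slides give no miles/yr figures); only the sample size N = 1013 comes from the slides. The sample mean and the (N - 1) variance are computed by hand and compared with MATLAB's built-in mean, var, and std.

```matlab
N         = 1013;               % sample size (from the slides)
muTrue    = 12000;              % HYPOTHETICAL population mean, miles/yr
sigmaTrue = 3000;               % HYPOTHETICAL population std dev, miles/yr

% Simulated sample drawn from a normal distribution (Monte Carlo)
x = muTrue + sigmaTrue*randn(N, 1);

xbar = sum(x)/N;                        % sample mean, estimates mu
S2   = sum((x - xbar).^2)/(N - 1);      % sample variance, note the (N - 1)
S    = sqrt(S2);                        % sample std dev, estimates sigma

fprintf('xbar = %8.1f   (mean(x) = %8.1f)\n', xbar, mean(x));
fprintf('S2   = %8.1f   (var(x)  = %8.1f)\n', S2, var(x));
fprintf('S    = %8.1f   (std(x)  = %8.1f)\n', S, std(x));
```

The built-in var and std use the same (N - 1) divisor by default, so the hand-calculated values should match them; how close xbar and S land to the hypothetical muTrue and sigmaTrue changes from run to run.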
