
V7 Foundations of Probability Theory


sonel

Presentation Transcript


  1. V7 Foundations of Probability Theory
     "Probability": the degree of confidence that an event of an uncertain nature will occur.
     "Events": we will assume that there is an agreed-upon space $\Omega$ of possible outcomes ("events"). E.g. a normal die (German: Würfel) has the space $\Omega = \{1, 2, 3, 4, 5, 6\}$.
     We also assume that there is a set of measurable events $S$ to which we are willing to assign probabilities. In the die example, the event $\{6\}$ is the case where the die shows 6. The event $\{1, 3, 5\}$ represents the case of an odd outcome.
     Mathematics of Biological Networks

  2. Foundations of Probability Theory
     Probability theory requires that the event space $S$ satisfy 3 basic properties:
     • It contains the empty event $\emptyset$ and the trivial event $\Omega$.
     • It is closed under union → if $\alpha, \beta \in S$, then so is $\alpha \cup \beta \in S$.
     • It is closed under complementation → if $\alpha \in S$, then so is $\Omega - \alpha \in S$.
     The requirement that the event space is closed under union and complementation implies that it is also closed under other Boolean operations, such as intersection and set difference.
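The three properties can be checked mechanically for a finite outcome space. The following is a minimal sketch, assuming the die example from slide 1; the names `power_set`, `is_event_space`, `coarse` and `broken` are illustrative and do not come from the slides.

```python
from itertools import combinations

# Outcome space of a normal die (the example from slide 1).
omega = frozenset({1, 2, 3, 4, 5, 6})


def power_set(outcomes):
    """Return all subsets of `outcomes` as a set of frozensets."""
    items = sorted(outcomes)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


def is_event_space(events, omega):
    """Check the three properties: contains the empty and trivial event,
    closed under union, closed under complementation."""
    if frozenset() not in events or omega not in events:
        return False
    closed_union = all(a | b in events for a in events for b in events)
    closed_complement = all(omega - a in events for a in events)
    return closed_union and closed_complement


# The full power set is always a valid event space.
S = power_set(omega)
# A coarser event space that only distinguishes odd from even outcomes.
coarse = {frozenset(), frozenset({1, 3, 5}), frozenset({2, 4, 6}), omega}
# Not an event space: the complement of {6} is missing.
broken = {frozenset(), frozenset({6}), omega}

print(is_event_space(S, omega))       # True
print(is_event_space(coarse, omega))  # True
print(is_event_space(broken, omega))  # False
```

The `coarse` example illustrates that $S$ does not have to contain every subset of $\Omega$; it only has to satisfy the closure properties.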

  3. Probability distributions
     A probability distribution $P$ over $(\Omega, S)$ is a mapping from events in $S$ to real values that satisfies the following conditions:
     (1) $P(\alpha) \geq 0$ for all $\alpha \in S$ → Probabilities are not negative.
     (2) $P(\Omega) = 1$ → The trivial event, which allows all possible outcomes, has the maximal possible probability of 1.
     (3) If $\alpha, \beta \in S$ and $\alpha \cap \beta = \emptyset$, then $P(\alpha \cup \beta) = P(\alpha) + P(\beta)$.
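As an illustration, the three conditions can be verified for a fair die. This is a sketch under the assumption that every outcome has probability 1/6; the function name `prob` is illustrative.

```python
from fractions import Fraction

# Assumption: a fair die, so each outcome has probability 1/6.
omega = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of omega) for a fair die."""
    return Fraction(len(event & omega), len(omega))


odd = frozenset({1, 3, 5})
six = frozenset({6})

assert prob(odd) >= 0                            # condition (1)
assert prob(omega) == 1                          # condition (2)
assert odd & six == frozenset()                  # the two events are disjoint
assert prob(odd | six) == prob(odd) + prob(six)  # condition (3)

print(prob(odd | six))  # 2/3
```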

  4. Interpretation of probabilities
     The frequentist's interpretation: the probability of an event is the fraction of times the event occurs if we repeat the experiment indefinitely.
     E.g. throwing of dice, coin flips, card games, … where frequencies will satisfy the requirements of proper distributions.
     For an event such as "It will rain tomorrow afternoon", the frequentist approach does not provide a satisfactory interpretation.

  5. Interpretation of probabilities
     An alternative interpretation views probabilities as subjective degrees of belief.
     E.g. the statement "the probability of rain tomorrow afternoon is 50 percent" tells us that in the opinion of the speaker, the chances of rain and no rain tomorrow afternoon are the same.
     When we discuss probabilities in the following, we usually do not explicitly state their interpretation, since both interpretations lead to the same mathematical rules.

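  6. Conditional probability
     The conditional probability of $\beta$ given $\alpha$ is defined as
     $$P(\beta \mid \alpha) = \frac{P(\alpha \cap \beta)}{P(\alpha)}$$
     The probability that $\beta$ is true given that we know $\alpha$ is the relative proportion of outcomes satisfying $\beta$ among those that satisfy $\alpha$.
     From this we immediately see that
     $$P(\alpha \cap \beta) = P(\alpha)\,P(\beta \mid \alpha)$$
     This equality is known as the chain rule of conditional probabilities. More generally, if $\alpha_1, \alpha_2, \ldots, \alpha_k$ are events, we can write
     $$P(\alpha_1 \cap \ldots \cap \alpha_k) = P(\alpha_1)\,P(\alpha_2 \mid \alpha_1) \cdots P(\alpha_k \mid \alpha_1 \cap \ldots \cap \alpha_{k-1})$$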
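The chain rule for three events can be checked numerically. The sketch below assumes a fair die and three illustrative events (`a1`, `a2`, `a3`) that are not taken from the slides.

```python
from fractions import Fraction

# Assumption: a fair die as the underlying probability space.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """P(event) for a fair die."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_prob(beta, alpha):
    """P(beta | alpha) = P(alpha ∩ beta) / P(alpha); requires P(alpha) > 0."""
    if prob(alpha) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(alpha & beta) / prob(alpha)


a1 = frozenset({2, 4, 6})  # even outcome
a2 = frozenset({4, 5, 6})  # outcome of at least 4
a3 = frozenset({6})        # outcome is 6

# Left side: probability of the intersection.
lhs = prob(a1 & a2 & a3)
# Right side: chain rule P(a1) * P(a2 | a1) * P(a3 | a1 ∩ a2).
rhs = prob(a1) * cond_prob(a2, a1) * cond_prob(a3, a1 & a2)

print(lhs, rhs)  # 1/6 1/6
```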

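  7. Bayes' rule
     Another immediate consequence of the definition of conditional probability is Bayes' rule.
     Due to symmetry, we can swap the 2 variables $\alpha$ and $\beta$ in the definition and get the equivalent expression
     $$P(\alpha \mid \beta) = \frac{P(\alpha \cap \beta)}{P(\beta)}$$
     If we rearrange, we get Bayes' rule
     $$P(\alpha \mid \beta)\,P(\beta) = P(\beta \mid \alpha)\,P(\alpha)$$
     or
     $$P(\alpha \mid \beta) = \frac{P(\beta \mid \alpha)\,P(\alpha)}{P(\beta)}$$
     A more general conditional version of Bayes' rule, where all probabilities are conditioned on some background event $\gamma$, also holds:
     $$P(\alpha \mid \beta \cap \gamma) = \frac{P(\beta \mid \alpha \cap \gamma)\,P(\alpha \mid \gamma)}{P(\beta \mid \gamma)}$$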

  8. Example 1 for Bayes' rule
     Consider a student population. Let Smart denote smart students and GradeA denote students who got grade A.
     Assume we believe that P(GradeA | Smart) = 0.6, and now we learn that a particular student received grade A.
     Suppose that P(Smart) = 0.3 and P(GradeA) = 0.2. Then we have
     P(Smart | GradeA) = 0.6 × 0.3 / 0.2 = 0.9
     In this model, an A grade strongly suggests that the student is smart.
     On the other hand, if the test was easier and high grades were more common, e.g. P(GradeA) = 0.4, then we would get
     P(Smart | GradeA) = 0.6 × 0.3 / 0.4 = 0.45
     which is much less conclusive.
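The two calculations in this example can be reproduced with a small helper. This is a minimal sketch; the function name `bayes` and its parameter names are illustrative, and the numbers are the ones stated on the slide.

```python
def bayes(likelihood, prior, evidence):
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B).

    likelihood = P(B | A), prior = P(A), evidence = P(B).
    """
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence


# P(Smart | GradeA) for the harder test, P(GradeA) = 0.2
hard_test = bayes(likelihood=0.6, prior=0.3, evidence=0.2)
# P(Smart | GradeA) for the easier test, P(GradeA) = 0.4
easy_test = bayes(likelihood=0.6, prior=0.3, evidence=0.4)

print(round(hard_test, 2), round(easy_test, 2))  # 0.9 0.45
```

Doubling the probability of the evidence halves the posterior, which is why the same grade is much less informative when high grades are common.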

  9. Example 2 for Bayes' rule
     Suppose that a tuberculosis skin test is 95% accurate. That is, if the patient is TB-infected, then the test will be positive with probability 0.95, and if the patient is not infected, the test will be negative with probability 0.95.
     Now suppose that a person gets a positive test result. What is the probability that the person is infected?
     Naive reasoning suggests that if the test result is wrong 5% of the time, then the probability that the subject is infected is 0.95. That would mean that 95% of subjects with positive results have TB.

  10. Example 2 for Bayes' rule
      If we consider the problem by applying Bayes' rule, we need to consider the prior probability of TB infection and the probability of getting a positive test result.
      Suppose that 1 in 1000 of the subjects who get tested is infected → P(TB) = 0.001
      We see that a fraction 0.001 × 0.95 of all subjects is infected and gets a positive result, and a fraction 0.999 × 0.05 of all subjects is uninfected and gets a positive result. Thus
      P(Positive) = 0.001 × 0.95 + 0.999 × 0.05 = 0.0509
      Applying Bayes' rule, we get
      P(TB | Positive) = P(TB) × P(Positive | TB) / P(Positive) = 0.001 × 0.95 / 0.0509 ≈ 0.0187
      Thus, although a subject with a positive test is much more likely to be TB-infected than a random subject, fewer than 2% of these subjects are TB-infected.
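The TB calculation combines the law of total probability with Bayes' rule. The sketch below reproduces the numbers from the slide; the function name `posterior_positive` and the terms `sensitivity` and `specificity` (for the two 0.95 accuracies) are labels introduced here for illustration.

```python
def posterior_positive(prior, sensitivity, specificity):
    """Return (P(infected | positive), P(positive)).

    prior       = P(infected)
    sensitivity = P(positive | infected)
    specificity = P(negative | not infected)
    """
    # Fraction of all subjects that is infected and tests positive.
    p_pos_infected = prior * sensitivity
    # Fraction of all subjects that is uninfected and tests positive.
    p_pos_healthy = (1.0 - prior) * (1.0 - specificity)
    # Law of total probability.
    p_positive = p_pos_infected + p_pos_healthy
    return p_pos_infected / p_positive, p_positive


post, p_positive = posterior_positive(prior=0.001, sensitivity=0.95, specificity=0.95)
print(round(p_positive, 4), round(post, 4))  # 0.0509 0.0187
```

Calling the function with a larger prior, for example for a high-risk group, shows how strongly the result depends on the base rate.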

  11. Random Variables
      A random variable is defined by a function that associates a value with each outcome in $\Omega$.
      For students in a class, this could be a function $f_{Grade}$ that maps each student in the class (in $\Omega$) to his or her grade (1, …, 5).
      The event Grade = A is a shorthand for the event $\{\omega \in \Omega : f_{Grade}(\omega) = A\}$.
      There exist categorical (or discrete) random variables that take on one of a few values, e.g. intelligence could be "high" or "low".
      There also exist integer-valued or real-valued random variables that can take on infinitely many values, e.g. the height of students.
      By Val(X) we denote the set of values that a random variable X can take.

  12. Random Variables
      In the following, we will either consider categorical random variables or random variables that take real values.
      We will use capital letters X, Y, Z to denote random variables. Lowercase letters will refer to the values of random variables.
      E.g. when we discuss categorical random variables, we will denote the i-th value as $x^i$.
      Bold capital letters are used for sets of random variables: $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$.

  13. Marginal Distributions
      Once we define a random variable X, we can consider the marginal distribution P(X) over events that can be described using X.
      E.g. let us take the two random variables Intelligence and Grade and their marginal distributions P(Intelligence) and P(Grade).
      Let us suppose that [the values shown on the slide are not preserved in this transcript].
      These marginal distributions are probability distributions satisfying the 3 properties.

  14. Joint Distributions
      Often we are interested in questions that involve the values of several random variables. E.g. we might be interested in the event "Intelligence = high and Grade = A".
      In that case we need to consider the joint distribution over these two random variables.
      The joint distribution of 2 random variables has to be consistent with the marginal distribution in that
      $$P(x) = \sum_{y} P(x, y)$$
      [Figure 2.1]

  15. Conditional Probability
      The notion of conditional probability extends to induced distributions over random variables.
      $P(Intelligence \mid Grade = A)$ denotes the conditional distribution over the events describable by Intelligence given the knowledge that the student's grade is A.
      Note that the conditional probability $P(Intelligence \mid Grade = A)$ is quite different from the marginal distribution $P(Intelligence)$.
      We will use the notation $P(X \mid Y)$ to represent a set of conditional probability distributions.
      Bayes' rule in terms of conditional probability distributions reads
      $$P(X \mid Y) = \frac{P(X)\,P(Y \mid X)}{P(Y)}$$
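Marginalization and conditioning can both be read off a joint table. The sketch below uses a HYPOTHETICAL joint distribution over Intelligence and Grade: the numbers are invented for illustration and are not taken from the transcript, and the helper names `marginal` and `conditional_on_grade` are likewise assumptions.

```python
from fractions import Fraction as F

# HYPOTHETICAL joint distribution P(Intelligence, Grade).
# These values are illustrative only; the slide's own table is not preserved.
joint = {
    ("high", "A"): F(18, 100), ("high", "B"): F(9, 100), ("high", "C"): F(3, 100),
    ("low", "A"): F(7, 100), ("low", "B"): F(28, 100), ("low", "C"): F(35, 100),
}


def marginal(joint, axis):
    """Sum the joint over the other variable: P(x) = sum_y P(x, y)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out


def conditional_on_grade(joint, grade):
    """P(Intelligence | Grade = grade) = P(Intelligence, grade) / P(grade)."""
    p_grade = marginal(joint, 1)[grade]
    return {i: p / p_grade for (i, g), p in joint.items() if g == grade}


p_intelligence = marginal(joint, 0)
p_grade = marginal(joint, 1)
p_int_given_a = conditional_on_grade(joint, "A")

print(p_intelligence["high"])  # 3/10
print(p_int_given_a["high"])   # 18/25
```

With these assumed numbers, learning that the grade is A raises the probability of high intelligence from 3/10 to 18/25, which illustrates how a conditional distribution can differ from the marginal.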

  16. Independence
      We usually expect $P(\alpha \mid \beta)$ to be different from $P(\alpha)$. Learning that $\beta$ is true typically changes our probability over $\alpha$.
      However, in some situations $P(\alpha \mid \beta) = P(\alpha)$.
      Definition: We say that an event $\alpha$ is independent of event $\beta$ in $P$, denoted as $P \models (\alpha \perp \beta)$, if $P(\alpha \mid \beta) = P(\alpha)$ or if $P(\beta) = 0$.
      We will now provide an alternative definition for this concept of independence.

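  17. Independence
      Proposition: A distribution $P$ satisfies $(\alpha \perp \beta)$ if and only if $P(\alpha \cap \beta) = P(\alpha)\,P(\beta)$.
      Proof:
      • "⇒": If $P(\beta) = 0$, then $P(\alpha \cap \beta) = 0$ as well. Also $P(\alpha)\,P(\beta) = 0$, so the equation is fulfilled. Now let $P(\beta) \neq 0$. From the chain rule we get $P(\alpha \cap \beta) = P(\alpha \mid \beta)\,P(\beta)$. Since $\alpha$ is independent of $\beta$, $P(\alpha \mid \beta) = P(\alpha)$. Thus we get $P(\alpha \cap \beta) = P(\alpha)\,P(\beta)$.
      • "⇐": Suppose that $P(\alpha \cap \beta) = P(\alpha)\,P(\beta)$. If $P(\beta) = 0$, independence holds by definition. Otherwise, by definition, we have $P(\alpha \mid \beta) = \frac{P(\alpha \cap \beta)}{P(\beta)} = \frac{P(\alpha)\,P(\beta)}{P(\beta)} = P(\alpha)$, which is what needs to be shown.
      Note that $(\alpha \perp \beta)$ implies $(\beta \perp \alpha)$.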
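The product criterion from the proposition gives a simple test for independence. The sketch below assumes a fair die and uses illustrative events that do not appear in the slides.

```python
from fractions import Fraction

# Assumption: a fair die as the underlying probability space.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """P(event) for a fair die."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def is_independent(alpha, beta):
    """Product criterion: P(alpha ∩ beta) == P(alpha) * P(beta)."""
    return prob(alpha & beta) == prob(alpha) * prob(beta)


even = frozenset({2, 4, 6})
at_most_four = frozenset({1, 2, 3, 4})
at_least_four = frozenset({4, 5, 6})

print(is_independent(even, at_most_four))   # True:  1/3 == 1/2 * 2/3
print(is_independent(even, at_least_four))  # False: 1/3 != 1/2 * 1/2
```

Because the criterion is symmetric in its two arguments, the symmetry noted at the end of the proof is visible directly in the code.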

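  18. Independence of Random Variables
      Definition: Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$ be sets of random variables. We say that $\mathbf{X}$ is conditionally independent of $\mathbf{Y}$ given $\mathbf{Z}$ in a distribution $P$ if $P$ satisfies $(\mathbf{X} = \mathbf{x} \perp \mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z})$ for all values $\mathbf{x} \in Val(\mathbf{X})$, $\mathbf{y} \in Val(\mathbf{Y})$, $\mathbf{z} \in Val(\mathbf{Z})$.
      As before, we can give an alternative characterization of conditional independence.
      Proposition: The distribution $P$ satisfies $(\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z})$ if and only if $P(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z}) = P(\mathbf{X} \mid \mathbf{Z})\,P(\mathbf{Y} \mid \mathbf{Z})$.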

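  19. Independence properties of distributions
      Symmetry: $(\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}) \Rightarrow (\mathbf{Y} \perp \mathbf{X} \mid \mathbf{Z})$
      Decomposition: $(\mathbf{X} \perp \mathbf{Y}, \mathbf{W} \mid \mathbf{Z}) \Rightarrow (\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z})$
      Weak union: $(\mathbf{X} \perp \mathbf{Y}, \mathbf{W} \mid \mathbf{Z}) \Rightarrow (\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}, \mathbf{W})$
      Contraction: $(\mathbf{X} \perp \mathbf{W} \mid \mathbf{Z}, \mathbf{Y})$ and $(\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}) \Rightarrow (\mathbf{X} \perp \mathbf{Y}, \mathbf{W} \mid \mathbf{Z})$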

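  20. Probability Density Functions
      A function $p: \mathbb{R} \to \mathbb{R}$ is a probability density function (PDF) for X if it is a nonnegative integrable function so that
      $$\int_{Val(X)} p(x)\,dx = 1$$
      The function $P(X \leq a) = \int_{-\infty}^{a} p(x)\,dx$ is the cumulative distribution for X.
      By using the density function we can evaluate the probability of other events. E.g.
      $$P(a \leq X \leq b) = \int_{a}^{b} p(x)\,dx$$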

  21. Uniform distribution
      The simplest PDF is the uniform distribution.
      Definition: A variable X has a uniform distribution over [a,b], denoted $X \sim Unif[a,b]$, if it has the PDF
      $$p(x) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$
      Thus the probability of any subinterval of [a,b] is proportional to its size relative to the size of [a,b].
      If b – a < 1, the density can be greater than 1. We only have to satisfy the constraint that the total area under the PDF is 1.
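A short sketch makes the point about densities greater than 1 concrete. The interval [0, 0.5] is an assumed example, and the function names are illustrative.

```python
def uniform_pdf(x, a, b):
    """Density of Unif[a, b] at x: 1 / (b - a) inside the interval, else 0."""
    if not a < b:
        raise ValueError("need a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_interval_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ Unif[a, b]: overlap length over total length."""
    if not a < b:
        raise ValueError("need a < b")
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


# Assumed example: b - a = 0.5 < 1, so the density exceeds 1.
print(uniform_pdf(0.25, 0.0, 0.5))                # 2.0
# The total probability over [a, b] is still 1.
print(uniform_interval_prob(0.0, 0.5, 0.0, 0.5))  # 1.0
```

The density value 2.0 is not a probability; only areas under the density, such as the value returned by `uniform_interval_prob`, are probabilities.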

  22. Gaussian distribution
      A random variable X has a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, denoted $X \sim N(\mu; \sigma^2)$, if it has the PDF
      $$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
      A standard Gaussian has mean 0 and variance 1.
      [Fig. 2.2]

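  23. Joint density functions
      Let P be a joint distribution over continuous random variables $X_1, \ldots, X_n$. A function $p(x_1, \ldots, x_n)$ is a joint density function of $X_1, \ldots, X_n$ if
      - $p(x_1, \ldots, x_n) \geq 0$ for all values $x_1, \ldots, x_n$ of $X_1, \ldots, X_n$
      - $p$ is an integrable function
      - for any choice of $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$:
        $$P(a_1 \leq X_1 \leq b_1, \ldots, a_n \leq X_n \leq b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} p(x_1, \ldots, x_n)\,dx_1 \ldots dx_n$$
      From the joint density we can derive the marginal density of any random variable by integrating out the other variables. E.g. if $p(x,y)$ is the joint density of X and Y, then
      $$p(x) = \int_{-\infty}^{\infty} p(x, y)\,dy$$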

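  24. Conditional density functions
      We now want to be able to describe conditional distributions of continuous variables. Applying the previous definition is problematic because the probability of an isolated point $P(X = x)$ is zero. Thus we define
      $$P(Y \mid X = x) = \lim_{\epsilon \to 0} P(Y \mid x - \epsilon \leq X \leq x + \epsilon)$$
      If there exists a continuous joint density function $p(x,y)$, then we can derive the form of this term. Let us consider some event on Y, say $a \leq Y \leq b$:
      $$P(a \leq Y \leq b \mid x - \epsilon \leq X \leq x + \epsilon) = \frac{P(a \leq Y \leq b,\; x - \epsilon \leq X \leq x + \epsilon)}{P(x - \epsilon \leq X \leq x + \epsilon)} = \frac{\int_a^b \int_{x-\epsilon}^{x+\epsilon} p(x', y)\,dx'\,dy}{\int_{x-\epsilon}^{x+\epsilon} p(x')\,dx'}$$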

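  25. Conditional density functions
      When $\epsilon$ is sufficiently small, we can assume that $p(x)$ = const in this interval and approximate
      $$\int_{x-\epsilon}^{x+\epsilon} p(x')\,dx' \approx 2\epsilon\,p(x)$$
      Using a similar approximation for $p(x', y)$, we get
      $$P(a \leq Y \leq b \mid x - \epsilon \leq X \leq x + \epsilon) \approx \frac{\int_a^b 2\epsilon\,p(x, y)\,dy}{2\epsilon\,p(x)} = \int_a^b \frac{p(x, y)}{p(x)}\,dy$$
      We conclude that $\frac{p(x,y)}{p(x)}$ is the density of $P(Y \mid X = x)$.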

  26. Conditional density functions
      Let $p(x,y)$ be the joint density of X and Y. The conditional density function of Y given X is defined as
      $$p(y \mid x) = \frac{p(x, y)}{p(x)}$$
      When $p(x) = 0$, the conditional density is undefined.
      The properties of joint distributions and conditional distributions carry over to joint and conditional density functions. In particular, we have the chain rule
      $$p(x, y) = p(x)\,p(y \mid x)$$
      and Bayes' rule
      $$p(x \mid y) = \frac{p(x)\,p(y \mid x)}{p(y)}$$

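  27. Conditional density functions
      Definition: Let $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ be sets of continuous random variables with joint density $p(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$. We say that $\mathbf{X}$ is conditionally independent of $\mathbf{Y}$ given $\mathbf{Z}$ if, for all $\mathbf{x}, \mathbf{y}, \mathbf{z}$ such that $p(\mathbf{z}) > 0$,
      $$p(\mathbf{x} \mid \mathbf{z}) = p(\mathbf{x} \mid \mathbf{y}, \mathbf{z})$$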

  28. Expectation
      Let X be a discrete random variable that takes numerical values. Then the expectation of X under the distribution P is
      $$E_P[X] = \sum_{x} x \cdot P(x)$$
      If X is a continuous variable, then we use the density function
      $$E_P[X] = \int x \cdot p(x)\,dx$$
      E.g. if we consider X to be the outcome of rolling a fair die with probability 1/6 for each outcome, then
      E[X] = 1 × 1/6 + 2 × 1/6 + … + 6 × 1/6 = 3.5
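The die example can be computed directly from the definition of the discrete expectation. This is a minimal sketch; representing the distribution as a dictionary from values to probabilities is an assumption made here for illustration.

```python
from fractions import Fraction


def expectation(dist):
    """E[X] = sum over x of x * P(x) for a discrete distribution {x: P(x)}."""
    return sum(x * p for x, p in dist.items())


# Fair die: each outcome has probability 1/6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

print(expectation(fair_die))         # 7/2
print(float(expectation(fair_die)))  # 3.5
```

Note that 3.5 is not a possible outcome of the die; the expectation is a weighted average, not a most likely value.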

  29. Properties of the expectation of a random variable
      E[a × X + b] = a E[X] + b
      Let X and Y be two random variables. Then
      E[X + Y] = E[X] + E[Y]
      Here, it does not matter whether X and Y are independent or not.
      What can we say about the expectation value of a product of two random variables? In the general case, very little.
      Consider 2 variables X and Y that each take on the values +1 and -1 with probability 0.5.
      If X and Y are independent, then E[X × Y] = 0.
      If they always take the same value (they are correlated), then E[X × Y] = 1.
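The two cases on this slide can be compared side by side. The sketch below encodes each case as a joint distribution over (X, Y); the dictionary representation and the helper name `expect` are assumptions for illustration.

```python
from itertools import product

values = (+1, -1)

# Independent case: the joint is the product of the marginals (0.5 * 0.5 each).
independent = {(x, y): 0.25 for x, y in product(values, values)}
# Fully correlated case: X and Y always take the same value.
identical = {(+1, +1): 0.5, (-1, -1): 0.5}


def expect(joint, f):
    """E[f(X, Y)] under a discrete joint distribution {(x, y): P(x, y)}."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


results = {}
for name, joint in (("independent", independent), ("identical", identical)):
    e_sum = expect(joint, lambda x, y: x + y)    # E[X + Y]
    e_prod = expect(joint, lambda x, y: x * y)   # E[X * Y]
    results[name] = (e_sum, e_prod)
    print(name, e_sum, e_prod)
```

In both cases the marginals are the same and E[X + Y] = E[X] + E[Y] = 0, while E[X × Y] differs, so the expectation of a product is not determined by the marginals alone.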

  30. Properties of the expectation of a random variable
      If X and Y are independent, then
      E[X × Y] = E[X] × E[Y]
      The conditional expectation of X given y is
      $$E_P[X \mid y] = \sum_{x} x \cdot P(x \mid y)$$

  31. Variance
      The expectation of X tells us the mean value of X. However, it does not indicate how far X deviates from this value. A measure of this deviation is the variance of X:
      $$Var_P[X] = E_P\left[(X - E_P[X])^2\right]$$
      The variance is the expectation of the squared difference between X and its expected value. An alternative formulation of the variance is
      $$Var[X] = E[X^2] - (E[X])^2$$
      If X and Y are independent, then
      $$Var[X + Y] = Var[X] + Var[Y]$$
      The variance is measured in squared units of X. For this reason, we are often interested in the square root of the variance, which is called the standard deviation of the random variable. We define
      $$\sigma_X = \sqrt{Var[X]}$$
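Both formulations of the variance can be checked against each other on a small example. The sketch below assumes a fair die; the function names are illustrative.

```python
from fractions import Fraction
import math

# Assumption: a fair die, each outcome with probability 1/6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(dist, f=lambda x: x):
    """E[f(X)] for a discrete distribution {x: P(x)}."""
    return sum(f(x) * p for x, p in dist.items())


def variance_definition(dist):
    """Var[X] = E[(X - E[X])^2]."""
    mean = expectation(dist)
    return expectation(dist, lambda x: (x - mean) ** 2)


def variance_shortcut(dist):
    """Var[X] = E[X^2] - (E[X])^2."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2


var = variance_definition(fair_die)
assert var == variance_shortcut(fair_die)  # both formulations agree

std = math.sqrt(var)  # standard deviation, in the same units as X
print(var)  # 35/12
print(std)
```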

  32. Variance
      Let X be a random variable with Gaussian distribution $N(\mu; \sigma^2)$. Then $E[X] = \mu$ and $Var[X] = \sigma^2$.
      Thus, the parameters of the Gaussian distribution specify the expectation and the variance of the distribution.
      The form of the Gaussian distribution implies that the density of values of X drops exponentially fast in the distance $|x - \mu| / \sigma$.
      Not all distributions show such a rapid decline in the probability of outcomes that are distant from the expectation. However, even for arbitrary distributions, one can show that there is a decline. Chebyshev's inequality states
      $$P(|X - E_P[X]| \geq t) \leq \frac{Var_P[X]}{t^2}$$
      or, in terms of $\sigma_X$,
      $$P(|X - E_P[X]| \geq k\,\sigma_X) \leq \frac{1}{k^2}$$
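Chebyshev's inequality can be compared with exact tail probabilities for a simple distribution. The sketch below assumes a fair die and a few arbitrarily chosen thresholds `t`; the names are illustrative.

```python
from fractions import Fraction

# Assumption: a fair die, each outcome with probability 1/6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in fair_die.items())
var = sum((x - mean) ** 2 * p for x, p in fair_die.items())


def tail_probability(dist, mean, t):
    """Exact P(|X - E[X]| >= t) for a discrete distribution {x: P(x)}."""
    return sum(p for x, p in dist.items() if abs(x - mean) >= t)


# Arbitrarily chosen thresholds for illustration.
for t in (Fraction(3, 2), Fraction(2), Fraction(5, 2)):
    tail = tail_probability(fair_die, mean, t)
    bound = var / t ** 2  # Chebyshev bound Var[X] / t^2
    print(f"t={t}: P(|X - E[X]| >= t) = {tail}, bound = {bound}")
    assert tail <= bound
```

The bound holds for every threshold but can be loose; for small `t` it may even exceed 1, in which case it carries no information.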
