Modeling and Estimation of Uncertain Systems
Lecture 1: Uncertainty I: Probability and Stochastic Processes
Modeling and Estimation: What is the problem?
• Two distinct activities:
  • Modeling is the activity of constructing a mathematical description of a system of interest; it encompasses specification of the model structure and its parameterization,
  • Estimation is concerned with determining the “state” of a system relative to some model.
• Broadly speaking, there are four possible cases, listed in order of increasing difficulty:
  • Well-defined system, rich data source(s),
  • Poorly defined system, rich data source(s),
  • Well-defined system, sparse data,
  • Poorly defined system, sparse data.
• The classification is characterized by the amount of a priori information that can be embedded in the model and the amount of data available for inference.
Uncertainty: “What” is the problem!
• Uncertainty permeates every aspect of this problem:
  • What parts of the system are important?
  • What are the right descriptions of the constituents? What is the right way to describe their interactions?
  • What are the available observations? What are the dynamics of the observation processes?
• Two types of uncertainty:
  • Epistemic: Uncertainty due to a lack of knowledge of quantities or processes of the system or the environment. Also known as subjective uncertainty, reducible uncertainty, or model form uncertainty.
  • Aleatory: Inherent variation associated with the physical system or the environment. Also known as variability, irreducible uncertainty, or stochastic uncertainty.
• There are many different “uncertainty” theories, each with its own strengths and weaknesses.
Syllabus: Lecture Series Agenda
• Uncertainty I: Probability and Stochastic Processes
• Filtering I: Kalman Filtering
• Filtering II: Estimation – The Big Picture
• Uncertainty II: Information Theory
• Model Inference I: Symbolic Dynamics and the Thermodynamic Formalism
• Model Inference II: Probabilistic Grammatical Inference
• Uncertainty III: Representations of Uncertainty
• Decision under Uncertainty: Plausible Inference
Probability and Stochastic Processes: Lecture 1 Agenda
• What is probability?
  • Frequentist Interpretation
  • Bayesian Interpretation
• Calculus of Probability
  • Probability Spaces
  • Kolmogorov’s Axioms
  • Conditioning and Bayes’ Theorem
• Random Variables (RVs)
  • Distribution and Density Functions
  • Joint and Marginal Distributions
  • Expectation and Moments
• Stochastic Processes
  • Stationarity
  • Ergodicity
Frequentist Interpretation: Physical Interpretation
• Probabilities of events are associated with their relative frequencies in a long run of trials:
  • Associated with random physical systems (e.g., dice),
  • Makes sense only in the context of well-defined situations.
• Frequentism
  • If $n_A$ is the number of occurrences of event $A$ in $n$ trials, then the probability of $A$ is $\lim_{n \to \infty} n_A / n$,
  • The probability of getting “Heads” in a fair coin toss is 1/2 because it has been demonstrated empirically, not because there are two equally likely events.
• Propensity Theory
  • Interprets probability as the “propensity” for an event’s occurrence,
  • Explains long-run frequencies via the Law of Large Numbers (LLN).
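The relative-frequency idea can be illustrated with a short simulation. This is only a sketch: the function name `relative_frequency`, the seed, and the trial counts are illustrative assumptions, and a pseudo-random generator stands in for a physical coin.

```python
import random

def relative_frequency(n_trials, p_heads=0.5, seed=0):
    """Return n_A / n for the event A = 'heads' over n_trials simulated tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_heads = sum(rng.random() < p_heads for _ in range(n_trials))
    return n_heads / n_trials

# The ratio n_A / n settles near 1/2 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

For small n the ratio can be far from 1/2; only the long-run behavior is what the frequentist interpretation refers to.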
Bayesian Interpretation: Evidentiary Interpretation
• Probability can be assigned to any statement whatsoever, even when no random process is involved:
  • Represents subjective plausibility or the degree to which the statement is supported by available evidence,
  • Interpreted as a “measure of a state of knowledge.”
• The Bayesian approach specifies a prior probability, which is then updated in the light of new information.
• Objectivism
  • Bayesian statistics can be justified by the requirements of rationality and consistency and interpreted as an extension of logic,
  • Not dependent upon belief.
• Subjectivism
  • Probability is regarded as a measure of the degree of belief of the individual assessing the uncertainty of a particular situation,
  • Rationality and consistency constrain the probabilities.
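Probability Spaces: Basis for Axiomatic Probability Theory
• A probability space is a triple $(\Omega, \mathcal{F}, \rho)$:
  • $\Omega$ is the set of all possible outcomes, known as the sample space,
  • $\mathcal{F}$ is a set of events, where each event is a subset of $\Omega$ containing zero or more outcomes; $\mathcal{F}$ must form a $\sigma$-algebra, i.e., it must contain $\Omega$ and be closed under complementation and countable unions (and hence countable intersections),
  • $\rho$ is a measure of the probability of an event and is called a probability measure.
• Example: pairs of (fair) coin tosses: $\Omega = \{HH, HT, TH, TT\}$, $\mathcal{F}$ is the set of all subsets of $\Omega$, and $\rho(\{\omega\}) = 1/4$ for each outcome $\omega \in \Omega$.
• A probability space describes processes containing states that occur randomly.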
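The coin-toss example can be written out explicitly. The sketch below assumes the set of events is the full power set of the sample space (a natural choice for a finite space, but an assumption here); the names `omega`, `power_set`, `events`, and `rho` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations, product

# Sample space: all ordered pairs of outcomes from two tosses.
omega = frozenset("".join(pair) for pair in product("HT", repeat=2))

def power_set(s):
    """All subsets of s; for a finite sample space this is a valid sigma-algebra."""
    items = sorted(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

events = power_set(omega)

def rho(event):
    """Probability measure for fair tosses: each of the four outcomes has weight 1/4."""
    return Fraction(len(event), len(omega))

at_least_one_head = frozenset(w for w in omega if "H" in w)
print(len(events), rho(at_least_one_head))
```

Using `Fraction` keeps the probabilities exact, which makes it easy to check identities such as rho(omega) == 1 without floating-point tolerance.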
Kolmogorov Axioms: The Probability Axioms
• Constraints on $\rho$ are needed to ensure consistency.
• Kolmogorov axioms:
  • Non-negativity: The probability of an event is a non-negative real number ($\rho(E) \ge 0$ for all events $E \in \mathcal{F}$),
  • Unit measure: The probability that some event in the entire sample space will occur is 1 ($\rho(\Omega) = 1$),
  • $\sigma$-additivity: The probability of the union (sum) of a collection of non-intersecting sets is equal to the sum of the probabilities of each of the sets (whenever $E_1, E_2, \ldots$ is a sequence of pairwise disjoint sets in $\mathcal{F}$ such that $\bigcup_i E_i$ is also in $\mathcal{F}$, then $\rho\left(\bigcup_i E_i\right) = \sum_i \rho(E_i)$).
• A measure which satisfies these axioms is known as a probability measure.
• This is not the only set of probability axioms, merely the most common.
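As a small worked consequence of the axioms, the complement rule follows in two lines. The sketch writes $\rho$ for the probability measure, $\Omega$ for the sample space, and $E^{c}$ for the complement of an event $E$, and it uses additivity for two disjoint sets, which follows from $\sigma$-additivity.

```latex
\begin{align*}
E \cup E^{c} &= \Omega, \qquad E \cap E^{c} = \emptyset, \\
\rho(E) + \rho(E^{c}) &= \rho(E \cup E^{c}) = \rho(\Omega) = 1, \\
\rho(E^{c}) &= 1 - \rho(E) \quad\Longrightarrow\quad 0 \le \rho(E) \le 1 .
\end{align*}
```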
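Conditioning: Basic Probabilistic Inference
[Figure: Venn diagram of the events $E$ and $\mathcal{E}$ and their intersection $E \cap \mathcal{E}$.]
• Assume that when an event occurs, we learn only that it lies in $\mathcal{E}$. What is the probability that $E$ has occurred given that $\mathcal{E}$ has occurred?
  • If $\mathcal{E}$ is true, then its complement is false: $\rho(\mathcal{E}^{c} \mid \mathcal{E}) = 0$.
  • The relative probabilities of events in $\mathcal{E}$ remain unchanged, i.e., if $E_1, E_2 \subseteq \mathcal{E}$ with $\rho(E_2) > 0$, then $\frac{\rho(E_1 \mid \mathcal{E})}{\rho(E_2 \mid \mathcal{E})} = \frac{\rho(E_1)}{\rho(E_2)}$.
• A little bit of algebra yields $\rho(E \mid \mathcal{E}) = \frac{\rho(E \cap \mathcal{E})}{\rho(\mathcal{E})}$.
• We call $\rho(E \mid \mathcal{E})$ the conditional probability of $E$ and say that it is conditioned on $\mathcal{E}$.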
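The "little bit of algebra" can be spelled out as follows. This is one possible route, not necessarily the one used in the lecture; it writes $\mathcal{E}$ for the conditioning event, $E$ for the event of interest, and $\rho$ for the probability measure, and assumes $\rho(\mathcal{E}) > 0$.

```latex
\begin{align*}
\rho(E \mid \mathcal{E})
  &= \rho(E \cap \mathcal{E} \mid \mathcal{E}) + \rho(E \cap \mathcal{E}^{c} \mid \mathcal{E})
   = \rho(E \cap \mathcal{E} \mid \mathcal{E})
  && \text{(the complement of $\mathcal{E}$ is false)} \\
\frac{\rho(E \cap \mathcal{E} \mid \mathcal{E})}{\rho(\mathcal{E} \mid \mathcal{E})}
  &= \frac{\rho(E \cap \mathcal{E})}{\rho(\mathcal{E})}
  && \text{(ratios preserved, with $E_1 = E \cap \mathcal{E}$, $E_2 = \mathcal{E}$)} \\
\rho(E \mid \mathcal{E})
  &= \frac{\rho(E \cap \mathcal{E})}{\rho(\mathcal{E})}
  && \text{(since $\rho(\mathcal{E} \mid \mathcal{E}) = 1$)}
\end{align*}
```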
Conditioning Example: Converging to the right answer…
• Consider a biased coin that has a bias (towards heads) of either $b_H = 2/3$ or $b_T = 1/3$.
• Assume $b_T$ is deemed more likely, with $\rho(b_T) = 0.99$.
• Assume the coin is tossed 25 times and heads comes up 19 times…
• We have probably made a bad assumption and would like to update the probability based upon the new information:
  • There are $2^{26}$ possible events ($2^{25}$ toss sequences for each of the two biases),
  • The prior probability of a coin having bias $b_T$ and producing a particular sequence $E_n$ with $n$ heads is $\rho(E_n \cap b_T) = \rho(b_T) \left(\tfrac{1}{3}\right)^{n} \left(\tfrac{2}{3}\right)^{25-n} = 0.99 \cdot \frac{2^{25-n}}{3^{25}}$.
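Conditioning Example (Cont.): Converging to the right answer…
• Thus, given 19 heads, we have $\rho(E_{19} \cap b_T) = 0.99 \cdot \frac{2^{6}}{3^{25}}$ and $\rho(E_{19} \cap b_H) = 0.01 \cdot \frac{2^{19}}{3^{25}}$.
• Conditioned on seeing the sequence $E_{19}$, the probability that the coin has bias $b_T$ is thus $\rho(b_T \mid E_{19}) = \frac{\rho(E_{19} \cap b_T)}{\rho(E_{19} \cap b_T) + \rho(E_{19} \cap b_H)} = \frac{0.99 \cdot 2^{6}}{0.99 \cdot 2^{6} + 0.01 \cdot 2^{19}} \approx 0.012$.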
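The update in this example is easy to check numerically. The sketch below uses exact rational arithmetic; the function names `joint` and `posterior_bT` are illustrative, and the numbers (biases of 1/3 and 2/3, a prior of 0.99, 25 tosses, 19 heads) are the ones stated in the example.

```python
from fractions import Fraction

def joint(prior, p_heads, n_heads, n_tosses=25):
    """Prior probability of a given bias AND one particular sequence with n_heads heads."""
    return prior * p_heads**n_heads * (1 - p_heads)**(n_tosses - n_heads)

def posterior_bT(n_heads, n_tosses=25, prior_bT=Fraction(99, 100)):
    """Probability of the tails-biased coin (b_T = 1/3) given the observed sequence."""
    j_T = joint(prior_bT, Fraction(1, 3), n_heads, n_tosses)
    j_H = joint(1 - prior_bT, Fraction(2, 3), n_heads, n_tosses)
    return j_T / (j_T + j_H)

# 19 heads in 25 tosses drives the probability of b_T from 0.99 to roughly 0.012.
print(float(posterior_bT(19)))
```

The common factor 3^25 cancels in the ratio, so the result depends only on the prior and on the powers of 2.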
Bayes’ Rule: Inverse Conditioning
• Bayes’ Rule: $\rho(E \mid \mathcal{E}) = \frac{\rho(\mathcal{E} \mid E)\, \rho(E)}{\rho(\mathcal{E})}$ for $\rho(E), \rho(\mathcal{E}) > 0$.
• One of the most widely used results in probability theory, as it provides a fundamental mechanism for updating beliefs.
• Example: consider a test known to be 99% reliable. If this test indicates that an event $E$ has occurred, how likely is it that the event has occurred?
• What does 99% reliable mean?
• Assume that it means:
  • 99% of the time that $E$ occurs, the test correctly indicates that it has occurred (i.e., a false negative rate of 1%), and
  • 99% of the time that $E$ does not occur, the test correctly indicates that it has not occurred (i.e., a false positive rate of 1%).
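Bayes’ Rule Example (Cont.): The Reliability of Tests
• Let $P$ be the event that the test indicates that $E$ has occurred. By Bayes’ rule we have $\rho(E \mid P) = \frac{\rho(P \mid E)\, \rho(E)}{\rho(P)}$.
• Since the (positive) reliability is 99%, we have $\rho(P \mid E) = 0.99$.
• Note that we cannot compute $\rho(E \mid P)$ without additional information, i.e., $\rho(E)$.
• Though it looks like we also need $\rho(P)$, we can in fact construct this from the reliability (positive and negative): $\rho(P) = \rho(P \mid E)\, \rho(E) + \rho(P \mid E^{c})\, \rho(E^{c})$, where $\rho(P \mid E^{c}) = 1 - 0.99 = 0.01$ and $\rho(E^{c}) = 1 - \rho(E)$.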
Bayes’ Rule (Cont.): The Reliability of Tests
• Hence $\rho(P) = 0.99\, \rho(E) + 0.01\, (1 - \rho(E))$.
• Substituting into Bayes’ rule produces $\rho(E \mid P) = \frac{0.99\, \rho(E)}{0.99\, \rho(E) + 0.01\, (1 - \rho(E))}$.
• Only if the event is very common does the reliability approximate the probability!
• N.B. We cannot determine the probability of event $E$ conditioned on a positive test result $P$ without knowing the probability of $E$, i.e., $\rho(E)$.
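To see how strongly the answer depends on the prior, the formula can be evaluated for a few values. This is a sketch; the function name `posterior_given_positive`, the parameter names `sensitivity` and `specificity`, and the chosen priors are illustrative assumptions, with both reliabilities set to 0.99 as in the example.

```python
def posterior_given_positive(prior, sensitivity=0.99, specificity=0.99):
    """Probability of E given a positive test, via Bayes' rule.

    The denominator is the total probability of a positive result:
    true positives plus false positives.
    """
    p_positive = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    return sensitivity * prior / p_positive

# A rare event gives a low posterior even though the test is "99% reliable".
for prior in (0.001, 0.01, 0.5):
    print(f"prior = {prior:<5}  posterior = {posterior_given_positive(prior):.4f}")
```

With a prior of 0.001 the posterior is only about 0.09; it reaches 0.5 at a prior of 0.01 and equals the reliability, 0.99, only at a prior of 0.5.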
Random Variables: Alea iacta est
• A random variable (RV) $\mathbf{x}$ is a process of assigning a number $\mathbf{x}(E)$ to every event $E$. This function must satisfy two conditions:
  • The set $\{\mathbf{x} \le x\}$ is an event for every $x$,
  • The probabilities of the events $\{\mathbf{x} = \infty\}$ and $\{\mathbf{x} = -\infty\}$ equal zero.
• The key observation here is that random variables provide a tool for structuring sample spaces.
• In many cases, some decision or diagnosis must be made on the basis of expectation, and RVs play a key role in computing these expectations.
• Note that a random variable does not have a value per se. A realization of a random variable does, however, have a definite value.
Probability Distributions: Spreading the Wealth Around
• The elements of the sample space that are contained in the event $\{\mathbf{x} \le x\}$ change as the number $x$ takes on different values.
• The probability $\Pr\{\mathbf{x} \le x\}$ of the event $\{\mathbf{x} \le x\}$ is, therefore, a number that depends on $x$.
• This number is expressed in terms of a (cumulative) distribution function of the random variable $\mathbf{x}$ and is denoted $F_{\mathbf{x}}(x)$. Formally, we say $F_{\mathbf{x}}(x) = \Pr\{\mathbf{x} \le x\}$ for every $x$.
• The derivative $f_{\mathbf{x}}(x) = \frac{dF_{\mathbf{x}}(x)}{dx}$ is known as the density function and is closely related to the measure $\rho$ introduced earlier.
• For our purposes, we will treat this density function as the specification of $\rho$.
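Distribution Example: Normal Distribution
• CDF: $F_{\mathbf{x}}(x) = \int_{-\infty}^{x} f_{\mathbf{x}}(u)\, du = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \eta}{\sigma\sqrt{2}}\right)\right]$
• PDF: $f_{\mathbf{x}}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \eta)^{2}}{2\sigma^{2}}\right)$
• Distributions can be discrete, continuous, or hybrid.
[Figure: example discrete, continuous, and hybrid CDFs.]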
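The relationship between the distribution and density functions can be checked numerically for the normal case. The sketch below assumes the standard closed forms for the normal PDF and CDF, with mean `eta` and standard deviation `sigma`; the helper `numerical_derivative` is an illustrative central-difference approximation.

```python
import math

def normal_pdf(x, eta=0.0, sigma=1.0):
    """Density of a normal RV with mean eta and standard deviation sigma."""
    z = (x - eta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, eta=0.0, sigma=1.0):
    """Distribution function Pr{x <= x}, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - eta) / (sigma * math.sqrt(2.0))))

def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The derivative of the CDF should agree with the PDF at any point.
for x in (-1.0, 0.0, 0.7):
    print(x, numerical_derivative(normal_cdf, x), normal_pdf(x))
```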
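Joint Distributions: Building Up Multivariate Distributions
• A probability distribution that is a function of multiple RVs is a multivariate distribution and is defined by the joint distribution $F(x_1, \ldots, x_n) = \Pr\{\mathbf{x}_1 \le x_1, \ldots, \mathbf{x}_n \le x_n\}$.
• The associated joint density function can be determined via partial differentiation: $f(x_1, \ldots, x_n) = \frac{\partial^{n} F(x_1, \ldots, x_n)}{\partial x_1 \cdots \partial x_n}$.
• The probability that an event lies within a domain $D$ is thus $\Pr\{(\mathbf{x}_1, \ldots, \mathbf{x}_n) \in D\} = \int_{D} f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$.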
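A small worked case may help fix the three definitions. The example below is an illustrative assumption, not taken from the lecture: two RVs $\mathbf{x}_1, \mathbf{x}_2$ with joint density equal to 1 on the unit square, and the domain $D = \{x_1 + x_2 \le 1\}$.

```latex
\begin{align*}
F(x_1, x_2) &= x_1 x_2, \qquad 0 \le x_1, x_2 \le 1, \\
f(x_1, x_2) &= \frac{\partial^{2} F(x_1, x_2)}{\partial x_1\, \partial x_2} = 1, \\
\Pr\{\mathbf{x}_1 + \mathbf{x}_2 \le 1\}
  &= \int_{0}^{1} \int_{0}^{1 - x_1} 1 \; dx_2\, dx_1
   = \int_{0}^{1} (1 - x_1)\, dx_1 = \frac{1}{2} .
\end{align*}
```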
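Independence and Marginal Distributions: Breaking Down Multivariate Distributions
• We say that the RVs are independent if $F(x_1, \ldots, x_n) = F_{\mathbf{x}_1}(x_1) \cdots F_{\mathbf{x}_n}(x_n)$.
• The statistics of a subset of the random variables of a multivariate distribution are known as marginal statistics.
• The associated distributions are known as marginal distributions and are defined, for the first $k$ of $n$ variables, by $F(x_1, \ldots, x_k) = F(x_1, \ldots, x_k, \infty, \ldots, \infty)$ and $f(x_1, \ldots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\, dx_{k+1} \cdots dx_n$.
• The distribution of the marginal variables is said to be obtained by marginalizing over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.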
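For discrete RVs, marginalization is a sum over the discarded variable and independence is a product test. The sketch below uses a hypothetical joint probability table over two binary RVs; the numbers and the names `joint`, `marginal`, and `is_independent` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf over (x1, x2); the values are illustrative only.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def marginal(joint_pmf, keep):
    """Marginalize a two-variable pmf, keeping the coordinate at index `keep`."""
    out = defaultdict(Fraction)  # Fraction() is zero
    for outcome, p in joint_pmf.items():
        out[outcome[keep]] += p  # sum out the other variable
    return dict(out)

def is_independent(joint_pmf):
    """True if the joint pmf equals the product of its marginals everywhere."""
    m1, m2 = marginal(joint_pmf, 0), marginal(joint_pmf, 1)
    return all(
        joint_pmf.get((a, b), Fraction(0)) == m1[a] * m2[b]
        for a in m1 for b in m2
    )

print(marginal(joint, 0), marginal(joint, 1), is_independent(joint))
```

A table that puts all of its mass on the diagonal, for example half on (0, 0) and half on (1, 1), has the same kind of marginals but fails the product test.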
Example Multivariate Distributions: Visualizing Joint and Marginal Distributions
[Figure: a joint distribution shown together with its marginal distributions.]
• Marginal distributions are projections of the joint distribution.
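Expected Values: What did you expect?
• The expected value, or mean, of an RV $\mathbf{x}$ is defined by the integral $\mathrm{E}\{\mathbf{x}\} = \int_{-\infty}^{\infty} x\, f_{\mathbf{x}}(x)\, dx$. This is commonly denoted $\eta_{\mathbf{x}}$ or just $\eta$.
• For RVs of discrete (lattice) type, we obtain the expected value via the sum $\mathrm{E}\{\mathbf{x}\} = \sum_i x_i\, p_i$, where $p_i = \Pr\{\mathbf{x} = x_i\}$.
• The conditional mean, or conditional expected value, is obtained by replacing $f_{\mathbf{x}}(x)$ with the conditional density $f(x \mid E)$: $\mathrm{E}\{\mathbf{x} \mid E\} = \int_{-\infty}^{\infty} x\, f(x \mid E)\, dx$.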
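Variance and Higher Moments: Concentration and Distortion
• The variance is defined by the integral $\sigma^{2} = \mathrm{E}\{(\mathbf{x} - \eta)^{2}\} = \int_{-\infty}^{\infty} (x - \eta)^{2} f_{\mathbf{x}}(x)\, dx$.
  • The constant $\sigma$, also denoted $\sigma_{\mathbf{x}}$, is called the standard deviation of $\mathbf{x}$,
  • The variance measures the concentration of probability mass near the mean $\eta$.
• This is the second (central) moment of the distribution; other moments of interest are:
  • Moments: $m_n = \mathrm{E}\{\mathbf{x}^{n}\}$,
  • Central moments: $\mu_n = \mathrm{E}\{(\mathbf{x} - \eta)^{n}\}$,
  • Absolute moments: $\mathrm{E}\{|\mathbf{x}|^{n}\}$ and $\mathrm{E}\{|\mathbf{x} - \eta|^{n}\}$,
  • Generalized moments: $\mathrm{E}\{(\mathbf{x} - a)^{n}\}$ and $\mathrm{E}\{|\mathbf{x} - a|^{n}\}$.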
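For a discrete RV the moment definitions reduce to weighted sums, which makes them easy to compute exactly. The sketch below uses a fair six-sided die as an illustrative example; the names `expectation`, `die`, `eta`, `variance`, and `third_central` are assumptions introduced here.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """Expected value of g(x) for a discrete RV given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

eta = expectation(die)                                      # mean
variance = expectation(die, lambda x: (x - eta) ** 2)       # second central moment
third_central = expectation(die, lambda x: (x - eta) ** 3)  # zero by symmetry

print(eta, variance, third_central)
```

The same helper computes any of the moments listed above by changing the function passed as `g`.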
Stochastic Processes: Generalized RVs
• A stochastic process $\mathbf{x}(t)$ is a rule for assigning a function $\mathbf{x}(t, E)$ to every event $E$.
• We shall denote stochastic processes by $\mathbf{x}(t)$; hence $\mathbf{x}(t)$ can be interpreted in several ways:
  • A family, or ensemble, of functions $\mathbf{x}(t, E)$ [$t$ and $E$ are variable],
  • A single time function (or sample of the process) [$E$ is fixed],
  • A random variable [$t$ is fixed],
  • A number [$t$ and $E$ are fixed].
• Examples:
  • Brownian motion: $\mathbf{x}(t)$ consists of the motion of all particles (the ensemble); a realization $\mathbf{x}(t, E_i)$ is the motion of a specific particle,
  • Phasor with random amplitude and phase: $\mathbf{x}(t) = \mathbf{r} \cos(\omega t + \boldsymbol{\varphi})$ is a family of pure sine waves, and a single sample of the process is $\mathbf{x}(t, E_i) = \mathbf{r}(E_i) \cos(\omega t + \boldsymbol{\varphi}(E_i))$.
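Statistics of Stochastic Processes: Time Dependence
• First-order properties:
  • For a specific $t$, $\mathbf{x}(t)$ is an RV with distribution $F(x, t) = \Pr\{\mathbf{x}(t) \le x\}$,
  • $F(x, t)$ is called the first-order distribution of $\mathbf{x}(t)$. Its derivative with respect to $x$, $f(x, t) = \frac{\partial F(x, t)}{\partial x}$, is called the first-order density of $\mathbf{x}(t)$.
• Second-order properties:
  • The mean $\eta(t)$ of $\mathbf{x}(t)$ is the expected value of the RV $\mathbf{x}(t)$: $\eta(t) = \mathrm{E}\{\mathbf{x}(t)\} = \int_{-\infty}^{\infty} x\, f(x, t)\, dx$,
  • The autocorrelation $R(t_1, t_2)$ of $\mathbf{x}(t)$ is the expected value of the product $\mathbf{x}(t_1)\mathbf{x}(t_2)$: $R(t_1, t_2) = \mathrm{E}\{\mathbf{x}(t_1)\mathbf{x}(t_2)\}$,
  • The autocovariance $C(t_1, t_2)$ of $\mathbf{x}(t)$ is the covariance of the RVs $\mathbf{x}(t_1)$ and $\mathbf{x}(t_2)$: $C(t_1, t_2) = R(t_1, t_2) - \eta(t_1)\eta(t_2)$.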
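Stationarity: Hard to hit a moving target
• A stochastic process $\mathbf{x}(t)$ is called strict-sense stationary (SSS) if its statistical properties are invariant to a shift of the origin: $f(x_1, \ldots, x_n; t_1, \ldots, t_n) = f(x_1, \ldots, x_n; t_1 + c, \ldots, t_n + c)$ for any $c$.
• A stochastic process is called wide-sense stationary (WSS) if its mean is constant, $\mathrm{E}\{\mathbf{x}(t)\} = \eta$, and its autocorrelation depends only on $\tau = t_1 - t_2$: $\mathrm{E}\{\mathbf{x}(t + \tau)\mathbf{x}(t)\} = R(\tau)$.
• An SSS process is WSS.
• Stationarity basically says that the statistical properties don’t evolve in time.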
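Wide-sense stationarity can be checked empirically on a simple process. The sketch below assumes a sinusoid with a uniformly distributed random phase (an illustrative choice, not an example from the lecture), for which the mean is 0 and the autocorrelation is cos(omega * tau) / 2 regardless of t; the function name, sample count, and seed are also assumptions.

```python
import math
import random

def ensemble_stats(t, tau, omega=1.0, n_samples=50_000, seed=0):
    """Monte Carlo estimates of the mean at time t and the autocorrelation
    between times t + tau and t, for x(t) = cos(omega * t + phi) with phi
    drawn uniformly from [0, 2*pi)."""
    rng = random.Random(seed)
    mean_sum = corr_sum = 0.0
    for _ in range(n_samples):
        phi = rng.uniform(0.0, 2.0 * math.pi)  # one realization of the phase
        x_t = math.cos(omega * t + phi)
        mean_sum += x_t
        corr_sum += math.cos(omega * (t + tau) + phi) * x_t
    return mean_sum / n_samples, corr_sum / n_samples

# Shifting t leaves both estimates essentially unchanged; only tau matters.
for t in (0.0, 3.0, 10.0):
    print(t, ensemble_stats(t, tau=0.5))
```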
Ergodicity: Mixing it up
• Ergodicity is a property connected with the homogeneity of a process. A process whose time average is the same as its space or ensemble average is said to be mean-ergodic.
  • This definition can be extended to include other statistics as well (e.g., covariance),
  • This is also a measure of how well the process “mixes.”
• Example: Brownian motion
  • The average motion of a specific particle will tend toward the ensemble average of all of the particles’ motions.
• Ergodicity is important because it tells us how long or how often a process must be sampled in order for its statistics to be estimated.
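Mean-ergodicity can be illustrated by comparing a time average over one realization with the ensemble mean. The sketch below assumes a sinusoid with a random phase (an illustrative process whose ensemble mean is 0); the function name `time_average`, the step size, and the durations are assumptions.

```python
import math
import random

def time_average(phi, duration, omega=1.0, dt=0.01):
    """Time average of one realization x(t) = cos(omega * t + phi) over [0, duration)."""
    n_steps = int(duration / dt)
    return sum(math.cos(omega * k * dt + phi) for k in range(n_steps)) / n_steps

rng = random.Random(1)
phi = rng.uniform(0.0, 2.0 * math.pi)  # a single fixed realization of the phase

# The time average approaches the ensemble mean (zero) as the window grows.
for duration in (1.0, 10.0, 100.0, 1000.0):
    print(duration, time_average(phi, duration))
```

By contrast, a process that is a random constant, x(t) = a, is stationary but not mean-ergodic: every time average returns the particular value of a rather than its ensemble mean.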
References: Some Good Books…
• Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, Third Ed., McGraw-Hill, New York, NY, 1991.
• E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK, 2003.
• Eugene Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill, New York, NY, 1971.