Statistical Estimation. Vasileios Hatzivassiloglou University of Texas at Dallas. Obama contract at intrade.com. Instance profiles.
Statistical Estimation Vasileios Hatzivassiloglou University of Texas at Dallas
Instance profiles • Given k observations of maximum length n, construct a |Σ|×n matrix A (profile) where entry $A_{ij}$ is the estimated probability that the ith letter occurs in position j • One way to estimate $A_{ij}$ is to count the occurrences of each letter at this position ($c_{ij}$); then $A_{ij} = c_{ij} / k$ • This is maximum likelihood estimation (MLE) • The estimate becomes better as k increases
Example data • 23 sample motif instances for the cyclic AMP receptor transcription factor (positions 3-9) TTGTGGC TTTTGAT AAGTGTC ATTTGCA CTGTGAG ATGCAAA GTGTTAA ATTTGAA TTGTGAT ATTTATT ACGTGAT ATGTGAG TTGTGAG CTGTAAC CTGTGAA TTGTGAC GCCTGAC TTGTGAT TTGTGAT GTGTGAA CTGTGAC ATGAGAC TTGTGAG
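As a minimal sketch of how a profile could be built from these instances, the snippet below counts letters per position and divides by the number of instances, assuming the estimate $A_{ij} = c_{ij}/k$. The names `INSTANCES`, `ALPHABET`, and `build_profile` are illustrative and do not come from the slides.

```python
from collections import Counter

# The 23 sample instances from the slide (positions 3-9 of the site).
INSTANCES = (
    "TTGTGGC TTTTGAT AAGTGTC ATTTGCA CTGTGAG ATGCAAA GTGTTAA ATTTGAA "
    "TTGTGAT ATTTATT ACGTGAT ATGTGAG TTGTGAG CTGTAAC CTGTGAA TTGTGAC "
    "GCCTGAC TTGTGAT TTGTGAT GTGTGAA CTGTGAC ATGAGAC TTGTGAG"
).split()
ALPHABET = "ACGT"


def build_profile(instances, alphabet=ALPHABET):
    """Return {letter: [A_ij for each position j]} with A_ij = c_ij / k."""
    k = len(instances)                      # number of observations
    n = max(len(s) for s in instances)      # maximum instance length
    profile = {a: [] for a in alphabet}
    for j in range(n):
        # c_ij: how often each letter occurs at position j
        counts = Counter(s[j] for s in instances if j < len(s))
        for a in alphabet:
            profile[a].append(counts[a] / k)
    return profile


profile = build_profile(INSTANCES)
for a in ALPHABET:
    print(a, " ".join(f"{p:.3f}" for p in profile[a]))
```

Each printed row is one letter of the alphabet and each column one motif position, so every column should sum to 1.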
Probability of a motif • Suppose that we consider M as a candidate motif consensus • How do we find the best M given the observations in A? • Assuming independence of positions, $P(M \mid A) = \prod_{j=1}^{n} A_{M_j, j}$, where $M_j$ is the letter of M at position j
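The following sketch scores a candidate motif against a profile, assuming the independence formula $P(M \mid A) = \prod_j A_{M_j, j}$ and a profile estimated by relative frequencies from the 23 example instances. Under independence the product is maximized by picking the most probable letter in each column. All function names are hypothetical.

```python
from collections import Counter
from math import prod

INSTANCES = (
    "TTGTGGC TTTTGAT AAGTGTC ATTTGCA CTGTGAG ATGCAAA GTGTTAA ATTTGAA "
    "TTGTGAT ATTTATT ACGTGAT ATGTGAG TTGTGAG CTGTAAC CTGTGAA TTGTGAC "
    "GCCTGAC TTGTGAT TTGTGAT GTGTGAA CTGTGAC ATGAGAC TTGTGAG"
).split()
ALPHABET = "ACGT"
K = len(INSTANCES)


def column_probs(column):
    """Relative frequency of each letter in one motif position."""
    counts = Counter(column)
    return {a: counts[a] / K for a in ALPHABET}


# PROFILE[j][letter] is the estimated probability of `letter` at position j.
PROFILE = [column_probs(col) for col in zip(*INSTANCES)]


def motif_probability(motif):
    """Product of per-position probabilities (positions assumed independent)."""
    return prod(PROFILE[j][letter] for j, letter in enumerate(motif))


def most_probable_motif():
    """Pick the highest-probability letter in each column.

    Ties are broken by the order of ALPHABET.
    """
    return "".join(max(ALPHABET, key=col.get) for col in PROFILE)


best = most_probable_motif()
print(best, motif_probability(best))
```

In this data set A and T are equally frequent in the first position (8 of 23 each), so the best motif is not unique there; the remaining positions give TGTGAC.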
Maximum likelihood estimation • General method for estimating unknown parameters when we have • a sample of values that depend on these parameters • a formula specifying the probability of obtaining these values given the parameters
MLE example: three coins • Suppose we have three coins with probability of heads ⅓, ½, and ⅔ • One of them is used to generate a series of 20 tosses and we observe 11 heads • θ = the heads probability of the coin used in the experiment • Binomial distribution for the number of heads: $P(11 \text{ heads} \mid \theta) = \binom{20}{11} \theta^{11} (1-\theta)^{9}$
Binomial distribution • Count of one of two possible outcomes in a series of independent events • The probabilities of the two outcomes are constant across events • An example of iid events (independent, identically distributed)
Binomial probability mass • If the probability of one outcome (let’s call it A) is p and there are n events • The probability of the other outcome is 1-p • The probability of obtaining a particular sequence of outcomes with m A’s is $p^m (1-p)^{n-m}$ • There are $\binom{n}{m}$ sequences with the same number m of outcomes A • Overall, $P(m) = \binom{n}{m} p^m (1-p)^{n-m}$
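A short sketch of the binomial probability mass, $\binom{n}{m} p^m (1-p)^{n-m}$, together with a brute-force check that the number of outcome sequences containing exactly m A's equals $\binom{n}{m}$. The function name and the small values of n, m, and p are chosen only for illustration.

```python
from itertools import product
from math import comb


def binomial_pmf(m, n, p):
    """Probability of exactly m outcomes A in n iid events with P(A) = p."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


n, m, p = 5, 2, 0.3

# Enumerate all 2**n outcome sequences and count those with exactly m A's.
sequences_with_m = sum(
    1 for seq in product("AB", repeat=n) if seq.count("A") == m
)
print(sequences_with_m, comb(n, m))

# The mass function should sum to 1 over m = 0..n.
print(sum(binomial_pmf(i, n, p) for i in range(n + 1)))
```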
MLE example: three coins • Result: Choose θ = ½
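The choice of θ = ½ can be checked by evaluating the binomial likelihood of 11 heads in 20 tosses for each of the three candidate coins and keeping the largest. This is a minimal sketch with illustrative names; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def likelihood(theta, heads=11, tosses=20):
    """Binomial probability of the observed head count given theta."""
    return comb(tosses, heads) * theta**heads * (1 - theta) ** (tosses - heads)


coins = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]
likelihoods = {theta: likelihood(theta) for theta in coins}

# Maximum likelihood: the candidate that makes the data most probable.
best = max(likelihoods, key=likelihoods.get)

for theta, value in likelihoods.items():
    print(f"theta = {theta}: P(11 heads in 20) = {float(value):.4f}")
print("MLE choice:", best)
```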
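MLE example: unknown coins • θ can take any value between 0 and 1 • m heads in n tosses • Set the derivative of the likelihood to zero and solve for θ: $\frac{d}{d\theta}\left[\binom{n}{m} \theta^{m} (1-\theta)^{n-m}\right] = 0$, which simplifies to $\theta^{m-1} (1-\theta)^{n-m-1} (m - n\theta) = 0$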
MLE for binomial • Of the three solutions, θ = 0 and θ = 1 result in P(X1,X2,...,Xn | θ) = 0, i.e., local minima • On the other hand, for 0<θ<1, P(X1,X2,...,Xn | θ) > 0, so θ = m/n must be a local maximum • Therefore the maximum likelihood estimate is $\hat{\theta} = m/n$
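The analytic result can be sanity-checked numerically. The sketch below maximizes the binomial log-likelihood over a grid of θ values and compares the winner with m/n. The grid resolution and the example values n = 20, m = 11 are assumptions made for illustration; the constant term $\log\binom{n}{m}$ is dropped because it does not depend on θ.

```python
from math import log


def log_likelihood(theta, m, n):
    """Log of theta^m (1 - theta)^(n - m); the binomial coefficient is omitted."""
    return m * log(theta) + (n - m) * log(1 - theta)


n, m = 20, 11

# Open interval (0, 1): the endpoints give likelihood 0 and log(0) is undefined.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, m, n))

print(theta_hat, m / n)
```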
Properties of estimators • The estimation error for a given sample is $e = \hat{x} - x$, where $\hat{x}$ is the estimate and x is the unknown true value • An estimator is a random variable • because it depends on the sample • The mean square error $E[(\hat{x} - x)^2]$ represents the overall quality of the estimation across all samples
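As an illustration of mean square error, the sketch below computes it exactly for the binomial estimator m/n by averaging the squared error over every possible head count, weighted by its binomial probability. The true θ is of course unknown in practice; here it is fixed by assumption so the error can be evaluated. For this estimator the result works out to θ(1-θ)/n.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(m, n, theta):
    """Probability of m heads in n tosses with heads probability theta."""
    return comb(n, m) * theta**m * (1 - theta) ** (n - m)


def mse_of_mle(n, theta):
    """E[(m/n - theta)^2], summed over all possible samples (head counts)."""
    return sum(
        binomial_pmf(m, n, theta) * (Fraction(m, n) - theta) ** 2
        for m in range(n + 1)
    )


theta = Fraction(1, 2)   # assumed true value, for illustration only
for n in (5, 20, 80):
    print(n, mse_of_mle(n, theta))
```

The error shrinks in proportion to 1/n, which matches the earlier remark that estimates improve as the number of observations grows.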
Expected values • Recall that the expected value of a discrete random variable X is defined as $E[X] = \sum_{x} x \, P(X = x)$ • The expected value of a dependent random variable f(X) is $E[f(X)] = \sum_{x} f(x) \, P(X = x)$ • For continuous distributions, replace the sum with an integral
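A small worked example of both definitions, using a fair six-sided die as an assumed distribution (the die is not part of the slides). The helper `expected_value` is an illustrative name.

```python
from fractions import Fraction


def expected_value(pmf, f=lambda x: x):
    """E[f(X)] = sum over x of f(x) * P(X = x); f defaults to the identity."""
    return sum(f(x) * p for x, p in pmf.items())


# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expected_value(die)                          # E[X]
mean_square = expected_value(die, lambda x: x * x)  # E[X^2]
variance = mean_square - mean**2                    # E[X^2] - E[X]^2

print(mean, mean_square, variance)
```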
Bias in estimation • An estimator is unbiased if $E[\hat{x}] = x$, i.e., its expected error is zero • MLE is not necessarily unbiased • Example: standard deviation • It is the most commonly used measure of dispersion in a data set • For a random variable X, it is defined as $\sigma = \sqrt{E[(X - E[X])^2]}$
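Estimators of standard deviation • MLE estimator (biased): $\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ • “Almost unbiased” estimator: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ ($s^2$ is an unbiased estimator of $\sigma^2$)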
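The bias can be seen exactly on a tiny example. The sketch below assumes the usual forms of the two variance estimators (sum of squared deviations divided by n for the MLE, and by n-1 for the corrected version) and averages each over every equally likely sample of size 3 drawn from a fair 0/1 variable, whose true variance is 1/4. The distribution and sample size are assumptions chosen to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)


def expected_variance_estimates(values, n):
    """Exact expectation of both variance estimators over all samples of size n.

    Every value in `values` is assumed equally likely, so all samples are too.
    """
    samples = list(product(values, repeat=n))
    weight = Fraction(1, len(samples))
    e_mle = sum(weight * sum_sq_dev(s) / n for s in samples)
    e_corrected = sum(weight * sum_sq_dev(s) / (n - 1) for s in samples)
    return e_mle, e_corrected


e_mle, e_corrected = expected_variance_estimates([0, 1], 3)
true_variance = Fraction(1, 4)
print(e_mle, e_corrected, true_variance)
```

The divide-by-n estimator averages to (n-1)/n times the true variance, while the divide-by-(n-1) version matches it on average.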