Machine Learning – Expectation maximization. Wilson Mckerrow (Fenyo lab postdoc). Contact: Wilson.McKerrow@nyulangone.org
Maximum likelihood estimation (MLE) • From a certain family of distributions (e.g. normal distributions), we want to pick the distribution that best describes the data. • How do we define an appropriate loss function? • One option: pick the distribution that maximizes the probability (likelihood) of the data.
Maximum likelihood estimation (MLE) • Let’s find the normal distribution that best fits this data (i.e. maximizes the likelihood).
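Maximum likelihood estimation (MLE) • The likelihood of one value is given by the normal density: $L(\mu, \sigma \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ • The likelihood of multiple independent values is the product of their individual likelihoods: $L(\mu, \sigma \mid x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$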
Maximum likelihood estimation (MLE) • Goal: find $\mu$, $\sigma$ that maximize the likelihood $L(\mu, \sigma \mid x_1, \dots, x_n)$. • Strategy: take the log, differentiate, set the derivatives to zero, and solve.
Maximum likelihood estimation (MLE) Math on board
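A standard version of this derivation (written out here, not a transcript of the board work), starting from the log-likelihood of $n$ independent normal observations $x_1, \dots, x_n$:

```latex
\ell(\mu,\sigma) = \log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2

\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\;\Rightarrow\; \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i

\frac{\partial \ell}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0
\;\Rightarrow\; \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2
```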
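Maximum likelihood estimation (MLE) • Solution: $\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu)^2$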
Maximum likelihood estimation (MLE) Example in R
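A minimal Python sketch of the same computation (a stand-in, not the lecture's R code). The heights below are made-up numbers for illustration; the sketch computes the closed-form MLE and checks that moving either parameter away from it does not raise the log-likelihood.

```python
import math

def normal_mle(xs):
    """Closed-form normal MLE: the sample mean and the 1/n (not 1/(n-1)) variance."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, math.sqrt(sigma2_hat)

def normal_log_likelihood(xs, mu, sigma):
    """Log of the product of normal densities, i.e. the sum of the log densities."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )

# Hypothetical heights, made up for illustration only.
heights = [64.0, 66.5, 67.0, 68.2, 69.5, 70.1, 71.3, 72.0]
mu_hat, sigma_hat = normal_mle(heights)
best = normal_log_likelihood(heights, mu_hat, sigma_hat)

# Nudging either parameter away from the MLE should not increase the likelihood.
print(best >= normal_log_likelihood(heights, mu_hat + 0.5, sigma_hat))
print(best >= normal_log_likelihood(heights, mu_hat, 1.2 * sigma_hat))
```

Dividing by n is what makes this the MLE; R's `var()` and `sd()` divide by n - 1 instead, so they give slightly larger values on small samples.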
Mixture models • Question: Are professors taller than their students? • Mean professor height is 68.7; mean student height is 67.5.
Mixture models • Is this difference purely due to the fact that professors skew male? • Ideally, we would measure gender to account for its effect. • Otherwise, gender is a hidden variable that we have to model.
Mixture models • Mixture model: the data are made up of subpopulations, with a different distribution describing each subpopulation. • n observations, k subpopulations • $x_1, \dots, x_n$ are the observed data values • $z_1, \dots, z_n$ are the particular subpopulations that each data point belongs to (these are unknown) • $f_1, \dots, f_k$ are probability density functions that define the likelihood of a given data value for each subpopulation • $\pi_1, \pi_2, \dots, \pi_k$ (the fraction of observations in each subpopulation) are unknown parameters that we want to estimate.
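Written out, the standard mixture likelihood treats each $x_i$ as coming from subpopulation $j$ with probability $\pi_j$ (the fraction of observations in subpopulation $j$) and then being drawn from that subpopulation's density $f_j$:

```latex
P(z_i = j) = \pi_j, \qquad \sum_{j=1}^{k} \pi_j = 1

p(x_i) = \sum_{j=1}^{k} \pi_j \, f_j(x_i)

L = \prod_{i=1}^{n} \sum_{j=1}^{k} \pi_j \, f_j(x_i)
```

The sum inside the product is what makes direct maximization awkward: taking the log no longer separates the terms, because we do not know which $z_i$ generated each $x_i$.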
Mixture models • Professor/student height example. Let’s consider professor height first. • $x_1, \dots, x_n$ are the heights of each professor • $z_1, \dots, z_n$ are the genders (male/female) of each professor • $\pi$ is the fraction of professors who are male • $f_F$ is the distribution of heights for female professors • $f_M$ is the distribution of heights for male professors
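Mixture models • Professor/student height example. • $f_F$ is the distribution of heights for female professors • $f_M$ is the distribution of heights for male professors • If we assume that heights are normally distributed, then: $f_F(x) = \frac{1}{\sqrt{2\pi\sigma_F^2}} \exp\left(-\frac{(x-\mu_F)^2}{2\sigma_F^2}\right)$ and $f_M(x) = \frac{1}{\sqrt{2\pi\sigma_M^2}} \exp\left(-\frac{(x-\mu_M)^2}{2\sigma_M^2}\right)$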
Expectation maximization • How can we estimate the parameters of the subpopulation distributions if we don’t know which subpopulation each value belongs to? Use the following steps: (1) Start with a guess for the subpopulation distribution parameters. (2) Calculate the probability that each point is in each subpopulation. (3) Estimate the subpopulation parameters using an average weighted by the probabilities calculated in step (2). (4) Repeat steps (2) and (3) until convergence.
Expectation maximization • The theory • Each EM iteration is guaranteed not to decrease the likelihood of the data. • EM may converge to a local maximum, so the result can depend on the starting guess.
Expectation maximization • Professor/student height example • (2) Calculate the probability that each point is in each subpopulation, using Bayes' rule: $P(z_i = M \mid x_i) = \frac{\pi f_M(x_i)}{\pi f_M(x_i) + (1-\pi) f_F(x_i)}$, where $\pi$ is the fraction of professors who are male and $f_M$, $f_F$ are the male and female height densities.
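A minimal sketch of step (2) for two normal subpopulations, assuming the normal model for heights. `pi_male` stands for the fraction of professors who are male; the function names and the parameter values are hypothetical, chosen only to illustrate Bayes' rule.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def prob_male(x, pi_male, mu_m, sigma_m, mu_f, sigma_f):
    """Bayes' rule: P(male | x) = pi * f_M(x) / (pi * f_M(x) + (1 - pi) * f_F(x))."""
    male = pi_male * normal_pdf(x, mu_m, sigma_m)
    female = (1 - pi_male) * normal_pdf(x, mu_f, sigma_f)
    return male / (male + female)

# Hypothetical current guesses for the parameters.
params = dict(pi_male=0.5, mu_m=70.0, sigma_m=3.0, mu_f=64.0, sigma_f=3.0)
for h in [62.0, 67.0, 73.0]:
    print(h, round(prob_male(h, **params), 3))
```

With equal mixing fractions and equal spreads, a height exactly halfway between the two means gets probability 0.5, and heights further toward the male mean get higher probabilities.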
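Expectation maximization • Professor/student height example • (3) Estimate the subpopulation parameters using an average weighted by the probabilities calculated in step (2). • Regular MLE: $\hat\mu_M = \frac{\sum_{i} \mathbb{1}[z_i = M] \, x_i}{\sum_{i} \mathbb{1}[z_i = M]}$ • Weighted MLE: $\hat\mu_M = \frac{\sum_{i} p_i x_i}{\sum_{i} p_i}$, where $p_i = P(z_i = M \mid x_i)$ is the probability that professor $i$ is male.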
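Expectation maximization • Professor/student height example • (3) Estimate the subpopulation parameters using an average weighted by the probabilities calculated in step (2). • Update, with $p_i = P(z_i = M \mid x_i)$: $\hat\pi = \frac{1}{n}\sum_{i} p_i$, $\hat\mu_M = \frac{\sum_{i} p_i x_i}{\sum_{i} p_i}$, $\hat\sigma_M^2 = \frac{\sum_{i} p_i (x_i - \hat\mu_M)^2}{\sum_{i} p_i}$, and likewise $\hat\mu_F$, $\hat\sigma_F^2$ with weights $1 - p_i$.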
Expectation maximization Example in R
The exponential family • Step (3) requires that we can estimate the subpopulation parameters using some kind of mean. Which other distributions have MLEs that meet this requirement? • The exponential family: $f(x \mid \eta) = h(x) \exp\left(\eta^\top T(x) - A(\eta)\right)$
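The exponential family • Normal distribution: $f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}} \exp\left(\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \log\sigma\right)$, so $h(x) = \frac{1}{\sqrt{2\pi}}$, $T(x) = (x, x^2)$, $\eta = \left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$, $A = \frac{\mu^2}{2\sigma^2} + \log\sigma$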
The exponential family Math on board
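A standard derivation (not necessarily the one done on the board) of why exponential-family MLEs reduce to means, writing the density as $f(x \mid \eta) = h(x)\exp(\eta^\top T(x) - A(\eta))$ and using the identity $\nabla A(\eta) = E_\eta[T(X)]$:

```latex
\ell(\eta) = \sum_{i=1}^{n} \log h(x_i) + \eta^\top \sum_{i=1}^{n} T(x_i) - n\,A(\eta)

\nabla_\eta \ell = \sum_{i=1}^{n} T(x_i) - n\,\nabla A(\eta) = 0
\;\Rightarrow\; E_{\hat\eta}[T(X)] = \frac{1}{n}\sum_{i=1}^{n} T(x_i)

% With EM weights w_i = P(z_i = j \mid x_i) on subpopulation j:
E_{\hat\eta_j}[T(X)] = \frac{\sum_{i=1}^{n} w_i \, T(x_i)}{\sum_{i=1}^{n} w_i}
```

For the normal distribution, $T(x) = (x, x^2)$ and $E[T(X)] = (\mu, \sigma^2 + \mu^2)$, so matching weighted means recovers the weighted mean and weighted variance used in step (3).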
The exponential family Other distributions in the exponential family: • Exponential • Gamma • Chi-squared • Beta • Dirichlet • Bernoulli • Categorical • Poisson • Geometric • And more…
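For several of these distributions, the weighted MLE needed in step (3) is a simple function of a weighted mean of $T(x) = x$. A minimal sketch for three of them; the function names are hypothetical, and the weights would be the per-point probabilities from step (2).

```python
def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_mle_poisson(counts, weights):
    # Poisson: T(x) = x and E[X] = lambda, so lambda_hat is the weighted mean.
    return weighted_mean(counts, weights)

def weighted_mle_bernoulli(outcomes, weights):
    # Bernoulli: T(x) = x and E[X] = p, so p_hat is the weighted fraction of 1s.
    return weighted_mean(outcomes, weights)

def weighted_mle_exponential(times, weights):
    # Exponential (rate parameterization): T(x) = x and E[X] = 1/rate,
    # so rate_hat is the reciprocal of the weighted mean.
    return 1.0 / weighted_mean(times, weights)

w = [1.0, 0.5, 0.25, 0.25]  # hypothetical step-(2) probabilities
print(weighted_mle_poisson([2, 4, 0, 6], w))      # → 2.75
print(weighted_mle_bernoulli([1, 0, 1, 1], w))    # → 0.75
print(weighted_mle_exponential([1, 2, 4, 4], w))  # → 0.5
```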
Application of EM: Isoform expression • Which isoforms of gene X are expressed, and at what level? [Figure: four isoforms of gene X, with exons 1-2-3-4, 1-3, 1-2-4, and 1-4]
Application of EM: Isoform expression • Data: Illumina RNA-seq • GeneX.1: exons 1, 2, 3, 4 • GeneX.2: exons 1, 3 • GeneX.3: exons 1, 2, 4 • GeneX.4: exons 1, 4
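To make the data concrete, a sketch of how a read's alignment limits which isoforms it could have come from. The isoform exon chains are the ones listed above; representing a read by the run of consecutive exons it covers is a simplifying assumption (real pipelines work from genomic alignments), and the function names are hypothetical.

```python
ISOFORMS = {
    "GeneX.1": [1, 2, 3, 4],
    "GeneX.2": [1, 3],
    "GeneX.3": [1, 2, 4],
    "GeneX.4": [1, 4],
}

def is_consistent(read_exons, isoform_exons):
    """True if the read's exon chain appears as a contiguous run in the isoform."""
    k = len(read_exons)
    return any(
        isoform_exons[s:s + k] == read_exons
        for s in range(len(isoform_exons) - k + 1)
    )

def compatible_isoforms(read_exons):
    return [name for name, exons in ISOFORMS.items() if is_consistent(read_exons, exons)]

print(compatible_isoforms([1, 2]))  # → ['GeneX.1', 'GeneX.3']  (spans the exon 1/2 junction)
print(compatible_isoforms([1, 4]))  # → ['GeneX.4']
print(compatible_isoforms([3]))     # → ['GeneX.1', 'GeneX.2']
```

A read inside exon 1 alone is consistent with all four isoforms, which is exactly the ambiguity EM has to resolve.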
Mixture models • Describe this problem as a mixture model • n observations (reads), k subpopulations (isoforms) • $r_1, \dots, r_n$ are genomic alignments that tell us which isoforms a read might belong to • $z_1, \dots, z_n$ are the particular isoforms that each read is derived from (unknown) • $\pi_1, \pi_2, \dots, \pi_k$ are the fractions of reads derived from each isoform.
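Mixture models • Step (2): calculate the posterior probability that read $i$ (with alignment $r_i$) comes from isoform $j$: $P(z_i = j \mid r_i) = \frac{\pi_j \, y_{ij} / \ell_j}{\sum_{j'=1}^{k} \pi_{j'} \, y_{ij'} / \ell_{j'}}$ • $\pi_j$: fraction of reads derived from isoform $j$ • $\ell_j$: the length of isoform $j$ (longer isoforms have more genomic loci where a read can begin) • $y_{ij}$: 1 if read $i$ is consistent with isoform $j$, 0 if it is not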
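Step (2) for a single read, as a minimal sketch. `read_frac`, `lengths` and `y` are hypothetical values standing in for the current fraction of reads from each isoform, the isoform lengths, and the 0/1 consistency indicators for this read.

```python
def read_posterior(read_frac, lengths, compatible):
    """P(read came from isoform j), proportional to read_frac[j] / lengths[j] * compatible[j]."""
    scores = [f / l * y for f, l, y in zip(read_frac, lengths, compatible)]
    total = sum(scores)  # must be > 0: the read has to fit at least one isoform
    return [s / total for s in scores]

read_frac = [0.25, 0.25, 0.25, 0.25]  # hypothetical current estimate
lengths = [2000, 1000, 1500, 1000]    # hypothetical isoform lengths
y = [1, 0, 1, 0]                      # e.g. a read consistent with isoforms 1 and 3 only
print(read_posterior(read_frac, lengths, y))
```

With equal current fractions, the read is split between the two consistent isoforms in inverse proportion to their lengths (here 3/7 and 4/7).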
Expectation maximization • (3) Estimate the parameters using an average weighted by the probabilities calculated in step (2) • Regular MLE: $\hat\pi_j = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}[z_i = j]$ • Weighted MLE: $\hat\pi_j = \frac{1}{n}\sum_{i=1}^{n} P(z_i = j \mid r_i)$ • EM for isoform expression: start with an initial guess of expression, then repeat steps (2) and (3) until convergence.
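A minimal sketch of the full EM loop for isoform expression, assuming reads have already been reduced to 0/1 consistency rows. `frac[j]` is the estimated fraction of reads from isoform j; the function name, tolerance and toy data are hypothetical.

```python
def isoform_em(compat, lengths, max_iter=1000, tol=1e-10):
    """EM for the fraction of reads from each isoform.

    compat[i][j] is 1 if read i is consistent with isoform j, else 0.
    """
    k = len(lengths)
    n = len(compat)
    frac = [1.0 / k] * k  # step 1: start from a uniform guess
    for _ in range(max_iter):
        # Step 2: posterior probability that each read came from each isoform,
        # accumulated over reads.
        totals = [0.0] * k
        for row in compat:
            scores = [frac[j] / lengths[j] * row[j] for j in range(k)]
            z = sum(scores)
            for j in range(k):
                totals[j] += scores[j] / z
        # Step 3: the weighted MLE is the average posterior over reads.
        new_frac = [t / n for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_frac, frac)) < tol
        frac = new_frac
        if converged:
            break
    return frac

# Hypothetical toy data: two isoforms of equal length; 3 reads fit only the first,
# 1 read fits only the second, and 4 reads fit both.
compat = [[1, 0]] * 3 + [[0, 1]] * 1 + [[1, 1]] * 4
frac = isoform_em(compat, lengths=[1000, 1000])
print([round(f, 4) for f in frac])  # → [0.75, 0.25]
```

On this toy input the four shared reads end up split 3:1, matching the closed-form answer: with equal lengths the likelihood is proportional to frac_A^3 · frac_B, which is maximized at frac_A = 3/4. If transcript (rather than read) fractions are wanted, divide each estimate by its isoform length and renormalize.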