Machine Learning – Expectation maximization
Wilson McKerrow (Fenyo lab postdoc)
Contact: Wilson.McKerrow@nyulangone.org

Maximum likelihood estimation (MLE)
• From a given family of distributions (e.g. normal distributions), we want to pick the distribution that best describes the data.
• How do we define an appropriate loss function?
• One answer: pick the distribution that maximizes the probability (likelihood) of the data.

Maximum likelihood estimation (MLE) • Let’s find the normal distribution that best fits this data (i.e. maximizes the likelihood).

Maximum likelihood estimation (MLE)
• The likelihood of one value $x$ is given by the normal density: $f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
• The likelihood of multiple independent values $x_1, \dots, x_n$ is the product of their individual likelihoods: $L(\mu, \sigma) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma)$

Maximum likelihood estimation (MLE)
• Goal: find $\mu$ and $\sigma$ that maximize the likelihood of the data.
• Strategy: take the log of the likelihood, differentiate, set the derivatives to zero and solve.

Maximum likelihood estimation (MLE) Math on board
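The board derivation is not part of the transcript. A standard version of it is sketched below, assuming data values $x_1, \dots, x_n$ and writing $L(\mu, \sigma)$ for the product of normal densities with mean $\mu$ and standard deviation $\sigma$ (this notation is an assumption, not taken from the slides):

```latex
\begin{align*}
\log L(\mu,\sigma)
  &= -n\log\sigma - \frac{n}{2}\log(2\pi)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
\frac{\partial \log L}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
  \;\Longrightarrow\; \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{\partial \log L}{\partial \sigma}
  &= -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0
  \;\Longrightarrow\; \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2
\end{align*}
```

Taking the log turns the product into a sum, which is what makes the derivatives easy to solve.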

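Maximum likelihood estimation (MLE)
• Solution, for data values $x_1, \dots, x_n$: $\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sample mean) and $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat\mu)^2$ (the variance with divisor $n$, not $n-1$).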

Maximum likelihood estimation (MLE) Example in R
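The R example shown in the talk is not included in the transcript. As a stand-in, here is a minimal Python sketch of the same idea. The function names `normal_mle` and `normal_log_likelihood` and the simulated data are illustrative assumptions, not the original example.

```python
import math
import random


def normal_mle(values):
    """Return the maximum likelihood estimates (mu, sigma) for a normal model."""
    n = len(values)
    mu = sum(values) / n
    # The MLE of the variance divides by n, not n - 1.
    variance = sum((x - mu) ** 2 for x in values) / n
    return mu, math.sqrt(variance)


def normal_log_likelihood(values, mu, sigma):
    """Sum of log normal densities, i.e. the log of the product of likelihoods."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in values
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data; the mean 68 and standard deviation 3 are made-up values.
    data = [rng.gauss(68.0, 3.0) for _ in range(1000)]
    mu_hat, sigma_hat = normal_mle(data)
    print(f"mu_hat={mu_hat:.2f}, sigma_hat={sigma_hat:.2f}")
    # Moving either estimate away from the MLE should not raise the log-likelihood.
    best = normal_log_likelihood(data, mu_hat, sigma_hat)
    print(best >= normal_log_likelihood(data, mu_hat + 0.1, sigma_hat))
    print(best >= normal_log_likelihood(data, mu_hat, sigma_hat * 1.05))
```

The estimates are just a mean and a mean of squared deviations, which is the property the EM section relies on later.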

Mixture models
Question: Are professors taller than their students?
Mean professor height is 68.7
Mean student height is 67.5

Mixture models
• Is this difference purely due to the fact that professors skew male?
• Ideal: we measure gender to account for its effect.
• Otherwise: gender is a hidden variable that we have to model.

Mixture models
• Mixture model: the data is made up of subpopulations, with a different distribution describing each subpopulation.
• n observations, k subpopulations.
• The observed data values, one per observation.
• The particular subpopulation that each data point belongs to (this is unknown).
• Probability density functions, one per subpopulation, that define the likelihood of a given data value for that subpopulation.
• A list of unknown parameters that we want to estimate.

Mixture models
• Professor/student height example. Let’s consider professor height first.
• The observed data are the heights of each professor.
• The hidden labels are the genders (male/female) of each professor.
• One unknown parameter is the fraction of professors who are male.
• One density is the distribution of heights for female professors.
• The other density is the distribution of heights for male professors.

Mixture models
• Professor/student height example.
• If we assume that heights are normally distributed, then the height distributions for female and for male professors are each a normal density with its own mean and standard deviation.

Expectation maximization
How can we estimate parameters for the subpopulation distributions if we don’t know which subpopulation a value belongs to? Use the following steps:
(1) Start with a guess for the parameters of the subpopulation distributions.
(2) Calculate the probability that each point is in each subpopulation.
(3) Estimate the subpopulation parameters using an average weighted by the probabilities calculated in step (2).
(4) Repeat steps (2) and (3) until convergence.

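Expectation maximization
The theory:
• Each EM iteration yields parameters under which the likelihood of the data is at least as high as under the previous parameters.
• EM might converge to a local maximum rather than the global one, so the result can depend on the initial guess.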

Expectation maximization
Professor/student height example
(2) Calculate the probability that each point is in each subpopulation. Use Bayes’ rule.
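The formula on this slide did not survive in the transcript. Under assumed notation ($x_i$ for the height of professor $i$, $z_i$ for that professor's gender, $\pi$ for the fraction of professors who are male, and $f_M$, $f_F$ for the male and female height densities), Bayes' rule for this step would read:

```latex
\[
P(z_i = \text{male} \mid x_i)
  = \frac{P(x_i \mid z_i = \text{male})\, P(z_i = \text{male})}{P(x_i)}
  = \frac{\pi\, f_M(x_i)}{\pi\, f_M(x_i) + (1-\pi)\, f_F(x_i)}
\]
```

The denominator is the mixture density of $x_i$, so the two posterior probabilities for each professor sum to one.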

Expectation maximization
Professor/student height example
(3) Estimate the subpopulation parameters using an average weighted by the probabilities calculated in step (2). The regular MLE for the male height distribution is replaced by a weighted MLE, where the weight for professor i is the probability that professor i is male. The parameter estimates are then updated.
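The update formulas themselves are missing from the transcript. A standard form for a two-component normal mixture is given below, assuming $w_i$ denotes the probability from step (2) that professor $i$ is male and $\pi$ the fraction of male professors (symbols chosen here for illustration):

```latex
\begin{align*}
\hat\mu_M &= \frac{\sum_{i} w_i\, x_i}{\sum_{i} w_i}, &
\hat\sigma_M^2 &= \frac{\sum_{i} w_i\,(x_i-\hat\mu_M)^2}{\sum_{i} w_i}, \\
\hat\mu_F &= \frac{\sum_{i} (1-w_i)\, x_i}{\sum_{i} (1-w_i)}, &
\hat\sigma_F^2 &= \frac{\sum_{i} (1-w_i)\,(x_i-\hat\mu_F)^2}{\sum_{i} (1-w_i)}, \\
\hat\pi &= \frac{1}{n}\sum_{i=1}^{n} w_i
\end{align*}
```

If every $w_i$ were exactly 0 or 1, these would reduce to the regular MLE computed separately on the male and female professors.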

Expectation maximization Example in R
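The R example from the talk is not included in the transcript. The following is a minimal Python sketch of EM for a mixture of two normal distributions, following steps (1) to (4). The function names, the starting values and the simulated heights are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def em_two_normals(values, p_male, mu_m, sd_m, mu_f, sd_f, n_iter=200, tol=1e-8):
    """EM for a two-component normal mixture.

    The arguments after `values` are the step (1) initial guesses.
    Returns the fitted parameters and a trace of log-likelihoods, where
    trace[i] is the log-likelihood of the parameters entering iteration i.
    """
    n = len(values)
    trace = []
    for _ in range(n_iter):
        # Step (2): Bayes' rule gives the probability that each value is "male".
        joint_m = [p_male * normal_pdf(x, mu_m, sd_m) for x in values]
        joint_f = [(1 - p_male) * normal_pdf(x, mu_f, sd_f) for x in values]
        w = [m / (m + f) for m, f in zip(joint_m, joint_f)]
        trace.append(sum(math.log(m + f) for m, f in zip(joint_m, joint_f)))
        # Step (4): stop once the log-likelihood has stopped improving.
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        # Step (3): weighted versions of the regular normal MLE.
        total_m = sum(w)
        total_f = n - total_m
        mu_m = sum(wi * x for wi, x in zip(w, values)) / total_m
        mu_f = sum((1 - wi) * x for wi, x in zip(w, values)) / total_f
        sd_m = math.sqrt(sum(wi * (x - mu_m) ** 2 for wi, x in zip(w, values)) / total_m)
        sd_f = math.sqrt(sum((1 - wi) * (x - mu_f) ** 2 for wi, x in zip(w, values)) / total_f)
        p_male = total_m / n
    return (p_male, mu_m, sd_m, mu_f, sd_f), trace


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated heights; the means, standard deviation and group sizes are made up.
    heights = [rng.gauss(70.0, 2.7) for _ in range(600)]
    heights += [rng.gauss(64.0, 2.7) for _ in range(400)]
    params, trace = em_two_normals(heights, 0.5, 69.0, 3.0, 66.0, 3.0)
    print("p_male=%.2f mu_m=%.1f sd_m=%.1f mu_f=%.1f sd_f=%.1f" % params)
    print("iterations:", len(trace))
```

The sketch does no safeguarding against a component collapsing onto a single point or against densities underflowing to zero, both of which a production implementation would need to handle. Running it from several different initial guesses is a simple way to check for local maxima.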

The exponential family
Step (3) requires that we can estimate the subpopulation parameters using some kind of mean. What other distributions have MLEs that meet this requirement? Those in the exponential family.

The exponential family Normal distribution:

The exponential family Math on board
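The board work is not part of the transcript, and the parametrization used in the talk is unknown. One common way to write the exponential family, with natural parameter $\eta$, sufficient statistic $T(x)$, log-normalizer $A(\eta)$ and base measure $h(x)$ (all notation assumed here), is sketched below together with the normal distribution as an instance:

```latex
\begin{align*}
f(x \mid \eta) &= h(x)\,\exp\bigl(\eta \cdot T(x) - A(\eta)\bigr) \\
\text{MLE condition:}\quad
\operatorname{E}_{\hat\eta}\bigl[T(X)\bigr] &= \frac{1}{n}\sum_{i=1}^{n} T(x_i) \\
\text{Normal:}\quad
T(x) &= (x,\, x^2), \quad
\eta = \Bigl(\frac{\mu}{\sigma^2},\, -\frac{1}{2\sigma^2}\Bigr), \quad
A = \frac{\mu^2}{2\sigma^2} + \log\sigma, \quad
h(x) = \frac{1}{\sqrt{2\pi}}
\end{align*}
```

The MLE condition follows from setting the gradient of the log-likelihood with respect to $\eta$ to zero, since the gradient of $A(\eta)$ is the expected value of $T(X)$. Because the MLE only depends on the data through an average of $T(x_i)$, the weighted version needed in step (3) replaces that average with $\sum_i w_i T(x_i) / \sum_i w_i$.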

The exponential family Other distributions in the exponential family: • Exponential • Gamma • Chi-squared • Beta • Dirichlet • Bernoulli • Categorical • Poisson • Geometric • And more…

Application of EM: Isoform expression
Which isoforms of gene X are expressed and at what level?
Data: Illumina RNA-seq
Isoforms of gene X:
GeneX.1: Exon 1, Exon 2, Exon 3, Exon 4
GeneX.2: Exon 1, Exon 3
GeneX.3: Exon 1, Exon 2, Exon 4
GeneX.4: Exon 1, Exon 4

Mixture models
• Describe this problem as a mixture model.
• n observations (reads), k subpopulations (isoforms).
• The observed data are genomic alignments that tell us which isoforms a read might belong to.
• The hidden labels are the particular isoforms that each read is derived from.
• The unknown parameters are the fractions of transcripts from each isoform.

Mixture models
Step (2): calculate the posterior probability of each read’s isoform of origin. It combines three quantities:
• The fraction of reads derived from isoform j.
• The length of isoform j. (Longer isoforms have more genomic loci where a read can begin.)
• An indicator that is 1 if read i is consistent with isoform j, and 0 if it is not.
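The posterior formula itself is missing from the transcript. A plausible reconstruction is given below, under assumed notation: $\theta_j$ is the fraction of reads from isoform $j$, $\ell_j$ its length, $c_{ij}$ the consistency indicator, and $z_i$ the isoform that read $i$ came from. It further assumes that a read is equally likely to start at any position of its isoform, so the length enters as $1/\ell_j$:

```latex
\[
P(z_i = j \mid \text{alignment of read } i)
  = \frac{\theta_j\, c_{ij} / \ell_j}{\sum_{m=1}^{k} \theta_m\, c_{im} / \ell_m}
\]
```

Isoforms that are inconsistent with the read get probability zero, and among the consistent ones a shorter isoform is favored when the read fractions are equal.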

Expectation maximization
• (3) Estimate the parameters using an average weighted by the probabilities calculated in step (2). The regular MLE is replaced by a weighted MLE.
• EM for isoform expression: start with an initial guess of expression, then repeat steps (2) and (3) until convergence.
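The loop described above can be sketched in Python as follows. This is an illustrative sketch, not the implementation from the talk: the function name `isoform_em`, the 0/1 compatibility matrix `compat`, and the assumption that the likelihood of a read given an isoform is 1/length are all choices made here. It also assumes every read is consistent with at least one isoform.

```python
def isoform_em(compat, lengths, n_iter=1000, tol=1e-10):
    """Estimate the fraction of reads derived from each isoform.

    compat[i][j] is 1 if read i is consistent with isoform j, else 0.
    lengths[j] is the length of isoform j.
    """
    n, k = len(compat), len(lengths)
    theta = [1.0 / k] * k  # Initial guess: equal expression for every isoform.
    for _ in range(n_iter):
        expected_counts = [0.0] * k
        for row in compat:
            # Step (2): posterior probability of each isoform for this read.
            scores = [theta[j] * row[j] / lengths[j] for j in range(k)]
            total = sum(scores)
            for j in range(k):
                expected_counts[j] += scores[j] / total
        # Step (3): weighted MLE, i.e. expected read counts divided by n.
        new_theta = [count / n for count in expected_counts]
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            break
    return theta


if __name__ == "__main__":
    # Made-up toy data: 6 reads unique to isoform 0, 2 unique to isoform 1,
    # and 4 reads consistent with both. Lengths are also made up.
    reads = [[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 4
    print(isoform_em(reads, lengths=[2000, 1000]))
```

The result is a fraction of reads per isoform. Converting it to a relative transcript abundance would typically involve dividing each fraction by the isoform length and renormalizing, since longer transcripts produce more reads.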
