SNLP Chapter 2: Mathematical Foundations
Artificial Intelligence Laboratory (인공지능연구실), 정성원
Contents – Part 1
1. Elementary Probability Theory
• Conditional probability
• Bayes' theorem
• Random variables
• Joint and conditional distributions
Probability spaces
• Probability theory deals with predicting how likely it is that something will happen.
• The collection of basic outcomes (or sample points) for our experiment is called the sample space Ω.
• An event is a subset of the sample space; the collection of events forms a σ-field F.
• Probabilities are numbers between 0 and 1, where 0 indicates impossibility and 1 certainty.
• A probability function (or distribution) distributes a probability mass of 1 throughout the sample space.
• A well-founded probability space consists of a sample space Ω, a σ-field of events F, and a probability function P.
Conditional probability (1/2)
• P(A): the probability of the event A
• Conditional probability: P(A|B) = P(A ∩ B) / P(B)
• Ex1> A coin is tossed 3 times.
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
A = {HHT, HTH, THH}: exactly 2 heads, P(A) = 3/8
B = {HHH, HHT, HTH, HTT}: first toss is a head, P(B) = 1/2
P(A|B) = P(A ∩ B) / P(B) = (2/8) / (1/2) = 1/2
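To make the definition concrete, here is a minimal Python sketch of Ex1 that enumerates the sample space and computes P(A|B) directly from the definition (the names omega, A, B, p are just illustrative):

```python
from itertools import product

# Enumerate the sample space of three coin tosses: 8 equally likely outcomes.
omega = [''.join(t) for t in product('HT', repeat=3)]

A = {w for w in omega if w.count('H') == 2}   # exactly two heads
B = {w for w in omega if w[0] == 'H'}         # first toss is a head

p = lambda event: len(event) / len(omega)     # uniform probability measure

# P(A|B) = P(A ∩ B) / P(B)
print(p(A), p(B), p(A & B) / p(B))            # 0.375 0.5 0.5
```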
Conditional probability (2/2)
• Multiplication rule: P(A ∩ B) = P(B) P(A|B) = P(A) P(B|A)
• Chain rule: P(A1 ∩ … ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) … P(An | A1 ∩ … ∩ An−1)
• Two events A, B are independent if P(A ∩ B) = P(A) P(B)
• A and B are conditionally independent given C if P(A ∩ B | C) = P(A|C) P(B|C)
Bayes' theorem (2/2)
• Bayes' theorem: P(G|T) = P(T|G) P(G) / P(T), where P(T) = P(T|G) P(G) + P(T|¬G) P(¬G)
• Ex2> G: the event of the sentence having a parasitic gap
T: the event of the test being positive
• This poor result comes about because the prior probability of a sentence containing a parasitic gap is so low.
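A short sketch of how Bayes' theorem produces this "poor result". The prior, sensitivity, and false-positive rate below are illustrative assumptions in the spirit of the example, not values taken from the slide:

```python
# Illustrative (assumed) numbers: a rare phenomenon and a fairly accurate test.
p_G = 0.00001           # prior: P(G), sentence contains a parasitic gap
p_T_given_G = 0.95      # sensitivity: P(T | G)
p_T_given_notG = 0.005  # false positive rate: P(T | ~G)

# Bayes' theorem: P(G|T) = P(T|G) P(G) / [P(T|G) P(G) + P(T|~G) P(~G)]
p_T = p_T_given_G * p_G + p_T_given_notG * (1 - p_G)
p_G_given_T = p_T_given_G * p_G / p_T
print(round(p_G_given_T, 4))   # ~0.0019: still very unlikely, because the prior is so low
```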
Random variable
• Ex3> Random variable X for the sum of two dice, with values S = {2, …, 12}
• Probability mass function (pmf): p(x) = P(X = x), written X ~ p(x)
• Expectation: E(X) = Σ_x x p(x)
• Variance: Var(X) = E((X − E(X))²) = E(X²) − (E(X))²
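A small sketch for Ex3, building the pmf of the sum of two fair dice and computing its expectation and variance from the definitions above:

```python
from itertools import product
from collections import Counter

# pmf of X = sum of two fair dice: p(x) = P(X = x) for x in {2, ..., 12}
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
pmf = {x: c / 36 for x, c in counts.items()}

E = sum(x * p for x, p in pmf.items())               # expectation E[X]
Var = sum((x - E) ** 2 * p for x, p in pmf.items())  # variance E[(X - E[X])^2]
print(E, round(Var, 4))                              # 7.0 5.8333
```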
Joint and conditional distributions
• The joint pmf for two discrete random variables X, Y: p(x, y) = P(X = x, Y = y)
• Marginal pmfs, which total up the probability mass for the values of each variable separately:
p_X(x) = Σ_y p(x, y), p_Y(y) = Σ_x p(x, y)
• Conditional pmf: p_{X|Y}(x|y) = p(x, y) / p_Y(y), for y such that p_Y(y) > 0
• Chain rule: p(x, y) = p_X(x) p_{Y|X}(y|x)
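A minimal sketch of marginalization, conditioning, and the chain rule on a small assumed joint pmf (the table values are made up for illustration):

```python
# A small assumed joint pmf p(x, y) over X in {0, 1}, Y in {'a', 'b'}.
joint = {(0, 'a'): 0.1, (0, 'b'): 0.3, (1, 'a'): 0.2, (1, 'b'): 0.4}

# Marginal pmfs: total up the probability mass for each variable separately.
p_X = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in {0, 1}}
p_Y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in {'a', 'b'}}

# Conditional pmf p(x|y) = p(x, y) / p_Y(y), defined for y with p_Y(y) > 0.
p_X_given_Y = {(x, y): joint[(x, y)] / p_Y[y] for (x, y) in joint if p_Y[y] > 0}

# Chain rule check: p(x, y) = p_Y(y) * p(x|y)
assert abs(joint[(1, 'a')] - p_Y['a'] * p_X_given_Y[(1, 'a')]) < 1e-12
print(p_X, p_Y)   # {0: 0.4, 1: 0.6} {'a': 0.3, 'b': 0.7}
```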
Contents – Part 2
2. Essential Information Theory
• Entropy
• Joint entropy and conditional entropy
• Mutual information
• The noisy channel model
• Relative entropy or Kullback-Leibler divergence
Shannon's Information Theory
• Concerned with maximizing the amount of information that one can transmit over an imperfect communication channel such as a noisy phone line.
• Theoretical maximum for data compression: the entropy H
• Theoretical maximum for the transmission rate: the channel capacity
Entropy (1/4)
• The entropy H (or self-information) is the average uncertainty of a single random variable X:
H(X) = −Σ_x p(x) log2 p(x), where p(x) is the pmf of X
• Entropy is a measure of uncertainty: the more we know about something, the lower the entropy will be.
• We can use entropy as a measure of the quality of our models.
• Entropy measures the amount of information in a random variable (measured in bits).
Entropy (2/4)
• [Figure] The entropy of a weighted coin: the horizontal axis shows the probability that the coin comes up heads; the vertical axis shows the entropy of tossing the corresponding coin once.
Entropy (3/4)
• Ex7> The result of rolling a fair 8-sided die (uniform distribution):
H(X) = −Σ_{i=1}^{8} (1/8) log2(1/8) = log2 8 = 3 bits
• Entropy: the average length of the message needed to transmit an outcome of that variable.
• In terms of the expectation E: H(X) = E(log2(1/p(X)))
Entropy (4/4)
• Ex8> Simplified Polynesian
• We can design a code that on average takes 2.5 bits to transmit a letter.
• Entropy can be interpreted as a measure of the size of the 'search space' consisting of the possible values of a random variable.
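A small sketch of the entropy formula applied to Ex7 and Ex8. The Simplified Polynesian letter probabilities below are the standard textbook values and are assumed here rather than read off the slide:

```python
import math

def H(pmf):
    """Entropy in bits: H(X) = -sum_x p(x) log2 p(x), with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Ex7: a fair 8-sided die has entropy log2(8) = 3 bits.
print(H([1/8] * 8))                               # 3.0

# Ex8: assumed Simplified Polynesian letter distribution:
# p, t, k, a, i, u with probabilities 1/8, 1/4, 1/8, 1/4, 1/8, 1/8.
print(H([1/8, 1/4, 1/8, 1/4, 1/8, 1/8]))          # 2.5 bits per letter
```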
Joint entropy and conditional entropy (1/3)
• The joint entropy of a pair of discrete random variables X, Y ~ p(x, y):
H(X, Y) = −Σ_x Σ_y p(x, y) log2 p(x, y)
• The conditional entropy:
H(Y|X) = Σ_x p(x) H(Y|X = x) = −Σ_x Σ_y p(x, y) log2 p(y|x)
• The chain rule for entropy: H(X, Y) = H(X) + H(Y|X)
Joint entropy and conditional entropy (2/3)
• Ex9> Simplified Polynesian revisited
• All words consist of sequences of CV (consonant-vowel) syllables.
• [Table] Joint distribution P(C, V) over consonants {p, t, k} and vowels {a, i, u}, with marginal probabilities on a per-syllable basis and the corresponding per-letter probabilities.
Joint entropy and conditional entropy (3/3)
• [Table] The joint distribution P(C, V) again, used to compute the joint and conditional entropies (see the sketch below).
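A sketch of the joint and conditional entropy computation, assuming the standard textbook joint distribution P(C, V) for Simplified Polynesian (the numbers below are that assumed table, not recovered from the slide):

```python
import math
from collections import defaultdict

# Assumed joint distribution P(C, V): columns are consonants p, t, k; rows vowels a, i, u.
joint = {
    ('p', 'a'): 1/16, ('t', 'a'): 3/8,  ('k', 'a'): 1/16,
    ('p', 'i'): 1/16, ('t', 'i'): 3/16, ('k', 'i'): 0,
    ('p', 'u'): 0,    ('t', 'u'): 3/16, ('k', 'u'): 1/16,
}

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

p_C, p_V = defaultdict(float), defaultdict(float)
for (c, v), p in joint.items():
    p_C[c] += p
    p_V[v] += p

H_CV = H(joint.values())                     # joint entropy H(C, V)
H_C, H_V = H(p_C.values()), H(p_V.values())
H_V_given_C = H_CV - H_C                     # chain rule: H(C, V) = H(C) + H(V|C)
print(round(H_C, 3), round(H_V, 3), round(H_V_given_C, 3))   # ~1.061 1.5 1.375
```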
Mutual information (1/2)
• By the chain rule for entropy: H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
• Therefore H(X) − H(X|Y) = H(Y) − H(Y|X) = I(X; Y), the mutual information between X and Y
• The amount of information one random variable contains about another (symmetric, non-negative).
• It is 0 only when the two variables are independent.
• It grows not only with the degree of dependence, but also with the entropy of the variables.
• For this reason it is actually better to think of it as a measure of independence: I(X; Y) = 0 cleanly characterizes independence, while its magnitude is confounded by entropy.
Mutual information (2/2)
• Since H(X) = H(X) − H(X|X) = I(X; X), entropy is also called self-information.
• Conditional MI: I(X; Y | Z) = H(X|Z) − H(X|Y, Z)
• Chain rule: I(X1…Xn; Y) = Σ_i I(Xi; Y | X1, …, X_{i−1})
• Pointwise MI between two particular outcomes x and y: I(x, y) = log2 [ p(x, y) / (p(x) p(y)) ]
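A minimal sketch of mutual information and pointwise MI computed from a joint pmf; the joint tables used below are assumed examples:

```python
import math
from collections import defaultdict

def marginals(joint):
    p_X, p_Y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_X[x] += p
        p_Y[y] += p
    return p_X, p_Y

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]  (in bits)."""
    p_X, p_Y = marginals(joint)
    return sum(p * math.log2(p / (p_X[x] * p_Y[y]))
               for (x, y), p in joint.items() if p > 0)

def pmi(joint, x, y):
    """Pointwise MI of a single pair: log2 [ p(x,y) / (p(x) p(y)) ]."""
    p_X, p_Y = marginals(joint)
    return math.log2(joint[(x, y)] / (p_X[x] * p_Y[y]))

# Independence gives I(X;Y) = 0, since p(x,y) = p(x)p(y) for every pair.
independent = {(x, y): px * py for x, px in [(0, 0.5), (1, 0.5)]
                               for y, py in [(0, 0.25), (1, 0.75)]}
print(round(mutual_information(independent), 12))            # 0.0

# A dependent joint has positive MI; pointwise MI can be read off per pair.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(round(mutual_information(dependent), 4), round(pmi(dependent, 0, 0), 4))  # 0.2781 0.6781
```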
Noisy channel model (1/2)
• Channel capacity: the rate at which one can transmit information through the channel optimally:
C = max_{p(X)} I(X; Y)
• Binary symmetric channel with crossover probability p: C = 1 − H(p)
• Since entropy is non-negative, C ≤ 1.
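A short sketch of the binary symmetric channel capacity C = 1 − H(p):

```python
import math

def H2(p):
    """Binary entropy: H(p) = -p log2 p - (1-p) log2 (1-p)."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1 - H2(p)

print(bsc_capacity(0.0), bsc_capacity(0.5), round(bsc_capacity(0.1), 3))
# 1.0 (noiseless)   0.0 (useless channel)   0.531
```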
Relative entropy or Kullback-Leibler divergence
• Relative entropy for two pmfs p(x), q(x):
D(p || q) = Σ_x p(x) log2 [ p(x) / q(x) ]
• A measure of how close two pmfs are (not symmetric, so not a true distance).
• Non-negative, and D(p || q) = 0 iff p = q.
• Conditional relative entropy: D(p(y|x) || q(y|x)) = Σ_x p(x) Σ_y p(y|x) log2 [ p(y|x) / q(y|x) ]
• Chain rule: D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x))
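A minimal sketch of relative entropy on a pair of assumed pmfs, illustrating D(p||p) = 0 and asymmetry:

```python
import math

def kl(p, q):
    """D(p || q) = sum_x p(x) log2 [ p(x) / q(x) ], with the convention 0 log 0 = 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]          # an assumed pair of pmfs, just for illustration
print(round(kl(p, p), 6))                        # 0.0 (D(p||q) = 0 iff p = q)
print(round(kl(p, q), 4), round(kl(q, p), 4))    # ~0.085 vs ~0.0817: not symmetric
```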
The relation to language: Cross entropy
• Use entropy as a measure of the quality of our models of language.
• Cross entropy of a random variable X with true pmf p(x) and model m:
H(X, m) = H(X) + D(p || m) = −Σ_x p(x) log2 m(x)
• Pointwise entropy of a single outcome under the model: −log2 m(x)
• Since H(X) is fixed by the true distribution, minimizing the cross entropy is equivalent to minimizing D(p || m).
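A small sketch verifying the decomposition H(X, m) = H(X) + D(p || m) on assumed distributions p and m:

```python
import math

def H(p):
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, m):
    """H(p, m) = -sum_x p(x) log2 m(x); requires m(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log2(mx) for px, mx in zip(p, m) if px > 0)

def kl(p, m):
    return sum(px * math.log2(px / mx) for px, mx in zip(p, m) if px > 0)

p = [0.5, 0.25, 0.25]   # "true" distribution (assumed for illustration)
m = [0.4, 0.4, 0.2]     # a model of p

# Decomposition: H(p, m) = H(p) + D(p || m), so minimizing cross entropy
# over models m is the same as minimizing D(p || m).
print(round(cross_entropy(p, m), 6), round(H(p) + kl(p, m), 6))   # both ~1.571928
```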