Last lecture summary

Last lecture summary

Probability • long-term chance that a certain outcome will occur from some random (stochastic) process • how to get probabilities? • classical approach (combinatorics) • relative frequencies • simulations • sample space S (finite, countably infinite, uncountably infinite), event A, B, C, …

Types of probabilities • marginal (marginální, nepodmíněná) • S={1,2,3,4,5,6}, A={1,3,5}, P(A)=3/6=0.5, P(1)=1/6 • union (pravděpodobnostsjednocení) • prob. of A orB • joint (intersection), (průniku) • prob. of A andB, happens at the same time • conditional (podmíněná) • P(A|B), “probability of A given B” • of complement (doplňku)

product rule sum rule P(X,Y) – joint probability, “probability of X and at the same time Y” P(Y|X) – conditional probability, “probability of Y given X” P(X) – marginal probability, “probability of X” • Because P(X,Y) = P(Y,X), we immediately obtain Bayes’ theorem

Bayes theorem interpretation • If we had been asked which box had been chosen before being told the fruit identity, the most complete information we have available is provided by the P(B). • This probability is called prior probability because it is the probability before observing the fruit. • Once we’re told that the fruit is an orange, we used Bayes to compute P(B|F). This probability is called posterior because it is the probability after we have observed F. • Just based on the prior we would say we have chosen blue box (P(blue)=6/10), however based on the knowledge of fruit identity we actually answer red box (P(red|orange)=2/3). This also agrees with the intuition.

New stuff

Probability distribution

Now we’ll move away from individual probability scenarios, look at the situations in which probabilities follow certain predictable pattern and can be described by the model. • probability model – formally, it gives you formulas to calculate probabilities, determine average outcomes, and figure the amount of variability in data • e.g. probability model helps you to determine the average number of times you need to play to win a lottery game • fundamental parts of probability model: • random variable • probability distribution

Random variable • Not all outcomes of the experiment can be assigned numerical values (e.g. coin toss, possible events are “heads” or “tails”). • But we want to represent outcomes as numbers. • A random variable (usually denoted as X, Y, Z) is a rule (i.e. function) that assigns a number to each outcome of an experiment. • It is called random, because its value will vary from trial to trial as the experiment is repeated.

Two types of random variable (rv): • discrete – may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... • example: number of children in a family • continuous(spojité)– takes an infinite number of possible values • example: height, weight, time required to run a kilometer

Probability distribution • Once we know something about random variable X, we need a way how to assign a probability P(X) to the event – X occurs. • For discrete rv this function is called probability mass function (pmf)(pravděpodobnostní funkce). • For continuous rv it is called probability density function (pdf)(hustota pravděpodobnosti, frekvenční funkce).

pmf • rolling two dice, take the sum of the outcomes probability that X = 2? 1/36 (only {1,1}) X = 3 1/36+1/36 ({1,2} and {2,1}) from Probability for Dummies, D. Rumsey

pmf – it is called mass function, because it shows how much probability (or mass) is given to each value of rv • pdf – continuous rv does not assign probability (mass), it assigns density, i.e. it tells you how dense the probability is around X for any value of X • you find probabilities for intervals of X, not particular value of X • continuous rv’s have no probability at a single point

probability distribution(pravděpodobnostní rozdělení, rozložení, distribuce)– listing of all possible values of X along with their probabilities • P(x) is between 0 and 1 • prob X takes value a or b: P(a)+P(b) • probs must add up to one These probabilities are assigned by pmf. from Probability for Dummies, D. Rumsey

probability distribution of discrete rv (i.e. its pdf) can be pictured using relative frequency histogram • the shape of histogram is important from Probability for Dummies, D. Rumsey

pdf of continuous rv

Calculating probabilities from probability distribution • two dice-rolling example • calculate the probability of these events: sum is • at least 7 • i.e. P(7≤X≤12)=P(7)+P(8)+…=6/36+5/36+… • less than 7 • at most 10 • more than 10

That’s a lot of additions! And again and again for “less than, at most, …” • There is a better way – use cummulative distribution function (cdf)(distribuční funkce) • it represents the probability that X is less or equal to the given value a • and is equal to the sum of all the probabilities for X that are less or equal to a

cdf for sum of two dice example less than 7: P(6)+P(5)+…=42%

cdf is defined on all values from -Inf to +Inf F(6) … less than 7 from Probability for Dummies, D. Rumsey

for continuous rv, cumulative distribution function (cdf) is the integral of its probability density function (pdf)

from pmf you can figure out the long-term average outcome of a random variable – expected value • and the amount of variability you need to expect from one set set of result to another – variance(rozptyl)

Expected value • long-term (infinite number of times) average value • mathematically – weighted average of all possible values of X, weighted by how often you expect each value to occur • E(X) is mean of X • multiply value of X by its probability • repeat step for all values of X • sum the results • rolling two dice example: 2*1/36 + 3*2/36 + … = 7 from Probability for Dummies, D. Rumsey

the average sum of two dice is the middle value between 2 and 12 • however, this is not a case in every problem, this happened because of the symmetric nature of pmf • E(X) = 0 * 0.30 + 1 * 0.35 + … = 1.42 • E(X) does not have to be equal to • a possible value of X from Probability for Dummies, D. Rumsey

Variance of X • variance – expected amount of variability after repeating experiment infinite number of times • weighted average squared distance from E(X) • Subtract E(X) (i.e. μ) from the value of X • square the difference • multiply by the P(x) • repeat 1-3 for each value of X • sum the results

Standard deviation • (směrodatná odchylka) • variance of X is in squared units of X !! • take the root, and you have standard deviation

Discrete uniform distribution • Each probability model has its own name, its own formulas for pmf and cdf, its own formulas for expected value and variance • The most basic is discrete uniform distribution U(a,b). • values of X: integers • from a to b (inclusive) • each value of X has an • equal probability from Probability for Dummies, D. Rumsey

pmf • cdf • expected value • variance

Statistics

Jargon • population • specific group of individuals to be studied (e.g. all Czech), … • data collected from the whole population - census • sample(výběr) • Studying whole population may be impractical (e.g. whole mankind), so you select smaller number of individuls from the population. • representative sample – if you send out a survey (about time spent on the internet by teenagers) by e-mail to “all teenagers”, you’re actually excluding teenagers that don’t have internet access at all

random sample(náhodný výběr, vzorek) • good thing, it gives every member of population same chance to be chosen • bias (zaujatost, upřednostnění, chyba) • systematic favoritism present in data collection process • occurs, because • in the way the sample is selected • in the way data are collected

data • actual measurements • categorical (gender, political party, etc …), numerical (measurable values) • data set • collection of all the data taken from the sample • statistic(výběrová statistika) • a number that summarizes the data collected from a sample • if census is collected, this number is called parameter(populační parametr), not a statistic

Means, medians and more • statistic summarizes some characteristic od data • why to summarize? • clear, consise number(s) that can be easily reported and understood even by people less intelligent than you (e.g. your boss or teacher) • they help researchers to make a sense of data (they help formulate or test claims made about the population, estimate characteristics of the population, etc.)

Summarizing categorical data • reporting percentage of individuals falling into each category • e.g. survey of 2 000 teenagers included 1 200 females and 800 males, thus 60% females and 40% males • crosstabs (two-way tables) • tables with rows and columns  • they summarize the information from two categorical variables at once (e.g. gender and political party – what is the percentage of ODS females, etc.)

Summarizing numerical data • the most common way of summarizing • where the center is (i.e. what’s a typical value) • how spread the data are • where certain milestones are

Getting centered • average, mean • sum all the numbers in data set • divide by the size of data set n • this is the sample mean, and the population mean is μ • however, average is a perfidious bitch • a few large/small values (outliers) (odlehlé body)greatly influence the average • e.g. avg(x) of 2,3,2,2,500 is cca100 • or averge salary at VŠCHT is 38 000,- (he, he, he)

in these cases more appropriateis a median • orderthe data set fromsmallest to largest • medianisthevalueexactly in themiddle(e.g. 2,3,2,2,500 → 2,2,2,3,500 → median is 2) • if the numer of data points is even, take the average of two values in the middle (e.g. 2,3,2,500 → 2,2,3,500 → median is (2+3)/2 = 2.5) • Statistic which is not influenced by outliers is called robust statistic.

skewed (zešikmené) to the right 50% of data lie below the median, 50% above symmetric data: median = mean skewed to the right: mean > median skewed to the left: mean < median skewed to the left symmetric from Statistics for Dummies, D. Rumsey

Last lecture summary