Statistical model for count data • Speaker: Tzu-Chun Lo • Advisor: Yao-Ting Huang
Outline • Why use a statistical model • Target • Gene expression • Binomial distribution • Poisson distribution • Overdispersion • Negative binomial • Chi-square approximation • Conclusion
Statistical model • A statistical model is a probability distribution constructed to enable inferences to be drawn or decisions made from data. • We collect information (height, weight, etc.) from a sample, choose a statistical model for the sample, and estimate its parameters (mean, variance). • From these estimates we draw inferences about the population and make decisions by hypothesis testing. • (Figure: population, sample, and inference, illustrated by a designer choosing sizes for consumers.)
Target • Gene expression • We would like to use a statistical model to test whether an observed difference in read counts is significant. • (Figure: one region looks significant; for another, can we be sure? Is it noise or not?)
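One way to make "is this difference significant?" concrete, sketched here under assumptions that are not in the slides and not necessarily the presenter's method: two samples with equal sequencing depth have read counts x1 and x2 in the same region, the counts are Poisson, and H0 says both samples share the same rate. Conditional on the total n = x1 + x2, x1 then follows Binomial(n, 0.5) under H0, so an exact binomial test applies. The function names `binom_cdf` and `two_count_test` are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def two_count_test(x1: int, x2: int) -> float:
    """Two-sided p-value for H0: both samples share the same Poisson rate.

    Assumes equal sequencing depth, so under H0 x1 | (x1 + x2) ~ Binomial(n, 0.5).
    The two-sided p-value doubles the smaller tail and is capped at 1.
    """
    n = x1 + x2
    lower = binom_cdf(x1, n, 0.5)          # P(X <= x1)
    upper = 1 - binom_cdf(x1 - 1, n, 0.5)  # P(X >= x1)
    return min(1.0, 2 * min(lower, upper))


print(two_count_test(30, 10))  # small p-value: unlikely to be noise alone
print(two_count_test(12, 10))  # large p-value: consistent with noise
```

With unequal library sizes the 0.5 would have to be replaced by the ratio of the depths, which this sketch does not handle.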
Count data • A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking. • An individual piece of count data is often termed a count variable. • The binomial, Poisson, and negative binomial distributions are all models for this type of data.
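Binomial distribution • The number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. • Notation: $X \sim B(n, p)$, with $f(k) = \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \ldots, n$.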
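Binomial distribution • Example: p = 0.8, 1 − p = 0.2, n = 3 trials, k = 2 successes. The sequences with two successes are (1 1 0), (1 0 1), (0 1 1), so f(2) = 3 × 0.8² × 0.2 = 0.384. • A player scored 33 goals from 110 shots this season, so the success probability is p = 33/110 = 0.3 and the failure probability is 0.7. What is the probability that he scores 6 goals in 10 shots?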
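The counting argument in the 3-trial example can be checked by brute force: enumerate all 2³ success/failure sequences, add up the probabilities of those with exactly two successes, and compare the result with the closed-form binomial pmf. A small sketch; the helper names `pmf_by_enumeration` and `pmf_formula` are illustrative.

```python
from itertools import product
from math import comb, isclose


def pmf_by_enumeration(k: int, n: int, p: float) -> float:
    """Sum the probabilities of every 0/1 sequence of length n containing k ones."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if sum(seq) == k:
            total += p**k * (1 - p) ** (n - k)  # every such sequence has this probability
    return total


def pmf_formula(k: int, n: int, p: float) -> float:
    """Binomial pmf: C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Prints the three sequences with two successes: (0, 1, 1), (1, 0, 1), (1, 1, 0)
print([s for s in product((0, 1), repeat=3) if sum(s) == 2])
print(round(pmf_formula(2, 3, 0.8), 3))  # 0.384
assert isclose(pmf_by_enumeration(2, 3, 0.8), pmf_formula(2, 3, 0.8))
```

The binomial coefficient C(3, 2) = 3 is exactly the number of sequences the enumeration finds.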
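Binomial distribution • Exactly six goals: $P(X = 6)$ with $X \sim B(10, 0.3)$ • At most three goals: $P(X \le 3) = \sum_{k=0}^{3} f(k)$ • (Figure: probabilities for 0 to 10 goals, with 6 goals marked.)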
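The two questions on this slide can be computed directly from the binomial pmf with n = 10 shots and p = 33/110 = 0.3. A minimal sketch using only the standard library; `binom_pmf` is an illustrative helper name.

```python
from math import comb

n, p = 10, 33 / 110  # 10 shots; success rate estimated from 33 goals in 110 shots (0.3)


def binom_pmf(k: int) -> float:
    """P(X = k) for X ~ Binomial(n, p) with the n and p above."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


exactly_six = binom_pmf(6)
at_most_three = sum(binom_pmf(k) for k in range(4))  # P(0) + P(1) + P(2) + P(3)

print(f"P(X = 6)  = {exactly_six:.4f}")    # 0.0368
print(f"P(X <= 3) = {at_most_three:.4f}")  # 0.6496
```

Scoring 6 of 10 is therefore unusual for a 30% shooter, while 3 or fewer goals happens about two times in three.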
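Poisson distribution • Expresses the probability of a given number of events occurring in a fixed interval, when events occur independently at a constant average rate λ. • Notation: $X \sim \mathrm{Pois}(\lambda)$, with $f(k) = \Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \ldots$, where e = 2.718281828…
Poisson distribution • Suppose the interval is one game, so we count goals per game.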
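Poisson • Total: 11 games • Score: 33 goals • λ = 33/11 = 3 goals per game • (Figure: Poisson probabilities of goals per game with λ = 3, next to the raw per-game data.) • In this case, a test based on the Poisson model could be inaccurate if the raw data vary more than the model allows.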
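The Poisson side of this comparison can be sketched as below: with λ = 33/11 = 3, compute f(k) for a range of scores and the expected number of the 11 games with each score. The raw per-game data are not reproduced in this transcript, so the sketch only produces the model's side of the comparison; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

lam = 33 / 11  # 3 goals per game: the Poisson rate estimated from 33 goals in 11 games


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam^k e^(-lam) / k! for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


for k in range(7):
    prob = poisson_pmf(k, lam)
    # expected number of the 11 games with exactly k goals, to set beside the raw counts
    print(k, round(prob, 4), round(11 * prob, 2))
```

If the observed table has many more 0-goal and high-scoring games than these expected counts, the data are more spread out than a Poisson with λ = 3 predicts.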
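Overdispersion • The presence of greater variability (statistical dispersion) in a data set than would be expected based on a given simple statistical model. • For a Poisson model the variance equals the mean, so overdispersed counts have a variance larger than their mean.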
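A quick diagnostic that follows from the variance-equals-mean property is the index of dispersion, the sample variance divided by the sample mean. A minimal sketch, assuming the counts come as a plain list of integers; `dispersion_index` is a hypothetical helper, and no formal cut-off is implied.

```python
from statistics import mean, variance


def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by sample mean.

    Roughly 1 for Poisson-like counts; clearly above 1 suggests overdispersion.
    Requires at least two counts (statistics.variance needs two data points).
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when every count is 0")
    return variance(counts) / m
```

Applied to the per-game goal counts, a value well above 1 would explain why a Poisson-based test could be inaccurate there.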
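Negative binomial • Gamma-Poisson (mixture) distribution: the Poisson rate itself varies according to a gamma distribution. • With mean μ and shape r, its variance is μ + μ²/r, which exceeds the mean, so it can model overdispersed counts.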
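The gamma-Poisson mixture can be written as a negative binomial pmf. If the rate λ follows Gamma(shape r, scale μ/r) and X given λ is Poisson(λ), then P(X = k) = Γ(k + r) / (k! Γ(r)) · (r/(r + μ))^r · (μ/(r + μ))^k. The sketch below uses this (μ, r) parameterization, which is our choice for illustration, and checks numerically that the mean is μ and the variance is μ + μ²/r; `nb_pmf` is a hypothetical name.

```python
from math import exp, lgamma, log


def nb_pmf(k: int, mu: float, r: float) -> float:
    """Negative binomial P(X = k) with mean mu > 0 and shape (size) r > 0.

    Equivalent to a Poisson whose rate is drawn from Gamma(shape=r, scale=mu/r).
    Computed on the log scale to avoid overflow in the gamma functions.
    """
    log_p = (
        lgamma(k + r) - lgamma(k + 1) - lgamma(r)
        + r * log(r / (r + mu))
        + k * log(mu / (r + mu))
    )
    return exp(log_p)


mu, r = 3.0, 2.0
ks = range(200)  # truncate the infinite sum; the tail beyond k = 200 is negligible here
probs = [nb_pmf(k, mu, r) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))

print(round(mean, 6), round(var, 6))  # mean = mu = 3.0; var = mu + mu**2 / r = 7.5
```

With the same mean of 3 as the goals example, the negative binomial allows a variance of 7.5 instead of the Poisson's 3; as r grows the extra term μ²/r vanishes and the Poisson is recovered.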
Approximate control limits • Chi-square approximation
Example • (Worked example lost in extraction; the only recoverable value is 67.0.)
Conclusion • Thanks for your attention
Statistical model • Suitable type: which distribution should we use? • Parameters: get information from the data • Inference: what do we want to know, and how can we make a decision? • Hypothesis testing
Statistical model • Suitable type: binomial distribution • Parameters: n = 10, p = 0.7 • Inference: 2 successes
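The three steps above (type, parameters, inference) can be walked through in code for the binomial example. The slide only says "Inference: 2 successes"; the sketch assumes, as our reading rather than the slide's, that the question is whether observing 2 successes in 10 trials is consistent with p = 0.7, answered with a one-sided p-value. `binom_pmf` is an illustrative helper name.

```python
from math import comb

# 1. Suitable type: a fixed number of independent yes/no trials -> binomial.
# 2. Parameters: n and p, taken from the slide (n = 10, p = 0.7).
n, p = 10, 0.7


def binom_pmf(k: int) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 3. Inference (assumed question): how surprising are 2 successes if p really is 0.7?
observed = 2
p_exact = binom_pmf(observed)                             # P(X = 2)
p_lower = sum(binom_pmf(k) for k in range(observed + 1))  # P(X <= 2): one-sided p-value

print(f"P(X = 2)  = {p_exact:.5f}")
print(f"P(X <= 2) = {p_lower:.5f}")
```

Under this reading the one-sided p-value is about 0.0016, so a hypothesis test at the usual 5% level would reject p = 0.7.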
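Multinomial distribution • A generalization of the binomial distribution: it gives the counts of each of k possible outcomes in n independent trials. • The analog of the Bernoulli distribution is the categorical distribution, where each trial results in exactly one of some fixed finite number k of possible outcomes. • http://en.wikipedia.org/wiki/Multinomial_distribution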
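The multinomial pmf is n! / (x₁! ⋯ x_k!) · p₁^x₁ ⋯ p_k^x_k. A minimal sketch with the standard library; `multinomial_pmf` is an illustrative name. With k = 2 categories it reduces to the binomial, which the usage line checks against the earlier f(2) = 0.384 example.

```python
from math import factorial, prod


def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(counts) independent categorical trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)  # exact: each intermediate quotient is still an integer
    return coef * prod(p**x for p, x in zip(probs, counts))


# With k = 2 it is the binomial: 2 successes and 1 failure with p = 0.8
print(round(multinomial_pmf([2, 1], [0.8, 0.2]), 3))  # 0.384
```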
Count data • A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking. • We treat each gene (region) as receiving a fixed fraction of all reads. • Binomial distribution: the probability that a read falls in this region. • Poisson distribution: the number of reads counted in this interval. • With many reads and a small fraction, the binomial count is well approximated by a Poisson.
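The link between the two views can be checked numerically: the number of N reads that land in a region receiving a fixed fraction of them is Binomial(N, fraction), and for large N and a small fraction it is close to Poisson(N · fraction). The read total and fraction below are hypothetical numbers chosen for illustration, not values from the talk.

```python
from math import comb, exp, factorial

# Hypothetical numbers for illustration: 1,000,000 reads in total, and a region
# that receives a fixed fraction 5e-6 of them, so about 5 reads are expected.
N, frac = 1_000_000, 5e-6
lam = N * frac


def binom_pmf(k: int) -> float:
    """Probability that exactly k of the N reads fall in the region."""
    return comb(N, k) * frac**k * (1 - frac) ** (N - k)


def poisson_pmf(k: int) -> float:
    """Poisson approximation with the same mean lam = N * frac."""
    return lam**k * exp(-lam) / factorial(k)


for k in range(11):
    print(k, f"{binom_pmf(k):.6f}", f"{poisson_pmf(k):.6f}")
```

The two columns agree to several decimal places, which is why per-region read counts are commonly modeled as Poisson when sequencing depth is large.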