
Part II – with interactions of genes in mind Min-Te Chao 2002/10/28


Presentation Transcript


  1. Part II – with interactions of genes in mind Min-Te Chao 2002/10/28

  2. So far, all methods are one-gene-at-a-time. • At first these methods are simple and intuitive, but they soon become complicated. • E.g., Efron has to use a tricky logistic regression to estimate the prior density, which is not easy.

  3. The general problem with microarray data is that, although the setup resembles regression, the “design matrix” is never of full column rank.

  4. In the setup Y = X\beta + error, X is n by p, with n < 100 and p > 1000. I have seen a case with n = 7 but p > 6000.
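
A small numerical illustration of why this setup defeats ordinary least squares. It is only a sketch on simulated data, assuming NumPy; the sizes (n = 7, p = 50) and the coefficient values are made up for speed and are not from the talk. With n < p, the rank of X is at most n, so many different coefficient vectors fit the data equally well.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 7, 50                        # illustrative sizes only
X = rng.normal(size=(n, p))         # simulated design matrix
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]         # hypothetical true coefficients
y = X @ beta + 0.1 * rng.normal(size=n)

rank = np.linalg.matrix_rank(X)     # cannot exceed min(n, p) = n

# Minimum-norm least-squares solution; it reproduces y because rank == n
b_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# The last rows of Vt span the null space of X, so adding one of them
# changes the coefficients without changing the fitted values
_, _, Vt = np.linalg.svd(X)
b_other = b_min + 10.0 * Vt[-1]

fit_min = np.allclose(X @ b_min, y)
fit_other = np.allclose(X @ b_other, y)
print(rank, fit_min, fit_other)
```

Both `b_min` and `b_other` fit the observations, which is the sense in which the p-parameter problem cannot be solved by textbook regression.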

  5. Let us say there is a way to “do the statistical problem” (say, with traditional methods) with a smaller p, say p = p_1 = 3 or 30, depending on the value of n we have. • Let us assume a model with the first p_1 parameters only (the other betas are all 0, say).

  6. With our traditional method, we may find the likelihood function, with n observations and p_1 parameters. • We then go through the textbook method to do inference about the selected p_1 parameters. • And we obtain an estimator of the p_1-dimensional parameter (together with an sd or p-value).

  7. Repeat the procedure B times, each time with a “simple random sample without replacement of size p_1” from the p genes in the problem.

  8. In this way we change an unsolvable problem (in our classical statistical sense) into B problems, all of which can be solved with traditional methods. • It is very time-consuming, but sometimes it works.
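
The random-subset procedure of the last few slides can be sketched as follows. This is a minimal sketch under stated assumptions, not the speaker's implementation: the "traditional method" is taken to be ordinary least squares without an intercept (matching Y = X\beta + error), the data are simulated, and averaging |t| over the draws is one hypothetical way to summarize the B fits, since the talk does not say how they are combined. The function name `random_subset_fits` is also made up.

```python
import numpy as np

def random_subset_fits(X, y, p1, B, rng):
    """Fit OLS on B random gene subsets of size p1; summarize per gene."""
    n, p = X.shape
    assert p1 < n, "each sub-problem needs fewer parameters than observations"
    times_drawn = np.zeros(p, dtype=int)
    sum_abs_t = np.zeros(p)
    for _ in range(B):
        # simple random sample without replacement of size p1 from the p genes
        idx = rng.choice(p, size=p1, replace=False)
        Xs = X[:, idx]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ coef
        sigma2 = resid @ resid / (n - p1)            # residual variance
        cov = sigma2 * np.linalg.inv(Xs.T @ Xs)      # textbook OLS covariance
        t = coef / np.sqrt(np.diag(cov))
        times_drawn[idx] += 1
        sum_abs_t[idx] += np.abs(t)
    mean_abs_t = np.divide(sum_abs_t, times_drawn,
                           out=np.zeros(p), where=times_drawn > 0)
    return times_drawn, mean_abs_t

# Simulated data (not from the talk): only gene 0 affects y
rng = np.random.default_rng(0)
n, p = 30, 200
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + rng.normal(size=n)

times_drawn, mean_abs_t = random_subset_fits(X, y, p1=5, B=2000, rng=rng)
top_gene = int(np.argmax(mean_abs_t))
print(top_gene, times_drawn[top_gene])
```

Each sub-problem has p_1 = 5 parameters and n = 30 observations, so every fit is a standard regression; the cost is the B repetitions.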

  9. Lo, Shaw-Hwa and Tian Zheng (2002). Backward haplotype transmission association algorithm – a fast multi-marker screening method. To appear: Human Heredity.

  10. Instead of genes, they use markers. • p markers, n patients. • For each patient, we have data from the father and mother. • So we have n sets of parents–child data.

  11. The problem is to identify which markers are disease-causing.

  12. They pick out r markers at a time, r << p. • A statistic T(r) is constructed, which tells the “amount of information” for an n-patient, r-marker sub-problem. • Markers in this sub-problem are deleted one by one, the least important one first, until all markers left are important.

  13. This gives us group 1 of important markers. • We do the same thing for another subset of r markers and get group 2 of important markers, and so on. • Do it B times, with B fairly large, say 5000.
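
The screening loop described above can be sketched generically. This is an illustration under loud assumptions: the real T(r) is built from parent–child transmission data and is not given in these slides, so the adjusted R^2 of a linear fit to a simulated quantitative outcome is used here purely as a stand-in. The function names, the data and the sizes (r = 6, B = 300) are all hypothetical.

```python
import numpy as np

def adjusted_r2(X, y, cols):
    """Stand-in for T(r): adjusted R^2 of an OLS fit of y on the chosen markers."""
    n = len(y)
    cols = np.asarray(cols, dtype=int)
    A = np.column_stack([np.ones(n), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (rss / (n - A.shape[1])) / (tss / (n - 1))

def backward_eliminate(X, y, cols, score=adjusted_r2):
    """Delete the least important marker until every deletion lowers the score."""
    cols = list(cols)
    current = score(X, y, cols)
    while cols:
        # score after deleting each remaining marker in turn
        trial = [score(X, y, cols[:i] + cols[i + 1:]) for i in range(len(cols))]
        best = int(np.argmax(trial))
        if trial[best] < current:     # every deletion hurts: stop
            break
        current = trial[best]
        del cols[best]
    return cols

def screen(X, y, r, B, rng):
    """Run backward elimination on B random r-marker subsets; count returns."""
    p = X.shape[1]
    returns = np.zeros(p, dtype=int)
    for _ in range(B):
        subset = rng.choice(p, size=r, replace=False)
        kept = backward_eliminate(X, y, subset)
        returns[np.asarray(kept, dtype=int)] += 1
    return returns

# Simulated data (not from the paper): markers 0 and 1 carry the signal
rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.integers(0, 2, size=(n, p)).astype(float)
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

returns = screen(X, y, r=6, B=300, rng=rng)
print(returns[:5], np.median(returns))
```

The vector `returns` holds, for each marker, the number of groups of important markers in which it appeared.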

  14. Pool the markers from all B groups; those returned most frequently are selected. • More specifically, markers whose return frequencies exceed the third quartile plus 1.8 times the IQR are selected (about 3.1 sd above the mean under a normal approximation). • This gives a type I error of about 10^{-3}.
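
The cutoff rule and the arithmetic behind "3.1 sd" and "10^{-3}" can be checked with a short sketch. It assumes that return frequencies of unimportant markers are roughly normal; the function name `select_by_return_frequency` and the example counts are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def select_by_return_frequency(returns, k=1.8):
    """Indices of markers whose return count exceeds Q3 + k * IQR."""
    q1, q3 = np.percentile(returns, [25, 75])
    cutoff = q3 + k * (q3 - q1)
    return np.flatnonzero(returns > cutoff), cutoff

# Where the cutoff sits for a normal distribution, in sd units
z = NormalDist()
q3_sd = z.inv_cdf(0.75)                  # third quartile of a standard normal
cutoff_sd = q3_sd + 1.8 * (2 * q3_sd)    # IQR = 2 * q3_sd by symmetry
tail = 1 - z.cdf(cutoff_sd)              # one-sided exceedance probability
print(round(cutoff_sd, 2), tail)

# Made-up return counts: ten ordinary markers and two frequent returners
counts = np.array([20, 22, 19, 25, 21, 23, 18, 24, 20, 22, 75, 80])
selected, cutoff = select_by_return_frequency(counts)
print(selected, cutoff)
```

Because quartiles are robust, a few very frequent markers barely move the cutoff, which a mean-plus-sd rule would not guarantee.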

  15. The difficult part of the problem is to formulate a likelihood function for the r selected markers. • The next problem is to derive a test statistic, together with its properties. But these are problem-specific…

  16. It is the generality of the setup that is important. • Because it considers r markers at a time, the likelihood function is with respect to the r selected markers. If there is any interaction between 2 or 3 markers, this process has the potential to pick it up.

  17. This is not possible with any of the one-gene-at-a-time procedures.
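
A toy example of the point about interactions, on simulated data and assuming NumPy: two markers coded as -1/+1 affect the outcome only through their product. Looking at each marker alone shows almost nothing, while a joint model of the pair recovers the signal. The coding, the noise level and the use of R^2 are illustrative choices, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x0 = rng.choice([-1.0, 1.0], size=n)
x1 = rng.choice([-1.0, 1.0], size=n)
y = x0 * x1 + 0.1 * rng.normal(size=n)    # pure interaction, no main effects

def r_squared(A, y):
    """Fraction of the variance of y explained by an OLS fit on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

ones = np.ones(n)
r2_x0_alone = r_squared(np.column_stack([ones, x0]), y)   # one marker at a time
r2_x1_alone = r_squared(np.column_stack([ones, x1]), y)
r2_joint = r_squared(np.column_stack([ones, x0, x1, x0 * x1]), y)  # pair jointly
print(r2_x0_alone, r2_x1_alone, r2_joint)
```

The single-marker fits explain close to none of the variance, whereas the joint fit explains nearly all of it.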

  18. All known methods, data mining or not, for the analysis of microarray-type data are ad hoc and rather primitive. • The amount of theory is limited. • These methods will tend to become statistical in nature eventually, because an assessment of risk is still a very important factor in scientific work.

  19. Subject-matter relevance is the key. • Other keys: good data; other scientists; effective computation; don’t wait.
