
Genetic Programming for Mining DNA Chip data from Cancer Patients

Genetic Programming for Mining DNA Chip data from Cancer Patients. W.B. Langdon & B.F. Buxton, Genetic Programming and Evolvable Machines, 5 (3): 251-257, September 2004. Presenter: John Dynan.


Presentation Transcript


  1. Genetic Programming for Mining DNA Chip data from Cancer Patients. W.B. Langdon & B.F. Buxton, Genetic Programming and Evolvable Machines, 5 (3): 251-257, September 2004. Presenter: John Dynan

  2. Why Genetic Programming? • Applies the principles of Darwinism to AI • Allows natural selection of the fittest models • Iterative process that evolves numerous solutions • Similar to the biology of genetics • Addresses the overfitting issue found in other approaches • Suits DNA arrays with limited data sets (<100 tissues) • Exposes the predictive nature of low-expression genes • Applications: disease treatment and prevention

  3. What is Genetic Programming (GP)? • Replicates genetic processes: • Crossover (recombination) • Duplication • Mutation • Production • Deletion • DNA string of elements (A, C, G, U=T)

  4. GP Crossover
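The crossover and mutation operations listed on slide 3 can be sketched on small expression trees. This is a minimal illustration, not the configuration used in the paper: the nested-tuple tree encoding, the helper names, and the placeholder terminals `g1` to `g5` are all assumptions made for the example.

```python
import random

# ASSUMED encoding: a tree is either a terminal (a gene name or a constant)
# or a tuple (function_name, left_subtree, right_subtree).

def subtree_paths(tree, path=()):
    """Return the path (tuple of child indices) to every node in the tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(subtree_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree between the two parents."""
    path_a = rng.choice(subtree_paths(parent_a))
    path_b = rng.choice(subtree_paths(parent_b))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))

def mutate(tree, terminals, rng):
    """Replace one randomly chosen node (and its subtree) with a random terminal."""
    path = rng.choice(subtree_paths(tree))
    return replace_subtree(tree, path, rng.choice(terminals))

def size(tree):
    """Number of nodes in the tree."""
    return len(subtree_paths(tree))

rng = random.Random(0)
parent_a = ("add", "g1", ("mul", "g2", "g3"))   # hypothetical parents
parent_b = ("sub", "g4", "g5")
child_a, child_b = crossover(parent_a, parent_b, rng)
```

Because the two subtrees are swapped rather than copied, the children together contain exactly as many nodes as the parents, and the parents themselves are left unchanged.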

  5. Biological Genetic Crossover

  6. What it is not • K-means clustering • Heuristic combination of fixed rules • Single set of features • Sequential learning process for features • Optimal solution • Controlled feature deletion or addition

  7. History • Koza's extension (Stanford) of Holland's (1975) genetic algorithms work: • Structures are programs • Syntax trees • Nodes • Functions (Mul, Add, Div, Sub, Exp, ...) • Terminals (attributes, gene expression, ...) • GP is a search over combinations of terminals and functions

  8. Syntax Tree
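A syntax tree of the kind described on slide 7 can be evaluated recursively: function nodes combine the values of their children, and terminal nodes read a gene expression value or a constant. The sketch below is illustrative only. The tuple encoding, the `FUNCTIONS` table, the protected division, and the example tree are assumptions; the two expression values are the first ones listed for each gene on slide 10.

```python
# ASSUMED function set; "div" returns 1.0 on division by zero (protected
# division, a common GP convention, not confirmed by the slides).
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if b != 0 else 1.0,
}

def evaluate(tree, expression):
    """Evaluate a syntax tree against one sample's gene expression values."""
    if isinstance(tree, tuple):          # function node: (name, child, child)
        name, *children = tree
        return FUNCTIONS[name](*(evaluate(c, expression) for c in children))
    if isinstance(tree, str):            # terminal: gene expression lookup
        return expression[tree]
    return float(tree)                   # terminal: numeric constant

# Hypothetical tree: (U08998_at - U41737_at) / 2
tree = ("div", ("sub", "U08998_at", "U41737_at"), 2.0)
sample = {"U08998_at": 206.0, "U41737_at": 15.0}
value = evaluate(tree, sample)           # (206.0 - 15.0) / 2 = 95.5
```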

  9. Microarray Problem • Pomeroy data set (url) • 7129 gene expressions • 60 patients • 39 survivors • 21 treatment failures (terminal) • Compare with Pomeroy (K=5 and 8 genes)

  10. Pomeroy Data Set Snippet
      Sample columns:
        Brain_MD_30 Brain_MD_31 Brain_MD_32 Brain_MD_33 Brain_MD_34 Brain_MD_35
        Brain_MD_36 Brain_MD_37 Brain_MD_38 Brain_MD_39 Brain_MD_40 Brain_MD_41
        Brain_MD_42 Brain_MD_43 Brain_MD_44 Brain_MD_45 Brain_MD_46 Brain_MD_47
        Brain_MD_48 Brain_MD_49 Brain_MD_50 Brain_MD_51 Brain_MD_52 Brain_MD_53
        Brain_MD_54 Brain_MD_55 Brain_MD_56 Brain_MD_57 Brain_MD_58 Brain_MD_59
        Brain_MD_60
      U08998_at: TAR RNA binding protein (TRBP) mRNA
        206.0 55.0 106.0 323.0 209.0 88.0
        179.0 -493.0 -40.0 60.0 -200.0 312.0 -26.0 -234.0 127.0 10.0 135.0 -72.0
        46.0 -77.0 50.0 375.0 -252.0 -189.0 -112.0 -931.0 193.0 -125.0 -1244.0 -470.0
        -683.0 -261.0 -18.0 -90.0 -3.0 -57.0 -201.0 50.0 -197.0 -141.0 -353.0 -132.0
        -408.0 -262.0 20.0 239.0 -232.0 -593.0 -443.0 6.0 -316.0 116.0 -7.0 169.0
        -260.0 -137.0 17.0 100.0 -954.0 -353.0
      U41737_at: Pancreatic beta cell growth factor (INGAP) mRNA
        15.0 -87.0 11.0 173.0 177.0
        -105.0 35.0 13.0 53.0 8.0 25.0 28.0 21.0 61.0 -8.0 75.0 24.0
        -135.0 55.0 162.0 139.0 22.0 -89.0 13.0 -177.0 -384.0 45.0 -38.0 -38.0
        -136.0 -152.0 -42.0 -85.0 -31.0 70.0 -76.0 -74.0 -50.0 29.0 -81.0 145.0
        42.0 -79.0 25.0 18.0 -20.0 44.0 -78.0 192.0 -66.0 -73.0 -39.0 57.0
        -122.0 -90.0 25.0 -10.0 -80.0 -306.0 -3.0
      Class labels:
        60 2 1
        # class0 class1
        1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
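The class-label lines at the end of the snippet can be read with a few lines of Python. This is a sketch under one assumption: that the header `60 2 1` gives the sample count and the class count as its first two numbers, which is consistent with the 60 labels that follow. The function name `parse_cls` is hypothetical.

```python
def parse_cls(text):
    """Parse a class-label block: header line, class-name line, then labels."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # ASSUMPTION: header is "<n_samples> <n_classes> 1"
    n_samples, n_classes, _ = (int(tok) for tok in lines[0].split())
    class_names = lines[1].lstrip("#").split()
    labels = [int(tok) for ln in lines[2:] for tok in ln.split()]
    if len(labels) != n_samples:
        raise ValueError(f"expected {n_samples} labels, found {len(labels)}")
    if len(class_names) != n_classes:
        raise ValueError(f"expected {n_classes} class names, found {len(class_names)}")
    return class_names, labels

# Label block reproduced from the slide: 21 ones followed by 39 zeros.
CLS_TEXT = "60 2 1\n# class0 class1\n" + " ".join(["1"] * 21 + ["0"] * 39)

class_names, labels = parse_cls(CLS_TEXT)
n_class1 = labels.count(1)   # 21
n_class0 = labels.count(0)   # 39
```

The 21/39 split of the labels matches the two patient group sizes given on slide 9.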

  11. Method • Each individual consists of 5 trees (mating pools) • N=60-fold cross-validation generates 60 random models • The N=60-fold run is repeated 10 times • 600 predictive patient survival models • If Tree(i), i=1..5, > 0, the GP model predicts positive (node) • Genetic modifications in trees 1 and 2 • Trees may specialize (tissue) • Program fitness: (Pos/Neg) accuracy > 0.5
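The arithmetic behind the 600 models, and one possible reading of the prediction and fitness rules above, can be sketched as follows. Several points are assumptions rather than statements from the slides: that 60 folds over 60 patients means each fold holds out one patient, that the five tree outputs are summed before comparing with zero, and that "(Pos/Neg) accuracy" means the average of the accuracy on the positive class and on the negative class. All function names are hypothetical.

```python
def leave_one_out_splits(n_samples):
    """Yield (training indices, held-out index) for each of n_samples folds."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out

def count_models(n_samples=60, repeats=10):
    """One model per fold per repeat: 60 folds x 10 repeats."""
    return sum(1 for _ in range(repeats) for _ in leave_one_out_splits(n_samples))

def predict_positive(tree_outputs):
    # ASSUMPTION: the outputs of the 5 trees are summed and the sign decides.
    return sum(tree_outputs) > 0

def balanced_accuracy(predictions, labels):
    # ASSUMPTION: fitness averages the accuracy on positives and on negatives,
    # so a constant guess scores 0.5 regardless of class sizes.
    pos = [p for p, y in zip(predictions, labels) if y == 1]
    neg = [p for p, y in zip(predictions, labels) if y == 0]
    true_positive_rate = sum(1 for p in pos if p) / len(pos)
    true_negative_rate = sum(1 for p in neg if not p) / len(neg)
    return (true_positive_rate + true_negative_rate) / 2

total_models = count_models()   # 600
```

Under this reading, a fitness above 0.5 means the model does better than a classifier that always predicts the same class, which matters when the classes are unbalanced (39 versus 21 here).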

  12. GP Conditions • Terminals (microarray data) • Functions (+, -, /, *, exp, <, >, ...) • Fitness measurement (data) • Program control (loop, time) • Termination (generations)

  13. GP DNA Parameters

  14. GP 1st/2nd Data Mining • 600 GP models • 6970 of 7129 attributes appear in GP models • 404 genes appear in ten or more GP models • These 404 genes were used in the 2nd GP run • Two genes appear in 100 or more GP models • U08998: 182 GP models • U41737: 193 GP models
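The gene-selection step above, counting how many evolved models use each gene and keeping the genes that reach a threshold, can be sketched with a counter. The three toy models and the gene name `geneX_at` are made up for illustration; only the two `U...` identifiers come from the slides, and the threshold of 2 stands in for the slide's "ten or more".

```python
from collections import Counter

def gene_frequencies(models):
    """Count, for each gene, the number of models that use it at least once."""
    counts = Counter()
    for genes in models:
        counts.update(set(genes))   # a gene counts once per model
    return counts

def frequent_genes(counts, min_models):
    """Genes that appear in at least min_models models, sorted by name."""
    return sorted(g for g, n in counts.items() if n >= min_models)

# Toy input: each inner list holds the genes referenced by one evolved model.
models = [
    ["U08998_at", "U41737_at", "U08998_at"],
    ["U08998_at", "geneX_at"],
    ["U41737_at"],
]
counts = gene_frequencies(models)
selected = frequent_genes(counts, min_models=2)
```

Counting each gene once per model, rather than once per occurrence, matches the slide's phrasing of genes appearing "in" a number of models.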

  15. Gene Biology • Genes NOT highly expressed • Not found in Pomeroy's K-means cluster analysis • U08998_at • TAR RNA binding protein - promotes cancer • TARBP1 GeneCard • U41737_at • Pancreatic beta cell growth factor • REG3A GeneCard

  16. Gene Frequency 2nd GP

  17. Final GP • Limited number of functions • Single IF statements (<, >, ≥, ≤) • Random generation of functions and genes • N=60-fold times 10 accuracy = 68% • 147 of 192 were incorrect predictors • 39 of 192 were correct two-gene predictors

  18. Two Gene Profile

  19. Two Gene Outcome • Survived / predicted correctly: TP • Failed treatment / predicted wrongly: FP • Survived / predicted wrongly: FN • Failed treatment / predicted correctly: TN • Darkened points are poor predictors • GP model predictor: -42 < U41737_at + 2*U08998_at
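The two-gene predictor and the outcome legend above can be written out directly. The slide does not say which outcome the true side of the inequality corresponds to, so the sketch returns a plain boolean and keeps the outcome labelling as a separate helper. The function names are hypothetical; the input values are the first and last values listed for each gene on slide 10.

```python
def two_gene_rule(u41737, u08998, threshold=-42.0):
    """Evaluate the slide's inequality: -42 < U41737_at + 2 * U08998_at."""
    return threshold < u41737 + 2.0 * u08998

def outcome(survived, predicted_survived):
    """Label one patient following the slide's legend (survival = positive)."""
    if predicted_survived:
        return "TP" if survived else "FP"
    return "FN" if survived else "TN"

# First value listed for each gene: 15.0 + 2 * 206.0 = 427.0, above -42.
first = two_gene_rule(u41737=15.0, u08998=206.0)
# Last value listed for each gene: -3.0 + 2 * -353.0 = -709.0, below -42.
last = two_gene_rule(u41737=-3.0, u08998=-353.0)
```

The coefficient of 2 on U08998_at means that gene contributes twice as much per expression unit to the decision as U41737_at.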

  20. Limitations • Requires extensive computing resources (exponential) • NP solution • Yields only a heuristic solution, not a guaranteed optimum • Repeating the random selection process with different evolutionary change rates can produce different results

  21. Bioinformatics • Allows low-expression genes to be selected into the predictive model • New information can be harvested by repeating GP runs • The 5 tree members can be isolated members for different organ tissues • Disease treatment, prediction, and cure

  22. References • 1. J. DeRisi, et al. 1998. The transcriptional program of sporulation in budding yeast. Science 282:699-705. • 2. Mitra, A.; Almal, A.; George, B.; Fry, D.; Lenehan, et al. The use of genetic programming analysis of quantitative expression profiles… BMC Cancer 2006;6:159. • 3. University of Manchester GP web site: http://dbkgroup.org/gp_home.htm • 4. Bibliography of GP references: http://liinwww.ira.uka.de/bibliography/Ai/genetic.programming.html • 5. Langdon, W.B. and Poli, R. Foundations of Genetic Programming. Springer-Verlag, Berlin, 2001. • 6. Koza, John; Bennett, F.; Andre, D. and Keane, Martin. Genetic Programming. Morgan Kaufmann Publishing, San Francisco, 1999. • 7. Hartl, D. and Jones, E. 2002. Essential Genetics, 3rd ed. Jones and Bartlett Publishers, Boston, MA.
