Datamining: Turning Biological Data into Gold
Limsoon Wong, KRDL
What is Datamining?
• Jonathan’s blocks
• Jessica’s blocks
• Whose block is this?
• Jonathan’s rules: Blue or Circle
• Jessica’s rules: All the rest
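The two rules on this slide can be written down as a tiny classifier. The following is a minimal sketch, assuming each block is described only by a colour and a shape; the `Block` type, the `owner` function, and the sample blocks are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Hypothetical representation: the slide shows pictures, not records.
    colour: str
    shape: str


def owner(block: Block) -> str:
    """Apply the slide's rules: blue or circle is Jonathan's, all the rest is Jessica's."""
    if block.colour == "blue" or block.shape == "circle":
        return "Jonathan"
    return "Jessica"


# Illustrative blocks only; they are not taken from the slide.
blocks = [Block("blue", "square"), Block("red", "circle"), Block("red", "square")]
owners = [owner(b) for b in blocks]
print(owners)
```

Datamining, in this picture, is the step that discovers such rules from the labelled blocks; the sketch only applies rules that are already known.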
What is Datamining?
• Question: Can you explain how?
What are the Benefits?
• To the patient:
  • Better drug, better treatment
• To the pharma:
  • Save time, save cost, make more $
• To the scientist:
  • Better science
Epitope Prediction
TRAP-559AA
MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE
EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN
LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS
LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL
TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR
FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK
TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ
CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI
IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ
KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN
QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN
RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE
KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP
GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
Epitope Prediction Results
• Prediction by our ANN model for HLA-A11
  • 29 predictions
  • 22 epitopes
  • 76% specificity
• Prediction by BIMAS matrix for HLA-A*1101
  • [Figure: number of experimental binders by BIMAS rank; axis marks 1, 66, 100; values 19 (52.8%), 5 (13.9%), 12 (33.3%)]
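The percentages on this slide can be checked with simple arithmetic. This is a minimal sketch, assuming that the 76% figure is the share of the 29 predictions that turned out to be epitopes (22 of 29) and that the three BIMAS counts are shares of the same pool of 36 experimental binders; both readings are assumptions, since the slide does not state its denominators.

```python
# Counts copied from the slide.
ann_predictions = 29
ann_epitopes = 22
bimas_binders = [19, 5, 12]  # experimental binders per BIMAS rank group

# Assumed reading: fraction of ANN predictions confirmed as epitopes.
ann_percent = round(100 * ann_epitopes / ann_predictions)

# Assumed reading: each BIMAS count as a share of all experimental binders.
total_binders = sum(bimas_binders)
bimas_percent = [round(100 * n / total_binders, 1) for n in bimas_binders]

print(ann_percent, total_binders, bimas_percent)
```

Under these assumptions the computed values agree with the slide: 76% for the ANN model, and 52.8%, 13.9%, and 33.3% for the three BIMAS groups.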
Gene Expression Analysis
• Clustering gene expression profiles
• Classifying gene expression profiles
• Finding stable differentially expressed genes
Gene Expression Analysis Results
• The Discovery System
  • Correlation test
  • Voter selection
  • Class prediction
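The three steps listed here (correlation test, voter selection, class prediction) can be illustrated with a generic weighted-voting scheme. The slide does not describe how the Discovery System implements them, so this is only a sketch under assumptions: a signal-to-noise score stands in for the correlation test, the top-scoring genes act as voters, and every function name and expression value below is hypothetical.

```python
from statistics import mean, pstdev


def signal_to_noise(a, b):
    """Score how well one gene separates class A from class B (assumed test)."""
    return (mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)


def select_voters(class_a, class_b, k):
    """Keep the k genes with the largest absolute score as voters."""
    scored = []
    for g in range(len(class_a[0])):
        a = [sample[g] for sample in class_a]
        b = [sample[g] for sample in class_b]
        midpoint = (mean(a) + mean(b)) / 2
        scored.append((g, signal_to_noise(a, b), midpoint))
    scored.sort(key=lambda t: abs(t[1]), reverse=True)
    return scored[:k]


def predict(sample, voters):
    """Each voter casts a weighted vote; the sign of the total picks the class."""
    total = sum(w * (sample[g] - mid) for g, w, mid in voters)
    return "A" if total > 0 else "B"


# Toy expression profiles (rows are samples, columns are genes); not real data.
class_a = [[9.0, 1.0, 5.0], [8.0, 2.0, 5.5], [9.5, 1.5, 4.5]]
class_b = [[2.0, 8.0, 5.0], [1.0, 9.0, 4.8], [1.5, 8.5, 5.2]]
voters = select_voters(class_a, class_b, k=2)
prediction = predict([8.5, 2.0, 5.0], voters)
print(sorted(g for g, _, _ in voters), prediction)
```

In the toy data, the third gene has nearly the same level in both classes, so it is not selected as a voter.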
Protein Interaction Extraction
[Figure label: WEB]
“What are the protein-protein interaction pathways from the latest reported discoveries?”
Protein Interaction Extraction Results
• Rule-based system for processing free text in scientific abstracts
• Specialized in
  • extracting protein names
  • extracting protein-protein interactions
[Figure label: Jak1]
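A rule-based extractor of this kind can be approximated, very roughly, with a single pattern rule. The slide does not give the system's actual rules, so this is a minimal sketch under assumptions: protein names are guessed as capitalised tokens containing a digit, interactions are signalled by a small hypothetical verb list, and the example sentence is illustrative.

```python
import re

# Hypothetical rule components; the real system's rules are not given on the slide.
NAME = r"([A-Z][A-Za-z]*\d+[A-Za-z]*)"          # e.g. Jak1, Stat3
VERBS = r"(binds|interacts with|phosphorylates|activates)"
RULE = re.compile(rf"{NAME}\s+{VERBS}\s+{NAME}")


def extract_interactions(text):
    """Return (protein, relation, protein) triples matched by the single rule."""
    return [(m.group(1), m.group(2), m.group(3)) for m in RULE.finditer(text)]


# Illustrative sentence, not taken from the slides.
abstract = "We show that Jak1 phosphorylates Stat3 in these cells."
interactions = extract_interactions(abstract)
print(interactions)
```

A real system would need many more rules, for example for passive voice, name variants, and negation; the sketch only shows the general shape of one rule.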
Medical Record Analysis
• Looking for patterns that are
  • valid
  • novel
  • useful
  • understandable
Medical Record Analysis Results
• DeEPs, a novel “emerging pattern” method
• Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
• Also works for gene expression data
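The idea behind an emerging pattern can be shown with a small calculation: a pattern is "emerging" when its support grows sharply from one class of records to another. The following is a minimal sketch of that growth-rate computation only, not of the DeEPs algorithm itself; the function names and the toy records are hypothetical.

```python
def support(pattern, records):
    """Fraction of records that contain every item of the pattern."""
    return sum(pattern <= record for record in records) / len(records)


def growth_rate(pattern, background, target):
    """Ratio of the pattern's support in the target class to the background class."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0:
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg


# Toy records as sets of attribute=value items; not real medical data.
healthy = [
    {"fever=no", "cough=no"},
    {"fever=no", "cough=yes"},
    {"fever=yes", "cough=no"},
    {"fever=no", "cough=no"},
]
sick = [
    {"fever=yes", "cough=yes"},
    {"fever=yes", "cough=yes"},
    {"fever=yes", "cough=no"},
    {"fever=no", "cough=yes"},
]

rate_fever = growth_rate({"fever=yes"}, healthy, sick)
rate_both = growth_rate({"fever=yes", "cough=yes"}, healthy, sick)
print(rate_fever, rate_both)
```

A pattern that never occurs in the background class but does occur in the target class gets an infinite growth rate, which makes it a particularly sharp discriminator between the two classes.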
Under the Hood
• Artificial neural network
• Neighbourhood analysis
• Non-linear analysis
• Template matching
• Emerging pattern
• Hidden Markov models
• Bayesian inference
• Decision tree induction
• ...
Behind the Scene
• Epitope Prediction: Vladimir Brusic, Judice Koh, Seah Seng Hong, Zhang Guanglan, Yu Kun
• Transcription Start Prediction: Vladimir Bajic, Seah Seng Hong
• Gene Expression Analysis: Zhang Louxin, Zhang Zhuo, Zhu Song
• Medical Records: Li Jinyan
• Protein Interaction Extraction: Ng See Kiong, Zhang Zhuo