
Limsoon Wong Kent Ridge Digital Labs Singapore


Presentation Transcript


  1. From Informatics to Bioinformatics Limsoon Wong Kent Ridge Digital Labs Singapore

  2. What is Bioinformatics?

  3. What are the Themes of Bioinformatics?
     Bioinformatics = Data Mgmt + Knowledge Discovery
     Data Mgmt = Integration + Transformation + Cleansing
     Knowledge Discovery = Statistics + Algorithms + Databases

  4. What are the Benefits of Bioinformatics?
     • To the patient: better drug, better treatment
     • To the pharma: save time, save cost, make more $
     • To the scientist: better science

  5. Data Integration • A DOE “impossible query”: For each gene on a given cytogenetic band, find its non-human homologs.

  6. Data Integration Results
       sybase-add (#name: "GDB", ...);
       create view L from locus_cyto_location using GDB;
       create view E from object_genbank_eref using GDB;
       select #accn: g.#genbank_ref, #nonhuman-homologs: H
       from L as c, E as g,
            (select u from g.#genbank_ref.na-get-homolog-summary as u
             where not(u.#title string-islike "%Human%")
             andalso not(u.#title string-islike "%H.sapien%")) as H
       where c.#chrom_num = "22"
       andalso g.#object_id = c.#locus_id
       andalso not (H = { });
     • Using Kleisli:
       • Clear
       • Succinct
       • Efficient
       • Handles
         • heterogeneity
         • complexity
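The logic of the query on this slide (join the two GDB views on locus id, look up homolog summaries per GenBank reference, drop human titles, keep only non-empty results) can be sketched in plain Python. This is only an illustration, not Kleisli itself: the rows, the accession strings, and the `get_homolog_summary` helper are hypothetical stand-ins for the real GDB tables and the remote `na-get-homolog-summary` call, and `string-islike "%X%"` is assumed to mean a substring match.

```python
# Toy stand-ins for the two GDB tables; every row below is made-up data.
locus_cyto_location = [
    {"locus_id": 1, "chrom_num": "22"},
    {"locus_id": 2, "chrom_num": "7"},
    {"locus_id": 3, "chrom_num": "22"},
]
object_genbank_eref = [
    {"object_id": 1, "genbank_ref": "ACC_A"},
    {"object_id": 2, "genbank_ref": "ACC_B"},
    {"object_id": 3, "genbank_ref": "ACC_C"},
]
# Hypothetical homolog summaries keyed by (made-up) accession.
HOMOLOG_SUMMARIES = {
    "ACC_A": [{"title": "Mouse example gene"}, {"title": "Human example gene"}],
    "ACC_B": [{"title": "Rat example gene"}],
    "ACC_C": [{"title": "H.sapiens example gene"}],
}


def get_homolog_summary(accession):
    """Hypothetical stand-in for the remote na-get-homolog-summary lookup."""
    return HOMOLOG_SUMMARIES.get(accession, [])


def is_human(title):
    # Mirrors the two filters in the query: "%Human%" and "%H.sapien%".
    return "Human" in title or "H.sapien" in title


def nonhuman_homologs(chrom_num):
    """For each locus on the chromosome, collect its non-human homologs."""
    results = []
    for c in locus_cyto_location:
        if c["chrom_num"] != chrom_num:
            continue
        for g in object_genbank_eref:
            if g["object_id"] != c["locus_id"]:
                continue
            homologs = [u for u in get_homolog_summary(g["genbank_ref"])
                        if not is_human(u["title"])]
            if homologs:  # corresponds to: not (H = { })
                results.append({"accn": g["genbank_ref"],
                                "nonhuman_homologs": homologs})
    return results


rows = nonhuman_homologs("22")
```

The result keeps the nested shape of the original query: one record per accession, each carrying its own list of homolog summaries rather than a flattened join.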

  7. Data Warehousing
       {(#uid: 6138971,
         #title: "Homo sapiens adrenergic ...",
         #accession: "NM_001619",
         #organism: "Homo sapiens",
         #taxon: 9606,
         #lineage: ["Eukaryota", "Metazoa", …],
         #seq: "CTCGGCCTCGGGCGCGGC...",
         #feature: {
           (#name: "source",
            #continuous: true,
            #position: [ (#accn: "NM_001619", #start: 0, #end: 3602, #negative: false)],
            #anno: [ (#anno_name: "organism", #descr: "Homo sapiens"), …] ),
           …})}
     • Motivation
       • efficiency
       • availability
       • “denial of service”
       • data cleansing
     • Requirements
       • efficient to query
       • easy to update
       • model data naturally

  8. Data Warehousing Results
     • Relational DBMS is insufficient because it forces us to fragment data into 3NF.
     • Kleisli turns flat relational DBMS into nested relational DBMS. It can use flat relational DBMS such as Sybase, Oracle, MySQL, etc. as its updatable complex object store. It can even use all of these systems simultaneously!
       ! Log in
       oracle-cplobj-add (#name: "db", ...);
       ! Define table
       create table GP (#uid: "NUMBER", #detail: "LONG") using db;
       ! Populate table with GenPept reports
       select #uid: x.#uid, #detail: x into GP from aa-get-seqfeat-general "PTP" as x using db;
       ! Map GP to that table
       create view GP from GP using db;
       ! Run a query to get title of 131470
       select x.#detail.#title from GP as x where x.#uid = 131470;
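The idea of using a flat relational table as a complex object store (a key column plus one column holding the whole nested record) can be sketched with Python's standard `sqlite3` and `json` modules. This is a minimal illustration under stated assumptions, not how Kleisli is implemented: SQLite stands in for Oracle, JSON stands in for Kleisli's own serialization, and the record contents, including the title, are placeholders.

```python
import json
import sqlite3


def make_store():
    """Create a flat two-column table: a key and a serialized nested record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")
    return conn


def put_report(conn, report):
    """Store the whole nested record in one row instead of fragmenting it."""
    conn.execute(
        "INSERT OR REPLACE INTO GP (uid, detail) VALUES (?, ?)",
        (report["uid"], json.dumps(report)),
    )


def get_title(conn, uid):
    """Fetch one record by key and navigate into the nested structure."""
    row = conn.execute("SELECT detail FROM GP WHERE uid = ?", (uid,)).fetchone()
    return None if row is None else json.loads(row[0])["title"]


conn = make_store()
# Placeholder record; only the uid is taken from the slide.
put_report(conn, {
    "uid": 131470,
    "title": "placeholder title",
    "feature": [{"name": "source", "position": [{"start": 0, "end": 10}]}],
})
title = get_title(conn, 131470)
```

The trade-off in this sketch is that the database can index only the key column; any filtering on nested fields happens after the record is deserialized.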

  9. Epitope Prediction TRAP-559AA MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN

  10. Epitope Prediction Results
     • Prediction by our ANN model for HLA-A11
       • 29 predictions
       • 22 epitopes
       • 76% specificity
     • Prediction by BIMAS matrix for HLA-A*1101
       [Figure: number of experimental binders by BIMAS rank (rank axis ticks: 1, 66, 100); values shown: 19 (52.8%), 5 (13.9%), 12 (33.3%)]
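The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes the 76% is the fraction of the 29 ANN predictions that turned out to be epitopes (22 of 29), and that the three binder counts in the BIMAS figure are shares of their own total; both readings are inferred from the numbers, not stated on the slide.

```python
# Numbers taken from the slide.
predictions = 29
confirmed_epitopes = 22
binder_counts = [19, 5, 12]

# Fraction of ANN predictions that were confirmed epitopes.
confirmed_fraction = confirmed_epitopes / predictions
confirmed_percent = round(100 * confirmed_fraction)

# Share of experimental binders in each BIMAS rank group.
total_binders = sum(binder_counts)
binder_shares = [round(100 * n / total_binders, 1) for n in binder_counts]

print(confirmed_percent, total_binders, binder_shares)
```

Under these assumptions the three groups account for 36 experimental binders in total, and the rounded values agree with the percentages printed on the slide.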

  11. Gene Expression Analysis
     • Clustering gene expression profiles
     • Classifying gene expression profiles
       • find stable differentially expressed genes

  12. Gene Expression Analysis Results
     • The Discovery System
       • Correlation test
       • Voter selection
       • Class prediction
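The slide names three steps without describing them. One generic way such a pipeline can work is a weighted-voting classifier: score each gene by how well it separates two classes, keep the top-scoring genes as voters, and let them vote on a new sample. The sketch below illustrates that general scheme only; the signal-to-noise score, the class labels "A" and "B", and the toy profiles are assumptions, and nothing here is claimed to match the Discovery System's actual method.

```python
from statistics import mean, pstdev


def signal_to_noise(values_a, values_b):
    """Score how well one gene separates class A from class B."""
    spread = pstdev(values_a) + pstdev(values_b)
    return 0.0 if spread == 0 else (mean(values_a) - mean(values_b)) / spread


def select_voters(profiles, labels, k):
    """Keep the k genes with the largest absolute separation score."""
    scored = []
    for g in range(len(profiles[0])):
        a = [p[g] for p, y in zip(profiles, labels) if y == "A"]
        b = [p[g] for p, y in zip(profiles, labels) if y == "B"]
        score = signal_to_noise(a, b)
        midpoint = (mean(a) + mean(b)) / 2
        scored.append((abs(score), g, score, midpoint))
    scored.sort(reverse=True)
    return [(g, score, midpoint) for _, g, score, midpoint in scored[:k]]


def predict(sample, voters):
    """Each voter casts a weighted vote; the sign of the sum picks the class."""
    total = sum(score * (sample[g] - midpoint) for g, score, midpoint in voters)
    return "A" if total > 0 else "B"


# Made-up expression profiles: rows are samples, columns are genes.
profiles = [[10, 5, 1], [9, 5, 2], [1, 5, 9], [2, 5, 10]]
labels = ["A", "A", "B", "B"]
voters = select_voters(profiles, labels, k=2)
```

In the toy data the middle gene is constant across samples, so it scores zero and is never chosen as a voter.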

  13. Protein Interaction Extraction
     “What are the protein-protein interaction pathways from the latest reported discoveries?”
     [Figure label: WEB]

  14. Protein Interaction Extraction Results
     • Rule-based system for processing free texts in scientific abstracts
     • Specialized in
       • extracting protein names
       • extracting protein-protein interactions
     [Figure label: Jak1]
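A rule-based extractor of this kind can be illustrated with a single regular-expression rule. The sketch below is a deliberately tiny toy, not the system described on the slide: the protein-name pattern (a capital letter, optional letters, then digits, as in "Jak1"), the verb list, and the example sentence are all assumptions chosen for illustration.

```python
import re

# Hypothetical protein-name rule: capital letter, optional letters, then digits.
PROTEIN = r"[A-Z][A-Za-z]*\d+"
# Hypothetical interaction rule: <protein> <interaction verb> <protein>.
INTERACTION = re.compile(
    rf"\b({PROTEIN})\s+(binds|activates|phosphorylates|inhibits)\s+({PROTEIN})\b"
)


def extract_interactions(text):
    """Return (subject, verb, object) triples matched by the single rule above."""
    return [(m.group(1), m.group(2), m.group(3))
            for m in INTERACTION.finditer(text)]


pairs = extract_interactions("Jak1 phosphorylates Stat3 in this toy sentence.")
```

A realistic system needs many more rules (name dictionaries, passive voice, negation, multi-word names); this sketch only shows the shape of one rule and its output.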

  15. Transcription Start Prediction

  16. Transcription Start Prediction Results

  17. Medical Record Analysis
     • Looking for patterns that are
       • valid
       • novel
       • useful
       • understandable

  18. Medical Record Analysis Results
     • DeEPs, a novel “emerging pattern” method
     • Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
     • Works for gene expressions
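An emerging pattern is, in general terms, an itemset whose support grows sharply from one class of records to another. The sketch below computes that growth rate by brute force over small itemsets. It illustrates the general concept only and is not the DeEPs algorithm; the item names, the two toy record sets, and the threshold are made up.

```python
from itertools import combinations


def support(itemset, records):
    """Fraction of records that contain every item in the itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def growth_rate(itemset, background, target):
    """Ratio of support in the target class to support in the background."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0:
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg


def emerging_patterns(background, target, min_growth, max_size=2):
    """Enumerate small itemsets whose growth rate meets the threshold."""
    items = sorted(set().union(*target))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            rate = growth_rate(frozenset(combo), background, target)
            if rate >= min_growth:
                found.append((combo, rate))
    return found


# Made-up records: each record is a set of observed items.
target = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
background = [{"a"}, {"b"}, {"c"}, {"a", "c"}]
patterns = emerging_patterns(background, target, min_growth=2.5)
```

In this toy data the pair ("a", "b") never occurs in the background records, so its growth rate is infinite. Brute-force enumeration is exponential in the itemset size, which is why `max_size` is kept small here.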

  19. Behind the Scene
     • Research: Vladimir Bajic, Vladimir Brusic, Jinyan Li, See-Kiong Ng, Limsoon Wong, Louxin Zhang
     • Business: Peter Saunders
     • Industry Assignees: Hao Han (gX), Rahul Despande (MC)
     • Engineering: Allen Chong, Judice Koh, SPT Krishnan, Seng Hong Seah, Guanglan Zhang, Zhuo Zhang
     • Students: Huiqing Liu, Song Zhu, Kun Yu
