
Limsoon Wong Laboratories for Information Technology Singapore



Presentation Transcript


  1. From Informatics to Bioinformatics
     Limsoon Wong, Laboratories for Information Technology, Singapore

  2. What is Bioinformatics?

  3. Themes of Bioinformatics
     Bioinformatics = Data Mgmt + Knowledge Discovery
     Data Mgmt = Integration + Transformation + Cleansing
     Knowledge Discovery = Statistics + Algorithms + Databases

  4. Benefits of Bioinformatics
     • To the patient: better drug, better treatment
     • To the pharma: save time, save cost, make more $
     • To the scientist: better science

  5. From Informatics to Bioinformatics
     8 years of bioinformatics R&D in Singapore
     • MHC-Peptide Binding (PREDICT)
     • Protein Interactions Extraction (PIES)
     • Gene Expression & Medical Record Datamining (PCL)
     • Cleansing & Warehousing (FIMM)
     • Gene Feature Recognition (Dragon)
     • Integration Technology (Kleisli)
     • Venom Informatics
     [Timeline figure: axis 1994, 1996, 1998, 2000, 2002; institution labels ISS, LIT, KRDL]

  6. Data Integration A DOE “impossible query”: For each gene on a given cytogenetic band, find its non-human homologs.

  7. Data Integration Results
       sybase-add (#name: "GDB", ...);
       create view L from locus_cyto_location using GDB;
       create view E from object_genbank_eref using GDB;
       select #accn: g.#genbank_ref, #nonhuman-homologs: H
       from L as c, E as g,
            {select u
             from g.#genbank_ref.na-get-homolog-summary as u
             where not(u.#title string-islike "%Human%")
             andalso not(u.#title string-islike "%H.sapien%")} as H
       where c.#chrom_num = "22"
       andalso g.#object_id = c.#locus_id
       andalso not (H = { });
     • Using Kleisli:
       • Clear
       • Succinct
       • Efficient
       • Handles heterogeneity and complexity
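To make the shape of that query easier to follow, here is a minimal Python sketch of the same logic over in-memory data. It is not Kleisli and does not contact GDB or GenBank: the function name, the dictionary keys, and the `get_homolog_summary` callback are all hypothetical stand-ins for the two views and the `na-get-homolog-summary` call, and the `%...%` patterns are assumed to behave as case-sensitive substring matches.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def nonhuman_homologs(
    loci: List[Record],
    erefs: List[Record],
    get_homolog_summary: Callable[[str], List[Record]],
    chrom: str = "22",
) -> List[dict]:
    """Join loci with GenBank references, then keep non-human homologs.

    Mirrors the slide's query: for every locus on the chosen chromosome,
    find its GenBank reference, fetch homolog summaries, drop the human
    ones, and skip genes whose remaining homolog set is empty.
    """
    results = []
    for c in loci:
        if c["chrom_num"] != chrom:
            continue
        for g in erefs:
            if g["object_id"] != c["locus_id"]:
                continue
            # Nested sub-query: one homolog collection per gene.
            homologs = [
                u
                for u in get_homolog_summary(g["genbank_ref"])
                if "Human" not in u["title"] and "H.sapien" not in u["title"]
            ]
            if homologs:  # corresponds to: not (H = { })
                results.append(
                    {"accn": g["genbank_ref"], "nonhuman_homologs": homologs}
                )
    return results


# Invented toy data; the accession strings are placeholders, not real entries.
toy_loci = [
    {"locus_id": "loc1", "chrom_num": "22"},
    {"locus_id": "loc2", "chrom_num": "7"},
]
toy_erefs = [
    {"object_id": "loc1", "genbank_ref": "ACC1"},
    {"object_id": "loc2", "genbank_ref": "ACC2"},
]
toy_summaries = {
    "ACC1": [{"title": "Human example gene"}, {"title": "Mouse example gene"}],
    "ACC2": [{"title": "Rat example gene"}],
}


def toy_lookup(accn: str) -> List[Record]:
    return toy_summaries.get(accn, [])
```

Calling `nonhuman_homologs(toy_loci, toy_erefs, toy_lookup)` returns one nested record per gene, which is the same nested result shape the Kleisli query produces with its `#nonhuman-homologs` field.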

  8. Data Warehousing
       {(#uid: 6138971,
         #title: "Homo sapiens adrenergic ...",
         #accession: "NM_001619",
         #organism: "Homo sapiens",
         #taxon: 9606,
         #lineage: ["Eukaryota", "Metazoa", …],
         #seq: "CTCGGCCTCGGGCGCGGC...",
         #feature: {
           (#name: "source",
            #continuous: true,
            #position: [
              (#accn: "NM_001619", #start: 0, #end: 3602, #negative: false)],
            #anno: [
              (#anno_name: "organism", #descr: "Homo sapiens"), …] ),
           …)}
     • Motivation
       • efficiency
       • availability
       • “denial of service”
       • data cleansing
     • Requirements
       • efficient to query
       • easy to update
       • model data naturally

  9. Data Warehousing Results
       ! Log in
       oracle-cplobj-add (#name: "db", ...);
       ! Define table
       create table GP (#uid: "NUMBER", #detail: "LONG") using db;
       ! Populate table with GenPept reports
       select #uid: x.#uid, #detail: x into GP
       from aa-get-seqfeat-general "PTP" as x using db;
       ! Map GP to that table
       create view GP from GP using db;
       ! Run a query to get title of 131470
       select x.#detail.#title from GP as x where x.#uid = 131470;
     A relational DBMS alone is insufficient because it forces us to fragment data into 3NF. Kleisli turns a flat relational DBMS into a nested relational DBMS. It can use a flat relational DBMS such as Sybase, Oracle, or MySQL as its updatable complex object store.
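The idea of keeping a whole nested record in one column of a flat table, keyed by its identifier, can be illustrated with a small Python sketch. This is only an analogy under stated assumptions: it uses SQLite and JSON text instead of Oracle and Kleisli's own object encoding, and the helper names (`make_store`, `put_record`, `get_title`) and the sample record are invented for the example.

```python
import json
import sqlite3
from typing import Optional


def make_store() -> sqlite3.Connection:
    """Create an in-memory flat table with an id column and a detail column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")
    return conn


def put_record(conn: sqlite3.Connection, record: dict) -> None:
    """Store the whole nested record, unfragmented, as one serialized value."""
    conn.execute(
        "INSERT OR REPLACE INTO GP (uid, detail) VALUES (?, ?)",
        (record["uid"], json.dumps(record)),
    )


def get_title(conn: sqlite3.Connection, uid: int) -> Optional[str]:
    """Fetch one record by uid and navigate into its nested structure."""
    row = conn.execute("SELECT detail FROM GP WHERE uid = ?", (uid,)).fetchone()
    if row is None:
        return None
    return json.loads(row[0])["title"]


# Invented sample record; only the uid is taken from the slide.
sample = {
    "uid": 131470,
    "title": "example report title (invented)",
    "feature": [{"name": "source", "position": [{"start": 0, "end": 10}]}],
}
store = make_store()
put_record(store, sample)
```

`get_title(store, 131470)` then plays the role of the slide's final `select x.#detail.#title` query: the relational layer locates the row by key, and the nested fields are read from the stored object rather than from 3NF fragments.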

  10. Epitope Prediction TRAP-559AA MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN

  11. Epitope Prediction Results
      • Prediction by our ANN model for HLA-A11
        • 29 predictions
        • 22 epitopes
        • 76% specificity
      • Prediction by BIMAS matrix for HLA-A*1101
      [Figure: number of experimental binders vs. rank by BIMAS; axis ticks 1, 66, 100; bar labels 19 (52.8%), 5 (13.9%), 12 (33.3%)]
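The percentages on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the 76% figure is the ratio of 22 confirmed epitopes to 29 predictions (a ratio more commonly called precision than specificity) and that the three BIMAS counts are shares of a common total of experimental binders; both readings are inferences from the numbers, not statements from the slide.

```python
# Counts taken from the slide.
predictions = 29
confirmed_epitopes = 22

# Assumed reading: 76% = confirmed epitopes / predictions.
fraction_confirmed = confirmed_epitopes / predictions

# Experimental binders in the three BIMAS rank groups shown in the figure.
binders_by_rank_group = [19, 5, 12]
total_binders = sum(binders_by_rank_group)

# Share of all experimental binders that falls in each rank group, in percent.
shares = [round(100 * n / total_binders, 1) for n in binders_by_rank_group]

print(round(100 * fraction_confirmed), total_binders, shares)
```

Under these assumptions the numbers are internally consistent: 22/29 rounds to 76%, and the three counts sum to 36 binders with shares matching the percentages printed on the figure.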

  12. Transcription Start Prediction

  13. Transcription Start Prediction Results

  14. Medical Record Analysis
      • Looking for patterns that are
        • valid
        • novel
        • useful
        • understandable

  15. Gene Expression Analysis
      • Classifying gene expression profiles
        • find stable differentially expressed genes
        • find significant gene groups
        • derive coordinated gene expression

  16. Medical Record & Gene Expression Analysis Results
      • PCL, a novel “emerging pattern” method
        • Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
        • Works well for gene expressions
      Cancer Cell, March 2002, 1(2)
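For readers unfamiliar with emerging patterns, the core notion is an itemset whose support differs sharply between two classes, usually measured by a growth rate (support in the target class divided by support in the background class). The Python sketch below computes only that quantity on invented, discretized toy profiles; it is not the PCL classifier itself, and the item names such as `g1=high` are hypothetical.

```python
from typing import FrozenSet, List

Profile = FrozenSet[str]


def support(pattern: FrozenSet[str], profiles: List[Profile]) -> float:
    """Fraction of profiles that contain every item of the pattern."""
    if not profiles:
        return 0.0
    return sum(1 for p in profiles if pattern <= p) / len(profiles)


def growth_rate(
    pattern: FrozenSet[str], background: List[Profile], target: List[Profile]
) -> float:
    """Support in the target class relative to the background class."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0.0:
        # Present in the target class but never in the background class.
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


# Invented toy data: each profile is a set of discretized expression items.
tumor = [
    frozenset({"g1=high", "g2=low"}),
    frozenset({"g1=high", "g2=high"}),
    frozenset({"g1=high", "g2=low"}),
    frozenset({"g1=low", "g2=low"}),
]
normal = [
    frozenset({"g1=low", "g2=low"}),
    frozenset({"g1=high", "g2=high"}),
    frozenset({"g1=low", "g2=high"}),
    frozenset({"g1=low", "g2=low"}),
]
```

In this toy data, `frozenset({"g1=high"})` is three times as frequent in the tumor profiles as in the normal ones, while the two-item pattern `{"g1=high", "g2=low"}` never occurs in the normal class at all, which is the kind of contrast an emerging-pattern method looks for.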

  17. Protein Interaction Extraction
      “What are the protein-protein interaction pathways from the latest reported discoveries?”
      [Figure label: WEB]

  18. Protein Interaction Extraction Results
      • Rule-based system for processing free texts in scientific abstracts
      • Specialized in
        • extracting protein names
        • extracting protein-protein interactions
      [Figure label: Jak1]
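A rule-based extractor of this kind can be pictured as a set of textual patterns applied to each sentence. The Python sketch below uses a single, deliberately naive regular-expression rule; it is not the system described on the slide, and the verb list, the name pattern, and the example sentence with its placeholder protein names are all invented for illustration.

```python
import re
from typing import List, Tuple

# Assumed toy rule: "<Name> <interaction verb> <Name>", where a name is a
# capitalized alphanumeric token. Real protein-name recognition is far harder.
NAME = r"[A-Z][A-Za-z0-9]*"
VERBS = r"binds|activates|inhibits|phosphorylates"
RULE = re.compile(rf"\b(?P<a>{NAME})\s+(?P<verb>{VERBS})\s+(?P<b>{NAME})\b")


def extract_interactions(text: str) -> List[Tuple[str, str, str]]:
    """Return (protein, verb, protein) triples matched by the toy rule."""
    return [(m["a"], m["verb"], m["b"]) for m in RULE.finditer(text)]


# Invented sentence with placeholder names.
example = "ProtA activates ProtB, and ProtB inhibits ProtC."
triples = extract_interactions(example)
```

Each extracted triple can be treated as a directed edge, so chaining triples across many abstracts yields the kind of interaction pathway the previous slide asks about. A single rule like this misses passive constructions, multi-word names, and negation, which is why practical systems need many more rules.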

  19. Behind the Scene
      Vladimir Bajic, Vladimir Brusic, Jinyan Li, See-Kiong Ng, Limsoon Wong, Louxin Zhang
      Allen Chong, Judice Koh, SPT Krishnan, Huiqing Liu, Seng Hong Seah, Soon Heng Tan, Guanglan Zhang, Zhuo Zhang
      and many more: students, folks from geneticXchange, MolecularConnections, and other collaborators…
