
“the world’s 1st structured network pattern database technology”


Presentation Transcript


  1. “the world’s 1st structured network pattern database technology”

  2. Introductions: Robert Hercus - CSO & Founder
     • Australian, over 30 years IT experience
     • Pioneered many large-scale IT projects
     • “Language of Biology” basis of Synamatix
     • Interests: Linguistics, Genomics, Artificial Intelligence, Neural Networks

  3. 4 Common perceptions…
     • Perception 1: Bioinformatics companies are applications dependent. No, Synamatix is a database technology company that ALSO develops applications to demonstrate its technology.
     • Perception 2: Using your applications will mean that we are locked in. No, YOU can develop your own applications, or ask someone else to do it.
     • Perception 3: Buying proprietary software means we cannot modify or understand how it works. No, Synamatix gives away the source code for applications built upon SynaBASE.
     • Perception 4: We will have to replace current software and investments. No, SynaBASE is designed to be open to enable integration with EXISTING IT infrastructure and software investments and installations.

  4. Not just target discovery… broad applications [Diagram labels: Clinical drug candidate; Lead Optimisation; Lead ID; Toxicology & Assays; Target ID]

  5. [Diagram labels: Personal Genomics; Proteomics; Non-sequence data; Phylogenetics; Comparative Genomics; Chip-design; Sequence Mining; Mapping; Motifs; Annotate; Clustering & Assembly]

  6. Open, shareable applications

  7. Open, shareable applications [Diagram labels: Data repository; Internal software; Bought in software; Integration Interface]

  8. Open application development [Diagram labels: Internal software; Bought in software; Data repository; External development; Internal development]

  9. 5 Unique features

  10. Patterns and structures: Finds, Stores, Relates & Structures PATTERNS, not FLAT FILES

  11. Patterns & Network 1

  12. [Diagram of the 5 unique features, numbered 1 to 5: Patterns & Network; Significance; Scale; Speed; Novel Applications]

  13. Patterns – forward and reverse, for the sequence ATGTGGT
      Forward: A AT ATG ATGT ATGTG ATGTGG
      Reverse: T TG TGG TGGT TGGTG TGGTGT TGGTGTA

  14. Patterns – all fwd intermediates of ATGTGGT
      AT ATG ATGT ATGTG ATGTGG
      TG TGT TGTG TGTGG TGTGGT
      GT GTG GTGG GTGGT
      TG TGG TGGT
      GG GGT
      GT

  15. Patterns – all rev intermediates of ATGTGGT
      TG TGG TGGT TGGTG TGGTGT
      GG GGT GGTG GGTGT GGTGTA
      GT GTG GTGT GTGTA
      TG TGT TGTA
      GT GTA
      TA

  16. SynaBASE is 100% exhaustive [Figure: all forward and reverse patterns of ATGTGGT shown together]
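The pattern lists on slides 13 to 16 can be reproduced with a short sketch. This is only an illustration of exhaustive sub-pattern enumeration, not SynaBASE's actual method, which the slides do not describe; the function name `sub_patterns` and the `min_len` parameter are hypothetical.

```python
def sub_patterns(seq, min_len=2):
    """List every contiguous sub-pattern of seq with at least min_len symbols,
    ordered by start position and then by length (the order used on the slides)."""
    return [seq[start:end]
            for start in range(len(seq))
            for end in range(start + min_len, len(seq) + 1)]

sequence = "ATGTGGT"
forward = sub_patterns(sequence)        # patterns read left to right
reverse = sub_patterns(sequence[::-1])  # plain reversal (TGGTGTA), not the reverse complement

print(len(forward), forward)
print(len(reverse), reverse)
```

For a sequence of length n there are n(n-1)/2 patterns of two or more symbols in each direction, so the list grows quadratically with sequence length.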

  17. TEMPORAL/SPATIAL: “events related by time or proximity are associated” [Diagram: events A, B, C separated by dt]
      ASSOCIATIVE
      • Precise
      • Can recall
      • Computationally simple
      • Multi-level network structure
      • Updating is simple

  18. The end result… a structured network pattern database

  19. SynaBASE can address diverse data types
      • Patterns can be associated based upon TEMPORAL or SPATIAL characteristics
      • Sequence data – SPATIAL/TEMPORAL
      • Protein data – INTERACTION
      • Gene expression – TEMPORAL
      • Phylogeny – DISTANCE MEASURES
      • Text mining – SPATIAL INTERACTIONS
      • Transcription factors – REGULATORY INTERACTIONS
      • etc…

  20. 1. Patterns and structures 2. Significance and Frequency: SynaBASE automatically learns and maintains the significance of patterns and data

  21. Relying on frequency alone is inadequate… Example sentence: “The elephant and the giraffe walked up the mountain”
      • A graph showing the frequency of “string (word)” patterns in a sentence does not reflect meaning
      • A graph showing the probabilities of predicting predecessor and successor characters/events (string significance) does reflect meaning

  22. Significance – forward and reverse elephant

  23. Maximum of fwd & rev significance is plotted elephant
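One way to read slides 21 to 23 is sketched below: for each character, estimate how predictable it is from the character before it (forward) and from the character after it (reverse), then keep the larger value. This is an assumption-laden illustration, not the SynaBASE calculation: the function name `significance_profile`, the one-character context `k`, and estimating probabilities from the sentence itself are all choices made here for the example.

```python
from collections import Counter

def significance_profile(text, k=1):
    """Per character: max of forward and reverse predictability with k-character context.
    Probabilities are estimated from text itself (an assumption for illustration)."""
    grams = Counter(text[i:i + n]
                    for n in (k, k + 1)
                    for i in range(len(text) - n + 1))
    profile = []
    for i, ch in enumerate(text):
        fwd = rev = 0.0
        if i >= k:                       # enough preceding context
            ctx = text[i - k:i]
            fwd = grams[ctx + ch] / grams[ctx]
        if i + k < len(text):            # enough following context
            ctx = text[i + 1:i + 1 + k]
            rev = grams[ch + ctx] / grams[ctx]
        profile.append((ch, max(fwd, rev)))
    return profile

sentence = "the elephant and the giraffe walked up the mountain"
for ch, score in significance_profile(sentence):
    print(repr(ch), round(score, 2))
```

With a real corpus the counts would come from far more text than one sentence; the sketch only shows how a forward and a reverse estimate can be combined by taking their maximum.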

  24. Pattern counts while extending towards atggtgat
      a 64000
      at 17000 (17/64=26%)
      atg 4930 (4930/17000=29%)
      atgg 1725 (1725/4930=34%)
      atggt 760 (760/1725=44%)
      atggtg 500 (500/760=66%)
      atggtga 355 (355/500=71%)
      atggtgat 266 (266/355=75%)
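The ratios on slide 24 divide the count of each pattern by the count of the pattern one base shorter. A minimal sketch of that arithmetic follows, using the counts from the slide; the function name `extension_ratios` is hypothetical, and treating this ratio as the slide's "significance" is an interpretation, since the slides do not define the term formally.

```python
# Counts copied from slide 24: each pattern extends the previous one by one base.
counts = [
    ("a", 64000),
    ("at", 17000),
    ("atg", 4930),
    ("atgg", 1725),
    ("atggt", 760),
    ("atggtg", 500),
    ("atggtga", 355),
    ("atggtgat", 266),
]

def extension_ratios(counts):
    """For each pattern, the fraction of occurrences of its one-base-shorter
    prefix that continue into that pattern."""
    return [(pattern, n / n_prefix)
            for (_, n_prefix), (pattern, n) in zip(counts, counts[1:])]

for pattern, ratio in extension_ratios(counts):
    print(f"{pattern:<9}{ratio:.3f}")
```

The ratio rises with every added base even though the raw count keeps falling, which is the contrast between frequency and significance that the slide draws.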

  25. Frequency v Significance FREQUENCY SIGNIFICANCE Human placental ribonuclease inhibitor

  26. Gene models and “SIGNIFICANCE” correlation Ensembl Gene F2 F3 PIM1 Oncogene

  27. SIGNIFICANCE and conservation Multi Species Comparison as presented by: Eric Green ISMB 2004

  28. Pattern Significance 1st 500 KBP of hu.ch7 6 Genome db 3.80s Human Genome 7.41s Mouse Genome 3.88s Dog Genome 6.01s

  29. 1. Patterns and structures 2. SIGNIFICANCE 3. Scale and 4. Speed: Unique method for structuring data leads to ultra-high-throughput applications becoming routinely accessible

  30. [Chart: size of database versus number of Human genome copies, Genome 1 to Genome 10]

  31. [Chart: size of database versus number of genomes, Genome 1 to Genome 10]

  32. Analysis speed scales at log2(n) [Chart: speed in milliseconds (100 to 900) versus size of database in giga bp (1 to 1000), Conventional compared with SynaBASE]
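To see what log2(n) scaling implies for the database sizes on slide 32, the sketch below counts how many halving steps a lookup would need if its cost grew as log2 of the number of bases. It illustrates the scaling law only; the slides do not say how SynaBASE achieves it, and `lookup_steps` is a hypothetical helper.

```python
import math

def lookup_steps(n_items):
    """Number of halving steps needed to narrow n_items down to one (ceil of log2)."""
    return math.ceil(math.log2(n_items))

for giga_bp in (1, 10, 100, 1000):
    n_bases = giga_bp * 10**9
    print(f"{giga_bp:>5} giga bp: {lookup_steps(n_bases)} steps")
```

Growing the database a thousandfold, from 1 to 1000 giga bp, adds only about ten steps (roughly 30 to 40), whereas a cost proportional to n would grow a thousandfold.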

  33. Building a SynaBASE is fast! Swissprot raw sequence data 8 minutes Search Compare Keywords Annotations Fast & Significant results

  34. Analyse a marker across 100s of genomes in 100 milliseconds. All genomes!! All variants!! All Multiples… [Diagram labels: Sequence data; All Prokaryotes; All Eukaryotes; All Plants; Human; Virus; Mouse]

  35. Compare across genomes in hours

  36. Personal genome & personalised medicine Human wt Cancer Mouse Dog Ultra-high-throughput Biomarker mapping and analysis

  37. 1. Patterns and structures 2. SIGNIFICANCE 3. Speed and 4. Scale 5. Future proof: A non brute force investment; hardware independence leads to novel applications or challenging research projects

  38. 3rd party Applications

  39. Users Windows / Linux Linux Itanium C++ Java Java Servlets HTML Application Servers WWW Interface Custom Applications SUITE

  40. Massively Parallel Single Molecule Sequencing analysis Real-time Proteomics Comparative genomics Probe design / testing Personalised medicine Clinical Diagnostics Ultra High Throughput (UHT)

  41. Summary
      • Unique pattern network dB
      • Maintains patterns and their relationships
      • Able to derive Significance from data “a priori”
      • Self learning mechanism
      • Accuracy & Speed
      • Developed world’s 1st genomics platform capable of addressing demanding new applications:
      • Truly future – Hardware independent
      • Scalable
      • Ultra-high-throughput genome analysis
