“The World's 1st Ultra High throughput Genomics Data Platform”
Introductions
Robert George Hercus – CTO/MD
• Over 30 years IT experience
• Pioneered many large-scale IT projects
• “Language of Biology” is the basis of Synamatix
• Synamatix is 1 of 4 companies in the group; the others are Linguamatix, Neuramatix and Viramatix
• Interests: Linguistics, Genomics, Artificial Intelligence, Neuronal Networks
What we are NOT … what we ARE
NOT ABOUT:
• Data content
• Flat file, hierarchical or relational databases
• Replacement of existing browser/mapping tools, e.g. EnsEMBL
• Just a BLAST alternative
• Applications – IP is within SynaBASE
ARE ABOUT:
• NEW CONCEPT: proprietary pattern-based approaches to construct the 1st “structured-network data system” for biological data
• ITERATION, SPEED and EXTENSIBILITY: a single data system for a wide array of biological data, enabling enterprise-wide capability
• Providing tools, know-how and technologies to assist in understanding and ultimately defining the “language” of biological data – FUNCTIONAL GENOMICS
Basis of IP – Human Brain > SynaBASE
This is SynaBASE:
• Identifies and learns patterns without human supervision or training sets
• Maintains patterns and their relationships
• Constructs highly efficient data structures and relationships between data elements
• Applications become more accurate and efficient as more data is added
A structural database for genomics data, SynaBASE* *patents pending
4 unique features
1. Patterns and structures
• Finds, Stores, Relates & Structures PATTERNS, not FLAT FILES
What makes Synamatix UNIQUE - 1
Similarity of DATA
Common PATTERNS and functionality
What do we know about data?
• There is no evolutionary need to preserve non-functional sequence patterns
• Evolution requires the conservation of patterns which are at least functionally equivalent, or functionally better
• Significant patterns and their relationships are extended and maintained by SynaBASE
SynaBASE* - A structured network database *patents pending
2. Significance and Frequency
SynaBASE automatically learns and maintains the significance of patterns and data
Significance
• Fixed-length k-mers are inappropriate
• Example sentence: “The elephant and the giraffe walked up the mountain”
• [Figure: frequency of “string (word)” patterns in the sentence] Frequency does not reflect meaning
• [Figure: probabilities of predicting predecessor and successor characters (string significance)] Significance reflects true meaning
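The contrast between frequency and significance on the example sentence can be sketched in a few lines. This is an illustration only, not SynaBASE's patented method: it assumes a fixed two-character context (a simplification, given that the slide argues against fixed lengths), estimates probabilities from the sentence itself, and the names `ngram_frequency`, `successor_probability` and `predecessor_probability` are hypothetical.

```python
from collections import Counter

SENTENCE = "the elephant and the giraffe walked up the mountain"

def ngram_frequency(text, n=2):
    """Count how often each n-character string occurs (plain frequency)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def successor_probability(text, context_len=2):
    """Estimate P(next char | preceding context) from the text itself."""
    context_counts = Counter()
    pair_counts = Counter()
    for i in range(context_len, len(text)):
        context = text[i - context_len:i]
        context_counts[context] += 1
        pair_counts[(context, text[i])] += 1
    return {(ctx, ch): count / context_counts[ctx]
            for (ctx, ch), count in pair_counts.items()}

def predecessor_probability(text, context_len=2):
    """Estimate P(previous char | following context) via the reversed text."""
    reversed_probs = successor_probability(text[::-1], context_len)
    return {(ch, ctx[::-1]): p for (ctx, ch), p in reversed_probs.items()}

freq = ngram_frequency(SENTENCE)
succ = successor_probability(SENTENCE)
pred = predecessor_probability(SENTENCE)

# "e " is the more frequent string, but what follows it is unpredictable,
# whereas "th" is always followed by "e" in this sentence.
print(freq["e "], succ[("e ", "g")])
print(freq["th"], succ[("th", "e")])
print(pred[("t", "he")])
```

In this sentence the string "e " occurs more often than "th", yet every one of its successors is different, while "th" is followed by "e" every time. Frequency alone would rank a word boundary above the start of a word; the predecessor and successor probabilities do not.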
[Figure: HUMAN Placental ribonuclease inhibitor; panels: FREQUENCY, SIGNIFICANCE]
3. Scale and Speed
A unique method for structuring data makes ultra-high-throughput applications routinely accessible
What makes Synamatix UNIQUE - 3
All genomes!! All proteomes!! Multiples…
[Diagram labels: Any data; Sequence data; Non-Sequence data; Array data; 3rd Party; All Prokaryotes; All Eukaryotes; Human; Mouse; Virus; Proteomics; Protein Interactions; Array analysis; Text; Phylogenetics]
Multi-genome scalability – flat file db
[Figure: Size of database (axis 1 to 10) against Number of Human genome copies (axis 2 to 10); labels Genome 1, then Genome 2 – 99.9% through Genome 10 – 99.9%, one at each size level from 1 to 10]
Multi-genome scalability – SynaBASE
[Figure: Size of database (axis 1 to 10) against Number of Human genome copies (axis 2 to 10); labels Genome 1, then Genome 2 – 99.9% through Genome 10 – 99.9%, all placed between size levels 1 and 4]
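The two scalability figures can be approximated with a toy experiment: ten genomes that are each 99.9% identical to the first, stored once as flat copies and once with shared pieces stored only a single time. The sketch below is not SynaBASE's method (which the slides do not disclose); it swaps in simple fixed-size chunk de-duplication as a stand-in, uses random 100,000-base sequences instead of real genomes, and models differences as point substitutions only. The helper names `mutate` and `chunks` are hypothetical.

```python
import random

def mutate(seq, rate, rng):
    """Return a copy of seq with round(rate * len(seq)) point substitutions."""
    out = list(seq)
    for pos in rng.sample(range(len(seq)), round(len(seq) * rate)):
        out[pos] = rng.choice([b for b in "ACGT" if b != out[pos]])
    return "".join(out)

def chunks(seq, size):
    """Split seq into consecutive fixed-size pieces."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

rng = random.Random(0)  # fixed seed so the run is repeatable
genome_1 = "".join(rng.choice("ACGT") for _ in range(100_000))
# Nine further "genomes", each 99.9% identical to genome_1.
genomes = [genome_1] + [mutate(genome_1, 0.001, rng) for _ in range(9)]

# Flat-file storage: every genome is stored in full.
flat_size = sum(len(g) for g in genomes)

# Shared storage: each distinct 50-base chunk is stored once.
store = set()
for g in genomes:
    store.update(chunks(g, 50))
shared_size = sum(len(c) for c in store)

print(flat_size, shared_size)
```

Flat storage grows by one full genome per copy, while the shared store grows only by the chunks that a substitution actually touched. Fixed-size chunks would not cope with insertions or deletions, which is one reason a real system needs a more flexible pattern structure than this stand-in.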
Analysis speed scales as log2(n)
[Figure: Speed (milliseconds; axis 100 to 1000) against Size of database (giga bp; axis 1, 10, 100, 1000); series: Conventional, SynaBASE]
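A short calculation shows what log2(n) scaling implies across the range on the slide's axis. The sketch assumes that n is the database size in base pairs and that cost is read as a number of halving steps, as in a binary search; the slide states only the scaling law, so both readings are assumptions, and `log2_steps` is a hypothetical helper.

```python
import math

def log2_steps(n_items):
    """Number of halving steps needed to single out one of n_items."""
    return math.ceil(math.log2(n_items))

GIGA = 10**9
small_db = 1 * GIGA      # 1 giga bp, left end of the slide's axis
large_db = 1000 * GIGA   # 1000 giga bp, right end of the slide's axis

steps_small = log2_steps(small_db)
steps_large = log2_steps(large_db)

# Relative growth in cost when the database grows 1000-fold.
growth = math.log2(large_db) / math.log2(small_db)

print(steps_small, steps_large, round(growth, 3))
```

Under these assumptions a 1000-fold larger database needs only about a third more steps, whereas a method whose cost grows linearly with n would need 1000 times as many.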
What makes Synamatix UNIQUE - 4
Ultra High Throughput (UHT)
• Massively Parallel Single Molecule Sequencing analysis
• Real-time Proteomics
• Comparative genomics
• Probe design / testing
• Personalised medicine
• Clinical Diagnostics
Architecture
[Diagram labels: 3rd party Applications; SUITE]
Architecture
[Diagram labels: Users; Windows / Linux; Linux Itanium; C++; Java; Java Servlets; HTML; Application Servers; WWW Interface; Custom Applications; SUITE]
Summary
• Unique pattern network DB
• Maintains patterns and their relationships
• Able to derive significance from data “a priori”
• Self-learning mechanism
• Accuracy
• Developed the world’s 1st genomics platform capable of addressing demanding new applications:
• Scalable and efficient ultra-volume storage
• Ultra-high-throughput genome analysis
• Personalised medicine and the $1000 genome