Semantic Web Technologies on HPC for Life Sciences and Other Domains. Sean Martin, Founder & CTO, Cambridge Semantics Inc. sean@cambridgesemantics.com +1 617 606 341
What is/are Semantic Technologies anyway? Semantics (from Greek sēmantiká, neuter plural of sēmantikós) is the study of meaning. 10 semantics experts in a room = 11 opinions
Little “s” semantics • Usually proprietary, mostly heuristics/statistics based • Search (not query) • Usually extracts meaning from unstructured data (text, video, etc.) • Examples: • Enterprise search • Entity extraction, automated tagging, text analytics • Natural Language Processing (NLP) technologies • Automated translation, e.g. Google Translate • SMILA & UIMA open source frameworks
Big “S” Semantics – Paint starting to dry • W3C recommendations (open data standards) • Machine readable, query (not search) & instant data integration • The Semantic Web • Also known as “Linked Open Data” • Also known as “Web 3.0” • Examples: • Google “rich snippets” • OpenGraph • The GoodRelations ontology • Public government data (USA, Europe, UK) • All sorts of startup activity
What are the W3C’s Open Data Standards? • RDF • OWL • SPARQL There are others, but these are the key ones
RDF • Self-describing (tagged) instance data • Facts or triples: <subject> <predicate> <object/value> • A collection of triples creates a directed labeled graph • <subject> and <predicate> are globally unique strings or URIs, e.g. http://www.cambridgesemantics.com/people/sean
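The triple model above can be sketched in a few lines of plain Python. This is only an illustration, not an RDF library: the `http://example.org/...` vocabulary and the facts themselves are assumptions made up for the example; only the subject URI comes from the slide.

```python
from collections import defaultdict

SEAN = "http://www.cambridgesemantics.com/people/sean"  # URI from the slide
EX = "http://example.org/vocab/"                         # hypothetical vocabulary
ORG = "http://example.org/org/csi"                       # hypothetical identifier

# Each fact is a (subject, predicate, object/value) triple.
triples = [
    (SEAN, EX + "name", "Sean Martin"),              # object is a literal value
    (SEAN, EX + "worksFor", ORG),                    # object is another URI
    (ORG, EX + "name", "Cambridge Semantics Inc."),
]

# Indexing the triples by subject yields a directed labeled graph:
# subject --predicate--> object
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def neighbours(node):
    """Return the objects of `node` that are themselves subjects in the graph."""
    return [o for _, o in graph.get(node, []) if o in graph]
```

Calling `neighbours(SEAN)` follows the one edge whose object is another subject; literal values such as names are leaves of the graph.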
OWL • OWL (Web Ontology Language) • Describes data models the way a domain expert would • What triples or facts are needed to properly describe something and its relationship to other similarly described things? • Relationships for inference and other kinds of reasoning
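To give a feel for what "relationships for inference" means, here is a minimal sketch of a single subclass rule applied until nothing new can be derived. It is far simpler than a real OWL reasoner, and the class and instance names (`Kinase`, `Enzyme`, `Protein`, `ABL1`) and the predicate strings are illustrative assumptions, not terms from any actual ontology.

```python
SUBCLASS_OF = "subClassOf"  # hypothetical predicate names for this sketch
TYPE = "type"

facts = {
    ("Kinase", SUBCLASS_OF, "Enzyme"),
    ("Enzyme", SUBCLASS_OF, "Protein"),
    ("ABL1", TYPE, "Kinase"),
}

def infer(triples):
    """Apply two rules until no new triples appear:
    1. subClassOf is transitive.
    2. An instance of a class is also an instance of its superclasses.
    """
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # Only combine a fact with a subclass statement about its object.
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p in (SUBCLASS_OF, TYPE):
                    new.add((s, p, o2))
        if new <= closed:
            return closed
        closed |= new

inferred = infer(facts)
```

The model only states that ABL1 is a kinase; the fact that it is also a protein is derived from the class relationships rather than stored.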
SPARQL • The first standards based distributed query language for RDF data & the Web • Wow!
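A SPARQL query is essentially a set of triple patterns with variables. The sketch below imitates that idea in plain Python; it is not SPARQL itself and not a library API, and the data, predicate names and the `match`/`query` helpers are assumptions for illustration. The equivalent SPARQL would look roughly like `SELECT ?person ?org WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn "Boston" }`, with `ex:` standing for a hypothetical vocabulary prefix.

```python
def match(pattern, triple, binding):
    """Unify one triple pattern with one triple.
    Terms starting with '?' are variables. Returns an extended
    binding dict, or None if the triple does not fit."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(patterns, triples):
    """Evaluate a list of triple patterns against a list of triples."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings

data = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "initech"),
    ("acme", "locatedIn", "Boston"),
    ("initech", "locatedIn", "Austin"),
]

results = query(
    [("?person", "worksFor", "?org"), ("?org", "locatedIn", "Boston")],
    data,
)
```

The shared variable `?org` is what joins the two patterns; a real engine would use indexes rather than scanning every triple for every pattern.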
Important properties of RDF • Machine readable model / programs can “understand” • Unique identity of every data element • The subject is a unique identifier • The predicate (the relationship) is also a unique identifier • The object can be a unique identifier pointing to another subject • That’s how we get directed graphs • Allows annotation (the unique subject string provides an “anchor” for 3rd party metadata) • Allows provenance (especially useful when data travels beyond its source system or needs to be updated) • Semantic type (not just primitive data types) • Lets programs immediately know what type of data they are dealing with, allowing automated contextualization of information
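The annotation and provenance points can be illustrated with a small sketch in which every fact carries a fourth element naming its source. The identifier, predicate names and source labels below are all hypothetical; the only idea taken from the slide is that a shared subject identifier lets a third party attach metadata without touching the source system.

```python
GENE = "http://example.org/gene/BRCA1"  # hypothetical identifier

# Each fact is (subject, predicate, object, source); the last element is provenance.
lab_system = [
    (GENE, "type", "Gene", "lab-db"),        # semantic type, not just "string"
    (GENE, "symbol", "BRCA1", "lab-db"),
]
third_party = [
    (GENE, "reviewedBy", "curator-42", "annotation-service"),
]

# Both sources use the same subject identifier, so merging is concatenation.
merged = lab_system + third_party

def describe(subject, quads):
    """Collect everything known about a subject, keeping the source of each fact."""
    return {p: (o, src) for s, p, o, src in quads if s == subject}

about = describe(GENE, merged)
```

Here `about` holds facts from both systems side by side, each still tagged with where it came from.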
So what does any of this change? • Adoption of the semantic standards will be disruptive in at least two ways that create enormous value • 1. Who can do what. Much easier. • Pushing the bar further and further towards end user self-service • 2. How long it takes. Much faster. • Each new wave of technology brings at least an order of magnitude productivity increase, often more • Recent waves: Web Services/SOA; Java (no memory management); Virtualization etc. • Semantic technology is another wave
Where do these benefits come from? • Using Semantic Technologies, the end user’s understanding of their data need be the only system or application model required • This allows the construction of applications & systems to move from what have until now been carefully planned, structure-dependent “all up front” designs over to malleable conceptual representations that can be evolved quickly • Systems go from being brittle to flexible • Systems can change at the speed the business does • End users can increasingly make more of these changes directly themselves
Preserving the end user’s model • [Diagram: traditional middleware vs. semantic middleware (User’s Model*)] • Layers shown: Relational Model (Physical) • Relational Model (Logical) • Object Relational Model • Business Objects Model • User Interface Model • User’s idea of the Model • *Warning: dramatically oversimplified to make a point
Paying the price for all this flexibility • Exploding data volumes • Tagging creates 10x more data • Random access is expensive • >35 years of optimization around RDBMS is not helping • Too many “self-joins” on a three-column table • No index support • Adding an additional layer of indirection is expensive • Every time you want to display a value you need to dereference it
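The "self-joins on a three column table" cost is easy to see when triples are stored naively in a relational database. The sketch below uses Python's built-in `sqlite3` with an in-memory table; the schema, identifiers and data are assumptions chosen for illustration and do not represent how any particular triple store is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("person:1", "name", "Alice"),
    ("person:1", "worksFor", "org:1"),
    ("org:1", "name", "Acme"),
    ("org:1", "locatedIn", "Boston"),
])

# "Who works where, and in which city?" needs the same table four times:
# every extra attribute or hop is another self-join.
rows = conn.execute("""
    SELECT pn.o AS person, orgn.o AS org, loc.o AS city
    FROM triples AS pn
    JOIN triples AS wf   ON wf.s = pn.s   AND wf.p = 'worksFor'
    JOIN triples AS orgn ON orgn.s = wf.o AND orgn.p = 'name'
    JOIN triples AS loc  ON loc.s = wf.o  AND loc.p = 'locatedIn'
    WHERE pn.p = 'name'
""").fetchall()
```

With a conventional schema this would be a single join between a person table and an organization table. Note also that the displayed values (`Alice`, `Acme`) are only reached by dereferencing identifiers such as `org:1`, which is the extra layer of indirection mentioned above.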
Paying the price for all this flexibility – enabling trends • W3C Semantic standards • A decade of semantic middleware + storage R&D • Multi-core CPUs • Fast networking • Cheap RAM • Web 2.0 blazing the trail with a new RAM-based application model? Disk is the new tape? Twitter, Facebook, LinkedIn and iostat • SSD • The changing cost of the sub-4k random access read and what it means to transaction processing systems and the applications that run on them
Spot the difference • [Images: Then vs. Now]
And finally, so what does any of this have to do with HPC? • Cray’s XMT Systems + Very large quantities of RAM arranged in a contiguous block + Very low latency memory access + Large number of CPUs + Large number of cheap threads = Full pipelines • Great for interactive applications creating random access query patterns, particularly complex ones requiring many joins
Other HPC-related Semantic efforts • Raytheon BBN’s SPARQL on MapReduce clusters • WebPIE – VU University Amsterdam’s OWL Horst inference on MapReduce • Clustered RDF triple stores • OpenLink’s Virtuoso data store • Ontotext’s BigOWLIM • Franz Inc.’s AllegroGraph
Semantics & the Enterprise – not waiting for the network effect • Overview of Cambridge Semantics Middleware Platform • Allows business users & customers/partners to: • Discover & connect to any data in databases & other systems on the fly • Create dashboards & applications on demand • Allows IT to: • Rapidly integrate data across silos and firewalls • Expose business policies, rules & workflow to business users • Implement manual intervention with automated response • Enterprise-class security, governance, provenance, … • A W3C-based semantic middleware for real-time, user-driven operational intelligence
Thanks for listening • Further interest and a completely different view • Sir Tim Berners-Lee’s TED Talk on the next web • Questions/Objections? • Stop me & ask/state • Contact details again: Sean Martin, Cambridge Semantics Inc. sean@cambridgesemantics.com +1 617 606 341