Web X.0, NoSQL DBs and the Semantic Web

Web X.0, NoSQL DBs and the Semantic Web Directions…

Quick review: Web development frameworks
• Web 2.0/3.0 is about making websites faster, smarter, more media-rich and more intuitive
• There is a generation of web development frameworks that focus on faster and smarter; they attack the back end, not the front end
• Ruby on Rails, Grails, Django, Symfony, and others
• They tend to use some kind of object-relational mapping (wrapping database tables as objects)
• They support, and in fact enforce, the MVC approach to developing websites
  – Model (the database with its mapping/wrapping)
  – View (web pages)
  – Controller (pieces of code that map view manipulations into model manipulations and vice versa)
• They are AJAX friendly
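Frameworks such as Rails or Django generate the object-relational mapping for you. As a rough illustration of what that wrapping does, the sketch below hand-rolls a tiny mapper with Python's built-in sqlite3 module. The `students` table, the `Student` class, and the `StudentMapper` methods are hypothetical names chosen for this example, not any framework's API.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Student:
    """Object wrapper for one row of the hypothetical students table."""
    stud_id: int
    name: str


class StudentMapper:
    """Minimal hand-written object-relational mapper for Student objects."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS students (stud_id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, student: Student) -> None:
        # Object -> row: insert, or overwrite the row with the same primary key.
        self.conn.execute(
            "INSERT OR REPLACE INTO students (stud_id, name) VALUES (?, ?)",
            (student.stud_id, student.name),
        )

    def find(self, stud_id: int) -> Optional[Student]:
        # Row -> object: returns None when no row has this key.
        row = self.conn.execute(
            "SELECT stud_id, name FROM students WHERE stud_id = ?", (stud_id,)
        ).fetchone()
        return Student(*row) if row else None


mapper = StudentMapper(sqlite3.connect(":memory:"))
mapper.save(Student(111111111, "John Doe"))
print(mapper.find(111111111))  # Student(stud_id=111111111, name='John Doe')
```

Application code then deals only with `Student` objects; the SQL stays inside the mapper, which is roughly the separation the Model layer of an MVC framework provides.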

MVC
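The three MVC roles listed in the framework review can be sketched in plain Python. This is a minimal illustration under simple assumptions: an in-memory dictionary stands in for the database, a returned HTML string stands in for the rendered page, and the class and method names are hypothetical. Real frameworks add routing, templates, and persistence on top of this split.

```python
import html


class Model:
    """Model: owns the data (a dict stands in for the database here)."""

    def __init__(self):
        self.tasks = {}
        self.next_id = 1

    def add_task(self, title):
        task_id = self.next_id
        self.tasks[task_id] = title
        self.next_id += 1
        return task_id

    def all_tasks(self):
        return sorted(self.tasks.items())


class View:
    """View: renders model data as a web page fragment."""

    def render(self, tasks):
        items = "".join(f"<li>{html.escape(title)}</li>" for _, title in tasks)
        return f"<ul>{items}</ul>"


class Controller:
    """Controller: maps view actions to model changes, then picks what to show."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle_add(self, form):
        # A form submission from the view becomes a model manipulation...
        self.model.add_task(form["title"])
        # ...and the updated model is rendered back into the view.
        return self.view.render(self.model.all_tasks())


controller = Controller(Model(), View())
print(controller.handle_add({"title": "Read about RDF"}))  # <ul><li>Read about RDF</li></ul>
```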

Client-side web development
• There is another generation of web frameworks that focus on making it easy to create rich web interfaces
• Flash Builder (from Adobe; the underlying Flex SDK was donated to Apache as open source)
• Silverlight (from Microsoft, and perhaps dead?)
• They support 2D and some 3D graphics
• They use up-front loading to minimize interaction with the server
• There is a newer effort involving HTML5
  – Graphics is supported, with 2D and some 3D
  – Local storage with simple insert and delete; can use SQLite
  – Better multimedia support
• More powerful JavaScript libraries, such as jQuery, are coming out as well

Important to note
• Web X.0 efforts try to make use of graphics in interfaces, as well as provide better display of media
• But support for blob and continuous data access (images, video, audio, etc.) is still very rudimentary
  – Problem: we cannot screen media in real time
  – Problem: it is very difficult to capture the semantics of media
• The solution: we tend to build accompanying metadata databases with tag sets (one per piece of media) assigned by experts using specialized namespaces
• To enhance accuracy, there is sometimes a feedback loop where users can train the search facility
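A minimal sketch of such a metadata database follows. It assumes that each media item carries a set of namespaced tags assigned by experts, and that user feedback simply nudges a per-(item, tag) weight up or down. The `mm:` tag prefix, the item IDs, and the weighting scheme are all hypothetical; production systems use far more elaborate relevance models.

```python
class MediaTagIndex:
    """Metadata database: one tag set per media item, with adjustable tag weights."""

    def __init__(self):
        self.tags = {}  # media_id -> {tag: weight}

    def tag(self, media_id, *tags):
        # Expert-assigned tags start with weight 1.0.
        weights = self.tags.setdefault(media_id, {})
        for t in tags:
            weights[t] = 1.0

    def search(self, *query_tags):
        # Score each item by the summed weights of the query tags it carries.
        scores = {}
        for media_id, weights in self.tags.items():
            score = sum(weights.get(t, 0.0) for t in query_tags)
            if score > 0:
                scores[media_id] = score
        # Highest score first; ties broken by media ID for a stable order.
        return sorted(scores, key=lambda m: (-scores[m], m))

    def feedback(self, media_id, tag, relevant, step=0.5):
        # User feedback loop: raise or lower one tag's weight for one item.
        weights = self.tags.get(media_id)
        if weights is not None and tag in weights:
            delta = step if relevant else -step
            weights[tag] = max(0.0, weights[tag] + delta)


index = MediaTagIndex()
index.tag("video-17", "mm:sunset", "mm:beach")  # hypothetical expert-assigned tags
index.tag("photo-42", "mm:sunset", "mm:city")
print(index.search("mm:sunset"))  # ['photo-42', 'video-17'] (tie, broken by ID)
index.feedback("video-17", "mm:sunset", relevant=True)  # a user marks the video as a good hit
print(index.search("mm:sunset"))  # ['video-17', 'photo-42']
```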

Quick review: the Semantic Web
• This is oriented around making the web more automatically searchable
• Main foci:
  – Assertions and inferences
  – Exposing databases that contain “hidden” data
  – Searching of media bases (blob and continuous), i.e., exposing them
  – Searching document bases, i.e., exposing them
  – Data mining

Querying the Semantic Web
• RDF: data is expressed as triples (subject, predicate, object)
• We can use URIs for all three pieces of a triple
• SPARQL: a query language over triples, used for spanning Web boundaries
• Example: THE BALL is ORANGE. ORANGE is an UGLY COLOR. The inference we can make is that THE BALL has an UGLY COLOR
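The ball example can be sketched as a tiny in-memory triple store with one hand-written rule. The `http://example.org/` URIs, the predicate names (`hasColor`, `isA`, `hasUglyColor`), and the rule itself are assumptions for illustration; a real system would use an RDF library and a standard reasoner.

```python
EX = "http://example.org/"  # hypothetical namespace for the example's URIs

# Each fact is a (subject, predicate, object) triple, with URIs for all three parts.
triples = {
    (EX + "TheBall", EX + "hasColor", EX + "Orange"),   # THE BALL is ORANGE
    (EX + "Orange", EX + "isA", EX + "UglyColor"),      # ORANGE is an UGLY COLOR
}


def infer_ugly_color(facts):
    """Rule: if ?x hasColor ?c and ?c isA UglyColor, then ?x hasUglyColor ?c."""
    inferred = set()
    for (x, p, c) in facts:
        if p == EX + "hasColor" and (c, EX + "isA", EX + "UglyColor") in facts:
            inferred.add((x, EX + "hasUglyColor", c))
    return inferred


for s, p, o in sorted(infer_ugly_color(triples)):
    print(s, p, o)
# http://example.org/TheBall http://example.org/hasUglyColor http://example.org/Orange
```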

An RDF Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:zx="http://www.someurl.org/zx/">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:created-by rdf:resource="http://www.anotherurl.org/buzz"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.anotherurl.org/buzz">
    <zx:is rdf:resource="http://www.yetanotherurl.org/professor"/>
  </rdf:Description>
</rdf:RDF>

The assertions and an inference
• www.awebsite.org/index.html <topic> funstuff
  – The topic of the resource at www.awebsite.org/index.html is funstuff
• www.awebsite.org/index.html <created-by> http://www.anotherurl.org/buzz
  – www.awebsite.org/index.html was created by someone who is identified by the URL http://www.anotherurl.org/buzz
• http://www.anotherurl.org/buzz <is> http://www.yetanotherurl.org/professor
  – The creator identified by http://www.anotherurl.org/buzz is a professor
• The value in the first triple, which concerns the “topic” of our resource, is a character string, but the value in the second triple, which concerns the “created-by” of our resource, is actually a URL
• Inference: by chaining the second and third triples, we can conclude that www.awebsite.org/index.html was created by a professor
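To see the triples fall out of the RDF/XML, here is a sketch that walks the rdf:Description elements with Python's standard xml.etree.ElementTree and then chains two triples to reach the "created by a professor" inference. It uses the slide's example written with rdf:resource for the URL-valued properties, and it handles only the simple patterns used here (rdf:about subjects, literal values, rdf:resource references). It is not a full RDF/XML parser; a dedicated RDF library would normally be used.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ZX = "http://www.someurl.org/zx/"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:zx="http://www.someurl.org/zx/">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:created-by rdf:resource="http://www.anotherurl.org/buzz"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.anotherurl.org/buzz">
    <zx:is rdf:resource="http://www.yetanotherurl.org/professor"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) triples from simple RDF/XML."""
    root = ET.fromstring(text)
    result = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into one URI.
            ns, local = prop.tag[1:].split("}")
            # A URI-valued property uses rdf:resource; otherwise the value is literal text.
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            result.append((subject, ns + local, obj))
    return result


triples = triples_from_rdfxml(DOC)

# Inference by chaining: page created-by X, and X is Y  =>  page was created by a Y.
for page, p1, creator in triples:
    if p1 == ZX + "created-by":
        for who, p2, role in triples:
            if who == creator and p2 == ZX + "is":
                print(f"{page} was created by a {role.rsplit('/', 1)[-1]}")
# http://www.awebsite.org/index.html was created by a professor
```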

SPARQL
• SPARQL is a recursive acronym for SPARQL Protocol And RDF Query Language; it is pronounced “sparkle”
• It is a language that can be used to traverse graphs of RDF triples that are chained together into an object network
• Example query:
  PREFIX zx: <http://www.someurl.org/zx/>
  SELECT ?x
  WHERE { <http://www.awebsite.org/index.html> zx:created-by ?x }
• This query finds the creators of http://www.awebsite.org/index.html
• It matches the triple pattern in the WHERE clause against the available triples and returns the value bound to ?x (the creator) in each match
• These triples could be distributed all around the Web
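A SPARQL engine answers such a query by matching triple patterns against the data and binding variables, then joining the bindings when several patterns share a variable. The sketch below imitates that core step in plain Python over an in-memory list of triples taken from the RDF example. It illustrates basic graph pattern matching only and is not a SPARQL implementation; `match` and `query` are hypothetical helper names.

```python
ZX = "http://www.someurl.org/zx/"
PAGE = "http://www.awebsite.org/index.html"

triples = [
    (PAGE, ZX + "topic", "funstuff"),
    (PAGE, ZX + "created-by", "http://www.anotherurl.org/buzz"),
    ("http://www.anotherurl.org/buzz", ZX + "is", "http://www.yetanotherurl.org/professor"),
]


def match(pattern, data):
    """Yield one {variable: value} binding per triple that fits the pattern.
    Terms starting with '?' are variables; anything else must match exactly."""
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break
        else:
            yield binding


def query(patterns, data):
    """Join several patterns (a basic graph pattern) by threading bindings through."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for b in results:
            # Substitute variables bound by earlier patterns, then match the rest.
            grounded = tuple(b.get(t, t) for t in pattern)
            for extra in match(grounded, data):
                next_results.append({**b, **extra})
        results = next_results
    return results


# Equivalent of: SELECT ?x WHERE { <http://www.awebsite.org/index.html> zx:created-by ?x }
print([b["?x"] for b in match((PAGE, ZX + "created-by", "?x"), triples)])
# ['http://www.anotherurl.org/buzz']
```

Calling `query` with two patterns that share `?who` walks from the page to its creator and on to the creator's role, which is the kind of path traversal the slide describes.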

The Semantic Web, continued
• Main tools
  – Namespaces posted on the web and shared
  – XML
  – Ontologies of assertions, e.g., “Tall people play basketball” and “Joe is tall” (note both schema-based and instance-based assertions)
  – Walking paths linked by assertions, with languages like SPARQL
  – Forming inferences from assertions along the way
• XML extensions to accommodate complex data, non-string data, and querying of large datasets
  – Support pointers to namespaces
  – Support complex, non-textual documents, along with object IDs, keys and foreign keys
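The basketball example mixes a schema-level assertion (tall people play basketball) with an instance-level one (Joe is tall). One way to encode this, assumed here, is an rdfs:subClassOf triple plus an rdf:type triple. The sketch then applies the RDFS subclass rule (if C is a subclass of D and x has type C, then x has type D) until no new triples appear. The `ex:` names are hypothetical, and prefixed names are kept as plain strings for brevity.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

facts = {
    ("ex:TallPerson", SUBCLASS, "ex:BasketballPlayer"),  # schema: tall people play basketball
    ("ex:Joe", RDF_TYPE, "ex:TallPerson"),               # instance: Joe is tall
}


def saturate(triples):
    """Forward-chain the RDFS subclass rule until a fixed point is reached."""
    closed = set(triples)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, p, c) in closed if p == RDF_TYPE
            for (c2, q, d) in closed if q == SUBCLASS and c2 == c
        }
        if new <= closed:  # nothing new was derived
            return closed
        closed |= new


print(("ex:Joe", RDF_TYPE, "ex:BasketballPlayer") in saturate(facts))  # True
```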

XML

Continued…

Accommodating complex data
• Schemas
  – Initially, DTDs
  – Later, XML Schema
  – Save schema fragments and import them
• Non-string data types
• Keys and FKs
• Type constructors
  – Primitive: integer, float, boolean, date, ID
  – Simple: list, union
  – Complex: groups of elements
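XML Schema declares keys and foreign keys (for example with xs:key and xs:keyref), and a validating parser then enforces them. The Python sketch below only imitates the check such a declaration triggers: it uses the standard ElementTree on a made-up document in which each Transcript's StudId must refer to an existing Student. It does not read or enforce an actual schema, and the element and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the last Transcript points at a missing Student.
DOC = """<University>
  <Student StudId="111111111" Name="John Doe"/>
  <Student StudId="123454321" Name="Joe Blow"/>
  <Transcript StudId="111111111" CrsCode="MAT123"/>
  <Transcript StudId="999999999" CrsCode="CS305"/>
</University>"""


def check_keys(root):
    """Return (duplicate keys, dangling foreign keys) for Student/Transcript."""
    seen, duplicates = set(), set()
    for s in root.findall("Student"):
        key = s.get("StudId")
        if key in seen:
            duplicates.add(key)  # key constraint: StudId must be unique
        seen.add(key)
    # Foreign-key constraint: every Transcript/@StudId must name an existing Student.
    dangling = [t.get("StudId") for t in root.findall("Transcript")
                if t.get("StudId") not in seen]
    return duplicates, dangling


dups, dangling = check_keys(ET.fromstring(DOC))
print(dups, dangling)  # set() ['999999999']
```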

Data types in XML Schema

Continued…

DTDs

XML schema and namespaces

XPath – for searching XML documents hierarchically
• An XPath expression takes a document tree as input and returns a multi-set of nodes of the tree
• Expressions that start with / are absolute path expressions
  – / returns the root node of the XPath tree
  – /Students/Student returns all Student elements that are children of Students elements, which in turn must be children of the root
  – /Student returns the empty set (the root has no Student children)
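Python's standard xml.etree.ElementTree supports a limited subset of XPath, which is enough to try these path expressions. The Students document below is a made-up sample. ElementTree evaluates paths relative to an element, so this sketch imitates an absolute path such as /Students/Student with a hypothetical helper, `absolute`, that checks the document element's tag and then selects children.

```python
import xml.etree.ElementTree as ET

DOC = """<Students>
  <Student StudentId="111111111">
    <Name><First>John</First><Last>Doe</Last></Name>
  </Student>
  <Student StudentId="123454321">
    <Name><First>Joe</First><Last>Blow</Last></Name>
  </Student>
</Students>"""

root = ET.fromstring(DOC)  # the document element, i.e. the only child of the XPath root


def absolute(root, *steps):
    """Imitate /step1/step2/...: the first step must name the document element."""
    if not steps or root.tag != steps[0]:
        return []
    return root.findall("/".join(steps[1:])) if len(steps) > 1 else [root]


print(len(absolute(root, "Students", "Student")))  # 2   (/Students/Student)
print(absolute(root, "Student"))                   # []  (/Student)
```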

XPath continued
• Current node (or context node): exists during the evaluation of XPath expressions (and in other XML query languages)
• . denotes the current node; .. denotes the parent
• foo/bar returns all bar elements that are children of foo nodes, which in turn are children of the current node
• ./foo/bar – same
• ../abc/cde returns all cde e-children (element children) of abc e-children of the parent of the current node
• Expressions that don’t start with / are relative (to the current node)

Attributes, text, …
• /Students/Student/@StudentId returns all StudentId a-children (attribute children) of Student, which are e-children of Students, which are children of the root
• /Students/Student/Name/Last/text() returns all t-children (text children) of Last e-children of …
• /comment() returns comment nodes under the root
• XPath provides means to select other document components as well
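Relative paths, attribute selection, and text selection can be tried the same way with ElementTree, again on a small made-up Students document. ElementTree's path syntax has no trailing @attr or text() steps, so this sketch selects elements with a path and then reads `.get()` or `.text`; it does support attribute predicates such as `[@StudentId='…']`, which the sketch uses to pick a context node.

```python
import xml.etree.ElementTree as ET

DOC = """<Students>
  <Student StudentId="111111111">
    <Name><First>John</First><Last>Doe</Last></Name>
  </Student>
  <Student StudentId="123454321">
    <Name><First>Joe</First><Last>Blow</Last></Name>
  </Student>
</Students>"""

root = ET.fromstring(DOC)

# /Students/Student/@StudentId: select the Student elements, then read the attribute.
ids = [s.get("StudentId") for s in root.findall("Student")]

# /Students/Student/Name/Last/text(): select the Last elements, then read their text.
last_names = [e.text for e in root.findall("Student/Name/Last")]

# Relative paths from a context node: Name/First and ./Name/First select the same nodes.
context = root.find("Student[@StudentId='123454321']")
same = context.findall("Name/First") == context.findall("./Name/First")

print(ids)                                     # ['111111111', '123454321']
print(last_names)                              # ['Doe', 'Blow']
print(context.find("Name/First").text, same)   # Joe True
```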

XQuery
• General structure: for variable declarations, where condition, return document
• Example:
  (: students who took MAT123 :)
  for $t in doc("http://xyz.edu/transcript.xml")//Transcript
  where $t/CrsTaken/@CrsCode = "MAT123"
  return $t/Student
• Result:
  <Student StudId="111111111" Name="John Doe" />
  <Student StudId="123454321" Name="Joe Blow" />
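To make the for/where/return flow concrete, here is the same query imitated in Python with ElementTree. The structure of transcript.xml (Transcript elements containing a Student element and CrsTaken elements) is inferred from the query's paths and is an assumption. The sample data is made up, apart from the two students shown in the slide's result.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for doc("http://xyz.edu/transcript.xml")
DOC = """<Transcripts>
  <Transcript>
    <Student StudId="111111111" Name="John Doe"/>
    <CrsTaken CrsCode="MAT123"/>
    <CrsTaken CrsCode="CS305"/>
  </Transcript>
  <Transcript>
    <Student StudId="987654321" Name="Jane Roe"/>
    <CrsTaken CrsCode="CS305"/>
  </Transcript>
  <Transcript>
    <Student StudId="123454321" Name="Joe Blow"/>
    <CrsTaken CrsCode="MAT123"/>
  </Transcript>
</Transcripts>"""

root = ET.fromstring(DOC)

results = [
    t.find("Student")                                  # return $t/Student
    for t in root.iter("Transcript")                   # for $t in ...//Transcript
    # where $t/CrsTaken/@CrsCode = "MAT123": XQuery's general "=" is true
    # if ANY CrsTaken of this transcript matches, hence any().
    if any(c.get("CrsCode") == "MAT123" for c in t.findall("CrsTaken"))
]

for s in results:
    print(s.get("StudId"), s.get("Name"))
# 111111111 John Doe
# 123454321 Joe Blow
```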

XML and Web X.0: Flash Builder
<?xml version="1.0" encoding="utf-8"?>
<s:Application xmlns:fx="http://ns.adobe.com/mxml/2009"
               xmlns:s="library://ns.adobe.com/flex/spark"
               xmlns:mx="library://ns.adobe.com/flex/mx"
               minWidth="955" minHeight="600">
  <fx:Declarations>
  </fx:Declarations>
  <mx:DateChooser x="527" y="142"/>
</s:Application>

Results in

Semantic Web big problems
• Massive reengineering effort to make use of Semantic Web technology
• Assertions that span nodes can be extremely time consuming to traverse
• Making media accessible
  – Easy enough to generate low-level assertions automatically
  – Very time consuming to add assertions manually by experts
  – Our main tools are tagging and image/sound processing packages that are very complex and very heuristic driven
• XML Schema, the big XML extension, is unwieldy

Web X.0 big problems
• We are not just trying to search relational databases
• Graphics is often used in a gratuitous, non-useful, even distracting fashion, and it eats up download time and computational time
• We still cannot manipulate, search, or interpret media

Comparison with NoSQL DBs
• Key-document and key-value databases are a way of organizing document, value (blob), and continuous data so they can be searched quickly by next-generation web applications, as well as by programs automatically searching the web
• Graph databases are a way of dynamically extending assertions between objects, but don’t play well with large networks
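The core idea of a key-document store can be sketched in a few lines. Each document is stored whole under a key, so a lookup by key is a single hash-table access, and a secondary index on one field supports simple searches. This in-memory class and its method names are hypothetical stand-ins for a real product, which would add persistence, distribution, and richer queries. A key-value store is the same idea with an opaque value (such as a blob) in place of a structured document.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Toy key-document store: key -> JSON document, plus one optional field index."""

    def __init__(self, indexed_field=None):
        self.docs = {}                 # key -> serialized document
        self.indexed_field = indexed_field
        self.index = defaultdict(set)  # field value -> set of keys

    def put(self, key, doc):
        self.delete(key)  # drop stale index entries if the key is being replaced
        self.docs[key] = json.dumps(doc)  # store the whole document, as text
        if self.indexed_field in doc:
            self.index[doc[self.indexed_field]].add(key)

    def get(self, key):
        raw = self.docs.get(key)
        return json.loads(raw) if raw is not None else None

    def delete(self, key):
        raw = self.docs.pop(key, None)
        if raw is not None:
            old = json.loads(raw)
            if self.indexed_field in old:
                self.index[old[self.indexed_field]].discard(key)

    def find_by_field(self, value):
        return sorted(self.index.get(value, ()))


store = DocumentStore(indexed_field="type")
store.put("media:17", {"type": "video", "tags": ["sunset", "beach"]})
store.put("media:42", {"type": "image", "tags": ["sunset"]})
print(store.get("media:17")["tags"])  # ['sunset', 'beach']
print(store.find_by_field("image"))   # ['media:42']
```

Serializing to JSON on write means callers always get a fresh copy back, so mutating a returned document cannot silently change the stored one.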

Nice things about NoSQL DBs and the Semantic Web and Web X.0
• NoSQL DBs are minimalistic in just the right way
  – Much easier to plug in than complex XML Schema front ends to databases, and can work with existing relational DBs
• Documents are natural to both efforts
• Media blobs are natural to both efforts
• Graphs are natural to the Semantic Web

Web services
• Support non-interactive database access
• Use XML, HTTP, etc.
• Examples include services offered by Google and Amazon
• Universal Description, Discovery, and Integration (UDDI) for creating distributed registries of web services
• Web Services Description Language (WSDL) for describing a service’s interface in XML
• Simple Object Access Protocol (SOAP) is an XML-based protocol that allows apps to send messages to each other over the Internet
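To show what "XML based" means for SOAP, the sketch below builds a SOAP 1.1 request envelope with Python's ElementTree. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the `GetPrice` operation, and its `ISBN` parameter are hypothetical. The code only constructs the message; sending it would typically be an HTTP POST to the endpoint described in the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC = "http://example.org/bookstore"                    # hypothetical service namespace

# Prefixes used when serializing; they do not change the message's meaning.
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SVC)


def build_request(isbn):
    """Wrap a hypothetical GetPrice call in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC}}}GetPrice")
    ET.SubElement(call, f"{{{SVC}}}ISBN").text = isbn
    return ET.tostring(envelope, encoding="unicode")


print(build_request("0123456789"))
```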

Security
• The complexity of server-side technology, along with its heterogeneity
• The need to allow dynamic web page support, email, FTP, etc.
• The need to support services
• Access to databases from multiple sources on either side of the firewall

… continued
• The tendency to loosen firewalls when things don’t work
• Email attachments
• Rapid rate of change of software, content, and services
• The use of open source and legacy DBs that are poorly understood

Another security issue
• Web and database servers are used to support newer sorts of data and service access
  – Warehousing data (usually, but not always, inside the firewall)
  – Mining data, which is often outside the firewall
  – Specialized document retrieval systems
  – Specialized advanced media retrieval systems
  – Integration of heterogeneous data
  – Sharing of namespaces, schema fragments, and query code (often in XML technologies)

… continued
• All of these can be layered and span multiple sites
  – Such as hierarchical data marts
  – Mediator-based integration hierarchies
• A wide class of people inside and outside of the organization must have access to data (such as content taggers)

Data Privacy
• HIPAA
• Authorization of users and applications
  – Passwords
  – Two-factor (something you know, such as a password or code, plus something you have, such as a physical token that supplies a code)
  – Mediated (using a third party)
• Encryption
  – Storage
  – Transmission
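Password storage is one place where authorization and storage protection meet: rather than storing the password itself, a server can store a random salt and a deliberately slow, salted hash. The sketch below uses PBKDF2 from Python's standard hashlib module. The iteration count is an assumed work factor to be tuned, and a dedicated scheme such as bcrypt, scrypt, or Argon2 may be preferable in practice.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # fresh random salt per user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
```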
