Web X.0, NoSQL DBs and the Semantic Web: Directions…
Quick review: Web development frameworks
• Web 2.0/3.0 is about making websites faster, smarter, more media rich and more intuitive
• There is a generation of web development frameworks that focus on faster and smarter; they attack the back end, not the front end
• Ruby on Rails, Grails, Django, Symfony, and others
• They tend to use some kind of relational-to-object mapping/wrapping (object-relational mapping)
• They support, and in fact enforce, the MVC approach to developing websites: Model (the database with its mapping/wrapping), View (web pages), Controller (pieces of code that map view manipulations into model manipulations and vice versa)
• They are AJAX friendly
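The MVC split and the relational-to-object mapping can be sketched in a few lines of Python. This is a minimal illustration only, assuming SQLite as the database and a plain dict as the incoming request; the names `Student`, `StudentModel`, `render_student_list`, and `handle` are hypothetical and do not correspond to the Rails, Grails, Django, or Symfony APIs.

```python
import html
import sqlite3

# --- Model: a tiny hand-rolled relational-to-object mapping over SQLite ---
class Student:
    def __init__(self, stud_id, name):
        self.stud_id = stud_id
        self.name = name

class StudentModel:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS student (stud_id TEXT PRIMARY KEY, name TEXT)")

    def add(self, student):
        # object -> row
        self.conn.execute("INSERT INTO student VALUES (?, ?)",
                          (student.stud_id, student.name))

    def all(self):
        # rows -> objects
        rows = self.conn.execute("SELECT stud_id, name FROM student ORDER BY stud_id")
        return [Student(*row) for row in rows]

# --- View: renders model objects as a web page fragment ---
def render_student_list(students):
    items = "".join(f"<li>{html.escape(s.name)} ({s.stud_id})</li>" for s in students)
    return f"<ul>{items}</ul>"

# --- Controller: maps a (hypothetical) request dict onto model operations ---
def handle(request, model):
    if request["method"] == "POST":
        form = request["form"]
        model.add(Student(form["stud_id"], form["name"]))
    return render_student_list(model.all())

model = StudentModel(sqlite3.connect(":memory:"))
handle({"method": "POST", "form": {"stud_id": "111111111", "name": "John Doe"}}, model)
print(handle({"method": "GET"}, model))  # <ul><li>John Doe (111111111)</li></ul>
```

The view never touches SQL and the model never produces HTML; only the controller knows about both, which is the separation the frameworks above enforce.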
Client-side web development
• There is another generation of web frameworks that focus on making it easy to create rich web interfaces
• Flash Builder, built on Adobe's Flex framework (Adobe has since donated Flex to open source as Apache Flex)
• Silverlight (from Microsoft, and perhaps dead?)
• They support 2D and some 3D graphics
• They use up-front loading to minimize interaction with the server
• There is a newer effort involving HTML5:
  – Graphics is supported, with 2D and some 3D
  – Local storage with simple insert and delete; it can use SQLite
  – Better multimedia support
• More powerful JavaScript libraries, e.g. jQuery, are coming out as well
Important to note
• Web X.0 efforts try to make use of graphics in interfaces, as well as provide better display of media
• But support for accessing blob and continuous data (images, video, audio, etc.) is still very rudimentary
• Problem: we cannot screen media in real time
• Problem: it is very difficult to capture the semantics of media
• The solution: we tend to build accompanying metadata databases with tag sets (one per piece of media) assigned by experts using specialized namespaces
• To enhance accuracy, there is sometimes a feedback loop where users can train the search facility
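A minimal sketch of such a metadata database with a user feedback loop, in Python. It assumes one tag set per media item and a simple additive weighting scheme chosen purely for illustration; `MediaTagIndex`, its methods, and the file names are hypothetical.

```python
from collections import defaultdict

class MediaTagIndex:
    """Hypothetical metadata database: one expert-assigned tag set per media item."""

    def __init__(self):
        self.tags = {}                           # media id -> set of tags
        self.weight = defaultdict(lambda: 1.0)   # (media id, tag) -> learned weight

    def add(self, media_id, tags):
        self.tags[media_id] = set(tags)

    def search(self, query_tags):
        """Rank media by the summed weights of the query tags they carry."""
        scores = {}
        for media_id, tags in self.tags.items():
            score = sum(self.weight[(media_id, t)] for t in query_tags if t in tags)
            if score > 0:
                scores[media_id] = score
        # highest score first; ties broken by media id
        return sorted(scores, key=lambda m: (-scores[m], m))

    def feedback(self, media_id, query_tags, relevant, step=0.5):
        """Feedback loop: a user marks a result relevant or not, nudging tag weights."""
        for t in query_tags:
            if t in self.tags.get(media_id, set()):
                delta = step if relevant else -step
                self.weight[(media_id, t)] = max(self.weight[(media_id, t)] + delta, 0.0)

idx = MediaTagIndex()
idx.add("clip1.mp4", {"beach", "sunset"})
idx.add("photo7.jpg", {"beach", "volleyball"})
print(idx.search({"beach"}))                          # ['clip1.mp4', 'photo7.jpg']
idx.feedback("photo7.jpg", {"beach"}, relevant=True)
print(idx.search({"beach"}))                          # ['photo7.jpg', 'clip1.mp4']
```

The media bytes themselves are never inspected; all searching happens over the tags, which is exactly why the quality of the expert tagging matters so much.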
Quick review: the Semantic Web
• It is oriented around making the web more automatically searchable
• Main foci:
  – Assertions and inferences
  – Exposing databases that contain “hidden” data
  – Searching (i.e., exposing) media bases, both blob and continuous
  – Searching (i.e., exposing) document bases
  – Data mining
Querying the Semantic Web
• RDF: triples (subject, predicate, object)
• We can use URIs for all three pieces of a triple
• SPARQL: a query language over triples, usable across Web boundaries
• Example: THE BALL is ORANGE. ORANGE is an UGLY COLOR. From these we can infer that THE BALL has an UGLY COLOR
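The ball example can be reproduced with a tiny forward-chaining loop over triples. This is an illustrative sketch, not how any particular RDF reasoner works; the predicate names `hasColor`, `isA`, and `hasColorOfKind` are made up to encode the slide's sentences.

```python
def forward_chain(triples):
    """Apply one hypothetical rule until no new triples appear (a fixpoint):
    (x, hasColor, c) and (c, isA, k)  =>  (x, hasColorOfKind, k)"""
    facts = set(triples)
    while True:
        derived = {
            (x, "hasColorOfKind", k)
            for (x, p1, c) in facts if p1 == "hasColor"
            for (c2, p2, k) in facts if p2 == "isA" and c2 == c
        }
        new = derived - facts
        if not new:
            return facts
        facts |= new

facts = forward_chain({
    ("TheBall", "hasColor", "Orange"),   # THE BALL is ORANGE
    ("Orange", "isA", "UglyColor"),      # ORANGE is an UGLY COLOR
})
print(("TheBall", "hasColorOfKind", "UglyColor") in facts)  # True
```

Running the rule to a fixpoint matters once there are many rules: a triple derived in one round can enable another rule in the next.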
An RDF Example
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:zx="http://www.someurl.org/zx/">
      <rdf:Description rdf:about="http://www.awebsite.org/index.html">
        <zx:topic>funstuff</zx:topic>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.awebsite.org/index.html">
        <zx:created-by rdf:resource="http://www.anotherurl.org/buzz"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.anotherurl.org/buzz">
        <zx:is rdf:resource="http://www.yetanotherurl.org/professor"/>
      </rdf:Description>
    </rdf:RDF>
The assertions and an inference
• www.awebsite.org/index.html <topic> funstuff
  – The topic of the resource at www.awebsite.org/index.html is funstuff
• www.awebsite.org/index.html <created-by> http://www.anotherurl.org/buzz
  – www.awebsite.org/index.html was created by someone identified by the URL http://www.anotherurl.org/buzz
• The value in the first triple, which concerns the “topic” of our resource, is a character string, but the value in the second triple, which concerns its “created-by”, is a URL
• Because that URL is itself the subject of a further triple (http://www.anotherurl.org/buzz <is> http://www.yetanotherurl.org/professor), we can infer that www.awebsite.org/index.html was created by a professor
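To see the literal-versus-URI distinction concretely, the following Python sketch parses a well-formed version of the RDF example (properties that point at other resources use `rdf:resource`) with the standard library's `xml.etree.ElementTree` and flattens it into triples. It only handles this simple `rdf:Description` form; `rdf_triples` is a hypothetical helper, and a real application would use a full RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:zx="http://www.someurl.org/zx/">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:created-by rdf:resource="http://www.anotherurl.org/buzz"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.anotherurl.org/buzz">
    <zx:is rdf:resource="http://www.yetanotherurl.org/professor"/>
  </rdf:Description>
</rdf:RDF>"""

def rdf_triples(xml_text):
    """Return (subject, predicate URI, object, kind) where kind is 'uri' or 'literal'."""
    out = []
    for desc in ET.fromstring(xml_text).findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            ns, local = prop.tag[1:].split("}")      # '{ns}local' -> ns, local
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                out.append((subject, ns + local, resource, "uri"))
            else:
                out.append((subject, ns + local, prop.text, "literal"))
    return out

ts = rdf_triples(DOC)
for t in ts:
    print(t)

# Chain the triples: who created index.html, and what is that creator?
creators = [o for s, p, o, kind in ts if p.endswith("created-by") and kind == "uri"]
roles = [o for s, p, o, kind in ts if s in creators and p.endswith("/is")]
print(roles)  # ['http://www.yetanotherurl.org/professor']
```

Only URI-valued objects can be followed to further triples; the literal "funstuff" is a dead end for chaining.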
SPARQL
• SPARQL is a recursive acronym for SPARQL Protocol And RDF Query Language (the leading S stands for SPARQL itself), pronounced “sparkle”
• It is a language that can be used to traverse graphs consisting of RDF triples chained together into an object network
• Example:
    PREFIX zx: <http://www.someurl.org/zx/>
    SELECT ?x
    WHERE { <http://www.awebsite.org/index.html> zx:created-by ?x }
• This query finds the creators of http://www.awebsite.org/index.html
• It searches through the triples, finds the ones of interest to us, and then plucks off the creators
• These triples could be distributed all around the Web
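A SPARQL `WHERE` clause is a set of triple patterns whose variables must bind consistently. The Python sketch below implements that matching idea over an in-memory list of triples from the RDF example; `match` and `_unify` are hypothetical helpers, not a SPARQL engine, and they ignore prefixes, filters, and distribution across sites.

```python
ZX = "http://www.someurl.org/zx/"
TRIPLES = [
    ("http://www.awebsite.org/index.html", ZX + "topic", "funstuff"),
    ("http://www.awebsite.org/index.html", ZX + "created-by", "http://www.anotherurl.org/buzz"),
    ("http://www.anotherurl.org/buzz", ZX + "is", "http://www.yetanotherurl.org/professor"),
]

def _unify(term, value, b):
    """Variables ('?x') bind or must agree with an earlier binding; constants must match."""
    if term.startswith("?"):
        return b.setdefault(term, value) == value
    return term == value

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = dict(binding)                    # fresh copy per candidate triple
        if all(_unify(term, value, b) for term, value in zip(first, triple)):
            yield from match(rest, triples, b)

# SELECT ?x WHERE { <.../index.html> zx:created-by ?x }
query = [("http://www.awebsite.org/index.html", ZX + "created-by", "?x")]
print([b["?x"] for b in match(query, TRIPLES)])  # ['http://www.anotherurl.org/buzz']

# Adding a second pattern that reuses ?x walks one more link in the graph
joined = query + [("?x", ZX + "is", "?role")]
print(list(match(joined, TRIPLES)))
```

Sharing `?x` between patterns is what turns independent lookups into a traversal of the chained triples.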
The Semantic Web, continued
• Main tools
  – Namespaces posted on the web and shared
  – XML
  – Ontologies of assertions, e.g. “Tall people play basketball” and “Joe is tall” (note that the first is schema based and the second instance based)
  – Walking paths linked by assertions, with languages like SPARQL
  – Forming inferences from assertions along the way
• XML extensions to accommodate complex data, non-string data, and querying of large datasets
  – Support pointers to namespaces
  – Support complex, non-textual documents, along with object IDs, keys and foreign keys
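Walking a path of assertions can be pictured as a breadth-first search over triples. In this sketch the basketball example is encoded with hypothetical predicates `isA` and `subClassOf`; reaching `BasketballPlayer` from `Joe` along a path stands in for the inference, and `walk` and `max_hops` are illustrative names, not part of any Semantic Web toolkit.

```python
from collections import deque

def walk(start, triples, max_hops=3):
    """Breadth-first walk along triples from `start`; returns node -> path of triples."""
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) >= max_hops:     # bound the cost of long traversals
            continue
        for p, o in out_edges.get(node, []):
            if o not in paths:               # first visit is the shortest path
                paths[o] = paths[node] + [(node, p, o)]
                queue.append(o)
    return paths

TRIPLES = [
    ("Joe", "isA", "TallPerson"),                    # instance level: Joe is tall
    ("TallPerson", "subClassOf", "BasketballPlayer"),  # schema level: tall people play basketball
]
paths = walk("Joe", TRIPLES)
print(paths["BasketballPlayer"])
```

The `max_hops` bound reflects the cost problem raised later in the slides: assertions that span many nodes are expensive to traverse.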
Accommodating complex data
• Schemas
  – Initially: DTDs
  – Later: XML Schema
  – Save schema fragments and import them
• Non-string data types
• Keys and FKs
• Type constructors
  – Primitive: integer, float, boolean, date, ID
  – Simple: list, union
  – Complex: groups of elements
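XML Schema expresses keys and foreign keys declaratively (`xs:key`, `xs:keyref`). The Python sketch below hand-checks the same two constraints on a small, hypothetical document, just to show what a validator enforces: keys must be present and unique, and every reference must point at an existing key. `check_key_and_keyref` is a made-up helper, not an XML Schema validator.

```python
import xml.etree.ElementTree as ET

DOC = """<University>
  <Course code="MAT123"/>
  <Course code="CS305"/>
  <Enrollment student="111111111" course="MAT123"/>
  <Enrollment student="123454321" course="EE101"/>
</University>"""

def check_key_and_keyref(root, key_path, key_attr, ref_path, ref_attr):
    """Mimic a key (present and unique) plus a keyref (must match an existing key)."""
    keys = [e.get(key_attr) for e in root.findall(key_path)]
    problems = []
    if None in keys:
        problems.append(f"missing key attribute '{key_attr}'")
    dupes = sorted({k for k in keys if k is not None and keys.count(k) > 1})
    problems += [f"duplicate key {k}" for k in dupes]
    valid = {k for k in keys if k is not None}
    for e in root.findall(ref_path):
        if e.get(ref_attr) not in valid:
            problems.append(f"dangling reference {e.get(ref_attr)}")
    return problems

print(check_key_and_keyref(ET.fromstring(DOC), "Course", "code", "Enrollment", "course"))
# ['dangling reference EE101']
```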
XPath: for searching XML documents hierarchically
• An XPath expression takes a document tree as input and returns a multiset of nodes of the tree
• Expressions that start with / are absolute path expressions
• / returns the root node of the XPath tree
• /Students/Student returns all Student elements that are children of Students elements, which in turn must be children of the root
• /Student returns the empty set (there are no such children at the root)
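Python's `xml.etree.ElementTree` has no separate root (document) node and supports only a subset of XPath, so this sketch models the root explicitly to show why `/Students/Student` finds elements while `/Student` does not. The sample document and the helper `eval_absolute` are hypothetical, and only the child axis is handled.

```python
import xml.etree.ElementTree as ET

DOC = """<Students>
  <Student StudentId="111111111"><Name><First>John</First><Last>Doe</Last></Name></Student>
  <Student StudentId="123454321"><Name><First>Joe</First><Last>Blow</Last></Name></Student>
</Students>"""

ROOT_NODE = object()   # stand-in for XPath's root node, which sits above <Students>

def eval_absolute(document_element, path):
    """Evaluate an absolute, child-axis-only path such as '/Students/Student'."""
    if path == "/":
        return [ROOT_NODE]
    nodes = [ROOT_NODE]
    for step in path.strip("/").split("/"):
        next_nodes = []
        for n in nodes:
            # the root node's only element child is the document element
            children = [document_element] if n is ROOT_NODE else list(n)
            next_nodes.extend(c for c in children if c.tag == step)
        nodes = next_nodes
    return nodes

students = ET.fromstring(DOC)
print(len(eval_absolute(students, "/Students/Student")))  # 2
print(eval_absolute(students, "/Student"))                # []
```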
XPath, continued
• The current (or context) node exists during the evaluation of XPath expressions (and in other XML query languages)
• . denotes the current node; .. denotes the parent
• foo/bar returns all bar elements that are children of foo nodes, which in turn are children of the current node
• ./foo/bar is the same
• ../abc/cde returns all cde e-children of abc e-children of the parent of the current node
• Expressions that don’t start with / are relative (to the current node)
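Relative paths are evaluated from a context node. ElementTree elements do not point to their parents, so this sketch builds a parent map to support `..`. The document, the helper `eval_relative`, and the `<here>` context element are illustrative; the element names `foo/bar` and `abc/cde` come from the slide.

```python
import xml.etree.ElementTree as ET

DOC = "<top><abc><cde>1</cde><cde>2</cde></abc><here><foo><bar>3</bar></foo></here></top>"

def eval_relative(context, path, parent_of):
    """Evaluate a relative, child-axis path such as 'foo/bar', './foo/bar' or
    '../abc/cde' starting from the context node."""
    nodes = [context]
    for step in path.split("/"):
        if step == ".":
            continue                                  # '.' keeps the current node(s)
        if step == "..":
            parents = [parent_of[n] for n in nodes if n in parent_of]
            nodes = list(dict.fromkeys(parents))       # de-duplicate, keep order
        else:
            nodes = [c for n in nodes for c in n if c.tag == step]
    return nodes

root = ET.fromstring(DOC)
# ElementTree elements have no parent pointer, so build one map up front
parent_of = {child: parent for parent in root.iter() for child in parent}
here = root.find("here")   # use <here> as the context (current) node

print([e.text for e in eval_relative(here, "foo/bar", parent_of)])     # ['3']
print([e.text for e in eval_relative(here, "./foo/bar", parent_of)])   # ['3']
print([e.text for e in eval_relative(here, "../abc/cde", parent_of)])  # ['1', '2']
```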
Attributes, text, …
• /Students/Student/@StudentId returns all StudentId a-children (attributes) of Student e-children of Students, which are children of the root
• /Students/Student/Name/Last/text() returns all t-children (text) of Last e-children of …
• /comment() returns comment nodes under the root
• XPath provides means to select other document components as well
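ElementTree's path language can select elements (including with an `[@attr='value']` predicate) but has no `@attr` or `text()` steps, so in this sketch the attribute values and text are read off the selected elements in Python. The sample document is hypothetical; comment nodes are not shown because ElementTree drops comments by default.

```python
import xml.etree.ElementTree as ET

DOC = """<Students>
  <Student StudentId="111111111"><Name><First>John</First><Last>Doe</Last></Name></Student>
  <Student StudentId="123454321"><Name><First>Joe</First><Last>Blow</Last></Name></Student>
</Students>"""

root = ET.fromstring(DOC)
assert root.tag == "Students"   # the '/Students' step: the document element must be Students

# /Students/Student/@StudentId : select the elements, then read the attribute
ids = [s.get("StudentId") for s in root.findall("Student")]

# /Students/Student/Name/Last/text() : select Last elements, then read their text
last_names = [e.text for e in root.findall("Student/Name/Last")]

# A predicate on an attribute is supported directly by ElementTree
joe = [e.text for e in root.findall("Student[@StudentId='123454321']/Name/Last")]

print(ids)         # ['111111111', '123454321']
print(last_names)  # ['Doe', 'Blow']
print(joe)         # ['Blow']
```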
XQuery
• General structure: for variable-declarations where condition return document (the for/where/return core of a FLWOR expression)
• Example:
    (: students who took MAT123 :)
    for $t in doc("http://xyz.edu/transcript.xml")//Transcript
    where $t/CrsTaken/@CrsCode = "MAT123"
    return $t/Student
• Result:
    <Student StudId="111111111" Name="John Doe" />
    <Student StudId="123454321" Name="Joe Blow" />
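For readers without an XQuery processor at hand, the same for/where/return logic can be mimicked in Python with ElementTree. The transcript document below is a hypothetical stand-in for `doc("http://xyz.edu/transcript.xml")`, with its structure guessed from the query; note that XQuery's `=` applied to a sequence is existential (true if any `CrsCode` matches), which `any(...)` reproduces.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for doc("http://xyz.edu/transcript.xml")
TRANSCRIPTS = """<Transcripts>
  <Transcript>
    <Student StudId="111111111" Name="John Doe"/>
    <CrsTaken CrsCode="MAT123"/>
    <CrsTaken CrsCode="CS308"/>
  </Transcript>
  <Transcript>
    <Student StudId="987654321" Name="Jane Roe"/>
    <CrsTaken CrsCode="CS305"/>
  </Transcript>
  <Transcript>
    <Student StudId="123454321" Name="Joe Blow"/>
    <CrsTaken CrsCode="MAT123"/>
  </Transcript>
</Transcripts>"""

def students_who_took(doc_text, crs_code):
    result = []
    for t in ET.fromstring(doc_text).iter("Transcript"):                  # for $t in ...//Transcript
        if any(c.get("CrsCode") == crs_code for c in t.findall("CrsTaken")):  # where (existential =)
            result.extend(t.findall("Student"))                          # return $t/Student
    return result

for s in students_who_took(TRANSCRIPTS, "MAT123"):
    print(ET.tostring(s, encoding="unicode").strip())
```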
XML and Web X.0: Flash Builder
    <?xml version="1.0" encoding="utf-8"?>
    <s:Application xmlns:fx="http://ns.adobe.com/mxml/2009"
                   xmlns:s="library://ns.adobe.com/flex/spark"
                   xmlns:mx="library://ns.adobe.com/flex/mx"
                   minWidth="955" minHeight="600">
      <fx:Declarations>
        <!-- Place non-visual elements (e.g., services, value objects) here -->
      </fx:Declarations>
      <mx:DateChooser x="527" y="142"/>
    </s:Application>
Semantic Web big problems
• A massive reengineering effort is needed to make use of Semantic Web technology
• Assertions that span nodes can be extremely time-consuming to traverse
• Making media accessible
  – It is easy enough to generate low-level assertions automatically
  – It is very time-consuming for experts to add assertions manually
  – Our main tools are tagging and image/sound processing packages that are very complex and heavily heuristic driven
• XML Schema, the big XML extension, is unwieldy
Web X.0 big problems
• We are not just trying to search relational databases
• Graphics is often used in a gratuitous, non-useful, even distracting fashion, and it eats up download time and computation time
• We still cannot manipulate, search, or interpret media
Comparison with NoSQL DBs
• Key-document and key-value databases are a way of organizing document, value (blob), and continuous databases so they can be searched quickly, both by next-generation web applications and by programs that automatically search the web
• Graph databases are a way of dynamically extending assertions between objects, but they don’t play well with large networks
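To make the key-value versus key-document distinction concrete, here is an in-memory Python sketch. It assumes documents are plain dicts and that the document store maintains a secondary index on chosen fields; the class and method names are hypothetical and do not follow any particular product's API.

```python
from collections import defaultdict

class KeyValueStore:
    """Values are opaque (e.g. media blobs): the only access path is the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

class KeyDocumentStore:
    """Values are documents (dicts), so the store can index fields inside them."""

    def __init__(self, indexed_fields):
        self._docs = {}
        self._index = {f: defaultdict(set) for f in indexed_fields}

    def put(self, key, doc):
        self.delete(key)                      # keep the index consistent on overwrite
        self._docs[key] = doc
        for field, idx in self._index.items():
            if field in doc:
                idx[doc[field]].add(key)

    def delete(self, key):
        old = self._docs.pop(key, None)
        if old is not None:
            for field, idx in self._index.items():
                if field in old:
                    idx[old[field]].discard(key)

    def find(self, field, value):
        """Look up document keys by an indexed field, without scanning every document."""
        return sorted(self._index[field].get(value, set()))

blobs = KeyValueStore()
blobs.put("clip1.mp4", b"\x00\x01")           # placeholder bytes

docs = KeyDocumentStore(["type"])
docs.put("clip1", {"type": "video", "tags": ["beach"], "blob": "clip1.mp4"})
docs.put("photo7", {"type": "image", "tags": ["beach"], "blob": "photo7.jpg"})
print(docs.find("type", "video"))             # ['clip1']
```

The document store can answer "all videos" from its index, while the key-value store can only hand back a blob whose key you already know.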
Nice things about NoSQL DBs, the Semantic Web, and Web X.0
• NoSQL DBs are minimalistic in just the right way
  – Much easier to plug in than complex XML Schema front ends to databases, and they can work with existing relational DBs
• Documents are natural to both efforts
• Media blobs are natural to both efforts
• Graphs are natural to the Semantic Web
Web services
• Supports non-interactive (program-to-program) database access
• Uses XML, HTTP, etc.
• Examples are Google and Amazon
• Universal Description, Discovery, and Integration (UDDI) for creating distributed registries of web services
• Web Services Description Language (WSDL) for describing a service’s interface
• Simple Object Access Protocol (SOAP) is an XML-based protocol that allows apps to send messages to each other over the Internet
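A SOAP message is an XML envelope. This Python sketch builds and reads a SOAP 1.1 envelope with ElementTree. The envelope namespace is the standard SOAP 1.1 one, while the service namespace `http://xyz.edu/registrar`, the `GetStudent` operation, and the helpers `build_request`/`read_body` are hypothetical. In practice the operation would be described in a WSDL file and the message typically sent in an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
APP_NS = "http://xyz.edu/registrar"                     # hypothetical service namespace

def build_request(operation, **params):
    """Wrap a (hypothetical) operation call in a soap:Envelope / soap:Body."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("reg", APP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def _local(tag):
    return tag.split("}", 1)[1]          # '{ns}Name' -> 'Name'

def read_body(xml_text):
    """Return (operation, {param: value}) from the first element inside soap:Body."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Body")
    call = body[0]
    return _local(call.tag), {_local(c.tag): c.text for c in call}

request = build_request("GetStudent", StudId="111111111")
print(request)
print(read_body(request))   # ('GetStudent', {'StudId': '111111111'})
```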
Security
• The complexity of server-side technology, along with its heterogeneity
• The need to allow dynamic web page support, email, FTP, etc.
• The need to support services
• Access to databases from multiple sources on either side of the firewall
… continued
• The tendency to loosen firewalls when things don’t work
• Email attachments
• The rapid rate of change of software, content, and services
• The use of open-source and legacy DBs that are poorly understood
Another security issue
• Web and database servers are used to support newer sorts of data and service access:
  – Warehousing data (usually, but not always, inside the firewall)
  – Mining data, which is often outside the firewall
  – Specialized document retrieval systems
  – Specialized advanced media retrieval systems
  – Integration of heterogeneous data
  – Sharing of namespaces, schema fragments, and query code (often in XML technologies)
… continued
• All of these can be layered and span multiple sites, for example:
  – Hierarchical data marts
  – Mediator-based integration hierarchies
• A wide class of people, inside and outside the organization, must have access to data (such as content taggers)
Data Privacy
• HIPAA
• Authorization of users and applications
  – Passwords
  – Two-factor (something you know, like a password or PIN, plus something you have, like a physical token that generates a code)
  – Mediated (using a third party)
• Encryption
  – Storage
  – Transmission
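As an illustration of password-plus-code authorization, this Python sketch stores passwords only as salted PBKDF2 hashes and checks a second factor with a time-based one-time password (TOTP, RFC 6238, HMAC-SHA1 variant). The iteration count, the 30-second step, and the `record` layout are assumptions made for the example, not requirements taken from HIPAA or any specific product.

```python
import hashlib
import hmac
import os
import struct

# --- Factor 1: store only a salted, deliberately slow hash of the password ---
def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256; the iteration count is an assumption."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def check_password(password, salt, digest, iterations=200_000):
    _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, digest)      # constant-time comparison

# --- Factor 2: one-time code derived from a secret shared with a device ---
def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 over the time-step counter, then RFC 4226 truncation."""
    counter = struct.pack(">Q", unix_time // step)
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                              # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def login(password, code, now, record):
    """Both factors must pass; `record` is a hypothetical stored user row."""
    return (check_password(password, record["salt"], record["digest"])
            and hmac.compare_digest(code, totp(record["totp_secret"], now)))

salt, digest = hash_password("s3cret")
record = {"salt": salt, "digest": digest, "totp_secret": b"12345678901234567890"}
now = 1_700_000_000
print(login("s3cret", totp(record["totp_secret"], now), now, record))   # True
```

The server never stores the password itself, and a stolen password alone fails the second check, which is the point of two-factor authorization.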