The Policy-Aware Web: Privacy and Transparency on the Semantic Web
Jim Hendler • Hendler@cs.umd.edu • http://www.cs.umd.edu/~hendler
2004 NSF National Priorities ITR to UMCP and MIT (PIs: Hendler, Berners-Lee, Weitzner)
Outline • Motivation • Example • Digression • Content • Challenge(s) • Summary
As we publish more info- how do we control access … Who can see What??
Current Policy Languages • A number of languages are being explored: • P3P (data-centric relational semantics -> relational database) • WS-Policy (propositional, and & or, but weak not) • Features and Properties (no operators, easier to map to RDF) • Combinators (choose one/all, similar to WS-Policy) • KaOS Policy and Domain Services • WSPL and EPAL (subsets of XACML) • XACML (and, or, not, first and higher order bag functions) • Rei (OWL-Lite + logic-like variables) • A lot of ambiguity about exact expressivity and computational properties (or even the semantics!)
An example: WS-Policy • WS-Policy provides a flexible grammar for expressing C&C of web services • Normalized form (maybe to do non normalized) • Two translation approaches: • Policies as Instances • Readable, but hard to capture semantics • Available at: http://mindswap.org/dav/ontologies/ws-policy_instance.owl • Policies as Classes • Translate WS-Policy constructs into OWL constructs • E.g., wsp:All --> owl:intersectionOf
WS-Policy Example
<wsp:Policy>
  <wsp:ExactlyOne>
    <wsp:All>
      <wsse:SecurityToken>
        <wsse:TokenType>wsse:Kerberosv5TGT</wsse:TokenType>
      </wsse:SecurityToken>
    </wsp:All>
    <wsp:All>
      <wsse:SecurityToken>
        <wsse:TokenType>wsse:X509v3</wsse:TokenType>
      </wsse:SecurityToken>
    </wsp:All>
    <wsp:All>
      <wsse:SecurityToken>
        <wsse:TokenType>wsse:UserNameToken</wsse:TokenType>
      </wsse:SecurityToken>
    </wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>
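A minimal sketch of reading the alternatives out of a policy like the one above, using Python's standard `xml.etree.ElementTree`. The slide's snippet omits namespace declarations, so the two namespace URIs below are assumptions added to make the document parseable; the function name `alternatives` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the slide's snippet does not declare them.
NS = {
    "wsp": "http://schemas.xmlsoap.org/ws/2004/09/policy",
    "wsse": "http://docs.oasis-open.org/wss/2004/01/"
            "oasis-200401-wss-wssecurity-secext-1.0.xsd",
}

POLICY_XML = f"""
<wsp:Policy xmlns:wsp="{NS['wsp']}" xmlns:wsse="{NS['wsse']}">
  <wsp:ExactlyOne>
    <wsp:All>
      <wsse:SecurityToken>
        <wsse:TokenType>wsse:Kerberosv5TGT</wsse:TokenType>
      </wsse:SecurityToken>
    </wsp:All>
    <wsp:All>
      <wsse:SecurityToken>
        <wsse:TokenType>wsse:X509v3</wsse:TokenType>
      </wsse:SecurityToken>
    </wsp:All>
    <wsp:All>
      <wsse:SecurityToken>
        <wsse:TokenType>wsse:UserNameToken</wsse:TokenType>
      </wsse:SecurityToken>
    </wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>
"""


def alternatives(policy_xml):
    """Return one list of token types per wsp:All alternative."""
    root = ET.fromstring(policy_xml)
    result = []
    for all_element in root.findall("wsp:ExactlyOne/wsp:All", NS):
        token_types = all_element.findall("wsse:SecurityToken/wsse:TokenType", NS)
        result.append([t.text.strip() for t in token_types])
    return result
```

Each inner list corresponds to one `wsp:All` branch, which is the unit that gets translated into an OWL conjunction in the class-based mapping.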
Mapping WS-Policy to OWL • “all” is easy: it’s logical conjunction (i.e., intersectionOf) • “exactlyOne” is harder, two readings: • Older version: “oneOrMore” • Inclusive OR, maps to owl:unionOf • “exactlyOne” suggests XOR • Have to map to a disjunction of conjunctions • Quadratic increase in size of disjuncts • Ontology: http://www.mindswap.org/dav/ontologies/policytest.owl
Example
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix policytest: <http://www.mindswap.org/~kolovski/policytest.owl#> .

policytest:TestPolicy a owl:Class ;
  owl:intersectionOf (
    [ a owl:Class ; owl:unionOf (
        policytest:SecurityTokenTypeUsernameToken
        policytest:SecurityTokenTypeX509
        policytest:SecurityTokenTypeKerberos ) ]
    [ a owl:Class ; owl:complementOf
      [ a owl:Class ; owl:unionOf (
          [ a owl:Class ; owl:intersectionOf (
              policytest:SecurityTokenTypeUsernameToken
              policytest:SecurityTokenTypeX509 ) ]
          [ a owl:Class ; owl:intersectionOf (
              policytest:SecurityTokenTypeUsernameToken
              policytest:SecurityTokenTypeKerberos ) ]
          [ a owl:Class ; owl:intersectionOf (
              policytest:SecurityTokenTypeX509
              policytest:SecurityTokenTypeKerberos ) ]
      ) ]
    ]
  ) .
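The XOR reading of “exactlyOne” can be sketched in Python: at least one alternative holds, and no pair of alternatives holds together. This is an illustrative closed-world truth check over nested tuples, not OWL reasoning; the function names and the tuple encoding are assumptions. It also shows where the quadratic growth comes from: n alternatives produce n(n-1)/2 pairwise conjunctions.

```python
from itertools import combinations


def exactly_one_expression(alternatives):
    """Build: unionOf(all) AND NOT unionOf(every pairwise intersection)."""
    pairs = [("intersectionOf", a, b) for a, b in combinations(alternatives, 2)]
    return (
        "intersectionOf",
        ("unionOf", *alternatives),
        ("complementOf", ("unionOf", *pairs)),
    )


def satisfies(expr, held):
    """Evaluate the expression against the set of token types a requester holds."""
    if isinstance(expr, str):
        return expr in held
    op, *args = expr
    if op == "intersectionOf":
        return all(satisfies(a, held) for a in args)
    if op == "unionOf":
        return any(satisfies(a, held) for a in args)
    if op == "complementOf":
        return not satisfies(args[0], held)
    raise ValueError(f"unknown operator: {op}")


def pairwise_count(n):
    """Number of pairwise conjunctions needed for n alternatives."""
    return n * (n - 1) // 2


TOKENS = ["UsernameToken", "X509", "Kerberos"]
POLICY = exactly_one_expression(TOKENS)
```

With three token types the complement covers three pairs, matching the ontology fragment above; with ten alternatives it would already cover 45.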
Digression MINDSWAP ontology tools
Ontology Debugging Service • Example taken from Sweet-JPL OWL Ontology, where 13 out of ~3000 axioms make one class unsatisfiable
Under the hood • The Semantic Web vision requires "plumbing" that lives on the Web, but provides support for • Ontologies linked together • Reasoning that can scale • Limited expressivity (OWL) • Mixed Logics and Rules (RIF) • Open World reasoning (closed-world (CW) assumptions are key to many algorithms' performance) • "Hidden" logic - users want results, not symbols • Modularity and collaboration • Teams of people creating teams of ontologies • And much more • Triple store scaling, HTTP embedding (state free), URIs…
Pellet: a reasoner for the SemWeb • [Architecture diagram. Components shown: RDF/XML Parser; SPARQL Parser; Jena, OWL API, and DIG interfaces, each with its application; Species Validation & Ontology Repair; ABox Query Engine; KnowledgeBase Interface (Reasoner SPI); TBox (Absorption, Internalization; Tu, Tg); ABox; Tableau Reasoner; XSD Reasoner]
Pellet: OWL reasoner • Description Logic reasoner based on tableaux algorithms • Specifically designed for OWL • Primarily for OWL-DL ontologies • Heuristics to repair OWL Full ontologies • Research extensions to OWL Full • First reasoner to support all of OWL-DL • Implements SHOIQ algorithm by Horrocks and Sattler • Provides all the standard reasoning services • KB consistency, concept satisfiability, classification, realization • Plus…
Special Features • Query Answering • Conjunctive ABox queries expressed in RDQL or SPARQL • Datatype Reasoning • Check if the intersection of XML Schema datatypes is satisfiable • Support reasoning with user-defined derived datatypes • e.g. numeric or time intervals • Multi-Ontology Reasoning using E-Connections • Defining and instantiating combinations of OWL-DL ontologies • An alternative to owl:imports • Ontology Debugging • Explaining the cause of unsatisfiable concepts • Relations between unsatisfiable concepts • Non-monotonic Reasoning with K-operator • Closed-world queries using ALCK
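For the datatype-reasoning bullet, the core question (is the intersection of two derived numeric datatypes satisfiable?) can be illustrated with closed intervals. This is a toy sketch under that simplifying assumption, not Pellet's implementation; `Interval` and `intersect` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """A user-defined numeric datatype modelled as the closed range [low, high]."""
    low: float
    high: float


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlapping range, or None if the intersection is unsatisfiable."""
    low, high = max(a.low, b.low), min(a.high, b.high)
    return Interval(low, high) if low <= high else None
```

A `None` result plays the role of an unsatisfiable datatype intersection: no value can belong to both ranges.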
Pellet (more) • Coerces “DL-izable” OWL Full ontologies into OWL DL • OWL Full and OWL DL can be unified • Inverse functional properties on datatype properties • Punning: Metaclasses allowed • Type assignment for untyped classes • Combines inverse and nominal correctly (decidably) • Extended datatype support (more built-in and user-defined datatypes) • Incremental reasoning through update of the KB: • Optimized classification and realization (50% to order-of-magnitude improvements) • Working on updating the completion graph to speed initial consistency check
Performance • Dynamic completion strategy selection based on the ontology expressivity • Nominals (oneOf, hasValue), Inverse Properties (inverseOf), Individuals • Includes standard optimization techniques • Normalization, simplification, absorption, semantic branching, dependency-directed backjumping, caching, model merging, binary instance retrieval • Several novel optimizations (see KR ’06 paper) • Nominal absorption, learning-based disjunct selection, partial backjumping, nominal-based model merging, lazy forest generation, forest caching
Applications using Pellet • Ontology editing and management • Available as a Swoop plug-in • DIG interface to support Protégé • Web Service composition • Matchmaking for Web Services • Reasoning about preconditions and effects • Fujitsu Task Computing Environment • Interacting with devices and Web Services • Reasoning about policies • Policy consistency, policy containment, etc. • Process WS-Policy descriptions
Policy Aware Web (NSF ITR; Hendler, Berners-Lee, Weitzner; 2005)
Use case: A Web browser requests the home page for a girl scout troop and is given it by a Web server.
However, requests for images result in HTTP Error 401, “Unauthorized”.
The 401 “Unauthorized” response has been modified to provide a URL to a policy:
HTTP/1.1 401 Not authorized
Date: Sat, 03 Dec 2005 15:32:18 GMT
Server: TwistedWeb/2.0.1
Policy: http://groups.csail.mit.edu/dig/2005/09/rein/examples/troop42-policy.n3
Content-type: text/html; charset=UTF-8
Connection: close

10:32:20 ERROR 401: Not authorized.
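A client-side sketch of picking the policy URL out of such a response. `Policy` is the extra header shown in the slide's modified 401 response, not a standard HTTP header; the function name `policy_url` and the hand-rolled parsing are illustrative assumptions.

```python
RESPONSE = (
    "HTTP/1.1 401 Not authorized\r\n"
    "Date: Sat, 03 Dec 2005 15:32:18 GMT\r\n"
    "Server: TwistedWeb/2.0.1\r\n"
    "Policy: http://groups.csail.mit.edu/dig/2005/09/rein/examples/troop42-policy.n3\r\n"
    "Content-type: text/html; charset=UTF-8\r\n"
    "Connection: close\r\n"
    "\r\n"
)


def policy_url(raw_response):
    """Return the Policy header value of a 401 response, else None."""
    head, _, _body = raw_response.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if status == 401:
        return headers.get("policy")
    return None
```

A proxy would fetch the returned URL to learn which rules a proof has to satisfy.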
Example policies • Photos taken at meetings of the troop can be shared with any current member of the troop. • Photos taken at a jamboree can be shared with anyone in the troop or with anyone who attended the jamboree. • Photos of any girl in the troop can be shared with the world if that girl's parent has given permission.
Policies use linked rules
{ REQ a rein:Request.
  REQ rein:resource PHOTO.
  ?F a TroopStuff; log:includes { PHOTO a t:Photo; t:location LOC. LOC a t:Meeting }.
  REQ rein:requester WHO.
  WHO session:secret ?S.
  ?S crypto:md5 TXT.
  ?F a TroopStuff; log:includes { [] t:member [ is foaf:maker of PG ]. LOC t:attendee [ is foaf:maker of PG ] }.
  PG log:semantics [ log:includes { PG foaf:maker [ session:hexdigest TXT ] } ].
} => { WHO http:can-get PHOTO }.
Rein example
{ <http://dig.csail.mit.edu/2005/09/rein/examples/troop42.rdf> log:semantics ?F } => { ?F a TroopStuff }.

# Photos taken at meetings of the troop can be shared with any
# current member of the troop
{ REQ a rein:Request.
  REQ rein:resource PHOTO.
  ?F a TroopStuff; log:includes { PHOTO a t:Photo; t:location LOC. LOC a t:Meeting }.
  REQ rein:requester WHO.
  WHO session:secret ?S.
  ?S crypto:md5 TXT.
  ?F a TroopStuff; log:includes { [] t:member [ is foaf:maker of PG ]. LOC t:attendee [ is foaf:maker of PG ] }.
  PG log:semantics [ log:includes { PG foaf:maker [ session:hexdigest TXT ] } ].
} => { WHO http:can-get PHOTO }.

# Photos taken at a jamboree can be shared with anyone in the
# troop or with anyone who attended the jamboree.
# (i) anyone who is in the troop
{ REQ a rein:Request.
  REQ rein:resource PHOTO.
  ?F a TroopStuff; log:includes { PHOTO a t:Photo; t:location LOC. LOC a t:Jamboree }.
  REQ rein:requester WHO.
  WHO session:secret ?S.
  ?S crypto:md5 TXT.
  ?F a TroopStuff; log:includes { [] t:member [ is foaf:maker of PG ]. }.
  PG log:semantics [ log:includes { PG foaf:maker [ session:hexdigest TXT ] } ].
} => { WHO http:can-get PHOTO }.

# (ii) anyone who attended the jamboree
{ REQ a rein:Request.
  REQ rein:resource PHOTO.
  ?F a TroopStuff; log:includes { PHOTO a t:Photo; t:location LOC. LOC a t:Jamboree }.
  REQ rein:requester WHO.
  WHO session:secret ?S.
  ?S crypto:md5 TXT.
  ?F a TroopStuff; log:includes { LOC t:attendee [ is foaf:maker of PG ]. }.
  PG log:semantics [ log:includes { PG foaf:maker [ session:hexdigest TXT ] } ].
} => { WHO http:can-get PHOTO }.

The RDF/XML syntax is even worse: • Authorability/Editability are important issues • Specialized use (cf. Creative Commons) is a partial way out.
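To make the shape of these rules easier to follow, here is a rough Python paraphrase of the access decision. It is a sketch under simplifying assumptions, not Rein or Cwm: the data model (`Troop`, `Event`), the example names, and the secrets are all hypothetical. It follows the rules as written above, so the meeting rule checks both troop membership and attendance at that meeting, and identity is established by matching the MD5 hex digest of the requester's secret.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                                   # "Meeting" or "Jamboree"
    attendees: set = field(default_factory=set)


@dataclass
class Troop:
    members: set
    events: dict       # event name -> Event
    photos: dict       # photo name -> event name where it was taken
    hexdigests: dict   # person -> MD5 hex digest published on that person's page


def md5_hex(secret):
    return hashlib.md5(secret.encode("utf-8")).hexdigest()


def can_get(troop, photo, secret):
    """Decide whether the holder of `secret` may fetch `photo`."""
    digest = md5_hex(secret)
    people = {p for p, d in troop.hexdigests.items() if d == digest}
    event = troop.events[troop.photos[photo]]
    for person in people:
        is_member = person in troop.members
        attended = person in event.attendees
        if event.kind == "Meeting" and is_member and attended:
            return True
        if event.kind == "Jamboree" and (is_member or attended):
            return True
    return False


# Hypothetical example data.
TROOP = Troop(
    members={"alice"},
    events={
        "meeting1": Event("Meeting", {"alice"}),
        "jamboree1": Event("Jamboree", {"bob"}),
    },
    photos={"m.jpg": "meeting1", "j.jpg": "jamboree1"},
    hexdigests={"alice": md5_hex("a-secret"), "bob": md5_hex("b-secret")},
)
```

The N3 version differs in one important way: the reasoner produces a proof of the conclusion, which the server can check, rather than a bare yes or no.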
Use of the PAW proof-generation proxy results in a proof which satisfies the policy. Third-party services may be consulted to help construct the proof.
The proxy: • Uses Rein, a policy engine, to specify rules which match a given policy. • The Rein rules are run in Cwm, a forward-chaining reasoner for the Semantic Web. This generates a proof. • The proof is sent to the server with an HTTP PUT, and an HTTP GET on the same document is then invoked (requires HTTP 1.1).
The Web server checks the proof and serves the content if it is valid.
The server: Uses Cwm to validate the proof. Takes action based on validation (serves content or denies).
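The exchange between proxy and server can be sketched as a small in-memory simulation: GET, 401 with a policy URL, PUT of a proof, GET again. Everything here is a stand-in. `ToyServer`, `ToyProxy`, and the string "proofs" are hypothetical; in the demo the proof is generated and validated with Cwm, which this sketch replaces with a simple equality check.

```python
class ToyServer:
    """Serves content only after a proof it accepts has been PUT for that path."""

    def __init__(self, policy_url, accepted_proof):
        self.policy_url = policy_url
        self.accepted_proof = accepted_proof  # stand-in for proof validation
        self.proofs = {}

    def put_proof(self, path, proof):
        self.proofs[path] = proof
        return 201

    def get(self, path):
        if self.proofs.get(path) == self.accepted_proof:
            return 200, {}, "photo bytes"
        # Deny, but tell the client which policy a proof must satisfy.
        return 401, {"Policy": self.policy_url}, ""


class ToyProxy:
    """Retries a denied request once, after uploading a proof for the policy."""

    def __init__(self, server, prover):
        self.server = server
        self.prover = prover  # stand-in for proof generation

    def fetch(self, path):
        status, headers, body = self.server.get(path)
        if status == 401 and "Policy" in headers:
            proof = self.prover(headers["Policy"], path)
            self.server.put_proof(path, proof)
            status, headers, body = self.server.get(path)
        return status, body
```

For example, a proxy whose prover returns the accepted proof gets `(200, "photo bytes")` on the second GET, while any other prover still ends with a 401.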
Current demo work: • Make use of multiple distributed authentication systems (instead of holding secrets in the proxy). • Associate content with RDF metadata and base policy decisions on the RDF. • Address issues of eventual integration of the proxy with a Web browser (e.g. cookie storage). • Extend the system to "distributed" scenarios (different authorities hold parts of the policy and may have their own rules on access). • Attack user interface issues.
Open, Distributed Policy Challenges • Identity vs. privacy • How do you identify yourself w/o violating the very privacy concerns we hope to address? • Current identity schemes are centralized and universal • Can we do a distributed ID model (maybe email based)? • Inconsistency • In logic "P ^ -P => Q" • On Web it better not! (Supports(Hillary) ^ -Supports(Hillary)) => you owe me $1000 • Can we use a "non-standard" logic solution? • Provenance and downstream tracking • As information flows through the system, later access may depend on earlier decisions • Policies often dependent on use context • Policies may change depending on how information was acquired
Provenance Tracking on the Semantic Web • Provenance of Data • Who or what services created/input the data • Files on which the data depends • Date and time of creation • Steps taken to compute / produce the data • "recursively" ground to the above
Producing Provenance Data • On the Semantic Web • Provenance can be stored and tracked • Services represented by Service Descriptions • All files created and referenced by URIs • Web service executes and also outputs an OWL model of the service execution, including all provenance data • Service outputs a file with provenance for each output file • Semantic Web triple stores maintain mapping to this file from triples or subgraphs
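A minimal sketch of what one provenance record and the "recursive grounding" of its dependencies might look like. The field names, the `trace` helper, and the example.org URIs in the usage are assumptions for illustration; the slides describe an OWL model, which this plain Python structure only approximates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Provenance:
    uri: str                     # URI identifying the data item
    created_by: str              # person or service that produced it
    created_at: str              # date and time of creation
    steps: List[str] = field(default_factory=list)       # steps taken to compute it
    depends_on: List[str] = field(default_factory=list)  # URIs of input files


def trace(uri: str, store: Dict[str, Provenance], seen=None) -> List[str]:
    """Return every URI the item depends on, directly or transitively."""
    seen = [] if seen is None else seen
    for dep in store[uri].depends_on:
        if dep not in seen:
            seen.append(dep)
            if dep in store:          # inputs without a record ground the recursion
                trace(dep, store, seen)
    return seen
```

Because every item is named by a URI, following `depends_on` is just a matter of dereferencing further records until inputs with no record of their own are reached.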
"Magic" is in URIs Every piece of data gets its own "web page"
Ontology for provenance The "Web page" itself is machine-readable (OWL)
Validation - IPAW Provenance Challenge • E.g. A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. • Answered every query successfully
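The example query can be illustrated by diffing the recorded step sequences of two runs with the standard library's `difflib`. This is only a sketch of the idea, not how the challenge was answered on the provenance store; `earlier_step` in the usage is a hypothetical placeholder for the stages that precede the final one.

```python
from difflib import SequenceMatcher


def diff_runs(run_a, run_b):
    """Return (tag, steps_in_a, steps_in_b) for every non-matching stretch."""
    matcher = SequenceMatcher(a=run_a, b=run_b, autojunk=False)
    changes = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, run_a[a0:a1], run_b[b0:b1]))
    return changes
```

Called on `["earlier_step", "convert"]` and `["earlier_step", "pgmtoppm", "pnmtojpeg"]`, it reports a single replacement of `convert` by the two new procedures.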
Dana's Challenge • All data directly output from a Predator UAV is classified. • Classified data combined with unclassified data is considered classified. • Classified data can only be viewed by persons with top secret clearance, with the following exceptions: … In warfare conditions, unclassified persons may view perishable data that is classified if the person's life is threatened due to lack of that data and if the person's superior has top secret clearance and has approved such viewing. Can we apply PAW to Army policies w/in B3AN?
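The stated rules translate fairly directly into predicates. The sketch below encodes only the rules and the single exception quoted on the slide (the other exceptions are elided there and are not guessed at here). The class and field names are hypothetical, and treating combined data as perishable only when both inputs are perishable is an assumption, since the slide does not say.

```python
from dataclasses import dataclass


@dataclass
class Data:
    source: str
    perishable: bool = False
    classified: bool = False


@dataclass
class Person:
    top_secret: bool = False
    life_threatened_without_data: bool = False
    superior_top_secret: bool = False
    superior_approved: bool = False


def from_sensor(source, perishable=False):
    """All data directly output from a Predator UAV is classified."""
    return Data(source, perishable, classified=(source == "Predator UAV"))


def combine(a, b):
    """Classified combined with unclassified is classified."""
    return Data(
        f"{a.source}+{b.source}",
        perishable=a.perishable and b.perishable,  # assumption, not from the slide
        classified=a.classified or b.classified,
    )


def may_view(person, data, warfare=False):
    """Top secret clearance, or the warfare exception quoted on the slide."""
    if not data.classified or person.top_secret:
        return True
    return (
        warfare
        and data.perishable
        and person.life_threatened_without_data
        and person.superior_top_secret
        and person.superior_approved
    )
```

Even this small example shows why provenance matters for policy: whether combined data is classified depends on where its inputs came from.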
Conclusions • Information lives in specific contexts • The Semantic Web helps us place information into these (multiple) contexts. • Control of information requires control of contexts • Explication of policies • Linked in a Web-like way • Integrated directly into the Web • With extensions for rules and proofs • Is really hard • Issues of identity, inconsistency, provenance, change over time • But holds great potential • Flexible and adaptive • "Policy-Aware" Web project (joint between UMCP and MIT) • First step towards "Semantic Accountability" applications http://www.policyawareweb.org/
Another Cool thing… • What is a rule of logic? • In traditional philosophy it relates to "Truth" • What is truth on the Web? • Ex: How many cows are in Texas? • On the Web, we could use an idea of agreed-upon rules, grounded at a URI • Social definition of truth via shared contexts • Ex: Because Mom said so…
Truth on Web Pages [based on Heflin et al., 1998] • Inference rules could be used to determine the credibility of claims • I might believe the claims made by a reliable newspaper • Trustable(x) :- x; reliableNewspaper. • And I could establish the Washington Post as reliable... • i.e. I assert: http://www.washingtonpost.com owl:class reliableNewspaper. • or if I infer it • ReliableNewspaper(X) :-> X owl:class ReliableNewspaper;http://MediaWatchList. • (?) reliableNewspaper(X) :- X owl:class ReliableNewspaper; src ^ trusted(src). • The rules are "grounded" in a testable way • cf. If I can HTTP-get the fact, then it is asserted
Rule Sets could be shared • You can ground your sources • X :- X; src ^ src owl:class TrustedSource; http://…/myMomSet.rdf • Or infer trusted sources based on other rule sets • X :- X; src ^ src owl:class TrustedSource; http://ex.com/RushLimbaughSet.rdf • X :- X; src ^ src owl:class TrustedSource; http://ex.com/UnabomberRules.rdf ^ --( X;http://www.rushLimbaugh.com/truths.rdf)
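A small sketch of how such shared rule sets might be applied: a claim is believed when its source belongs to a chosen trusted set, optionally excluding anything that also appears in a rejected source, which mirrors the last rule above. The function name, the claim encoding as (statement, source) pairs, and the example.org sources in the usage are all hypothetical.

```python
def believed(claims, trusted_sources, rejected_source=None):
    """Return the statements asserted by a trusted source.

    claims: iterable of (statement, source) pairs
    trusted_sources: set of sources taken from some shared rule set
    rejected_source: statements also made by this source are dropped
    """
    claims = list(claims)
    rejected = {stmt for stmt, src in claims if src == rejected_source}
    return {
        stmt
        for stmt, src in claims
        if src in trusted_sources and stmt not in rejected
    }
```

Swapping in a different `trusted_sources` set corresponds to grounding the same rule in a different published rule set.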
Annotated Logic (in 25 words or less) • Traditional Logic P & -P => Q (P and -P are inconsistent) • Annotated Logic • P;X & -P;Y are not inconsistent • P;X & -P;X => Q;X but not Q;Y • P;X & -(P;X) is inconsistent and must be avoided (but this is easily checked if inference of RHS is restricted)
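The containment idea can be illustrated with a toy check: contradictions are detected per annotation, so a source that asserts both P and -P "explodes" only within its own annotation. This is a hedged sketch of the behaviour listed above, not an implementation of any particular annotated logic; the function names and the fact encoding are assumptions.

```python
def inconsistent_annotations(facts):
    """facts: set of (proposition, is_positive, annotation) triples.

    Returns the annotations under which some P and -P are both asserted.
    """
    positive = {(p, x) for p, is_pos, x in facts if is_pos}
    negative = {(p, x) for p, is_pos, x in facts if not is_pos}
    return {x for _p, x in positive & negative}


def explodes_at(facts, annotation):
    """True if an arbitrary Q;annotation would follow from a local contradiction."""
    return annotation in inconsistent_annotations(facts)
```

With `Supports(Hillary);X` and `-Supports(Hillary);Y` nothing explodes; adding `-Supports(Hillary);X` makes X inconsistent while Y stays usable.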