Secure XML Querying with Security Views
Wenfei Fan, University of Edinburgh & Bell Laboratories
Chee-Yong Chan, National University of Singapore
Minos Garofalakis, Bell Laboratories
The need for XML security
Data in XML format:
• Business information: confidential
• Health-care data: Patient Privacy Act, …
Access control:
• multiple groups simultaneously query the same XML document
• each user group has a different access-control policy
Enforcement of access-control policies:
[Figure: user groups 1 … n query a shared document through an XML query engine that separates accessible from inaccessible data]
Secure XML querying
[Figure: a user group poses a query Q to the XML query engine over an XML document T with accessible and inaccessible parts, and receives Q(T)]
For each user group of an XML document T,
• specify an access-control policy S,
• enforce S: for any query Q posed by the group over the document T, Q(T) consists only of data accessible wrt S
Access control for XML:
• How to specify access policies at various levels of granularity?
• How to efficiently enforce those access policies?
Example: an XML document of patients
Document DTD D:
  hospital → patient*
  patient → SSN, name, record*
  record → date, diagnosis, treatment
  treatment → (trial + regular)
  trial → trName, treatment*
  regular → tname, bill
[Figure: DTD graph of D]
Access-control policies over docs of D:
• Doctors in the hospital are granted access to all the data in the docs
• The insurance company is allowed to access billing information only
Access-control policy for syndrome surveillance
[Figure: DTD graph of D with the inaccessible element types crossed out]
• patients: only those diagnosed with a certain disease "DIS" (a constant) are accessible
• records:
  • only those with diagnosis = "DIS"
  • accessible parts of "DIS" records: date, diagnosis, treatment, tname
• denied from seeing whether a patient is in a clinical trial or not (trial, regular, trName)
• denied from accessing billing information
Challenge: access-control specification
[Figure: DTD graph of D with conditionally accessible element types marked]
• various levels of granularity: restricting access to entire subtrees or specific elements
• conditional access: e.g., a patient is accessible if and only if it has a descendant diagnosis = "DIS"
• overriding: e.g., tname overrides the accessibility of its parent regular
• inheritance: e.g., SSN and name inherit the accessibility of patient
Challenge: access-control enforcement
[Figure: DTD graph of D with conditionally accessible element types marked]
• enforcement should not imply any drastic degradation in performance
Example: an XPath query Q posed by a syndrome surveillance group over a document T
  //patient[name="Joe"]//tname
• access-control requirement: Q(T) ⊆ {accessible tname}
• enforcement: ensure that Q reaches
  • all and only those Joes having a descendant diagnosis = "DIS",
  • all and only those records with diagnosis = "DIS"
Challenge: schema availability
[Figure: DTD graph of D with conditionally accessible element types marked]
• One needs schema information to facilitate query formulation and optimization
• How to define a schema (DTD) characterizing all and only the accessible information, without a security breach?
• How to automatically derive such a DTD from the document DTD and an access-control specification?
An XML DTD is far more complicated than its relational counterpart: it may be recursive and nondeterministic
Previous proposals/standards for XML security
Dozens of models have been proposed for XML: XACML, XACL, …
• Specifying and enforcing access control at a physical level:
  • annotate data nodes in an XML document with accessibility, and check accessibility at runtime (with optimizations for tree-pattern queries and tree/DAG DTDs), or
  • materialize a view consisting of accessible data
  Problems:
  • costly (time, space): multiple accessibility annotations/views
  • error-prone: integrity maintenance becomes a problem when the underlying data or access policy is updated
• No support for schema availability: either deny access to any schema information, or expose the entire document DTD, which is a security breach
A seemingly plausible model
[Figure: DTD graph of D]
• annotate data nodes with accessibility
• check accessibility at runtime, and
• expose the document DTD D
Example: permissible XPath queries:
• Q1: //patient[name="Joe"]/record/treatment/*/tname
• Q2: //patient[name="Joe"]/record/treatment//tname
Security breach: from the document DTD it follows that if Q2(T) - Q1(T) is nonempty, then Joe is involved in a clinical trial
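The inference on this slide can be reproduced with a small sketch. The sample documents below are invented for illustration, and the sketch uses the limited XPath subset of Python's `xml.etree.ElementTree`; it assumes that `tname` elements are accessible while `trial` and `regular` are hidden only by runtime checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents conforming to the hospital DTD (data invented).
TEMPLATE = (
    "<hospital><patient><SSN>1</SSN><name>Joe</name><record>"
    "<date>d</date><diagnosis>DIS</diagnosis>{treatment}"
    "</record></patient></hospital>"
)
REGULAR = "<treatment><regular><tname>aspirin</tname><bill>10</bill></regular></treatment>"
# A trial wraps a nested treatment, so its tname sits deeper in the tree.
TRIAL = "<treatment><trial><trName>T1</trName>" + REGULAR + "</trial></treatment>"


def leaked_tnames(xml_text):
    """Return the text of the tname nodes in Q2(T) - Q1(T)."""
    root = ET.fromstring(xml_text)
    q1 = root.findall(".//patient[name='Joe']/record/treatment/*/tname")
    q2 = root.findall(".//patient[name='Joe']/record/treatment//tname")
    # Elements compare by identity, so this is a node-set difference.
    return [e.text for e in q2 if e not in q1]


in_trial = leaked_tnames(TEMPLATE.format(treatment=TRIAL))
not_in_trial = leaked_tnames(TEMPLATE.format(treatment=REGULAR))
print(in_trial, not_in_trial)
```

A nonempty difference arises only when a `tname` lies more than one level below the record's `treatment`, which the DTD allows only through `trial`.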
Our security model for XML
[Figure: specifications 1 … n feed a derivation module that produces security views 1 … n, each a pair (view DTD, xpath( )); user queries over the views pass through a query translation module (rewriter, optimizer) to the XML document]
• Security administrator: specifies an access-control policy for each group by extending the document DTD with XPath qualifiers
• Derivation module: automatically derives a security-view definition from each policy: a view DTD and a mapping via XPath
• Query translation module: rewrites and optimizes queries over views to equivalent queries over the underlying document
Overcome the limitations of previous proposals
[Figure: the same architecture: specifications, derivation module, security views, query translation module, XML document]
• Specification and enforcement: at the conceptual (schema) level
  • no need to update the underlying XML data
  • no need to materialize views or perform runtime checks
• Schema availability: the view schema is automatically derived
  • characterizing accessible data
  • exposing necessary schema information only
Access-control specification
• DTD D: element type definitions A → α, where
  α ::= PCDATA | ε | A1, …, Ak | A1 + … + Ak | A*
• Specification S = (D, access( )): a mapping access( ) from the edges in the document DTD to {Y, N, [q]}. For each A → α, for each B in α, define access(A, B) as
  • Y: accessible (true)
  • N: inaccessible (false)
  • [q]: XPath qualifier, conditional: accessible iff [q] holds
XPath fragment:
  p ::= ε | A | * | // | p/p | p ∪ p | p[q]
  q ::= p | p = "c" | q1 ∧ q2 | q1 ∨ q2 | ¬q
Access policy = document DTD + XPath qualifiers
Example: access policy S for syndrome surveillance
[Figure: DTD graph of D with edges annotated by [q1] and [q2]; conditionally accessible element types marked]
  access(hospital, patient) = [//diagnosis = "DIS"]   -- [q1]
  access(patient, record) = [diagnosis = "DIS"]   -- [q2]
  access(treatment, trial) = N
  access(treatment, regular) = N
  access(regular, tname) = Y
• overriding: if access(A, B) = Y (N), then the B children of A override the accessibility of A
• inheritance: if access(A, B) is not explicitly defined, then the B children of A inherit the accessibility of A
• content-based: conditional accessibility via XPath qualifiers
Properties of the specification language
[Figure: DTD graph of D with edges annotated by [q1] and [q2]; conditionally accessible element types marked]
• XML tree of the document DTD: the accessibility of each data node is uniquely defined by an access specification
  • relative to the path from the root
  • a qualifier at a node a constrains the entire subtree rooted at a, e.g., [q2] constrains tname
• various levels of granularity: entire subtrees or specific elements
• schema level: the underlying XML data is not touched; efficient, easy to specify and maintain
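The overriding, inheritance and qualifier rules can be illustrated with a minimal sketch that computes which nodes of a sample document are accessible. This only spells out the semantics of a specification; it is not the enforcement mechanism of the talk, which avoids annotating data. The sample data, the function names, the encoding of qualifiers as Python callables, and the choice that the root is accessible are all assumptions.

```python
import xml.etree.ElementTree as ET

# access(A, B): "Y", "N", or a callable standing in for an XPath qualifier.
ACCESS = {
    ("hospital", "patient"): lambda p: any(d.text == "DIS" for d in p.iter("diagnosis")),
    ("patient", "record"): lambda r: r.findtext("diagnosis") == "DIS",
    ("treatment", "trial"): "N",
    ("treatment", "regular"): "N",
    ("regular", "tname"): "Y",
}


def accessible_nodes(root, access):
    """Return the accessible nodes of the tree in document order."""
    result = []

    def visit(node, acc, blocked):
        if acc and not blocked:
            result.append(node)
        for child in node:
            rule = access.get((node.tag, child.tag))
            child_blocked = blocked
            if rule is None:          # inheritance
                child_acc = acc
            elif callable(rule):      # qualifier: constrains the whole subtree
                child_acc = rule(child)
                child_blocked = blocked or not child_acc
            else:                     # overriding with Y or N
                child_acc = rule == "Y"
            visit(child, child_acc, child_blocked)

    visit(root, True, False)  # assumption: the root is accessible
    return result


REC = ("<record><diagnosis>{d}</diagnosis><treatment><regular>"
       "<tname>{t}</tname><bill>1</bill></regular></treatment></record>")
DOC = ("<hospital><patient><name>Joe</name>"
       + REC.format(d="DIS", t="aspirin") + REC.format(d="FLU", t="rest")
       + "</patient><patient><name>Ann</name>" + REC.format(d="FLU", t="tea")
       + "</patient></hospital>")

nodes = accessible_nodes(ET.fromstring(DOC), ACCESS)
tags = [n.tag for n in nodes]
tnames = [n.text for n in nodes if n.tag == "tname"]
print(tags, tnames)
```

In this sample, the `tname` under Joe's non-"DIS" record stays hidden even though access(regular, tname) = Y, because the failed qualifier [q2] blocks the whole record subtree.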
Enforce access control: security views
XML security view: V = (Dv, xpath( )) with respect to an access policy S = (D, access( )),
• Dv: view DTD, exposed to the user and characterizing the accessible information (of document DTD D) wrt S
  Schema availability: to facilitate query formulation
• xpath( ): mapping from instances of D to instances of Dv, defined in terms of XPath queries and the view DTD Dv
  • for each A → α in Dv, for each B in α, xpath(A, B) = p
  • p: generates the B children of an A element in a view
  p ::= ε | A | * | // | p/p | p ∪ p | p[q]
  q ::= p | p = "c" | q1 ∧ q2 | q1 ∨ q2 | ¬q
Example: view DTD for syndrome surveillance
V = (Dv, xpath( )) with respect to access policy S = (D, access( ))
[Figure: document DTD D, annotated with [q1] and [q2], side by side with the view DTD Dv]
View DTD Dv:
• Hide trial, trName, regular, bill
• Expose accessible information only
Example: view definition for syndrome surveillance
xpath( ): maps edges in view DTD Dv to paths in document DTD D
[Figure: view tree under construction: hospital with patient children, each with SSN, name, record]
• hospital → patient*
  • xpath(hospital, patient) = hospital/patient[q1]
  • [q1]: [//diagnosis = "DIS"]
• patient → SSN, name, record*
  • xpath(patient, SSN) = SSN   /* similarly for name */
  • xpath(patient, record) = record[q2]
  • [q2]: [diagnosis = "DIS"]
Semantics:
• top-down construction
• preserving qualifiers in a specification
DTD-directed construction of security views
[Figure: view tree: hospital, patient, SSN, name, record, date, diagnosis, treatment, tname; document DTD fragment with regular, trial, trName, tname, bill]
• record → date, diagnosis, treatment
  xpath(record, date) = date   /* similarly for diagnosis, treatment */
• treatment → tname*
  xpath(treatment, tname) = //tname
• DTD-directed construction ⇒ view DTD conformance
• Never materialized: the construction strategy only gives the semantics
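The top-down, DTD-directed semantics can be sketched as a recursive function that builds view elements by evaluating xpath(A, B) at each document node. The talk stresses that views are never materialized, so this is purely an illustration of the semantics. The sample document is invented, and qualifiers are written as Python filters because `xml.etree.ElementTree` supports only a small XPath subset; all names below are assumptions.

```python
import xml.etree.ElementTree as ET

# View DTD as child lists; types absent from the dict are leaves.
VIEW_DTD = {
    "hospital": ["patient"],
    "patient": ["SSN", "name", "record"],
    "record": ["date", "diagnosis", "treatment"],
    "treatment": ["tname"],
}


def has_dis(patient):
    return any(d.text == "DIS" for d in patient.iter("diagnosis"))


# xpath(A, B) for the edges that are not plain child steps.
XPATH = {
    ("hospital", "patient"): lambda h: [p for p in h.findall("patient") if has_dis(p)],
    ("patient", "record"): lambda p: [r for r in p.findall("record")
                                      if r.findtext("diagnosis") == "DIS"],
    ("treatment", "tname"): lambda t: t.findall(".//tname"),
}


def build_view(doc_node, view_tag):
    """Build the view subtree for doc_node, seen as a view_tag element."""
    view = ET.Element(view_tag)
    children = VIEW_DTD.get(view_tag, [])
    if not children:
        view.text = doc_node.text  # leaf: copy the text content
    for child_tag in children:
        select = XPATH.get((view_tag, child_tag),
                           lambda n, t=child_tag: n.findall(t))
        for d in select(doc_node):
            view.append(build_view(d, child_tag))
    return view


DOC = (
    "<hospital>"
    "<patient><name>Joe</name><record><diagnosis>DIS</diagnosis>"
    "<treatment><trial><trName>T1</trName><treatment><regular>"
    "<tname>aspirin</tname><bill>10</bill></regular></treatment></trial></treatment>"
    "</record></patient>"
    "<patient><name>Ann</name><record><diagnosis>FLU</diagnosis>"
    "<treatment><regular><tname>rest</tname><bill>5</bill></regular></treatment>"
    "</record></patient>"
    "</hospital>"
)

view_xml = ET.tostring(build_view(ET.fromstring(DOC), "hospital"), encoding="unicode")
print(view_xml)
```

Because every view element is created from a production of the view DTD, the result conforms to Dv by construction, which is the point of the DTD-directed strategy.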
Derivation of security-view definition
XML security views are far more intriguing than relational views
• multiple XPath queries vs. a single SQL query
• DTD vs. relational schema
One needs an algorithm to compute a security-view definition:
• Input: an access policy S = (D, access( ))
• Output: a security-view definition V = (Dv, xpath( ))
  • sound: accessible information only
  • complete: all the accessible data (structure preserving)
  • DTD-conformant: conforming to the view DTD
  • efficient: O(|S|²) time
  • generic: recursive/nondeterministic document DTDs
Algorithm: deriving a security-view definition
[Figure: document DTD, annotated with [q1] and [q2], next to the derived view DTD]
  xpath(hospital, patient) = hospital/patient[q1]
  xpath(patient, record) = record[q2]
  xpath(record, treatment) = treatment
• Top-down traversal of the document DTD D
• short-cutting/renaming (via dummy) inaccessible element types
• normalizing the view DTD Dv and reducing dummy types
Deriving a security-view definition
[Figure: treatment → (trial + regular), trial → trName, treatment*, regular → tname, bill; trial and regular are renamed to dummy1 and dummy2, leaving treatment → tname* in the view]
• recursive and non-deterministic productions
  xpath(treatment, dummy1) = trial
  xpath(treatment, dummy2) = regular
• reducing dummy element types: the paths (dummy1/treatment)* / dummy2 / tname from treatment to tname are replaced by
  treatment → tname*
  xpath(treatment, tname) = //tname
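The short-cutting step can be approximated with a much simplified sketch: for each accessible element type, find the nearest accessible descendant types, skipping over inaccessible ones even when the DTD is recursive. This is not the derivation algorithm of the talk. It ignores multiplicities (sequence, choice, star), produces no xpath( ) annotations and no dummy types, and treats a qualifier simply as "accessible"; the function names and the dictionary encoding of the DTD are assumptions.

```python
# Document DTD as child lists (regular-expression structure dropped).
DTD = {
    "hospital": ["patient"],
    "patient": ["SSN", "name", "record"],
    "record": ["date", "diagnosis", "treatment"],
    "treatment": ["trial", "regular"],
    "trial": ["trName", "treatment"],
    "regular": ["tname", "bill"],
}
ACCESS = {
    ("hospital", "patient"): "[q1]",
    ("patient", "record"): "[q2]",
    ("treatment", "trial"): "N",
    ("treatment", "regular"): "N",
    ("regular", "tname"): "Y",
}


def view_children(parent):
    """Nearest accessible descendant types of an accessible type."""
    found, seen = [], set()

    def explore(node, node_accessible):
        for child in DTD.get(node, []):
            rule = ACCESS.get((node, child))
            # No rule: inherit; otherwise everything except N is accessible.
            child_accessible = node_accessible if rule is None else rule != "N"
            if child_accessible:
                if child not in found:
                    found.append(child)
            elif child not in seen:   # hidden type: short-cut through it once
                seen.add(child)
                explore(child, False)

    explore(parent, True)
    return found


def derive_view_types(root):
    """Map each type reachable in the view to its view children."""
    view, todo = {}, [root]
    while todo:
        a = todo.pop()
        if a not in view:
            view[a] = view_children(a)
            todo.extend(view[a])
    return view


view = derive_view_types("hospital")
print(view)
```

On the hospital DTD this yields `tname` as the only view child of `treatment`, matching the slide's treatment → tname*; the `seen` set is what keeps the traversal finite on the recursive trial/treatment cycle.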
Enforce access control via query rewriting
[Figure: a query over security view k (view DTD, xpath( )) passes through the query translation module (rewriter, optimizer) to the XML document]
Query translation: one needs an efficient algorithm to rewrite queries over a security view to equivalent and efficient queries over the underlying document
Security views are virtual: not materialized
• Efficiency: no extra costs to support multiple security views over the same large document simultaneously
• Consistency/integrity: updating the underlying data introduces no difficulties/overhead
Algorithm rewrite
• Input:
  • V = (Dv, xpath( )) (security view wrt S = (D, access( ))), and
  • an XPath query Qv over the view (Dv)
• Output: an equivalent XPath query Qt over the document
  • for any XML document T of D, Qt(T) = Qv(V(T))
Dynamic programming:
• for any subquery Qv' of Qv and any node A in the view-DTD graph Dv, rewrite Qv' at A into Qt'(A) by incorporating xpath(A, _)
• efficient: O(|Qv| |V|²) time
• a practical class of XPath (with union, descendant, qualifiers) vs. the tree-pattern queries studied in previous security models
Example: query rewriting for syndrome surveillance
[Figure: document DTD, annotated with [q1] and [q2], next to the view DTD]
Over the view:
  Qv = //patient[name = "Joe"]//tname
Expanded along the view DTD:
  xpath(hospital, patient)[name = "Joe"] / xpath(patient, record) / xpath(record, treatment) / xpath(treatment, tname)
Equivalent query over the document:
  Qt = /hospital/patient[name = "Joe" and //diagnosis = "DIS"]/record[diagnosis = "DIS"]/treatment//tname
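The substitution step in this example can be sketched as plain string composition. This is not the dynamic-programming algorithm of the talk: the sketch assumes the view query has already been expanded into a child-axis path through the view DTD, and that step qualifiers mention only children that map to themselves (such as name). The `XPATH` table and function name are illustrative assumptions.

```python
# xpath(A, B) annotations, written relative to the A element.
XPATH = {
    ("hospital", "patient"): 'patient[//diagnosis="DIS"]',
    ("patient", "record"): 'record[diagnosis="DIS"]',
    ("treatment", "tname"): "//tname",
}


def rewrite(steps):
    """Rewrite a view path, given as (tag, qualifier-or-None) steps from
    the root, into a path over the document by substituting xpath(A, B)."""
    root_tag, root_q = steps[0]
    out = "/" + root_tag + (f"[{root_q}]" if root_q else "")
    for (parent, _), (child, q) in zip(steps, steps[1:]):
        frag = XPATH.get((parent, child), child)  # default: same child step
        out += frag if frag.startswith("//") else "/" + frag
        if q:
            out += f"[{q}]"
    return out


view_path = [
    ("hospital", None),
    ("patient", 'name="Joe"'),
    ("record", None),
    ("treatment", None),
    ("tname", None),
]
qt = rewrite(view_path)
print(qt)
```

The result stacks two predicates on `patient` instead of combining them with `and` as on the slide; for non-positional predicates the two forms select the same nodes.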
Query optimization with structural constraints
Optimize Qt = rewrite(V, Qv) by leveraging the document DTD D
[Figure: DTD graph over element types A, B, C, E, F, G, H; disjunctions give exclusive constraints, conjunctions give existence (non-existence) constraints]
Example:
  Q = A[B]//E[F]//H ∪ A[B and C]//H ∪ //F[G]/H
  Q' = A/B/E/F/H
• A[B and C] → empty set
  exclusive constraint: an A element cannot have both B and C children at the same time
• //F[G]/H → empty set
  non-existence constraint: an F element does not have a G child
• A[B]//E[F]//H → A/B/E/F/H
  existence constraints
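The exclusive and non-existence constraints can be checked with a very small sketch. The DTD below is hypothetical, since the slide's DTD graph is not recoverable from the transcript, and the function covers only qualifiers that require a set of child types; it is an illustration of the two constraints, not the optimizer of the talk.

```python
# Hypothetical DTD: each type maps to (kind, children), where kind is
# "choice" for A1 + ... + Ak and "seq" for A1, ..., Ak.
DTD = {
    "A": ("choice", ["B", "C"]),
    "B": ("seq", ["E"]),
    "E": ("seq", ["F"]),
    "F": ("seq", ["H"]),
}


def qualifier_is_empty(element_type, required_children):
    """True if element_type[c1 and c2 and ...] can never hold under DTD."""
    kind, children = DTD.get(element_type, ("seq", []))
    # Non-existence constraint: a required child type never occurs.
    if any(c not in children for c in required_children):
        return True
    # Exclusive constraint: a choice yields exactly one of its alternatives.
    if kind == "choice" and len(set(required_children)) > 1:
        return True
    return False


print(qualifier_is_empty("A", ["B", "C"]))  # exclusive constraint
print(qualifier_is_empty("F", ["G"]))       # non-existence constraint
print(qualifier_is_empty("A", ["B"]))       # satisfiable
```

Sub-queries whose qualifier is found empty this way can be dropped from a union before evaluation.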
A heuristic for XPath containment
XPath containment is NP-hard for small fragments in the presence of DTDs
• image graph: evaluation of sub-queries over the DTD graph
• containment test: extension of simulation
  • Q1 ⊆ Q2 if image(Q1) is simulated by image(Q2)
  • qualifiers: inverse simulation
• effective: preliminary experimental study (speedup up to a factor of 2)
Example: heuristic for XPath containment
[Figure: DTD graph over A, B, C, E; image graph for //*[C]//E; image graph for //E]
  Q = //*[C]//E ∪ //E
  Q' = A/B/E
• Q1 ∪ Q2 ≡ Q2 if Q1 ⊆ Q2
  //*[C]//E ⊆ //E, so Q ≡ //E
  //E ≡ A/B/E over the DTD
Summary
• security views: the first model for specifying/enforcing XML security at a schema level and providing schema availability
  • a fine-grained access-control specification language
  • an effective enforcement framework via security views
    • view DTD: characterizing accessible information
    • algorithm for deriving security-view definitions
    • algorithms for query rewriting/optimization: no need to materialize views or to perform runtime security checks
• future work:
  • reasoning about security views (soundness, completeness, DTD conformance: these subsume XPath satisfiability with DTDs)
  • inference control in the presence of external knowledge
A practical solution for securing XML querying