Integrated Security Framework for Semantically Enhanced Semi-Structured Data

Andrei G. Stoica and Csilla Farkas • Department of Computer Science & Engineering, University of South Carolina

Overview • Machine-understandable data semantics: • domain and context definitions • ontologies • metadata • What are the security implications? • Are new security mechanisms needed? • Is a new security paradigm needed?

XML Language • High-level application messaging • Used as a storage format by applications • reduces computation overhead • uniform access • Basis for semantically oriented languages: RDF, DAML • Increasing popularity

Semantic Tools • Information processing is augmented with a semantic layer. • The infrastructure allows computers to reason about the meaning of data. • Computers exchange information transparently on behalf of the user. • Implications • Intelligent high-volume processing

Security Setup • Increased Connectivity + Extensive XML Support + Semantic Infrastructure = New Security Threats • Established security models do not address this dimension: • Indirect disclosure • Undesired inference • Inference models from database security are difficult to transfer to open domains

Related Work • Document Instance Security • Digital Signatures • Encryption • XML Access Control Models • Security label assignment • Multi-level XML Security • Extensions from Database Security

Problems? • Semantic correlations are ignored • Inconsistent replies • Indirect unauthorized disclosure

Example (security label of each element in brackets: UC = unclassified, S = secret, TS = top secret)
<medicalFiles> [UC]
  <countyRec> [S]
    <patient> [S]
      <name>John Smith</name> [UC]
      <phone>111-2222</phone> [S]
    </patient>
    <physician>Jim Dale</physician> [UC]
  </countyRec>
  <milBaseRec> [TS]
    <patient> [S]
      <name>Harry Green</name> [UC]
      <phone>333-4444</phone> [S]
    </patient>
    <physician>Joe White</physician> [UC]
    <milTag>MT78</milTag> [TS]
  </milBaseRec>
</medicalFiles>
[Figure: the document tree of medicalFiles (countyRec, milBaseRec) and the View over UC data]
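The flattening problem in this example can be reproduced with a minimal Python sketch of a naive label-filtering view. It is not the authors' Secure XML Views algorithm: it assumes each element carries its label in a hypothetical label attribute, drops every element above the requested clearance, and lifts the visible descendants of a dropped element to its nearest visible ancestor.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the example: each element stores its security
# label in a "label" attribute (UC < S < TS).
DOC = """
<medicalFiles label="UC">
  <countyRec label="S">
    <patient label="S">
      <name label="UC">John Smith</name>
      <phone label="S">111-2222</phone>
    </patient>
    <physician label="UC">Jim Dale</physician>
  </countyRec>
  <milBaseRec label="TS">
    <patient label="S">
      <name label="UC">Harry Green</name>
      <phone label="S">333-4444</phone>
    </patient>
    <physician label="UC">Joe White</physician>
    <milTag label="TS">MT78</milTag>
  </milBaseRec>
</medicalFiles>
"""

LEVELS = {"UC": 0, "S": 1, "TS": 2}


def naive_view(elem, clearance):
    """Return the visible elements produced from the subtree rooted at elem.

    An element is kept when its label is dominated by the clearance; the
    visible descendants of a hidden element are lifted to the nearest
    visible ancestor. No semantic constraints are considered.
    """
    visible_children = []
    for child in elem:
        visible_children.extend(naive_view(child, clearance))
    if LEVELS[elem.get("label")] <= LEVELS[clearance]:
        copy = ET.Element(elem.tag)
        copy.text = (elem.text or "").strip() or None
        copy.extend(visible_children)
        return [copy]
    return visible_children


root = ET.fromstring(DOC.strip())
uc_view = naive_view(root, "UC")[0]
print([(e.tag, e.text) for e in uc_view])
# [('name', 'John Smith'), ('physician', 'Jim Dale'), ('name', 'Harry Green'), ('physician', 'Joe White')]
```

In the UC view the names and physicians become unrelated siblings under medicalFiles, so the reply no longer says which physician belongs to which record; at level S the military patient is lifted directly under the root. These are the kinds of inconsistent replies the naive approach produces.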

Inference • A set of data plus the associations among them is used to derive the target data • Traditionally a human task • In the limit, any target can be inferred given enough related data and metadata.

Problems? • If the inference target is confidential information, the inference is a security violation
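To illustrate the two slides above, here is a minimal forward-chaining sketch in Python: public facts plus an association rule derive new facts, and any derived fact that also appears in the confidential set is a violation. The triples, the treatedBy/worksAt/affiliatedWith relations and the single rule are hypothetical, loosely based on the medical-records example.

```python
# Hypothetical facts: (subject, relation, object) triples.
PUBLIC_FACTS = {
    ("Harry Green", "treatedBy", "Joe White"),
    ("Joe White", "worksAt", "military base"),
}
# Hypothetical confidential target that must not be derivable.
CONFIDENTIAL = {("Harry Green", "affiliatedWith", "military base")}


def rule_affiliation(a, b):
    """Hypothetical association: X treatedBy Y and Y worksAt Z => X affiliatedWith Z."""
    if a[1] == "treatedBy" and b[1] == "worksAt" and a[2] == b[0]:
        return (a[0], "affiliatedWith", b[2])
    return None


def closure(facts, rules):
    """Forward chaining: apply every rule to every ordered pair of facts
    until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for a in list(derived):
            for b in list(derived):
                for rule in rules:
                    new = rule(a, b)
                    if new is not None and new not in derived:
                        derived.add(new)
                        changed = True
    return derived


violations = closure(PUBLIC_FACTS, [rule_affiliation]) & CONFIDENTIAL
print(violations)  # {('Harry Green', 'affiliatedWith', 'military base')}
```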

Example • Simulation Exploitation Using Open Source Information: • Objective: the US Government would like to share a limited version of simulation software with friendly countries. • Can this software be used to explore the capabilities of US weaponry? • Can sufficient information be found in public sources to create such a simulation?

Example • Findings: • Most of the information needed for the simulation was available on the Internet. • Human effort was needed to combine the available information.

Proposed Solution • What do we do? • XML views that consider the semantic dimension • do not disclose more information than authorized (including the structure of the document) • cover stories • Web inference • make sure the information we publish does not lead to our confidential data

Proposed Solution • XML Access Control • semantically consistent replies • prevent illegal inference from query replies (cover stories) • Global Disclosure Control • detect and prevent a set of undesired inferences that use public Internet data in correlation with public local data

Security Engine [Figure: architecture within the Local Organization. Components: Interface Module, SecView (Access Control) over the Local XML Database, Oxsegin (Global Data Privacy Control) with the Local Ontology; inputs Local Data and Internet Data; flows labeled Request, Return, Update, Upload and Corrective Measures]

Secure XML Views • Builds secure, semantically consistent, single-security-level partial views • Minimum Semantic Conflict Graph (MSCG) • avoids semantic conflicts • Multi-Plane DTD Graph (MPG) • structural relationships between tags • Andrei Stoica, Csilla Farkas. “Secure XML Views”, In Proc. of IFIP 2002

Example DTD Graph [Figure: DTD graph rooted at medicalFiles with the record types countyRec, milBaseRec and emrgRec; element nodes patient, name, phone, physician and milTag; and the MSCG over name and phone]
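A minimal Python sketch of how a conflict graph might be consulted when a view is assembled, under stated assumptions: each edge is treated as a pair of tags whose association requires a higher level than either tag alone, the name/phone edge and its S level are read from the figure as an assumption, and the check rejects a view that exposes both ends of an edge below that level. The published MSCG construction is not reproduced here.

```python
# Hypothetical conflict graph: each edge is a pair of tags whose association
# is more sensitive than either tag alone, mapped to the level needed to
# see both together.
MSCG = {frozenset({"name", "phone"}): "S"}
LEVELS = {"UC": 0, "S": 1, "TS": 2}


def conflicts(view_tags, clearance):
    """Return the conflict edges a view built for `clearance` would violate."""
    tags = set(view_tags)
    return [
        tuple(sorted(edge))
        for edge, level in MSCG.items()
        if edge <= tags and LEVELS[clearance] < LEVELS[level]
    ]


print(conflicts({"medicalFiles", "name", "physician"}, "UC"))  # []
print(conflicts({"medicalFiles", "name", "phone"}, "UC"))  # [('name', 'phone')]
print(conflicts({"medicalFiles", "name", "phone"}, "S"))  # []
```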

Oxsegin (Ontology guided XML Security Engine) [Figure: the Inference Engine correlates the Local Classified Database, the Local Public Database and Internet Databases, reports Security Violations, and triggers Corrective Measures]

Corrective Measures • Local Public Data • Remove information • Release misleading information • Internet Public Data • Release misleading information • Target desirable inference results

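Inference Engine [Figure: the Replicated Data Inference and Correlated Data Inference modules take the Public + Local Database and the Local Classified Database as input, use inference structures (Inf. Struct) derived from the Ontology, and output Violation Pointers with probability coefficients (Prob. Coef.)]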

Replicated Data Inference • Identifies replicated information under different security classifications • Violation Pointer = similar units of data at different security levels • Inference is guided by inference structures built on the ontology concept hierarchy • Andrei Stoica, Csilla Farkas. “Ontology guided XML Security Engine”, In Journal of Intelligent Information Systems, to appear.
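The idea of a violation pointer can be sketched in Python as follows. The similarity measure (difflib.SequenceMatcher ratio), the 0.8 threshold, the exact-concept match and the data values are all assumptions for illustration; only the M7/N5 node naming echoes the figure, and the ratio merely stands in for the confidence level the authors compute.

```python
from difflib import SequenceMatcher

# Hypothetical data units: node id -> (ontology concept, text value).
# Node ids mimic the figure's M*/N* labels; the values are placeholders.
CLASSIFIED = {"M7": ("PAC-3 Freq.", "band alpha, channel 17")}
PUBLIC = {
    "N5": ("PAC-3 Freq.", "band alpha channel 17"),
    "N6": ("PAC-2 Freq.", "band beta, channel 4"),
}


def violation_pointers(classified, public, threshold=0.8):
    """Pair classified and public units that map to the same ontology concept
    and have similar values. The similarity ratio is a stand-in confidence
    level, not the slide's confidence function."""
    pointers = []
    for m_id, (m_concept, m_value) in classified.items():
        for n_id, (n_concept, n_value) in public.items():
            if m_concept != n_concept:
                continue  # the ontology says these units describe different things
            score = SequenceMatcher(None, m_value, n_value).ratio()
            if score >= threshold:
                pointers.append((m_id, n_id, round(score, 2)))
    return pointers


print(violation_pointers(CLASSIFIED, PUBLIC))
```

Here the only pointer returned pairs M7 with N5; N6 is skipped because it maps to a different concept.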

Replicated Data Inference [Figure: an inference tree built from the Ontology (Missiles, Tracking Systems, Scientific data on radar components) links a Classified Data file (nodes M1 to M7, including Patriot Freq., PAC-2 Freq. and PAC-3 Freq.) to a Public Data file (nodes N0 to N7, including PAC-2 Freq. and PAC-3 Freq.); the replicated pair (M7, N5) is flagged with Confidence Level(M7, N5) = ƒ(…)]

Correlated Data Inference • Identifies sensitive data in the public domain (relative to a given classified database – usually the local database). • Inference guidance: • Ontology concept hierarchy • Structural similarity of public data • Csilla Farkas, Andrei Stoica. “Correlated Data Inference, Ontology Guided XML Security Engine”, In Proc of IFIP 2003.

Correlated Data Inference • Features of similarity: • Levels of abstraction for each node • Distance of associated nodes from association root • Similarity of the distances • Length of the distance • Similarity of sub-trees originating from correlated nodes

Correlated Data Inference [Figure: associations between address and fort in two data trees, one of them under Air show] • Association similarity: • Distance of each node from the association root • Difference of the distance of the nodes from the association root • Similarity of the sub-trees originating at nodes
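The three association-similarity features can be combined into a single score as in the Python sketch below. The trees (child-to-parent maps loosely echoing the figure's Air show, address and fort labels), the Jaccard measure for sub-trees and the equal weighting are hypothetical choices, not the published formula.

```python
# Hypothetical trees given as child -> parent maps of tag names.
CLASSIFIED_TREE = {"address": "record", "fort": "record", "name": "fort"}
PUBLIC_TREE = {"Air show": "event", "address": "Air show", "fort": "event", "name": "fort"}


def ancestors(tree, node):
    """Path from node up to the tree root, node first."""
    path = [node]
    while path[-1] in tree:
        path.append(tree[path[-1]])
    return path


def association(tree, a, b):
    """Return (root, dist_a, dist_b): the association root is the lowest
    common ancestor of a and b; distances are edge counts to that root."""
    path_a, path_b = ancestors(tree, a), ancestors(tree, b)
    root = next(n for n in path_a if n in path_b)
    return root, path_a.index(root), path_b.index(root)


def subtree(tree, node):
    """Set of tags below node (node itself excluded)."""
    children = [c for c, p in tree.items() if p == node]
    result = set(children)
    for c in children:
        result |= subtree(tree, c)
    return result


def jaccard(x, y):
    return len(x & y) / len(x | y) if x | y else 1.0


def association_similarity(t1, t2, a, b):
    """Hypothetical equal-weight combination of the three features."""
    _, a1, b1 = association(t1, a, b)
    _, a2, b2 = association(t2, a, b)
    # Feature 1: each node's distance from the association root agrees.
    dist = 1 / (1 + abs(a1 - a2) + abs(b1 - b2))
    # Feature 2: the difference between the two distances agrees.
    balance = 1 / (1 + abs(abs(a1 - b1) - abs(a2 - b2)))
    # Feature 3: the sub-trees originating at a and at b are similar.
    sub = (jaccard(subtree(t1, a), subtree(t2, a)) + jaccard(subtree(t1, b), subtree(t2, b))) / 2
    return round((dist + balance + sub) / 3, 3)


print(association_similarity(CLASSIFIED_TREE, PUBLIC_TREE, "address", "fort"))  # 0.667
```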

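Correlated Data Inference [Figure: concepts address, fort, Water source, base, district and basin, with an open question (?) about their association]
Ontology (F-logic subclass declarations; C :: D means C is a subclass of D):
    Object[].
    waterSource :: Object
    basin :: waterSource
    place :: Object
    district :: place
    address :: place
    base :: Object
    fort :: base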

Correlated Data Inference [Figure: the ontology above drawn as a concept hierarchy (place over address and district; Water source over basin; base over fort), with associations annotated Confidential and Public, and a Public Water Source linked to base]
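The ontology step can be sketched in Python using the subclass declarations of the ontology: a public association is flagged when each of its ends is the same concept as, or a subclass of, the corresponding end of a confidential association. Which association is confidential is an assumption here (base with waterSource), as is the public fort/basin pair.

```python
# Subclass declarations from the slide (C :: D means C is a subclass of D).
SUBCLASS = {
    "waterSource": "Object",
    "basin": "waterSource",
    "place": "Object",
    "district": "place",
    "address": "place",
    "base": "Object",
    "fort": "base",
}


def generalizations(concept):
    """The concept itself followed by all of its superclasses."""
    chain = [concept]
    while chain[-1] in SUBCLASS:
        chain.append(SUBCLASS[chain[-1]])
    return chain


def specializes(public_assoc, confidential_assoc):
    """True if each end of the public association is the same concept as,
    or a subclass of, the corresponding end of the confidential one."""
    return all(c in generalizations(p) for p, c in zip(public_assoc, confidential_assoc))


# Hypothetical example: the association between a base and its water source
# is confidential; public data links a fort to a basin.
CONFIDENTIAL = [("base", "waterSource")]
public = ("fort", "basin")
print([c for c in CONFIDENTIAL if specializes(public, c)])  # [('base', 'waterSource')]
```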

Summary • Secure XML Views provide semantically consistent query replies and cover stories. • The Oxsegin architecture and methods detect undesired inferences • Structural similarity • Semantic concept hierarchy • Confidence in derived inferences

Next Class • Stream data
