Data Leakage Detection

ABSTRACT
• A data distributor has given sensitive data to a set of supposedly trusted agents. Some of the data are leaked and found in an unauthorized place.
• The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means.
• We propose data allocation strategies that improve the probability of identifying leakages.
• These methods do not rely on alterations of the released data (e.g., watermarks).

INTRODUCTION
• DISTRIBUTOR: The owner of the data, who distributes it to third parties.
• THIRD PARTIES: Trusted recipients of the distributor's data, also called agents.
• PERTURBATION: A technique in which the data are modified and made less sensitive before being handed to agents.
• ALLOCATION STRATEGIES: Tactics the distributor uses to allocate sensitive data so as to increase the probability of detecting a data leak.

OBJECTIVES
• Avoiding perturbation of the original data before it is handed to the agents.
• Detecting whether the distributor's sensitive data has been leaked by the agents.
• Assessing the likelihood that an agent is responsible for a leak.
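The third objective can be made concrete with a simple guilt estimate. The sketch below is an illustration under stated assumptions, not a method given on these slides: it supposes that each leaked object was either obtained independently by the target with probability p, or leaked by one of the agents holding it, each holder being equally likely. The names `guilt_probability`, `allocations`, and `p` are hypothetical.

```python
def guilt_probability(agent, leaked, allocations, p=0.2):
    """Estimate the probability that `agent` leaked at least one object in `leaked`.

    allocations: dict mapping agent name -> set of objects given to that agent.
    p: assumed probability that the target obtained an object on its own.
    """
    prob_innocent = 1.0
    for obj in leaked & allocations[agent]:
        # Agents holding this object; each is assumed equally likely to be its source.
        holders = [a for a, objs in allocations.items() if obj in objs]
        prob_innocent *= 1.0 - (1.0 - p) / len(holders)
    return 1.0 - prob_innocent


# Toy allocation (hypothetical): t1 is shared, t2 and t3 are exclusive.
allocations = {"U1": {"t1", "t2"}, "U2": {"t1", "t3"}}
leaked = {"t1", "t2"}
for agent in sorted(allocations):
    print(agent, round(guilt_probability(agent, leaked, allocations), 3))
```

Under these assumptions, an agent who holds a leaked object exclusively (U1 with t2) ends up with a noticeably higher estimate than an agent who only shares leaked objects with others, which is the intuition behind allocating data so that agents' sets overlap as little as possible.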

STUDY AND ANALYSIS
EXISTING SYSTEM
• Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy.
• If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified.
DRAWBACKS OF EXISTING SYSTEM
• Watermarking involves some modification of the original data.
• Watermarks can sometimes be destroyed by a sufficiently skilled recipient.

PROPOSED SYSTEM
ALLOCATION STRATEGIES: The proposed system allocates data to agents in response to two types of request:
• Sample request Ri = SAMPLE(T, mi): any subset of mi records from T can be given to the agent.
• Explicit request Ri = EXPLICIT(T, condition): the agent receives all objects in T that satisfy the condition.

FLOW CHART
• Start.
• The user submits an explicit request.
• Check the condition: if it is satisfied, select the agent; else exit.
• Create Fake Object is invoked.
• The loop iterates.
• The user receives the output.
• End.
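The flow for an explicit request can be sketched roughly as follows. This is one possible reading of the flow chart, not the presenters' implementation: the names `create_fake_object` and `handle_explicit_request`, the dictionary record format, and the `num_fakes` budget are all assumptions made for illustration.

```python
import itertools

_fake_ids = itertools.count(1)


def create_fake_object(condition_fields):
    """Hypothetical helper: build a record that satisfies the request's condition.

    A realistic fake would have to be indistinguishable from real records;
    the "fake-N" id is used here only to keep the sketch readable.
    """
    record = dict(condition_fields)
    record["id"] = f"fake-{next(_fake_ids)}"
    return record


def handle_explicit_request(table, condition_fields, num_fakes):
    """Return (records released to the agent, ids of fakes kept by the distributor)."""
    # Check the condition: collect records whose fields match the request.
    matches = [r for r in table
               if all(r.get(k) == v for k, v in condition_fields.items())]
    if not matches:
        return [], set()  # the "else exit" branch
    # Loop: invoke fake-object creation up to the assumed budget.
    fakes = [create_fake_object(condition_fields) for _ in range(num_fakes)]
    return matches + fakes, {f["id"] for f in fakes}


table = [{"id": 1, "state": "California"}, {"id": 2, "state": "Texas"}]
released, fake_ids = handle_explicit_request(table, {"state": "California"}, num_fakes=2)
print(len(released), sorted(fake_ids))
```

The distributor keeps `fake_ids` private. If one of those ids later turns up in a leaked set, it points to the agent who received it, since no other source could have supplied that record.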

EXAMPLE
• Say that T contains customer records for a given company A. Company A hires a marketing agency U1 to do an online survey of customers.
• Since any customers will do for the survey, U1 requests a sample of 1,000 customer records.
• At the same time, company A subcontracts with agent U2 to handle billing for all California customers.
• Thus, U2 receives all T records that satisfy the condition “state is California.”
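The two request types in this example can be sketched as below. This is a minimal illustration, assuming records are dictionaries and that a sample request is served by a uniform random draw; the function names, the toy table, and the `seed` parameter are hypothetical.

```python
import random


def sample_request(table, m, seed=None):
    """SAMPLE(T, m): return any m records from T (here, a uniform random subset)."""
    rng = random.Random(seed)
    return rng.sample(table, m)


def explicit_request(table, condition):
    """EXPLICIT(T, condition): return every record of T that satisfies `condition`."""
    return [record for record in table if condition(record)]


# Toy stand-in for company A's customer table T (made-up data).
states = ["California", "Texas", "New York", "Ohio"]
T = [{"id": i, "state": states[i % len(states)]} for i in range(4000)]

R1 = sample_request(T, 1000, seed=0)                            # U1: survey sample
R2 = explicit_request(T, lambda r: r["state"] == "California")  # U2: billing

print(len(R1), len(R2))
```

The distributor has freedom in which records to hand out only for the sample request; for the explicit request the released set is fixed by the condition.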

FUTURE SCOPE
• Investigating agent guilt models that capture leakage scenarios.
• Extending the data allocation strategies so that they can handle agent requests in an online fashion.

LIMITATIONS
• The presented strategies assume that there is a fixed set of agents with requests known in advance.
• The distributor may have a limit on the number of fake objects.

APPLICATIONS
• It helps detect whether the distributor's sensitive data has been leaked by trusted or authorized agents.
• It helps identify the agents who leaked the data.
• It can help deter cybercrime by making leaks traceable to their source.

CONCLUSION
• Although leakers can be identified using the traditional technique of watermarking, certain data cannot admit watermarks.
• In spite of these difficulties, we have shown that it is possible to assess the likelihood that an agent is responsible for a leak.
• We have shown that distributing data judiciously, using the different data allocation strategies, can make a significant difference in identifying guilty agents.


THANK YOU
