230 likes | 353 Views
CSC321/545: Summary of Database Techniques . Dr. Zhen Jiang Computer Science Department West Chester University West Chester, PA 19383. Outline. Overview Non-relational DB system NonSQL DB system Injection Inference Role access control (UML) Perturbation Design.
E N D
CSC321/545: Summary of Database Techniques Dr. Zhen Jiang Computer Science Department West Chester University West Chester, PA 19383
Outline • Overview • Non-relational DB system • NonSQL DB system • Injection • Inference • Role access control (UML) • Perturbation • Design
Database System Overview Database data Query request DBMS
Integration • Administration • Security & encryption • Privacy & inference • Transaction & injection • Sketching & hashing
Traditional Database • The relation of key vs. non-key • The relation between key and foreign key • Intra-table relation • Inter-table relation • E-R diagram • http://www.cs.wcupa.edu/~zjiang/ER.pdf • Any regularity? • Arbitrary & Abrupt • Ambiguity • Sample of such ambiguity in normalization process caused by the lack of background
Non-Relational Database • Data does not relate in the true sense • e.g., Mongo, which handles document stores or other content and/or metadata stores
NonSQL Database • A more clear structure • e.g., Kobo, Playtika (mobile service) • Distributed database system • No need and not possible for a “join” operator • Fast third-party data aggregation • Fast caching for application objects • Globally distributed data repository • E-commerce and internet burstness • Game (data intensive applications) • Ad targeting (social networks)
Query request Ok? API DBMS
Injection • Direct DB injection • http://www.youtube.com/watch?v=v6bphRHH4sM • Indirect DB injection • http://www.irongeek.com/i.php?page=videos/webgoat-sql-injection
interrupt each transaction as you debug and trace the record of each transaction
Authorization • Restrict access to data and restrict the actions that people may take (when they access data). • Encryption • Scramble data so that the data cannot be read. • Authentication • Password check • Key protection, not to protect everything! • Role based access control
Inference (aggregation) • Basically, inference occurs when users are able to piece together (aggregate) information to determine a fact that should be protected. • Role cheating
General Jones (who has a top security clearance) requests information and would see all three. • Civilian Smith (who has no security clearance) requests the data and would see the following data:
When Smith sees that nothing is scheduled for hold B on flight 1254, he might attempt to insert the record, and his insertion will fail due to the unique constraint on cargo space availability. • He has all the data he needs to infer that there is a secret shipment on flight. • He could then cross-reference the flight information table to find out the source and destination of the secret shipment and various other information.
Poly-instantiation: allows different records (hold B) to exist in the same table. • Overbooking!
Other caurses such as: • Count of highly preferred customers • Average salary • Problem is difficult • Information? • Content: what is critical? • Path? • Hold A-C, Hold B? Total space? Probing!
Existing solutions • Limit access • Role access control • Too many restriction could seriously hinder the functionality
Perturbation • Alter the data so that individual details are accurate but overall generalization are inaccurate. • Include dummy data in the results returned by the query unauthorized. • Protect sensitive data, but also achieve preservation of the properties of the dataset. • Sketching with a probability of p. • With probability p to use the original data • With probability (1-p) to use a replacement
Preservation • Given each query f in the original table T with n rows, build a re-constructible query f’ in the revised table T’ (with n rows), so that the result difference can be controlled in a limited range with a probability of p. • In other words, the expected number of rows that get perturbation is n(1-p). For a domain ∆C, n(1-p)k rows will be expected to lie within the available value range (k ∆C), k[1, 0]. • Among total nr rows observed from T’ in the value range (k ∆C), subtracting the n(1-p)k rows, we have the estimation for the number of unperturbed rows. Scaled up by 1/p, we get the total number of original rows (n0), as only a p fraction of rows were retained.
Security and Privacy • f’ = n0/n • [n-n0, n0]A = [n-nr, nr] • a=Pr(row T) vs. b=Pr(row in perturbed table T’) • Privacy breach, security threshold > a / b b b’ (sketch does not help to distinguish the cases) • Server Storage (with a) vs. Client retaining (with b)
OO Design for DB Systems • Injection, inference • RBAC (role based access control) • Use case • http://www.cs.wcupa.edu/~zjiang/intro_uc.ppt • Class design is needed for better maintaining the data ownership • http://www.cs.wcupa.edu/~zjiang/csc545_oo_design.htm • Non-relational DB • Activity pattern – prediction of future relation, e.g., credit card security • NonSQL DB • Relations in structure for the use.