Sensitive Data In a Wired World
Negative Representations of Data
Stephanie Forrest
Dept. of Computer Science
Univ. of New Mexico
Albuquerque, NM
http://cs.unm.edu/~forrest
forrest@cs.unm.edu
Introduction
• Goal: develop new approaches to data security and privacy that incorporate design principles from living systems:
  • Survivability and evolvability
  • Autonomy
  • Robustness, adaptation, and self-repair
  • Diversity
• Extends earlier work on computational properties of the immune system:
  • Intrusion detection
  • Automated response
  • Collaborative information filtering
Project Overview
• Immunology and data:
  • Negative representations of information
• Epidemiology and the Internet:
  • Social networks matter
  • The real world is not always scale-free
• The social utility of privacy:
  • Why is privacy an important value in democratic societies?
  • Evolutionary perspective
Collaborations
• Paul Helman and Cris Moore (UNM)
• Robert Axelrod and Mark Newman (Univ. Michigan)
• Matthew Williamson (Sana Security)
• Rebecca Wright and Michael de Mare (Stevens)
• Joan Feigenbaum and Avi Silberschatz (Yale)
• Fernando Esponda's post-doc next year
How the Immune System Distributes Detection
• Many small detectors matching nonself (negative detection).
• Each detector matches multiple patterns (generalization).
• Advantages of distributed negative detection:
  • Localized (no communication costs)
  • Scalable and tunable
  • Robust (no single point of failure)
  • Private
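The negative-detection idea above can be illustrated with the classic negative-selection scheme: generate random candidate detectors and keep only those that match no self string. This is a minimal sketch, assuming binary strings and an r-contiguous-bits matching rule; the function names and parameters (`r`, `num_detectors`, the example self set) are hypothetical choices for illustration, not taken from these slides.

```python
import random


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, length, r, num_detectors, seed=0, max_tries=100_000):
    """Negative selection: keep random candidates that match no self string."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == num_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Censoring step: discard any candidate that would fire on self.
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(x, detectors, r):
    """Flag x as nonself if any detector matches it."""
    return any(r_contiguous_match(d, x, r) for d in detectors)


self_set = {"000011110000", "111100001111", "010101010101"}
detectors = generate_detectors(self_set, length=12, r=6, num_detectors=20)
print(len(detectors), any(is_nonself(s, detectors, 6) for s in self_set))
```

Because a detector only needs r contiguous agreeing bits, each one matches many strings (generalization), and each detector can be checked locally without consulting the others.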
Applications to Computing
• Anomaly detectors: earlier work
• Information filters: earlier work
• Adaptive queries: future
• Negative representations: in progress
  • A positive set DB is a set of fixed-length strings.
  • A negative set NDB represents all the strings not in DB.
  • Intuition: if an adversary obtains a string from NDB, little information is revealed.
• Example:
  • U = all possible four-character strings over the 26 lowercase letters
  • DB = {juan, eric, dave}
  • U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
  • There are $26^4 - 3 = 456{,}973$ strings in U-DB.
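The example above can be checked by brute force. This minimal sketch assumes that U is every four-letter string over the 26 lowercase ASCII letters (which is what the count $26^4$ implies); the helper names are hypothetical. Enumerating U-DB explicitly like this is exactly the blow-up that a compressed NDB is meant to avoid.

```python
from itertools import islice, product
from string import ascii_lowercase

DB = {"juan", "eric", "dave"}
LENGTH = 4


def universe(length):
    """Yield every string of the given length over the 26 lowercase letters."""
    for chars in product(ascii_lowercase, repeat=length):
        yield "".join(chars)


def negative_set(db, length):
    """Yield U-DB explicitly (only feasible for tiny examples)."""
    return (s for s in universe(length) if s not in db)


size = len(ascii_lowercase) ** LENGTH - len(DB)
print(size)  # 456973
print(list(islice(negative_set(DB, LENGTH), 2)))  # ['aaaa', 'aaab']
```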
Results
• Can U-DB be represented efficiently, given that |U-DB| is much larger than |DB|?
• Yes: there is an algorithm that creates an NDB whose size is polynomial in the size of DB.
  • Strategy: compress the information using a "don't care" symbol.
  • Other representations?
• What properties does the representation have?
  • Membership queries are tractable (linear time in the size of NDB, even without indexing).
  • Other queries and information leakage are future work.
  • Inferring information from a subset of NDB (next slide).
  • Inferring DB from NDB is NP-hard (note: this is not a cryptographic scheme):
    • Currently investigating instance difficulty.
    • Algorithms for increasing instance difficulty.
  • On-line insert/delete algorithms preserve problem difficulty.
• Collaborations with R. Wright, M. de Mare, and C. Moore.
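One simple way to see how a don't-care symbol keeps NDB small is a prefix-based construction over binary strings of fixed length l: walk the prefixes of DB strings and, whenever a one-bit extension is not a prefix of any DB string, emit that extension padded with `*`. This is a hedged sketch that may differ from the algorithm in the cited papers; the names `build_ndb`, `matches`, and `in_db` are hypothetical. Each DB prefix contributes at most one pattern per level, so for a non-empty DB the output has at most l·|DB| records, and a membership query is a single linear scan.

```python
from itertools import product


def matches(pattern: str, x: str) -> bool:
    """A pattern over {0, 1, *} matches x if every specified position agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, x))


def build_ndb(db: set, length: int) -> list:
    """Emit don't-care patterns that together match exactly the strings NOT in db."""
    ndb = []
    prefixes = [""]  # prefixes that at least one DB string extends
    for i in range(length):
        next_prefixes = []
        for p in prefixes:
            for bit in "01":
                q = p + bit
                if any(s.startswith(q) for s in db):
                    next_prefixes.append(q)
                else:
                    # No DB string starts with q, so every completion of q is in U-DB.
                    ndb.append(q + "*" * (length - i - 1))
        prefixes = next_prefixes
    return ndb


def in_db(x: str, ndb: list) -> bool:
    """Membership query: one linear scan; x is in DB iff no NDB pattern matches it."""
    return not any(matches(pattern, x) for pattern in ndb)


db = {"0110", "1010"}
ndb = build_ndb(db, 4)
print(ndb)  # ['00**', '11**', '010*', '100*', '0111', '1011']

# Sanity check: the NDB encodes exactly the complement of db.
assert all(in_db("".join(b), ndb) == ("".join(b) in db) for b in product("01", repeat=4))
```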
What information is revealed by queries? (without assuming irreversibility)
• Having access to a subset of NDB (or DB) yields some information about strings outside that subset:
  • Assume NDB (or DB) is partitioned into n subsets.
  • For the query "Is x in DB?", what do I learn about x if x is not in my subset?
  • One must consult all n subsets of NDB to conclude that x is in DB.
  • To conclude that x is not in DB, one need consult the subsets only until x is found (n/2 on average).
  • Assumes that we care more about DB than about U-DB.
Figure: Probability and information content as the membership of strings is revealed. DB contains 10% of all possible L-length strings (formulas).
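Private Set Intersection
• Determine which records are in the intersection of several databases, i.e.,
  $DB_1 \cap DB_2 \cap \dots \cap DB_n = U \setminus (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$,
  where each $NDB_j$ stands for the set U-DB_j that it represents.
• Each party i may compute the intersection as $DB_i \setminus (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$.
• Party i learns only the intersection of all the sets, and not the cardinalities of the other sets.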
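The set identity above can be sketched directly: party i keeps exactly those of its own records that no published NDB matches. This sketch assumes 4-bit binary records and reuses a simple prefix-based NDB construction (a hypothetical stand-in, not necessarily the papers' algorithm); it demonstrates only the identity and does not model the privacy properties or how the NDBs are exchanged.

```python
def matches(pattern, x):
    """A pattern over {0, 1, *} matches x if every specified position agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, x))


def build_ndb(db, length):
    """Prefix-based sketch: don't-care patterns matching exactly the strings not in db."""
    ndb, prefixes = [], [""]
    for i in range(length):
        next_prefixes = []
        for p in prefixes:
            for bit in "01":
                q = p + bit
                if any(s.startswith(q) for s in db):
                    next_prefixes.append(q)
                else:
                    ndb.append(q + "*" * (length - i - 1))
        prefixes = next_prefixes
    return ndb


def intersect_with_ndbs(own_db, ndbs):
    """Party i keeps its own records that no published NDB matches."""
    return {x for x in own_db if not any(matches(p, x) for ndb in ndbs for p in ndb)}


LENGTH = 4
dbs = [{"0110", "1010", "1111"}, {"0110", "1111", "0001"}, {"1111", "0110", "1000"}]
ndbs = [build_ndb(db, LENGTH) for db in dbs]
result = intersect_with_ndbs(dbs[0], ndbs)
print(sorted(result))  # ['0110', '1111']
```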
Results cont.
• How might these properties be useful?
  • Protect data from insider attacks
  • Computing set intersections
  • Surveys involving sensitive information
  • Anonymous digital credentials
  • Fingerprint databases
  • Other ideas?
• Prototype implementations:
  • Perl, C
  • http://esa.ackleyshack.com/ndb
  • See demo
Computer Epidemiology (Justin Balthrop, Mark Newman, Matt Williamson)
• Information spreads over networks of social contacts between computers:
  • Email address books
  • URL links
• Network topology affects the rate and extent of spreading:
  • Epidemiological models and the epidemic threshold
• Controlling spread on scale-free networks:
  • Random vaccination is ineffective (e.g., anti-virus software).
  • Targeted vaccination of high-connectivity nodes.
  • Control the degree distribution in time rather than space.
Science 304:527-529 (2004)
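The contrast between random and targeted vaccination can be illustrated with a mean-field estimate of the epidemic threshold, $\lambda_c \approx \langle k\rangle / \langle k^2\rangle$ for an SIS model on a network with degree k. This is a hedged sketch: it uses a synthetic Barabási-Albert (preferential-attachment) graph as a stand-in for a scale-free network, not the email-network data from the Science paper, and "vaccinating" a node simply means deleting it. The graph size, the 5% fraction, and the function names are hypothetical. A higher threshold means an infection needs a higher transmission rate to persist.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=0):
    """Preferential-attachment graph (a standard scale-free model) as an edge list."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))
    repeated = []  # each node appears once per edge endpoint, so sampling is degree-biased
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def threshold_estimate(edges, removed):
    """Mean-field SIS threshold <k>/<k^2> after deleting ('vaccinating') nodes."""
    deg = Counter()
    for u, v in edges:
        if u not in removed and v not in removed:
            deg[u] += 1
            deg[v] += 1
    # Isolated nodes add 0 to both sums, so the ratio of sums equals <k>/<k^2>.
    return sum(deg.values()) / sum(k * k for k in deg.values())


N, M, FRACTION = 2000, 2, 0.05
edges = barabasi_albert(N, M, seed=1)
degree = Counter(node for edge in edges for node in edge)
k = int(FRACTION * N)
hubs = set(sorted(degree, key=degree.get, reverse=True)[:k])
random_nodes = set(random.Random(2).sample(range(N), k))

print("no vaccination: ", threshold_estimate(edges, set()))
print("random 5%:      ", threshold_estimate(edges, random_nodes))
print("targeted top 5%:", threshold_estimate(edges, hubs))
```

Removing a random 5% of nodes barely thins the high-degree hubs, whereas removing the top 5% by degree strips out most of the $\langle k^2\rangle$ term, which is why targeted vaccination raises the estimated threshold much more.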
The Social Utility of Privacy (Robert Axelrod and Ryan Gerety)
• Typical framing:
  • Privacy values should remain as they are (e.g., Lessig).
  • Individual rights vs. the state (i.e., civil liberties vs. community safety/crime).
• A community may have its own interest in defending individual privacy (or not), independent of the civil-liberties argument:
  • To promote innovation in changing environments.
  • To cope with distortions (e.g., overconfidence of middle managers).
  • To compensate for overgeneralized norms.
• Not necessarily advocating more privacy:
  • From a societal/informational point of view, how should appropriate bounds on privacy be determined?
• Current status:
  • Exploratory modeling based on simple games.
Next Steps: Negative Representations
• Distributed negative representations
• Leaking partial information
• Relational algebra operators on the negative database:
  • Select, join, etc.
• Instance difficulty:
  • Hiding given satisfying assignments in a SAT formula
• Approximate representations
• Other representations?
• More realistic implementations
• Negative data mining:
  • Is it easier or harder to find certain instances in NDB?
• Imprecise representations:
  • Partial matching and queries
  • Learning algorithms
People
• Stephanie Forrest
• Fernando Esponda
• Paul Helman
• Elena Ackley
Publications
• F. Esponda, S. Forrest, and P. Helman. "Negative representations of information." International Journal of Information Security (submitted March 2005).
• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Journal of Unconventional Computing (in press).
• F. Esponda, S. Forrest, and P. Helman. "A formal framework for positive and negative detection." IEEE Transactions on Systems, Man, and Cybernetics 34(1), pp. 357-373 (2004).
• J. Balthrop, S. Forrest, M. Newman, and M. Williamson. "Technological networks and the spread of computer viruses." Science 304:527-529 (2004).
• H. Inoue and S. Forrest. "Inferring Java security policies through dynamic sandboxing." 2005 International Conference on Programming Languages and Compilers (PLC'05) (in press).
• F. Esponda, E. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Third International Conference on Artificial Immune Systems (ICARIS) (2004). Best paper award.
Probabilities
Generating Hard-to-Reverse Negative Databases
• The randomized algorithm can be used to create a negative database.
• Insert/Delete operations turn known hard formulas into negative databases.
• The Morph operator may be used to search for hard instances.
H. Jia, C. Moore, and B. Selman. "From spin glasses to hard satisfiable formulas." SAT 2004.
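The link between negative databases and formulas can be made concrete with one natural correspondence: each NDB record over {0, 1, *} becomes a CNF clause that is falsified exactly by the strings the record matches, so the satisfying assignments of the resulting formula are exactly the strings in DB (and recovering DB means solving SAT). This is a sketch under that assumption, using DIMACS-style signed integers (variable i+1 is bit i); it is not necessarily the exact encoding used in the cited work, and the inverse mapping assumes no clause contains both a variable and its negation.

```python
from itertools import product


def record_to_clause(record: str) -> list:
    """Map an NDB record to a clause falsified exactly by the strings it matches."""
    clause = []
    for i, c in enumerate(record):
        if c == "1":
            clause.append(-(i + 1))  # satisfied when bit i is 0
        elif c == "0":
            clause.append(i + 1)     # satisfied when bit i is 1
    return clause


def clause_to_record(clause: list, length: int) -> str:
    """Inverse mapping: a clause becomes the record of assignments that falsify it."""
    record = ["*"] * length
    for lit in clause:
        record[abs(lit) - 1] = "0" if lit > 0 else "1"
    return "".join(record)


def satisfies(bits: str, cnf: list) -> bool:
    """True if the assignment (bit i = variable i+1) satisfies every clause."""
    return all(any((bits[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)
               for clause in cnf)


ndb = ["00**", "11**", "010*", "100*", "0111", "1011"]  # encodes DB = {"0110", "1010"}
cnf = [record_to_clause(r) for r in ndb]
print(cnf)
solutions = ["".join(b) for b in product("01", repeat=4) if satisfies("".join(b), cnf)]
print(solutions)  # ['0110', '1010']
```

Read in the other direction, `clause_to_record` is how a formula with a hidden satisfying assignment could be turned into a negative database.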
Effect of the Morph operation
• The Morph operation takes as input a negative database NDB and outputs an NDB' that represents the same set U-DB.
• The plot shows how the complexity of a database changes after applying the Morph operator.
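The invariant that any Morph-style operation must keep (NDB' represents the same set U-DB as NDB) can be checked with a toy rewrite. The sketch below uses a hypothetical transformation, `toy_split`, that replaces one record with its two specializations on a `*` position; it is NOT the Morph operator from the cited work and does nothing to increase instance difficulty. It only shows how equivalence can be verified by brute force for tiny string lengths.

```python
import random
from itertools import product


def matches(pattern, x):
    """A pattern over {0, 1, *} matches x if every specified position agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, x))


def represented(ndb, length):
    """The set U-DB that an NDB represents (brute force; tiny lengths only)."""
    strings = ("".join(b) for b in product("01", repeat=length))
    return {x for x in strings if any(matches(p, x) for p in ndb)}


def toy_split(ndb, rng):
    """Hypothetical equivalence-preserving rewrite: split one record on a '*' position."""
    candidates = [i for i, r in enumerate(ndb) if "*" in r]
    if not candidates:
        return list(ndb)
    i = rng.choice(candidates)
    record = ndb[i]
    pos = rng.choice([j for j, c in enumerate(record) if c == "*"])
    # The record matches the union of its 0- and 1-specializations, so the set is unchanged.
    halves = [record[:pos] + bit + record[pos + 1:] for bit in "01"]
    return ndb[:i] + halves + ndb[i + 1:]


rng = random.Random(0)
ndb = ["00**", "11**", "010*", "100*", "0111", "1011"]
morphed = ndb
for _ in range(5):
    morphed = toy_split(morphed, rng)
print(len(ndb), len(morphed), represented(morphed, 4) == represented(ndb, 4))
```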