Scalable Trusted Computing: Engineering challenge, or something more fundamental?
Ken Birman, Cornell University

Presentation Transcript


  1. Scalable Trusted Computing: Engineering challenge, or something more fundamental? Ken Birman, Cornell University

  2. Cornell Quicksilver Project • Krzys Ostrowski: The key player • Ken Birman, Danny Dolev: Collaborators and research supervisors • Mahesh Balakrishnan, Maya Haridasan, Tudor Marian, Amar Phanishayee, Robbert van Renesse, Einar Vollset, Hakim Weatherspoon: Offered valuable comments and criticisms

  3. Trusted Computing • A vague term with many meanings… • For individual platforms, integrity of the computing base • Availability and exploitation of TPM h/w • Proofs of correctness for key components • Security policy specification, enforcement • Scalable trust issues arise mostly in distributed settings

  4. System model • A world of • Actors: Sally, Ted, … • Groups: Sally_Advisors = {Ted, Alice, …} • Objects: travel_plans.html, investments.xls • Actions: Open, Edit, … • Policies: • (Actor, Object, Action) → { Permit, Deny } • Places: Ted_Desktop, Sally_Phone, …

  5. Rules • If Emp.place ∈ Secure_Places and Emp ∈ Client_Advisors then Allow Open Client_Investments.xls • Can Ted, working at Ted_Desktop, open Sally_Investments.xls? • … yes, if Ted_Desktop ∈ Secure_Places
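The model and rule above can be made concrete with a small sketch. This is not the actual Quicksilver code; it is a minimal illustration in Python under stated assumptions, and every name in it (Actor, evaluate, client_advisors, secure_places) is invented for this example.

```python
from dataclasses import dataclass

# Hypothetical illustration of the (Actor, Object, Action) -> {Permit, Deny} model.
# None of these names come from the actual system.

@dataclass(frozen=True)
class Actor:
    name: str
    place: str          # e.g. "Ted_Desktop"

# Group membership and the set of trusted places
client_advisors = {"Ted", "Alice"}
secure_places = {"Ted_Desktop"}

def evaluate(actor: Actor, obj: str, action: str) -> str:
    """One rule: advisors may Open client investment files from a secure place."""
    if (obj == "Client_Investments.xls" and action == "Open"
            and actor.name in client_advisors
            and actor.place in secure_places):
        return "Permit"
    return "Deny"   # default: deny anything not explicitly permitted

# Can Ted, working at Ted_Desktop, open the file?
print(evaluate(Actor("Ted", "Ted_Desktop"), "Client_Investments.xls", "Open"))  # Permit
```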

  6. Miscellaneous stuff • Policy changes all the time • Like a database receiving updates • E.g., as new actors are added, old ones leave the system, etc. • … and policies have a temporal scope • Starting at time t=19 and continuing until now, Ted is permitted to access Sally’s file investments.xls
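The temporal scope can be pictured with another small, hedged sketch (again with invented names, not part of the presented system): a permission carries a validity interval, and enforcement checks the current time against it.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical illustration only: a permission with a temporal scope.
@dataclass(frozen=True)
class TimedPermission:
    actor: str
    obj: str
    action: str
    valid_from: float                     # e.g. t = 19
    valid_until: Optional[float] = None   # None means "continuing until now"

def in_effect(p: TimedPermission, t: Optional[float] = None) -> bool:
    """Is the permission in effect at time t (default: now)?"""
    t = time.time() if t is None else t
    return p.valid_from <= t and (p.valid_until is None or t <= p.valid_until)

# Starting at t=19 and continuing until now, Ted may access investments.xls.
rule = TimedPermission("Ted", "investments.xls", "Open", valid_from=19)
print(in_effect(rule, t=25))   # True
print(in_effect(rule, t=10))   # False: before the rule took effect
```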

  7. Order dependent decisions • Consider rules such as: • Only one person can use the cluster at a time • The meeting room is limited to three people • While people lacking clearance are present, no classified information can be exposed • These are sensitive to the order in which conflicting events occur • A central “clearinghouse” decides what to allow based on the order in which it sees events
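An order-sensitive clearinghouse can be sketched as follows. This is an invented Python illustration, not part of the presented system: it admits or rejects events strictly in arrival order, here for the “at most three people in the meeting room” rule.

```python
# Hypothetical sketch of an order-sensitive clearinghouse.
# Decisions depend on the order in which the clearinghouse sees events.
class MeetingRoomClearinghouse:
    CAPACITY = 3

    def __init__(self):
        self.present = set()

    def handle(self, event: str, person: str) -> str:
        if event == "enter":
            if len(self.present) >= self.CAPACITY:
                return "Deny"            # room already full
            self.present.add(person)
            return "Permit"
        if event == "leave":
            self.present.discard(person)
            return "Permit"
        return "Deny"

ch = MeetingRoomClearinghouse()
for who in ["Sally", "Ted", "Alice", "Bob"]:
    print(who, ch.handle("enter", who))   # Bob is denied: three people are present

# If the events arrived in a different order (say, Alice leaves before Bob tries
# to enter), the decision for Bob would change: exactly the order sensitivity above.
```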

  8. Goal: Enforce policy • [Diagram: a “Read investments.xls” request is checked against the Policy Database before the data is returned]

  9. … reduction to a proof • Each time an action is attempted, the system must develop a proof that the action should be either blocked or allowed • For example, it might use the BAN logic • For the sake of argument, let’s assume we know how to do all this on a single machine

  10. Implications of scale • We’ll be forced to replicate and decentralize the policy enforcement function • For ownership: Allows “local policy” to be stored close to the entity that “owns” it • For performance and scalability • For fault-tolerance

  11. Decentralized policy enforcement • [Diagram: original scheme, in which a single Policy Database mediates the “Read investments.xls” request]

  12. Decentralized policy enforcement • [Diagram: new scheme, in which replicated Policy DB 1 and Policy DB 2 can each mediate the “Read investments.xls” request]

  13. So… how do we decentralize? • Consistency: the bane of decentralization • We want a system to behave as if all decisions occur in a single “rules” database • Yet want the decisions to actually occur in a decentralized way… a replicated policy database • System needs to handle concurrent events in a consistent manner

  14. So… how do we decentralize? • More formally: any run of the decentralized system should be indistinguishable from some run of a centralized system • Analogy: database 1-copy serializability

  15. But this is a familiar problem! • Database researchers know it as the atomic commit problem. • Distributed systems people call it: • State machine replication • Virtual synchrony • Paxos-style replication • … and because of this we know a lot about the question!

  16. … replicated data with abcast • Closely related to the “atomic broadcast” problem within a group • Abcast sends a message to all the members of a group • Protocol guarantees order, fault-tolerance • Solves consensus… • Indeed, a dynamic policy repository would need abcast if we wanted to parallelize it for speed or replicate it for fault-tolerance!
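The role abcast plays can be illustrated with a toy sketch (invented names, not the Isis or Quicksilver API): because every replica applies updates in the single delivery order chosen by the atomic broadcast, all replicas of the policy database stay identical.

```python
# Toy illustration of replicated state over atomic broadcast (abcast).
# The "group" is simulated inside one process; a real abcast also handles
# ordering across machines and failures.
class AbcastGroup:
    def __init__(self):
        self.members = []

    def join(self, replica):
        self.members.append(replica)

    def abcast(self, update):
        # One total order: every member sees every update in the same sequence.
        for replica in self.members:
            replica.apply(update)

class PolicyReplica:
    def __init__(self, name):
        self.name = name
        self.rules = {}

    def apply(self, update):
        op, key, value = update
        if op == "add":
            self.rules[key] = value
        elif op == "remove":
            self.rules.pop(key, None)

group = AbcastGroup()
r1, r2 = PolicyReplica("DB1"), PolicyReplica("DB2")
group.join(r1); group.join(r2)

group.abcast(("add", ("Ted", "investments.xls", "Open"), "Permit"))
group.abcast(("remove", ("Ted", "investments.xls", "Open"), None))
print(r1.rules == r2.rules)   # True: identical order implies identical state
```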

  17. A slight digression • Consensus is a classical problem in distributed systems • N processes • They start execution with inputs {0,1} • Asynchronous, reliable network • At most 1 process fails by halting (crash) • Goal: protocol whereby all “decide” same value v, and v was an input

  18. Distributed Consensus • [Cartoon: “Jenkins, if I want another yes-man, I’ll build one!” (Lee Lorenz, Brent Sheppard)]

  19. Asynchronous networks • No common clocks or shared notion of time (local ideas of time are fine, but different processes may have very different “clocks”) • No way to know how long a message will take to get from A to B • Messages are never lost in the network

  20. Fault-tolerant protocol • Collect votes from all N processes • At most one is faulty, so if one doesn’t respond, count that vote as 0 • Compute majority • Tell everyone the outcome • They “decide” (they accept outcome) • … but this has a problem! Why?
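The protocol on this slide can be written down as a sketch; the names and the “count a missing vote as 0” handling are invented for illustration. The weak point, taken up on the next slide, is precisely that step.

```python
# Naive majority-vote "consensus" sketch (hypothetical, for illustration only).
# votes maps each responding process to its input bit; a non-responder is
# assumed to have crashed and is counted as 0. That assumption is the flaw:
# in an asynchronous network a slow process is indistinguishable from a
# crashed one, and when the vote is nearly a tie the assumption matters.
def naive_decide(votes: dict, all_processes: list) -> int:
    counted = {p: votes.get(p, 0) for p in all_processes}   # missing vote -> 0
    ones = sum(counted.values())
    zeros = len(all_processes) - ones
    return 1 if ones > zeros else 0

procs = ["p1", "p2", "p3", "p4", "p5"]
# p3's message is merely slow, not lost: a coordinator that times out on p3
# decides 0, while one that waited a little longer decides 1.
print(naive_decide({"p1": 1, "p2": 1, "p4": 0, "p5": 0}, procs))            # 0
print(naive_decide({"p1": 1, "p2": 1, "p3": 1, "p4": 0, "p5": 0}, procs))   # 1
```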

  21. What makes consensus hard? • Fundamentally, the issue revolves around membership • In an asynchronous environment, we can’t detect failures reliably • A faulty process stops sending messages but a “slow” message might confuse us • Yet when the vote is nearly a tie, this confusing situation really matters

  22. Some bad news • FLP result shows that fault-tolerant consensus protocols always have non-terminating runs. • All of the mechanisms we discussed are equivalent to consensus • Impossibility of non-blocking commit is a similar result from database community

  23. But how bad is this news? • In practice, these impossibility results don’t hold up so well • Both define “impossible” to mean “not always possible” • In fact, if runs are chosen probabilistically, the FLP scenario has probability zero • … so we must ask: does a probability-zero result even hold in a “real system”? • Indeed, people build consensus-based systems all the time…

  24. Solving consensus • Systems that “solve” consensus often use a membership service • This GMS functions as an oracle, a trusted status reporting function • The consensus protocol then becomes a kind of 2-phase protocol that runs over the output of the GMS • It is known precisely when such a solution will be able to make progress
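One way to picture the GMS-based approach is the sketch below (again invented, not the Isis or Quicksilver code): decisions are taken only among the members listed in the current view reported by the GMS, so the protocol never has to guess whether a silent process is slow or dead; it waits for the GMS to report a new view instead.

```python
# Hypothetical sketch: a membership oracle plus a simple two-phase decision.
class GMS:
    """Trusted oracle that reports the current view (membership list)."""
    def __init__(self, members):
        self.view = list(members)

    def exclude(self, member):
        # The GMS, not the protocol, decides that a member has failed.
        self.view = [m for m in self.view if m != member]

def two_phase_decide(gms: GMS, proposal: str, ack) -> str:
    view = gms.view                         # phase 1: propose to the current view
    if all(ack(m, proposal) for m in view):
        return proposal                     # phase 2: everyone in the view accepted
    raise RuntimeError("retry in the next view reported by the GMS")

gms = GMS(["p1", "p2", "p3"])
gms.exclude("p3")                           # GMS declares p3 faulty
print(two_phase_decide(gms, "Permit", lambda m, p: True))   # Permit
```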

  25. More bad news • Consensus protocols don’t scale! • Isis (virtual synchrony) new view protocol • Selects a leader; normally 2-phase; 3 if the leader dies • Each phase is a 1-to-n multicast followed by an n-to-1 convergecast (can tolerate n/2-1 failures) • Paxos decree protocol • Basic protocol has no leader and could have rollbacks with probability linear in n • Faster-Paxos is isomorphic to the Isis view protocol (!) • … both are linear in group size • Regular Paxos might be O(n²) because of rollbacks

  26. Work-arounds? • Only run the consensus protocol in the “group membership service” or GMS • It has a small number of members, like 3-5 • They run a protocol like the Isis one • Track membership (and other “global” state) on behalf of the system as a whole • Scalability of consensus won’t matter

  27. But this is centralized • Recall our earlier discussion • Any central service running on behalf of the whole system will become burdened if the system gets big enough • Can we decentralize our GMS service?

  28. GMS in a large system • [Diagram: global events are inputs to the GMS; its output is the official record of the events that mattered to the system]

  29. Hierarchical, federated GMS • Quicksilver V2 (QS2) constructs a hierarchy of GMS state machines • In this approach, each “event” is associated with some GMS that owns the relevant official record • [Diagram: a hierarchy of GMS instances: GMS0, GMS1, GMS2]
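The ownership idea can be sketched as a mapping from event categories to the GMS in the hierarchy that holds the official record. The sketch below is an invented illustration; QS2’s actual structure may differ.

```python
# Hypothetical sketch: routing each event to the GMS that owns its category.
class GMSNode:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.owned = set()       # event categories this GMS officially records
        self.log = []            # the official record for those categories

    def delegate(self, category, child):
        child.owned.add(category)   # hand ownership of a category to a child GMS

    def route(self, category, event):
        # Walk up from a leaf until we find the owner of this category.
        node = self
        while node is not None:
            if category in node.owned or node.parent is None:
                node.log.append((category, event))
                return node.name
            node = node.parent

gms0 = GMSNode("GMS0")
gms1 = GMSNode("GMS1", parent=gms0)
gms0.delegate("kingston.door_access", gms1)

print(gms1.route("kingston.door_access", "badge 1234 at door 7"))   # GMS1
print(gms1.route("corporate.hiring", "IBM hired Sally"))            # GMS0 (root owns the rest)
```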

  30. Delegation of roles • One (important) use of the GMS is to track membership in our rule enforcement subsystem • But it can “delegate” responsibility for classes of actions to subsystems that can own and handle them locally • The GMS “reports” the delegation events • In effect, it tells nodes in the system about the system configuration – about their roles • And as conditions change, it reports new events

  31. Delegation • [Cartoon: “In my capacity as President of the United States, I authorize John Pigg to oversee this nation’s banks.” “Thank you, sir! You can trust me.”]

  32. Delegation • [Diagram: delegation flowing from GMS0 through GMS1 to a policy subsystem]

  33. Delegation example • IBM might delegate the handling of access to its Kingston facility to the security scanners at the doors • Events associated with Kingston access don’t need to pass through the GMS • Instead, they “exist” entirely within the group of security scanners

  34. … giving rise to pub/sub groups • Our vision spawns lots and lots of groups that own various aspects of trust enforcement • The scanners at the doors • The security subsystems on our desktops • The key management system for a VPN • … etc • A nice match with publish-subscribe

  35. Publish-subscribe in a nutshell • Publish(“topic”, message) • Subscribe(“topic”, handler) • Basic idea: • Platform invokes handler(message) each time a topic match arises • Fancier versions also support history mechanisms (lets a joining process catch up)
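A minimal in-memory version of this interface, including the history mechanism mentioned above, might look like the following. This is an invented sketch, not the QSM API.

```python
from collections import defaultdict

# Hypothetical in-memory publish-subscribe sketch with per-topic history.
class PubSub:
    def __init__(self):
        self.handlers = defaultdict(list)
        self.history = defaultdict(list)

    def publish(self, topic, message):
        self.history[topic].append(message)
        for handler in self.handlers[topic]:
            handler(message)               # invoke handler(message) on a topic match

    def subscribe(self, topic, handler, replay_history=False):
        self.handlers[topic].append(handler)
        if replay_history:                 # lets a joining process catch up
            for message in self.history[topic]:
                handler(message)

bus = PubSub()
bus.publish("policy.rules", "Jeff will have access to the secret archives")
bus.subscribe("policy.rules", print, replay_history=True)   # prints the missed update
bus.publish("policy.rules", "Sally is no longer allowed to access them")
```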

  36. Publish-subscribe in a nutshell • Concept first mentioned by Willy Zwaenepoel in a paper on multicast in the V system • First implementation was Frank Schmuck’s Isis “news” tool • Later re-invented in TIB message bus • Also known as “event notification”… very popular

  37. Other kinds of published events • Changes in the user set • For example, IBM hired Sally. Jeff left his job at CIA. Halliburton snapped him up • Or the group set • Jeff will be handling the Iraq account • Or the rules • Jeff will have access to the secret archives • Sally is no longer allowed to access them

  38. But this raises problems • If “actors” only have partial knowledge • E.g. the Cornell library door access system only knows things normally needed by that door • … then we will need to support out-of-band interrogation of remote policy databases in some cases

  39. A Scalable Trust Architecture • [Diagram: an enterprise policy system for some company or entity. A GMS hierarchy tracks configuration events; a pub/sub framework carries role delegation; a central master enterprise policy DB tracks overall policy; slave systems apply policy, with knowledge limited to locally useful policy]

  40. A Scalable Trust Architecture • Enterprises talk to one another when decisions require non-local information • [Diagram: an inquiry exchanged between example parties such as PeopleSoft, the FBI (policy), and Cornell University]

  41. www.zombiesattackithaca.com

  42. Open questions? • Minimal trust • A problem reminiscent of zero-knowledge • Example: • The FBI is investigating reports of zombies in Cornell’s Mann Library… Mulder is assigned to the case. • The Cornell Mann Library must verify that he is authorized to study the situation • But does the FBI need to reveal to Cornell that the Cigarette Man actually runs the show?

  43. Other research questions • Pub-sub systems are organized around topics, to which applications subscribe • But in a large-scale security policy system, how would one structure these topics? • Topics are like file names – “paths” • But we still would need an agreed upon layout

  44. Practical research question • “State transfer” is the problem of initializing a database or service when it joins the system after an outage • How would we implement a rapid and secure state transfer, so that a joining security policy enforcement module can quickly come up to date? • Once it’s online, the pub-sub system reports updates on topics that matter to it
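One common pattern for state transfer is sketched below with invented names; QS2 may do it differently. The joining replica copies a snapshot of the policy database together with its version number, then replays from the pub/sub history only the updates published after that version.

```python
# Hypothetical sketch of state transfer for a joining policy replica.
class PolicyStore:
    def __init__(self):
        self.version = 0
        self.rules = {}

    def apply(self, update):
        key, value = update
        self.rules[key] = value
        self.version += 1

    def snapshot(self):
        return self.version, dict(self.rules)     # consistent copy plus its version

def state_transfer(donor: PolicyStore, joiner: PolicyStore, missed_updates):
    """Bring a joining replica up to date, then replay newer pub/sub updates."""
    joiner.version, joiner.rules = donor.snapshot()
    for version, update in missed_updates:
        if version > joiner.version:               # skip what the snapshot already covers
            joiner.apply(update)

donor, joiner = PolicyStore(), PolicyStore()
donor.apply((("Ted", "Open"), "Permit"))
state_transfer(donor, joiner, missed_updates=[(2, (("Sally", "Open"), "Deny"))])
print(joiner.version, joiner.rules)   # 2, with both rules present
```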

  45. Practical research question • Designing secure protocols for inter-enterprise queries • This could draw on the secured Internet transaction architecture • A hierarchy of credential databases • Used to authenticate enterprises to one another so that they can share keys • They employ the keys to secure “queries”

  46. Recap? • We’ve suggested that scalable trust comes down to “emulation” of a trusted single-node rule enforcement service by a distributed service • And that service needs to deal with dynamics such as changing actor set, object set, rule set, group membership

  47. Recap? • Concerns that any single node • Would be politically unworkable • Would impose a maximum capacity limit • Won’t be fault-tolerant • … pushed for a decentralized alternative • Needed to make a decentralized service emulate a centralized one

  48. Recap? • This led us to recognize that our problem is an instance of an older problem: replication of a state machine or an abstract data type • The problem reduces to consensus… and hence is impossible • … but we chose to accept “Mission Impossible: V”

  49. … Impossible? Who cares! • We decided that the impossibility results were irrelevant to real systems • Federation addressed by building a hierarchy of GMS services • Each supported by a group of servers • Each GMS owns a category of global events • Now can create pub/sub topics for the various forms of information used in our decentralized policy database • … enabling decentralized policy enforcement

  50. QS2: A work in progress • We’re building Quicksilver, V2 (aka QS2) • Under development by Krzys Ostrowski at Cornell, with help from Ken Birman, Danny Dolev (HUJI) • Some parts already exist and can be downloaded now: • Quicksilver Scalable Multicast (QSM). • Focus is on reliable and scalable message delivery even with huge numbers of groups or severe stress on the system
