Version 1.2 of this presentation discusses the integration of security innovations into Apache Storm for secure deployment in big data environments, including Kerberos authentication, multi-tenant scheduling, and secure integration with other Hadoop projects. It also explores possible directions for the NIST Big Data PWG and the definition and implications of a security fabric.
Version 1.2
Big Data Security & Privacy
NIST Public Working Group
Version 2: Possible Directions
NIST S&P Version 2: The Big Two
• “New” Big Data Security and Privacy Design Patterns
• Big Data Security Fabric
Reality Check in Apache Ecosystem
• Secure, Multi-Tenant Deployment
• Much like the early days of Hadoop, Apache Storm originally evolved in an environment where security was not a high-priority concern. Rather, it was assumed that Storm would be deployed to environments suitably cordoned off from security threats. While a large number of users were comfortable setting up their own security measures for Storm, this proved a hindrance to broader adoption among larger enterprises where security policies prohibited deployment without specific safeguards.
• Yahoo! hosts one of the largest Storm deployments in the world, and the engineering team recognized the need for security early on, so it implemented many of the features necessary to secure its own Apache Storm deployment. Yahoo!, Hortonworks, Symantec, and the broader Apache Storm community have been working on integrating those security innovations into the main Apache code base.
• That work is nearing completion and is slated to be included in an upcoming Apache Storm release. Some of the highlights of that release include:
  • Kerberos Authentication with Automatic Credential Push and Renewal
  • Multi-Tenant Scheduling
  • Secure integration with other Hadoop Projects (such as ZooKeeper, HDFS, HBase, etc.)
  • User isolation (Storm topologies run as the user who submitted them)
• In the future, you can expect to see further integration between Apache Storm and security-focused projects like Apache Argus (formerly XA Secure).
http://bit.ly/1Dlf2UP
Implications | Directions
• NIST Big Data PWG documentation should show awareness of trends & current efforts (good & bad)
• NIST Big Data PWG should be a step or two ahead
• Incorporate or link to work in grid, VLDB, distributed computing
• May need to separate “Expository” from “Technical” documents (à la OASIS TCs)
• What elements should be fabric?
• What elements should be design patterns?
Security & Privacy (& Management)
[Figure: Big Data reference architecture diagram. Components shown: System Orchestrator; Data Provider; Big Data Application Provider (Collection, Curation, Analytics, Visualization, Access); Data Consumer; Big Data Framework Provider (Processing Frameworks, Platforms, Infrastructures, Physical and Virtual Resources, each horizontally or vertically scalable); with Security & Privacy and Management spanning all components.]
What is a security fabric?
• Fabric computing has an accepted definition. We must clarify & amplify from that starting point:
• Fabric computing or unified computing involves the creation of a computing fabric consisting of interconnected nodes that look like a 'weave' or a 'fabric' when viewed collectively from a distance.[1]
• Usually this refers to a consolidated high-performance computing system consisting of loosely coupled storage, networking and parallel processing functions linked by high bandwidth interconnects (such as 10 Gigabit Ethernet and InfiniBand)[2] but the term has also been used to describe platforms like the Azure Services Platform and grid computing in general (where the common theme is interconnected nodes that appear as a single logical unit).[3]
• The fundamental components of fabrics are "nodes" (processor(s), memory, and/or peripherals) and "links" (functional connection between nodes).[2] While the term "fabric" has also been used in association with storage area networks and switched fabric networking, the introduction of compute resources provides a complete "unified" computing system. Other terms used to describe such fabrics include "unified fabric",[4] "data center fabric" and "unified data center fabric".[5]
• According to Ian Foster, director of the Computation Institute at the Argonne National Laboratory and University of Chicago, "grid computing 'fabrics' are now poised to become the underpinning for next-generation enterprise IT architectures and be used by a much greater part of many organizations."[3]
Big Data S&P Fabric: Possible starting points
• Orchestrator as workflow manager for policy propagation
• Collection: Event triggers for collection of PII
• Curation: Provenance; human-mediated processes; automated curation tools
• Visualization: Risks around images of people in context; e.g., Google Street View, facial recognition
• Analytics: Controls over de-anonymization analytics apps with demonstrable commercial or forensic value
• Organization-specific issues: Tied to framework providers (internal roles, platform-specific features)
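The first two starting points (an orchestrator that propagates policy, and event triggers when PII is collected) can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Policy`, `Stage`, and `Orchestrator` are hypothetical, not taken from the slides or from any NIST document, and a real fabric would propagate far richer policy objects than a set of field names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a name plus the record fields treated as PII."""
    name: str
    pii_fields: frozenset


@dataclass
class Stage:
    """Hypothetical pipeline stage (e.g. collection, curation, analytics)."""
    name: str
    policies: list = field(default_factory=list)
    events: list = field(default_factory=list)

    def collect(self, record: dict) -> dict:
        # Event trigger: note every policy whose PII fields appear in the record.
        for policy in self.policies:
            hits = sorted(policy.pii_fields.intersection(record))
            if hits:
                self.events.append((policy.name, hits))
        return record


class Orchestrator:
    """Hypothetical orchestrator that pushes one policy to every stage."""

    def __init__(self, stages):
        self.stages = list(stages)

    def propagate(self, policy: Policy) -> None:
        for stage in self.stages:
            stage.policies.append(policy)


# Usage: propagate one policy, then collect a record containing a PII field.
stages = [Stage("collection"), Stage("curation")]
Orchestrator(stages).propagate(Policy("pii-notice", frozenset({"email", "ssn"})))
stages[0].collect({"email": "a@example.org", "age": 30})
print(stages[0].events)
```

The point of the sketch is the direction of flow: policy is defined once and pushed outward by the orchestrator, while triggers fire locally wherever data is collected.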
“Fabric” Not Original, Which is Good
Use Case: Image Processing
Big Data: Risks & Solutions
http://bit.ly/1CaUTyZ
Audit to Provenance
• Policy-preserving pipelines, processes
• Can the systems orchestrator do this?
• Curator: Use cases from Internet of Things
• Threats to data providers
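One way to picture the move from audit to provenance is a log in which each entry carries the policy it was processed under and is chained to the previous entry. The sketch below uses a SHA-256 hash chain, which is an assumption chosen for illustration rather than a technique named in the slides; the function names and the fields `actor`, `action`, and `policy_id` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Hash a canonical JSON encoding so key order does not affect the result.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain: list, actor: str, action: str, policy_id: str) -> dict:
    """Append a provenance entry linked to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"actor": actor, "action": action, "policy_id": policy_id, "prev": prev_hash}
    entry = dict(body, hash=_digest(body))
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Return True only if every entry is unmodified and correctly linked."""
    prev_hash = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


# Usage: record two pipeline steps under the same (hypothetical) policy.
log = []
append_entry(log, "collector", "ingest sensor batch", "policy-7")
append_entry(log, "curator", "strip device identifiers", "policy-7")
print(verify(log))
```

Because each entry records its `policy_id`, a later reader can check both that the history is intact and which policy governed each step, which is the sense in which a pipeline might be called policy-preserving.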
Existing Models
• Guidance: Leverage existing models that integrate roles, organizations and technologies. Don’t reinvent. Show what’s new or different.
• Organized around the V’s or RA components
• Example: ITIL Security Management
• Example: OASIS Privacy-by-Design for software engineers http://bit.ly/1Cb7gLA
  PbD-SE offers a privacy extension/complement to OMG’s Unified Modeling Language (UML) and serves as a complement to OASIS’ eXtensible Access Control Markup Language (XACML) and Privacy Management Reference Model (PMRM).
• OASIS XACML core XML schema for representing authorization and entitlement policies
• OASIS Privacy Management Reference Model
Coordination
Concepts in Play
• Possible S&P Design Patterns for PII
• PII should be called out in Big Data models
• Borrow from DoDAF where S&P is life-and-death
• Kantara Initiative User-Managed Access
• Privacy by Design (next slide)
• IoT standards: end point protection
• Operations on encrypted content
• Variety-influenced policy management
• New Big Data roles for curation, risk management, governance
Definitions: Privacy and Security
• Possible Blurred Lines
• Information Assurance
• Provenance
• Risk Management
• De-anonymizing analytics
• Time-dependent information value
Privacy by Design
Good red zone markings, but not fabric or Big Data design patterns
Mark Underwood
Mark.underwood@kryptonbrothers.com
http://bigdatawg.nist.gov/home.php