
Big Data security & privacy

This presentation (version 1.2) discusses the integration of security innovations into Apache Storm for secure deployment in big data environments, including Kerberos authentication, multi-tenant scheduling, and secure integration with other Hadoop projects. It also explores possible directions for the NIST Big Data Public Working Group and the definition and implications of a security fabric.



Presentation Transcript


  1. Big Data Security & Privacy (Version 1.2) • NIST Public Working Group • Version 2: Possible Directions

  2. NIST S&P Version 2: The Big Two
  • “New” Big Data Security and Privacy Design Patterns
  • Big Data Security Fabric

  3. [image slide]

  4. Reality Check in Apache Ecosystem: Secure, Multi-Tenant Deployment
  • Much like the early days of Hadoop, Apache Storm originally evolved in an environment where security was not a high-priority concern. Rather, it was assumed that Storm would be deployed to environments suitably cordoned off from security threats. While a large number of users were comfortable setting up their own security measures for Storm, this proved a hindrance to broader adoption among larger enterprises whose security policies prohibited deployment without specific safeguards.
  • Yahoo! hosts one of the largest Storm deployments in the world, and its engineering team recognized the need for security early on, so it implemented many of the features necessary to secure its own Apache Storm deployment. Yahoo!, Hortonworks, Symantec, and the broader Apache Storm community have been working on integrating those security innovations into the main Apache code base.
  • That work is nearing completion and is slated to be included in an upcoming Apache Storm release. Highlights of that release include:
  • Kerberos authentication with automatic credential push and renewal
  • Multi-tenant scheduling
  • Secure integration with other Hadoop projects (such as ZooKeeper, HDFS, HBase, etc.)
  • User isolation (Storm topologies run as the user who submitted them)
  • In the future, you can expect to see further integration between Apache Storm and security-focused projects like Apache Argus (formerly XA Secure). http://bit.ly/1Dlf2UP
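The features listed above correspond to concrete cluster settings. A sketch of what a secured, multi-tenant storm.yaml fragment might look like follows; property names are drawn from the Storm security documentation of this era, but exact class names and defaults vary by release, and the principals, file paths, and user pools shown are hypothetical examples:

```yaml
# Illustrative storm.yaml fragment for a Kerberos-secured, multi-tenant
# Storm cluster. Treat as a sketch, not a drop-in configuration.

# Use the Kerberos SASL transport for Thrift connections to Nimbus.
storm.thrift.transport: "backtype.storm.security.auth.kerberos.KerberosSaslTransportPlugin"

# JAAS configuration describing the cluster's Kerberos principals and keytabs
# (path is a hypothetical example).
java.security.auth.login.config: "/etc/storm/storm_jaas.conf"

# Restrict which users may perform which cluster operations.
nimbus.authorizer: "backtype.storm.security.auth.authorizer.SimpleACLAuthorizer"
nimbus.admins:
  - "storm-admin"            # hypothetical admin user

# User isolation: run each worker as the user who submitted the topology.
supervisor.run.worker.as.user: true

# Automatic credential push and renewal for running topologies
# (e.g., Kerberos TGTs used to reach HDFS/HBase).
nimbus.credential.renewers.classes:
  - "backtype.storm.security.auth.kerberos.AutoTGT"

# Multi-tenant scheduler with per-user node guarantees.
storm.scheduler: "backtype.storm.scheduler.multitenant.MultitenantScheduler"
multitenant.scheduler.user.pools:
  alice: 5    # nodes guaranteed to user "alice" (example values)
  bob: 3
```

Together these settings cover the four highlights on the slide: authentication (Kerberos transport and JAAS), credential renewal, multi-tenant scheduling, and user isolation.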

  5. Implications | Directions
  • NIST Big Data PWG documentation should show awareness of trends & current efforts (good & bad)
  • NIST Big Data PWG should be a step or two ahead
  • Incorporate or link to work in grid, VLDB, distributed computing
  • May need to separate “Expository” from “Technical” documents (à la OASIS TCs)
  • What elements should be fabric?
  • What elements should be design patterns?

  6. [Diagram: NIST Big Data Reference Architecture, with Security & Privacy (& Management) spanning the System Orchestrator, Data Provider, Big Data Application Provider (Collection, Curation, Analytics, Visualization, Access), Big Data Framework Provider (Processing Frameworks, Platforms, Infrastructures, Physical and Virtual Resources), and Data Consumer]

  7. What is a security fabric?
  • Fabric computing has an accepted definition. We must clarify & amplify from that starting point:
  • Fabric computing or unified computing involves the creation of a computing fabric consisting of interconnected nodes that look like a 'weave' or a 'fabric' when viewed collectively from a distance.[1]
  • Usually this refers to a consolidated high-performance computing system consisting of loosely coupled storage, networking and parallel processing functions linked by high-bandwidth interconnects (such as 10 Gigabit Ethernet and InfiniBand),[2] but the term has also been used to describe platforms like the Azure Services Platform and grid computing in general (where the common theme is interconnected nodes that appear as a single logical unit).[3]
  • The fundamental components of fabrics are "nodes" (processor(s), memory, and/or peripherals) and "links" (functional connections between nodes).[2] While the term "fabric" has also been used in association with storage area networks and switched fabric networking, the introduction of compute resources provides a complete "unified" computing system. Other terms used to describe such fabrics include "unified fabric",[4] "data center fabric" and "unified data center fabric".[5]
  • According to Ian Foster, director of the Computation Institute at the Argonne National Laboratory and University of Chicago, "grid computing 'fabrics' are now poised to become the underpinning for next-generation enterprise IT architectures and be used by a much greater part of many organizations."[3]

  8. Big Data S&P Fabric: Possible starting points
  • Orchestrator as workflow manager for policy propagation
  • Collection: Event triggers for collection of PII
  • Curation: Provenance; human-mediated processes; automated curation tools
  • Visualization: Risks around images of people in context; e.g., Google Street View, facial recognition
  • Analytics: Controls over de-anonymization analytics apps with demonstrable commercial or forensic value
  • Organization-specific issues: Tied to framework providers – internal roles, platform-specific features

  9. “Fabric” Not Original, Which is Good

  10. Use Case: Image Processing

  11. Big Data: Risks & Solutions http://bit.ly/1CaUTyZ

  12. Audit to Provenance
  • Policy-preserving pipelines, processes
  • Can the system orchestrator do this?
  • Curator: Use cases from the Internet of Things
  • Threats to data providers

  13. Existing Models: Guidance
  • Leverage existing models that integrate roles, organizations and technologies. Don’t reinvent – show what’s new or different.
  • Organize around the V’s or RA components
  • Example: ITIL Security Management
  • Example: OASIS Privacy by Design for software engineers (PbD-SE) http://bit.ly/1Cb7gLA – PbD-SE offers a privacy extension/complement to OMG’s Unified Modeling Language (UML) and serves as a complement to OASIS’ eXtensible Access Control Markup Language (XACML) and Privacy Management Reference Model (PMRM).
  • OASIS XACML: core XML schema for representing authorization and entitlement policies
  • OASIS Privacy Management Reference Model
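To make the XACML reference concrete, a minimal XACML 3.0 policy expressing the kind of authorization rule its core XML schema represents might look like the following; the PolicyId, RuleId, and role attribute value are made-up examples for this sketch, while the namespace, combining-algorithm, and function URNs are standard identifiers from the specification:

```xml
<!-- Illustrative XACML 3.0 policy: permit subjects whose role
     attribute equals "curator"; all example:* identifiers are
     hypothetical, not from any real deployment. -->
<Policy xmlns="urn:oasis:names:tc:xacml:3.0:core:schema:wd-17"
        PolicyId="example:policy:curator-access"
        Version="1.0"
        RuleCombiningAlgId="urn:oasis:names:tc:xacml:3.0:rule-combining-algorithm:deny-unless-permit">
  <Target/>
  <Rule RuleId="example:rule:permit-curator" Effect="Permit">
    <Target>
      <AnyOf>
        <AllOf>
          <!-- Match: subject's role attribute must equal "curator" -->
          <Match MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
            <AttributeValue
                DataType="http://www.w3.org/2001/XMLSchema#string">curator</AttributeValue>
            <AttributeDesignator
                Category="urn:oasis:names:tc:xacml:1.0:subject-category:access-subject"
                AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
                DataType="http://www.w3.org/2001/XMLSchema#string"
                MustBePresent="false"/>
          </Match>
        </AllOf>
      </AnyOf>
    </Target>
  </Rule>
</Policy>
```

Because the rule-combining algorithm is deny-unless-permit, any request not matched by the Permit rule is denied, which is the posture typically wanted for the new curation and governance roles the later slides propose.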

  14. Coordination

  15. Concepts in Play
  • Possible S&P design patterns for PII
  • PII should be called out in Big Data models
  • Borrow from DoDAF, where S&P is life-and-death
  • Kantara Initiative User-Managed Access
  • Privacy by Design (next slide)
  • IoT standards – endpoint protection
  • Operations on encrypted content
  • Variety-influenced policy management
  • New Big Data roles for curation, risk management, governance

  16. Definitions: Privacy and Security – Possible Blurred Lines
  • Information Assurance
  • Provenance
  • Risk Management
  • De-anonymizing analytics
  • Time-dependent information value

  17. Privacy by Design: Good red zone markings, but not fabric or Big Data design patterns

  18. Mark Underwood • Mark.underwood@kryptonbrothers.com • http://bigdatawg.nist.gov/home.php
