Federal Big Data and Cognitive Metadata - Goodier
Agenda
• Federal big data is enhanced by cognitive metadata
• Clearly understanding the paradigm shift
• Review of security and privacy implications for the federal government
• Cyber threat cognitive metadata solution
1. Balancing the Cyber Big Data Equation
The Internet was built without a way to know who or what you were connecting to.
• Federal internet service providers work around this with a patchwork of identity security controls and NIAP certifications.
• No fair blaming the user: no framework, no cues, no control.
2. Safeguarding and Sharing Information
“One of the biggest questions is how to evolve the risk management model. What is secure enough and agile enough to support the mission? Security, agility, and transparency decisions are driven by mission priorities.” – Major Linus J. Barloon II, Chief, J3 Cyber Operations Division at White House Communications Agency
http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf
2. Safeguarding and Sharing Information
“For example, the United States Government Accountability Office (GAO) aggregates data from many agencies. Recognizing the inherent risks, GAO sets up discrete network enclaves that are distinct from their agency-wide network, for Big Data. It assigns appropriate levels of security to each enclave driven by the sensitivity of the data therein.
• Other agencies note they ensure Big Data is stripped of personally identifiable information (PII) before it leaves the originating agency’s control.
• Data aggregation needs will expand as more elements of the critical infrastructure adopt increased cyber protection and detection capabilities that will drive enhanced data/information sharing.”
– www.meritalk.com, Beacon Report: Balancing the Cyber Big Data Equation
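The second practice in the quotation, stripping PII before data leaves the originating agency, can be sketched as field-level removal. This is a minimal Python sketch under stated assumptions: the field names in `PII_FIELDS` and the helpers `strip_pii` and `prepare_for_release` are hypothetical, and each record is assumed to be a plain dictionary.

```python
from typing import Any

# Hypothetical PII field names; a real agency would derive this set from its
# own data inventory rather than hard-code it.
PII_FIELDS = frozenset({"name", "ssn", "date_of_birth", "email", "phone", "home_address"})


def strip_pii(record: dict[str, Any], pii_fields: frozenset[str] = PII_FIELDS) -> dict[str, Any]:
    """Return a copy of `record` with every field named in `pii_fields` removed."""
    return {key: value for key, value in record.items() if key not in pii_fields}


def prepare_for_release(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Apply `strip_pii` to each record before it leaves the originating agency."""
    return [strip_pii(record) for record in records]


if __name__ == "__main__":
    source = [{"case_id": 101, "name": "J. Doe", "ssn": "000-00-0000", "state": "VA", "amount": 1200}]
    print(prepare_for_release(source))  # [{'case_id': 101, 'state': 'VA', 'amount': 1200}]
```

A deny-list like this removes only the fields it knows about; an allow-list of approved release fields fails safer. Neither approach addresses re-identification through combinations of the remaining fields (quasi-identifiers).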
2. Synopsis of Security and Privacy for Federal Big Data
Federal agencies are required by law (e.g., the Privacy Act of 1974) to give notice to individuals, when collecting information from them, of the authority, purpose, and uses of PII when such data will be maintained as agency records that will be retrieved by individual name or other identifier.[1] When agencies use a Web site to collect or share data, agencies must post a privacy policy, as required by Section 208 of the E-Gov Act and OMB guidance.[2] In all cases, privacy notices must be prominent, salient, clearly labeled, written in plain language, and available at all locations where notice is needed.
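The two notice rules summarized above can be written as a checklist that a program can evaluate. The sketch below is a hypothetical encoding of only those two rules as this slide states them, not a complete reading of the Privacy Act or Section 208. The `Collection` record and the `notice_gaps` function are illustrative names.

```python
from dataclasses import dataclass, field

# Notice elements named on the slide: authority, purpose, and uses.
REQUIRED_NOTICE_ELEMENTS = ("authority", "purpose", "uses")


@dataclass
class Collection:
    """Hypothetical description of one agency information collection."""
    collects_pii: bool
    retrieved_by_identifier: bool  # records retrieved by name or other identifier
    via_website: bool
    notice: dict[str, str] = field(default_factory=dict)  # element -> notice text
    posted_privacy_policy: bool = False


def notice_gaps(c: Collection) -> list[str]:
    """List which of the two summarized notice rules this collection fails."""
    gaps: list[str] = []
    # Rule 1 (Privacy Act, as summarized): notice of authority, purpose, and uses
    # when PII is kept as records retrieved by name or other identifier.
    if c.collects_pii and c.retrieved_by_identifier:
        for element in REQUIRED_NOTICE_ELEMENTS:
            if not c.notice.get(element, "").strip():
                gaps.append(f"Privacy Act notice missing: {element}")
    # Rule 2 (E-Gov Act Section 208, as summarized): a posted privacy policy
    # when a Web site is used to collect or share data.
    if c.via_website and not c.posted_privacy_policy:
        gaps.append("E-Gov Act Section 208: no posted privacy policy")
    return gaps


if __name__ == "__main__":
    intake = Collection(
        collects_pii=True,
        retrieved_by_identifier=True,
        via_website=True,
        notice={"authority": "(statute cited here)", "purpose": "(purpose stated here)"},
    )
    for gap in notice_gaps(intake):
        print(gap)
```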
2. Review: Federal Big Data is different from industry
Over time, agencies, digital developers, and data users may also create, discover, or propose new and innovative ways to combine, share, or otherwise leverage the power of the digital data and content collected or disseminated by their digital services or programs. If data will be re-combined, used or shared in ways that individuals did not originally contemplate or expect, agencies must consider the need, under applicable law or policy, to provide such individuals with additional or updated notice of their privacy rights and choices.[4] In determining precisely when, where, and how to give such notice, agencies, their digital developers, and partners will need to exercise creativity and ingenuity to ensure that required notices are clearly communicated to individuals at the right time and place, and in the right manner, without unduly interfering with the user experience. The timing and format of such notices may need to vary, depending on the digital or mobile platform involved.[5]
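One way to decide when recombined data calls for updated notice is to compare a proposed use with the uses individuals were originally told about. This sketch assumes a conservative rule of our own, not one stated in the guidance: a dataset built from several sources may only be used for purposes that every source gave notice of. The function names and the example purposes are hypothetical.

```python
from collections.abc import Iterable


def combined_noticed_uses(sources: Iterable[set[str]]) -> set[str]:
    """Uses that every contributing source gave notice of.

    Conservative assumption: a recombined dataset inherits only the uses
    common to all of its sources.
    """
    source_list = list(sources)
    if not source_list:
        return set()
    return set.intersection(*source_list)


def needs_updated_notice(sources: Iterable[set[str]], proposed_use: str) -> bool:
    """True when the proposed use falls outside what individuals were told."""
    return proposed_use not in combined_noticed_uses(sources)


if __name__ == "__main__":
    tax_records = {"refund processing", "fraud detection"}
    benefits_records = {"eligibility determination", "fraud detection"}
    print(needs_updated_notice([tax_records, benefits_records], "fraud detection"))    # False
    print(needs_updated_notice([tax_records, benefits_records], "refund processing"))  # True
```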
2. Federal Big Data today
http://www.google.com/intx/en/enterprise/apps/government/products.html?section=drive
https://explore.data.gov/
http://catalog.data.gov/harvest
Privacy advocates are concerned about the threat to privacy represented by increasing storage and integration of personally identifiable information; expert panels have released various policy recommendations to conform practice to expectations of privacy.[99][100][101]
Cognitive metadata are sets of innovative privacy-enhancing technologies that enable new techniques for data analytics that minimize costs to privacy.
2. FedRAMP certified commercial clouds and Gov Cloud certified government clouds
[Figure: cloud service stack, from top to bottom: app-components-as-a-service (Google App Engine), software-platform-as-a-service, data-intensive services (Amazon Hadoop, Public Data Sets, SimpleDB), virtual-infrastructure-as-a-service (shared physical resources), and physical infrastructure. Capability columns: Data/Compute, Storage/Metadata, Utility/Networking, Content Delivery. Other examples shown: GCDS, Akamai.]
2. Before clouds swallowed the enterprise, Gov met requirements with defined EA structures
[Figure: mission services and their data grouped into sub-enterprises within departments/agencies, linked by networks and federation, with sub-enterprise security and enterprise alignment (trust, credentials). Source: H. Reed, DoD Multi-Service SOA team.]
2. EA Privacy & Security focused on message exchange (NIEM 3.0) and dissemination labels
[Figure: security layers of a service-oriented enterprise: service code (internal), message exchange (participant), shared security (sub-enterprise), and governance (super-enterprise), viewed as programmatic, management, operational, and federation concerns. Callout: “Our typical security focus was here.”]
According to the Multi-Service SOA community:
• The focus of DoD/IC security is primarily at the “participant” and “operational” level.
• The implication is that most service-oriented security discussion will be at this level.
Unfortunately, that leaves lots of gray area for data spills!
2. Example: NIEM and NISS Message Exchanges
Each encounter describes an interaction with a person-of-interest (POI). A POI is one who possesses an identity that is associated with derogatory information residing in a system-of-record (SOR) containing watchlisted individuals. The Encounter specification is designed to convey encounter activity (e.g., who, what, when, where), any watchlist searches performed, and any encounter analysis results for Suspicious Activity Reports (SARs).
Testing PII incident responses at scale: https://www.niem.gov/training/Pages/train.aspx
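To make the who/what/when/where structure concrete, and to show the kind of redaction a PII incident-response test might exercise, here is a hypothetical flattened encounter record. The field names are illustrative and are not the NIEM Encounter schema's element names; the NIEM specification defines those.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class Encounter:
    """Hypothetical flattened encounter record; NOT the NIEM Encounter schema."""
    who: str    # identity of the person-of-interest (the PII-bearing element)
    what: str   # encounter activity
    when: str   # ISO 8601 timestamp
    where: str  # location description
    watchlist_searches: list[str] = field(default_factory=list)
    analysis_results: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize as a JSON message with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)

    def redacted(self) -> "Encounter":
        """Copy with the identity removed, for recipients that do not need it."""
        return replace(self, who="[REDACTED]")


if __name__ == "__main__":
    e = Encounter(who="Person A", what="secondary inspection",
                  when="2014-04-15T10:00:00Z", where="Port X",
                  watchlist_searches=["SOR-1"])
    print(e.redacted().to_json())
```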
3. PII Incident Federal Use Case at 4V (volume, velocity, variety, veracity) message scale: what is the worst that can happen?
3. As Federal Big Data apps expand, our data channels grow and our exposure to risk increases http://www.verizonenterprise.com/DBIR/2013/
3. Federal PII Protections for April 15 • http://www.cnbc.com/id/101496551 Identity thieves are stealing billions of dollars a year through fraudulent tax refunds—and the IRS isn't the only target. The 43 states that collect an income tax are also being flooded with these bogus returns.
3. Risk Exposure grows as our use of Federal Shared Services grows
• 1996: Clinger-Cohen
• 2001: Quicksilver
• 2002: E-Government Act
• 2003: E-Gov Initiatives Initial 25
• 2004: Lines of Business Initial 5 (HR, GM, FM, FHA, CM)
• 2006: Lines of Business Round 2 (Geo, BFE, ITI, ISS)
• 2008: E-Gov Initiatives Round 2 (DAIP, ITDS, IAD-Loans/Grants)
• 2009: Payroll Consolidation Completes
• 2010: Cloud-First
• 2011: Shared Services
• 2011: GAO Report: Opportunities to Reduce Potential Duplication
4. Ensuring adherence to Security and Privacy regulations across identities shared in the federal clouds, to retain MEANING (aka, contextual semantics) in loosely coupled, highly flexible, multi-tenant environments
4. Solutions for the Federal Use Case from Industry
Amazon Fire TV review: the set-top that tries to do everything. ASAP: Advanced Streaming and Prediction (http://www.engadget.com/2014/04/09/amazon-fire-tv-review/)
Movies or TV shows are buffered for playback before users hit the play button, the company says; those choices are made by analyzing users’ watch lists and recommendations. As users’ viewing habits change, the caching prediction algorithm will adjust accordingly, and personalization capabilities should get better over time.
http://www.ibmbigdatahub.com/blog/caveat-use-internet-things-behavioral-analytics
4. Solutions for the Federal Use Case from Research
Cognitive metadata: Advanced Streaming and Prediction for improved regulatory and incentive performance. Caching prediction algorithms will adjust according to risk exposure, and personal information protection capabilities should get better over time.
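The slide does not say how the prediction would “adjust according to risk exposure.” As one hedged interpretation that borrows the Fire TV idea of acting before the user asks, a system could keep an exponentially weighted moving average (EWMA) of observed PII incidents per data channel and pre-screen the riskiest channels first. The class name, the `alpha` parameter, and the notion of a “channel” are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RiskAdaptivePrioritizer:
    """Hypothetical sketch: rank data channels for proactive PII screening by an
    EWMA of their observed incident rate."""
    alpha: float = 0.3  # weight of the newest observation, 0 < alpha <= 1
    scores: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 0 < self.alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")

    def observe(self, channel: str, incident: bool) -> None:
        """Blend the latest outcome (1.0 = incident, 0.0 = clean) into the channel's score."""
        previous = self.scores.get(channel, 0.0)  # unseen channels start at 0.0
        self.scores[channel] = (1 - self.alpha) * previous + self.alpha * (1.0 if incident else 0.0)

    def top(self, k: int) -> list[str]:
        """The k highest-scoring channels (ties broken alphabetically) to pre-screen first."""
        return sorted(self.scores, key=lambda ch: (-self.scores[ch], ch))[:k]


if __name__ == "__main__":
    p = RiskAdaptivePrioritizer(alpha=0.5)
    for channel, incident in [("email", True), ("web", False), ("email", False), ("web", True)]:
        p.observe(channel, incident)
    print(p.top(2))  # ['web', 'email']
```

Because each new observation carries weight `alpha`, a channel that stops producing incidents drifts down the ranking, which is one way to read “should get better over time.”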
4. Metadata solutions shared across government at the new scale of IT
• Federal Risk and Authorization Management Program (FedRAMP)
• Align budget and acquisitions with the technology cycle;
• improve program management;
• streamline governance and increase accountability;
• increase engagement with the IT community; and
• adopt lighter technologies and shared solutions, including the adoption of a "cloud-first" policy.
www.cio.gov
4. What is the Cognitive Metadata Solution?
…cognitive metadata (i.e., metadata coming from our perception, reasoning, or intuition, such as a preference for a type of content), which is very useful for personalization purposes and, conversely, for limiting PII incidents.
Personalities and personas
We protect the personally identifiable information of people who link to us, and protect what they’re interested in, so we identify and encrypt the following:
• What does this person care about?
• What are the types of things they’ll respond to?
• What’s the value-add our content offers them?
• What are their turn-ons and turn-offs?
Initially this is a mostly qualitative process, since we're manually reviewing the data. It's not a perfect science, but it does benefit from information-sharing patterns that build the cognitive metadata repository to ultimately improve automated reasoning.
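The slide says these persona attributes are identified and encrypted. A standard-library-only stand-in, shown here, separates identity from interests with keyed pseudonymization (HMAC-SHA256) rather than reversible encryption: each interest profile is stored under a pseudonym instead of the person's identifier. Function names and sample data are hypothetical; a deployment that needs to recover identities would use real encryption from a vetted library.

```python
import hashlib
import hmac
import secrets
from collections.abc import Iterable


def pseudonym(identifier: str, key: bytes) -> str:
    """Keyed, deterministic pseudonym (HMAC-SHA256 hex digest).

    This is pseudonymization, not encryption: the digest is not directly
    reversible, but anyone holding `key` can recompute it from a known identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def build_profile_store(events: Iterable[tuple[str, str]], key: bytes) -> dict[str, set[str]]:
    """Group (identifier, interest) events into interest sets keyed by pseudonym."""
    store: dict[str, set[str]] = {}
    for identifier, interest in events:
        store.setdefault(pseudonym(identifier, key), set()).add(interest)
    return store


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # assumption: a real system keeps this in a key management system
    events = [("alice@example.gov", "cybersecurity"), ("alice@example.gov", "privacy"),
              ("bob@example.gov", "cloud")]
    for token, interests in build_profile_store(events, key).items():
        print(token[:12], sorted(interests))
```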
4. Federal Use Case and Cognitive Metadata
Cognitive metadata identifies PII in the context of this study so the individuals involved can be protected.
http://en.wikipedia.org/wiki/Sensitivity_and_specificity
Imagine a study evaluating a new test that screens people for a disease. Each person taking the test either has or does not have the disease. The test outcome can be positive (predicting that the person has the disease) or negative (predicting that the person does not have the disease). The test results for each subject may or may not match the subject's actual status. In that setting:
• True positive: sick people correctly diagnosed as sick
• False positive: healthy people incorrectly identified as sick
• True negative: healthy people correctly identified as healthy
• False negative: sick people incorrectly identified as healthy
In general, positive = identified and negative = rejected. Therefore:
• True positive = correctly identified
• False positive = incorrectly identified
• True negative = correctly rejected
• False negative = incorrectly rejected
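The four outcomes above reduce to two standard rates: sensitivity (true positive rate, TP / (TP + FN)) and specificity (true negative rate, TN / (TN + FP)). Read for PII detection, sensitivity is the share of real PII that is caught, and specificity is the share of non-PII that is correctly left alone. The helper names and sample labels below are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # PII correctly identified
    fp: int  # non-PII incorrectly identified as PII
    tn: int  # non-PII correctly rejected
    fn: int  # PII incorrectly rejected (missed)

    @property
    def sensitivity(self) -> float:
        """True positive rate, TP / (TP + FN): share of real PII that was caught."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """True negative rate, TN / (TN + FP): share of non-PII correctly left alone."""
        return self.tn / (self.tn + self.fp)


def confusion_counts(actual: list[bool], predicted: list[bool]) -> ConfusionCounts:
    """Tally outcomes, where True means "contains PII"."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    return ConfusionCounts(
        tp=sum(1 for a, p in pairs if a and p),
        fp=sum(1 for a, p in pairs if not a and p),
        tn=sum(1 for a, p in pairs if not a and not p),
        fn=sum(1 for a, p in pairs if a and not p),
    )


if __name__ == "__main__":
    actual = [True, True, True, False, False, False, False, False]
    predicted = [True, True, False, False, False, False, True, False]
    counts = confusion_counts(actual, predicted)
    print(counts)                        # ConfusionCounts(tp=2, fp=1, tn=4, fn=1)
    print(round(counts.sensitivity, 3))  # 0.667
    print(counts.specificity)            # 0.8
```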
4. Federal Use Case and Machine Learning http://en.wikipedia.org/wiki/AdaBoost Problems in machine learning often suffer from the curse of dimensionality — each sample may consist of a huge number of potential … and evaluating every feature can reduce not only the speed of classifier training and execution, but in fact reduce predictive power.... Unlike neural networks and SVMs, the AdaBoost training process selects only those features known to improve the predictive power of the model, reducing dimensionality and potentially improving execution time as irrelevant features do not need to be computed.
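The feature-selection effect described above shows up in a small AdaBoost implementation that uses one-feature decision stumps as weak learners: each round picks the single (feature, threshold, polarity) with the lowest weighted error, so features that never win a round are never used. This is a from-scratch teaching sketch on toy data, not a library implementation; all function names are our own.

```python
import math
from typing import Optional

Stump = tuple[int, float, int]  # (feature index, threshold, polarity); labels are -1 or +1


def stump_predict(row: list[float], stump: Stump) -> int:
    j, threshold, polarity = stump
    return polarity if row[j] >= threshold else -polarity


def best_stump(X: list[list[float]], y: list[int], w: list[float]) -> tuple[float, Stump]:
    """Exhaustively search every (feature, threshold, polarity) for the lowest weighted error."""
    best: Optional[tuple[float, Stump]] = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, threshold, polarity)
                err = sum(wi for row, label, wi in zip(X, y, w) if stump_predict(row, stump) != label)
                if best is None or err < best[0]:
                    best = (err, stump)
    assert best is not None
    return best


def adaboost(X: list[list[float]], y: list[int], rounds: int = 10) -> list[tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n
    model: list[tuple[float, Stump]] = []
    for _ in range(rounds):
        err, stump = best_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats chance on the current weighting
        eps = max(err, 1e-10)  # guard log/division when a stump is perfect
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, stump))
        if err == 0:
            break  # training data already separated perfectly
        # Misclassified samples gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * label * stump_predict(row, stump)) for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: list[tuple[float, Stump]], row: list[float]) -> int:
    score = sum(alpha * stump_predict(row, stump) for alpha, stump in model)
    return 1 if score >= 0 else -1


def selected_features(model: list[tuple[float, Stump]]) -> list[int]:
    """Feature indices actually used; the others need not be computed at prediction time."""
    return sorted({stump[0] for _, stump in model})


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes; features 1 and 2 are noise.
    X = [[1, 5, 0], [2, 3, 1], [3, 9, 0], [7, 4, 1], [8, 8, 0], [9, 2, 1]]
    y = [-1, -1, -1, 1, 1, 1]
    model = adaboost(X, y)
    print(selected_features(model))                 # [0]
    print([predict(model, row) for row in X] == y)  # True
```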
4. Current State of Language Technology
Big Data works well.
Mostly solved:
• Spam detection: “Let’s go to Agra!” ✓ / “Buy V1AGRA …” ✗
• Part-of-speech (POS) tagging: “Colorless green ideas sleep furiously.” → ADJ ADJ NOUN VERB ADV
• Named entity recognition (NER): “Einstein met with UN officials in Princeton” → PERSON ORG LOC
Making good progress:
• Sentiment analysis: “Best roast chicken in San Francisco!” / “The waiter ignored us for 20 minutes.”
• Coreference resolution: “Carter told Mubarak he shouldn’t run again.”
• Word sense disambiguation (WSD): “I need new batteries for my mouse.”
• Parsing: “I can see Alcatraz from the window!”
• Machine translation (MT): “第13届上海国际电影节开幕…” → “The 13th Shanghai International Film Festival…”
• Information extraction (IE): “You’re invited to our dinner party, Friday May 27 at 8:30” → Party, May 27 (add)
Still really hard:
• Question answering (QA): “Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?”
• Paraphrase: “XYZ acquired ABC yesterday” / “ABC has been taken over by XYZ”
• Summarization: “The Dow Jones is up”, “The S&P500 jumped”, “Housing prices rose” → “Economy is good”
• Dialog: “Where is Citizen Kane playing in SF?” → “Castro Theatre at 7:30. Do you want a ticket?”
4. Cognitive metadata employs predictive algorithms from Big Data machine learning combined with natural language processing
Cognitive metadata uses a three-step management process that translates policy documents into formal policy rule sets that computers can understand and evaluate.
• Policy documents are translated into digital policies, using natural language processing technologies.
• Policy deconfliction ensures consistency and operational desirability. Automated deconfliction, using Turing methods and theorem-proving techniques that work with the constructs defined in XML, delivers active models of the resulting policy via a policy-based tool GUI. DPM delivers this new user interface to data stewards and Foreign Disclosure Officers (FDOs), giving them total control over both the design and the approval of the resulting model. The human-approved set of deconflicted digital policies is then translated into standard QoS policy-labeled services.
• Digital policies are defined in a computer-interpretable language that is also friendly to humans.
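The deconfliction step can be illustrated with a much simpler technique than the theorem proving the slide describes: a pairwise check for rules that could match the same request with opposite effects. The `Rule` fields, the `*` wildcard convention, and the example rules are hypothetical; flagged pairs would go to a data steward or FDO for resolution rather than being resolved automatically.

```python
from dataclasses import dataclass
from itertools import combinations

WILDCARD = "*"  # hypothetical convention: matches any value for that attribute


@dataclass(frozen=True)
class Rule:
    """Hypothetical digital policy rule derived from a policy document."""
    rule_id: str
    effect: str    # "permit" or "deny"
    role: str
    action: str
    resource: str  # e.g. a dissemination or sensitivity label


def _overlaps(a: str, b: str) -> bool:
    """Two attribute values can match the same request if equal or either is a wildcard."""
    return a == b or WILDCARD in (a, b)


def conflicts(rules: list[Rule]) -> list[tuple[str, str]]:
    """Return id pairs of rules that can match the same request with opposite effects."""
    found: list[tuple[str, str]] = []
    for r1, r2 in combinations(rules, 2):
        if (r1.effect != r2.effect
                and _overlaps(r1.role, r2.role)
                and _overlaps(r1.action, r2.action)
                and _overlaps(r1.resource, r2.resource)):
            found.append((r1.rule_id, r2.rule_id))
    return found


if __name__ == "__main__":
    rules = [
        Rule("R1", "permit", "analyst", "read", "PII"),
        Rule("R2", "deny", "*", "read", "PII"),
        Rule("R3", "permit", "analyst", "read", "PUBLIC"),
        Rule("R4", "deny", "contractor", "write", "PUBLIC"),
    ]
    print(conflicts(rules))  # [('R1', 'R2')]
```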
4. How cognitive metadata works
• Regular expressions (regex) play a surprisingly large role.
• Sophisticated sequences of regular expressions are often the first model for any text processing task.
• For many hard tasks, we use machine learning classifiers.
• But regular expressions are used as features in the classifiers.
• They can be very useful in capturing generalizations.
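As an illustration of regular expressions serving as classifier features, the sketch below counts matches of a few PII-like patterns and returns them as a fixed-order feature vector that a classifier could consume. The pattern names and expressions are hypothetical and deliberately coarse: they flag shapes such as `123-45-6789`, not validated identifiers.

```python
import re

# Hypothetical, coarse PII-shape patterns; each yields one numeric feature.
PII_PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone_like": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "dob_cue": re.compile(r"\b(?:DOB|date of birth)\b", re.IGNORECASE),
}


def regex_features(text: str) -> dict[str, int]:
    """Count how many times each pattern matches in `text`."""
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}


def feature_vector(text: str) -> list[int]:
    """Counts in a fixed (alphabetical) order, suitable as classifier input."""
    features = regex_features(text)
    return [features[name] for name in sorted(features)]


if __name__ == "__main__":
    sample = "Contact j.doe@agency.gov or 202-555-0100. SSN 123-45-6789, DOB 01/02/1970."
    print(regex_features(sample))
    print(feature_vector(sample))  # [1, 1, 1, 1]
```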
4. Cognitive Metadata is a result of data science convergence
Predictions that enhance machine learning, fueled by knowledge at the intersection of our digital lives
[Figure: Venn diagram with regions labeled Machine Learning, Traditional Research, Data Science, and Danger Zone!]
4. What are some applications of Cognitive Metadata?
• Machine learning
• Question answering: IBM’s Watson
• Paraphrase
• Summarization
• Information extraction
• Sentiment analysis
• Machine translation
• Coreference resolution
• Word sense disambiguation
• Parsing
• Spam detection
• Part-of-speech tagging
• Named entity recognition
4. Cognitive Metadata provides automated reasoners for Federal PII policy adherence at scale
[Figure: access-control architecture. Components: Policy Enforcement Point (PEP), Policy Decision Point (PDP), Policy Decision Service (PDS), Policy Administration Point (PAP), Policy Information Point (PIP), Context Handler, Attribute Service (AS), Certificate Validation Service (CVS), Metadata Service, Cognitive Metadata Audit Service, Smart Data, Digital Policy Repository with automated reasoner, Metadata Tagging Tool & Repository, PKI/CERT, Cloud Gateway, and Ozone Widget Framework, spanning a Storage Cloud, Utility Clouds, and a Data Cloud. Actors: a user (team member) with a soft cert, non-person entities (NPE certs), a data producer, and an IT support team. Example flow: Access Request ~ X.pdf; a valid request returns Secure Map ~ Data & Metadata, an invalid one returns No Access, e.g., Access Denied ~ Reason ~ Location not relevant to data.]
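The decision and enforcement points in this architecture can be sketched in a few lines: a Policy Decision Point evaluates subject attributes against resource metadata and returns a permit or a reason for denial, and a Policy Enforcement Point releases data only on a permit. The attribute names, the ordinal clearance scale, and the order of checks are assumptions; only the denial reason “Location not relevant to data” comes from the figure.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    user_id: str
    cert_valid: bool  # outcome reported by a certificate validation service
    clearance: int    # hypothetical ordinal scale shared with `sensitivity`
    location: str


@dataclass(frozen=True)
class ResourceMetadata:
    name: str
    sensitivity: int
    relevant_locations: frozenset[str]  # empty set = no location restriction


@dataclass(frozen=True)
class Decision:
    permit: bool
    reason: str


def decide(subject: Subject, resource: ResourceMetadata) -> Decision:
    """PDP sketch: checks run in a fixed order and the first failure denies."""
    if not subject.cert_valid:
        return Decision(False, "Certificate not valid")
    if subject.clearance < resource.sensitivity:
        return Decision(False, "Clearance below data sensitivity")
    if resource.relevant_locations and subject.location not in resource.relevant_locations:
        return Decision(False, "Location not relevant to data")
    return Decision(True, "Valid")


def enforce(subject: Subject, resource: ResourceMetadata, fetch: Callable[[str], str]) -> str:
    """PEP sketch: call `fetch` only after the PDP permits the request."""
    decision = decide(subject, resource)
    if not decision.permit:
        return f"Access Denied ~ Reason ~ {decision.reason}"
    return fetch(resource.name)


if __name__ == "__main__":
    doc = ResourceMetadata("X.pdf", sensitivity=2, relevant_locations=frozenset({"Region A"}))
    remote_user = Subject("u2", cert_valid=True, clearance=3, location="Region B")
    print(enforce(remote_user, doc, lambda name: f"contents of {name}"))
```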
Because the Federal government has no shortage of policy… SCAP (Security Content Automation Protocol) does NOT resolve security needs for situational awareness (SA) when we are OUTSIDE the NETWORK.
But knowing this is still a challenge…
Codifying federal big data decisions
People drive standards and policy, but people do not move at cyber speed. People need cognitive metadata and data to support decision-making. Data-driven situational awareness augments governance.
To perform continuous monitoring, we divide and conquer the complexity of regulatory compliance by codifying big data relationships by mission, to maintain situational awareness of all known risk mitigations and waivers.
Organize by mission: each MISSION has an Area of Responsibility, and you can apply data and metadata according to the mission’s specific risk profile and known standards and waivers.
[Figure: mission tiers: Tier 1 (Enterprise), Tier 2 (Regional), Tier 3 (Local)]
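The mission-by-mission bookkeeping described above can be sketched as set arithmetic: for each mission, the open risks are the required controls that are neither mitigated nor waived. The `Mission` structure, the tier-to-label mapping (our reading of the slide), and the NIST SP 800-53-style control identifiers in the example are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed reading of the slide's tier labels.
TIERS = {1: "Enterprise", 2: "Regional", 3: "Local"}


@dataclass
class Mission:
    name: str
    tier: int
    required_controls: set[str]
    mitigated: set[str] = field(default_factory=set)
    waived: set[str] = field(default_factory=set)

    def open_risks(self) -> set[str]:
        """Required controls that are neither mitigated nor covered by a waiver."""
        return self.required_controls - self.mitigated - self.waived


def monitoring_report(missions: list[Mission]) -> dict[str, list[str]]:
    """Open risks per mission, ordered by tier then mission name; clean missions omitted."""
    report: dict[str, list[str]] = {}
    for m in sorted(missions, key=lambda m: (m.tier, m.name)):
        gaps = m.open_risks()
        if gaps:
            report[f"Tier {m.tier} ({TIERS[m.tier]}) / {m.name}"] = sorted(gaps)
    return report


if __name__ == "__main__":
    missions = [
        Mission("Benefits", 2, {"AC-2", "AU-6", "SC-8"}, mitigated={"AC-2"}, waived={"SC-8"}),
        Mission("HR", 1, {"AC-2"}, mitigated={"AC-2"}),
    ]
    print(monitoring_report(missions))  # {'Tier 2 (Regional) / Benefits': ['AU-6']}
```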
Why Cognitive Metadata?
Cognitive metadata provides the answers you need when:
• Sorting through millions of data items to pinpoint key PII incidents that may be crucial.
• Assessing regulatory compliance: by including sophisticated semantic analytics, cognitive metadata vastly reduces the time and budget that might otherwise be needed for a substantive analysis of the regulatory compliance of any set of records.
Cognitive metadata maps the right Context to the right Policy as an ASAP-style service
Cognitive metadata = PII protection as a service
• Cognitive metadata helps support Computer Network Defense (CND) data.
• Cognitive metadata supports Executive Order (EO) 13587 for rapid response to insider threat.
• Cognitive metadata helps support dynamic data for audit event management.