
IOP ’06 Open Source Intelligence Lesson Learned

Issues in using open source for intelligence: growth and complexity of heterogeneous content; not all open source data is equal (quantitative vs. qualitative); requirements of Ecoinformatics architectures.

avram-lang

Presentation Transcript


  1. IOP ’06 Open Source Intelligence Lesson Learned

  2. Issues in using open source for intelligence
     • Growth and complexity of heterogeneous content
     • Not all open source data is equal: quantitative vs. qualitative
     • Requirements of Ecoinformatics architectures

  3. Digital content is growing at a dramatic rate
     • 10^24 bytes = 1 trillion terabytes of data, which is equivalent to all the information consumed visually by all humans in a year
     [Chart; x-axis: Years]
     Source: IBM 2005 GTO

  4. The scale of open source data and its heterogeneous form increase the complexity of extracting intelligence
     [Figure labels: Intelligence, Personal multimedia, Medical data stored, Photos, Multimedia, Surveillance, Structured data, Storage online, Free-form text, Heterogeneity, Scalable]
     [Scale (bytes): 10^9, 10^12, 10^15, 10^21, 10^24, 10^27]
     Source: IBM 2005 GTO

  5. Open Source Intelligence from the periphery requires an understanding of its topology, including the strengths and weaknesses of sources in the periphery
     [Diagram: source types arranged from Quantitative to Qualitative]
     • Source types, in slide order: Legal Filings, Government Publications, Company Internal Content, Industry Publication, Company Publication, Industry Journals, Conference Proceedings, Content Aggregators, News & Press Releases, Website affiliated with an organization, NGO Publications, User Groups / Forums, News Letters, Blogs / Weblogs, Non-affiliated Websites
     • Authoritative sources: the data is trusted and is defended
     • Credentialed opinions: the source is known and can be weighted
     • Open opinion: it is impossible to verify the authority of the source

  6. Ecoinformatics Architectures need to be multi-layered
     [Architecture diagram; labels recovered from the slide, top layer first]
     • Customer applications: Business Insights Workbench, Drug Research, Performance Management
     • Platform and search: WebFountain, WS OmniFind II Search
     • Applications: Topic Tracking, Buzz Analysis, Network Associations, Affinity Analysis, Natural Clustering, Trending, Snippet Analysis
     • Cross-page annotators (10's of pages/second): Clustering, Communities, Ranking, Classification
     • Core services: Annotation, Parsing/Tokenizing, Searching, Relevancy, Volume, Store, Index
     • Per-page annotators (100's of pages/second): Source Spotters, Date Spotters, Language Spotters, Auto Entity Spotters, Auto Geography Spotter, Customer Taxonomy Spotter, Porn & Dup Detection
     • Data acquisition (1000's of pages/second), unstructured and structured data: World Wide Web, Licensed Feeds, Taxonomies, Intranet Data, Databases, Newspapers, Blogs, Commercial Databases

  7. Finding intelligence can require different views of the same information
     [Panel title: Open Source Trend on Web]
     [Chart: # of Web Pages (000) by year, 2001 to 2005]
     [Chart: # of Web Pages (000) by month, Jan to Dec 2005; annotation: Some event happened in August]
     [Chart: % of OSI web documents by person: Robert Steele, Eliot Jardines, Mr Arno Reuser, Douglas Rushkoff, Congressman Rob Simmons, Major General Patrick Cammaert; annotation: One dominant voice]

  8. Network of Conference Attendees to auto-spotted Companies and Universities
     • In this network view we don’t care about association with “Open Source Intelligence” but with companies and universities
     [Figure label: Context]

  9. Conclusions on Open Source Intelligence
     • Computers don’t create intelligence, people do; computers enable smart people
     • Not all open source content is equal: know the sources
     • Not everything you see is right: it’s all about the CONTEXT
     • Ecoinformatics architecture supports:
       - Large-scale analytics of open source content
       - Integration of content other than open source
       - Powerful text analytic tools to support analysis of on-topic stores
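
The layered idea in slides 5 and 6 (annotate each page on its own, then analyse across pages, weighting sources by how far they can be trusted) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Page` class, the function names, the tier weights, and the mapping from source type to tier are hypothetical and are not taken from the presentation or from WebFountain.

```python
from dataclasses import dataclass, field

# Hypothetical trust weights for the three tiers described on slide 5.
# The numbers are illustrative assumptions, not values from the slides.
TIER_WEIGHTS = {"authoritative": 1.0, "credentialed": 0.6, "open": 0.2}

# Hypothetical mapping from source type to tier, assumed for this example.
SOURCE_TIERS = {
    "legal filing": "authoritative",
    "industry journal": "credentialed",
    "blog": "open",
}


@dataclass
class Page:
    url: str
    source_type: str
    text: str
    annotations: dict = field(default_factory=dict)


def spot_source_weight(page: Page) -> None:
    """Per-page annotator: record the source tier and its trust weight."""
    tier = SOURCE_TIERS.get(page.source_type, "open")  # unknown sources: lowest tier
    page.annotations["tier"] = tier
    page.annotations["weight"] = TIER_WEIGHTS[tier]


def spot_topic(page: Page) -> None:
    """Per-page annotator: flag pages that mention the topic phrase."""
    page.annotations["on_topic"] = "open source intelligence" in page.text.lower()


PER_PAGE_ANNOTATORS = [spot_source_weight, spot_topic]


def annotate_pages(pages: list) -> list:
    """Per-page layer: run every annotator on every page independently."""
    for page in pages:
        for annotator in PER_PAGE_ANNOTATORS:
            annotator(page)
    return pages


def rank_on_topic(pages: list) -> list:
    """Cross-page layer: keep on-topic pages, most trusted sources first."""
    on_topic = [p for p in pages if p.annotations.get("on_topic")]
    return sorted(on_topic, key=lambda p: p.annotations["weight"], reverse=True)


def run_pipeline(pages: list) -> list:
    """Acquisition output goes through the per-page layer, then the cross-page layer."""
    return rank_on_topic(annotate_pages(pages))


if __name__ == "__main__":
    demo = [
        Page("http://example.org/blog", "blog", "Notes on Open Source Intelligence"),
        Page("http://example.org/filing", "legal filing", "A filing on open source intelligence"),
    ]
    for page in run_pipeline(demo):
        print(page.url, page.annotations["tier"], page.annotations["weight"])
```

Keeping the per-page annotators independent of each other is what would let that layer run at a higher page rate than the cross-page layer, which has to see the whole collection before it can rank anything.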
