
Generating Dynamic Social Networks from Large Scale Unstructured Data

Discover how to use enterprise software to make sense of complex and messy data, and create social networks that provide valuable insights and drive actions. Tim Estes, CEO of Digital Reasoning, explains the importance of social networks and how they can be built algorithmically. Explore the concepts of similarity, link analysis, and connectedness, and see how powerful analytics can remove the need for a priori structure. Learn why Digital Reasoning specializes in making sense of unstructured data and how they leverage entity-centric systems to support critical intelligence in the Defense and Intelligence Community.

davidculler

Presentation Transcript


  1. Generating Dynamic Social Networks from Large Scale Unstructured Data: Enterprise Software to Make Sense of Really Junky Data • Tim Estes - CEO, Digital Reasoning

  2. What We’ll Discuss • What is a social network? • The web of relationships between entities that influences actions • Why does it matter? • To reference Aesop: “You are known by the company you keep.” • What’s required to build one algorithmically? • What’s similar, what’s the same, what’s connected

  3. What’s similar? We use patented algorithms for deducing related terms from the data… • Bush: president bush president george w administration bush administration george george w george bush brown american clinton • White House: house gov white clinton the administration president-elect barack obama barack president george w • Nashville: tenn the predators predators oakland milwaukee st louis carolina a season baltimore kentucky • Justin Timberlake: miley cyrus pussycat dolls bob dylan nine inch nails rock star the timberwolves sean preston lanarkshire ticket prices nme • Britney Spears: britney spears the album x factor my friends mtv madonna lady gaga singer a student
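
The slide does not describe the patented algorithm, so the following is only a minimal sketch of one generic way to find related terms: terms count as similar when they tend to share documents with the same other terms (co-occurrence vectors compared by cosine similarity). The function names and the tiny `docs` corpus are assumptions made up for illustration, not Digital Reasoning's method or data.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(docs):
    """Map each term to a Counter of the terms that share a document with it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.lower().split())
        for term in terms:
            for other in terms:
                if term != other:
                    vectors[term][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_terms(term, vectors, top_n=3):
    """Return the top_n terms whose co-occurrence profile is closest to `term`."""
    target = vectors.get(term, Counter())
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical toy corpus (illustrative only, not the presenter's data).
docs = [
    "bush administration white house",
    "bush president white house",
    "nashville predators season",
    "nashville predators tenn",
]
vectors = cooccurrence_vectors(docs)
print(related_terms("bush", vectors))
```

On this toy corpus the terms related to "bush" all come from the political documents, while "predators" scores zero because it never shares context with them.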

  4. What’s the same? Concept resolution: Roll up similar things into groups of the same (again, algorithmically) Example: Tony Blair
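
As a rough illustration of what "rolling up" mentions into one concept can look like, the sketch below merges name variants with a union-find structure, using a deliberately naive rule: two mentions are linked when the tokens of one are a subset of the tokens of the other. The rule, the function names, and the sample mentions are all assumptions for illustration; the slide does not say how the real system resolves concepts, and a rule this simple would wrongly merge people who share a name.

```python
def normalize(mention):
    """Lowercase a mention, drop periods, and return its set of tokens."""
    return frozenset(mention.lower().replace(".", "").split())


def resolve(mentions):
    """Group mentions whose token sets are subsets of one another."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the group representative, compressing the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    items = list(parent)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            tokens_a, tokens_b = normalize(a), normalize(b)
            if tokens_a <= tokens_b or tokens_b <= tokens_a:
                union(a, b)

    groups = {}
    for m in items:
        groups.setdefault(find(m), set()).add(m)
    return list(groups.values())


# Hypothetical mention variants (only "Tony Blair" appears on the slide).
mentions = [
    "Tony Blair",
    "Blair",
    "Prime Minister Tony Blair",
    "Mr. Blair",
    "Barack Obama",
    "Obama",
]
groups = resolve(mentions)
for group in groups:
    print(sorted(group))
```

Union-find is used here so that links chain: "Mr. Blair" and "Prime Minister Tony Blair" end up in one group because both link to "Blair", even though neither is a token subset of the other.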

  5. What’s connected? Link analysis: Show who and what are connected (again, you guessed it, algorithmically) • Example: Terrorist Leader Connections
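
One simple way to approximate link analysis, assuming entities have already been extracted per document, is to count how often two entities are mentioned together and treat that count as the strength of the link. The sketch below does exactly that; the entity names are neutral placeholders, and the co-mention heuristic is an assumption, not the method shown in the talk.

```python
from collections import Counter
from itertools import combinations


def build_links(doc_entities):
    """Count, for each unordered entity pair, the documents mentioning both."""
    links = Counter()
    for entities in doc_entities:
        # sorted() gives each pair one canonical orientation.
        for a, b in combinations(sorted(set(entities)), 2):
            links[(a, b)] += 1
    return links


def connections(entity, links):
    """List (neighbor, weight) pairs for an entity, strongest link first."""
    found = []
    for (a, b), weight in links.items():
        if entity == a:
            found.append((b, weight))
        elif entity == b:
            found.append((a, weight))
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical per-document entity lists (placeholders, not real data).
doc_entities = [
    ["Person A", "Person B", "City X"],
    ["Person A", "Person B"],
    ["Person B", "Org Y"],
]
links = build_links(doc_entities)
print(connections("Person B", links))
```

Here "Person B" is linked most strongly to "Person A" (two shared documents), with weaker links to "City X" and "Org Y".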

  6. Let’s Put an Idea to the Test... • With powerful analytics, can you remove some or most of the need for a priori structure in designing and understanding social networks or other quasi-ontological schemas? • Can you also do it with messy unstructured data? • YES and YES

  7. But first... Why do we (Digital Reasoning) care?

  8. Because it’s what we do for a living. We make sense of the senseless. • Our customers have critical needs • Digital Reasoning works primarily in the Defense and Intelligence Community, making sense of noisy, unstructured data and turning it into usable entity-centric systems supporting mission-critical intelligence. • The data is big and bad • Little structure in content, topics all over the place, and totally different ontologies/schemas across the community. • The times we live in create urgencies • We care because the better and faster we are at making sense of this kind of data, the safer our country is.

  9. Why did we take a data-centric, deployed software model? • Unique Environments • Given who our customers are... we can’t host their data. No one can. The solution had to be a pure deployed software model. • Meaning in Hard to Reach Places • The data is basically a bunch of pieces that don’t want to be connected. People that don’t want to be found. • Result? • Imagine trying to turn that kind of data in that type of architecture from a bunch of loose communication into a social network that has patterns of life, weightings of influence, and projections of probable future actions...

  10. Here’s what it looks like in an architecture…

  11. Now let’s show what can be learned with a little application of Entity-Oriented Analytics to a bunch of web data.

  12. Test Case • Web Blog + Wikipedia data (collected by Fetch) • 6M Blog URLs collected over 1+ years • 16M unique blog messages • No unifying theme, topic or author • Tricky to get “good” big data from the open web; we ended up using 0.5% of the original source. 1TB became 4GB. • No a priori structure, sparse metadata, nearly all meaning emerges from analysis • Let’s see what we can find out...

  13. Examining connections related to “Carl Icahn” • The data shows connections to and from Carl Icahn by: • people • periodicals • topics • companies • On closer examination the data tells us: Carl Icahn “is backing” a startup company that “would build” products related to Barack Obama

  14. Let’s examine what connections we find to “Egypt” • Egypt is identified as a location, as an organization (country) and as an unassigned entity with all related connections. • On closer examination we see interesting connections in the blogs for Egypt, Cairo, Issues and the phrase “powder keg”. • If we drill down into the actual blog entry we see the context of the connections.

  15. How about connections to “Steve Jobs”? • One connection is interesting: “Steve Jobs” to “Walt Mossberg” to “Kindle”. • Synthesys shows the reason for connection as “pricing”. Clicking on this word, we see the context of the connection. • The entities and connections in the blog data are vast, which is not surprising. The large number of authors and topics reflects the popularity of Steve Jobs as a blog subject. • Panels: Topics, Authors
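
A chain such as “Steve Jobs” to “Walt Mossberg” to “Kindle” can be thought of as a path through a graph whose edges carry a reason for the connection. The sketch below finds such a path with a breadth-first search over labeled edges. It is not how Synthesys works internally (the slide does not say); the edge list is an assumption, including which hop carries the “pricing” label, and the "iPad" edge and the "unlabeled" marker are hypothetical filler.

```python
from collections import deque


def find_path(edges, start, goal):
    """Breadth-first search over undirected labeled edges.

    Returns a list of (source, label, target) hops, or None if unreachable.
    """
    graph = {}
    for a, b, label in edges:
        graph.setdefault(a, []).append((b, label))
        graph.setdefault(b, []).append((a, label))

    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, label in graph.get(node, []):
            if neighbor not in came_from:
                came_from[neighbor] = (node, label)
                queue.append(neighbor)

    if goal not in came_from:
        return None

    # Walk back from the goal to the start, then reverse the hops.
    hops = []
    node = goal
    while came_from[node] is not None:
        previous, label = came_from[node]
        hops.append((previous, label, node))
        node = previous
    return hops[::-1]


# Assumed edge list: label placement is illustrative, "iPad" is hypothetical.
edges = [
    ("Steve Jobs", "Walt Mossberg", "unlabeled"),
    ("Walt Mossberg", "Kindle", "pricing"),
    ("Steve Jobs", "iPad", "unlabeled"),
]
path = find_path(edges, "Steve Jobs", "Kindle")
for source, label, target in path:
    print(f"{source} --[{label}]--> {target}")
```

Keeping the label on each hop is what lets a user click through from the connection to its context, as the slide describes for the word “pricing”.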

  16. Demo Platform • Synthesys Platform Beta • elastic • user-driven • entity-oriented-analytics on demand

  17. Observations • New innovations will be algorithmic and focused on turning hard-to-use data into dynamic, evolving knowledge that can automate machine execution • Architectures/solutions will have to accommodate customers that don’t want to move their data to a Public Cloud • It is a true statement... “If you can connect the dots, you can connect the people”

  18. So why should You care? • Because there is a lot of data that doesn’t belong on a shared grid, such as Top Secret data, sensitive corporate data, and personal data. • Because people may want to own (Personal Computing model) vs. rent (Mainframe model) analytics • Because you may not want to convert your data to fit the model of the hosted solution or map to their ontology to get the answers you need.

  19. To learn more… • See us at: • Strata Science Fair (Wed evening 6:45PM) • Digital Reasoning Booth #305 • www.digitalreasoning.com

  20. Questions? Automated Understanding, Trusted Decisions, True Intelligence
