Natural Language Processing for Underground Communications

Presentation Transcript


  1. Natural Language Processing for Underground Communications. Dan Klein, MURI Kickoff, 11/20/2009

  2. Underground Communications: Example Data

  3. Underground Communications: Example Data, Manual Extraction

  4. Processing: Information Extraction

  5. Observation Graphs: http://www.rossmail.ru/offline.htm, http://www.f-mail.ru/kontact/, http://www.spam-reklama.ru/contact.html, http://www.fax-reklama.ru/contact.html

  6. Underlying Entities and Relations: Extraction Goal
     • Employee (Person: Person 9876; Product: 5621; Role: Developer)
     • Referral (From: Person 2133; To: Person 1211; Product: 3319)
     • Person 1211 (Alias: Steakcap; ICQ: 598199837; Location: France)
     • Person 2133 (Alias: Thunderelvi; ICQ: 787659871; Location: USA)
     • Person 9876 (Alias: Zakar; ICQ: 234150301; Email: zakar@e-...)
     • Product 3319 (Type: FB Harvester; Contact: 709-324-0989)
     • Product 5621 (Type: Spam Sender; Contact: 495-210-4423)
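
The extraction goal on slide 6 amounts to a small typed record store: entities (people, products) plus relations that point at them by ID. A minimal Python sketch follows. It assumes hypothetical class and field names (`Person`, `Product`, `Employee`, `Referral`, `dangling_ids`) rather than any schema from the talk, encodes the slide's records, and checks that every relation argument refers to a known entity.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical record types mirroring the fields on the slide; the class and
# field names are illustrative, not the project's actual schema.


@dataclass(frozen=True)
class Person:
    pid: int
    alias: str
    icq: str
    location: Optional[str] = None
    email: Optional[str] = None  # left unset below: elided on the slide


@dataclass(frozen=True)
class Product:
    prod_id: int
    kind: str
    contact: str


@dataclass(frozen=True)
class Employee:
    """Relation: a person works on a product in some role."""
    person: int
    product: int
    role: str


@dataclass(frozen=True)
class Referral:
    """Relation: one person refers another to a product."""
    from_person: int
    to_person: int
    product: int


people = {
    1211: Person(1211, "Steakcap", "598199837", location="France"),
    2133: Person(2133, "Thunderelvi", "787659871", location="USA"),
    9876: Person(9876, "Zakar", "234150301"),
}
products = {
    3319: Product(3319, "FB Harvester", "709-324-0989"),
    5621: Product(5621, "Spam Sender", "495-210-4423"),
}
relations = [
    Employee(person=9876, product=5621, role="Developer"),
    Referral(from_person=2133, to_person=1211, product=3319),
]


def dangling_ids(relations: list[Union[Employee, Referral]], people, products):
    """Return (kind, id) pairs for relation arguments that name no known entity."""
    missing = []
    for r in relations:
        person_ids = [r.person] if isinstance(r, Employee) else [r.from_person, r.to_person]
        missing += [("person", p) for p in person_ids if p not in people]
        if r.product not in products:
            missing.append(("product", r.product))
    return missing


print(dangling_ids(relations, people, products))  # []
```

Keeping relations as ID references rather than nested objects means that merging two duplicate person records only requires rewriting IDs in the relation list.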

  7. Existing NLP Tasks

  8. Discourse Structure [figure labels: sign, deliver, vote]

  9. General Approach

  10. An Entity Reference Model: Our Existing Approach

  11. Adding Semantic Knowledge: Our Current Work [figure: "America Online" labeled "company"]

  12. Does It Work? Evaluation [chart residue: reference resolution, MUC F1 and cluster similarity, supervised vs. unsupervised; systems compared: baseline, Bengtson & Roth 08, preliminary current work]
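
Slide 12 reports MUC F1 for reference resolution. The sketch below implements the standard link-based MUC scorer (Vilain et al., 1995) for comparing a system clustering of mentions against gold clusters. It is not the evaluation code used in the talk, and the chart's cluster-similarity measure is not reproduced; the function names and the set-of-mention-ids input format are assumptions.

```python
def muc_recall(key, response):
    """MUC recall (Vilain et al., 1995).

    key, response: lists of sets of mention ids (gold and system clusters).
    For each key cluster K, the score credits |K| - |p(K)| links, where p(K) is
    the partition of K induced by the response; mentions missing from the
    response count as singletons.
    """
    # Map each mention to the index of the response cluster containing it.
    where = {m: i for i, cluster in enumerate(response) for m in cluster}
    num = den = 0
    for cluster in key:
        parts = {where.get(m, ("singleton", m)) for m in cluster}
        num += len(cluster) - len(parts)
        den += len(cluster) - 1
    return num / den if den else 0.0


def muc_f1(key, response):
    recall = muc_recall(key, response)
    precision = muc_recall(response, key)  # precision = recall with roles swapped
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [{"a", "b", "c"}, {"d", "e"}]
system = [{"a", "b"}, {"c", "d", "e"}]
print(round(muc_f1(gold, system), 3))  # 0.667
```

Here the system recovers one of the two gold links in the first cluster and the single link in the second, giving recall 2/3; precision, computed the same way with gold and system swapped, is also 2/3.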

  13. What’s Coming Up: Cross-Document Identity

  14. Extracting Global Entities

  15. Subsequent Goals [repeats the slide 6 figure: Underlying Entities and Relations]

  16. Summary
     • Goal: systems which simultaneously extract and dedupe
     • Train in an unsupervised / discovery manner
     • Requires: both new statistical machinery and good models of underlying domain structure (transactions, etc.)
     • Requires: processing domain-specific language (domain adaptation, grammar induction)
     • Evaluation: are the entities and relations correct?
     • First steps: measure the general approach on newswire, etc., where we know the right answers
     • Also: evaluate on underground network data
     • Near term: increased accuracy in identity resolution, begin to extract simple relations, better basic analysis
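
The summary calls for systems that extract and dedupe at once, with identity resolution as a near-term target. The talk proposes unsupervised statistical models for this; the sketch below is only a deterministic baseline. It assumes hypothetical mention records keyed by exact identifiers (ICQ number, email, phone) and merges any two mentions that share a value, transitively, with a union-find structure. The record format, field names, and the `dedupe` function are illustrative, not part of the talk.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def dedupe(mentions, keys=("icq", "email", "phone")):
    """Group mention records that share any exact identifier value.

    mentions: dict mention_id -> dict of extracted fields (hypothetical format).
    Returns the groups as sorted lists, in sorted order.
    """
    uf = UnionFind()
    first_seen = {}  # (field, value) -> first mention carrying that value
    for mid, fields in mentions.items():
        uf.find(mid)  # register every mention, including singletons
        for k in keys:
            v = fields.get(k)
            if v is None:
                continue
            if (k, v) in first_seen:
                uf.union(mid, first_seen[(k, v)])
            else:
                first_seen[(k, v)] = mid
    groups = defaultdict(set)
    for mid in mentions:
        groups[uf.find(mid)].add(mid)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical mentions reusing identifier values from slide 6.
mentions = {
    "post1": {"alias": "Zakar", "icq": "234150301"},
    "post2": {"alias": "Z.", "icq": "234150301", "phone": "495-210-4423"},
    "post3": {"alias": "Steakcap", "icq": "598199837"},
    "post4": {"phone": "495-210-4423"},
}
print(dedupe(mentions))  # [['post1', 'post2', 'post4'], ['post3']]
```

Transitive merging lets a single shared contact chain otherwise separate mentions together (post4 joins Zakar's cluster only through a phone number), while mentions with no shared identifier are never merged. That brittleness is the gap the statistical approach in the summary is meant to close.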

  17. Thanks!
