
Analysing Crime-Scene Reports

Scene of Crime Information System (SOCIS). Katerina Pastra and Horacio Saggion, University of Sheffield. Outline: Project Overview, SOCIS Architecture, Corpus, Linguistic Analysis, Pointers. Project Overview: 2000-2003.


Presentation Transcript


  1. Analysing Crime-Scene Reports: Scene of Crime Information System (SOCIS) • Katerina Pastra and Horacio Saggion, University of Sheffield

  2. Outline • Project Overview • SOCIS Architecture • Corpus • Linguistic Analysis • Pointers

  3. Project Overview (2000-2003) • Domain: Scene of Crime Investigation (SOC) • Main features: 1. Multimedia briefing · Summarisation of text and images 2. Generation · Of formal reports and of photo albums 3. Intelligent search

  4. Project Overview (2) • Other systems for Crime Investigation (CI): · Academic R&D projects · Governmental agencies’ systems · Commercial systems • BUT: SOCIS brings ‘intelligence’ to CI systems • The ‘Digital Evidence in Court’ issue: · Authenticity has to be verified · Recently accepted in court

  5. A view of SOCIS [diagram: Image processing + Text processing, Integrated Knowledge Base]

  6. Text Processing • Text corpus • Information Extraction system · Named Entity Recognition · Co-reference Resolution • Need: linguistic analysis of the language at the SOC · Lexical information · Morphosyntactic information · Semantic information

  7. The Corpus • 4 days spent with a SOCO: 12 scenes visited, 2 complete case files examined, official documentation collected • Official documentation: SOC Reports = 77, Photo Indexes = 300, Witness Statements = 14 • Reported SOC information: Press Association = 792, Washington Post = 233, Crime Watch = 8 • NEEDED: reports, photo indexes, witness statements and photographs · For the same case · For major crime · Of significant quantity

  8. Examples

  9. SOC Language Characteristics • General characteristics: · Telegraphic · Descriptive · Accurate · Objective • Special text type: Reports

  10. Lexical Information • Characteristics: · Extensive use of abbreviations · Jargon • Creation of word lists (gazetteers): · Based on PITO’s CDM · Over 200 lists (domain + general) • Words of interest are assigned a semantic category
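
The gazetteer idea on this slide (word lists whose entries are assigned a semantic category) can be illustrated with a minimal Python sketch. The categories, the list entries and the helper names `build_index` and `tag_tokens` are all hypothetical; the slides do not show the actual SOCIS lists or how they were implemented.

```python
# Hypothetical gazetteers: semantic category -> word list.
# The real SOCIS lists (over 200, per the slide) are not shown in the source.
GAZETTEERS = {
    "weapon": {"knife", "crowbar", "screwdriver"},
    "body_part": {"hand", "arm", "head"},
    "abbreviation": {"soco", "poe", "ip"},
}


def build_index(gazetteers):
    """Invert category -> words into word -> set of categories."""
    index = {}
    for category, words in gazetteers.items():
        for word in words:
            index.setdefault(word.lower(), set()).add(category)
    return index


def tag_tokens(text, index):
    """Return (token, sorted categories) for every token found in a list."""
    tagged = []
    for raw in text.split():
        # Strip surrounding punctuation and lowercase before the lookup.
        token = raw.strip(".,;:()").lower()
        if token in index:
            tagged.append((token, sorted(index[token])))
    return tagged


INDEX = build_index(GAZETTEERS)
print(tag_tokens("Knife found near POE, rear door.", INDEX))
# [('knife', ['weapon']), ('poe', ['abbreviation'])]
```

A plain dictionary lookup like this only handles single-word entries; multi-word terms would need a longest-match scan over token sequences.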

  11. Morphosyntactic Features • Extensive ellipsis • Simple temporal dimensions • Limited co-ordination • Sub-ordination avoided • POS: NPs, PPs • Adjuncts of place and time, qualifiers • To identify entities of interest automatically, we need to write specific rules using the word lists + context information
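
As a rough illustration of "word lists + context information", the Python sketch below combines two small word lists into one pattern that picks out adjuncts of place in a telegraphic report sentence. The word lists, the rule itself and the function name `find_locations` are assumptions made for this example; the slides do not give the actual SOCIS rules or the formalism they were written in.

```python
import re

# Hypothetical word lists; stand-ins for the domain gazetteers.
PLACE_PREPOSITIONS = ["on", "in", "under", "near", "behind"]
SURFACE_NOUNS = ["floor", "table", "window", "door", "sill"]

# Rule: a listed preposition (the context), an optional "the",
# up to two modifier words, then a listed noun -> adjunct of place.
LOCATION_RULE = re.compile(
    r"\b(?:%s)\s+(?:the\s+)?(?:\w+\s+){0,2}?(?:%s)\b"
    % ("|".join(PLACE_PREPOSITIONS), "|".join(SURFACE_NOUNS)),
    re.IGNORECASE,
)


def find_locations(sentence):
    """Return every span of the sentence that matches the location rule."""
    return [match.group(0) for match in LOCATION_RULE.finditer(sentence)]


print(find_locations("Footwear mark on kitchen floor near rear door."))
# ['on kitchen floor', 'near rear door']
```

The short noun and prepositional phrases noted on the slide are what make such a shallow pattern plausible here; text with heavy co-ordination or sub-ordination would need real parsing.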

  12. Modelling (1)

  13. Modelling (2)

  14. Pointers • SOCIS Sheffield web page: http://www.dcs.shef.ac.uk/nlp/socis • SOCIS Surrey web page: http://www.computing.surrey.ac.uk/ai/socis • NLP Group: http://www.dcs.shef.ac.uk/nlp
