Automatic indexing and retrieval of crime-scene photographs
Scene of Crime Information System (SOCIS)
Katerina Pastra, Horacio Saggion, Yorick Wilks
NLP group, University of Sheffield
Outline
• Application Scenario
• Project Overview
• SOCIS features
• Text-based approaches
• Using NLP:
  • The Indexing mechanism
  • The Retrieval mechanism
• Preliminary system evaluation
• Links
Cambridge 2002
Crime Scene Documentation: Current Practices
• Scene of Crime Officers:
  • attend crime scene
  • photograph the scene
  • collect evidence (package and label items)
  • write reports and create indexed photo-album(s)
• case-files piled in storage rooms
Examples
IT support for CSI
• Crime Investigation requires:
  • Fast and accurate retrieval of case-related info (and therefore efficient classification of this info)
  • Identification of “patterns” among cases
• IT support for Crime Investigation:
  • Governmental agencies’ systems (HOLMES)
  • Commercial systems (LOCARD, SOCRATES)
  (Crime Management and Administration Systems)
Needed: “Intelligent” support for Crime Investigation
Project Overview (2000 - 2003)
• Domain: Scene of Crime Investigation (SOC)
• Scenario: Use of digital photography and speech to populate a central police database with case-related information
• Objective: Creation of a prototype system that allows for intelligent indexing and retrieval of crime photographs
SOCIS features
• Access through the web (JSP application)
• Storage of case documentation & meta-information in central database
• Automatic indexing of photographs
• Automatic retrieval of photographs
• Automatic population of official forms
“view of deceased with computer cable removed”
Text-based image indexing & retrieval: approaches
• Manual assignment of keywords
• Automatic extraction of keywords (statistics +/ semantic expansion) [Smeaton’96, Sable’99, Rose’00]
• Extraction of logical form representations (syntactic relations and concept classification) [Rowe’99]
Precision and recall increase as indexing terms go beyond keywords and capture relational info
Text-based image indexing & retrieval: problems
• keyword barrier
• syntactic relations need to be complemented with semantic information
• Consider:
  • “view to the loft” vs. “view into loft”
  • “position of baby with no bedding”
  • “position of baby with bedding removed”
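The keyword barrier can be made concrete with a small sketch. This is only an illustration, not part of SOCIS: the stopword list and the `keywords` helper are assumptions introduced here. Once function words are dropped, the two "loft" captions receive identical index terms, and both "bedding" captions are indexed under "bedding" even though each describes its absence.

```python
# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"to", "the", "into", "of", "with"}


def keywords(caption: str) -> set:
    """Reduce a caption to its set of content words (plain keyword indexing)."""
    return {word for word in caption.lower().split() if word not in STOPWORDS}


# "to" vs. "into" carries the difference in meaning, but both are discarded,
# so the two captions become indistinguishable for a keyword index.
loft_a = keywords("view to the loft")
loft_b = keywords("view into loft")
print(loft_a == loft_b)

# Both captions say the bedding is absent, yet a keyword query for
# "bedding" would still retrieve both photographs.
bed_a = keywords("position of baby with no bedding")
bed_b = keywords("position of baby with bedding removed")
print("bedding" in bed_a and "bedding" in bed_b)
```

Relational indexing terms are meant to keep exactly the information that this reduction throws away.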
Indexing-Retrieval Mechanism
• Pipeline of processing resources: tokeniser, sentence splitter, POS tagger, lemmatizer, NE recognizer, parser, discourse interpreter (+ triple extraction layer)
• Captions → indexing terms (ARG1 REL ARG2)
• Free text query → query triples (ARG1 REL ARG2)
• Matching of query triples against indexing terms
• OntoCrime + KB
Corpus and Domain Model
• 1200 captions from 350 different crime cases dealt with by South Yorkshire Police (text files)
• 65 captions (transcribed speech experiment)
Different lengths but same characteristics: phrasal constructions, named entities, meta-info, “what” and “where” references
Domain model = OntoCrime and knowledge base
Role = selection restrictions for triples’ arguments and semantic expansion for retrieval
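As a rough sketch of what a selection restriction on a triple's arguments could look like: the class table, the restriction table, and the function name below are all hypothetical and are not taken from OntoCrime. Only the relation names MADE-OF and ON come from the slides.

```python
# Hypothetical fragment of a domain model: term -> class.
TERM_CLASS = {
    "table": "object",
    "knife": "object",
    "wood": "material",
    "steel": "material",
}

# Hypothetical restrictions: relation -> required class of (ARG1, ARG2).
# A relation that is not listed is treated as unrestricted in this sketch.
RESTRICTIONS = {"MADE-OF": ("object", "material")}


def satisfies_restrictions(arg1: str, rel: str, arg2: str) -> bool:
    """Accept a candidate triple only if both arguments have the class
    that the relation requires."""
    required = RESTRICTIONS.get(rel)
    if required is None:
        return True
    return all(
        TERM_CLASS.get(arg) == wanted
        for arg, wanted in zip((arg1, arg2), required)
    )


print(satisfies_restrictions("table", "MADE-OF", "wood"))  # object MADE-OF material
print(satisfies_restrictions("wood", "MADE-OF", "table"))  # arguments swapped
```

A check of this kind would filter out candidate triples whose arguments are of the wrong type before they are stored as indexing terms.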
Triple Extraction
• 17 relations: AND, AROUND, MADE-OF, OF, ON, WITHOUT etc.
• Form of triples: ARG1 REL ARG2
• Restrictions and filters for arguments
• Rules for captions with multiple relations
• Inferences restricted to certain cases
Triple Extraction examples
• “body on floor surrounded by blood”
  Body ON floor
  blood AROUND floor
  blood AROUND body
• “shot of footprint on top of bar”
• “photograph from behind bar of body on floor”
• “bottle, gun and ashtray on table”
• “footprint with zigzag and target on chair”
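To make the ARG1 REL ARG2 form concrete, here is a deliberately naive sketch. SOCIS derives triples from a parser and discourse interpreter; the word-matching below is a stand-in, and the cue table and function name are assumptions. It handles only captions of the shape "X[, Y and Z] REL W", and distributing the relation over coordinated items is a choice made for this illustration, not a documented SOCIS rule.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    arg1: str
    rel: str
    arg2: str


# Hypothetical surface cues mapped to three of the relation names on the slides.
CUES = {"on": "ON", "around": "AROUND", "without": "WITHOUT"}


def extract_triples(caption: str) -> List[Triple]:
    """Toy extractor: find the first cue word, take the next word as ARG2,
    and emit one triple for each coordinated item to the left of the cue."""
    words = caption.lower().replace(",", " , ").split()
    for i, word in enumerate(words):
        if word in CUES and 0 < i < len(words) - 1:
            left_items = [w for w in words[:i] if w not in {",", "and"}]
            return [Triple(item, CUES[word], words[i + 1]) for item in left_items]
    return []


print(extract_triples("body on floor"))
print(extract_triples("bottle, gun and ashtray on table"))
```

Captions such as “shot of footprint on top of bar” fall outside this toy pattern, which is the kind of case that calls for syntactic analysis and rules for multiple relations.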
Retrieval Mechanism
• Allow for free text query
• Extract relational facts from the query
• Match the query triples with the indexing triples of each captioned photograph
• Allow for exact match of arguments or class info (ARG1, RELATION, ARG2, each argument with its Class)
• If no triples can be extracted, keyword matching takes place with semantic expansion if needed
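The matching step might be sketched as follows. This is a minimal illustration under stated assumptions: the class table stands in for the class info a domain ontology would supply, the photo identifiers and index contents are invented, and "class match" is read here as "the query argument names the class of the indexed argument". The keyword fallback is not shown.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (ARG1, RELATION, ARG2)

# Hypothetical class info for a few terms.
TERM_CLASS: Dict[str, str] = {"gun": "weapon", "knife": "weapon", "table": "furniture"}


def arg_matches(query_arg: str, index_arg: str) -> bool:
    """Exact match, or the query argument is the class of the indexed one."""
    return query_arg == index_arg or TERM_CLASS.get(index_arg) == query_arg


def triple_matches(query: Triple, indexed: Triple) -> bool:
    """Relations must be identical; each argument may match exactly or by class."""
    return (
        query[1] == indexed[1]
        and arg_matches(query[0], indexed[0])
        and arg_matches(query[2], indexed[2])
    )


def retrieve(query_triples: List[Triple], index: Dict[str, List[Triple]]) -> List[str]:
    """Return the photographs with at least one indexing triple that matches
    at least one query triple."""
    return [
        photo
        for photo, triples in index.items()
        if any(triple_matches(q, t) for q in query_triples for t in triples)
    ]


# Invented index: photograph id -> indexing triples of its caption.
INDEX = {
    "photo-1": [("gun", "ON", "table")],
    "photo-2": [("body", "ON", "floor")],
}

print(retrieve([("weapon", "ON", "table")], INDEX))
```

Under these assumptions the query triple about a weapon on a table matches the photograph indexed with a gun on a table, because "gun" carries the class "weapon".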
Preliminary Evaluation
• Indexing mechanism evaluation run on 600 captions indicated refinements needed to the rules (80% accuracy in extracting and inferring triples)
• Preliminary usability evaluation with real users: relational information considered to be an intuitive way of forming queries for image retrieval
• Future work: overall evaluation of free text query for image retrieval
Conclusions
• Could the SOCIS approach be ported to other domains?
• Thorough testing and experimentation needed
• However, it is a corpus-driven approach: not just an alternative image indexing/retrieval approach, but the one dictated by a real application
For more information on SOCIS: http://www.dcs.shef.ac.uk/nlp/socis