Writer identification through information retrieval: a search engine for forensic experts
Ralph Niels, Franc Grootjen & Louis Vuurpijl
ICFHR, Montreal, August 21st, 2008
Overview
• Forensic writer identification
• Prototypical shapes in handwriting
• Information retrieval (IR)
  • Traditional
  • Writer identification using prototypes
• Experiments
  • Method
  • Results
• Conclusions & future work
Forensic writer identification
Forensic information retrieval
• Web search: a query of words is used to find documents containing those words
• Forensic search: a query of characters is used to find documents containing those characters
• Previous work*: sub-character level, binary features
• Working with whole characters improves the possibilities for justifying results
* A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recognition Letters, 26(13):2080–2092, 2005.
Forensic information retrieval
• Dictionary of character shapes: prototypes
• Experts use prototypes
• The query and the documents are described by their prototype usage
[Figure: prototypes and instances of a prototype]
Character to prototype matcher
• For each character, find the most similar prototype
[Figure: prototypes a5, a9, a16 (…) a52; the handwritten text "What you did is" matched to the prototypes W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…)]
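Describing a query or a writer by prototype usage comes down to counting how often each prototype label occurs in the matcher's output. A minimal sketch, assuming the matcher output is a plain list of prototype labels such as the example labels on this slide; the function name `prototype_frequencies` is hypothetical, and the resulting sparse mapping plays the role of the frequency vector af.

```python
from collections import Counter


def prototype_frequencies(labels: list[str]) -> Counter:
    """Count how often each prototype occurs (a sparse prototype-frequency vector)."""
    return Counter(labels)


# Example prototype labels from the slide, one per matched character.
labels = ["W48", "h16", "a9", "t1", "y2", "o1", "u23", "d16", "i25", "d12", "i6", "s12"]
af = prototype_frequencies(labels)
print(af["a9"], af["d16"], len(af))  # 1 1 12
```

A `Counter` keeps the vector sparse: prototypes that never occur simply have no entry, which suits a dictionary with many prototypes per letter.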
Prototypes
• Averaged shapes of real handwritten characters
• The Dynamic Time Warping (DTW) distance is used to find the most similar prototype
[Figure: prototypes]
R. Niels, L. Vuurpijl, and L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence, 21(1):61–81, February 2007.
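The DTW variant used by Niels et al. has its own details (see the reference above); the sketch below substitutes plain textbook DTW over online (x, y) point sequences, with Euclidean point distance, to show how a nearest-prototype lookup works. The names `dtw_distance` and `nearest_prototype` are hypothetical, and the two toy trajectories are made up for illustration only.

```python
import math

Point = tuple[float, float]


def dtw_distance(a: list[Point], b: list[Point]) -> float:
    """Textbook dynamic time warping distance between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            cost[i][j] = d + min(cost[i - 1][j],      # only a advances
                                 cost[i][j - 1],      # only b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def nearest_prototype(char: list[Point], prototypes: dict[str, list[Point]]) -> str:
    """Return the label of the prototype with the smallest DTW distance to char."""
    return min(prototypes, key=lambda label: dtw_distance(char, prototypes[label]))


# Toy trajectories (illustrative only, not real prototypes).
prototypes = {
    "proto_vertical": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
    "proto_horizontal": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
}
char = [(0.1, 0.0), (0.0, 0.9), (0.1, 1.1), (0.0, 2.1)]
print(nearest_prototype(char, prototypes))  # proto_vertical
```

DTW is used because a character and a prototype rarely have the same number of sampled points; the warping path lets one point align with several points of the other sequence.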
The IR model for writer identification
[Figure: pipeline of the IR model]
• Writer input → character to prototype matcher → af(w) → indexing → aw(w) → matching
• Query input → character to prototype matcher → af(q) → matching
• Output of matching: a ranked list, plus a prototype list for justification
Indexing: creating weighted vectors
• For each writer, a vector of prototype usage: af(w)
• The weights of the prototypes in that vector are adjusted:
  • Prototypes used by many writers are not distinctive and get a lower weight
  • wf(p) = number of writers using prototype p
• Result: a weighted vector of prototype usage for each writer, aw(w)
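The weight is called iwf(p) on the experiment slide, but this transcript does not give its formula. A minimal sketch, assuming an idf-style definition iwf(p) = log(N / wf(p)), with N the number of database writers, and aw(w)_p = af(w)_p · iwf(p); the function names and the two toy writers are hypothetical.

```python
import math
from collections import Counter


def writer_frequencies(af_by_writer: dict[str, Counter]) -> Counter:
    """wf(p): number of writers who use prototype p at least once."""
    wf = Counter()
    for af in af_by_writer.values():
        wf.update(p for p, count in af.items() if count > 0)
    return wf


def weighted_vectors(af_by_writer: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """aw(w): af(w) scaled by an ASSUMED weight iwf(p) = log(N / wf(p))."""
    n_writers = len(af_by_writer)
    wf = writer_frequencies(af_by_writer)
    iwf = {p: math.log(n_writers / wf[p]) for p in wf}
    return {w: {p: count * iwf[p] for p, count in af.items()}
            for w, af in af_by_writer.items()}


# Toy prototype counts for two hypothetical writers.
af_by_writer = {
    "writer_A": Counter({"a9": 3, "d16": 1}),
    "writer_B": Counter({"a9": 2, "d12": 2}),
}
aw = weighted_vectors(af_by_writer)
print(aw["writer_A"])  # a9 is used by both writers, so its weight drops to 0.0
```

Under this assumed formula a prototype used by every writer gets weight zero, which is the extreme case of "not distinctive, lower weight".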
The IR model for writer identification: query side
• Query input → character to prototype matcher → af(q), the prototype frequency in the query
Matching
• Input:
  • ‘Database writers’: indexed writer vectors aw(w)
  • ‘Query writer’: vector af(q)
• Match: calculate the cosine of the angle between af(q) and each aw(w)
• Output: a ranked list of writers, ordered by similarity to the query
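The matching step can be sketched as a cosine similarity between sparse vectors followed by a sort. A minimal sketch, assuming vectors are stored as dictionaries from prototype label to weight; `cosine`, `rank_writers`, and the toy vectors are hypothetical.

```python
import math

Vector = dict[str, float]  # sparse vector: prototype label -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * v.get(p, 0.0) for p, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_writers(af_q: Vector, aw: dict[str, Vector]) -> list[tuple[str, float]]:
    """Rank database writers by cosine similarity to the query, most similar first."""
    scores = [(w, cosine(af_q, vec)) for w, vec in aw.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy weighted writer vectors and query vector.
aw = {
    "writer_A": {"a9": 1.0, "d16": 2.0},
    "writer_B": {"a5": 2.0, "d12": 1.0},
}
af_q = {"a9": 1.0, "d16": 1.0}
print(rank_writers(af_q, aw))  # writer_A first; writer_B shares no prototypes, score 0.0
```

The cosine ignores vector length, so a writer with many database characters is not favoured over one with few.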
Justification
• Similarity value (cosine of the angle)
• Contribution of each prototype to the retrieval result
Justification
• The forensic expert can inspect the justification further
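Because the cosine is a sum over prototypes divided by the two vector norms, one natural way to express each prototype's contribution is to split that sum into its per-prototype terms; the terms add up exactly to the similarity value. This decomposition is an assumption about how the contribution could be computed, not necessarily the authors' definition; `prototype_contributions` and the toy vectors are hypothetical.

```python
import math


def prototype_contributions(af_q: dict[str, float],
                            aw_w: dict[str, float]) -> list[tuple[str, float]]:
    """Split cos(af_q, aw_w) into per-prototype terms that sum to the cosine."""
    norm = (math.sqrt(sum(x * x for x in af_q.values()))
            * math.sqrt(sum(x * x for x in aw_w.values())))
    if norm == 0.0:
        return []
    # Only prototypes present in both vectors contribute a nonzero term.
    terms = [(p, af_q[p] * aw_w[p] / norm) for p in af_q if p in aw_w]
    # Largest contributions first: the prototypes an expert would inspect first.
    return sorted(terms, key=lambda item: item[1], reverse=True)


# Toy query vector and weighted writer vector.
af_q = {"a9": 1.0, "d16": 1.0}
aw_w = {"a9": 1.0, "d16": 2.0}
contrib = prototype_contributions(af_q, aw_w)
print(contrib)                         # d16 contributes twice as much as a9
print(sum(c for _, c in contrib))      # equals the cosine similarity
```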
Experiment
• 43 writers from the plucoll database
  • Online data
  • Segmented into characters
• How well does our technique perform given a certain amount of data (characters)?
  • Number of characters per database writer (d)
  • Number of characters in the query (q)
Experiment
• Pick d random letters from each database writer
• Pick q other random letters from one writer and use those as the query
• Find the most similar writer:
  • Prototypes
  • iwf(p), aw(w)
  • Matching
• Vary d and q
• Repeat 10 times for each combination of d and q
• Repeat 10 times for each writer
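The protocol above can be sketched as a loop that resamples the database and the query for every trial and measures top-1 accuracy. A minimal sketch under several assumptions: each writer's characters are already mapped to prototype labels, the weight is the idf-style iwf(p) = log(N / wf(p)) (the transcript does not give its formula), and every name as well as the synthetic toy data is hypothetical.

```python
import math
import random
from collections import Counter


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def identify(query: list[str], database: dict[str, list[str]]) -> str:
    """Return the database writer most similar to the query (top-1)."""
    af = {w: Counter(labels) for w, labels in database.items()}
    wf = Counter(p for counts in af.values() for p in counts)  # writers using p
    iwf = {p: math.log(len(af) / n) for p, n in wf.items()}    # ASSUMED weighting
    aw = {w: {p: c * iwf[p] for p, c in counts.items()} for w, counts in af.items()}
    af_q = {p: float(c) for p, c in Counter(query).items()}
    return max(aw, key=lambda w: cosine(af_q, aw[w]))


def run_trials(labels_by_writer: dict[str, list[str]], d: int, q: int,
               repeats: int, rng: random.Random) -> float:
    """Top-1 accuracy for one (d, q) combination; needs d + q <= letters per writer."""
    correct = total = 0
    for query_writer in labels_by_writer:
        for _ in range(repeats):
            database, query = {}, []
            for w, labels in labels_by_writer.items():
                picked = rng.sample(labels, d + q if w == query_writer else d)
                database[w] = picked[:d]
                if w == query_writer:
                    query = picked[d:]  # q letters not used in the database sample
            correct += int(identify(query, database) == query_writer)
            total += 1
    return correct / total


# Synthetic toy data: 4 writers, each alternating between two private prototypes.
labels_by_writer = {f"writer_{i}": [f"p{i}_{k % 2}" for k in range(30)] for i in range(4)}
rng = random.Random(0)
# With d = 16 > 15, every database sample contains both of a writer's prototypes.
print(run_trials(labels_by_writer, d=16, q=5, repeats=10, rng=rng))  # 1.0
```

Varying d and q in an outer loop over `run_trials` gives an accuracy surface of the kind summarised on the results slide; the toy data here is deliberately easy and says nothing about the real plucoll results.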
Results
[Figure: results plotted against d and q]
Conclusions & future work
• Needed for 100% correct identification: 70 characters in the query (q) and 300 characters per database writer (d)
• An average English sentence has 75–100 characters
• No black box: results are justified
• Online data: usable in forensic practice?
  • Extract semi-automatically with the help of an expert
  • Use an offline matching technique
• Only 43 writers
  • Larger experiments (more writers and more techniques) are planned
• Promising results