Writer identification through information retrieval

August 21st, 2008, ICFHR, Montreal. Ralph Niels, Franc Grootjen & Louis Vuurpijl. Writer identification through information retrieval: a search engine for forensic experts.

Presentation Transcript


  1. August 21st, 2008, ICFHR, Montreal. Ralph Niels, Franc Grootjen & Louis Vuurpijl. Writer identification through information retrieval

  2. A search engine for forensic experts

  3. Overview • Forensic writer identification • Prototypical shapes in handwriting • Information retrieval (IR) • Traditional • Writer identification using prototypes • Experiments • Method • Results • Conclusions & future work

  4. Forensic writer identification

  5. Forensic information retrieval • Web search: query of words to search in documents containing words • Forensic search: query of characters to search in documents containing characters • Previous work*: sub-character level, binary features • Based on characters: improves justification possibilities * A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recognition Letters, 26(13):2080–2092, 2005.

  6. Forensic information retrieval • Dictionary of character shapes: prototypes • Experts use prototypes • Describe query & documents by prototype usage [Figure labels: prototypes; instances of prototype]

  7. Character to prototype matcher • Find most similar prototype for each character [Figure labels: a5 a9 a16 (…) W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…) a52]
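
The step from matcher output to a usage vector can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `prototype_frequency` is hypothetical, the labels are copied from the slide and only appear to be matcher output, and storing raw counts (rather than normalized frequencies) is an assumption.

```python
from collections import Counter

# Labels copied from the slide; they appear to be the matcher's output,
# one prototype label per written character.
matched_labels = ["W48", "h16", "a9", "t1", "y2", "o1",
                  "u23", "d16", "i25", "d12", "i6", "s12"]


def prototype_frequency(labels):
    """Return a sparse vector: prototype label -> number of occurrences.

    Assumption: the vector holds raw counts; the slides do not say
    whether the counts are normalized.
    """
    return Counter(labels)


# af(q) for this short query; prototypes that never occur count as zero.
af_q = prototype_frequency(matched_labels)
```

A sparse mapping is convenient here because each writer is likely to use only a small part of the prototype dictionary.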

  8. Prototypes • Averaged shapes of real handwritten characters • Dynamic Time Warping (DTW) distance to find most similar prototype [Figure: prototypes] R. Niels, L. Vuurpijl & L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 21, No. 1, pages 61-81, February 2007.
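
Nearest-prototype lookup with a DTW distance might look like the sketch below. It uses the textbook DTW recurrence with Euclidean point costs on (x, y) pen coordinates; the DTW variant in the cited paper may add constraints that are not reproduced here. The function names, the toy trajectories, and the pairing of labels `a5` and `a9` with these shapes are all made up for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two point sequences (lists of (x, y) tuples)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def nearest_prototype(character, prototypes):
    """Return the label of the prototype with the smallest DTW distance."""
    return min(prototypes,
               key=lambda label: dtw_distance(character, prototypes[label]))


# Hypothetical toy shapes: a peaked stroke and a flat stroke.
prototypes = {
    "a5": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    "a9": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
}
sample = [(0.0, 0.0), (1.0, 0.9), (2.0, 0.0)]
best = nearest_prototype(sample, prototypes)
```

In this toy case the sample's peak lies closer to the peaked shape, so `best` is `"a5"`.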

  9. The IR model for writer identification [Diagram. Components: writer input, character to prototype matcher, prototype list, indexing, af(w), aw(w), query input, character to prototype matcher, af(q), matching, ranked list, justification]

  10. Indexing: create weighted vectors • Vector of prototype usage for each writer: af(w) • Adjust weight of prototypes in that vector: • Prototypes used by many writers: not distinctive -> lower weight • wf(p) = number of writers using prototype p • Weighted vector of prototype use for each writer: aw(w)
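
The indexing step can be sketched as below. The transcript gives wf(p) but not the exact weighting formula, so the sketch assumes the common tf-idf analogue iwf(p) = log(N / wf(p)) with N the number of writers; the formula in the talk may differ. The function names and the three toy writers are hypothetical.

```python
import math
from collections import Counter


def writer_frequency(af_by_writer):
    """wf(p): number of writers that use prototype p at least once."""
    wf = Counter()
    for af in af_by_writer.values():
        wf.update(p for p, count in af.items() if count > 0)
    return wf


def weighted_vectors(af_by_writer):
    """aw(w): usage counts scaled by an inverse writer frequency.

    Assumption: iwf(p) = log(N / wf(p)); this form is not taken
    from the slides.
    """
    n_writers = len(af_by_writer)
    wf = writer_frequency(af_by_writer)
    iwf = {p: math.log(n_writers / wf[p]) for p in wf}
    return {w: {p: count * iwf[p] for p, count in af.items()}
            for w, af in af_by_writer.items()}


# Hypothetical usage counts for three writers.
af_by_writer = {
    "A": {"a5": 3, "t1": 2},
    "B": {"a9": 4, "t1": 1},
    "C": {"a5": 1, "t1": 5},
}
wf = writer_frequency(af_by_writer)
aw = weighted_vectors(af_by_writer)
```

Under this assumed formula a prototype used by every writer (here `t1`) gets weight zero, while a prototype used by a single writer (here `a9`) gets the largest weight.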

  11. The IR model for writer identification [Diagram repeated; annotation: af(q) is the prototype frequency in the query]

  12. The IR model for writer identification [Diagram repeated]

  13. Matching • Input • "Database writers": indexed writer vectors aw(w) • "Query writer": vector af(q) • Match: • Calculate cosine of angle between af(q) and each aw(w) • Output • Ranked list of writers (similarity to query)
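
The matching step described above can be sketched as a cosine ranking over sparse vectors. The function names and the toy vectors are hypothetical; only the use of the cosine between af(q) and each aw(w) comes from the slide.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dict: label -> value)."""
    dot = sum(value * v.get(label, 0.0) for label, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_u * norm_v)


def rank_writers(af_q, aw_by_writer):
    """Return (writer, similarity) pairs, most similar writer first."""
    scores = {w: cosine(af_q, aw) for w, aw in aw_by_writer.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical indexed writers and query.
aw_by_writer = {
    "A": {"a5": 2.0, "t1": 1.0},
    "B": {"a9": 3.0, "t1": 1.0},
}
af_q = {"a5": 4, "t1": 2}
ranking = rank_writers(af_q, aw_by_writer)
```

Because the cosine ignores vector length, a short query and a long database sample can still be compared directly.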

  14. The IR model for writer identification [Diagram repeated]

  15. Justification • Similarity value (cosine of angle) • Prototype contribution to retrieval result
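
One way to obtain a per-prototype contribution is to split the cosine into its summands, one per shared prototype. Whether the talk computes contributions this way is an assumption; the function name and the toy vectors are hypothetical.

```python
import math


def prototype_contributions(af_q, aw_w):
    """Split the cosine similarity into one term per shared prototype.

    Each term is af(q)[p] * aw(w)[p] / (|af(q)| * |aw(w)|), so the terms
    sum to the cosine. Returned largest contribution first.
    """
    norm = (math.sqrt(sum(x * x for x in af_q.values()))
            * math.sqrt(sum(x * x for x in aw_w.values())))
    if norm == 0.0:
        return {}
    terms = {p: af_q[p] * aw_w[p] / norm for p in af_q if p in aw_w}
    return dict(sorted(terms.items(), key=lambda item: item[1], reverse=True))


# Hypothetical query and indexed writer.
af_q = {"a5": 4, "t1": 2, "o1": 1}
aw_w = {"a5": 2.0, "t1": 1.0, "u23": 1.0}
contributions = prototype_contributions(af_q, aw_w)
```

Prototypes that occur on only one side (here `o1` and `u23`) contribute nothing, which is the kind of detail an expert could inspect when judging a match.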

  16. Justification • Forensic expert can further inspect justification

  17. Experiment • 43 writers from plucoll database • Online data • Segmented into characters • How well does our technique perform given a certain amount of data (characters)? • Number of characters in database (d) • Number of characters in query (q)

  18. Experiment • Pick d random letters from each database writer • Pick q random other letters from one writer, and use those as query • Find most similar writer • Prototypes • iwf(p), aw(w) • Matching • Vary d and q • Repeat 10 times for each combination of d and q • Repeat 10 times for each writer
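
The sampling part of this protocol can be sketched as follows. Only the idea of drawing d database letters and q other letters for the query comes from the slide; the function name, the seeded generator, and the stand-in data are hypothetical.

```python
import random


def sample_database_and_query(characters, d, q, rng):
    """Draw d characters for the database and q different ones for the query.

    `characters` is one writer's list of segmented characters. The two
    samples never share an item because they come from one draw without
    replacement.
    """
    if d + q > len(characters):
        raise ValueError("writer has fewer than d + q characters")
    picked = rng.sample(range(len(characters)), d + q)
    database = [characters[i] for i in picked[:d]]
    query = [characters[i] for i in picked[d:]]
    return database, query


# Stand-in data: integers in place of real character trajectories.
rng = random.Random(0)
characters = list(range(500))
database, query = sample_database_and_query(characters, d=300, q=70, rng=rng)
```

In a full run only the query writer would need the query sample; the other database writers would contribute their d letters alone.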

  19. Results [Plots not preserved; axis labels: d, q]

  20. Conclusions & future work • Needed for 100%: 70 chars (q), 300 chars (d) • Average English sentence: 75-100 characters • No black box: results are justified • Online data: forensic practice? • Extract semi-automatically with the help of an expert • Use offline matching technique • Just 43 writers • Bigger (n writers & n techniques) experiments planned • Promising results

  21. A search engine for forensic experts
