
Simultaneous detection of vertical and horizontal text lines based on perceptual organisation

Claudie Faure (CNRS-LTCI, TELECOM-ParisTech) and Nicole Vincent (CRIP5, Université Paris Descartes).


Presentation Transcript


  1. Simultaneous detection of vertical and horizontal text lines based on perceptual organisation • Claudie Faure, CNRS-LTCI, TELECOM-ParisTech • Nicole Vincent, CRIP5, Université Paris Descartes

  2. Context • Historical Medical Digital Library: Medic@ (BIUM: Bibliothèque InterUniversitaire Médicale), http://www.bium.univ-paris5.fr/histmed/medica.htm • Document image analysis • Information search • Visualisation of the collections

  3. The readers’ needs • Find documents • Find information in the documents • Textual information • Visual information (illustrations, decoration, drop caps, …)

  4. Figure & Caption detection • Detection of vertical and horizontal text lines • Origins of the method: the caption lines; perceptual grouping

  5. Preprocessing • Web image • Binarisation • Connected components
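
The two preprocessing steps named on this slide can be illustrated with a minimal sketch. The slides give no algorithmic detail, so everything here is an assumption: a fixed global threshold (`threshold=128`), 8-connectivity, and the hypothetical function names `binarise` and `connected_components`. Each connected component (CC) is reduced to its bounding box.

```python
from collections import deque

def binarise(gray, threshold=128):
    """Return a 0/1 grid: 1 marks ink (pixels darker than the threshold)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Find 8-connected ink regions; return one bounding box per region.

    Each box is (left, top, right, bottom) in inclusive pixel coordinates.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes

# Tiny synthetic "page": two dark blobs on a light background.
page = [
    [255, 255, 255, 255, 255, 255],
    [255,  10,  20, 255, 255, 255],
    [255,  15, 255, 255,  30, 255],
    [255, 255, 255, 255,  25, 255],
]
boxes = connected_components(binarise(page))
```

The later steps of the method only need the bounding boxes, which is why the sketch discards the pixel lists once a component has been traversed.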

  6. Graphics segmentation • Size: graphics (CCG) • Shape: rules, frames • Location: merge CCG • Text components (?)

  7. Labelled connected components (1) • Grouping by proximity • Labels: NNE, NNS [Figure: Ex. 1, Ex. 2]

  8. Labelled connected components (2) • Complementary labels: No East neighbour; No South neighbour; Dot; Nearest neighbour of several CCs
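
A possible reading of the labels on slides 7 and 8, as a hedged sketch. The slides do not define NNE and NNS; the code assumes they mean "the nearest neighbour lies to the East" and "the nearest neighbour lies to the South", measured as the gap between bounding boxes that overlap on the other axis. The tag names `NoEast` and `NoSouth` and all function names are hypothetical, and the "Dot" and "nearest neighbour of several CCs" labels are not covered.

```python
def overlap(a_lo, a_hi, b_lo, b_hi):
    """True when the closed intervals [a_lo, a_hi] and [b_lo, b_hi] intersect."""
    return a_lo <= b_hi and b_lo <= a_hi

def nearest_east(i, boxes):
    """(index, gap) of the closest box to the right sharing rows with box i."""
    left, top, right, bottom = boxes[i]
    best = None
    for j, (l2, t2, r2, b2) in enumerate(boxes):
        if j != i and l2 > right and overlap(top, bottom, t2, b2):
            gap = l2 - right
            if best is None or gap < best[1]:
                best = (j, gap)
    return best

def nearest_south(i, boxes):
    """(index, gap) of the closest box below sharing columns with box i."""
    left, top, right, bottom = boxes[i]
    best = None
    for j, (l2, t2, r2, b2) in enumerate(boxes):
        if j != i and t2 > bottom and overlap(left, right, l2, r2):
            gap = t2 - bottom
            if best is None or gap < best[1]:
                best = (j, gap)
    return best

def label_components(boxes):
    """Attach a set of tags to every box (left, top, right, bottom)."""
    labels = []
    for i in range(len(boxes)):
        east, south = nearest_east(i, boxes), nearest_south(i, boxes)
        tags = set()
        if east is None:
            tags.add("NoEast")
        if south is None:
            tags.add("NoSouth")
        # Assumption: the direction of the smaller gap wins; ties go to East.
        if east is not None and (south is None or east[1] <= south[1]):
            tags.add("NNE")
        elif south is not None:
            tags.add("NNS")
        labels.append(tags)
    return labels

# Three characters on one row, plus one character far below the first.
boxes = [(0, 0, 4, 9), (6, 0, 10, 9), (12, 0, 16, 9), (0, 30, 4, 39)]
labels = label_components(boxes)
```

With these boxes the first character is tagged `NNE` because its East gap (2) is smaller than its South gap (21), which matches the intuition that it belongs to a horizontal line.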

  9. Grouping by proximity and continuity of direction • Steps: (1) Creating alignments: CC labels; (2) Expanding alignments; (3) Merging alignments • Incremental grouping process: easier to control; no jump from local CCs to text lines; several levels of decision
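
The three steps listed on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: `links` is a hypothetical mapping from a CC index to the index of its neighbour in one fixed direction (so continuity of direction is assumed to be already encoded in the links), and the function names are invented for the example.

```python
def create_alignments(links):
    """Step 1: every link (a -> b) seeds a two-component alignment."""
    return [[a, b] for a, b in sorted(links.items())]

def expand_alignments(alignments, links):
    """Step 2: grow each alignment at its tail while a link continues it."""
    expanded = []
    for line in alignments:
        line = list(line)
        while line[-1] in links and links[line[-1]] not in line:
            line.append(links[line[-1]])
        expanded.append(line)
    return expanded

def merge_alignments(alignments):
    """Step 3: fold each alignment into the first kept one it overlaps."""
    merged = []
    for line in sorted(alignments, key=len, reverse=True):
        for kept in merged:
            if set(line) & set(kept):
                kept.extend(c for c in line if c not in kept)
                break
        else:
            merged.append(list(line))
    return merged

# CCs 0..3 form one chain, CCs 5 and 6 another; CC 4 has no link.
links = {0: 1, 1: 2, 2: 3, 5: 6}
seeds = create_alignments(links)
lines = merge_alignments(expand_alignments(seeds, links))
```

Keeping the steps separate mirrors the point made on the slide: each stage yields an intermediate result that can be inspected or filtered before the next one runs.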

  10. Conflicts • ConflitHV: CC.Vline ≠ null AND CC.Hline ≠ null • Vline is eliminated if: #CCs in Vline (6) < #CCs in Hlines (13)
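
The elimination rule on this slide can be written as a short sketch. Lines are modelled as sets of CC identifiers, which is an assumption, as is the reading of "#CCs in Hlines" as the number of distinct CCs in all horizontal lines that share a CC with the vertical line. The slide does not say what happens when the vertical line is not smaller, so the sketch simply keeps it. `resolve_hv_conflicts` is a hypothetical name.

```python
def resolve_hv_conflicts(h_lines, v_lines):
    """Return the vertical lines that survive the conflict rule."""
    kept = []
    for v in v_lines:
        # Horizontal lines sharing at least one CC with this vertical line.
        rivals = [h for h in h_lines if v & h]
        rival_ccs = set().union(*rivals)
        if rivals and len(v) < len(rival_ccs):
            continue  # eliminated: fewer CCs than the conflicting Hlines
        kept.append(v)
    return kept

# Same counts as on the slide: a 6-CC vertical line crossing a 13-CC
# horizontal line through CC number 5.
h_line = set(range(13))
v_conflicting = {5, 20, 21, 22, 23, 24}
v_free = {30, 31, 32, 33}
kept = resolve_hv_conflicts([h_line], [v_conflicting, v_free])
```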

  11. Typographic conditions • Word spacing and character size: D > 2 * lineHeight • Continuity: h2 < 1.5 * h1 versus h2 > 1.5 * h1
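
The inequalities on this slide translate directly into predicates. Their interpretation is an assumption: the sketch treats a gap D larger than twice the line height as too wide to continue a line, and a height h2 of at least 1.5 times h1 as a break in continuity. The slide does not cover the boundary case h2 = 1.5 * h1, which is rejected here. All function names are hypothetical.

```python
def gap_too_wide(d, line_height):
    """Slide condition: D > 2 * lineHeight."""
    return d > 2 * line_height

def heights_compatible(h1, h2):
    """Slide condition: h2 < 1.5 * h1."""
    return h2 < 1.5 * h1

def can_extend(d, line_height, h1, h2):
    """Assumed use: extend a line only when both conditions allow it."""
    return not gap_too_wide(d, line_height) and heights_compatible(h1, h2)

ok = can_extend(d=10, line_height=12, h1=12, h2=14)
wide_gap = can_extend(d=30, line_height=12, h1=12, h2=14)
tall_neighbour = can_extend(d=10, line_height=12, h1=12, h2=20)
```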

  12. Separators • Text lines do not straddle separators
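
One way to enforce this constraint, sketched for the simplest case only: a horizontal text line given as a left-to-right list of bounding boxes, and vertical separators given as hypothetical `(x, top, bottom)` triples. The representation and the function names are assumptions; horizontal separators cutting vertical lines would need the symmetric test.

```python
def straddles(box_a, box_b, separator):
    """True when a vertical rule sits in the horizontal gap between two boxes."""
    left_box, right_box = sorted((box_a, box_b))  # order by left edge
    sep_x, sep_top, sep_bottom = separator
    between = left_box[2] < sep_x < right_box[0]
    top = min(left_box[1], right_box[1])
    bottom = max(left_box[3], right_box[3])
    crosses_rows = sep_top <= bottom and top <= sep_bottom
    return between and crosses_rows

def split_at_separators(line, separators):
    """Cut a list of boxes (left, top, right, bottom) at every separator."""
    pieces, current = [], [line[0]]
    for prev, box in zip(line, line[1:]):
        if any(straddles(prev, box, s) for s in separators):
            pieces.append(current)
            current = []
        current.append(box)
    pieces.append(current)
    return pieces

line = [(0, 0, 4, 9), (6, 0, 10, 9), (20, 0, 24, 9)]
separators = [(15, 0, 100)]  # vertical rule at x = 15 spanning rows 0..100
pieces = split_at_separators(line, separators)
```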

  13. Caption lines • For each side of a Figure: the nearest text line • Confidence of text lines: +1 for the closest line to the Figure; +1 if the Figure and the text line are centred • Caption line candidates: confidence > 0
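
The scoring described on this slide can be sketched as below. Several details are assumptions: the nearest text line per side is taken as already computed and passed in as `nearest_by_side` (side name mapped to a bounding box and a distance), "centred" is tested by comparing centres along the shared axis within a hypothetical `tolerance` of 5 pixels, and a tie for the closest line goes to the first side listed.

```python
def centred(figure, line, side, tolerance=5):
    """Compare the centres of figure and line along their shared axis."""
    fl, ft, fr, fb = figure
    ll, lt, lr, lb = line
    if side in ("top", "bottom"):
        return abs((fl + fr) / 2 - (ll + lr) / 2) <= tolerance
    return abs((ft + fb) / 2 - (lt + lb) / 2) <= tolerance

def caption_candidates(figure, nearest_by_side):
    """Return {side: confidence} for the lines with confidence > 0."""
    closest = min(nearest_by_side, key=lambda s: nearest_by_side[s][1])
    result = {}
    for side, (line, distance) in nearest_by_side.items():
        confidence = 0
        if side == closest:
            confidence += 1  # closest line to the figure
        if centred(figure, line, side):
            confidence += 1  # figure and text line are centred
        if confidence > 0:
            result[side] = confidence
    return result

figure = (100, 100, 300, 400)  # (left, top, right, bottom)
nearest_by_side = {
    "bottom": ((150, 410, 250, 425), 10),  # horizontal line under the figure
    "left": ((70, 120, 85, 300), 15),      # vertical line, off-centre
    "right": ((320, 150, 335, 350), 20),   # vertical line, centred
}
candidates = caption_candidates(figure, nearest_by_side)
```

In this example the line below the figure scores 2 (closest and centred), the line on the right scores 1 (centred only), and the line on the left is discarded.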

  14. Results • 52 pages with vertical and horizontal lines • Web images (different sizes and resolutions) from Medic@ • 22 books (19th century) • First caption lines (102): 31 horizontal lines, 71 vertical lines

  15. Conclusion • How do readers detect text lines? • Perceptually-based method • Reliable results • Material for further investigations • How do readers associate Figure and Caption? • Spatial reasoning • Visual contrast • Word spotting (« Fig---- »)
