
Improving Interpretive Interfaces for Math Entry

This presentation describes work on theory and tools for building and evaluating pattern recognition systems, applied to document recognition and pen-based computing. Current directions include tools for integrating and evaluating recognition modules, game-theoretic models of recognition problems, and machine learning algorithms for system optimization. Applications include pen- and image-based math entry, OCR, CAPTCHAs, and table recognition. The goal is to develop interpretive interfaces that accurately interpret math symbols and their layout in context.


Presentation Transcript


  1. Improving Interpretive Interfaces for Math Entry • Richard Zanibbi • Department of Computer Science, Rochester Institute of Technology

  2. RIT Document and Pattern Recognition Lab (DPRL) • Goals: • Improve theory and tools for constructing and evaluating pattern recognition systems • Apply these to problems in document recognition and pen-based computing • Members: • Richard Zanibbi • Kurt Kluever (Master’s student) • New members welcome! • http://www.cs.rit.edu/~rlaz/dprl.html

  3. Current Directions: • 1. Theory and Tools: • Tools for recognition module integration and evaluation, such as the Recognition Strategy Language (Zanibbi et al.) • Game-theoretic models of recognition problems and systems (e.g. for classifier combination) • Machine learning algorithms for system optimization • 2. Applications: • Pen and image-based math entry (lab maintains the open-source Freehand Formula Entry System (Smithies, Novins, Arvo, Zanibbi et al.)) • Optical character recognition (OCR) • Image and text-based document retrieval • “CAPTCHAs” (for distinguishing humans from ‘bots’) • Table recognition, etc.

  4. Interpretive Interfaces for Math Entry

  5. Pen-Based Math Entry • Recognition Challenges • Large number (e.g. > 500 in LaTeX) of symbols, many similar in structure (e.g. 0 and O) • Layout of symbols on baselines can be ambiguous • Little redundancy • Context influences symbol identity and layout interpretation

  6. Example: Freehand Formula Entry System/DRACULAE • Contributors: • FFES first developed in 1998 as an MSc project at the University of Otago, New Zealand (Smithies, Novins), using CIT tools of Jim Arvo et al. • Since then, contributors from Queen’s University (CA), Concordia University (CA), and around the world (CMU, UC Berkeley, companies and non-profits in California and France)

  7. DRACULAE (Zanibbi, 2002) • “Diagram Recognition Application for Computer Understanding of Large Algebraic Expressions”

  8. DRACULAE: Layout Classes for Symbols • Symbol name defines class membership.

  9. DRACULAE Layout Analysis: Sketch • Algorithm: • Symbols are assigned a layout type (class) based on symbol identity • Sort symbols left-to-right by the leftmost edge of their bounding boxes • Create baseline structure tree with region node “Expression” • Recursively: • a. Search right-to-left, locate the leftmost (“start”) baseline (dominance rules for symbol layout class pairs) • b. From the start symbol, search left-to-right in the symbol list for symbols adjacent on the baseline (**Zhang: fuzzy version) • c. Add baseline symbols as children of the parent region node • d. Place non-baseline symbols in lists associated with region nodes (e.g. for super/subsc/bleft etc.) • Apply a-d to each new region, until no new regions are created
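The first two steps of the sketch on slide 9 (assigning layout classes by symbol name, then sorting by left edge) can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions: the `LAYOUT_CLASS` table, the class names, and the `Symbol` type are hypothetical, since the transcript does not reproduce DRACULAE's actual class table, and the recursive baseline search is not attempted here.

```python
from dataclasses import dataclass

# Hypothetical name-to-class table. The class table shown on the slide is
# not reproduced in the transcript, so these entries are illustrative only.
LAYOUT_CLASS = {
    "x": "centered",
    "2": "ascender",
    "y": "descender",
    "+": "non_scripted",
}


@dataclass(frozen=True)
class Symbol:
    name: str      # recognized symbol identity
    left: float    # bounding box coordinates
    top: float
    right: float
    bottom: float


def layout_class(symbol):
    """Step 1: the symbol's name alone decides its layout class."""
    return LAYOUT_CLASS.get(symbol.name, "centered")


def sort_left_to_right(symbols):
    """Step 2: order symbols by the leftmost edge of their bounding boxes."""
    return sorted(symbols, key=lambda s: s.left)


# Symbols for something like "x^2 + y", given in arbitrary input order.
symbols = [
    Symbol("2", 12, 0, 16, 6),
    Symbol("+", 20, 5, 26, 11),
    Symbol("x", 0, 4, 10, 14),
    Symbol("y", 30, 4, 40, 18),
]
ordered = sort_left_to_right(symbols)
classes = [(s.name, layout_class(s)) for s in ordered]
```

The sorted list is what the later baseline search would walk; the layout class is what the dominance rules would consult when comparing pairs of symbols.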

  10. Expanding the View… • Integration of scanned and pen-based expressions • Infty system, FFES prototype (impl. Josh Zimler 2006) • Long Term Goal: Flexible input and combination • Allow one to easily combine and then reformat/interpret • LaTeX, eqn, etc. • MATLAB, Mathematica, etc. • Handwritten expressions (tablet/mouse) • Scanned images of handwritten or typeset expressions • “Vector drawing” interface input, e.g. as in Xpress (Pollanen et al.)

  11. Other Math Entry Interfaces • Natural Log by Matsakis, Miller, and Viola (MIT) • JIMHR: (Java-Based) Interactive Math Handwriting Recognizer, a merge and port of FFES/DRACULAE and the Natural Log system by Joy-Gong Ho (Acuitus Corp., USA) • JMathNotes by Ernesto Tapia Rodriguez (Free University of Berlin) • Infty by M. Suzuki et al. (Kyushu University, Japan) • MathJournal by XThink Inc: first commercial pen-based math recognition system • MathPad by Joseph LaViola • Links available: http://www.cs.rit.edu/~rlaz

  12. The Recognition Strategy Language (RSL)

  13. Motivation: A high-level language for pattern recognition algorithms • Table Recognition Survey (Zanibbi et al. 2004) • Summarizes literature in terms of observations, transformations, and inferences. • Techniques studied are characterized as making the following types of inferences (decisions): • Parameter values (e.g. thresholds) • Interpretation Model Operations: • Segmentation (identifying regions of interest in data) • Classification (assigning types to regions) • Relating regions (e.g. by topology, such as adjacencies) • Rejecting segments, classes, and region relationships • (Unanswered) Question: • How should we combine recognition modules in a complex math entry system?

  14. Example: Simple Table Structure Recognition Algorithm (Part 1)
      model regions
        Image Word Cell            % default: 'Region'
        Row Column
      end regions
      model relations
        % default: 'contains'
        adjacent_right adjacent_below
      end relations
      recognition parameters
        sMaxRowSeparation 2        % millimetres
        sMaxColumnSeparation 2     % millimetres
        aResolution 300            % dpi; default
      end parameters

  15. Example: Simple Table Structure Recognition Algorithm (Part 2)
      strategy main
        adapt aResolution using getScanResolution() observing {Image} regions
        classify {Word} regions as {Cell}
        relate {Cell} regions with {adjacent_right} using defineRightAdjacency(sMaxRowSeparation,aResolution)
        segment {Cell} regions into {Row} regions using relationClosure() observing {adjacent_right} relations
        relate {Cell} regions with {adjacent_below} using defineLowerAdjacency(sMaxColumnSeparation,aResolution)
        segment {Cell} regions into {Column} regions using relationClosure() observing {adjacent_below} relations
        accept interpretations
      end strategy
      Slide callouts: Decision type • Trivial Decision • External Decision Function • Decision Function Parameters • Observation Specification
      Input: Params, Graph with Image, Word regions (BBs) • Output: Cells, Rows, Cols
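The strategy on slide 15 relies on external decision functions whose bodies are not shown. As a rough illustration, the Python sketch below gives one plausible reading of two of them: a right-adjacency test with a millimetre threshold converted to pixels at the scan resolution, and a "relation closure" that groups cells into rows as connected components of that relation. Everything here is an assumption for illustration: the `Box` type, the vertical-overlap condition, and the union-find grouping are not taken from the RSL implementation.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Box:
    label: str
    left: int
    top: int
    right: int
    bottom: int


def mm_to_pixels(mm, dpi):
    """Convert a millimetre threshold to pixels at the given resolution."""
    return mm / MM_PER_INCH * dpi


def right_adjacent(a, b, max_gap_px):
    """Assumed rule: b overlaps a vertically and starts within max_gap_px to a's right."""
    gap = b.left - a.right
    overlaps = a.top < b.bottom and b.top < a.bottom
    return overlaps and 0 <= gap <= max_gap_px


def relation_closure(items, pairs):
    """Group items into connected components of a relation (union-find)."""
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return list(groups.values())


# Four cells laid out as a 2 x 2 table, in pixel coordinates.
cells = [
    Box("A", 0, 0, 100, 40), Box("B", 110, 0, 200, 40),
    Box("C", 0, 100, 100, 140), Box("D", 115, 100, 200, 140),
]
max_gap = mm_to_pixels(2, 300)  # 2 mm at 300 dpi, about 23.6 pixels
pairs = [(a, b) for a in cells for b in cells
         if a is not b and right_adjacent(a, b, max_gap)]
rows = relation_closure(cells, pairs)
row_labels = sorted(sorted(c.label for c in row) for row in rows)
```

Columns would follow the same pattern with a lower-adjacency test in place of `right_adjacent`, which mirrors how the strategy reuses `relationClosure()` for both segmentations.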

  16. Running RSL Programs • Translate RSL Program to TXL (Using TXL) • Pass Input Graph (text file) to Program • Output (text files): • Accepted Structures (interpretations) • Log of all decisions and their outcomes

  17. New Metrics Based on Hypothesis Histories: Historical Recall and Precision • [Figure: diagram relating Generated Hypotheses (A ∪ R), Recognition Targets, Correct Hypotheses, and False Negatives (F)]
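The figure on slide 17 is not reproduced in the transcript, so the following Python sketch rests on assumptions: that A and R denote the accepted and rejected hypotheses, that conventional recall and precision are computed over A alone, and that the historical variants are computed over every hypothesis ever generated (A ∪ R). The hypothesis identifiers are made up for the example.

```python
def recall_precision(found, targets):
    """Recall and precision of a set of hypotheses against the target set."""
    correct = found & targets
    recall = len(correct) / len(targets) if targets else 0.0
    precision = len(correct) / len(found) if found else 0.0
    return recall, precision


# Hypothetical cell hypotheses from one recognition run.
accepted = {"c1", "c2", "c5"}        # A: hypotheses in the final interpretation
rejected = {"c3", "c6"}              # R: hypotheses generated, then rejected
targets = {"c1", "c2", "c3", "c4"}   # ground-truth recognition targets

# Conventional metrics look only at what was finally accepted.
recall, precision = recall_precision(accepted, targets)

# Historical metrics (assumed definition) look at everything generated.
hist_recall, hist_precision = recall_precision(accepted | rejected, targets)
```

In this made-up run the correct hypothesis `c3` was generated and later rejected, so historical recall (0.75) exceeds conventional recall (0.5). A gap of that kind suggests the errors lie in rejection decisions and not in hypothesis generation.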

  18. Hypothesis History

  19. Cell Detection Results (Handley, 2001): RSL Re-implementation on Table ‘a038’ (UW-III) • 0: Input (words and lines) • 1: Classify words as cells • 16: Merge ‘horizontally close’ cells • 35: Merge cells sharing column, row assignments. Nearly 50% of correct cells rejected; new correct cells also detected • 47: Two cells merged producing column header ‘Total pore space (percent)’ • 51: Merge header cells bounded by two horizontal lines • 83: Merge cells sharing line and white space separators • *Inference times shown are those affecting cells

  20. RSL and Math Entry • Proposal: “MIN” System • New interface for math entry and offline experiments • Use RSL to define recognition strategies, capture results. • (Really): testbed for studying recognition algorithms and their intelligent combination, organization, and deployment in practice. • Goals: • Compare different approaches to recognizing mathematical expressions (from input to output) represented in RSL • Allow flexible training, combination, and alteration of various recognition strategies. • Extend RSL to accommodate math and other problem domains more effectively, while remaining abstract

  21. (Some) Relevant Journals and Conferences • Journals • IEEE Trans. Pattern Analysis and Machine Intelligence • Machine Learning • Pattern Recognition • Pattern Recognition Letters • Artificial Intelligence • Int’l J. Document Analysis and Recognition • … • Conferences • Int’l Conf. Machine Learning • IEEE Computer Vision and Pattern Recognition • Computational Learning Theory (COLT) • Int’l Conf. Document Analysis and Recognition • Int’l Work. Document Analysis Systems • …

  22. Thank you. • Questions? • Support: GCCIS Department of Computer Science
