This project aims to enhance theory and tools for pattern recognition systems, focusing on document recognition and pen-based computing issues. The current direction includes improving integration tools, game-theoretic models, and machine learning algorithms. Specifically, the focus is on pen and image-based math entry challenges, OCR, CAPTCHAs, and table recognition. The goal is to develop interpretive interfaces that can accurately interpret math symbols in a variety of contexts.
Improving Interpretive Interfaces for Math Entry Richard Zanibbi Department of Computer Science Rochester Institute of Technology
RIT Document and Pattern Recognition Lab (DPRL) • Goals: • Improve theory and tools for constructing and evaluating pattern recognition systems • Apply these to problems in document recognition and pen-based computing • Members: • Richard Zanibbi • Kurt Kluever (Master’s student) • New members welcome! • http://www.cs.rit.edu/~rlaz/dprl.html
Current Directions: • 1. Theory and Tools: • Tools for recognition module integration and evaluation, such as the Recognition Strategy Language (Zanibbi et al.) • Game-theoretic models of recognition problems and systems (e.g. for classifier combination) • Machine learning algorithms for system optimization • 2. Applications: • Pen and image-based math entry (lab maintains the open-source Freehand Formula Entry System (Smithies, Novins, Arvo, Zanibbi et al.)) • Optical character recognition (OCR) • Image and text-based document retrieval • “CAPTCHAs” (for distinguishing humans from ‘bots’) • Table recognition, etc.
Pen-Based Math Entry • Recognition Challenges • Large number (e.g. > 500 in LaTeX) of symbols, many similar in structure (e.g. 0 and O) • Layout of symbols on baselines can be ambiguous • Little redundancy • Context influences symbol identity and layout interpretation
Example: Freehand Formula Entry System/DRACULAE • Contributors: • FFES first developed in 1998 as an MSc project at University of Otago, New Zealand (Smithies, Novins), using CIT tools of Jim Arvo et al. • Since then, contributors from Queen’s University (CA), Concordia University (CA), and around the world (CMU, UC Berkeley, companies and non-profits in California and France)
DRACULAE (Zanibbi, 2002) • “Diagram Recognition Application for Computer Understanding of Large Algebraic Expressions”
DRACULAE: Layout Classes for Symbols • Symbol name defines class membership.
DRACULAE Layout Analysis: Sketch • Algorithm: • Symbols assigned layout type (class) based on symbol identity • Sort symbols left-right on leftmost edge of bounding box • Create baseline structure tree with region node “Expression” • Recursively: • a. Search right-to-left, locate the leftmost (“start”) baseline (dominance rules for symbol layout class pairs) • b. From start symbol, search left-right in symbol list for symbols adjacent on baseline (**Zhang: fuzzy version) • c. Add baseline symbols as children of parent region node • d. Place non-baseline symbols in lists associated with region nodes (e.g. for super/subsc/bleft etc.) • Apply a-d to each new region, until no new regions are created
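The recursive baseline extraction above can be illustrated with a much-simplified Python sketch. This is not DRACULAE's implementation: the `Symbol` and `Region` types, the `on_same_baseline` test (vertical centre of a symbol falling inside the previous baseline symbol's vertical extent), and the two-way super/sub split are all assumptions made for illustration. Layout classes and dominance rules are omitted, so the leftmost symbol is simply taken as the start of each baseline.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    left: float
    top: float      # y grows downward, as in image coordinates
    right: float
    bottom: float

    @property
    def centre_y(self) -> float:
        return (self.top + self.bottom) / 2


@dataclass
class Region:
    label: str
    baseline: list = field(default_factory=list)   # symbols on this baseline
    children: dict = field(default_factory=dict)   # (baseline index, relation) -> Region


def on_same_baseline(prev: Symbol, sym: Symbol) -> bool:
    # Hypothetical adjacency test: sym's vertical centre lies within prev's extent.
    return prev.top <= sym.centre_y <= prev.bottom


def build_region(symbols, label="Expression") -> Region:
    region = Region(label)
    # Sort symbols left-to-right on the left edge of the bounding box.
    ordered = sorted(symbols, key=lambda s: s.left)
    if not ordered:
        return region
    # Simplification: the leftmost symbol starts the baseline (no dominance rules).
    region.baseline.append(ordered[0])
    pending = {}
    for sym in ordered[1:]:
        prev = region.baseline[-1]
        if on_same_baseline(prev, sym):
            region.baseline.append(sym)
        else:
            # Non-baseline symbols are collected per baseline symbol and relation.
            relation = "super" if sym.centre_y < prev.top else "sub"
            key = (len(region.baseline) - 1, relation)
            pending.setdefault(key, []).append(sym)
    # Recurse: each collected list becomes a new region with its own baseline.
    for (index, relation), group in pending.items():
        region.children[(index, relation)] = build_region(group, relation)
    return region


# Symbols for "x^2 + y", deliberately given out of order.
symbols = [
    Symbol("y", 26, 10, 32, 22),
    Symbol("2", 11, 2, 15, 8),
    Symbol("x", 0, 10, 10, 20),
    Symbol("+", 18, 12, 24, 18),
]
tree = build_region(symbols)
baseline_names = [s.name for s in tree.baseline]
superscript_names = [s.name for s in tree.children[(0, "super")].baseline]
print(baseline_names, superscript_names)
```

In this toy input the main baseline comes out as x, +, y, with 2 placed in a superscript region attached to x.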
Expanding the View… • Integration of scanned and pen-based expressions • Infty system, FFES prototype (impl. Josh Zimler 2006) • Long Term Goal: Flexible input and combination • Allow one to easily combine and then reformat/interpret • LaTeX, eqn, etc. • MATLAB, Mathematica, etc. • Handwritten expressions (tablet/mouse) • Scanned images of handwritten or typeset expressions • “Vector drawing” interface input, e.g. as in Xpress (Pollanen et al.)
Other Math Entry Interfaces • Natural Log by Matsakis, Miller, and Viola (MIT) • JIMHR: (Java-Based) Interactive Math Handwriting Recognizer, a merge and port of FFES/DRACULAE and the Natural Log system by Joy-Gong Ho (Acuitus Corp., USA) • JMathNotes by Ernesto Tapia Rodriguez (Free University of Berlin) • Infty by M. Suzuki et al. (Kyushu University, Japan) • MathJournal by XThink Inc: first commercial pen-based math recognition system • MathPad by Joseph LaViola • Links available: http://www.cs.rit.edu/~rlaz
Motivation: A high-level language for pattern recognition algorithms • Table Recognition Survey (Zanibbi et al. 2004) • Summarizes literature in terms of observations, transformations, and inferences. • Techniques studied are characterized as making the following types of inferences (decisions): • Parameter values (e.g. thresholds) • Interpretation Model Operations: • Segmentation (identifying regions of interest in data) • Classification (assigning types to regions) • Relating regions (e.g. topology (adjacencies)) • Rejecting segments, classes, and region relationships • (Unanswered) Question: • How should we combine recognition modules in a complex math entry system?
Example: Simple Table Structure Recognition Algorithm (Part 1) • model regions • Image Word Cell % default:’Region’ • Row Column • end regions • model relations • % default:’contains’ • adjacent_right adjacent_below • end relations • recognition parameters • sMaxRowSeparation 2 % millimetres • sMaxColumnSeparation 2 % millimetres • aResolution 300 % dpi; default • end parameters
Example: Simple Table Structure Recognition Algorithm (Part 2) • strategy main • adapt aResolution using getScanResolution() observing {Image} regions • classify {Word} regions as {Cell} • relate {Cell} regions with {adjacent_right} using defineRightAdjacency(sMaxRowSeparation,aResolution) • segment {Cell} regions into {Row} regions using relationClosure() observing {adjacent_right} relations • relate {Cell} regions with {adjacent_below} using defineLowerAdjacency(sMaxColumnSeparation,aResolution) • segment {Cell} regions into {Column} regions using relationClosure() observing {adjacent_below} relations • accept interpretations • end strategy • Slide callouts: External Decision Function; Observation Specification; Decision Type; Trivial Decision; Decision Function Parameters • Input: Params, Graph with Image, Word regions (BBs) • Output: Cells, Rows, Cols
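To make the "relate, then segment by closure" pattern in the strategy concrete, here is a small Python sketch of the row-building steps. It is not the RSL runtime or the external decision functions named on the slide: `define_right_adjacency` and `relation_closure` are hypothetical stand-ins, and the adjacency rule (horizontal gap within the separation threshold plus vertical overlap) is an assumption. The only detail taken from the slides is that the threshold is given in millimetres and converted using the scan resolution in dpi.

```python
from itertools import permutations

MM_PER_INCH = 25.4


def define_right_adjacency(cells, max_sep_mm, resolution_dpi):
    """Return (a, b) pairs where cell b lies just to the right of cell a.

    cells maps a cell name to a bounding box (left, top, right, bottom) in pixels.
    """
    max_gap_px = max_sep_mm / MM_PER_INCH * resolution_dpi
    edges = []
    for a, b in permutations(cells, 2):
        a_left, a_top, a_right, a_bottom = cells[a]
        b_left, b_top, b_right, b_bottom = cells[b]
        gap = b_left - a_right
        overlaps_vertically = min(a_bottom, b_bottom) > max(a_top, b_top)
        if 0 <= gap <= max_gap_px and overlaps_vertically:
            edges.append((a, b))
    return edges


def relation_closure(regions, edges):
    """Group regions into connected components of the relation (union-find)."""
    parent = {r: r for r in regions}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for r in regions:
        groups.setdefault(find(r), []).append(r)
    return list(groups.values())


# Four word-sized cells laid out as a 2 x 2 grid (pixel coordinates at 300 dpi).
cells = {
    "c1": (0, 0, 50, 20),
    "c2": (60, 0, 110, 20),
    "c3": (0, 40, 50, 60),
    "c4": (62, 42, 110, 60),
}
adjacent_right = define_right_adjacency(cells, max_sep_mm=2, resolution_dpi=300)
rows = relation_closure(list(cells), adjacent_right)
print(rows)
```

The column steps would follow the same shape with a below-adjacency test in place of the right-adjacency one.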
Running RSL Programs • Translate RSL Program to TXL (Using TXL) • Pass Input Graph (text file) to Program • Output (text files): • Accepted Structures (interpretations) • Log of all decisions and their outcomes
New Metrics Based on Hypothesis Histories: Historical Recall and Precision • [Figure: sets of hypotheses] • Generated Hypotheses: (A ∪ R) • Recognition Targets: Correct Hypotheses • False Negatives: (F)
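A minimal sketch of how such metrics could be computed from a decision log follows. The slide does not spell out the formulas, so the definitions here are assumptions: A is read as the set of accepted hypotheses and R as the set of rejected ones, conventional recall and precision are measured against A alone, and the historical versions are measured against everything generated (A ∪ R). The function name and the hypothesis labels are hypothetical.

```python
def historical_metrics(accepted, rejected, targets):
    """Compare conventional and historical recall/precision.

    accepted, rejected, targets are sets of hypothesis identifiers.
    """
    generated = accepted | rejected          # A ∪ R

    def ratio(numerator, denominator):
        return len(numerator) / len(denominator) if denominator else 0.0

    return {
        "recall": ratio(accepted & targets, targets),
        "precision": ratio(accepted & targets, accepted),
        "historical_recall": ratio(generated & targets, targets),
        "historical_precision": ratio(generated & targets, generated),
    }


# Toy example: four target cells; one correct cell was generated but later rejected.
targets = {"cell1", "cell2", "cell3", "cell4"}
accepted = {"cell1", "cell2", "spurious1"}
rejected = {"cell3", "spurious2"}

metrics = historical_metrics(accepted, rejected, targets)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed definitions, the gap between recall and historical recall shows how many correct hypotheses the system produced at some point and then discarded.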
Cell Detection Results (Handley, 2001): RSL Re-implementation on Table ‘a038’ (UW-III) • Inference times*: • 0: Input (words and lines) • 1: Classify words as cells • 16: Merge ‘horizontally close’ cells • 35: Merge cells sharing column, row assignments. Nearly 50% of correct cells rejected; new correct cells also detected • 47: Two cells merged producing column header ‘Total pore space (percent)’ • 51: Merge header cells bounded by two horizontal lines • 83: Merge cells sharing line and white space separators • *Inference times shown are those affecting cells
RSL and Math Entry • Proposal: “MIN” System • New interface for math entry and offline experiments • Use RSL to define recognition strategies, capture results. • (Really): testbed for studying recognition algorithms and their intelligent combination, organization, and deployment in practice. • Goals: • Compare different approaches to recognizing mathematical expressions (from input to output) represented in RSL • Allow flexible training, combination, and alteration of various recognition strategies. • Extend RSL to accommodate math and other problem domains more effectively, while remaining abstract
(Some) Relevant Journals and Conferences • Journals • IEEE Trans. Pattern Analysis and Machine Intelligence • Machine Learning • Pattern Recognition • Pattern Recognition Letters • Artificial Intelligence • Int’l J. Document Analysis and Recognition • … • Conferences • Int’l Conf. Machine Learning • IEEE Computer Vision and Pattern Recognition • Computational Learning Theory (COLT) • Int’l Conf. Document Analysis and Recognition • Int’l Work. Document Analysis Systems • …
Thank you. • Questions? • Support: GCCIS Department of Computer Science