Paper & Pencil Interface March 30, 2000 Artificial Intelligence Laboratory, Department of Computer Science, KAIST Jin Hyung Kim Jkim@cs.kaist.ac.kr http://ai.kaist.ac.kr/~jkim
Contents • Paper & Pen Interface • Pen Computers : Dream and Realization • Component Technologies for Pen Computing • Recognition Technology for Pen Computing • KAIST Approach • Killer Applications • Innovative Applications • KAIST Pen-based Arithmetic Tutor • Conclusion
"Paper and Pencil" Interface • The most natural interface • (Probably) between computers and humans • The VDT, keyboard, and mouse combination is limited • VDT syndrome : eyestrain, headaches, backaches, stiff necks, sore wrists • Mouse + keyboard vs. pen • Equation input • Drawing input • Large character-set input • Chinese characters, Hangul (?)
Pen Computer • Pen is the major input device • Optional keyboard • Various sizes and shapes • Aimed at vertical markets • Not a general-purpose device • Some are already on the market • Not a big success yet • No killer applications yet
Typical Specifications of a Pen Computer • Small and light enough to carry around • Mobile computing device, PDA • Writing, drawing, and menu selection with an (electronic) pen • Information exchange over wireless communication • Serves as a handy Internet terminal • Multimedia processing • Handwriting, graphics, images, voice • Inexpensive
Long History of Research • Dreams • DynaBook - Alan Kay • Knowledge Navigator - Apple Co. • Tablet - U. of Illinois Undergrads • Projects • Pattern Information Processing (Japan) • Electronic Paper (ESPRIT) • IBM, Sony, etc.
NeoPoint • Read and send email • Access Internet • Popular PDA programs • Contact • Schedule • To do • Sync data with PC • Wake-up-call
E-book : Rocket Book • Download book contents • Touch-screen based interaction • Make notes, annotations, bookmarks, and underlining • No handwriting yet • Too expensive • No other usage • Needs Internet access
Korea Electronics Technology Institute (KETI) e-book • Internet access and information search • Document authoring (?) • 8.2-inch touch screen • Weight: 1 kg • Battery life: 8 hours • Color screen four times the size of a PDA screen • Storage: semiconductor memory • Joint development with small and medium-sized companies
Pen Computer Component Technologies • Hardware technologies • Flat Panel Display • Pen/Digitizer • Microprocessor • Battery and Power Management • Storage Devices • Packaging • Software technologies • Wireless Communications • Operating System • Handwriting Recognition • Utilities and Accessories • Applications
Recognition Problems for Pen Computing • Recognition targets (online) • Menu selection • Characters (Hangul, English letters, Chinese characters, digits, special symbols) • Drawings, gestures • English recognition systems have appeared on the market • Limited capability • Printed style only : C+ • Cursive style : C- • An obstacle to active adoption • An active research topic in the pattern recognition community
Handwritten Characters • A sequence of writing units • Temporally ordered • (Mostly) left-to-right
Handwritten Character Recognition • Sources of difficulty • Static variability - personal style • Dynamic variability - shape deviation • Stroke connection - coarticulation effect • Problems to solve for free writing • Variability modeling • Simple model for high flexibility • Resolve coarticulation • Segmentation problem
Non-Roman Character Recognition • Strong demand for pen recognition in East Asian language regions • Japanese PDA products • Kana written in boxes • Carefully written Kanji • English letters written in boxes • Efforts in China • Efforts in Korea • Hangul recognition has reached a practical level • Several years of research at KAIST
Handwritten Hangul Styles • Printed : all alphabets separated, one syllable in a box • Cursive : ligatures within a syllable, one syllable in a box • Cursive : ligatures within a syllable, syllables may overlap spatially • Cursive : ligatures across syllables
Approaches for Recognizer Development • Knowledge-based Approach • Structural / Feature-based • Heuristics / Fuzzy • Encoding of expert knowledge • Data-driven Approach • Decision-Theoretic • Artificial Neural Network • Hidden Markov Model • Training procedure
Recognizer vs. Recognizer Generator • "Give a man a fish, and he'll eat for a day. Teach him to fish, and he'll eat for a lifetime." - Laotse
KAIST Research on Online Handwritten Character Recognition • Hangul recognizer (2 Korean patents) • Unconstrained writing style, continuous writing on blank paper • Recognition rate of 95+% • Unconstrained cursive English words (US Patent) • Boxed, run-on, and unconstrained cursive words • Recognition rate of about 88% • Chinese character recognizer • Gesture recognizer • Application research • Diagram editor • Arithmetic Tutoring System, etc.
KAIST's HMM-Based Methodology for Handwritten Character Recognition • Variability is modeled with HMMs • Alphabet as character model • Stroke connection as ligature model • as a separate entity • A handwritten word is viewed as an alternating sequence of character models and ligature models • Network of HMMs • Knowledge of the language is utilized • Hierarchy of networks • component - character - word - sentence
Hidden Markov Model • Stochastic model of a process with uncertain and incomplete information • Doubly stochastic process • Transition parameters model temporal variability • Output distributions model spatial variability • Efficient and good modeling tool for • sequences with temporal constraints • spatial variability along the sequence • real-world complex processes • Highly successful for speech recognition
Why HMM for Handwritten Character and Ligature? • HMM is a doubly stochastic model • Spatial variation and temporal variation • Well-developed search algorithm • Viterbi algorithm • Well-developed training algorithm • Baum-Welch algorithm • A model is represented by either a path of the HMM or the set of all paths of the HMM
Three Problems of HMM • What is the probability of generating an observation sequence? • Calculating the model-input matching score, i.e., the likelihood, for a given HMM $\lambda$: $P(X = x_1, x_2, \ldots, x_T \mid \lambda) = ?$ • What is the most probable transition sequence? $Q^* = \arg\max_Q P(Q, X \mid \lambda)$ • How do we estimate or optimize the parameters in an HMM? • Training problem
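The first two problems can be sketched for a small discrete HMM as follows. This is a minimal illustration, not the KAIST implementation: the function names, the parameter layout (`pi` for initial probabilities, `A` for transitions, `B` for output distributions), and the toy numbers are all assumptions. The forward algorithm answers the likelihood question and the Viterbi algorithm answers the most-probable-path question; the training problem is not covered here.

```python
def forward_likelihood(obs, pi, A, B):
    """Problem 1: P(X | model), summed over all state paths (forward algorithm)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for x in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][x]
                 for j in range(n)]
    return sum(alpha)


def viterbi(obs, pi, A, B):
    """Problem 2: most probable state path Q* and its joint probability P(Q*, X | model)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptrs = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            # Best predecessor state for arriving in state j at this time step.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][x])
        delta = new_delta
        backptrs.append(ptr)
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for ptr in reversed(backptrs):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Hypothetical two-state left-to-right model over two output symbols.
pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood([0, 1, 1], pi, A, B)
best_path, best_prob = viterbi([0, 1, 1], pi, A, B)
```

The Viterbi score can never exceed the forward likelihood, because it keeps only the single best path where the forward algorithm sums over all of them.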
Model of Handwritten Word • View Handwritten word as alternating sequence of characters and ligatures • Handwritten words are modeled as Network of character and ligature models • Ligature as separate entity
HMM for Character and Ligature • Characters and ligatures are viewed as sequences of chain codes • Each character and each ligature is modeled as a Hidden Markov Model • Each model produces a probability as its score • [Figure: character model and ligature model]
Character Model • Character • atomic units of handwriting • consistency in shape • small number of models • Character Model • HMM-based • model variability in time (length) and shape • simple left-to-right model • small number of states ( < 10)
Ligature Model • Ligature • Between-letter stroke pattern • pen-up or pen-down dragging • linear or slightly curved • Ligature Model • HMM-based • represent connecting pattern and variation • simple model structure (1 - 3 states)
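The "simple left-to-right" structure of the character and ligature models can be sketched as a parameter initializer. This is only an illustration: the function name, the self-loop probability `stay`, the uniform output distribution, and the choice of 33 output symbols (matching chain codes 0 to 32) are assumptions, and the final state here has no exit transition because the model stands alone rather than sitting inside a network.

```python
def left_to_right_hmm(n_states, n_symbols=33, stay=0.5):
    """Build initial parameters for a left-to-right HMM.

    Each state either repeats itself (probability `stay`) or advances to the
    next state; it can never move backwards. The last state only loops.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
        else:
            A[i][i] = 1.0
    pi = [1.0] + [0.0] * (n_states - 1)      # always start in the first state
    B = [[1.0 / n_symbols] * n_symbols        # uniform outputs before training
         for _ in range(n_states)]
    return pi, A, B


# Hypothetical sizes: a 6-state character model and a 2-state ligature model.
char_pi, char_A, char_B = left_to_right_hmm(6)
lig_pi, lig_A, lig_B = left_to_right_hmm(2)
```

The self-loops are what absorb variation in length: a slowly written stroke simply stays in a state for more chain codes.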
Input Data Encoding • 16 directional chain coding • 0 ~ 15 for pen-down movement • 16 ~ 31 for pen-up movement • 32 for small dot • [Figure: direction compass showing pen-down codes 0 ~ 15 and pen-up codes 16 ~ 31] • Example encoding: (11, 11, 13, 15, 2, 22, 22, 22, 1, 1, 1), i.e., a pen-down segment, a pen-up segment, then a pen-down segment
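A minimal sketch of this encoding is shown below. Several details are assumptions not stated on the slide: code 0 is taken to point along the positive x axis with codes increasing counter-clockwise (y pointing up), a one-point stroke is treated as the "small dot", and no resampling or smoothing of the pen trajectory is done. The function names are hypothetical.

```python
import math

N_DIRECTIONS = 16    # codes 0..15: pen-down directions
PEN_UP_OFFSET = 16   # codes 16..31: the same directions with the pen lifted
DOT_CODE = 32        # a stroke too small to have a direction


def direction_code(dx, dy, pen_down=True):
    """Quantize a movement vector into one of 16 direction sectors."""
    sector = 2 * math.pi / N_DIRECTIONS
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half a sector so that each code is centered on its direction.
    code = int((angle + sector / 2) // sector) % N_DIRECTIONS
    return code if pen_down else code + PEN_UP_OFFSET


def encode_strokes(strokes):
    """Encode a list of strokes (each a list of (x, y) points) as chain codes."""
    codes = []
    prev_end = None
    for stroke in strokes:
        if not stroke:
            continue
        if prev_end is not None and stroke[0] != prev_end:
            # Pen-up movement from the end of the previous stroke.
            codes.append(direction_code(stroke[0][0] - prev_end[0],
                                        stroke[0][1] - prev_end[1],
                                        pen_down=False))
        if len(stroke) == 1:
            codes.append(DOT_CODE)
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            if (x0, y0) != (x1, y1):
                codes.append(direction_code(x1 - x0, y1 - y0))
        prev_end = stroke[-1]
    return codes


# Two strokes: a horizontal line, then a short downward line drawn above its end.
codes = encode_strokes([[(0, 0), (1, 0), (2, 0)], [(2, 2), (2, 1)]])
```

Keeping pen-up movement in the same code stream is what lets a ligature model treat pen-up and pen-down connections uniformly.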
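Hidden Markov Modelling • We are searching for the maximum-probability model $M^*$ such that $P(M^* \mid X) = \max_M P(M \mid X)$ • By Bayes' rule, $P(M \mid X) = P(X \mid M)\,P(M) / P(X)$, and $P(X)$ does not depend on $M$, so $M^* = \arg\max_M P(X \mid M)\,P(M)$ • The HMM produces $P(X \mid M)$, which is interpreted as the degree of matching between model $M$ and the given data sequence $X$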
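The selection rule can be illustrated in a few lines, working in the log domain to avoid underflow on long sequences. The function name, the model labels, and the numbers are hypothetical; in practice each log-likelihood would come from scoring the input against one HMM.

```python
import math


def map_model(log_likelihoods, priors):
    """Return the model M maximizing log P(X | M) + log P(M)."""
    return max(log_likelihoods,
               key=lambda m: log_likelihoods[m] + math.log(priors[m]))


# Made-up scores for two candidate models of the same input X.
log_likelihoods = {"a": -12.0, "o": -11.5}

# With uniform priors the better-matching model wins.
uniform_choice = map_model(log_likelihoods, {"a": 0.5, "o": 0.5})

# A strong enough prior can overturn a small difference in likelihood.
prior_choice = map_model(log_likelihoods, {"a": 0.7, "o": 0.3})
```

The prior term $P(M)$ is where knowledge about the language, such as letter or word frequencies, can enter the decision.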
Network Approach for Word Recognition • One HMM for each word is unmanageable • HMMs are interconnected • represents word construction rules from characters and ligatures (English) • represents syllable construction rules from phonetic symbols and ligatures (Hangul) • Node represents start and termination of chain code sequence of the character • Arc represents character HMM
English Word Network • Writing sequence assumed • left-to-right, in the order of appearance • Delayed strokes are allowed • Circular network • Initial node for the start of a word • Final node for the end of a word • Circular path through ligature arcs • Based on character HMMs • Special treatment for delayed strokes • Ligature grouping • based on starting and terminating location
Hangul Syllable Network • Writing sequence assumed • first consonant, vowel, last consonant if it exists • Layered network • Initial node for the start of a syllable • Final node for the end of a syllable • Based on symbol HMMs • Ligature grouping • based on starting and terminating location • Null transition if no last consonant exists
BongNet : Hangul Syllable Model • [Figure: layered network from the initial node to the final node: consonant, ligature, vowel, ligature, consonant]
Recognition Problem • A Path corresponds to a Hangul Syllable / English word • Complete sequence of states and arcs from initial node to final node • Recognition • Finding the maximal probability path for given input chain code sequence • Yields optimal segmentation and character label, simultaneously
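The idea that one best-path search yields both the segmentation and the labels can be sketched with a heavily simplified network. This is not BongNet or the English word network: every character is reduced to a one-state model, there is a single shared ligature model under the reserved name "lig", and all names, probabilities, and the `stay` and `floor` parameters are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def decode(codes, char_models, lig_model, stay=0.6, floor=1e-6):
    """Viterbi search over alternating character and ligature models.

    char_models maps a label to {chain_code: probability}; lig_model is one
    such table. Returns (label, start, end) segments covering the input.
    """
    names = list(char_models) + ["lig"]
    emit = dict(char_models, lig=lig_model)
    n_chars = len(char_models)

    def log_emit(name, code):
        return math.log(emit[name].get(code, floor))

    def log_trans(src, dst):
        if src == dst:
            return math.log(stay)                    # remain in the same model
        if dst == "lig":
            return math.log(1 - stay)                # character -> ligature
        if src == "lig":
            return math.log((1 - stay) / n_chars)    # ligature -> any character
        return NEG_INF                               # character -> character is not allowed

    # A path must start inside a character model.
    score = {s: NEG_INF if s == "lig"
             else math.log(1 / n_chars) + log_emit(s, codes[0]) for s in names}
    back = []
    for code in codes[1:]:
        new_score, ptr = {}, {}
        for dst in names:
            src = max(names, key=lambda s: score[s] + log_trans(s, dst))
            ptr[dst] = src
            new_score[dst] = score[src] + log_trans(src, dst) + log_emit(dst, code)
        score = new_score
        back.append(ptr)

    # A path must also end inside a character model.
    state = max((s for s in names if s != "lig"), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # Collapse runs of the same model into segments: boundaries and labels together.
    segments = []
    for t, name in enumerate(path):
        if segments and segments[-1][0] == name:
            segments[-1] = (name, segments[-1][1], t + 1)
        else:
            segments.append((name, t, t + 1))
    return segments


# Hypothetical models: "A" is mostly downward strokes, "B" mostly rightward ones,
# and the ligature is any pen-up movement.
chars = {"A": {12: 0.9, 11: 0.05, 13: 0.05}, "B": {0: 0.9, 1: 0.05, 15: 0.05}}
lig = {c: 1 / 16 for c in range(16, 32)}
segments = decode([12, 12, 12, 20, 20, 0, 0], chars, lig)
```

No separate segmentation step appears anywhere in the sketch: the segment boundaries fall out of the backtrace of the best path.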
Model Training Procedure • 1) Collect handwriting samples with correct labels • 2) Preprocessing and encoding • 3) Manual segmentation • 4) Collection of each character and ligature • Ligature grouping to reduce the number of models • 5) Estimating model parameters • Baum-Welch algorithm
Unconstrained English Word Recognition (Segmentation Result)
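Step 5 can be sketched as one Baum-Welch re-estimation pass over the chain-code samples collected for a single character or ligature. This is a textbook version under stated assumptions, not the authors' code: probabilities are left unscaled, so it only suits short sequences, every sample is assumed to have non-zero probability under the current model, and the names and toy data are hypothetical.

```python
import math


def forward(obs, pi, A, B):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for x in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][x]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    n = len(A)
    beta = [[1.0] * n]
    for x in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][x] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def log_likelihood(sequences, pi, A, B):
    return sum(math.log(sum(forward(obs, pi, A, B)[-1])) for obs in sequences)


def baum_welch_step(sequences, pi, A, B):
    """One EM pass: accumulate expected counts, then renormalize."""
    n, m = len(pi), len(B[0])
    pi_num = [0.0] * n
    a_num = [[0.0] * n for _ in range(n)]
    a_den = [0.0] * n
    b_num = [[0.0] * m for _ in range(n)]
    b_den = [0.0] * n
    for obs in sequences:
        alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
        px = sum(alpha[-1])
        for t, x in enumerate(obs):
            for i in range(n):
                gamma = alpha[t][i] * beta[t][i] / px   # P(state i at time t | obs)
                if t == 0:
                    pi_num[i] += gamma
                b_num[i][x] += gamma
                b_den[i] += gamma
                if t + 1 < len(obs):
                    a_den[i] += gamma
                    for j in range(n):
                        a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                        * beta[t + 1][j] / px)
    new_pi = [v / len(sequences) for v in pi_num]
    new_A = [[a_num[i][j] / a_den[i] if a_den[i] > 0 else A[i][j]
              for j in range(n)] for i in range(n)]
    new_B = [[b_num[i][k] / b_den[i] if b_den[i] > 0 else B[i][k]
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B


# Toy samples over 3 symbols and a 2-state left-to-right starting point.
samples = [[0, 0, 1, 1], [0, 1, 1], [0, 0, 0, 1]]
pi, A, B = [1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[1 / 3] * 3 for _ in range(2)]
for _ in range(5):
    pi, A, B = baum_welch_step(samples, pi, A, B)
```

Transitions that start at zero stay at zero under re-estimation, so the left-to-right structure chosen at initialization is preserved throughout training.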
Advantages of Network Approach • Segmentation and character labels obtained simultaneously • No external segmentation needed • Segmentation is obtained from global view point • Hierarchy of HMM can be constructed • Handwritten sentences with word model and interword gap model • Smooth Integration with postprocessing / language models • Framework for unified recognizer of multiple languages