Table Extraction Using MaxEnt

Table Extraction Using MaxEnt Zonghui Lian

Introduction • Table extraction • Table format

Problem • HTML tables • Tags help us understand their structure • What about plain-text tables?

An Example • Line labels in order: title, title, title, separator, header, header, header, header, datarow, datarow, datarow, datarow, datarow, datarow

MaxEnt • How to define features • How to learn model weights
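The slide does not spell out the model, so the following is the standard conditional maximum-entropy form, given here as background rather than as the author's exact formulation. Here $x$ is assumed to be one line of text, $y$ its label (title, separator, header, datarow, ...), $f_i$ the feature functions, and $\lambda_i$ the weights to be learned.

```latex
% Conditional MaxEnt model: probability of label y for line x
p_\lambda(y \mid x) = \frac{\exp\left(\sum_i \lambda_i f_i(x, y)\right)}{Z_\lambda(x)},
\qquad
Z_\lambda(x) = \sum_{y'} \exp\left(\sum_i \lambda_i f_i(x, y')\right)

% Weights are learned by maximizing the conditional log-likelihood
% of the N labeled training lines (x_n, y_n)
L(\lambda) = \sum_{n=1}^{N} \log p_\lambda(y_n \mid x_n)

% Its gradient is the empirical feature count minus the expected feature count
\frac{\partial L}{\partial \lambda_i}
  = \sum_{n=1}^{N} f_i(x_n, y_n)
  - \sum_{n=1}^{N} \sum_{y'} p_\lambda(y' \mid x_n)\, f_i(x_n, y')
```

At the optimum the two terms of the gradient are equal, which is the constraint that gives the maximum-entropy model its name: expected feature counts under the model match those observed in the training data.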

Data Set • CS Dept., University of Massachusetts Amherst (FedStats.gov) • Training data: 9321 • Test data: 1200 • Format

Features • White space: large gaps / small gaps, four-space indents, space percentage • Text features: digit percentage, month and year

Features • Special characters -, +, =, :, |, .
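The whitespace, text, and special-character features listed on these two slides might be computed per line roughly as follows. This is a minimal sketch, not the author's implementation: the function name `line_features`, the `LARGE_GAP` threshold of four spaces, the definition of a "small gap" as two or three spaces, and the year pattern (1900 to 2099) are all assumptions made for illustration.

```python
import re

# Hypothetical threshold: runs of at least this many spaces count as a "large gap".
LARGE_GAP = 4
# Special characters taken from the slide: -, +, =, :, |, .
SPECIAL_CHARS = set("-+=:|.")
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
MONTH_ABBR = {name[:3] for name in MONTHS}


def line_features(line: str) -> dict:
    """Compute illustrative features for one line of a plain-text table."""
    text = line.rstrip("\n")
    n = len(text)

    # Interior runs of two or more spaces (leading indent is stripped first).
    gaps = [len(m.group()) for m in re.finditer(r" {2,}", text.strip())]
    # Alphabetic tokens, lower-cased, for the month lookup.
    words = re.findall(r"[a-z]+", text.lower())

    return {
        "large_gaps": sum(1 for g in gaps if g >= LARGE_GAP),
        "small_gaps": sum(1 for g in gaps if g < LARGE_GAP),
        "four_space_indent": text.startswith("    "),
        "space_pct": text.count(" ") / n if n else 0.0,
        "digit_pct": sum(c.isdigit() for c in text) / n if n else 0.0,
        "has_month": any(w in MONTHS or w in MONTH_ABBR for w in words),
        "has_year": bool(re.search(r"\b(?:19|20)\d{2}\b", text)),
        "special_pct": sum(c in SPECIAL_CHARS for c in text) / n if n else 0.0,
    }
```

A MaxEnt classifier typically expects binary or bucketed features, so the percentages above would likely be thresholded (for example, "digit percentage above 0.5") before training; the slides do not say which cut-offs were used.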

Result

Error Analysis • Most errors happened when recognizing … • TABLEFOOTNOTE: 0.2719665271966527 • DATAROW: 0.12552301255230125 • TABLEHEADER: 0.11715481171548117 • TABLEFOOTNOTE -> NONTABLE DATAROW • DATAROW -> SECTIONDATAROW • TABLEHEADER -> SUPERHEADER • Examples: TABLEFOOTNOTE 1 Includes Hawaii. • TABLEFOOTNOTE 2 Includes processing total for dual usage crops.
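Per-label figures and confusion pairs like those on this slide can be tallied from gold and predicted labels. The sketch below is one plausible way to do it; the slide does not say how its numbers were computed, so treating each figure as "misclassified lines of a label divided by all gold lines of that label" is an assumption, and `confusion_summary` is a hypothetical helper name.

```python
from collections import Counter


def confusion_summary(gold, predicted):
    """Return (per-label error rate, counts of (gold, predicted) confusions).

    Assumption: error rate for a label = misclassified lines with that gold
    label / all lines with that gold label.
    """
    totals = Counter(gold)
    errors = Counter()
    confusions = Counter()
    for g, p in zip(gold, predicted):
        if g != p:
            errors[g] += 1
            confusions[(g, p)] += 1
    rates = {label: errors[label] / totals[label] for label in totals}
    return rates, confusions
```

Calling `confusions.most_common()` on the second return value lists the most frequent confusion pairs first, which is the kind of view the arrows on the slide (for example TABLEHEADER -> SUPERHEADER) summarize.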

Future Work • Improve performance • More features, for example: alphabetic characters, previous label, next label • Data set size

Future Work • Identify columns • Add tags • Use a table-understanding algorithm
