Automatic Extraction and Incorporation of Purpose Data into PurposeNet

P. Kiran Mayee, Rajeev Sangal, Soma Paul. SCONLI3, JNU, New Delhi.

INTRODUCTION: Purpose. Need for a knowledge base of objects and actions in which the knowledge is organized around purpose.

PurposeNet: PurposeNet is an intelligent knowledge-based system dealing with specialized attributes of artifacts, namely their purpose, the purpose of their types, components and accessories, as well as data about their birth, processes, side-effects, maintenance and result on destruction.

PurposeNet

Building the PurposeNet: Template designing; Revision and refinement of template; Selection of domain; Information retrieval from Web; Ontology population; Testing.

Need for Automation: Acquisition bottleneck; Massive availability of text; Availability of purpose cues.

Purpose data required: Artifact -- garage; Purpose: Action -- store, Upon -- vehicle.

Purpose Cues: Word(s); Lexical entities in a particular order. Classification: Sentences beginning with artifact name; Sentences ending with artifact name; Sentences containing artifact name; Hidden cues.

Sentences commencing with artifact name

Sentences ending with artifact name: We cut trees with an axe. (cut = action, trees = upon, axe = artifact)

Sentences containing artifact name: Use the air+pump to fill the tyre. Cue pattern: Use the <artifact> to <action> the <upon>
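A cue of this kind can be read as a template with three slots. The following is a minimal sketch, assuming the cue is encoded as a regular expression; the name USE_CUE, the function match_use_cue and the exact regex are illustrative assumptions and are not taken from the slides.

```python
import re

# Hypothetical regex encoding of the cue
# "Use the <artifact> to <action> the <upon>".
# The artifact slot allows '+' and '-' so that "air+pump" is one token.
USE_CUE = re.compile(
    r"^Use the (?P<artifact>[\w+-]+) to (?P<action>\w+) the (?P<upon>\w+)\.?$",
    re.IGNORECASE,
)

def match_use_cue(sentence):
    """Return (artifact, action, upon) if the sentence fits the cue, else None."""
    m = USE_CUE.match(sentence.strip())
    if m is None:
        return None
    return m.group("artifact"), m.group("action"), m.group("upon")

print(match_use_cue("Use the air+pump to fill the tyre."))
# prints ('air+pump', 'fill', 'tyre')
```

A sentence that does not fit the template, such as "We cut trees with an axe.", returns None and would need a cue from another class.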

Methodology for purpose data extraction

Algorithm for Purpose Data Extraction
Algorithm PurpDataExtract(corpus)
Step 1: Read the first sentence in the corpus.
Step 2: Loop until end-of-corpus:
  2a. If contains(sentence, artifact) and match(sentence, cuetable), then
      extract(sentence, artifact)
      extract(sentence, to_action)
      extract(sentence, to_upon)
      add_to_ontology(artifact, to_action, to_upon)
  2b. Otherwise, go to Step 3.
Step 3: Read the next sentence and continue the loop at Step 2.
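The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the cue table is assumed to be a list of regular expressions with named groups, only two cues from the slides are modelled, the ontology is a plain list of triples, and the names CUE_TABLE and purp_data_extract are hypothetical.

```python
import re

# Hypothetical cue table: each entry is a regex with the named groups
# artifact, action and upon. Only two example cues are modelled here.
CUE_TABLE = [
    re.compile(r"Use the (?P<artifact>[\w+-]+) to (?P<action>\w+) the (?P<upon>\w+)", re.I),
    re.compile(r"(?P<action>\w+) (?P<upon>\w+) with an? (?P<artifact>[\w+-]+)\.?$", re.I),
]

def purp_data_extract(corpus, artifacts):
    """Scan sentences and collect (artifact, action, upon) triples."""
    ontology = []                                # stand-in for add_to_ontology
    known = {a.lower() for a in artifacts}       # artifact lookup list
    for sentence in corpus:                      # Steps 1-3: read each sentence in turn
        for cue in CUE_TABLE:                    # match(sentence, cuetable)
            m = cue.search(sentence)
            # contains(sentence, artifact): the matched slot must be a known artifact
            if m and m.group("artifact").lower() in known:
                ontology.append((
                    m.group("artifact").lower(),
                    m.group("action").lower(),
                    m.group("upon").lower(),
                ))
                break                            # at most one triple per sentence
    return ontology

corpus = [
    "Use the air+pump to fill the tyre.",
    "We cut trees with an axe.",
    "The weather was pleasant.",
]
print(purp_data_extract(corpus, ["air+pump", "axe"]))
# prints [('air+pump', 'fill', 'tyre'), ('axe', 'cut', 'trees')]
```

Stopping at the first matching cue mirrors the one-to-one mapping between sentence and purpose that the slides later list as an issue.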

Data: Wikipedia: 249 files; WordNet: 81,837 descriptions; Princeton noun-artifact corpus: 82,115 sentences.

Observations – summary results

Purpose Data Extraction Misses

IE Metrics for Extraction
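The figures from this slide are not preserved in the transcript. As a reminder of how information-extraction output is commonly scored, the sketch below computes precision, recall and F-measure over extracted triples; that these were the metrics reported is an assumption, and the function name ie_metrics and the toy triples are purely illustrative, not results from the work.

```python
def ie_metrics(extracted, gold):
    """Precision, recall and F1 of extracted triples against a gold set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)              # triples that are both extracted and correct
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy data for illustration only (hypothetical, not from the slides).
extracted = [("axe", "cut", "trees"), ("garage", "park", "tree")]
gold = [("axe", "cut", "trees"), ("garage", "store", "vehicle")]
print(ie_metrics(extracted, gold))
# prints (0.5, 0.5, 0.5)
```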

Result BreakUp per Cue Class

Comparison with manually built Ontology: Exponential increase in speed; High error rate.

Issues: Redundancy; Primary purpose not always obtained; Pronouns and brand names; Correctness and consistency not guaranteed; One-to-one mapping assumed; Other sentence manifestations.

Further Enhancements: Parsed input; Cues for hidden case; Better artifact lookup list; Multipage lookup for consistency; Cloud computing; Automating other attributes of PurposeNet.

Conclusions: A methodology was proposed for automated ontology population of PurposeNet. The methodology was implemented on three corpora. The time taken for populating the PurposeNet 'purpose' ontology was a fraction of that taken by manual methods. The error rate was found to be high.

Thank You
