Automatic Extraction of Hierarchical Relations from Text Authors: Ting Wang, Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Ji Wang Presented by: Khalifeh Al-Jadda
Outline • Introduction • Motivation • Contribution • Experiments and Results • Conclusion • Discussion points
Introduction • What is Information Extraction (IE)? • IE is a process that takes unseen texts as input and produces fixed-format, unambiguous data as output. It involves processing text documents to identify selected information, such as particular named entities or the relations among them.
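To make "fixed-format, unambiguous data" concrete, here is a toy sketch. It is not the system discussed in these slides: it assumes a tiny hand-made gazetteer (`GAZETTEER`, hypothetical) and a hypothetical `EntityRecord` layout, and simply turns raw text into typed records with character offsets.

```python
from dataclasses import dataclass

# Hypothetical gazetteer (surface string -> entity type), for illustration only.
GAZETTEER = {"Atlanta": "GPE", "Georgia": "GPE"}


@dataclass(frozen=True)
class EntityRecord:
    text: str         # surface form as it appears in the input
    entity_type: str  # type label taken from the gazetteer
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention


def extract_entities(text: str) -> list[EntityRecord]:
    """Return one fixed-format record per gazetteer match in `text`."""
    records = []
    for name, etype in GAZETTEER.items():
        start = text.find(name)
        while start != -1:
            records.append(EntityRecord(name, etype, start, start + len(name)))
            start = text.find(name, start + 1)
    # Sort by position so that the output order is deterministic.
    return sorted(records, key=lambda r: r.start)


if __name__ == "__main__":
    for record in extract_entities("Atlanta has many cars."):
        print(record)
```

Real IE systems replace the gazetteer lookup with learned models, but the output contract (unstructured text in, uniform records out) is the same.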
Introduction • Most research has focused on using IE to populate ontologies with concept instances. • Examples: • Handschuh, S., Staab, S., Ciravegna, F.: S-CREAM: Semi-automatic CREAtion of Metadata, 2002. • Motta, E., Vargas-Vera, M., Domingue, J., Lanzoni, M., Stutt, A., Ciravegna, F.: MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup, 2002.
Motivation • Ontology-based applications cannot easily be adapted to work in different domains. • Several Machine Learning (ML) techniques have been used to overcome this problem. • ML techniques: • Hidden Markov Models (HMM). • Conditional Random Fields (CRF). • Maximum Entropy Models (MEM). • Support Vector Machines (SVM): the best of these.
Contribution • The paper proposes a new technique that applies SVM with new features to discover a relation between entities and then determine the type of that relation. • This technique can be applied to any domain. • The proposed technique uses the Automatic Content Extraction (ACE) program as its basis.
The Automatic Content Extraction (ACE) • A relation extraction program that performs Relation Detection and Characterization (RDC) according to a predefined entity type system. • ACE2004 introduced a Type and Subtype hierarchy for both entities and relations. • Entities are categorized in a two-level hierarchy consisting of 7 types and 44 subtypes.
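A two-level type/subtype hierarchy is easy to represent as a mapping from each type to its set of subtypes. The sketch below is illustrative: the seven type codes follow ACE2004 (PER, ORG, GPE, LOC, FAC, VEH, WEA), but the subtype sets are a small hand-picked sample rather than the full 44-subtype inventory, and the dotted `TYPE.Subtype` label encoding and the helper names are hypothetical. It shows how a fine-grained prediction can be mapped back to its top-level type, so results can be reported at either level of the hierarchy.

```python
# Illustrative two-level hierarchy: type -> set of subtypes.
# The type codes follow ACE2004; the subtype sets are a partial sample only.
ENTITY_HIERARCHY: dict[str, set[str]] = {
    "PER": set(),
    "ORG": set(),
    "GPE": {"Nation", "State-or-Province", "Population-Center"},
    "LOC": set(),
    "FAC": set(),
    "VEH": {"Air", "Land", "Water"},
    "WEA": set(),
}


def split_label(label: str) -> tuple[str, str]:
    """Split a hypothetical fine-grained label such as 'GPE.Population-Center'."""
    etype, _, subtype = label.partition(".")
    return etype, subtype


def is_consistent(label: str) -> bool:
    """True if the subtype belongs to the given type in the hierarchy."""
    etype, subtype = split_label(label)
    return etype in ENTITY_HIERARCHY and subtype in ENTITY_HIERARCHY[etype]


def coarse_type(label: str) -> str:
    """Map a fine-grained (type.subtype) label back to its top-level type."""
    if not is_consistent(label):
        raise ValueError(f"unknown or inconsistent label: {label}")
    return split_label(label)[0]


if __name__ == "__main__":
    print(coarse_type("GPE.Population-Center"))
```

The consistency check is what makes the hierarchy useful: a subtype prediction that does not belong to the predicted type can be rejected or corrected.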
Why SVM? • Although it is a binary classifier, it can easily be extended to a multi-class classifier using simple techniques such as one-against-all or one-against-one. • It is scalable, which means it can work with large and complex data sets. • It copes well with a huge number of features: unnecessary features tend to receive little weight, so they have little effect on the decision.
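The one-against-all extension can be sketched as follows. This is not the SVM implementation used in the paper: it is a minimal linear SVM trained with the Pegasos stochastic subgradient method on the L2-regularized hinge loss, wrapped in a one-against-all scheme that trains one binary classifier per class and predicts the class with the highest score. The regularization constant `lam`, the iteration count, and the toy 2-D data are assumptions chosen for illustration.

```python
import random


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(X, y, lam=0.01, iters=5000, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; a constant 1.0 is appended as a bias feature.
    y: list of labels in {+1, -1}.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(Xb))            # pick one training example
        eta = 1.0 / (lam * t)                 # Pegasos step size
        margin = y[i] * dot(w, Xb[i])         # evaluated at the current w
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinks w
        if margin < 1.0:                      # hinge loss active: move toward example
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class: that class (+1) against all others (-1)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary_svm(X, y, **kwargs)
    return models


def predict(models, x):
    """Return the class whose binary classifier gives the highest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda cls: dot(models[cls], xb))


# Toy, well-separated 2-D data for three classes (illustrative only).
X = [[3.0, 0.0], [3.5, 0.5], [3.5, -0.5],
     [-1.5, 2.6], [-2.0, 3.0], [-1.0, 3.0],
     [-1.5, -2.6], [-2.0, -3.0], [-1.0, -3.0]]
LABELS = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

if __name__ == "__main__":
    models = train_one_vs_rest(X, LABELS)
    print(predict(models, [3.0, 0.2]))
```

For K classes, one-against-all trains K binary classifiers; one-against-one would instead train K(K-1)/2 pairwise classifiers and combine them by voting.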
Features for relation extraction • The researchers used the General Architecture for Text Engineering (GATE) for feature extraction. • The following example sentence illustrates the different types of features: "Atlanta has many cars"
Cont. • Word Features: • 14 features, including: • The entity mentions (Atlanta, cars) • The two heads (two words before the entity and two after) • The word list between the two entities • POS Tag Features: part-of-speech tagging • Atlanta/NNP has/VBZ many/JJ cars/NNS • NNP: singular proper noun • VBZ: verb, third-person singular present • JJ: adjective • NNS: plural noun
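The word and POS features above can be computed from a tagged sentence once the two mention spans are known. This sketch assumes mentions are given as `(start, end)` token spans (end exclusive) and uses hypothetical feature names (`WM1`, `BM1`, `WB`, ...); it covers only a few of the 14 word features and does not reproduce the paper's exact feature set.

```python
def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def word_pos_features(tokens, m1, m2):
    """Word and POS features for mentions m1 and m2, given as (start, end)
    token spans with exclusive ends and m1 occurring before m2."""
    words = [w for w, _ in tokens]
    tags = [t for _, t in tokens]
    return {
        "WM1": words[m1[0]:m1[1]],              # words of the first mention
        "WM2": words[m2[0]:m2[1]],              # words of the second mention
        "BM1": words[max(0, m1[0] - 2):m1[0]],  # up to two words before M1
        "AM2": words[m2[1]:m2[1] + 2],          # up to two words after M2
        "WB": words[m1[1]:m2[0]],               # words between the mentions
        "POS_M1": tags[m1[0]:m1[1]],
        "POS_M2": tags[m2[0]:m2[1]],
        "POS_B": tags[m1[1]:m2[0]],             # tags between the mentions
    }


if __name__ == "__main__":
    tokens = parse_tagged("Atlanta/NNP has/VBZ many/JJ cars/NNS")
    print(word_pos_features(tokens, (0, 1), (3, 4)))
```

For the example sentence, the words between the mentions are `has many` and their tags are `VBZ JJ`; there are no words before "Atlanta" or after "cars".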
Cont. • Entity Features: ACE2004 classifies each entity into its proper type, subtype, and class. • Atlanta is a GPE (geo-political entity) • Mention Features: include • Mention type (Atlanta: NAM, cars: NOM, where NAM = name and NOM = nominal) • Role information (only for GPE) • Overlap Features: concern the relative positions of the two entity mentions • The number of words separating them. • The number of other entity mentions in between. • Whether one mention contains the other.
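The three overlap features listed above reduce to simple span arithmetic. A minimal sketch, assuming every mention is a `(start, end)` token span with an exclusive end; the function name, feature keys, and the extra example sentences ("Atlanta and Boston have many cars", "the mayor of Atlanta") are hypothetical.

```python
def overlap_features(m1, m2, all_mentions):
    """Position-based features for a pair of mentions.

    Each mention is a (start, end) token span with an exclusive end.
    """
    first, second = sorted([m1, m2])
    # Words strictly between the two spans (0 if they touch or overlap).
    gap = max(0, second[0] - first[1])
    # Other mentions lying entirely inside the gap.
    between = sum(
        1 for m in all_mentions
        if m not in (m1, m2) and m[0] >= first[1] and m[1] <= second[0]
    )
    # Does either span contain the other?
    contains = ((m1[0] <= m2[0] and m2[1] <= m1[1])
                or (m2[0] <= m1[0] and m1[1] <= m2[1]))
    return {
        "num_words_between": gap,
        "num_mentions_between": between,
        "one_contains_other": contains,
    }


if __name__ == "__main__":
    # "Atlanta has many cars": Atlanta = (0, 1), cars = (3, 4)
    print(overlap_features((0, 1), (3, 4), [(0, 1), (3, 4)]))
```

For "Atlanta has many cars" this gives two separating words, no mentions in between, and no containment; for a nested pair such as "the mayor of Atlanta" (0, 4) and "Atlanta" (3, 4), containment is true.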
Cont. • Chunk Features: GATE integrates two chunk parsers: • Noun phrase (NP) chunker (Atlanta, cars). • Verb phrase (VP) chunker (has). • Dependency Features: determine the dependency relationships between the words of a sentence. • Parse Tree Features: syntactic-level features are extracted from the parse tree. The BuChart parser was used in this research.
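Dependency features are often derived from the path between the two mentions in the dependency tree. The sketch below assumes a hand-written parse of the example sentence, encoded as a head index per token (`-1` for the root) with Stanford-style labels (`nsubj`, `dobj`, `amod`); this is not BuChart output, and the paper's exact dependency features may differ. The path climbs from the first word to the lowest common ancestor and then descends to the second word.

```python
# Hand-written (assumed) dependency parse of "Atlanta has many cars".
WORDS = ["Atlanta", "has", "many", "cars"]
HEADS = [1, -1, 3, 1]                       # index of each token's head; -1 = root
LABELS = ["nsubj", "root", "amod", "dobj"]  # relation of each token to its head


def ancestors(heads, i):
    """Return [i, head(i), head(head(i)), ...] up to the root."""
    chain = [i]
    while heads[chain[-1]] != -1:
        chain.append(heads[chain[-1]])
    return chain


def dependency_path(words, heads, labels, i, j):
    """Path from token i up to the lowest common ancestor and down to token j."""
    up = ancestors(heads, i)
    down = ancestors(heads, j)
    lca = next(node for node in up if node in down)  # lowest common ancestor
    parts = []
    for node in up[:up.index(lca)]:                  # climbing from i
        parts += [words[node], f"<-{labels[node]}-"]
    parts.append(words[lca])
    for node in reversed(down[:down.index(lca)]):    # descending towards j
        parts += [f"-{labels[node]}->", words[node]]
    return " ".join(parts)


if __name__ == "__main__":
    print(dependency_path(WORDS, HEADS, LABELS, 0, 3))
```

Here `<-nsubj-` reads "is the nsubj dependent of", so the path between the two mentions is `Atlanta <-nsubj- has -dobj-> cars`.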
Cont. • Semantic Features from SQLF: BuChart provides semantic analysis to produce SQLF for each phrasal constituent. • Semantic Features from WordNet: • Synset-id list of the two entity mentions. • Synset-ids of the heads (two words before and two words after)
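Before an SVM can use symbolic features such as words or WordNet synset ids, each `name=value` string has to be mapped to a dimension of a sparse binary vector. A minimal sketch, assuming a hypothetical `FeatureIndex` class and `to_feature_strings` helper; the synset labels are hand-written placeholders in NLTK-style naming, not looked up in WordNet or taken from the paper.

```python
def to_feature_strings(feats):
    """Flatten {'WB': ['has', 'many']} into ['WB=has', 'WB=many'] (sorted by name)."""
    return [f"{name}={value}"
            for name, values in sorted(feats.items())
            for value in values]


class FeatureIndex:
    """Assigns a stable integer id to every feature string seen in training."""

    def __init__(self):
        self.ids = {}

    def vectorize(self, feature_strings, grow=True):
        """Return the sorted ids of active features (a sparse binary vector).

        With grow=False (e.g. at test time), unseen features are ignored.
        """
        active = set()
        for f in feature_strings:
            if f not in self.ids:
                if not grow:
                    continue
                self.ids[f] = len(self.ids)
            active.add(self.ids[f])
        return sorted(active)


if __name__ == "__main__":
    index = FeatureIndex()
    train = to_feature_strings({"SYN_M1": ["atlanta.n.01"],
                                "SYN_M2": ["car.n.01"],
                                "WB": ["has", "many"]})
    print(index.vectorize(train))
```

Freezing the index at test time (`grow=False`) keeps the vector dimensions identical to those the SVM was trained on.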
Experiments and Results • The following measures are used to assess classification accuracy: • Precision • Recall • F-measure
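Precision is the fraction of predicted relations that are correct, recall is the fraction of gold relations that were found, and the F-measure combines the two as (1 + β²)PR / (β²P + R), with β = 1 giving the usual F1. The sketch assumes a relation counts as correct only if both mentions and the relation label exactly match a gold instance; the tuples and labels (`TYPE_A`, `TYPE_B`) are hypothetical, not results from the paper.

```python
def precision_recall_f(gold, predicted, beta=1.0):
    """Exact-match precision, recall and F-measure over sets of relation tuples."""
    tp = len(gold & predicted)                    # correctly predicted relations
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical relation instances: (mention 1 id, mention 2 id, relation label).
GOLD = {("m1", "m2", "TYPE_A"), ("m3", "m4", "TYPE_B"), ("m5", "m6", "TYPE_A")}
PREDICTED = {("m1", "m2", "TYPE_A"), ("m3", "m4", "TYPE_A")}

if __name__ == "__main__":
    print(precision_recall_f(GOLD, PREDICTED))
```

With one of the two predictions correct and one of the three gold relations found, P = 0.5, R ≈ 0.33 and F1 = 0.4. Note that the pair (m3, m4) is detected but given the wrong type, so under this exact-match convention it counts against both precision and recall.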
Conclusion • This research investigated SVM-based classification for relation extraction and explored a diverse set of NLP features. • The research introduces several new features, including: • POS tags, entity subtype, entity mention role, etc. • The experiments show that these features make an important contribution to performance improvements.
Discussion points • Is this technique suitable for automating ontology building? • Are you for or against using a huge number of features (94 in our case) to represent a relation? • How many of you think this is an applicable and useful technique for relation extraction? • Why or why not?