BLUE (Boeing Language Understanding Engine) - A Quick Tutorial on How it Works
Working Note 35, 2009
Peter Clark, Phil Harrison (Boeing Phantom Works)
BLUE
Each paragraph is broken up into sentences, then each sentence is processed in turn.
For each sentence, BLUE applies a pipelined architecture with 10 transformation steps:
1. Preprocessing
2. Parsing
3. Syntactic logic generation
4. Reference resolution
5. Transforming verbs to relations
6. Word Sense Disambiguation
7. Semantic Role Labeling
8. Metonymy resolution
9. Question Annotation
10. Additional Processing
Example: “An object is thrown from a cliff.”
isa(object01,Object), isa(cliff01,Cliff), isa(throw01,Throw),
object(throw01,object01), origin(throw01,cliff01).
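The pipelined architecture can be pictured as a list of functions applied in order to each sentence. The following is only a minimal sketch: the names `split_sentences`, `run_pipeline`, and `process_paragraph` are hypothetical, the sentence splitter is a naive assumption, and the two toy steps stand in for BLUE's ten real steps.

```python
import re
from typing import Any, Callable, List

# A step takes the output of the previous step and returns its own output.
Step = Callable[[Any], Any]


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '?' or '!' plus whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", paragraph.strip())
    return [p for p in parts if p]


def run_pipeline(sentence: str, steps: List[Step]) -> Any:
    """Feed one sentence through every step in order."""
    data: Any = sentence
    for step in steps:
        data = step(data)
    return data


def process_paragraph(paragraph: str, steps: List[Step]) -> List[Any]:
    """Process each sentence of the paragraph in turn."""
    return [run_pipeline(s, steps) for s in split_sentences(paragraph)]


if __name__ == "__main__":
    # Toy stand-ins for the real steps (preprocessing, parsing, ...).
    toy_steps: List[Step] = [str.lower, str.split]
    print(process_paragraph("A ball fell from a cliff. The ball weighs 10 N.", toy_steps))
```

Keeping each step as an independent function means a step can be swapped or tested in isolation, which is the main appeal of a pipeline design.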
1. Preprocessing
• Replace math symbols with words
  • +, -, /, *, = become “plus”, “minus”, “divided by”, “times”, “is”
• Remove non-ASCII characters
• Replace chemical formulae with a dummy noun
  “NaCl is a chemical” → “formula1 is a chemical”
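The three preprocessing rules above can be sketched with a few regular expressions. This is an illustration under stated assumptions, not BLUE's code: the function name `preprocess` and the formula pattern are hypothetical, and the pattern is deliberately crude (it would also match all-caps acronyms such as "DNA", and the symbol rule would rewrite hyphens inside ordinary words).

```python
import re
from typing import Dict, Tuple

# Symbol-to-word table taken from the slide.
SYMBOL_WORDS = {"+": "plus", "-": "minus", "/": "divided by", "*": "times", "=": "is"}

# Assumed formula shape: two or more element-like tokens, each a capital
# letter, an optional lowercase letter, and optional digits (NaCl, H2O).
FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")


def preprocess(sentence: str) -> Tuple[str, Dict[str, str]]:
    """Return the cleaned sentence and the formula-to-dummy-noun table."""
    # 1. Remove non-ASCII characters.
    text = sentence.encode("ascii", "ignore").decode("ascii")

    # 2. Replace each distinct formula with formula1, formula2, ...
    formulas: Dict[str, str] = {}

    def dummy_noun(match: "re.Match[str]") -> str:
        found = match.group(0)
        if found not in formulas:
            formulas[found] = f"formula{len(formulas) + 1}"
        return formulas[found]

    text = FORMULA.sub(dummy_noun, text)

    # 3. Replace math symbols with words, padded with spaces.
    text = re.sub(r"\s*([+\-/*=])\s*",
                  lambda m: f" {SYMBOL_WORDS[m.group(1)]} ", text)

    # Collapse any doubled spaces introduced above.
    return re.sub(r"\s+", " ", text).strip(), formulas


if __name__ == "__main__":
    print(preprocess("NaCl is a chemical"))
    print(preprocess("x = 2 + 3"))
```

Returning the formula table alongside the text lets a later step recover which formula each dummy noun stands for.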
2. Parsing
“An object is thrown from a cliff”

                     *S:-16*
      +-----------------+-------+
    NP:-1                    VP:-12
  +---+---+      +--------------+--+
DET:0   N^:0   AUX:0             VP:-8
  |       |      |       +---------+---+
 AN      N:0    IS     VP:0         *PP:-4*
          |              |      +------+--+
       OBJECT           V:0    P:0      NP:-3
                         |      |      +--+---+
                      THROWN  FROM  DET:-2  N^:0
                                       |      |
                                       A     N:0
                                              |
                                            CLIFF
3. Syntactic Logic Generation
• Produce initial “syntactic logic”
  • Nouns, verbs, adjectives, adverbs become objects
  • Prepositions, verb-argument positions become relations
Input: the parse tree of “An object is thrown from a cliff” from step 2.
Output:
throw01:
  input-word(throw01, [“throw”,v])
  subject(throw01,object01)
  “from”(throw01,cliff01)
object01:
  input-word(object01, [“object”,n])
  determiner(object01, “an”)
cliff01:
  input-word(cliff01, [“cliff”,n])
  determiner(cliff01, “a”)
4. Reference resolution
• Reference: ties sentences together
• BLUE accumulates logic for each sentence in turn
• “The red ball”
  • search for a previous object which is a red ball
  • If > 1, warn user and pick the most recent
  • If 0, assume a new object
• “The second red ball” → take 2nd matching object
Example: “A ball fell from a cliff. The ball weighs 10 N.”
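The lookup rules for definite references can be sketched as a search over the objects accumulated so far. The `Entity` and `Discourse` classes below are hypothetical, and matching on a concept name plus a set of property words is a simplification. The slide does not say what happens when an ordinal such as "second" has too few matches; creating a new object in that case is an assumption.

```python
import warnings
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass
class Entity:
    name: str                              # e.g. "ball01"
    concept: str                           # e.g. "ball"
    properties: FrozenSet[str] = frozenset()


class Discourse:
    """Objects mentioned so far, in order of mention."""

    def __init__(self) -> None:
        self.entities: List[Entity] = []
        self._counts: Dict[str, int] = {}

    def new_entity(self, concept: str, properties: Iterable[str] = ()) -> Entity:
        n = self._counts.get(concept, 0) + 1
        self._counts[concept] = n
        entity = Entity(f"{concept}{n:02d}", concept, frozenset(properties))
        self.entities.append(entity)
        return entity

    def resolve_definite(self, concept: str, properties: Iterable[str] = (),
                         ordinal: Optional[int] = None) -> Entity:
        wanted = frozenset(properties)
        matches = [e for e in self.entities
                   if e.concept == concept and wanted <= e.properties]
        if ordinal is not None:
            # "The second red ball": take the 2nd matching object.
            if len(matches) >= ordinal:
                return matches[ordinal - 1]
            # Assumption: too few matches is treated like no match.
            return self.new_entity(concept, wanted)
        if not matches:
            # 0 matches: assume a new object.
            return self.new_entity(concept, wanted)
        if len(matches) > 1:
            # > 1 match: warn the user and pick the most recent.
            warnings.warn(f"ambiguous reference to '{concept}'; using most recent")
        return matches[-1]


if __name__ == "__main__":
    d = Discourse()
    d.new_entity("ball")                   # "A ball fell from a cliff."
    print(d.resolve_definite("ball"))      # "The ball weighs 10 N."
```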
5. Transforming verbs to relations
• Simple case: syntactic structure = semantic structure
• But more likely: they differ
• IF a semantic relation appears as a verb
  • use the verb’s subject and sobject as args of the relation
Before:
  ;;; "A cell contains a nucleus"
  subject(contain01,cell01)
  sobject(contain01,nucleus01)
  input-word(contain01, ["contain",v])
After:
  ;;; "A cell contains a nucleus"
  encloses(cell01,nucleus01)
• Special cases:
  • verb’s subject and prepositional object are the args of the relation
    “The explosion resulted in a fire” → causes(explosion01,fire01)
  • “be” and “have” map to an underspecified relation
    “The cell has a nucleus” → “have”(cell01,nucleus01)
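The verb-to-relation rewrite can be sketched as a table lookup over the syntactic triples. The representation below (triples as `(relation, head, filler)` tuples, prepositions stored under their own word) and the function name `verb_to_relation` are assumptions for illustration; the three tables contain only the examples from the slide.

```python
from typing import Any, List, Tuple

Triple = Tuple[str, str, Any]   # (relation, head, filler)

# Tables holding only the slide's examples; real tables would be larger.
VERB_RELATIONS = {"contain": "encloses"}            # uses subject + sobject
VERB_PREP_RELATIONS = {("result", "in"): "causes"}  # uses subject + prep. object
UNDERSPECIFIED_VERBS = {"be", "have"}               # kept as an underspecified relation


def verb_to_relation(triples: List[Triple], verb_id: str) -> List[Triple]:
    """Replace the triples headed by verb_id with one relation, if a rule applies."""
    args = {rel: filler for (rel, head, filler) in triples if head == verb_id}
    others = [t for t in triples if t[1] != verb_id]
    lemma = args.get("input-word", [None])[0]
    subject = args.get("subject")
    sobject = args.get("sobject")

    if subject is not None and sobject is not None:
        if lemma in UNDERSPECIFIED_VERBS:
            return others + [(lemma, subject, sobject)]
        if lemma in VERB_RELATIONS:
            return others + [(VERB_RELATIONS[lemma], subject, sobject)]
    if subject is not None:
        for (verb, prep), relation in VERB_PREP_RELATIONS.items():
            if verb == lemma and prep in args:
                return others + [(relation, subject, args[prep])]
    # Simple case: no rule applies, so the syntactic structure is kept.
    return list(triples)


if __name__ == "__main__":
    cell = [("subject", "contain01", "cell01"),
            ("sobject", "contain01", "nucleus01"),
            ("input-word", "contain01", ["contain", "v"])]
    print(verb_to_relation(cell, "contain01"))
```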
6. Word Sense Disambiguation
• Largely naïve (context-independent) WSD
  • same word always maps to same concept
• If word maps to a CLib concept, use that
  • If > 1 mapping, use a preference table to pick the best
• else climb WordNet from the most likely WN sense to a CLib concept
[Figure: mapping from the lexical term “object” through WordNet word senses to CLib Ontology concepts; labels shown: Physical-Object, Goal, Concept (Word Sense), Lexical Term]
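The decision procedure above can be sketched with plain dictionaries. Everything in the tables below is toy data invented for illustration: the entries are not BLUE's actual word-to-concept mappings, the sense identifiers only imitate WordNet's naming style, and `disambiguate` is a hypothetical name. A real implementation would query WordNet rather than a hand-written hypernym table.

```python
from typing import Dict, List, Optional

# Toy tables (assumptions, not BLUE's data).
WORD_TO_CLIB: Dict[str, List[str]] = {
    "object": ["Physical-Object", "Goal"],   # more than one mapping
    "cell": ["Cell"],                        # exactly one mapping
}
PREFERENCE: Dict[str, str] = {"object": "Physical-Object"}
MOST_LIKELY_SENSE: Dict[str, str] = {"poodle": "poodle.n.01"}
HYPERNYM: Dict[str, str] = {"poodle.n.01": "dog.n.01", "dog.n.01": "animal.n.01"}
SENSE_TO_CLIB: Dict[str, str] = {"animal.n.01": "Animal"}


def disambiguate(word: str) -> Optional[str]:
    """Context-independent WSD: the same word always maps to the same concept."""
    candidates = WORD_TO_CLIB.get(word, [])
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1:
        # Preference table picks the best; fall back to the first listed.
        return PREFERENCE.get(word, candidates[0])

    # No direct mapping: climb hypernyms from the most likely sense
    # until a sense with a CLib mapping is reached.
    sense = MOST_LIKELY_SENSE.get(word)
    seen = set()
    while sense is not None and sense not in seen:
        if sense in SENSE_TO_CLIB:
            return SENSE_TO_CLIB[sense]
        seen.add(sense)
        sense = HYPERNYM.get(sense)
    return None   # nothing found


if __name__ == "__main__":
    for w in ("object", "cell", "poodle"):
        print(w, "->", disambiguate(w))
```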
7. Semantic Role Labeling
• Assign using a hand-built database of (~100) rules
Before:
  ;;; "The man sang for 1 hour"
  subject(sing01,man01)
  "for"(sing01,x01)
  value(x01,[1,*hour])
  ;;; "The man sang for a woman"
  subject(sing01,man01)
  "for"(sing01,woman01)
After:
  ;;; "The man sang for 1 hour"
  agent(sing01,man01)
  duration(sing01,x01)
  value(x01,[1,*hour])
  ;;; "The man sang for a woman"
  agent(sing01,man01)
  beneficiary(sing01,woman01)
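A rule database of this kind can be sketched as an ordered list of (syntactic relation, test on the filler, semantic role) entries, where the first matching rule wins. The three rules below cover only the two "sing" examples and are assumptions, as are the name `label_roles` and the set of time units; mapping every subject to agent is a toy simplification.

```python
from typing import Any, Callable, List, Tuple

Triple = Tuple[str, str, Any]   # (relation, head, filler)

TIME_UNITS = {"*second", "*minute", "*hour"}   # assumed unit names


def label_roles(triples: List[Triple]) -> List[Triple]:
    """Rewrite syntactic relations into semantic roles using ordered rules."""
    values = {head: filler for (rel, head, filler) in triples if rel == "value"}

    def is_duration(filler: Any) -> bool:
        # True when the filler has a value like (1, "*hour").
        v = values.get(filler)
        return isinstance(v, tuple) and len(v) == 2 and v[1] in TIME_UNITS

    def anything(filler: Any) -> bool:
        return True

    # Order matters: the specific "for" rule comes before the general one.
    rules: List[Tuple[str, Callable[[Any], bool], str]] = [
        ("subject", anything, "agent"),
        ("for", is_duration, "duration"),
        ("for", anything, "beneficiary"),
    ]

    labeled: List[Triple] = []
    for rel, head, filler in triples:
        for syntactic, test, role in rules:
            if rel == syntactic and test(filler):
                labeled.append((role, head, filler))
                break
        else:
            labeled.append((rel, head, filler))   # no rule: keep as is
    return labeled


if __name__ == "__main__":
    print(label_roles([("subject", "sing01", "man01"),
                       ("for", "sing01", "x01"),
                       ("value", "x01", (1, "*hour"))]))
```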
8. Metonymy Resolution
• Metonymy: a word is replaced with a closely related word
• Literal meaning is nonsensical
  • “John read Shakespeare”, “Left lane must exit”
  • “Erase the blackboard”, “Change the washing machine”
• NOTE: nonsensical with respect to the target ontology
• 5 main types of metonymy are fixed:
  a. FORCE for EXERTION
     "The force on the sled" → "The force of the exertion on the sled"
  b. VECTOR-PROPERTY for VECTOR
     "The direction of the move" → "The direction of the velocity of the move"
  c. SUBSTANCE for STRUCTURAL-UNIT
     "The oxidation number of NaCl" → "The oxidation number of the basic structural unit of NaCl"
  d. OBJECT for EVENT
     "The speed of the car is 10 km/h" → "The speed of the movement of the car is 10 km/h"
  e. PLACE for OBJECT
     "The cat sits on the mat" → "The cat sits at a location on the mat"
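Types (b), (c), and (d) share one shape: a property is applied to something of the wrong type, and an intermediate concept is inserted. The sketch below illustrates that shape on English phrases only; BLUE itself operates on logic, and the rule table, the coarse type names, and the function `expand_metonymy` are assumptions made for this example.

```python
from typing import Dict, Tuple

# (property, coarse type of its argument) -> concept to insert.
# Entries are taken from examples (b), (c), and (d); the type names are assumed.
INSERTION_RULES: Dict[Tuple[str, str], str] = {
    ("direction", "Event"): "velocity",
    ("oxidation number", "Substance"): "basic structural unit",
    ("speed", "Object"): "movement",
}


def expand_metonymy(prop: str, noun_phrase: str, noun_type: str) -> str:
    """Insert the intermediate concept when the literal reading does not fit."""
    inserted = INSERTION_RULES.get((prop, noun_type))
    if inserted is None:
        # Literal reading is acceptable: leave the phrase alone.
        return f"the {prop} of {noun_phrase}"
    return f"the {prop} of the {inserted} of {noun_phrase}"


if __name__ == "__main__":
    print(expand_metonymy("speed", "the car", "Object"))
    print(expand_metonymy("direction", "the move", "Event"))
    print(expand_metonymy("speed", "the move", "Event"))
```

Whether a literal reading "fits" depends on the target ontology, which is why the rules are keyed on the argument's type rather than on the word itself.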
9. Question Annotation
• Find-a-value questions: Extract a variable of interest
  _Height23                 ; find the value (no wrapper)
  (what-is-a _Elephant23)   ; find the definition
  (what-is-the _Process23)  ; find the identity of
  (how-many _Elephant23)    ; find the count
  (how-much _Water23)       ; find the amount
  (what-types _Cell23)      ; find the subclasses of the instance's class
• Clausal questions: Extract clauses to be queried about
  • 5 types: Is it true, Is it false, Is it possible, Why, How
  ;;; "Is it true that the big block is red?"
  Assertional Triples:
    size(block01,x01)
    value(x01,[*big,Spatial-Entity])
    value(x02,*red)
  Query Triples:
    color(block01,x02)
    query-type(is-it-true-that-questionp,t)
10. Additional Processing
• Occasional specific tweaks, e.g.,
Before:
  ;;; "Is it true that the reaction is an oxidation reaction?"
  equal(reaction01,oxidation-reaction01)
After:
  ;;; "Is it true that the reaction is an oxidation reaction?"
  is-a(reaction01,oxidation-reaction01)