Parsing by chunks Steven P. Abney Presenter: Park Kyung-mi
0 introduction • A chunk is the unit in which a sentence is read (it corresponds to a prosodic pattern) • (1) [I begin] [with an intuition]: [when I read] [a sentence], [I read it] [a chunk] [at a time] • A chunk represents a grammatical watershed of sorts • A typical chunk consists of several function words and a single content word • It fits a fixed template well • A CFG is well suited to describing the internal structure of chunks • Relations between chunks are determined by lexical selection • Co-occurrence between chunks is sensitive to the head word of each chunk • The order in which chunks occur is more flexible than the order of words within a chunk
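Since the internal structure of chunks is claimed to fit a CFG, a toy rule fragment makes the point concrete; the category names follow the paper, but the rules themselves are only an illustrative sketch, not Abney's grammar:

```python
# Toy CFG fragment for chunk-internal structure: a constellation of
# function words (D, P, Q) around a single content word, matching a
# fixed template. Illustrative rules only, not Abney's actual grammar.
CHUNK_RULES = [
    ("DP", ("D", "NP")),    # "an intuition"
    ("DP", ("NP",)),        # "sentences"
    ("NP", ("Adj", "NP")),  # "old man"
    ("NP", ("N",)),
    ("PP", ("P", "DP")),    # "with an intuition"
    ("QP", ("Q",)),         # "many"
]
```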
1 chunks • Psychological evidence for the existence of chunks • Gee and Grosjean (1983) investigated performance structures • These are structures of word clustering that emerge consistently from several kinds of experimental data • Performance structures are best predicted by Φ-phrases • A Φ-phrase is formed by breaking the input string after each syntactic head • Exception: function words that associate with the preceding content word, such as object pronouns, are grouped with that content word • Shortcomings • It must be assumed that attributive adjectives do not count as syntactic heads • Ex) otherwise "a big dog" would split into two chunks • No syntactic structure is assigned to chunks
1 chunks • In this paper… • Chunks are given syntactic structure: each chunk spans a connected subgraph of the sentence's parse tree • Chunks are defined in terms of major heads • The major heads are all the content words • Exception: content words that appear between a function word f and the content word that f selects are excluded • Ex) "a man proud of his son" vs. "the proud man" • h: a major head • The root of a chunk: the highest node in the parse tree whose s-head is h • The s-head of a phrase is its semantically most prominent word • Ex) the verb is the s-head of a sentence; the noun is the s-head of a noun phrase or a prepositional phrase
1 chunks • The s-head and the syntactic head of a phrase may differ • Ex) the abstract element Infl is the syntactic head of a sentence, and the complementizer (C) is the head of an embedded sentence (CP) (Chomsky 1986) • Ex) the syntactic head of a PP is the preposition, not the noun • Ex) under the DP analysis, the syntactic head of a noun phrase is the determiner, and that of an adjective phrase is a degree element (Abney 1987) • The s-head is defined in terms of the syntactic head • If the syntactic head h of a phrase P is a content word, then h is the s-head of P • If h is a function word, the s-head of P is the s-head of the phrase selected by h
1 chunks • The parse tree TC of a chunk C: a subgraph of the global parse tree T • The root r of TC: the highest node whose s-head is the content word defining C • Ex) in (2) the major heads are "man", "sitting", "suitcase"; the DP is the highest node whose s-head is "man", the CP the highest whose s-head is "sitting", and the PP the highest whose s-head is "suitcase" • TC is the largest subgraph of T that is dominated by r and does not contain the root of another chunk • Ex) in (2) the parse tree of the "man" chunk is the subtree rooted at the DP, that of the "sitting" chunk at the CP, and that of the "suitcase" chunk at the PP • The CP corresponds to the complete global parse tree with the DP and PP subtrees removed • A special case • Words whose inclusion would give a chunk a discontinuous span are excluded • Ex) complementizers and prepositions
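A minimal sketch of the chunk-root definition in code, assuming each parse-tree node records its s-head; the Node class, the toy function-word list, and the simplified version of example (2) are all hypothetical:

```python
# Chunk roots: the highest nodes whose s-head is a content word. A node
# qualifies if its s-head is a content word and differs from its parent's
# s-head (otherwise some ancestor is already the root for that head).
FUNCTION_WORDS = {"the", "a", "on", "that", "of"}  # toy list for the sketch

class Node:
    def __init__(self, label, s_head, children=()):
        self.label = label        # e.g. "DP", "CP", "PP"
        self.s_head = s_head      # the s-head word of this phrase
        self.children = list(children)

def chunk_roots(node, parent_head=None):
    roots = []
    if node.s_head not in FUNCTION_WORDS and node.s_head != parent_head:
        roots.append(node)
    for child in node.children:
        roots.extend(chunk_roots(child, node.s_head))
    return roots

# "the man on the suitcase" (simplified): one chunk rooted at the DP, one at the PP
pp = Node("PP", "suitcase", [Node("DP", "suitcase", [Node("NP", "suitcase")])])
dp = Node("DP", "man", [Node("NP", "man"), pp])
print([r.label for r in chunk_roots(dp)])   # ['DP', 'PP']
```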
1 chunks • Φ-phrases are generated from chunks by sweeping orphaned words into an adjacent chunk • Φ-phrases, unlike chunks, do not always span connected subgraphs of the parse tree • Ex) in (3), "that John" constitutes a Φ-phrase, but syntactically it consists of two unconnected fragments • The correspondence between prosodic units and syntactic units is not direct, but mediated by chunks • Φ-phrases are elements of a prosodic level of representation • Chunks and global parse trees are elements of two different levels of syntactic representation
1 chunks • A final issue regarding the definition of chunks is the status of pronouns • 1. Since pronouns function syntactically like noun chunks, we would like to consider them chunks • 2. They are generally stressless, suggesting that they not be treated as separate chunks • A reasonable solution is to consider them lexical noun phrases and assign them the same status as orphaned words • At the level of chunks, they are orphaned words belonging to no chunk • At the level of Φ-phrases, they are swept into an adjacent chunk • At the level of syntax, they are treated like any other noun phrase
1 chunks • Which adjacent chunk are orphaned words swept into? • If the orphaned word takes a complement, it is swept into the nearest chunk in the direction of its complement • Otherwise it is swept into the nearest chunk in the direction of its syntactic governor • Ex) a nominative pronoun is swept into the following chunk, an accusative pronoun into the preceding chunk • The units marked in (1) are Φ-phrases (not chunks)
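That rule can be sketched as a small decision function; the Entry records and the toy lexicon below are invented placeholders, not the parser's actual lexicon:

```python
from collections import namedtuple

Entry = namedtuple("Entry", "complement_dir governor_dir")
LEXICON = {
    "he":  Entry(None, "right"),    # nominative pronoun: governor (Infl/V) to the right
    "him": Entry(None, "left"),     # accusative pronoun: governing verb to the left
    "of":  Entry("right", "left"),  # preposition: complement to the right
}

def sweep_direction(word):
    """Which adjacent chunk an orphaned word joins, per the rule above."""
    entry = LEXICON[word]
    if entry.complement_dir is not None:   # the word takes a complement
        return entry.complement_dir
    return entry.governor_dir              # else: toward its syntactic governor

print(sweep_direction("him"))  # left  -> swept into the preceding chunk
print(sweep_direction("he"))   # right -> swept into the following chunk
```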
2 Structure of the parser • A standard parser processes text in two stages • A tokenizer/morphological analyzer converts a stream of characters into a stream of words • The parser proper converts the stream of words into a parsed sentence, or a stream of parsed sentences • In a chunking parser, the syntactic analyzer is decomposed into two separate stages • The chunker converts a stream of words into a stream of chunks • The attacher converts the stream of chunks into a stream of sentences • It attaches one chunk to another by adding the missing arcs • Ex) in (2), the IP-VP arc and the lower VP-PP arc
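The three-stage decomposition can be sketched as a stream pipeline; each stage body below is a deliberately trivial placeholder, not Abney's implementation:

```python
# Minimal pipeline sketch: each stage is a generator over its input stream.
def tokenize(chars):
    yield from "".join(chars).split()     # toy tokenizer / morphological analyzer

def chunk(words):
    yield from ([w] for w in words)       # toy chunker: one-word "chunks"

def attach(chunks):
    yield list(chunks)                    # toy attacher: one parsed "sentence"

def parse(characters):
    return attach(chunk(tokenize(characters)))

print(next(parse("I saw him")))           # [['I'], ['saw'], ['him']]
```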
2 Structure of the parser • to illustrate the action of these three stages • Words are sets of readings. Readings, but not words, have unique syntactic categories, feature-sets, etc. “of course” • lexical ambiguity is often resolvable within chunks • There is no distinction in the final parse between nodes built by the chunker and nodes built by the attacher
3 Chunker
3.1 LR parsing • The chunker is a non-deterministic version of an LR parser (Knuth 1965), employing a best-first search • An LR parser is a deterministic bottom-up parser for CFGs • It shifts words from the input stream onto the stack until it recognizes a sequence of words matching the RHS of a rule of the grammar • It then reduces that sequence to a single node, whose category is given by the LHS of the rule
3.1 LR parsing • Control is mediated by LR states (kept on a separate control stack) • LR states correspond to sets of items • An item is a rule with a dot marking how much of the rule has already been seen • The kernel of an item set is the set of items with some category preceding the dot • Ex) if (4) is at the top of the control stack, we may shift on either Det or N • The new kernel is [NP→N·] • It calls for reduction of N to NP • Conflict: reduce by the rule VP→V NP, or shift a V; in this case, lookahead decides the conflict: shift if the next word is a V, reduce if there is no input left
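A minimal sketch of items and kernels, with the dot encoded as an integer position (a hypothetical encoding, not the chunker's actual tables):

```python
# An item is (lhs, rhs, dot): dot marks how much of the rule has been seen.
def goto(item_set, symbol):
    """Advance the dot over `symbol` in every item that expects it next."""
    return {(lhs, rhs, dot + 1)
            for (lhs, rhs, dot) in item_set
            if dot < len(rhs) and rhs[dot] == symbol}

def kernel(item_set):
    """Items with some category already preceding the dot."""
    return {item for item in item_set if item[2] > 0}

# Roughly the situation around (4): the state allows shifting Det or N;
# after shifting N, the new kernel [NP -> N .] calls for reduction to NP.
state = {("NP", ("Det", "N"), 0), ("NP", ("N",), 0)}
print(kernel(goto(state, "N")))   # {('NP', ('N',), 1)}
```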
3.2 grammar • The lexicon includes ’s and possessive pronouns in category D • Modals and to are in category Infl • Selectional constraints • Ex) Aux imposes restrictions on its complement • Ex) a DP whose determiner is ’s does not appear in a PP chunk • The grammar is incomplete, but it covers most chunks
3.3 non-determinism in the chunker • There are two sources of non-determinism in the chunker • The points at which chunks end are not explicitly marked • This leads to ambiguities involving chunks of different lengths • A given word may belong to more than one category • This leads to conflicts in which the chunker does not know whether to shift the following word onto the stack as an N or as a V • The aim of using best-first search is… • To approach deterministic parsing without losing robustness • Marcus-style deterministic parsing has two related drawbacks • The complexity of grammar development and debugging increases too rapidly • If the parser's single best guess at some choice point leads to a dead end, the parser simply fails
3.3 non-determinism in the chunker • The chunker builds one task for each possible next action • A task is a tuple that includes the current configuration, a next action, and a score • The score is an estimate of how likely it is that the task will lead to the best parse • The chunker repeatedly takes the best task from the queue • It executes the task's next action, producing a new configuration • A new set of tasks is computed for the new configuration and placed on the priority queue
3.3 non-determinism in the chunker • Executing the first task yields the configuration ([[NP→N·]], [N], 1) • There is only one possible next action, [RE NP→N], producing a single new task • A score is a vector of length 4 • Its values range from 0 to negative infinity • As is desirable for best-first search, scores decrease monotonically as the parser proceeds • This guarantees that the first solution found is a best solution
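The control regime can be sketched as a generic priority-queue loop; execute, next_tasks, and is_solution are assumed helpers, and since score vectors compare lexicographically, Python tuples give the right heap ordering:

```python
import heapq
from itertools import count

def best_first(initial_tasks, execute, next_tasks, is_solution):
    """Generic best-first loop over (score, configuration, action) tasks.
    Scores are tuples of non-positive numbers (0 down to -inf), higher is
    better; they only decrease along a derivation, so the first solution
    popped from the queue is a best one. Negated for Python's min-heap."""
    tie = count()  # tie-breaker so configurations never get compared
    heap = [(tuple(-v for v in score), next(tie), cfg, act)
            for score, cfg, act in initial_tasks]
    heapq.heapify(heap)
    while heap:
        _, _, cfg, act = heapq.heappop(heap)          # best task first
        new_cfg = execute(cfg, act)
        if is_solution(new_cfg):
            return new_cfg
        for score, c, a in next_tasks(new_cfg):       # expand and requeue
            heapq.heappush(heap, (tuple(-v for v in score), next(tie), c, a))
    return None
```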
3.4 deciding where a chunk ends • There is a problem with deciding where a chunk ends • Every word has an alternate reading as an end-of-input marker • The LR parser treats end-of-input as a grammatical category • One piece of information that we must keep with a task, whether we hallucinate end-of-input marks or not, is which subset of the readings of the lookahead word the task is legal on • Ex) suppose we have just shifted the word many onto the stack as a Q, and the current configuration is: • (6) ([[QP→Q·]], [Q], 1)
3.4 deciding where a chunk ends • The next word is are, which has two readings • There is only one legal next action from configuration (6): reduce by QP→Q • That reduction is legal only if the next word is a noun • Since the noun reading of are is rare, we should disprefer the task T calling for reduction by QP→Q • If we keep sets of lookahead readings with each task, we can slip fake end-of-input markers in among those lookahead readings
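A sketch of lookahead-sensitive task scoring: each task carries the subset of the lookahead word's readings it is legal on, and is penalized when those readings are unlikely; the lexical probabilities below are invented for the example:

```python
import math

# Toy lexical probabilities for the readings of "are" (invented numbers).
READINGS = {"are": {"V": 0.99, "N": 0.01}}

def lookahead_penalty(word, legal_categories):
    """Log-probability that the lookahead word has a reading the task is
    legal on: 0 for a sure thing, more negative for rarer readings
    (matching the 0-to-negative-infinity score range of section 3.3)."""
    p = sum(READINGS[word].get(cat, 0.0) for cat in legal_categories)
    return math.log(p) if p > 0 else float("-inf")

# Reduce QP -> Q is legal only on the noun reading of "are": dispreferred.
print(lookahead_penalty("are", {"N"}))   # log(0.01): strong penalty
print(lookahead_penalty("are", {"V"}))   # log(0.99): near zero
```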
4 Attacher
4.1 attachment ambiguities and lexical selection • The attacher's main job is… • Dealing with attachment ambiguities • (5) prefer argument attachment; prefer verb attachment • (6) prefer low attachment • Potential attachment sites are ranked as follows: • Attachment as verb argument (best) • Attachment as argument of a non-verb • Attachment as verb modifier • Attachment as modifier of a non-verb (worst) • The second factor is the relative height of attachment sites, • counted as the number of sentence (IP) nodes below the attachment site on the rightmost branch of the tree
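The two ranking factors combine naturally into a sort key; a minimal sketch with an invented Site record, not the attacher's actual data structures:

```python
from dataclasses import dataclass

@dataclass
class Site:
    role: str        # "argument" or "modifier"
    head: str        # "verb" or "non-verb"
    ips_below: int   # IP nodes below the site on the rightmost branch

PREFERENCE = {("argument", "verb"): 0, ("argument", "non-verb"): 1,
              ("modifier", "verb"): 2, ("modifier", "non-verb"): 3}

def rank(site):
    """Sort key: attachment type first, then prefer low attachment
    (fewer IP nodes below the site)."""
    return (PREFERENCE[(site.role, site.head)], site.ips_below)

sites = [Site("modifier", "verb", 0), Site("argument", "non-verb", 1)]
print(min(sites, key=rank))   # the argument site beats the verb modifier
```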
4.1 attachment ambiguities and lexical selection • Special machinery • The attacher must deal with words' selectional properties • The lexical selectional properties of a head determine which phrases can co-occur with that head • A given word has a set of subcategorization frames • There is a good deal of freedom in the order in which arguments appear, but there are also some constraints • Two positional constraints • 'only appears first' (annotation on the slot: '<') • 'only appears last' (annotation on the slot: '>') • Arguments are also marked as • obligatory (unmarked), optional ('?'), or iterable ('*') ex) [DP<?, PP*, CP>]
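One possible encoding of such a frame and a check of its positional constraints (a hypothetical encoding, not the paper's notation):

```python
# [DP<?, PP*, CP>] as a list of (category, flags) slots:
# '<' only first, '>' only last, '?' optional, '*' iterable;
# a slot with neither '?' nor '*' is obligatory.
FRAME = [("DP", {"<", "?"}), ("PP", {"*"}), ("CP", {">"})]

def respects_positions(frame, realized):
    """Check the two positional constraints on a realized argument order."""
    flags = dict(frame)
    for i, cat in enumerate(realized):
        f = flags.get(cat, set())
        if "<" in f and i != 0:
            return False
        if ">" in f and i != len(realized) - 1:
            return False
    return True

print(respects_positions(FRAME, ["DP", "PP", "CP"]))  # True
print(respects_positions(FRAME, ["PP", "DP", "CP"]))  # False: DP not first
```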
4.1 attachment ambiguities and lexical selection • A frameset contains a specification of the adjuncts