Extracting LTAGs from Treebanks Fei Xia 04/26/07
Q1: How does grammar extraction work?
Two types of elementary tree in LTAG
Initial tree: (S NP (VP (V draft) NP))
Auxiliary tree: (VP (ADVP (ADV still)) VP*)
• Arguments and adjuncts are in different types of elementary trees
Adjoining operation
[Figure: an auxiliary tree with root Y and foot node Y* is adjoined at a node labeled Y]
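The adjoining operation can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the `Node` class and the helpers `find_foot`, `adjoin`, and `to_brackets` are hypothetical names, and the example trees are the "draft" and "still" trees from the previous slide. The subtree under the target node moves under the foot node, and the auxiliary tree takes its place.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False  # True only for the foot node (Y*) of an auxiliary tree


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, aux_root: Node) -> None:
    """Adjoin a copy of the auxiliary tree at `target` (modifies `target`)."""
    aux = copy.deepcopy(aux_root)  # leave the grammar's auxiliary tree untouched
    foot = find_foot(aux)
    if foot is None or aux.label != target.label or foot.label != target.label:
        raise ValueError("root and foot labels must match the target label")
    foot.children = target.children  # old subtree now hangs below the foot
    foot.foot = False
    target.children = aux.children   # auxiliary material replaces it at the target


def to_brackets(node: Node) -> str:
    """Render a tree in bracket notation; the foot node is shown with '*'."""
    name = node.label + ("*" if node.foot else "")
    if not node.children:
        return name
    return "(" + name + " " + " ".join(to_brackets(c) for c in node.children) + ")"


# Initial tree anchored by "draft" and auxiliary tree anchored by "still".
vp = Node("VP", [Node("V", [Node("draft")]), Node("NP")])
initial = Node("S", [Node("NP"), vp])
auxiliary = Node("VP", [Node("ADVP", [Node("ADV", [Node("still")])]),
                        Node("VP", foot=True)])

adjoin(vp, auxiliary)
print(to_brackets(initial))
```

Copying the auxiliary tree before adjoining is a design choice of this sketch, so the same elementary tree can be reused at several adjunction sites.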
Step 2: Insert additional nodes
[Figure: the treebank tree for "they still draft policies" before and after node insertion. Reconstructed result: (S (NP (PRP they)) (VP (ADVP (RB still)) (VP (VBP draft) (NP (NNS policies)))))]
Step 3: Build elementary trees
[Figure: the tree from Step 2 divided into elementary trees #1 to #4]
Extracted grammar
#1: (NP (PRP they))
#2: (VP (ADVP (RB still)) VP*)
#3: (S NP (VP (VBP draft) NP))
#4: (NP (NNS policies))
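The tree-building step can be sketched as a recursive walk over the fully bracketed tree. This is a simplified illustration under stated assumptions, not the algorithm as implemented by the author: every node is assumed to be already marked as head, argument, or adjunct relative to its parent, every modification node is assumed to be binary (one adjunct plus a head child with the same label, as produced by Step 2), and the names `Node`, `build`, and `extract` are hypothetical. Arguments become substitution nodes and start a new initial tree; adjuncts are factored out as auxiliary trees.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    role: str = "head"  # role relative to the parent: "head", "arg", or "adjunct"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminals


def build(node: Node, grammar: List[str]) -> str:
    """Return the elementary tree that contains `node`, in bracket notation.
    Other elementary trees found on the way are appended to `grammar`."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    if any(c.role == "adjunct" for c in node.children):
        # Modification node: factor out an auxiliary tree, then continue
        # the current elementary tree with the head child.
        parts, head_child = [], None
        for c in node.children:
            if c.role == "adjunct":
                parts.append(build(c, grammar))
            else:
                head_child = c
                parts.append(node.label + "*")  # foot node
        grammar.append(f"({node.label} {' '.join(parts)})")
        return build(head_child, grammar)
    parts = []
    for c in node.children:
        if c.role == "arg":
            parts.append(c.label)  # substitution node stays in this tree
            grammar.append(build(c, grammar))  # argument gets its own initial tree
        else:
            parts.append(build(c, grammar))
    return f"({node.label} {' '.join(parts)})"


def extract(root: Node) -> List[str]:
    grammar: List[str] = []
    grammar.append(build(root, grammar))
    return grammar


# "they still draft policies", after Step 2.
tree = Node("S", children=[
    Node("NP", "arg", [Node("PRP", word="they")]),
    Node("VP", "head", [
        Node("ADVP", "adjunct", [Node("RB", word="still")]),
        Node("VP", "head", [
            Node("VBP", word="draft"),
            Node("NP", "arg", [Node("NNS", word="policies")]),
        ]),
    ]),
])

for t in extract(tree):
    print(t)
```

The sketch lists the trees in the order they are discovered and ignores cases such as coordination, empty categories, and multi-anchor trees.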
Q2: What info was missing in the source treebank?
• Head/argument/adjunct distinction
  • Use function tags and heuristics
• Raising verbs (e.g., seem, appear) vs. other verbs
  • He seems to be late
  • He wants to be late
  • Need a list of raising verbs in that language
• Features, feature equations (e.g., agreement), …
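As an illustration of "function tags and heuristics", the sketch below classifies a non-head child as argument or adjunct. Everything in it is an assumption made for the example: the tag sets are a small subset of Penn Treebank-style function tags, the fallback rule for VP children is a made-up heuristic, and the raising-verb list contains only the two verbs named on the slide. A real extraction would need language-specific tables.

```python
# Illustrative tables only; real tables are treebank- and language-specific.
ARGUMENT_TAGS = {"SBJ", "PRD"}
ADJUNCT_TAGS = {"TMP", "LOC", "MNR", "ADV", "PRP", "DIR"}
RAISING_VERBS = {"seem", "appear"}  # the two examples given on the slide


def split_label(label: str):
    """Split 'NP-SBJ' into ('NP', {'SBJ'}); labels without tags give an empty set."""
    base, *tags = label.split("-")
    return base, set(tags)


def classify(child_label: str, parent_base: str) -> str:
    """Guess whether a non-head child is an argument ('arg') or an adjunct."""
    base, tags = split_label(child_label)
    if tags & ARGUMENT_TAGS:
        return "arg"
    if tags & ADJUNCT_TAGS:
        return "adjunct"
    # Hypothetical fallback: untagged NP/S/SBAR under VP count as arguments.
    if parent_base == "VP" and base in {"NP", "S", "SBAR"}:
        return "arg"
    return "adjunct"


def is_raising(lemma: str) -> bool:
    """Look the verb lemma up in the hand-written raising-verb list."""
    return lemma.lower() in RAISING_VERBS


print(classify("NP-SBJ", "S"), classify("PP-TMP", "VP"), is_raising("seem"))
```

The function tags are checked before the fallback rule, so a tagged temporal NP such as `NP-TMP` is treated as an adjunct even though it sits under a VP.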
Q3: What methodological lessons can be drawn?
• The algorithm for extracting LTAGs from treebanks is straightforward.
• Some missing information can be “recovered” with heuristics; some cannot. The extracted LTAGs are not as rich as hand-built ones.
• Nevertheless, the extracted grammars have been shown to be useful for parsing, SuperTagging, etc.
Q4: What are the advantages of a PS or DS treebank?
• The original extraction algorithm assumes the input is a PS treebank.
• But it can be easily extended if the input is a DS treebank:
  • Extract tree segments from the DS
  • Run the DS → PS algorithm on the segments to get elementary trees
Q5: Building a treebank for a formalism or building a general treebank?
• I prefer the latter because
  • A general treebank can be used for different formalisms.
  • Different grammars under the same formalism can be extracted.
  • Annotating a general treebank is often easier.