The Linguist’s Search Engine 02/04/2004
Background
• Address: http://lse.umiacs.umd.edu/
• Developed at the University of Maryland by Resnik, Elkiss et al. in collaboration with Fellbaum (Princeton) and Olsen (Microsoft)
• Accessible to a general audience since 20 January 2004 (brand new!)
• No fees or complicated registration process
Some Facts – Built-in Corpus
• Preprocessed corpus of about three million sentences taken from the Internet Archive (www.archive.org)
• Automatically annotated with Penn Treebank-style syntactic bracketing
• Relies on computational linguistic tools (such as MXTERMINATOR, MXPOST, Charniak’s stochastic parser, the Minipar parser and WordNet)
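To make "Penn Treebank-style syntactic bracketing" concrete, here is a minimal Python sketch that reads one bracketed parse into a nested structure and recovers the words. The example sentence and the function names are illustrative assumptions; this is not the LSE's own code, and it assumes well-formed input.

```python
import re

# Illustrative sentence in Penn Treebank-style bracketing (not taken from the LSE corpus).
EXAMPLE = "(S (NP (DT The) (NN linguist)) (VP (VBZ searches) (NP (DT the) (NN corpus))))"

def parse(bracketed):
    """Parse one bracketed tree into nested (label, children) tuples; leaves are strings."""
    # Tokens are "(", ")" or any run of characters without spaces or parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree

def leaves(tree):
    """Return the words of the tree in sentence order."""
    words = []
    for child in tree[1]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words

print(leaves(parse(EXAMPLE)))
```

Each pair of parentheses is one constituent: the first token is its category (S, NP, VP) or part-of-speech tag (DT, NN, VBZ), and the rest are its children.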
Searching the built-in corpus
• Nice features:
  • Query by example
  • Limited regular expression support (e.g. disjunction, negation)
  • WordNet relations are supported
  • Queries can be saved for later reuse
  • Offensive content filter (for less embarrassing live demonstrations)
• Problems:
  • Only English is supported (a fact the documentation never mentions!)
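The demo query fearsome#a#1/syns appears to follow the common WordNet convention lemma#part-of-speech#sense-number, followed by a relation name after the slash. The Python sketch below splits such a string into its parts. The field interpretation is an assumption based on that one example, and `SenseQuery` and `parse_sense_query` are hypothetical names, not part of the LSE.

```python
from typing import NamedTuple, Optional

class SenseQuery(NamedTuple):
    lemma: str               # e.g. "fearsome"
    pos: str                 # e.g. "a" for adjective (assumed WordNet-style code)
    sense: int               # sense number within that part of speech
    relation: Optional[str]  # e.g. "syns"; None if no relation is given

def parse_sense_query(query: str) -> SenseQuery:
    """Split 'lemma#pos#sense/relation' into its components."""
    head, slash, relation = query.partition("/")
    parts = head.split("#")
    if len(parts) != 3:
        raise ValueError(f"expected lemma#pos#sense, got {head!r}")
    lemma, pos, sense = parts
    return SenseQuery(lemma, pos, int(sense), relation if slash else None)

print(parse_sense_query("fearsome#a#1/syns"))
```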
Demo – Simple Search
• Simple search of the built-in corpus
• Query by example
  • Search for of-genitive constructions
• Query by hand
  • Search for ’s-genitives where the possessor is not a proper name (i.e. NNP / NNPS)
  • Search for synonyms of fearsome: fearsome#a#1/syns
GO TO THE LSE
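The hand-written ’s-genitive query can be pictured as a structural test on parse trees: find an NP that ends in a possessive marker (tag POS) and check that the word just before it is not tagged NNP or NNPS. The Python sketch below runs that test on two made-up bracketed sentences. It illustrates the idea only; it is not the LSE query language, and all function names are hypothetical.

```python
import re

PROPER_TAGS = {"NNP", "NNPS"}

def parse(bracketed):
    """Parse a well-formed bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    stack = [("ROOT", [])]
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            node = (tokens[i + 1], [])   # the token after "(" is the label
            stack[-1][1].append(node)
            stack.append(node)
            i += 2
        elif tokens[i] == ")":
            stack.pop()
            i += 1
        else:
            stack[-1][1].append(tokens[i])  # a word under a POS tag
            i += 1
    return stack[0][1][0]

def subtrees(tree):
    """Yield the tree and all of its subtrees in left-to-right preorder."""
    yield tree
    for child in tree[1]:
        if not isinstance(child, str):
            yield from subtrees(child)

def tagged_leaves(tree):
    """Return (tag, word) pairs in sentence order."""
    return [(t[0], w) for t in subtrees(tree) for w in t[1] if isinstance(w, str)]

def common_noun_s_genitives(tree):
    """Return possessor phrases ending in POS whose head word is not a proper noun."""
    hits = []
    for label, children in subtrees(tree):
        if label != "NP" or not children:
            continue
        last = children[-1]
        if isinstance(last, str) or last[0] != "POS":
            continue
        pairs = tagged_leaves((label, children))
        if len(pairs) >= 2 and pairs[-2][0] not in PROPER_TAGS:
            hits.append(" ".join(word for _, word in pairs))
    return hits

COMMON = "(S (NP (NP (DT the) (NN linguist) (POS 's)) (NN engine)) (VP (VBZ works)))"
PROPER = "(S (NP (NP (NNP Maryland) (POS 's)) (NN corpus)) (VP (VBZ grows)))"

print(common_noun_s_genitives(parse(COMMON)))  # possessor "the linguist 's" is reported
print(common_noun_s_genitives(parse(PROPER)))  # proper-name possessor is skipped
```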
Some Facts – Customized Corpora
• You can build your own collection of sentences and have them annotated
• Uses AltaVista (www.altavista.com) as the basis for web-wide search (about 1,000,000 pages)
• Extracts sentences from the retrieved pages and annotates them
• Job-based, with fair scheduling procedures
• Query syntax is restricted to AltaVista queries plus expansion of inflectional forms
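"Expansion of inflectional forms" means that a query for a base form such as give also retrieves gives, giving, gave and given. The slides do not say how the LSE does this, so the following Python sketch is only one naive way to picture it: a few regular English spelling rules plus a tiny, hypothetical table of irregular forms, joined into an OR query. It ignores consonant doubling (stop, stopping) and most irregular verbs.

```python
# Hypothetical, deliberately tiny table of irregular past forms.
IRREGULAR = {"give": ["gave", "given"]}

def _ends_in_consonant_y(verb):
    return len(verb) > 1 and verb.endswith("y") and verb[-2] not in "aeiou"

def inflect(verb):
    """Return the base form plus naive third-person, -ing and past forms."""
    forms = [verb]
    # Third person singular
    if verb.endswith(("s", "sh", "ch", "x", "z")):
        forms.append(verb + "es")
    elif _ends_in_consonant_y(verb):
        forms.append(verb[:-1] + "ies")
    else:
        forms.append(verb + "s")
    # Present participle
    if verb.endswith("e") and not verb.endswith("ee"):
        forms.append(verb[:-1] + "ing")
    else:
        forms.append(verb + "ing")
    # Past tense and past participle
    if verb in IRREGULAR:
        forms.extend(IRREGULAR[verb])
    elif verb.endswith("e"):
        forms.append(verb + "d")
    elif _ends_in_consonant_y(verb):
        forms.append(verb[:-1] + "ied")
    else:
        forms.append(verb + "ed")
    return forms

def expand_query(verb):
    """Join all forms with OR, as a boolean web search query might."""
    return " OR ".join(inflect(verb))

print(expand_query("give"))
```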
Demo – Customized Collection
• Demo search on a collection of sentences with the verb give
• How to start a new collection
GO TO THE LSE
Further Information
• LSE Starter’s Guide: lse.umiacs.umd.edu/lse_guide.html
• LSE User’s Guide: lse.umiacs.umd.edu/lseuser/lseuser.pdf
• LSE Users’ Forum: lse.umiacs.umd.edu/forum
• AltaVista Documentation: www.altavista.com/help/search/help_adv
• Penn Tagset: www.computing.dcu.ie/~acahill/tagset.html
• Still ugly but flexible alternative: www.stanford.edu/~jstrunk/