GoogleDictionary Paul Nepywoda Alla Rozovskaya
Goal • Develop a tool for English that, given a word, will illustrate its usage
Who Will Benefit • Learners of English • Teachers of English • Native speakers who wish to find common usages of a word
Similar Tools? • Dictionaries BUT our tool • focuses on the usage of words and not on defining their meanings • ranks expressions based on frequency • extracts examples straight from context
Similar Tools? • Google BUT our tool • focuses on finding high frequency neighboring words instead of simply the documents that contain the target word
Data Resources • Corpus of newspaper articles (3.5 million words) [used for demo] • Advantage: large amount of data • Disadvantage: limited domain • Use a search engine to build a corpus of documents containing the target word • Advantages: various domains, dynamic data source • Disadvantage: time needed to download documents
Implementation (1) • Search a corpus to determine the most typical words: extract the words within a certain window of the target word and rank them by frequency • Compute the rank of single words and of pairs of words within the window
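The window extraction described on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the tokenizer, the function names `tokenize` and `window_counts`, the window size, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def window_counts(tokens, target, window=3):
    """Count single words and ordered word pairs seen within `window`
    tokens of each occurrence of `target`."""
    singles = Counter()
    pairs = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Context = every token in the window except the target position itself.
        context = [tokens[j] for j in range(lo, hi) if j != i]
        singles.update(context)
        # Pairs keep the order in which the two words appear in the text.
        pairs.update(combinations(context, 2))
    return singles, pairs


# Hypothetical sample text, standing in for the newspaper corpus.
text = ("The course information is online. Students read the course "
        "information before the course starts.")
tokens = tokenize(text)
singles, pairs = window_counts(tokens, "course", window=2)
print(singles.most_common(3))
print(pairs.most_common(3))
```

Raw counts like these favor function words such as "the", which is what the weighting on the next slide is meant to address.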
Implementation (2) • Computing the rank of an expression • Tf: raw count • Idf: inverse document frequency of a word • Position normalization: reward context words closer to the target
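One way the three components could be combined is sketched below. The slide does not preserve the exact formulas, so everything specific here is an assumption: idf is taken as log(N / df), the position reward as 1 / distance from the target, and the final score as their product. The names `idf` and `rank_context_words` and the toy documents are hypothetical.

```python
import math
from collections import defaultdict


def idf(word, documents):
    """Assumed idf variant: log(N / df), where df is the number of
    documents containing the word."""
    df = sum(1 for doc in documents if word in doc)
    return math.log(len(documents) / df) if df else 0.0


def rank_context_words(documents, target, window=3, use_idf=True):
    """Rank context words of `target` by position-weighted count,
    optionally multiplied by idf."""
    scores = defaultdict(float)
    for doc in documents:
        for i, tok in enumerate(doc):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # Assumed position normalization: closer words contribute more.
                scores[doc[j]] += 1.0 / abs(j - i)
    if use_idf:
        scores = {w: s * idf(w, documents) for w, s in scores.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy tokenized documents, made up for illustration.
docs = [
    ["the", "course", "information", "is", "online"],
    ["the", "course", "of", "the", "river"],
    ["the", "weather", "is", "nice"],
]
ranked = rank_context_words(docs, "course", window=2)
ranked_no_idf = rank_context_words(docs, "course", window=2, use_idf=False)
print(ranked)
print(ranked_no_idf)
```

In this toy data "the" ranks first without idf but drops to a score of zero with it, because it occurs in every document.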
Interface • Output a ranked list of expressions with example sentences via the Web • Examples: course, information, notorious, come, come (without idf)
Further Improvements • Use a search engine to build a corpus • Allow phrase searching • Provide an option to search for highly frequent phrases as opposed to idiomatic expressions
Conclusion • We have presented a tool that, given a word, finds typical usages of that word in natural language • The tool should be useful for • learners of English • native speakers