Why am I here? Details, Details, Details Location, Location, Location
Everybody has to be somewhere! (Eccles: Goon Show circa 1950) Computing Research Laboratory
Language Engineering at CRL • Information retrieval • Language learning and language teaching • Automatic translation • Summarization • Question answering • Dictionary development • Knowledge discovery
Field Guide Locations
• Habitat: Mainly deciduous forests and woodlands; often seen over adjacent farmlands.
• Nesting: 2 whitish eggs, heavily marked with dark brown, placed without nest or lining in a crevice in rocks, in a hollow tree, or in a fallen hollow log.
• Range: Breeds from southern British Columbia, central Saskatchewan, Great Lakes, and New Hampshire southward. Winters in Southwest, and in East northward to southern New England.
Tipster/MUC Named Entity Task
• Scotland (CITY), Alabama (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Arkansas (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), California (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Connecticut (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Florida (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Georgia (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Indiana (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Maine (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Maryland (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Massachusetts (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Mississippi (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Missouri (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), New Hampshire (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Ohio (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Pennsylvania (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), South Dakota (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Texas (PROVINCE 1), United States (COUNTRY)
• Scotland (CITY), Virginia (PROVINCE 1), United States (COUNTRY)
• Scotland (PROVINCE 1), United Kingdom (COUNTRY)
• Scotland (PROVINCE 2), Missouri (PROVINCE 1), United States (COUNTRY)
Seven Page Task Definition
Multi-name expressions containing conjoined modifiers (with elision of the head of one conjunct) should be marked up as separate expressions.
"North and South America"
<ENAMEX TYPE="LOCATION">North</ENAMEX> and <ENAMEX TYPE="LOCATION">South America</ENAMEX>
+ Gazetteer, from USGS and National Geographic
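The markup rule for conjoined modifiers can be sketched in a few lines of Python. This is only an illustration, not the task's official tooling: the function name `mark_up` and the character-offset span format are assumptions made for the example, and the spans are supplied by hand rather than detected automatically.

```python
def mark_up(text, spans):
    """Wrap each (start, end, type) character span in an ENAMEX tag.

    Hypothetical helper: spans must not overlap; they are given by hand here.
    """
    out = []
    cursor = 0
    for start, end, etype in sorted(spans):
        out.append(text[cursor:start])  # untagged text before the span
        out.append(f'<ENAMEX TYPE="{etype}">{text[start:end]}</ENAMEX>')
        cursor = end
    out.append(text[cursor:])  # untagged tail
    return "".join(out)


phrase = "North and South America"
# "North" and "South America" are tagged separately, as the rule requires.
spans = [(0, 5, "LOCATION"), (10, 23, "LOCATION")]
tagged = mark_up(phrase, spans)
print(tagged)
```

The hard part in practice is deciding where the spans are, and which of the many gazetteer entries for a name such as "Scotland" is meant; the sketch only shows the output format.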
MINDS • User configurable summarization system based on sentence selection • Summarizes documents in Spanish, Japanese, Russian, Turkish, Korean, and English • Summaries can be biased to favor place names, or other named entities
Basic Summarization Method • Document structure analysis • Keyword analysis • Part of Speech and Proper Name Recognition • Sentence selection based on weighted scores
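A minimal sketch of sentence selection by weighted scores, with an optional bias toward place names, might look like the following. It is not the MINDS implementation: the scoring formula, the `place_weight` parameter, and the function name `summarize` are assumptions for illustration, and the sketch skips document structure analysis and part-of-speech tagging entirely. It also only matches single-word place names.

```python
import re
from collections import Counter


def summarize(text, place_names=(), place_weight=2.0, max_sentences=1):
    """Pick the highest-scoring sentences and return them in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))  # keyword analysis
    places = {p.lower() for p in place_names}

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        if not tokens:
            return 0.0
        keyword_score = sum(freq[t] for t in tokens) / len(tokens)
        place_hits = sum(1 for t in tokens if t in places)
        return keyword_score + place_weight * place_hits  # weighted sum

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in chosen]


doc = "The delegation left early. The delegation arrived in Ankara. Rain fell."
print(summarize(doc))                          # no bias
print(summarize(doc, place_names=["Ankara"]))  # biased toward place names
```

With no bias the first sentence wins on keyword frequency alone; adding "Ankara" as a place name shifts the selection to the second sentence.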
Boas: “A Linguist in the Box” Boas is a semi-automatic knowledge elicitation system that guides a language speaker through the process of developing the static knowledge sources for a moderate-quality, broad-coverage MT system from any “low-density” language into English in about six months. One of the tasks is translating a long list of place names from English into the source language.
The ethnologist and linguist Franz Boas was the founder of the American school of descriptive linguistics. In this photo, circa 1900?, he is shown posing for a model of a Kwakiutl Winter Ceremonial dancer, in which the dancer emerges from a circular hole cut in the dancing screen.
Onomastics The study of proper names
Keizai - Human Assisted Query Translation for Cross-Language Retrieval
ATS – Differences in Arabic
الاختلافات اللغوية في الأقطار العربية
Arabic Speaking Countries Document Collection Morphological Analysis Information Retrieval Sub-corpus Analysis
• Explore differences in lexical usage due to:
  • Transliteration
  • Cultural background (west – French, east – English)
  • Spelling differences
For example:
• السيدا SIDA, الايدز AIDS
• الأوبيب OPEP, الأوبيك OPEC
• الأستاذ Teacher (Algeria), الاستاذ Teacher (Oman)
Different Spelling Forms from AFP Arabic Newswire
Word | English | AFP Occurrences
لوس انجليس | Los Angeles | 21
لوسانجلوس | Los Angeles | 23
لوس انجيلس | Los Angeles | 2
لوس انجيليس | Los Angeles | 34
انجلترا | England | 2
انكلتر | England | 1
انكلترا | England | 1
كارولاينا | Carolina | 26
كارولينا | Carolina | 14
ويسكونسين | Wisconsin | 8
ويسكنسن | Wisconsin | 2
ويسكونسن | Wisconsin | 16
نيوهامبشير | New Hampshire | 15
نيوهامبشر | New Hampshire | 9
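One practical consequence of these spelling variants is that a retrieval system matching a single form undercounts the place. The sketch below simply groups the counts from the slide by English name; the data structure and the function names `group_variants` and `totals` are assumptions for illustration, not part of the ATS system.

```python
from collections import defaultdict

# (Arabic spelling, English name, occurrences) rows copied from the slide.
ROWS = [
    ("لوس انجليس", "Los Angeles", 21),
    ("لوسانجلوس", "Los Angeles", 23),
    ("لوس انجيلس", "Los Angeles", 2),
    ("لوس انجيليس", "Los Angeles", 34),
    ("انجلترا", "England", 2),
    ("انكلتر", "England", 1),
    ("انكلترا", "England", 1),
    ("كارولاينا", "Carolina", 26),
    ("كارولينا", "Carolina", 14),
    ("ويسكونسين", "Wisconsin", 8),
    ("ويسكنسن", "Wisconsin", 2),
    ("ويسكونسن", "Wisconsin", 16),
    ("نيوهامبشير", "New Hampshire", 15),
    ("نيوهامبشر", "New Hampshire", 9),
]


def group_variants(rows):
    """Map each English name to its {Arabic spelling: count} variants."""
    groups = defaultdict(dict)
    for arabic, english, count in rows:
        groups[english][arabic] = count
    return dict(groups)


def totals(rows):
    """Total occurrences per English name, summed over all spellings."""
    return {name: sum(v.values()) for name, v in group_variants(rows).items()}


for name, total in totals(ROWS).items():
    best = max(group_variants(ROWS)[name].values())
    print(f"{name}: {total} occurrences, most frequent spelling covers {best}")
```

For Los Angeles, even the most frequent spelling accounts for only 34 of 80 occurrences, so searching on one form would miss more than half of the mentions.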
Place Names Transliterated using National Name
Word | Transliteration | English Name
شارلوت | Sharlote | Charlotte
المانيا | Alemania | Germany
اوروبا | Europa | Europe
موسكو | Moscou | Moscow
طرابلس | Tarabulus | Tripoli
الكويت | Al Kuwayt | Kuwait
باولوس | Paulos | Pauls
بروكسل | Brussells | Brussels
برلين | Berlien | Berlin
فلسطين | Palestina | Palestine
بريطانيا | Britania | Britain
بيروت | Bayreuth | Beirut
لاغوس | Lagus | Lagos
Meaning Oriented Question Answering - MOQA
An AQUAINT project by:
• Computing Research Laboratory (NMSU)
• Institute for Language and Information Technology (ILIT, UMBC)
• CoGenTex, Inc.
• Domain
  • Travels
  • Meetings
• Languages
  • English
  • Arabic
  • Persian
• Method
  • Fact Repository from Text
  • Ontology based: Search Form Results - Text Retrieval - Text Analysis - Question Analysis
This work was supported in full by the Advanced Research and Development Activity (ARDA)’s Advanced Question Answering for Intelligence (AQUAINT) Program under contract number 2002*H167200*000.
Text Meaning Representation
• proposition_1
  • head %exit_1
  • agent human_54 “Mr. Smith”
  • source location_23 “London”
  • destination location_25 “Ankara”
  • means vehicle_65 “Boeing 757”
• tmr-time
  • time-begin YYYYMMDD “July 2, 2000”
• aspect
  • iteration single; phase end… “departed”
• polarity positive
• mood indicative
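To make the structure concrete, the representation above can be written as a nested Python dictionary and queried by role. This is a hedged sketch only: the dictionary layout, the `ask` function, and the "concept"/"text" keys are assumptions chosen for the example and do not reflect MOQA's actual data format or fact repository.

```python
# Hypothetical encoding of the slide's TMR; key names are illustrative.
tmr = {
    "proposition_1": {
        "head": "%exit_1",
        "agent": {"concept": "human_54", "text": "Mr. Smith"},
        "source": {"concept": "location_23", "text": "London"},
        "destination": {"concept": "location_25", "text": "Ankara"},
        "means": {"concept": "vehicle_65", "text": "Boeing 757"},
        "tmr-time": {"time-begin": "July 2, 2000"},
        "polarity": "positive",
        "mood": "indicative",
    }
}


def ask(tmr, agent_text, role):
    """Return the text filling `role` in a positive proposition about the agent."""
    for prop in tmr.values():
        agent = prop.get("agent", {})
        if agent.get("text") != agent_text or prop.get("polarity") != "positive":
            continue
        filler = prop.get(role)
        if isinstance(filler, dict) and "text" in filler:
            return filler["text"]
    return None  # no matching fact


print(ask(tmr, "Mr. Smith", "destination"))  # Where did Mr. Smith go?
print(ask(tmr, "Mr. Smith", "source"))       # Where did he leave from?
```

Because the locations are stored as ontology concepts (location_23, location_25) rather than bare strings, a question about "Ankara" could in principle be matched against a fact extracted from text in another language.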
Resources • U.S. GEOLOGICAL SURVEY • FEDERAL GEOGRAPHIC DATA COMMITTEE • NATIONAL IMAGERY AND MAPPING AGENCY • U.S. BUREAU OF LAND MANAGEMENT • U.S. FOREST SERVICE • US CENSUS BUREAU • GETTY THESAURUS OF GEOGRAPHIC NAMES
Resources from (and for) the Humanities are Multilingual!
• Computers and the Visual Arts: Editorial ... These developments led a branch of the Comité Internationale Pour l'Histoire de L'Art (CIHA) to conceive Thesaurus Artis Universalis (TAU) which soon proved ... www.sumscorp.com/sums/articles/dahlberg.html
• 1 Regard sur l'informatisation des collections de musées d'art ... audacieuses qui nous promettaient, par exemple, un renouvellement complet et inéluctable de l'histoire de l'art grâce au Thesaurus Artis universalis ? ... www.kikirpa.be/www2/Site_irpa/ En/Publi/Doc/PYK/Kairis.pdf
• Marco Lattanzi ... stata da tempo recepita dalla comunità internazionale degli studiosi che, nell’ambito del gruppo di lavoro TAU (Thesaurus Artis Universalis), ha costituito ... www.ibc.regione.emilia-romagna.it/soprintendenza/ arcaut/lattanzi.html
Getty Record for Edmonton
• ID: 7013032 Record Type: administrative
• Edmonton (inhabited place)
• Coordinates:
  • Lat: 53 34 00 N degrees minutes
  • Long: 113 25 00 W degrees minutes
• Note: Located on N Saskatchewan river; flourished as center for agricultural distribution & processing after arrival of Canadian Pacific Railway 1891; petroleum was discovered nearby at Leduc, Redwater & Pembina mid-20th cen.
• Names:
  • Edmonton (preferred, C,V,N)
  • Strathcona (H,V,N) ............ formerly located on river's S bank; absorbed into city 1912
  • Ft. Edmonton (H,V,N) ............ fur-trading post for Hudson's Bay Company constructed 20 miles downstream from current site 1795; abandoned 1810
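Coordinates in this degrees/minutes form usually have to be converted to signed decimal degrees before they can be compared or mapped. The helper below is a small sketch of that conversion; the function name `dms_to_decimal` is hypothetical, and it assumes the third field ("00") is seconds, which the record itself does not state.

```python
def dms_to_decimal(dms):
    """Convert 'DD MM SS H' (H in N/S/E/W) to signed decimal degrees.

    Assumption: the third field is seconds; the record labels only
    degrees and minutes.
    """
    degrees, minutes, seconds, hemisphere = dms.split()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    # South and West are negative by the usual convention.
    return -value if hemisphere in ("S", "W") else value


lat = dms_to_decimal("53 34 00 N")
lon = dms_to_decimal("113 25 00 W")
print(round(lat, 4), round(lon, 4))
```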
Getty Record for Edmonton (Contd.)
• Hierarchical Position:
  • World (facet (hierarchical))
  • North and Central America (continent)
  • Canada (nation)
  • Alberta (province)
  • Edmonton (inhabited place)
• Place Types:
  • inhabited place (preferred, C) ............ expanded in 19th cen.
  • city (C) ............ incorporated 1904
  • provincial capital (C) ............ since 1905
  • industrial center (C)
  • transportation center (C)
  • university center (C)
Address Data Content Standard
Public Review Draft
Subcommittee on Cultural and Demographic Data
Federal Geographic Data Committee
April 17, 2003
Version 2
http://www.fgdc.gov/