Learn about Named Entity Recognition techniques, tools like Google Cloud Natural Language, and real-world applications in text analysis and information retrieval. Dive deep into Named Entity Recognition systems and their role in modern AI and NLP. Explore dependency-based parsing and Universal Dependencies for enhanced linguistic analysis. Get practical insights from examples and demos.
LING/C SC 581: Advanced Computational Linguistics Lecture 3 Jan 17th
Named Entity Recognition • Demo used in my other class: • University of Illinois • https://cogcomp.org/page/demo_view/NERextended • Unfortunately, it has been down so far this week…
Named Entity Recognition • Google Cloud Natural Language: • https://cloud.google.com/natural-language/ • also supplies sentiment/magnitude scores for the identified entities
Named Entity Recognition • Illinois Named Entity Recognizer example: Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday, with F-16s based in Atlantic City ready to be scrambled if an unauthorized aircraft does enter the restricted airspace. Down below, bomb-sniffing dogs will patrol the trains and buses that are expected to take approximately 30,000 of the 80,000-plus spectators to Sunday's Super Bowl between the Denver Broncos and Seattle Seahawks. The Transportation Security Administration said it has added about two dozen dogs to monitor passengers coming in and out of the airport around the Super Bowl. On Saturday, TSA agents demonstrated how the dogs can sniff out many different types of explosives. Once they do, they're trained to sit rather than attack, so as not to raise suspicion or create a panic. TSA spokeswoman Lisa Farbstein said the dogs undergo 12 weeks of training, which costs about $200,000, factoring in food, vehicles and salaries for trainers. Dogs have been used in cargo areas for some time, but have just been introduced recently in passenger areas at Newark and JFK airports. JFK has one dog and Newark has a handful, Farbstein said.
Universal Dependencies (UD) http://universaldependencies.org/ • 100 treebanks in over 70 languages Some relations involving dependent clauses: • ccomp: connects higher verb with verbal head of sentential complement with overt subject • xcomp: connects higher verb with verbal head of non-finite sentential complement without a subject. • csubj: connects higher verb with verbal head of sentential subject. • vmod ➤ advcl/acl: connects word to verbal head of a reduced non-finite verbal modifier (deprecated in UD; still emitted by syntaxnet)
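To make the clausal relations above concrete, here is a minimal sketch that stores a dependency analysis as (head, relation, dependent) arcs. The example sentence "He says that you like to swim" and its arcs follow the usual UD analysis; the `Arc` class and the helper names are hypothetical, introduced only for illustration, and using bare word strings as node identifiers is a simplification that only works when no word repeats.

```python
from dataclasses import dataclass

# Relations from the slide that link a word to the verbal head of a clause.
CLAUSAL_RELATIONS = {"ccomp", "xcomp", "csubj", "advcl", "acl"}


@dataclass(frozen=True)
class Arc:
    """One dependency arc: head --rel--> dep (hypothetical helper type)."""
    head: str
    rel: str
    dep: str


# "He says that you like to swim"
# ccomp: 'says' -> 'like' (the complement clause has its own subject, 'you')
# xcomp: 'like' -> 'swim' (non-finite complement without a subject of its own)
ARCS = [
    Arc("says", "nsubj", "He"),
    Arc("says", "ccomp", "like"),
    Arc("like", "mark", "that"),
    Arc("like", "nsubj", "you"),
    Arc("like", "xcomp", "swim"),
    Arc("swim", "mark", "to"),
]


def clausal_arcs(arcs):
    """Return only the arcs whose relation attaches a dependent clause."""
    return [a for a in arcs if a.rel in CLAUSAL_RELATIONS]


def has_overt_subject(verb, arcs):
    """True if `verb` heads an nsubj or csubj arc in this analysis."""
    return any(a.head == verb and a.rel in {"nsubj", "csubj"} for a in arcs)
```

Calling `clausal_arcs(ARCS)` keeps just the ccomp and xcomp arcs, and `has_overt_subject` separates the two cases: it is true for "like" (the ccomp head) and false for "swim" (the xcomp head).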
Google Cloud Natural Language RRS Sir David Attenborough ("Boaty McBoatface") • Parsey McParseface (Andor et al., 2016) • Free: DragNN (Kong et al., 2017), the follow-on to SyntaxNet (2016) • Free sampling at https://cloud.google.com/natural-language/ • The for-pay Google Cloud version is trained on additional proprietary corpora
Quick Homework 3 • The Penn Treebank is partially installed as a corpus in NLTK Data (Sections 00 and 01: wsj_0001.mrg to wsj_0199.mrg) • from nltk.corpus import treebank • Corpus methods: • treebank.words() • treebank.sents() • treebank.parsed_sents() • treebank.fileids() • Tree method (call it on one parse returned by .parsed_sents()): • .draw()
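The methods listed on this slide can be tried out with a short sketch like the one below. It assumes NLTK is installed and that the sample has been fetched with `nltk.download("treebank")`; the function names `sample_fileid` and `show_file` are hypothetical helpers, not part of NLTK. `pretty_print()` is used in place of `.draw()` so the sketch also works without a display.

```python
def sample_fileid(n: int) -> str:
    """Build a file id such as 'wsj_0001.mrg' for the NLTK sample (1..199)."""
    if not 1 <= n <= 199:
        raise ValueError("the NLTK sample only covers wsj_0001 to wsj_0199")
    return f"wsj_{n:04d}.mrg"


def show_file(n: int = 1) -> None:
    """Print a quick overview of one treebank file."""
    # Imported here so that sample_fileid() works even without NLTK installed.
    from nltk.corpus import treebank  # needs nltk.download("treebank")

    fileid = sample_fileid(n)
    print(len(treebank.fileids()), "files in the sample")
    print(treebank.words(fileid)[:10])           # first ten tokens
    print(len(treebank.sents(fileid)), "sentences in", fileid)

    tree = treebank.parsed_sents(fileid)[0]      # an nltk.Tree
    tree.pretty_print()                          # ASCII rendering of the gold parse
    # tree.draw() opens a graphical window instead (requires Tk)
```

Running `show_file(1)` prints an overview of wsj_0001.mrg and its first gold tree.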
Quick Homework 3 • Pick a random parse from treebank (see the Python session at the end of this slide) • Run it through the Google Cloud Parser • Analyze and comment on how it compares to the gold standard parse • Include the gold tree and the Google dependency parse • One PDF file • Due next Wednesday (by midnight) • >>> len(treebank.sents()) 3914 >>> import random >>> random.seed() >>> random.randrange(0,3914) 1462
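The selection step for the homework could be wrapped up roughly as follows. This is only a sketch: `pick_index`, `surface_words`, and `gold_and_text` are hypothetical helper names, the default of 3914 is the sentence count shown on the slide, and NLTK with the `treebank` data is assumed to be installed. One detail worth noting is that Penn Treebank leaves include empty elements (traces such as `*-1`, tagged `-NONE-`), which probably should be removed before the sentence is pasted into an online parser.

```python
import random


def pick_index(n_sents: int = 3914, seed=None) -> int:
    """Pick a sentence index in [0, n_sents); pass a seed to make it repeatable."""
    rng = random.Random(seed)
    return rng.randrange(0, n_sents)


def surface_words(tagged_leaves):
    """Drop empty elements (tag '-NONE-') from (word, tag) pairs."""
    return [word for word, tag in tagged_leaves if tag != "-NONE-"]


def gold_and_text(i: int):
    """Return the gold tree and a plain-text sentence for treebank sentence i."""
    # Imported lazily so the two helpers above work without NLTK installed.
    from nltk.corpus import treebank  # needs nltk.download("treebank")

    tree = treebank.parsed_sents()[i]
    text = " ".join(surface_words(tree.pos()))
    return tree, text
```

A typical use would be `tree, text = gold_and_text(pick_index())`, then `tree.draw()` for the gold tree and `text` as the input to the dependency parser.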