Semantic Textual Similarity (STS) Workshop. Mona Diab Eneko Agirre. Logistics. Location March 12: The Interchurch Center (TIC), Room C&D March 13: The Interschool Laboratory (IL), CEPSR 750 Lunch & Breaks Same room on both days Dinner Monday March 12 (Today)
Semantic Textual Similarity (STS) Workshop Mona Diab Eneko Agirre
Logistics • Location • March 12: The Interchurch Center (TIC), Room C&D • March 13: The Interschool Laboratory (IL), CEPSR 750 • Lunch & Breaks • Same room on both days • Dinner Monday March 12 (Today) • If you have not signed up, please do by 10:30am Monday March 12 • Restrooms • Monday TIC: Lower level, take escalators down one floor and then to the left of the cafeteria • Tuesday IL: Same floor (signs are posted)
Logistics • Wifi: General Wifi access • SSID: guest@interchurch • User Name: guest • Password: guest12345 • Presentations • Please send them to weiwei@cs.columbia.edu or give them to him on a flash drive during a break ahead of the session
Today’s Agenda Highlights • 9:00 - 9:30am Introductions and overarching goals of workshop • 9:30 - 10:30am Discussion of What is STS? [Item A] • 10:30 - 11:00am Coffee break • 11:00 - 11:30am SemEval 2012 STS Task • 11:30am - 12:00pm Sample manual annotation by participants • 12:00 - 1:00pm Discussion of participant annotations • 1:00 - 2:00pm Lunch • 2:00 - 2:30pm Evaluation of STS [Item B] • 2:30 - 4:00pm NLP applications that would benefit from STS [Item C] • 4:00 - 4:30pm Coffee break • 4:30 - 5:30pm How to create an STS blackbox? [Item D]
Game plan for both days • This is a working workshop: participants are encouraged (urged) to participate and contribute, whether physically present or participating remotely • Each session is led by either Mona or Eneko, but discussion is expected throughout • At the end of each session we will go over a summary/action points from the session where relevant
Acknowledgments: Credit where due • Ido Dagan • Martha Palmer • Dan Cer • Alessandro Moschitti • SIGLEX Board members: Diana McCarthy, Katrin Erk, Sebastian Pado, Rada Mihalcea • Nancy Ide, James Pustejovsky, Sanda Harabagiu • NSF Program Directors (Tanya Korelsky, Terry Langendoen) • DARPA for funding this! $$ is important • CCLS for their logistical support • Special thanks to Weiwei Guo (just got his STS paper accepted to ACL, YAY)! • Thanks to all for accepting our invitation!
Discussions Resulted in ... • *SEM • http://ixa2.si.ehu.es/starsem/ • Be sure to submit papers there (please) • SEMEVAL 2012 STS Task 6 • http://www.cs.york.ac.uk/semeval-2012/task6/ • This STS Workshop • http://www.cs.columbia.edu/~weiwei/workshop/index.html
Introductions • Please introduce yourself • Name and Affiliation • Briefly: Relevance of STS to you/your work, name • Semantic component (enabling technology) • Resource for STS • End NLP application • Infrastructure/large systems • Theoretical considerations • All of the above
Goals of STS Workshop • Pool the community with respect to the relevance of STS to NLP (thanks for the overwhelmingly positive response to our invitation) • Foster collaboration with a concrete buy-in from different participants towards building a real STS framework • Pursue/seek funding to realize STS
STS Workshop Considerations • What is STS? • How to characterize STS quantitatively and qualitatively? • What semantic components contribute to STS? • How to create a principled empirical STS framework with utility and interpretability? • Could this lead to a better understanding of the semantics of NL? • How to create an STS blackbox? • How can different semantic components/features interact? • What kind of resources and tools are necessary for such an effort? • Infrastructure desiderata
STS Workshop Considerations • Evaluation of STS • Intrinsic • Graded vs. Binary Similarity • Metric considerations • Extrinsic • How to illustrate the utility of STS to end NLP applications such as MT, Distillation, etc. • Future directions • Monolingual vs. Multilingual • Shared *SEM task? • Potential proposal submissions/funding avenues • Collaboration across the pond!
STS Framework Research Goals • To create an interoperable STS pipeline that integrates different semantic components ranging from simple word similarity to more nuanced semantic components that can handle more complex semantic and pragmatic phenomena such as modality and lambda logic. • To perform intrinsic evaluation of STS • To show the utility of STS to large NLP applications using extrinsic evaluations • To advance our understanding of the underpinning semantics of natural languages and how we can empirically exploit this knowledge • To foster stronger collaborations within the Semantic community and across to other sub-communities within CL
STS Vision • [Diagram] Text A and Text B are the inputs to the STS Box, which connects to NLP Applications • The STS Box draws on Fundamental NLP Tools (tokenizers, POS taggers, lemmatizers, chunkers, etc.) and Linguistic Resources (raw and annotated corpora, treebanks, ontologies, propbanks, dictionaries, etc.) • Open question: UIMA or some other platform?
STS Box • A single system that integrates features from different semantic layers of representation (the focus of the current SemEval 2012 STS Task 6) • Multiple semantic components • Performance of components (confidence in results) • Type of component • Relevance to task • How to order the components in a sequential pipeline • If multiple components perform the same task, how to control for redundancy and complementarity • Layering annotations of different semantic knowledge on the same data • Interaction/dependency between different semantic annotations • Representation assumptions • Formalism assumptions • How to operationalize the interaction among components
What is Semantic Textual Similarity? • [Diagram] Blocks of text, shown as unreadable filler with a few readable snippets embedded, are fed into a Semantic Similarity box • Readable snippets: Yo! Come over here, you will be pleasantly surprised • Welcome to my world, trust me you will never be disappointed • بس تعالى ومالكش دعوه، هتبنبسط اخر انبساط • 안녕하세요 제가 당신에게 전화했지만 아무 소용이있을려고 ... 당신이 시간을 즐기고 있었다 희망 • Добро пожаловать в мой мир, поверьте мне вы никогда не будете разочарованы • Outputs: Quantitative Graded Similarity Score • Confidence Score • Principled Interpretability: which semantic components/features led to the results (hopefully this will lead to us gaining a better understanding of semantics)
Monolingual Semantic Similarity Yo! Come over here, you will be pleasantly surprised Welcome to my world, trust me you will never be disappointed بس تعالى ومالكش دعوه، هتبنبسط اخر انبساط Semantic Similarity
Monolingual Semantic Similarity Yo! Come over here, you will be pleasantly surprised Welcome to my world, trust me you will never be disappointed بس تعالى ومالكش دعوه، هتبنبسط اخر انبساط Semantic Similarity Semantic Similarity score: 4.5, Grade: 4 Interpretation: Lexical X Y, Syntactic AB, CD, Scoping xyz, etc Confidence: 0.8
Multilingual Semantic Similarity Yo! Come over here, you will be pleasantly surprised Welcome to my world, trust me you will never be disappointed بس تعالى ومالكش دعوه، هتبنبسط اخر انبساط Semantic Similarity Semantic Similarity score: 3, Grade: 5 Interpretation: lexical B C D, syntactic, pragmatic Confidence: 0.9
Why STS? • Most NLP applications need some notion of semantic similarity to overcome brittleness and sparseness • IR, IE, QA, MT, Dialogue, Pedagogical Systems, … • Also enabling tasks like parsing, SRL, Textual Entailment, ... • Provides evaluation beyond surface text processing • “Understanding” or interpretability of results • Nuanced semantics with utility • A hub for semantic processing as a black box in applications beyond NLP (open source release) • Lends itself to an extrinsic evaluation of scattered semantic components
Why STS? • Monolingual Space • MT evaluation • Summarization • Paraphrase Generation • Multi Lingual Space • Direct MT evaluation • X-lingual Summarization • X-lingual Generation • But overall better understanding of semantic spaces • How do different languages carve up the space • What impact does it have on our thinking • Relates to code switching and speaker state as well?
What is STS? • The graded process by which two snippets of text (t1 and t2) are deemed semantically equivalent, i.e. bear the same meaning • An STS system will quantifiably inform us how similar t1 and t2 are, resulting in a similarity score • An STS system will tell us why t1 and t2 are similar, giving a nuanced interpretation of similarity based on the semantic components’ contributions
What is STS? • Word similarity has been relatively well studied • For example according to WN (word pairs ordered from less similar to more similar):
cord / smile           0.02
rooster / voyage       0.04
noon / string          0.04
fruit / furnace        0.05
...
hill / woodland        1.48
car / journey          1.55
cemetery / mound       1.69
...
cemetery / graveyard   3.88
automobile / car       3.92
What is STS? • Fewer datasets for similarity between sentences A forest is a large area where trees grow close together. VS. The coast is an area of land that is next to the sea. [0.25]
What is STS? • Fewer datasets for similarity between sentences A forest is a large area where trees grow close together. VS. Woodland is land with a lot of trees. [2.51]
What is STS? • Fewer datasets for similarity between sentences Once there was a Czar who had three lovely daughters. VS. There were three beautiful girls, whose father was a Czar. [4.3]
Multilingual STS • No one to our knowledge has directly quantified the cross linguistic similarity between two texts
How is STS different from … • Recognizing Textual Entailment (RTE) to date • RTE binary vs. STS graded • RTE is directional (text to hypothesis) • In RTE the text is typically (much) longer than the hypothesis • Paraphrase (Pph) to date • Pph binary vs. STS graded • Notion of (principled) interpretability
Pipelined STS • An interoperable pipeline of semantic components • Input • Two text snippets • Output • Numerical score of similarity with graded similarity on a scale of 0-5 • What semantic components/features led to score (principled interpretability) • Confidence level in response • Evaluation • Intrinsic evaluation in the context of sentence similarity • Extrinsic evaluation in the context of MT evaluation • Intrinsic component evaluations
Main Objectives • Plug & play environment for semantic components • WSD/WSI, lexical substitution, SRL, MWE, paraphrase, anaphora and coreference resolution, time and date resolution, named-entity handling, underspecification, hedging, semantic scoping, discourse analysis, etc. • Pipeline creation • Components produce scores, then combine • Combine features directly in MuSeS environment • Interpretability of contributing factors • Explicitly characterize why two texts are considered similar, i.e. which semantic component(s) contributed to the similarity score • Quantifying STS, formalizing it as a probabilistic story • Associating confidence levels with scores
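The "components produce scores, then combine" idea, together with the score, interpretability and confidence outputs, can be sketched as follows. This is only an illustration under stated assumptions: the `Component` signature, the `STSResult` fields, the confidence-weighted average, and the two toy components with their fixed confidence values are all hypothetical and are not the workshop's or SemEval's method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical component signature:
# (t1, t2) -> (score on the 0-5 scale, confidence in [0, 1]).
Component = Callable[[str, str], Tuple[float, float]]


@dataclass
class STSResult:
    score: float                     # combined similarity on the 0-5 scale
    confidence: float                # mean confidence of the components
    contributions: Dict[str, float]  # part of the score owed to each component


def combine(t1: str, t2: str, components: Dict[str, Component]) -> STSResult:
    """Confidence-weighted average of component scores (an assumed rule)."""
    outputs = {name: comp(t1, t2) for name, comp in components.items()}
    total_conf = sum(conf for _, conf in outputs.values())
    if total_conf == 0:
        return STSResult(0.0, 0.0, {name: 0.0 for name in outputs})
    # Each contribution is the component score times its normalized weight,
    # so the contributions add up to the combined score.
    contributions = {
        name: score * conf / total_conf for name, (score, conf) in outputs.items()
    }
    return STSResult(
        score=sum(contributions.values()),
        confidence=total_conf / len(outputs),
        contributions=contributions,
    )


def token_overlap(t1: str, t2: str) -> Tuple[float, float]:
    """Toy lexical component: Jaccard overlap of lowercased tokens, scaled to 0-5."""
    a, b = set(t1.lower().split()), set(t2.lower().split())
    jaccard = len(a & b) / len(a | b) if a | b else 0.0
    return 5.0 * jaccard, 0.5  # the confidence value is made up


def length_ratio(t1: str, t2: str) -> Tuple[float, float]:
    """Toy surface component: ratio of token counts, scaled to 0-5."""
    n1, n2 = len(t1.split()), len(t2.split())
    ratio = min(n1, n2) / max(n1, n2) if max(n1, n2) else 0.0
    return 5.0 * ratio, 0.25  # the confidence value is made up


result = combine(
    "A forest is a large area where trees grow close together.",
    "Woodland is land with a lot of trees.",
    {"token_overlap": token_overlap, "length_ratio": length_ratio},
)
print(result.score, result.confidence, result.contributions)
```

Keeping the per-component contributions next to the combined score is one simple way to expose which component drove the result; a real system would replace the toy components with the semantic components listed above.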
Call on people for contribution • Katrin Erk • Christian Chiarcos • Enrique Alfonseca
Intrinsic Evaluation Issues (Item B) • Binary similarity • What is the cut off threshold • Graded similarity • How to bin the results (2-4) • How to assess and integrate confidence values from components? Should we weight different components differently? • Depend on their stand alone performance • Weight their contribution by their salience and relevance to STS? Theoretical considerations? • Degree/Level of transparency/interpretability?
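The two intrinsic options above, a binary cut-off and binning of graded scores, can be made concrete with a small sketch. The 2.5 threshold, the equal-width bins and the function names are assumptions chosen for illustration; the slides leave the threshold and the number of bins (2-4) as open questions.

```python
def binarize(score: float, threshold: float = 2.5) -> bool:
    """Binary similarity: True if the graded score reaches the cut-off.

    The default threshold of 2.5 (midpoint of the 0-5 scale) is an assumption.
    """
    return score >= threshold


def bin_score(score: float, n_bins: int = 4, lo: float = 0.0, hi: float = 5.0) -> int:
    """Map a graded score in [lo, hi] to one of n_bins equal-width bins (0-based)."""
    if not lo <= score <= hi:
        raise ValueError(f"score {score} is outside [{lo}, {hi}]")
    width = (hi - lo) / n_bins
    # The top of the scale would land in bin n_bins, so clamp it into the last bin.
    return min(int((score - lo) / width), n_bins - 1)


# Scores taken from the sentence-pair examples earlier in the deck.
for s in (0.25, 2.51, 4.3):
    print(s, binarize(s), bin_score(s, n_bins=4))
```

Equal-width bins are the simplest choice; bins could instead be fitted to the distribution of annotator scores, which is one of the design questions this slide raises.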
Extrinsic Evaluation Issues (Item B) • How to integrate the STS blackbox in an NLP application • Is it simply ablation or is there something more interesting? • Where to integrate STS in different applications • Do different applications require different types of STS (biased/weighted STS)? What implications would that have on the design of STS? • Can we come up with different STS formalisms (i.e. maybe with a known set of components?) similar to different syntactic formalisms/perspectives • Role of intrinsic STS confidence level in integration and evaluation • Again, degree/level of transparency/interpretability of underlying semantic components?
STS in NLP Applications (Item C) • Distillation and MT (Marjorie Freedman) • MT and MT evaluation (Alon Lavie, Dekai Wu, Lucia Specia, Kevin Knight, Scott Miller) • Machine Reading (Ralph Weischedel) • Watson Jeopardy (Alfio Gliozzo) • Generation (Christian Chiarcos) • Summarization (Enrique Alfonseca) • Opinion Mining and Social Media Mining (Sanda Harabagiu) • Inference (Johan Bos, Ido Dagan) • (Tentative) Semantic Web and Ontologies (Michael Uschold)