STS: under the hood

Diana McCarthy, Erasmus Mundus Visiting Scholar, Saarland University


Presentation Transcript


  1. STS: under the hood
     Diana McCarthy, Erasmus Mundus Visiting Scholar, Saarland University

  2. Proposal: Annotation with Alignments
     • so that we can see where the similarity lies and the rationale for scores
     • sub-alignments
     • look for consensus on sub-parts
     • alignments annotated with a relation, e.g. (just for brainstorming purposes)
       • = (equivalence/substitutable)
       • != (contradiction)
       • → (entailment)
       • - (missing)
       • extra-propositional
       • speculation/certainty
       • sentiment
     • assign category (relation) and score to the whole text pair
     • or to sub-alignments
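One way to picture the proposal is as a small record type. The following is only a sketch under assumptions: the names `Relation`, `Alignment` and `PairAnnotation` are hypothetical (they do not come from the slides), and only the four relation symbols listed above are modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Relation(Enum):
    """Relation symbols from the brainstorming list (names are illustrative)."""
    EQUIVALENT = "="
    CONTRADICTION = "!="
    ENTAILMENT = "→"
    MISSING = "-"


@dataclass
class Alignment:
    """One aligned sub-part of a reference/candidate pair (hypothetical type)."""
    label: str                      # shared id such as "A", "B", "C"
    relation: Relation
    reference_text: str
    candidate_text: str
    score: Optional[float] = None   # None: the annotator left it unscored
    children: List["Alignment"] = field(default_factory=list)


@dataclass
class PairAnnotation:
    """Category and score for the whole pair, plus optional sub-alignments."""
    reference: str
    candidate: str
    relation: Optional[Relation] = None
    score: Optional[float] = None
    alignments: List[Alignment] = field(default_factory=list)


# A category and score can sit on the whole pair, on sub-alignments, or both.
pair = PairAnnotation(
    reference="The new system costs between $1.1 million and $22 million.",
    candidate="The system is priced from US$1.1 million to $22.4 million.",
    relation=Relation.EQUIVALENT,
    score=4.2,
    alignments=[
        Alignment("D", Relation.EQUIVALENT, "costs", "is priced", score=5.0),
        Alignment("X", Relation.MISSING, "new", ""),
    ],
)
```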

  3. Annotation (examples from Microsoft Research Paraphrase Corpus: STS pilot task)
     Reference: The new system costs between $1.1 million and $22 million, depending on configuration.
     Candidate: The system is priced from US$1.1 million to $22.4 million, depending on configuration.

  4. Annotation (please note: the 1-5 scores are off the top of my head, before seeing the guidelines, for illustrative purposes only)
     Reference: The new system costs between $1.1 million and $22 million, depending on configuration.
     Candidate (score 4.2): The system is priced from US$1.1 million to $22.4 million, depending on configuration.
     * a good starting point, but we want to look inside the box
     * brainstorming: all annotations were done by myself in 20 minutes before coming to the workshop

  5. Annotation with Alignments (brainstorming purposes)
     Reference: [=A The [-X new] system][=B[=D costs] between [=C $1.1 million and $22 million], depending on configuration.]
     Candidate (score 4.2): [=A.4.2 The [-X] system][=B.4[=D.5 is priced] from [=C.4 US$1.1 million to $22.4 million], depending on configuration.]
     * mark alignments between reference and candidate, with a category (equivalence =, contradiction !=, entailment →, missing -, etc.) and a score
     * alignments may overlap
     * we may also get non-contiguous sections, which we can mark with the same id (A, B, C, etc.)
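The bracket notation on this slide can be read mechanically. Below is a minimal sketch of a reader for it, assuming each opening bracket carries a relation symbol, a letter id and an optional `.score`, as in `[=A.4.2 ...]`. The names `Span` and `parse_alignments` are hypothetical, and the sketch handles nested spans only, not truly crossing (overlapping) ones.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Opening tag: "[" + relation symbol + letter id + optional ".score"
TAG = re.compile(r"\[(!=|=|-|→)([A-Za-z]+)(?:\.(\d+(?:\.\d+)?))?")


@dataclass
class Span:
    relation: str
    label: str
    score: Optional[float]
    text: str = ""
    children: List["Span"] = field(default_factory=list)


def parse_alignments(annotated: str) -> List[Span]:
    """Parse nested bracket annotations into a tree of Span objects."""
    roots: List[Span] = []
    stack: List[Span] = []
    i = 0
    while i < len(annotated):
        m = TAG.match(annotated, i)
        if m:
            score = float(m.group(3)) if m.group(3) else None
            span = Span(m.group(1), m.group(2), score)
            (stack[-1].children if stack else roots).append(span)
            stack.append(span)
            i = m.end()
        elif annotated[i] == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at position {i}")
            closed = stack.pop()
            closed.text = " ".join(closed.text.split())  # tidy whitespace
            i += 1
        else:
            # Plain text belongs to every span that is currently open.
            for open_span in stack:
                open_span.text += annotated[i]
            i += 1
    if stack:
        raise ValueError(f"unclosed span {stack[-1].label!r}")
    return roots


candidate = ("[=A.4.2 The [-X] system][=B.4[=D.5 is priced] from "
             "[=C.4 US$1.1 million to $22.4 million], depending on configuration.]")
roots = parse_alignments(candidate)
for span in roots:
    print(span.relation, span.label, span.score, repr(span.text))
```

Non-contiguous sections that share an id would come out as separate spans with the same `label`; grouping them is left to the caller.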

  6. Annotation with Alignments (brainstorming purposes)
     Reference: [=A The hearing occurred a day after the Pentagon for the first time singled out an officer, Dallager, for not addressing the scandal.]
     Candidate (score 4.9): [=A.4.9 The hearing came one day after the Pentagon for the first time singled out an officer - Dallager - for failing to address the scandal.]
     * to save annotators (and systems) effort, we could avoid aligning everything: do sub-alignments only where a subpart differs from the whole, by category or score
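The rule on this slide (annotate a subpart only when it differs from the whole by category or score) can be sketched as a filter. This is an illustration only: `Sub`, `informative_subs` and the `tolerance` parameter are hypothetical names, and the scores are the illustrative ones from the price example on the previous slide.

```python
from typing import List, NamedTuple, Optional


class Sub(NamedTuple):
    label: str
    relation: str            # "=", "!=", "→" or "-"
    score: Optional[float]   # None when the annotator gave no score


def informative_subs(whole_relation: str,
                     whole_score: Optional[float],
                     subs: List[Sub],
                     tolerance: float = 0.0) -> List[Sub]:
    """Keep only sub-alignments that differ from the whole pair
    by category or by score (beyond the given tolerance)."""
    kept = []
    for sub in subs:
        differs_by_category = sub.relation != whole_relation
        differs_by_score = (
            sub.score is not None
            and whole_score is not None
            and abs(sub.score - whole_score) > tolerance
        )
        if differs_by_category or differs_by_score:
            kept.append(sub)
    return kept


# Whole pair: "=" with score 4.2.
subs = [
    Sub("A", "=", 4.2),   # same category and score as the whole: redundant
    Sub("X", "-", None),  # differs by category (missing "new")
    Sub("B", "=", 4.0),
    Sub("D", "=", 5.0),
    Sub("C", "=", 4.0),
]
kept = informative_subs("=", 4.2, subs)
print([s.label for s in kept])
```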

  7. Annotation with Alignments
     Reference: [=C U.S.] prosecutors [=B have arrested more than 130 individuals] and have [=D[=F seized][-Y more than] $17 million [-X]] in a continuing crackdown on [=E Internet fraud [-Z and abuse].]
     Candidate (score ?): [=B.5 More than 130 people have been arrested] and [=D.3[-Y] $17 million [-X worth of property][=F.5 seized]] in an [=E Internet fraud [-Z]] sweep announced Friday by three [=C.5 U.S.] government agencies.
     * annotators should be allowed to leave parts without annotation; "don't know" is important. Also allow for comments on any item.
     * we could weight according to the salience of a word, whether it is a modifier or a predicate, its syntactic relation, or its order in the sentence (new information comes towards the end). It all depends on the goal.
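As a rough illustration of the weighting idea, the sketch below combines sub-alignment scores into a weighted mean and skips anything left unannotated. The function name `weighted_score` and all weights are assumptions made up for the example; the slides do not propose specific weights, and the result is not meant to fill in the "?" above.

```python
from typing import List, Optional, Tuple


def weighted_score(items: List[Tuple[Optional[float], float]]) -> Optional[float]:
    """Weighted mean of (score, weight) pairs.

    Pairs whose score is None ("don't know" / left unannotated) are skipped.
    Returns None when nothing usable remains.
    """
    scored = [(s, w) for s, w in items if s is not None]
    total_weight = sum(w for _, w in scored)
    if total_weight == 0:
        return None
    return sum(s * w for s, w in scored) / total_weight


# Scores are the ones marked on the candidate; weights are hypothetical,
# e.g. giving the main predicate more salience than a modifier.
items = [
    (5.0, 2.0),   # B: "More than 130 people have been arrested"
    (3.0, 1.0),   # D: "$17 million worth of property seized"
    (5.0, 1.0),   # F: "seized"
    (5.0, 0.5),   # C: "U.S."
    (None, 1.0),  # E: aligned but left unscored
]
print(weighted_score(items))
```

Changing the weights changes the outcome, which is the point of the slide: the right weighting depends on the goal.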

  8. Annotation with Alignments
     Reference: [=A The company][!=C didn't detail [-specD][=B the costs of the replacement and repairs]].
     Candidate (score 4.9): But [=A.5 company officials][!=C [-specD expect][=B.5 the costs of the replacement work] to run into the millions of dollars.]
     * mark speculation somehow, where it is missing or of a different type or level

  9. Components
     • Need for semantic and non-semantic components (syntactic, pragmatic, extra-propositional, extra-linguistic).
     • These are interleaved, but components could provide scores on sub-components, just as annotators can.
     • Systems mark confidence and the components used on sub-alignments, with categories (equivalence, contradiction, entailment, speculation).
     • We can learn the interaction, rather than assume it a priori.
     • Sampling is really important, especially if we want the thin tail rather than just the fat head! (Steedman, ACL dinner 2007)
     • However, it all depends on your goal or practical requirement.
