Get out the vote: Determining support or opposition from Congressional floor-debate transcripts
Matt Thomas, Bo Pang, and Lillian Lee, Cornell University
Get Out the Vote * http://www.cs.cornell.edu/home/llee/papers/tpl-convote.home.html
Motivation
• Congressional debates contain:
  • very rich language
  • a wide variety of topics
  • more time spent on evidence
• Information about agreement between debaters can help classify documents that are hard to classify individually.
Corpus
• Source: GovTrack
• House of Representatives floor-debate transcripts from 2005
• Restricted to debates on "controversial" bills (those in which the losing side generated at least 20% of the speeches)
• All speech segments from a given debate appear in the same set (training, development, or test)
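The two corpus-construction rules (the 20% controversy filter and keeping each debate inside a single set) can be sketched as below. This is a minimal illustration, not the authors' preprocessing code: the label representation, the hypothetical helper names, the 60/20/20 split, and the random seed are all assumptions; the slide only fixes the 20% threshold and the same-set constraint.

```python
import random


def is_controversial(segment_labels, losing_side, min_share=0.20):
    """True if the losing side produced at least `min_share` of the speeches.

    segment_labels: one "Y"/"N" label per speech segment in the debate
                    (hypothetical representation).
    losing_side:    "Y" or "N", the side that lost the actual vote.
    """
    if not segment_labels:
        return False
    losing = sum(1 for label in segment_labels if label == losing_side)
    return losing / len(segment_labels) >= min_share


def split_by_debate(debate_ids, fractions=(0.6, 0.2), seed=0):
    """Assign whole debates to train/dev/test so no debate spans two sets.

    `fractions` gives the train and dev shares (assumed values); the
    remainder goes to test.
    """
    ids = sorted(set(debate_ids))
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_dev = int(fractions[1] * len(ids))
    return {
        "train": ids[:n_train],
        "dev": ids[n_train:n_train + n_dev],
        "test": ids[n_train + n_dev:],
    }
```

Splitting on debate IDs rather than on individual segments is what prevents segments from one debate from leaking between training and evaluation.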
Method
• Two classifiers (SVMlight):
  • Individual-document classifier: scores each speech segment s in isolation.
  • Agreement classifier: scores pairs of speech segments.
    • Trained to score by-name references by how strongly they indicate agreement.
    • e.g., "I believe Mr. Smith's argument is persuasive"
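Method
Individual-document classifier:
• Features: plain unigrams
• Normalized presence-of-feature vectors rather than frequency-of-feature vectors
• Labels: Y for yea, N for nay
• d(s): the signed distance between s and the trained SVM decision plane
• The individual score is
  $$\mathrm{ind}(s, Y) = \begin{cases} 1 & \text{if } d(s) > 2\sigma_s \\ \left(1 + \dfrac{d(s)}{2\sigma_s}\right)/2 & \text{if } |d(s)| \le 2\sigma_s \\ 0 & \text{if } d(s) < -2\sigma_s \end{cases}$$
  where σs is the standard deviation of d(s) over all speech segments s in the debate in question
• ind(s,N) = 1 − ind(s,Y)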
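The individual-document scoring step can be sketched as follows, assuming the SVM has already been trained (for example with SVMlight) so that the signed distances d(s) are given as input. The sketch assumes a clipped linear mapping: ind(s,Y) is 1 when d(s) exceeds 2σs, 0 when it falls below −2σs, and linear in between. The regex tokenizer, L2 length normalization, the population standard deviation for σs, and the fallback when σs is 0 are all assumptions, and the function names are hypothetical.

```python
import math
import re
import statistics


def presence_features(text):
    """Unigram presence vector, L2-normalized (tokenizer and norm are assumptions)."""
    unigrams = set(re.findall(r"[a-z']+", text.lower()))
    if not unigrams:
        return {}
    weight = 1.0 / math.sqrt(len(unigrams))  # every present unigram gets equal weight
    return {u: weight for u in unigrams}


def ind_yes(d, sigma):
    """ind(s, Y) from the signed SVM distance d(s), clipped at +/- 2 sigma."""
    if sigma == 0:
        # Degenerate debate (all distances equal): fall back to the sign of d.
        return 1.0 if d > 0 else 0.0 if d < 0 else 0.5
    if d > 2 * sigma:
        return 1.0
    if d < -2 * sigma:
        return 0.0
    return (1 + d / (2 * sigma)) / 2


def individual_scores(distances):
    """Return (ind(s,Y), ind(s,N)) for every segment of one debate."""
    sigma = statistics.pstdev(distances)  # sigma_s is computed per debate
    return [(ind_yes(d, sigma), 1 - ind_yes(d, sigma)) for d in distances]
```

Normalizing by the per-debate σs makes the scores comparable across debates whose distances have very different spreads.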
Method
Two types of agreement:
• Same-speaker: a speaker may contribute several speech segments to one debate, and opinions could in principle change between them.
  • Assuming that most speakers do not change their positions, there are two possible approaches:
    • Force all segments by the same speaker to receive the same label (Y or N).
    • Concatenate each speaker's segments into a single document.
• Different-speaker: references are
  • identified by by-name mentions,
  • represented as word-presence vectors derived from windows of text surrounding the reference,
  • annotated with a positive or negative label according to whether the two speakers voted the same way.
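The data preparation for both agreement types can be sketched as below: concatenating each speaker's segments, extracting word-presence sets from windows around a by-name mention, and deriving the training label from the two speakers' votes. The (speaker, text) input format, the surname-only matching, the possessive stripping, and the window width are illustrative assumptions; the slide does not specify how the window is defined.

```python
import re


def concatenate_by_speaker(segments):
    """Same-speaker option 2: merge each speaker's segments into one document.

    `segments` is a list of (speaker, text) pairs from a single debate
    (hypothetical format).
    """
    merged = {}
    for speaker, text in segments:
        merged.setdefault(speaker, []).append(text)
    return {speaker: " ".join(texts) for speaker, texts in merged.items()}


def _strip_possessive(token):
    # "Smith's" -> "Smith"
    return token[:-2] if token.endswith("'s") else token


def reference_windows(text, name, width=5):
    """Yield a word-presence set for the `width` tokens on each side of every
    by-name mention of `name` (surname matching and width are assumptions)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    for i, tok in enumerate(tokens):
        if _strip_possessive(tok) == name:
            window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
            yield {w.lower() for w in window}


def agreement_label(vote_a, vote_b):
    """Training label for a reference: +1 if both speakers voted the same way, else -1."""
    return 1 if vote_a == vote_b else -1
```

For the slide's example sentence, `reference_windows("I believe Mr. Smith's argument is persuasive", "Smith", width=2)` yields the single set `{"believe", "mr", "argument", "is"}`.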
Method
Agreement classifier:
• d(r): the distance from the agreement vector of reference r to the SVM decision plane
• σr: the standard deviation of d(r) over all reference segments r in the debate in question
• α: a free parameter, tuned on the development set, that specifies the relative importance of the agreement scores
• θagr: a threshold that controls the precision of the agreement links
Agreement-classifier precision
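One way to turn agreement-classifier distances into link strengths, consistent with the roles described for α and θagr, is sketched below. The piecewise form (zero below θagr, linear in d(r)/4σr, capped at α beyond 4σr) is an assumption, as are the population standard deviation and the handling of σr = 0. θagr is assumed non-negative so that every strength stays non-negative, which a min-cut graph requires.

```python
import statistics


def link_strengths(distances, alpha, theta_agr):
    """Map agreement-SVM distances d(r) for one debate to link strengths.

    Assumed piecewise form:
      0                          if d(r) < theta_agr
      alpha * d(r) / (4 sigma_r) if theta_agr <= d(r) <= 4 sigma_r
      alpha                      if d(r) > 4 sigma_r
    """
    sigma_r = statistics.pstdev(distances)  # population std. dev. (assumption)
    strengths = []
    for d in distances:
        if d < theta_agr:
            strengths.append(0.0)    # below threshold: no agreement link
        elif sigma_r == 0 or d > 4 * sigma_r:
            strengths.append(alpha)  # cap confident links (and degenerate sigma)
        else:
            strengths.append(alpha * d / (4 * sigma_r))
    return strengths
```

Raising θagr drops more low-confidence links, trading recall of agreement links for precision; α then scales how much the surviving links count relative to the individual-document scores.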
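Method
• Classification framework: find a classification C (a split of the speech segments into Y and N) that minimizes
  $$\sum_{s \in Y} \mathrm{ind}(s, N) + \sum_{s \in N} \mathrm{ind}(s, Y) + \sum_{s \in Y,\; s' \in N} \mathrm{str}(s, s')$$
  where str(s, s') is the strength of the agreement link between s and s' (0 if there is none).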
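The objective sums ind(s,N) over segments labeled Y, ind(s,Y) over segments labeled N, and the strength of every agreement link whose two ends receive different labels. An objective of this form can be minimized exactly with a minimum s-t cut: connect a source (Y) to each segment with capacity ind(s,Y), each segment to a sink (N) with capacity ind(s,N), and linked segments to each other with the link strength. The sketch below assumes ind(s,N) = 1 − ind(s,Y) and non-negative link strengths, and uses a hand-written Edmonds-Karp max-flow only to stay self-contained; `min_cut_labels` and the terminal names are hypothetical, and a library routine such as `networkx.minimum_cut` would be the usual choice.

```python
from collections import defaultdict, deque

SOURCE, SINK = "<Y>", "<N>"  # hypothetical terminals: source side = yea, sink side = nay


def min_cut_labels(ind_yes, links):
    """Label segments Y/N by an exact minimum s-t cut (Edmonds-Karp max-flow).

    ind_yes: {segment: ind(s, Y)}; ind(s, N) is taken as 1 - ind(s, Y).
    links:   {(s, t): str(s, t)} non-negative, undirected link strengths.
    """
    cap = defaultdict(lambda: defaultdict(float))
    for s, p in ind_yes.items():
        cap[SOURCE][s] += p        # cut iff s ends up labeled N
        cap[s][SINK] += 1.0 - p    # cut iff s ends up labeled Y
    for (s, t), w in links.items():
        cap[s][t] += w             # an undirected link = two directed edges
        cap[t][s] += w

    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue and SINK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:  # tolerance for float residue
                    parent[v] = u
                    queue.append(v)
        if SINK not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), SINK
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = SINK
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Segments still reachable from the source in the residual graph form the Y side.
    return {s: ("Y" if s in parent else "N") for s in ind_yes}
```

With `ind_yes = {"a": 0.9, "b": 0.4}` and no links, the segments are labeled independently (`a` is Y, `b` is N); adding a strong link `{("a", "b"): 1.0}` makes separating them more expensive than relabeling `b`, so both come out Y.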
Evaluation
Agreement classification, amendment / no amendment:
• Note that amendment-related speech segments were never included in the development or test set, since their labels are probably noisy.
Evaluation
• Speaker-based speech-segment classification:
• Segment-based speech-segment classification: