Authorship Attribution By Allison Pollard
What is Authorship Attribution? • The process of determining who wrote a text when its authorship is unclear. • It is useful when two or more people claim to have written something, or when no one is willing (or able) to say that (s)he wrote the piece.
The Basis • A text makes use of all linguistic domains: semantics, syntax, lexicography, phonology (orthography) and morphology. Each of these domains is rule-governed, yet within these rules and among these components the grammar offers the writer choices. • The text as an end product is the outcome of the particular choices made by its author. This is why each specific text carries the fingerprints of its creator.
The Assumptions: • there is a specific single author • there are choices to be made • the author is consistent in his/her preferred choices • these choices are present and could be detected in all end products of that creator
Computerized Analysis • Developed in the 1980s • Based on stylometry—the statistical analysis of literary style [quantifying some of the features of an author’s style]
Method 1: Word or Sentence Length • The origin of stylometry • First developed in 1887, later extended in 1938 • NOT reliable methods
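To make the length-based measures concrete, here is a minimal Python sketch. The function names, the regular-expression tokenizer, and the sample sentence are assumptions made for illustration; they are not taken from the original studies, and the sketch expects non-empty text.

```python
import re
from collections import Counter

def word_length_profile(text):
    """Return the relative frequency of each word length found in `text`."""
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

def mean_sentence_length(text):
    """Return the average number of words per sentence."""
    # Naive sentence split on ., ! and ?; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(lengths) / len(lengths)

sample = "The cat sat. It was a sunny day!"
profile = word_length_profile(sample)  # share of 1-, 2-, 3-, ... letter words
avg = mean_sentence_length(sample)     # average words per sentence
```

Two texts could then be compared by putting their profiles side by side. As the slide notes, such measures alone are not considered reliable.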
Method 2: Function Words • Relies on the usage of context-free (“function”) words • Analyzes the frequency, position, or immediate context of these words • Criticized because it cannot reliably distinguish between certain types of literature
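The frequency part of this method can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the tiny `FUNCTION_WORDS` list, the per-1,000-token rate, the absolute-difference distance, and the three made-up sentences standing in for real texts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list; real studies use far longer ones.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "while", "whilst"]

def function_word_rates(text, words=FUNCTION_WORDS):
    """Return occurrences per 1,000 tokens for each function word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {w: 1000 * counts[w] / len(tokens) for w in words}

def distance(rates_a, rates_b):
    """Sum of absolute differences between two rate profiles."""
    return sum(abs(rates_a[w] - rates_b[w]) for w in rates_a)

# Made-up toy sentences, not quotations from any real author.
known_a = "Upon the whole, the power of the union depends upon the states."
known_b = "While the states act, the union waits to see the result of it."
disputed = "The decision rests upon the judgment of the people and upon the law."

profiles = {"A": function_word_rates(known_a), "B": function_word_rates(known_b)}
target = function_word_rates(disputed)

# Attribute the disputed text to the author with the nearest profile.
closest = min(profiles, key=lambda name: distance(profiles[name], target))
```

With texts this short the result is only a toy; in practice each profile would be built from many thousands of words per candidate author.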
Method 3: Vocabulary Distributions • Measures the “richness” or “diversity” of an author’s vocabulary • Analyzes the frequency profile of word usage to glimpse the extent of the author’s vocabulary
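A rough Python sketch of what a vocabulary profile might contain follows. The choice of measures (type-token ratio, count of words used only once, and Yule's K computed as 10^4 * (sum of i^2 * V_i - N) / N^2) is an assumption about which statistics to show; the slide does not name specific ones, and `vocabulary_profile` is a hypothetical helper.

```python
import re
from collections import Counter

def vocabulary_profile(text):
    """Summarize vocabulary richness for a non-empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens = len(tokens)
    word_counts = Counter(tokens)
    # Frequency spectrum: spectrum[i] = number of distinct words used exactly i times.
    spectrum = Counter(word_counts.values())
    # Yule's K: higher values indicate more repetition (a less diverse vocabulary).
    yules_k = 10_000 * (sum(i * i * v for i, v in spectrum.items()) - n_tokens) / n_tokens ** 2
    return {
        "tokens": n_tokens,
        "types": len(word_counts),
        "type_token_ratio": len(word_counts) / n_tokens,
        "hapax_legomena": spectrum[1],  # words that occur only once
        "yules_k": yules_k,
    }

stats = vocabulary_profile("the cat and the dog and the bird")
```

The type-token ratio depends strongly on text length, so profiles are best compared across samples of similar size.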
Method 4: Content Analysis • Tabulates the frequency of types of words in a text • Aims to reach the denotative or connotative meaning of the text
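The tabulation step can be illustrated with a small Python sketch. The `CATEGORIES` dictionary below is entirely hypothetical; real content-analysis dictionaries are far larger and carefully constructed.

```python
import re
from collections import Counter

# Hypothetical category dictionary, invented for this example.
CATEGORIES = {
    "emotion": {"love", "hate", "fear", "joy"},
    "conflict": {"war", "fight", "enemy", "attack"},
    "family": {"mother", "father", "sister", "brother"},
}

def category_counts(text, categories=CATEGORIES):
    """Count how many tokens in `text` fall into each word category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, vocab in categories.items():
            if token in vocab:
                counts[name] += 1
    # Report every category, including those with zero hits.
    return {name: counts[name] for name in categories}

result = category_counts("My mother and father fear war; my brother will fight.")
```

Dividing each count by the total number of tokens would give rates that can be compared across texts of different lengths.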
Method 5: Neural Networks • Recognize the underlying organization of data, which is vitally important for any pattern recognition problem, stylometry included
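As a heavily simplified illustration, the sketch below trains a single artificial neuron (a perceptron, the simplest building block of a neural network) to separate two authors. The two features, the numeric values, and the learning settings are all invented for this example; real stylometric networks use many more features and layers.

```python
# Each sample: (rate of one function word, rate of another), label 1 = author A, 0 = author B.
# All values are made up for illustration.
TRAINING_DATA = [
    ([3.0, 0.2], 1),
    ([2.5, 0.1], 1),
    ([0.2, 2.8], 0),
    ([0.1, 3.1], 0),
]

def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train_perceptron(data, epochs=10, learning_rate=0.1):
    """Classic perceptron rule: nudge the weights after each misclassified sample."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error != 0:
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
                bias += learning_rate * error
    return weights, bias

weights, bias = train_perceptron(TRAINING_DATA)
guess_a = predict(weights, bias, [2.8, 0.3])  # resembles author A's samples
guess_b = predict(weights, bias, [0.3, 2.5])  # resembles author B's samples
```

A single neuron can only learn a straight-line boundary between the two authors; multi-layer networks are used when the pattern is more complex.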
Past Uses: Scholarly • Did Shakespeare write his own plays? • Who wrote the Federalist Papers?
Recent Uses: Literary • Determining who wrote the anonymously published novel Primary Colors [Joe Klein] • Targeting suspects for the authorship of the Unabomber’s Manifesto [Ted Kaczynski]
Future Uses: Beyond • Identifying and blocking spam • Detecting lies and flagging potential inconsistencies • Locating authors of malicious code