Marlowe or Shakespeare? Determining the Authorship of a Mysterious Play. Chapter 9, Exercise 4. Bill Camarinos and Andy Gibbons.
Background • Virtually every year for the past hundred years, a play or other literary work purportedly written by William Shakespeare or Christopher Marlowe has turned up somewhere in the United Kingdom. • Specialists in Elizabethan literature typically conclude that these “finds” are frauds.
Shakespeare and Marlowe • Both were born in 1564. • Shakespeare died in 1616. • Marlowe was supposedly killed in a tavern brawl in 1593, but many suspect that his death was staged. There is enough doubt that Marlowe’s memorial window in Poets’ Corner, Westminster Abbey, gives his dates as “1564-1593?”
Shakespeare Authorship Controversy • Some maintain that someone other than Shakespeare was the true author of the Shakespearean canon. • Among the candidates: • Edward de Vere, 17th Earl of Oxford • Francis Bacon • Queen Elizabeth I • Christopher Marlowe • Every year there is a mock-trial competition in Washington in which leading attorneys argue who the author was, with Supreme Court Justices sitting as judges. • Separately, the Hoffman Prize will be awarded to whoever can convince the world that Christopher Marlowe wrote the works attributed to Shakespeare.
Our Assumptions • Shakespeare, not Marlowe or anyone else, wrote the Shakespearean canon. • The mystery play that has been found was definitely written by either Marlowe or Shakespeare; it is not another of the frauds that keep turning up.
What we have to work with • An electronic version of a play of unknown authorship • Electronic versions of all known works of William Shakespeare • Electronic versions of all known works of Christopher Marlowe
How we propose to proceed • Investigate how quantitative techniques and computers have been used to solve authorship attribution problems in the past. • Determine which techniques are most likely to succeed. • Design a process for applying the selected techniques with the resources at our disposal. • Determine the true authorship.
Early and Simple Quantitative Approaches • Word length: compare the frequency distribution of word lengths in works by the authors in question. • Average number of syllables per word. • Sentence length. • Percentages of different parts of speech.
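As a rough illustration of the word-length test, the Python sketch below builds a relative-frequency profile of word lengths for a text and measures how far apart two profiles are. The tokenizing regular expression, the cap at 15 letters, and the use of total variation distance as the comparison measure are all assumptions made for this sketch, not the procedure used in the historical study.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def word_lengths(text: str) -> list[int]:
    """Length in letters (apostrophes not counted) of every word in the text."""
    return [len(w.replace("'", "")) for w in WORD_RE.findall(text)]

def length_distribution(text: str, max_len: int = 15) -> list[float]:
    """Relative frequency of word lengths 1..max_len; longer words count as max_len."""
    counts = Counter(min(n, max_len) for n in word_lengths(text))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * max_len
    return [counts[n] / total for n in range(1, max_len + 1)]

def distribution_distance(p: list[float], q: list[float]) -> float:
    """Total variation distance: 0.0 for identical profiles, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Comparing samples of one author's known works with each other, and then with the other author's works, is how one would check a finding like “Christopher Marlowe agrees with Shakespeare about as well as Shakespeare agrees with himself.”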
What happens when the simple tests are applied? • Many are better at identifying types of writing (e.g., narrative vs. drama) than at distinguishing one author from another. • The word-length test was actually applied to Shakespeare and Marlowe, and the result was “Christopher Marlowe agrees with Shakespeare about as well as Shakespeare agrees with himself.”
Other Methods • Function-word approach: focus on how often different articles, conjunctions, and prepositions (“context-free words”) are used. These frequencies often vary significantly from one author to another. • Measure “pace”: the rate at which new vocabulary is introduced into a text. • Focus on words used only once or twice.
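Each of the three measures on this slide can be computed in a few lines. The Python sketch below is illustrative only: the short `FUNCTION_WORDS` list is a hypothetical sample, the tokenizer is a simplifying assumption, and “pace” is approximated as the number of distinct words seen after every `step` tokens.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list for illustration; real studies use longer, curated lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "with", "but", "for", "by"]

def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (letters and internal apostrophes only)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def function_word_rates(tokens: list[str]) -> dict[str, float]:
    """Occurrences of each function word per 1,000 tokens."""
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {w: 1000 * counts[w] / n for w in FUNCTION_WORDS}

def vocabulary_growth(tokens: list[str], step: int = 1000) -> list[int]:
    """Distinct words seen so far, sampled after every `step` tokens ("pace")."""
    seen: set[str] = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append(len(seen))
    return curve

def rare_word_counts(tokens: list[str]) -> tuple[int, int]:
    """(words used exactly once, words used exactly twice)."""
    freq_of_freqs = Counter(Counter(tokens).values())
    return freq_of_freqs[1], freq_of_freqs[2]
```

Because vocabulary size and once-used words both grow with text length, samples of equal size should be compared; for function words, rates per 1,000 tokens rather than raw counts keep texts of different lengths comparable.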
Other Methods (Continued) • Cumulative sum charts (cusums or qsums): compare two features of the text on a single chart • one of which is sentence length • the other of which is something like the number of two- or three-letter words in each sentence • Similar chart patterns suggest uniform authorship. • Chart patterns for a different author will diverge.
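A simplified Python sketch of the cusum idea: for each sentence, record its length and its count of two- and three-letter words, turn each series into a cumulative sum of deviations from its own mean, rescale one curve to the other's range, and measure how far the two curves separate. Sentence splitting is assumed to have been done already, and the rescaling and maximum-gap measure are assumptions of this sketch rather than a reproduction of any published qsum procedure.

```python
import re

def cusum(values: list[float]) -> list[float]:
    """Cumulative sum of each value's deviation from the series mean."""
    mean = sum(values) / len(values)
    running, out = 0.0, []
    for v in values:
        running += v - mean
        out.append(running)
    return out

def sentence_features(sentences: list[str]) -> tuple[list[int], list[int]]:
    """Per sentence: (number of words, number of two- or three-letter words)."""
    lengths, short_words = [], []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence)
        lengths.append(len(words))
        short_words.append(sum(1 for w in words if 2 <= len(w) <= 3))
    return lengths, short_words

def rescale(series: list[float], reference: list[float]) -> list[float]:
    """Scale `series` so its range matches that of `reference`, so the charts can be overlaid."""
    span = max(series) - min(series)
    factor = (max(reference) - min(reference)) / span if span else 0.0
    return [v * factor for v in series]

def max_divergence(a: list[float], b: list[float]) -> float:
    """Largest vertical gap between two overlaid cusum curves."""
    return max(abs(x - y) for x, y in zip(a, b))
```

Typical use would be `lengths, short = sentence_features(sentences)` followed by `max_divergence(cusum(lengths), rescale(cusum(short), cusum(lengths)))`: curves that track each other are read as one author's habit, while a region where they pull apart is where a different hand would be suspected.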
Other Methods (Continued) • Use of Neural Networks • Neural networks have powerful pattern-recognition capabilities. • The network is “trained,” or calibrated, using data from a known author (such as the known works of Shakespeare or Marlowe). • The network can then classify doubtful text (such as the mystery play) based on what it has “learned.” • Two researchers reported success using neural networks to compare Shakespeare and Marlowe.
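To make the “training” idea concrete, here is a Python sketch of the simplest possible neural network, a single perceptron, learning to separate two authors from numeric style features (for example, function-word rates). It is far simpler than the multilayer networks used in published stylometry research, and it only converges when the two authors' samples are linearly separable in the chosen features.

```python
def train_perceptron(samples: list[list[float]], labels: list[int],
                     epochs: int = 50, lr: float = 0.1) -> tuple[list[float], float]:
    """Classic perceptron learning rule. labels: +1 for one author, -1 for the other."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # wrong side of (or on) the boundary: adjust
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every training sample is now classified correctly
            break
    return weights, bias

def classify(weights: list[float], bias: float, x: list[float]) -> int:
    """Return +1 or -1 depending on which side of the learned boundary x falls."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1
```

For example, training on the invented toy vectors `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` with labels `[1, 1, -1, -1]` yields weights that classify a new sample `[0.8, 0.2]` as `+1`, the same way a trained network would be asked to label the mystery play.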
What Previous Authorship Attribution Studies Have Shown • The simplest tests (e.g., word-length analysis) don’t work. • Some slightly more complex tests (e.g., function-word analysis) have had some success. • Combinations of tests, even if some are quite simple, have a high probability of success. • Success in attribution is much more likely when only two candidate authors are involved. • Success becomes even more likely when a large body of known material is available (and we have all the known works of both Shakespeare and Marlowe). • Leading-edge techniques that you don’t really understand: don’t try this at home.
Methods We Considered • Even though they show a lot of promise, we ruled out neural networks because we have no experience using them. • We also considered data mining. • Data are stored in a data warehouse. • Query and reporting tools, multidimensional analysis tools, and intelligent agents are used to analyze the data. • For example, intelligent agents could be equipped with an algorithm designed to find patterns. • We decided that data mining was overkill for the problem at hand.
Method We Selected • Use a readily available relational database, Oracle, as our analysis and reporting tool. • Relational databases organize data into tables that are related to one another through key fields. • Some of the tables we would create: • Words used by Shakespeare • Troublesome words used in the plays • Unusual words used by Shakespeare • Examples of each author’s use of verse and meter
Method We Selected (Continued) • Structured Query Language (SQL) or the associated Query by Example (QBE) interface would be used to query the data. • We would define how many points of similarity in the use of language, verse, etc. are needed to establish authorship. For example, samples of Shakespeare’s and Marlowe’s iambic pentameter from their known works would be compared with the verse in the mystery play. • Oracle’s Report Generator would be used to create a report showing how the mystery play compares with the known texts, based on the criteria we established.
Conclusion • Our task has been a fascinating, and fun, one. • Our survey of previous work showed that computers and linguistics have come a long way, and that computers can help answer the kind of authorship attribution questions that scholars have debated for years. • We believe that using a powerful relational database to perform the kinds of tests that have proven most successful in previous studies would convince the quantitatively oriented community of the mystery play’s authorship. • We would seek validation of our results from an Elizabethan scholar who specializes in the works of Shakespeare and Marlowe. This would lend credence to our results among those who are skeptical of quantitative approaches.