1 / 24

Determining Authorship

Determining Authorship. March 21, 2013. Determining Authorship. Define Problem. Find Data. Write a set of instructions. Project Gutenberg. Python. Solution. Determining Authorship: Data. Five Books from a Famous Children’s Series. One Book from a Famous Children’s Series.

amina
Download Presentation

Determining Authorship

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Determining Authorship March 21, 2013 CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  2. Determining Authorship Define Problem Find Data Write a set of instructions Project Gutenberg Python Solution CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  3. Determining Authorship: Data Five Books from a Famous Children’s Series One Book from a Famous Children’s Series CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  4. Determining Authorship: Data Five Books from a Famous Children’s Series One Book from a Famous Children’s Series Six Books from Two Famous Children’s Series CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  5. Determining Authorship Define Problem Find Data Discern the Outlier: The one book that is NOT in the series of the others. Write a set of instructions Python Solution CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  6. Remember the Federalist Papers Wikipedia http://pages.cs.wisc.edu/~gfung/federalist.pdf • 85 articles written in 1787 to promote the ratification of the US Constitution • In 1944, Douglass Adair guessed authorship • Alexander Hamilton (51) • James Madison (26) • John Jay (5) • 3 were a collaboration • Corroborated in 1964 by a computer analysis CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  7. Determining Authorship 1 2 3 4 5 6 Discern the Outlier: The one book that is NOT in the series of the others. vs. 1 2 CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  8. Stop Words Stop Words are words that are filtered out in natural language processing CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  9. Stop Words Stop Words are words that are filtered out in natural language processing CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  10. Stop Words a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your http://www.textfixer.com/resources/common-english-words.txt Stop Words are words that are filtered out in natural language processing CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  11. Stop Words a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your Why should we look at the frequencies of stop words? http://www.textfixer.com/resources/common-english-words.txt Stop Words are words that are filtered out in natural language processing CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  12. Determining Authorship Discern the Outlier: The one book that is NOT in the series of the others. vs. 1 2 Calculate the word frequencies of the stop words in the two books CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  13. Determining Authorship Discern the Outlier: The one book that is NOT in the series of the others. vs. 1 2 Calculate the word frequencies of the stop words in the two books Normalize the word frequencies CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  14. Determining Authorship Calculate the word frequencies of the stop words in the two books Normalize the word frequencies • Design a metric to compare the two files • A metric is a function that defines a distancebetween two things CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  15. Determining Authorship Calculate the word frequencies of the stop words in the two books Normalize the word frequencies • Design a metric to compare the two files • A metric is a function that defines a distancebetween two things Write a compareTwo(list1,list2) function that returns a float. CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  16. Determining Authorship Download and extract ACT2-7.zip Compile and run testFiles('output.csv') CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  17. Determining Authorship Download and extract ACT2-7.zip Compile and run testFiles('output.csv') We are going to modify two things: • compareTwo function • Write distance matrix to a file CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  18. Determining Authorship Download and extract ACT2-7.zip Compile and run testFiles('output.csv') We are going to modify two things: • compareTwo function • Write distance matrix to a file First, what does the current program do? CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  19. Break PerpetualOcean CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  20. Distance Matrix This matrix looks kind of familiar... CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  21. Distance Matrix myNum = 1 myFile = open('output.csv','w') myFile.write('this is an output file\n') myFile.write(str(myNum)) myFile.write('\n') myFile.close() This matrix looks kind of familiar... Instead of printing to the screen, write it to a file in CSV (comma-separated value) format. CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  22. Distance Matrix myNum = 1 myFile = open('output.csv','w') myFile.write('this is an output file\n') myFile.write(str(myNum)) myFile.write('\n') myFile.close() this is an output file 1 This matrix looks kind of familiar... Instead of printing to the screen, write it to a file in CSV (comma-separated value) format. CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  23. Distance Matrix This matrix looks kind of familiar... Instead of printing to the screen, write it to a file in CSV (comma-separated value) format. Open the CSV file in Excel. Use conditional formatting to look for patterns. CS0931 - Intro. to Comp. for the Humanities and Social Sciences

  24. What’s Your Answer? Discern the Outlier: The one book that is NOT in the series of the others. CS0931 - Intro. to Comp. for the Humanities and Social Sciences

More Related