Explore challenges faced in coding and instructions for a community-based summarizer project. Propose final project ideas and guidelines for successful implementation. Offer options like blog analysis, best-paper award accuracy analysis, and more. Encourage teamwork and creative exploration in natural language processing projects.
I256: Applied Natural Language Processing Marti Hearst October 18, 2006
Community-based Summarizer • Results on training data with cross-validation?
Community-based Summarizer • Results on test data:
Problems with Community Code • Not reading the instructions: • Hardcoding directory paths within the code • Hardcoding filenames of testing files • Here is an easy way to do it generally: import os; files = os.listdir("dirname") • So the code should take two parameters: • Directory name containing the documents • Filename in which to write the output
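The two-parameter interface described on this slide could be sketched roughly as follows. This is only an illustration, not the assignment's reference code: the script name `summarizer.py`, the helper names `list_documents`, `summarize`, `run`, and `main`, and the tab-separated output layout are all assumptions, and `summarize` is a placeholder that stands in for a real summarizer.

```python
import os
import sys


def list_documents(doc_dir):
    """Return the sorted full paths of the regular files in doc_dir."""
    paths = []
    for name in sorted(os.listdir(doc_dir)):
        path = os.path.join(doc_dir, name)
        if os.path.isfile(path):  # skip subdirectories
            paths.append(path)
    return paths


def summarize(text):
    """Hypothetical placeholder: return the first non-empty line."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def run(doc_dir, out_filename):
    """Summarize every file in doc_dir; write one line per file."""
    paths = list_documents(doc_dir)
    with open(out_filename, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8", errors="replace") as f:
                summary = summarize(f.read())
            # Assumed output layout: filename, a tab, then the summary.
            out.write(f"{os.path.basename(path)}\t{summary}\n")
    return len(paths)


def main(argv):
    """Read both parameters from the command line; nothing is hardcoded."""
    if len(argv) != 3:
        print("usage: summarizer.py DOCUMENT_DIR OUTPUT_FILE", file=sys.stderr)
        return 2
    run(argv[1], argv[2])
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run as `python summarizer.py some_directory results.txt`, the same code works on the training files, the test files, or any other directory without edits, which is the point of the two parameters.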
Problems with Community Code • What I did wrong: • Said in class that the files should be self-contained, but didn’t put that into the assignment description. • Should have said explicitly that your code should take as input a directory name and an output filename. • Should have provided an easy way to indicate whether external files were needed, and what they were. • Should have added another task: analyze each individual feature’s contribution.
Final Projects • I’d like proposals in two weeks (Nov 1) • Gives me a week to give you feedback • We’ll spend about 5 weeks on the projects • I want to give you one or two more homeworks • Class presentations the week of Dec 5, but projects due the following week • You can work in teams of 2 (maybe 3, depends on the project)
Final Project Ideas • Blog analysis • Categorize blog topics (maybe including link analysis) • Segment blogs into pieces based on topics • Do blog author analysis • Summarize blog reaction to some event, e.g., what did people think of “An Inconvenient Truth” • There is a contest on this: • http://www.icwsm.org/ • Do analysis as input for an interesting viz: • http://benfry.com/linking/
Final Project Ideas • Analyze the accuracy of best-paper awards* • Often given out for conferences • How prescient are these awards?
Final Project Ideas • Create a Negativity/Emotion/Flame Recognizer • There is some related work, but this is somewhat under-explored
Final Project Ideas • Improve an Automatic Faceted Hierarchy Creation Tool* • Students used this two years ago for making a hierarchy for photo text • Sample output on two collections: • http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/recipes-automated/Flamenco • http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/recipes-automated/Flamenco
Final Project Ideas • Analyze profiles for online dating* • Use characteristics from social psychology to score them • Use other metrics as well.
Final Project Ideas • Work on a timeline comparison project • One idea: use output of the new Google news archive • Create input for a visualizer built by students last semester: • http://www2.sims.berkeley.edu/courses/is247/f05/projects/timelinecompare/