Seattle Event Finder Justin Meyer Jessica Leung Jennifer Hanson Sophia Gansky Chuan Yu
Goal
• Create a specialized search engine for events in the Seattle area.
• Enable search by
  • Location
  • Date and Time
  • Price
  • Category
• Enable full-text search
System components (labels from the architecture diagram): The Internet, Nutch Crawler, Event Extraction, Event Indexing, Geocoding, Event Database, Web Service, Event Classification, Web Application
Reality
• Site-specific extractors were used rather than a machine learning approach.
• The classifier is tuned for events from specifically chosen websites rather than the full web (overfitted, but in a good way).
• Email notifications were not implemented.
• The web interface is less dynamic than planned.
Demo http://amlia.cs.washington.edu:1987/eventfinder/search/
What We Found Surprising
• CRF++ is limited in extracting attributes
  • The CRF extracts multiple values for one attribute from an event description
  • When extracting only from a paragraph of description, in most cases some attributes are not contained in it
• Use site-specific structures
  • Extracted attribute values from the corresponding HTML tags
  • No more ambiguity
  • Can extract all the attributes from each event page
• JavaScript's negative impact
  • Some sites do not fully load after the first HTTP request
  • URLs generated by JavaScript functions could not be retrieved
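The site-specific approach above can be sketched as a small rule table that maps each event attribute to the HTML tag holding it. This is a minimal illustration in Python using only the standard library, not the project's actual extractor: the site name `example-venue.test`, the CSS class names, and the helper `extract_event` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical per-site rules: event attribute -> CSS class of the tag that holds it.
SITE_RULES = {
    "example-venue.test": {
        "title": "event-title",
        "date": "event-date",
        "location": "event-venue",
        "price": "event-price",
    },
}

# Tags that never have a closing tag, so they cannot wrap text.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first tag matching each site rule."""

    def __init__(self, rules):
        super().__init__()
        self.class_to_attr = {cls: attr for attr, cls in rules.items()}
        self.values = {}
        self._current = None  # attribute being captured, if any
        self._tag = None      # name of the tag that started the capture
        self._depth = 0       # nesting depth of same-named tags
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            if tag == self._tag:
                self._depth += 1
            return
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.class_to_attr:
                self._current = self.class_to_attr[cls]
                self._tag, self._depth, self._chunks = tag, 1, []
                break

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Join text pieces and normalize whitespace.
                text = " ".join(" ".join(self._chunks).split())
                self.values.setdefault(self._current, text)
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)


def extract_event(site, html):
    """Return a dict of attribute values for one event page of a known site."""
    parser = ClassTextExtractor(SITE_RULES[site])
    parser.feed(html)
    parser.close()
    return parser.values
```

Because each attribute comes from one designated tag, there is at most one value per attribute, which is the ambiguity the CRF approach ran into. The trade-off is that every new site needs its own rule entry.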
Classifier Experiments
• Variables / Results
  • Number of Categories: 8
  • Training Data Source: Crawler Data
  • Scaled vs. Non-scaled Training Data: Scaled
  • Single Words vs. Word Pairs as attributes: Single Words Only
  • Scaled LIBSVM attribute values? Yes
    • Lower Bound: 0
    • Upper Bound: 1
  • Weights
    • URL Words: 50
    • Title Words: 15
    • Location Words: 10
• Tests (32 Recorded Tests)
  • Ablation: Turning On/Off features
  • Tuning: Adjusting Variable Values
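To make the weighting and scaling settings concrete, here is a minimal Python sketch of how single-word features might be weighted by field, scaled to the range 0 to 1, and written in LIBSVM's sparse `label index:value` format. The slide does not say how the weights were applied, so several things here are assumptions: that a word's count is multiplied by its field weight, that description words get weight 1, and that scaling is per-feature min-max scaling (similar in spirit to LIBSVM's `svm-scale -l 0 -u 1`). The function names are hypothetical.

```python
import re
from collections import Counter

# URL, title, and location weights are from the slide.
# The description weight of 1 is an assumption.
FIELD_WEIGHTS = {"url": 50, "title": 15, "location": 10, "description": 1}


def weighted_counts(event):
    """Sum field weights per single word (no word pairs)."""
    counts = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for word in re.findall(r"[a-z0-9]+", event.get(field, "").lower()):
            counts[word] += weight
    return counts


def scale_to_unit_range(rows):
    """Min-max scale each feature to [0, 1] across all rows.

    Features with the same value in every row carry no information
    and are dropped.
    """
    vocab = sorted({word for row in rows for word in row})
    lo = {w: min(row.get(w, 0) for row in rows) for w in vocab}
    hi = {w: max(row.get(w, 0) for row in rows) for w in vocab}
    scaled = [
        {
            w: (row.get(w, 0) - lo[w]) / (hi[w] - lo[w])
            for w in vocab
            if hi[w] > lo[w]
        }
        for row in rows
    ]
    return vocab, scaled


def to_libsvm_line(label, row, vocab):
    """Format one example as 'label index:value ...' with 1-based, ascending indices."""
    index = {w: i + 1 for i, w in enumerate(vocab)}
    pairs = sorted((index[w], value) for w, value in row.items() if value != 0)
    return " ".join([str(label)] + [f"{i}:{value:g}" for i, value in pairs])
```

A usage example with two made-up events and arbitrary category labels:

```python
events = [
    {"url": "http://example.test/music/jazz", "title": "Jazz Night", "description": "live jazz"},
    {"url": "http://example.test/sports", "title": "Night Run", "location": "Green Lake"},
]
rows = [weighted_counts(e) for e in events]
vocab, scaled = scale_to_unit_range(rows)
lines = [to_libsvm_line(label, row, vocab) for label, row in zip([3, 5], scaled)]
```

An ablation test in this setup would amount to setting one field's weight to 0 and retraining; tuning would mean varying the weight values.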
Usability Experiment
• Survey
  • Participants went to the website and completed 3 tasks
  • Task completion and overall feedback
  • Only three submissions due to server load
• Results
  • All found site navigable
  • Participants used list view, map view and individual event pages
  • Some results not relevant
What We Learned
• Information Retrieval / Extraction / Indexing:
  • Gained more experience with regex, Java servlet technology, and working with open-source projects.
• Classifier:
  • There are many methods of classification and related variables. Deciding on a classification method and setting variables can be as much an art as a science.
• Front End:
  • Frameworks (Django) are very powerful but take a long time to learn. Watch your resources!
Breakdown of Work
• Justin - Classifier and Database
• Jessica and Jenn - Front end
• Sophia - Extraction, Nutch, Lucene, Database
• Chuan - Extraction