Which Log for which Information? Gathering Multilinguality Data from Different Log File Types
Maria Gäde, Vivien Petras, and Juliane Stiller
Humboldt-Universität zu Berlin
CLEF 2010, Padova, 21 September 2010

Premise
Assume you are building a multilingual digital library and could log every user action, with particular attention to multilingual activities.
• Which questions could one ask?
• (Which questions cannot be answered by logging?)
Outline:
• Europeana
• Log file types
• Logging multilingual information
• Europeana ClickStreamLogger

Europeana
• 1,000+ content providers
• Portal + APIs
• Services
September 2010:
• 7.8 million images
• 4.6 million texts
• 127,000 videos
• 68,000 sounds
“A digital library that is a single, direct and multilingual access point to the European cultural heritage.” (European Parliament, 27 September 2007)

Multilingual Europeana
• Interface
• Search
• Browse
• Results

Log File Types
Example: Apache web server log
123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] "GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/00980252.jpg&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 "http://www.europeana.eu/portal/brief-doc.html?start=1&view=table&query=italy" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)"

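The Apache example can be taken apart with a few lines of Python. The following is a minimal sketch, assuming the line follows Apache's Combined Log Format; the function name `parse_line` and the field names are illustrative choices, not part of any Europeana tooling. The sample is the line from the slide, with the image URL joined into one token. Note that the search query ("italy") is only recoverable from the referrer URL, and the locale token in the user agent ("it") reflects the browser, not the portal's interface language.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Heuristic (assumption): older browsers put a locale such as "it" or "en-US"
# between semicolons inside the user-agent string.
LOCALE_PATTERN = re.compile(r';\s*([a-z]{2}(?:-[A-Za-z]{2})?)\s*[;)]')


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The query string of the referrer carries the user's search terms.
    referer_params = parse_qs(urlsplit(fields["referer"]).query)
    fields["query"] = referer_params.get("query", [None])[0]
    locale = LOCALE_PATTERN.search(fields["agent"])
    fields["browser_locale"] = locale.group(1) if locale else None
    return fields


sample = (
    '123.123.123.123 - - [11/Mar/2010:09:42:06 +0100] '
    '"GET /cache/image/?uri=http://images.scran.ac.uk/rb/images/thumb/0098/'
    '00980252.jpg&size=BRIEF_DOC&type=IMAGE HTTP/1.0" 200 2843 '
    '"http://www.europeana.eu/portal/brief-doc.html?start=1&view=table&query=italy" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2) Gecko/20100115 '
    'Firefox/3.6 (.NET CLR 3.5.30729)"'
)
record = parse_line(sample)
print(record["query"], record["browser_locale"])
```
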
Log File Types
Example: Google Analytics
• Map overlay (IP address)
• Languages (system language)

Log File Types – Missing Information
• Web server log (Apache)
  • Interface language missing
  • Certain actions cannot be distinguished (browse = search)
  • Ajax / Flash actions (saved searches, tags, filter)
  • Reconstruct sessions
• Search engine log (Solr)
  • Only queries
• Google Analytics
  • Queries missing

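Because a web server log has no notion of a session, sessions have to be reconstructed after the fact. A minimal sketch of one common heuristic follows: requests from the same IP address and user agent belong to one session until a gap of more than 30 minutes occurs. The 30-minute cut-off and the names `parse_time` and `reconstruct_sessions` are assumptions made for illustration; the slides do not say how sessions were actually rebuilt.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not from the slides


def parse_time(value):
    """Parse an Apache timestamp such as '11/Mar/2010:09:42:06 +0100'."""
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")


def reconstruct_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (host, agent, timestamp) tuples into per-visitor sessions."""
    sessions = []
    current = {}  # (host, agent) -> index of that visitor's latest session
    for host, agent, timestamp in sorted(requests, key=lambda r: r[2]):
        key = (host, agent)
        index = current.get(key)
        # Start a new session for unseen visitors or after a long pause.
        if index is None or timestamp - sessions[index][-1][2] > timeout:
            sessions.append([])
            index = len(sessions) - 1
            current[key] = index
        sessions[index].append((host, agent, timestamp))
    return sessions
```

The heuristic is coarse: users behind a shared proxy are merged into one session, and a user whose IP address changes is split into several.
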
Logging Multilingual Information
Stages of the interaction:
• Approaching the system / background information
• Launching queries / browsing
• Viewing results
• Interacting with the results (filter, save, tag, repeat)
Information categories:
• User background
• Interface language
• Query language
• Query type
• Query content
• Query translation
• Search results
• Result set views
• Result translation
• Query reformulation
• User-generated content
• Saved searches / docs

Logging Multilingual Information – Background
• User background information
  • Country of access, system language, referrer site
• Interface language
  • Change → stronger intervention

Logging Multilingual Information – Query
• Query language
  • Query processing
  • Adapting languages to system
• Query type
  • Simple, advanced, fielded (e.g. language restriction)
  • Pre-selected categories for browsing
• Query content
  • Named entities, dates, numbers (language ambiguous)
• Query translation

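The point about language-ambiguous query content can be illustrated with a very small check. This is only a sketch: it flags queries that contain no letters at all (numbers and numeric dates), for which no query language can be assigned. The function name `is_language_ambiguous` is hypothetical, and named entities, the third ambiguous case on the slide, would need an entity list or recognizer that is not shown here.

```python
def is_language_ambiguous(query):
    """True if the query has content but no alphabetic character.

    Covers numbers and numeric dates such as "1914" or "21.09.2010".
    Named entities are NOT detected by this check.
    """
    stripped = query.strip()
    return bool(stripped) and not any(ch.isalpha() for ch in stripped)


queries = ["italy", "1914", "21.09.2010", "mona lisa 1503"]
ambiguous = [q for q in queries if is_language_ambiguous(q)]
print(ambiguous)
```
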
Logging Multilingual Information – Results
• Search results
  • Document languages
• Result set views
  • Detailed view, external click → stronger intervention
• Result translation

Logging Multilingual Information – User Activities
• Query reformulation / refinement
  • Language switch
  • Filtering (language), related-item search
• User-generated content
  • Language of tags
  • Language of documents being tagged
• Saved searches / documents
  • ???

Europeana ClickStreamLogger
• Interface language
  • State + change for every activity
• Search
  • Result numbers, distribution of results by language / country
  • Filtering and related searches
• Browse
  • Browsing activities + starting points
• Navigation
  • Move outside Europeana
• Ajax
  • Save / remove searches / tags
• User management
  • Account creation etc.

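To make the idea of logging "state + change for every activity" concrete, here is a hypothetical sketch of what one clickstream record could look like. It is not the actual ClickStreamLogger format: the class `ClickEvent`, all field names, and the JSON serialization are assumptions chosen for illustration.

```python
import json
from collections import Counter
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class ClickEvent:
    """One logged user activity (hypothetical record layout)."""
    action: str                      # e.g. "search", "browse", "save_search"
    interface_language: str          # state at the time of the activity
    previous_language: Optional[str] = None
    query: Optional[str] = None
    results_by_language: Dict[str, int] = field(default_factory=dict)

    @property
    def language_changed(self):
        """True if the interface language differs from the previous one."""
        return (self.previous_language is not None
                and self.previous_language != self.interface_language)

    def to_log_line(self):
        """Serialize the event, including derived fields, as one JSON line."""
        record = asdict(self)
        record["language_changed"] = self.language_changed
        record["result_count"] = sum(self.results_by_language.values())
        return json.dumps(record, sort_keys=True)


def language_distribution(result_languages):
    """Count how many results were returned per document language."""
    return dict(Counter(result_languages))


event = ClickEvent(
    action="search",
    interface_language="it",
    previous_language="en",
    query="italy",
    results_by_language=language_distribution(["it", "it", "en"]),
)
print(event.to_log_line())
```

Recording the interface language with every event, rather than only when it changes, means each log line can be interpreted on its own, which avoids the session reconstruction problem of plain web server logs.
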
What happens now…
• Soft roll-outs of new releases change the site
• Analysis of log data
• Interpretation
• Re-iteration of “useful information” categories
• Re-design user interaction?