Evaluating Automatically Generated timelines. Roberta Catizone Angelo Dalli Yorick Wilks. Project and Sponsors. University of Sheffield Cronopath project Funded by EPSRC Joint Research Council (UK science agency) 2 year project. Motivation.
Evaluating Automatically Generated timelines Roberta Catizone Angelo Dalli Yorick Wilks
Project and Sponsors • University of Sheffield • Cronopath project • Funded by EPSRC Joint Research Council (UK science agency) • 2 year project
Motivation • Internet text growing fast (Yahoo claimed to index 20 billion pages in 2005) • Large electronic document collections on the increase (library collections, archived company documents etc.) • Popular search engines (Google) use “page authority” as a means for ranking search results. This overweights older, established links. • Presentation of search results is very basic: a plain list
Motivation • Document facts are usually not presented with any notion of time. • There are many search applications where the timeframe of the key document facts is important. Some examples include searching: • news articles • historical documents • entertainment guides
The Cronopath System • Large Scale Information Retrieval (high Terabyte range) • Fast NLP Algorithms capable of working on a large scale • Cronopath project is developing and testing new techniques • Identifying fresh information • Ordering text chronologically • Generating structure from unstructured text
The Cronopath System • Information Retrieval System over a Document Collection that returns the results of a search query as a timeline. Labels placed at the appropriate points on the timeline link to the relevant documents in the collection. • At present the system is tuned to work on queries over Named Entities (people, organisations and locations)
Temporal Information Processing • Tries to assign a date to the document as a whole by • Assigning dates to particular events/facts in the collection. • and • Assigning a date to the creation of the document
Evaluation of Automatically Generated Timelines • Correctness • Does the timeline adequately represent the chosen document’s facts? • Presentation Guidelines
Evaluation: the setup • [Diagram: the automatically generated timeline and the Document Collection, with labelled facts (fact 1 to fact 7)]
Timeline Correctness • Does the timeline reflect the range of dates in the document collection? • Are the timeline labels/document facts correctly placed on the timeline? • Do the timeline facts correspond/link to the correct document in the collection? • Do the timeline labels reflect the summarised facts in the document collection? • Do the labels used in the timeline represent the significant facts in the document collection?
Timeline Presentation Issues • Balance • Should the timeline be • divided into timeframes that contain roughly the same number of linked documents (leading to timeframes covering different time periods) • or into • evenly divided time periods • Display type (depends on space available) • Horizontal / vertical • Circular • Comparative timelines • Layered (Jensen Visualizing Complex Semantic Timelines) • Web features • Scrollable and expandable • Magnification and miniaturizing: • Maximize detail, minimize the overview • Minimize detail, maximize overview
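The two balance options above can be contrasted with a minimal sketch. It assumes each linked document has been reduced to a single year; the function names `equal_width_bins` and `equal_count_bins` are hypothetical and are not part of Cronopath.

```python
from typing import List


def equal_width_bins(years: List[int], n_bins: int) -> List[List[int]]:
    """Evenly divided time periods: each timeframe spans the same number of years."""
    lo, hi = min(years), max(years)
    width = (hi - lo + 1) / n_bins
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for year in sorted(years):
        # Clamp so the latest year always lands in the last timeframe.
        idx = min(int((year - lo) / width), n_bins - 1)
        bins[idx].append(year)
    return bins


def equal_count_bins(years: List[int], n_bins: int) -> List[List[int]]:
    """Balanced timeframes: each holds roughly the same number of documents."""
    ordered = sorted(years)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


# Years taken from the first twelve dated entries of the Wikipedia timeline.
years = [1509, 1673, 1794, 1823, 1824, 1826, 1838, 1854, 1860, 1862, 1870, 1876]
width_bins = equal_width_bins(years, 3)
count_bins = equal_count_bins(years, 3)
```

With dates clustered in the 19th century, evenly divided periods leave the early timeframes nearly empty, while balanced timeframes cover very different time spans. This is the trade-off the Balance bullet describes.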
Web-based Timelines • Hyperhistory • skyscanner • wikipedia
The approach • We propose to use the Wikipedia timelines as a means for evaluating the Cronopath automatically generated timelines • Wikipedia has thousands of hand-crafted timelines • Each Wikipedia timeline has an associated document collection available through hyperlinks
The approach • Run Cronopath over a particular Wikipedia timeline document collection (all the documents linked to by the Wikipedia timeline) • Automatically generate the timeline for the document collection facts • Compare the automatically generated timeline to the Wikipedia timeline for the same document collection.
What we are comparing • Named Entities in the Cronopath timeline labels with the Named Entities in the Wikipedia timeline • Hyperlinks in the Cronopath timeline with the hyperlinks in the Wikipedia timeline. • Range of dates in the document collection with the range of dates in the timeline.
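One way to make these three comparisons concrete is a set-overlap sketch. Precision and recall are an assumption here; the slides do not say which scores were used. The helper names `precision_recall` and `date_range_covered` are hypothetical.

```python
from typing import Iterable, Set, Tuple


def precision_recall(system: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Score system items (named entities or hyperlinks) against the gold set."""
    overlap = system & gold
    precision = len(overlap) / len(system) if system else 0.0
    recall = len(overlap) / len(gold) if gold else 0.0
    return precision, recall


def date_range_covered(timeline_years: Iterable[int],
                       collection_years: Iterable[int]) -> bool:
    """True if the timeline spans the full range of dates found in the collection."""
    timeline, collection = list(timeline_years), list(collection_years)
    return (min(timeline) <= min(collection)
            and max(timeline) >= max(collection))


# Illustrative link sets only; these are not the measured Cronopath output.
cronopath_links = {"Nikolaus_Otto", "Karl_Benz"}
wikipedia_links = {"Nikolaus_Otto", "Karl_Benz", "Rudolf_Diesel", "Samuel_Morey"}
scores = precision_recall(cronopath_links, wikipedia_links)
covered = date_range_covered([1824, 1900], [1509, 1900])
```

The same `precision_recall` function applies to named entities and to hyperlinks, since both comparisons reduce to comparing two sets of strings.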
Example Query: Internal combustion engine • Wikipedia Document Collection (about 150 documents) • Source: http://en.wikipedia.org/wiki/Internal_combustion • Timeline Links: • http://en.wikipedia.org/wiki/Leonardo_da_Vinci • http://en.wikipedia.org/wiki/Christiaan_Huygens • http://en.wikipedia.org/wiki/Alessandro_Volta • http://ppp.unipv.it/Volta/Pages/eF5struF.html • http://en.wikipedia.org/wiki/Samuel_Morland • http://en.wikipedia.org/wiki/Nicolas_Léonard_Sadi_Carnot • http://en.wikipedia.org/wiki/Samuel_Morey • http://en.wikipedia.org/wiki/Eugenio_Barsanti • http://en.wikipedia.org/wiki/Felice_Matteucci • http://en.wikipedia.org/wiki/Etienne_Lenoir • http://en.wikipedia.org/wiki/Nikolaus_Otto • http://en.wikipedia.org/wiki/Siegfried_Marcus • http://en.wikipedia.org/wiki/Gottlieb_Daimler • http://en.wikipedia.org/wiki/Wilhelm_Maybach • http://en.wikipedia.org/wiki/Karl_Benz • http://en.wikipedia.org/wiki/Rudolf_Diesel • http://en.wikipedia.org/wiki/Samuel_Morland
Extracted facts sample: internal combustion engine • 1824-??-?? • Brown later designed an engine that used hydrogen as a fuel-- an early example of an internal combustion engine. It was based on an old Newcomen steam engine, and it had a separate combustion and working cylinders. He tested the engine by using it to propel a vehicle up Shooter's Hill in 1824. • http://en.wikipedia.org/wiki/Nicolas_Leonard_Sadi_Carnot • 1796-06-01 • 1832-08-24 • Nicolas Leonard Sadi Carnot (June 1, 1796 - August 24, 1832) was a French mathematician and engineer who gave the first successful theoretical account of heat engines, the Carnot cycle, and laid the foundations of the second law of thermodynamics. • http://en.wikipedia.org/wiki/Nicolas_Leonard_Sadi_Carnot • 1712-??-?? • 1712-??-?? • Newcomen had invented the first piston operated steam engine over a century before, in 1712.
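The date strings in the sample above mark unknown months and days with `??`. The sketch below shows how such partially specified dates could be parsed and ordered. The `PartialDate` type and both function names are hypothetical, and sorting unknown parts first is an assumption, not documented Cronopath behaviour.

```python
import re
from typing import NamedTuple, Optional, Tuple


class PartialDate(NamedTuple):
    year: int
    month: Optional[int]  # None when the source shows "??"
    day: Optional[int]    # None when the source shows "??"


_DATE_PATTERN = re.compile(r"^(\d{4})-(\d{2}|\?\?)-(\d{2}|\?\?)$")


def parse_partial_date(text: str) -> PartialDate:
    """Parse strings such as '1824-??-??' or '1796-06-01'."""
    match = _DATE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised date format: {text!r}")
    year, month, day = match.groups()
    return PartialDate(
        int(year),
        None if month == "??" else int(month),
        None if day == "??" else int(day),
    )


def chronological_key(date: PartialDate) -> Tuple[int, int, int]:
    """Sort key; unknown month or day sorts before any known value (assumption)."""
    return (date.year, date.month or 0, date.day or 0)


raw = ["1824-??-??", "1796-06-01", "1832-08-24", "1712-??-??"]
ordered = sorted((parse_partial_date(r) for r in raw), key=chronological_key)
```

Keeping the unknown parts as `None` rather than defaulting them to January 1 preserves the precision information, which a timeline display could use to choose between year-level and day-level placement.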
Cronopath automatically generated timeline: internal combustion engine • [Figure: generated timeline for the query "internal combustion engine"]
Wikipedia hand-crafted Timeline: internal combustion engine • 1509: Leonardo da Vinci described a compression-less engine. (His description may not imply that the idea was original with him or that it was actually built.) • 1673: Christiaan Huygens described a compression-less engine. • 1780's: Alessandro Volta built a toy electric pistol ([1]) in which an electric spark exploded a mixture of air and hydrogen, firing a cork from the end of the gun. • 17th century: English inventor Sir Samuel Morland used gunpowder to drive water pumps. • 1794: Robert Street built a compression-less engine whose principle of operation would dominate for nearly a century. • 1823: Samuel Brown patented the first internal combustion engine to be applied industrially. It was compression-less and based on what Hardenberg calls the "Leonardo cycle," which, as this name implies, was already out of date at that time. Just as today, early major funding, in an area where standards had not yet been established, went to the best showmen sooner than to the best workers. • 1824: Sadi Carnot established the thermodynamic theory of idealized heat engines in France in 1824. This scientifically established the need for compression to increase the difference between the upper and lower working temperatures, but it is not clear that engine designers were aware of this before compression was already commonly used. It may have misled designers who tried to emulate the Carnot cycle in ways that were not useful. • 1826 April 1: The American Samuel Morey received a patent for a compression-less "Gas Or Vapor Engine". • 1838: a patent was granted to William Barnet (English). This was the first recorded suggestion of in-cylinder compression. He apparently did not realize its advantages, but his cycle would have been a great advance if developed enough. • 1854: The Italians Eugenio Barsanti and Felice Matteucci patented the first working efficient internal combustion engine in London (pt. Num. 1072) but did not get into production with it. It was similar in concept to the successful Otto Langen indirect engine, but not so well worked out in detail. • 1860: Jean Joseph Etienne Lenoir (1822 - 1900) produced a gas-fired internal combustion engine closely similar in appearance to a horizontal double-acting steam beam engine, with cylinders, pistons, connecting rods, and flywheel in which the gas essentially took the place of the steam. This was the first internal combustion engine to be produced in numbers. His first engine with compression shocked itself apart. • 1862: Nikolaus Otto designed an indirect-acting free-piston compression-less engine whose greater efficiency won the support of Langen and then most of the market, which at that time, was mostly for small stationary engines fueled by lighting gas. • 1870: In Vienna Siegfried Marcus put the first mobile gasoline engine on a handcart. • 1876: Nikolaus Otto working with Gottlieb Daimler and Wilhelm Maybach developed a practical four-stroke cycle (Otto cycle) engine. The German courts, however, did not hold his patent to cover all in-cylinder compression engines or even the four stroke cycle, and after this decision in-cylinder compression became universal. • 1879: Karl Benz, working independently, was granted a patent for his internal combustion engine, a reliable two-stroke gas engine, based on Nikolaus Otto's design of the four-stroke engine. Later Benz designed and built his own four-stroke engine that was used in his automobiles, which became the first automobiles in production. • 1892: Rudolf Diesel invented the diesel engine. • 1893 February 23: Rudolf Diesel received the patent for the diesel engine. • 1896: Karl Benz invented the boxer engine, also known as the horizontally opposed engine, in which the corresponding pistons reach top dead centre at the same time, thus balancing each other in momentum. • 1900: Rudolf Diesel demonstrated the diesel engine in the 1900 Exposition Universelle (World's Fair) using peanut oil (see biodiesel).
Hyperlinks • Average number of hyperlinks on Cronopath timeline is 1.5 • Average number of hyperlinks on Wikipedia timeline is 2.5 • Cronopath timeline links only to people • Wikipedia timeline links to people, objects (boxer engine, flywheel), and events (World’s Fair)
Using N-grams for comparison • Comparing Cronopath timeline label n-grams to Wikipedia label n-grams for the timelines as a whole gives a general measure of term-set overlap for the timeline. • A better measure, in progress, will compare Cronopath label n-grams to Wikipedia label n-grams 1-1 by time point. • A third measure is named-entity matching by time point. • We are assessing the relative values of the three methods against the Wikipedia gold standard.
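The first two measures can be sketched as follows. Jaccard overlap, lowercased whitespace tokenisation, and keying labels by year are all assumptions made for illustration; the slides do not specify the scoring function or the tokeniser.

```python
from typing import Dict, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Set of word n-grams from a lowercased, whitespace-tokenised label."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[NGram], b: Set[NGram]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def whole_timeline_overlap(system: Dict[int, str], gold: Dict[int, str], n: int) -> float:
    """Measure 1: pool the n-grams of all labels, then compare the two pools."""
    system_pool = set().union(*(ngrams(label, n) for label in system.values()))
    gold_pool = set().union(*(ngrams(label, n) for label in gold.values()))
    return jaccard(system_pool, gold_pool)


def per_time_point_overlap(system: Dict[int, str], gold: Dict[int, str], n: int) -> Dict[int, float]:
    """Measure 2: compare labels 1-1 at each time point present in both timelines."""
    shared = sorted(set(system) & set(gold))
    return {year: jaccard(ngrams(system[year], n), ngrams(gold[year], n)) for year in shared}


# Toy labels keyed by year; illustrative only, not real system output.
system_labels = {1892: "rudolf diesel invented the diesel engine", 1879: "karl benz patent"}
gold_labels = {1892: "rudolf diesel invented the diesel engine", 1876: "otto cycle"}
overall = whole_timeline_overlap(system_labels, gold_labels, 1)
by_year = per_time_point_overlap(system_labels, gold_labels, 1)
```

The whole-timeline score rewards shared vocabulary even when a term sits at the wrong date, which is why the per-time-point comparison is the stricter of the two.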
How did we do? • Does the timeline reflect the range of dates in the summarized document facts? Yes • Does the timeline reflect the range of dates in the document collection? No • Are the timeline labels/document events correctly placed on the timeline? Yes • Do the timeline events correspond/link to the correct document in the collection? Yes • Do the timeline labels reflect the summarised facts in the document collection? Yes
Observations • The Cronopath timeline omitted 8 people that the Wikipedia timeline included. • Although the Cronopath and Wikipedia labels were sometimes different (where the timelines overlap), the links associated with the labels were the same in both timelines. • There were fewer words in the Cronopath timeline labels (Wikipedia timeline labels frequently contained multiple sentences)
Observations • The Cronopath automatically generated timelines give preference to events/facts with Named Entities that occur frequently. • This means that less frequently occurring NE are not in the timeline (not included in the list of summarized facts).
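The effect described above can be reproduced with a toy ranking. This is an illustration of frequency-based selection in general, not the actual Cronopath selection procedure; the `select_facts` function and the fact tuples are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Fact = Tuple[str, str]  # (named entity, fact label)


def select_facts(facts: List[Fact], max_facts: int) -> List[Fact]:
    """Keep the facts whose named entity occurs most often in the fact list."""
    counts = Counter(entity for entity, _ in facts)
    # sorted() is stable, so facts with equally frequent entities keep their order.
    ranked = sorted(facts, key=lambda fact: counts[fact[0]], reverse=True)
    return ranked[:max_facts]


# Toy facts; labels are placeholders, not extracted text.
facts = [
    ("Samuel Morey", "fact a"),
    ("Nikolaus Otto", "fact b"),
    ("Nikolaus Otto", "fact c"),
    ("Karl Benz", "fact d"),
    ("Nikolaus Otto", "fact e"),
]
selected = select_facts(facts, 3)
```

With a budget of three facts, every slot goes to the most frequent entity and the entities mentioned once drop out, mirroring the omission of less frequent named entities noted above.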
Work in progress • We will continue to use Wikipedia to evaluate the automatically generated timelines from the Cronopath system. • We will try to find subject-verb-object triples via skip-grams (Guthrie, Allison, Liu, Guthrie and Wilks, 2006) to compare the timeline labels in the Cronopath and Wikipedia timelines. • We will continue to look at issues of graphic presentation of Web-based timelines by taking into consideration good interface design principles accompanied by user satisfaction surveys.
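As a rough sketch of the skip-gram idea, the function below enumerates k-skip-n-grams under one common definition: n tokens in their original order with at most k tokens skipped in total. The function name is hypothetical, and treating skip-tri-grams as candidate subject-verb-object triples is an assumption about how the planned comparison might start.

```python
from itertools import combinations
from typing import List, Set, Tuple


def skip_grams(tokens: List[str], n: int, k: int) -> Set[Tuple[str, ...]]:
    """All n-token subsequences that skip at most k tokens in total."""
    result = set()
    for positions in combinations(range(len(tokens)), n):
        # Tokens skipped = span covered minus the n tokens actually used.
        skipped = (positions[-1] - positions[0] + 1) - n
        if skipped <= k:
            result.add(tuple(tokens[i] for i in positions))
    return result


label = "rudolf diesel invented the diesel engine".split()
candidate_triples = skip_grams(label, 3, 2)
bigrams = skip_grams("a b c d e".split(), 2, 2)
```

Here `("diesel", "invented", "engine")` is among the candidates even though two words separate the verb from its object in the label. A plain tri-gram would miss it. Enumerating all index combinations is fine for short labels but grows quickly with label length.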