Part 3 Real World Applications: SumTime-Mousam
In this lecture you will learn about • SumTime-Mousam • Knowledge acquisition • Design • Document planning • Microplanning • Realization • Evaluation • Post-edit evaluation • End-user evaluation Dept. of Computing Science, University of Aberdeen
Introduction • So far we have studied • Data analysis techniques for time series data and spatial data • Visualization techniques • NLG techniques • Now we will study • SumTime-Mousam, a weather forecast text generation system • HCE 3.0, a visual knowledge discovery tool
SumTime-Mousam • An NLG system that automates the task of writing weather forecasts • Developed in our department • Input: Numerical Weather Prediction (NWP) data • Data samples for a few dozen parameters, every hour or every 3 hours, from two NWP models • Output: marine forecasts for offshore oil rig applications • Used by our industrial collaborator since June 2002 • Produces forecasts for 150 locations per day
Example
Example
Knowledge Acquisition (KA) • KA techniques used • Think-aloud sessions • Direct acquisition of knowledge • On-site observations • Corpus analysis • Collaborative prototype development
Corpus Description • SumTime-Meteo: a parallel text-data corpus • Size: 1045 parallel text-data units • Each unit consists of • NWP model data • The human-written forecast text for that data • Similar in concept to the parallel corpora used in statistical MT (machine translation) • Naturally occurring • Written for oil rig staff in the North Sea • Distribution • Available in the public domain
Parallel Text-Data • WSW 10-15 increasing 17-22 by early morning, then gradually easing 9-14 by midnight.
Corpus Analyses • Meanings of time phrases • What time phrases mean in terms of the numerical data • Required for lexical choice in summarization • No standard mappings from time phrases to times exist • Numerical time values are not mentioned in the forecasts
Alignment • Step 1: Parsing the forecast texts • Parser tuned to the syntax of forecast texts • Breaks the text into phrases • Extracts information such as wind speed and wind direction • Carries forward values for fields a phrase leaves out (see the next example)
Example • SSW 12-16 BACKING ESE 16-20 IN THE MORNING, BACKING NE EARLY AFTERNOON THEN NNW 24-28 LATE EVENING
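The parsing step can be sketched in Python as follows. This is a minimal illustration, not the SumTime parser: the phrase-boundary keywords, the direction and speed patterns, and the short list of time phrases are hypothetical assumptions chosen just to handle the example above. It shows the carry-forward behaviour: the phrase BACKING NE EARLY AFTERNOON gives no speed, so the previous range (16-20) is reused.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical vocabulary, just large enough for the example forecast.
CHANGE_WORDS = ("BACKING", "VEERING", "INCREASING", "DECREASING", "EASING", "THEN")
TIME_PHRASES = ("IN THE MORNING", "EARLY AFTERNOON", "LATE EVENING",
                "BY EVENING", "BY MIDNIGHT")

# A new phrase starts after a comma or before a change word.
PHRASE_BOUNDARY = re.compile(r",\s*|\s+(?=(?:" + "|".join(CHANGE_WORDS) + r")\b)")
DIRECTION = re.compile(r"\b([NESW]{1,3})\b")    # e.g. S, NE, SSW
SPEED = re.compile(r"\b(\d{1,2})-(\d{1,2})\b")  # e.g. 16-20 (knots)


@dataclass
class WindPhrase:
    text: str
    direction: Optional[str]
    speed: Optional[Tuple[int, int]]
    time: Optional[str]


def parse_forecast(text: str) -> List[WindPhrase]:
    """Split a wind forecast into phrases and extract direction, speed and time."""
    phrases: List[WindPhrase] = []
    last_direction: Optional[str] = None
    last_speed: Optional[Tuple[int, int]] = None
    for chunk in PHRASE_BOUNDARY.split(text.strip().rstrip(".")):
        if not chunk:
            continue
        d = DIRECTION.search(chunk)
        s = SPEED.search(chunk)
        time = next((t for t in TIME_PHRASES if t in chunk), None)
        # Carry forward the previous value for any field the phrase leaves out.
        direction = d.group(1) if d else last_direction
        speed = (int(s.group(1)), int(s.group(2))) if s else last_speed
        phrases.append(WindPhrase(chunk, direction, speed, time))
        last_direction, last_speed = direction, speed
    return phrases


if __name__ == "__main__":
    example = ("SSW 12-16 BACKING ESE 16-20 IN THE MORNING, "
               "BACKING NE EARLY AFTERNOON THEN NNW 24-28 LATE EVENING")
    for phrase in parse_forecast(example):
        print(phrase)
```

A real forecast parser would need a much richer grammar (direction changes such as "VEERING" with no new direction, ranges like "SW 10-14 VEERING W", and so on); the regular expressions here only cover the shapes in this example.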
Alignment (2) • Step 2: Associate each phrase with an entry in the input data set • 43% of the phrases matched a single entry without ambiguity • Heuristics raised the alignment accuracy to 70% • Further improvements to alignment are under investigation
Example (2) • Example phrase: VEERING SW 10-14 BY EVENING • Input data: 1800 SW • By evening → 1800 hours • Example phrase: BACKING ESE 16-20 IN THE MORNING • Input data: 0600 ESE 18; 0900 ESE 16 • In the morning → 0600 hours
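Step 2 can be sketched as a matching function. This is a hypothetical illustration: a data entry is a candidate when its direction equals the phrase's direction and its speed lies inside the phrase's speed range. The lecture does not say which heuristics were used to resolve ambiguous cases, so the tie-breaker below (prefer the speed nearest the middle of the range, then the earlier hour) is an assumption. On the second example above, both 0600 ESE 18 and 0900 ESE 16 fall inside 16-20, and the tie-breaker selects 0600.

```python
from typing import List, NamedTuple, Optional, Tuple


class DataEntry(NamedTuple):
    """One row of the NWP input data, simplified to wind only."""
    hour: int        # e.g. 6 for 0600
    direction: str   # 16-point compass direction, e.g. "ESE"
    speed: int       # wind speed in knots


def align_phrase(direction: str, speed_range: Tuple[int, int],
                 data: List[DataEntry]) -> Tuple[Optional[int], bool]:
    """Return (hour of the matched entry or None, True if the match was unambiguous)."""
    low, high = speed_range
    # Candidates agree with the phrase on direction and lie inside its speed range.
    candidates = [e for e in data
                  if e.direction == direction and low <= e.speed <= high]
    if not candidates:
        return None, False
    if len(candidates) == 1:
        return candidates[0].hour, True
    # Hypothetical heuristic: closest speed to the middle of the range, then earliest hour.
    middle = (low + high) / 2
    best = min(candidates, key=lambda e: (abs(e.speed - middle), e.hour))
    return best.hour, False


if __name__ == "__main__":
    data = [DataEntry(6, "ESE", 18), DataEntry(9, "ESE", 16)]
    print(align_phrase("ESE", (16, 20), data))  # (6, False)
```

Collecting the (time phrase, aligned hour) pairs over the whole corpus is what allows the meanings of time phrases to be studied in terms of the numerical data.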
Results
Limitations of Corpus Analysis • Quality of the knowledge acquired • Good in some cases • Poor in many cases • Required clarifications from experts • Useful when combined with other KA techniques
KA Methodology • Directly ask experts for knowledge → Initial prototype → Structured KA with experts → Corpus analysis → Initial version of full system → Expert revision → Final system
SumTime-Mousam: Architecture • Pipeline: Input Data → Document Planning → Microplanning → Realisation → Output Text, with Control Data as an additional input • Document planning • Content selection and organisation • Microplanning • Selecting words and phrases • Ellipsis • Realisation • Produces the output text from the chosen words and phrases by applying grammar rules • Control data • Derived from the end-user profile
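The three-stage architecture can be sketched as a chain of functions. Everything here is a simplified, hypothetical stand-in rather than SumTime-Mousam's code: document planning just keeps the first and last samples, microplanning picks a verb, a speed range and a time phrase from an invented table and elides an unchanged direction, and realisation concatenates the words. The `range_width` entry plays the role of control data derived from an end-user profile; its name and value are assumptions.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, str, int]  # (hour, wind direction, wind speed in knots)

# Hypothetical hour -> time phrase table; hour 24 stands for midnight.
TIME_PHRASES = {12: "BY MIDDAY", 15: "BY AFTERNOON", 18: "BY EVENING", 24: "BY MIDNIGHT"}


def document_planning(data: List[Sample], control: Dict) -> List[Sample]:
    """Placeholder content selection: keep only the first and last samples."""
    return [data[0], data[-1]] if len(data) > 1 else list(data)


def microplanning(states: List[Sample], control: Dict) -> List[List[str]]:
    """Choose words for each wind state, eliding a direction that has not changed."""
    half = control["range_width"] // 2
    phrases: List[List[str]] = []
    previous = None
    for hour, direction, speed in states:
        words = []
        if previous is not None and speed != previous[2]:
            words.append("INCREASING" if speed > previous[2] else "EASING")
        if previous is None or direction != previous[1]:
            words.append(direction)  # ellipsis: omit an unchanged direction
        words.append(f"{max(speed - half, 0):02d}-{speed + half:02d}")
        if previous is not None:
            words.append(TIME_PHRASES.get(hour, f"BY {hour:02d}00"))
        phrases.append(words)
        previous = (hour, direction, speed)
    return phrases


def realisation(phrases: List[List[str]]) -> str:
    """Trivial 'grammar': words in order, phrases space-separated, final full stop."""
    return " ".join(" ".join(words) for words in phrases) + "."


def generate(data: List[Sample], control: Dict) -> str:
    return realisation(microplanning(document_planning(data, control), control))


if __name__ == "__main__":
    wind = [(6, "S", 4), (9, "S", 6), (12, "S", 7), (15, "S", 10),
            (18, "S", 12), (21, "S", 16), (24, "S", 18)]
    print(generate(wind, {"range_width": 4}))  # S 02-06 INCREASING 16-20 BY MIDNIGHT.
```

The sample data in the usage example is the wind series from the MAGNUS / THISTLE example later in this lecture, with midnight written as hour 24. Keeping the stages as separate functions mirrors the architecture: a better content selection method can replace `document_planning` without touching the other two stages.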
Content Selection • Which data items are worth including in the summary? • Reasoning from first principles (no detailed user model) • Reusing data analysis techniques from the KDD community • Attractive • But not developed for communication • Adapting data analysis techniques to the needs of communication using the Gricean maxims
Data Analysis • Expert’s view • Step method • Reports changes above thresholds (significant changes) • Corpus view • Segmentation method • Reports changes in slope (i.e. trends)
Example: MAGNUS / THISTLE / NW HUTTON, EAST OF SHETLAND
day      hour  wind dir  wind speed (knots)
20-1-01   6    S          4
20-1-01   9    S          6
20-1-01  12    S          7
20-1-01  15    S         10
20-1-01  18    S         12
20-1-01  21    S         16
21-1-01   0    S         18
FORECAST FOR 06-24 GMT, 20 JAN 2001: S 02-06 INCREASING 16-20 BY EVENING
Expert’s View: Step Model • S 3-8 INCREASING 8-13 BY AFTERNOON AND 13-18 BY EVENING.
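A minimal sketch of the step method, applied to the wind speeds in the example data (midnight written as hour 24). The threshold of 5 knots is a hypothetical value, since the lecture does not give the thresholds forecasters use, and the sketch ignores wind direction.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (hour, wind speed in knots)


def step_method(points: List[Point], threshold: int) -> List[Point]:
    """Report the first sample, then each sample whose speed differs from the
    last reported speed by at least `threshold` knots."""
    if not points:
        return []
    reported = [points[0]]
    for hour, speed in points[1:]:
        if abs(speed - reported[-1][1]) >= threshold:
            reported.append((hour, speed))
    return reported


speeds = [(6, 4), (9, 6), (12, 7), (15, 10), (18, 12), (21, 16), (24, 18)]
print(step_method(speeds, threshold=5))  # [(6, 4), (15, 10), (21, 16)]
```

With this threshold the method reports three wind states, the same number as in the step-model text above. Note that the final rise to 18 knots is never reported, because it stays within the threshold of the last reported value.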
Corpus View: Segmentation Model • S 3-8 INCREASING 15-20 BY MIDNIGHT.
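The segmentation method can be sketched as top-down linear segmentation in the style of the Douglas-Peucker algorithm. The choice of this particular algorithm and the tolerance values are assumptions, not necessarily what SumTime-Mousam uses. Segment endpoints are always actual data points (linear interpolation), and a segment is split at the sample with the largest vertical distance from it whenever that distance exceeds the tolerance.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (hour, wind speed in knots)


def segment(points: List[Point], tolerance: float) -> List[Point]:
    """Top-down linear segmentation whose breakpoints are actual data points."""

    def split(i: int, j: int) -> List[int]:
        (x0, y0), (x1, y1) = points[i], points[j]
        worst_distance, worst_k = 0.0, None
        for k in range(i + 1, j):
            x, y = points[k]
            predicted = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            distance = abs(y - predicted)  # vertical distance to the segment
            if distance > worst_distance:
                worst_distance, worst_k = distance, k
        if worst_k is None or worst_distance <= tolerance:
            return [i, j]
        # Split at the worst-fitting sample and segment both halves.
        return split(i, worst_k)[:-1] + split(worst_k, j)

    if len(points) < 2:
        return list(points)
    return [points[k] for k in split(0, len(points) - 1)]


# Wind speeds from the example above; midnight is written as hour 24.
speeds = [(6, 4), (9, 6), (12, 7), (15, 10), (18, 12), (21, 16), (24, 18)]
print(segment(speeds, tolerance=2))  # [(6, 4), (24, 18)]
print(segment(speeds, tolerance=1))  # [(6, 4), (12, 7), (24, 18)]
```

With a 2-knot tolerance the whole day is one rising segment, a single change like the segmentation-model text above; tightening the tolerance to 1 knot adds a breakpoint at 1200.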
Gricean Maxims (Grice 1975) • Maxim of Quality: Try to make your contribution one that is true. More specifically: • Do not say what you believe to be false. • Do not say that for which you lack adequate evidence. • Maxim of Quantity: • Make your contribution as informative as is required (for the current purposes of the exchange). • Do not make your contribution more informative than is required. • Maxim of Relevance: Be relevant. • Maxim of Manner: Be perspicuous. More specifically: • Avoid obscurity of expression. • Avoid ambiguity. • Be brief. • Be orderly.
Application of Gricean Maxims: Example • Maxim of Quality • Try to report true values from the input data • Use linear interpolation instead of linear regression • Uncertainty in the input data needs to be communicated to the user
Sample Data
Linear Regression Vs Linear Interpolation
Linear Regression Vs Linear Interpolation (2) • Linear regression • S 03-07 INCREASING 16-20 BY MIDNIGHT • Linear interpolation • S 06-10 INCREASING 18-22 BY MIDNIGHT • Human-written forecast • S 06-10 INCREASING 18-22 BY MIDNIGHT • Although linear regression looks better visually, forecasters do not use it • Uncertainty • Speed values are given as ranges, e.g. 06-10 and 18-22
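The Maxim of Quality argument can be checked numerically. Because the sample data shown on the slides is not reproduced here, the sketch below uses the wind speeds from the MAGNUS / THISTLE example instead (midnight written as hour 24). A least-squares regression line through those points starts at 3.25 knots and ends at about 17.61 knots, values that never occur in the input, whereas interpolation between the end samples reports the true values 4 and 18.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (hour, wind speed in knots)


def regression_endpoints(points: List[Point]) -> Tuple[float, float]:
    """Least-squares line through all points, evaluated at the first and last hours."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    first, last = points[0][0], points[-1][0]
    return intercept + slope * first, intercept + slope * last


def interpolation_endpoints(points: List[Point]) -> Tuple[float, float]:
    """Interpolating between the end samples keeps the actual data values."""
    return points[0][1], points[-1][1]


speeds = [(6, 4), (9, 6), (12, 7), (15, 10), (18, 12), (21, 16), (24, 18)]
start, end = regression_endpoints(speeds)
print(f"regression:    {start:.2f} -> {end:.2f}")  # regression:    3.25 -> 17.61
print("interpolation:", interpolation_endpoints(speeds))  # interpolation: (4, 18)
```

This is the trade-off the slide describes: the regression line fits the whole series more smoothly, but the numbers it would put in the forecast are not values the model data actually contains.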
Intrinsic Evaluation of Content Determination • Metrics • Short: size (accessibility) • Accurate: error (informativeness) • Size computation • Measured at the conceptual level • Number of wind states • Error computation • Vertical distance from the line of approximation • Combined error in wind speed and wind direction • Normalized
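The two metrics might be computed as below. The slides give only the outline (size as the number of wind states; error as the normalized vertical distance from the approximation, combining speed and direction), so all the details are assumptions: directions are converted to 16-point compass bearings and interpolated along the shorter arc, speed error is scaled by the speed range of the data, direction error is scaled by 180 degrees, and the two are averaged with equal weight.

```python
from typing import List, Tuple

COMPASS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
State = Tuple[int, str, float]  # (hour, direction, speed in knots)


def bearing(direction: str) -> float:
    return COMPASS.index(direction) * 22.5


def signed_turn(a: float, b: float) -> float:
    """Shortest signed rotation in degrees from bearing a to bearing b."""
    return (b - a + 180) % 360 - 180


def approximate(states: List[State], hour: int) -> Tuple[float, float]:
    """(bearing, speed) at `hour` on the line between consecutive states (distinct hours)."""
    for (h0, d0, s0), (h1, d1, s1) in zip(states, states[1:]):
        if h0 <= hour <= h1:
            t = (hour - h0) / (h1 - h0)
            b0 = bearing(d0)
            return (b0 + t * signed_turn(b0, bearing(d1))) % 360, s0 + t * (s1 - s0)
    raise ValueError("hour outside the summarised period")


def size(states: List[State]) -> int:
    return len(states)  # number of wind states


def error(data: List[State], states: List[State]) -> float:
    """Mean over samples of equally weighted, normalized speed and direction errors."""
    speeds = [s for _, _, s in data]
    speed_scale = max(max(speeds) - min(speeds), 1)
    total = 0.0
    for hour, direction, speed in data:
        approx_bearing, approx_speed = approximate(states, hour)
        speed_err = abs(speed - approx_speed) / speed_scale
        dir_err = abs(signed_turn(approx_bearing, bearing(direction))) / 180
        total += (speed_err + dir_err) / 2
    return total / len(data)


if __name__ == "__main__":
    data = [(6, "S", 4), (9, "S", 6), (12, "S", 7), (15, "S", 10),
            (18, "S", 12), (21, "S", 16), (24, "S", 18)]
    two_states = [(6, "S", 4), (24, "S", 18)]
    three_states = [(6, "S", 4), (12, "S", 7), (24, "S", 18)]
    for states in (two_states, three_states):
        print(size(states), round(error(data, states), 4))
```

Run on the MAGNUS / THISTLE data, the three-state summary has a lower error than the two-state one at the cost of a larger size, which is the size/error trade-off the evaluation measures.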
Results of Evaluation • Segmentation produces shorter summaries without losing accuracy • Details • In 16.5% of cases, segmentation is better than step in both size and error • In 0.56% of cases, step is better than segmentation in both size and error • In 2.5% of cases, segmentation is better than step error-wise but worse size-wise • In 32% of cases, segmentation is better than step size-wise but worse error-wise • In 31% of cases, segmentation is better than step error-wise and equal size-wise
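The categories on this slide come from comparing the two methods' summaries of the same forecast on both metrics, where smaller is better for size and for error. A hypothetical sketch of that bookkeeping follows; each forecast falls into one of nine size/error combinations, and the slide reports five of them. The function names and label strings are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

Score = Tuple[int, float]  # (size in wind states, error)


def verdict(seg_value: float, step_value: float) -> str:
    """Compare one metric; a smaller value is better."""
    if seg_value < step_value:
        return "better"
    if seg_value > step_value:
        return "worse"
    return "equal"


def compare(seg: Score, step: Score) -> str:
    """Classify one forecast by how segmentation fares against the step method."""
    return (f"segmentation {verdict(seg[0], step[0])} size-wise, "
            f"{verdict(seg[1], step[1])} error-wise")


def tally(results: List[Tuple[Score, Score]]) -> Dict[str, float]:
    """Percentage of forecasts in each size/error category."""
    counts = Counter(compare(seg, step) for seg, step in results)
    return {label: 100 * n / len(results) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical scores for two forecasts: (segmentation, step).
    results = [((2, 0.10), (3, 0.20)), ((3, 0.30), (2, 0.20))]
    print(tally(results))
```
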
Microplanning & Realization • Based on the parallel corpus analysis described earlier and on expert KA and revision • Details in the papers at www.csd.abdn.ac.uk/research/sumtime/papers.html
SumTime-Mousam at Weathernews (UK) Ltd. • Workflow (diagram): NWP data → Marfors Data Editor → edited data → SumTime-Mousam → pre-edited text → Marfors Text Editor → post-edited text
Post-edit Evaluation • Total number of forecasts analysed: 2728 • The 2728 texts were divided into 73041 phrases • 7608 phrases (about 10%) could not be aligned • Alignment failures suggest that forecasters are not happy with our content determination • which depends on a process called segmentation • Forecasters seem to perform more sophisticated reasoning than simple segmentation
Analysis Results (1) • Of the successfully aligned phrases • 43914 phrases matched perfectly • 21519 phrases were mismatches • Detailed analysis of the mismatches
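The counts on the last two slides are consistent with each other, as a quick check shows: the aligned phrases (73041 - 7608 = 65433) are exactly the perfect matches plus the mismatches. The percentages printed below are derived from the slides' counts, not reported on the slides themselves.

```python
total_phrases = 73041
unaligned = 7608
perfect_matches = 43914
mismatches = 21519

aligned = total_phrases - unaligned  # 65433
# The two categories on this slide account for every aligned phrase.
assert perfect_matches + mismatches == aligned

print(f"Unaligned: {unaligned / total_phrases:.1%} of all phrases")    # 10.4%
print(f"Perfect matches: {perfect_matches / aligned:.1%} of aligned")  # 67.1%
print(f"Mismatches: {mismatches / aligned:.1%} of aligned")            # 32.9%
```

So roughly two thirds of the aligned phrases survived post-editing unchanged, and the "10%" on the previous slide is more precisely 10.4%.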
Analysis Results (2) • The bar chart shows the detailed analysis of the mismatched phrases • The pie chart shows the results of the phrase-level comparisons
End-user Evaluation • 73 end-users (oil company staff supporting offshore oil rigs) participated in this evaluation • They used forecasts produced by three methods • Human-written weather forecasts • SumTime-Mousam generated weather forecasts • SumTime-Mousam expressing human-selected content • Each participant completed a two-part questionnaire • Part 1 • A forecast produced by one of the three methods (anonymized) • The participant answered comprehension questions based on the forecast • Part 2 • Two forecasts, each from a different one of the three methods (anonymized) • The participant stated a preference for one of the two forecasts • Main results • End-users consider SumTime-Mousam output linguistically better than human-written forecasts • The content selected by SumTime-Mousam is not as good as human-selected content
Conclusion • SumTime-Mousam is the result of knowledge obtained from • Several knowledge acquisition studies • Expert-based • Corpus-based • Several evaluation studies • Intrinsic evaluation • Post-edit evaluation • End-user evaluation • SumTime-Mousam was developed over many cycles • Building novel technology requires an iterative approach with multiple KA and evaluation studies