CS5545 Data Interpretation and Communication Yaji Sripada Ehud Reiter Dept. of Computing Science, University of Aberdeen
Timetable • Lectures • 2 lectures on Mondays in Meston 311 • 9:30-10:30 • 11:00-12:00 • No lectures in Week 6 and Week 12 • Practicals/Tutorials • One two-hour practical/tutorial on Mondays in Meston 311 • 14:00-16:00
Assessment • Two components • 25% continuous assessment • 75% end-of-term exam • Continuous assessment • First assignment • Weight – 12.5% • Issued in Week 5 • Due on the Friday of Week 6 • Second assignment • Weight – 12.5% • Issued in Week 11 • Due on the Thursday of Week 12
Course Organization • Three parts • Weeks 1-4 – Yaji Sripada (YS) • Weeks 5-7 – Ehud Reiter (ER) • Weeks 9-11 – YS + ER
Reading • Weeks 1-4 • Mostly lecture notes and some research papers • Weeks 5-7 • Lecture notes and research papers • Background: Ehud Reiter and Robert Dale, “Building Natural Language Generation Systems”, Cambridge University Press • Weeks 9-11 • Lecture notes and research papers
Introduction • Humans have access to large volumes of data in many domains • Scientific • Complete sequence data from the Human Genome Project, about 3 billion DNA base pairs • Medical • Physiological data • Tens of parameters, such as blood pressure and heart rate, measured every second • Engineering • Hundreds of sensors on a gas turbine taking measurements every second • And many more
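To make "large volumes" concrete, here is a back-of-envelope estimate for the physiological example. It assumes 20 parameters (a hypothetical figure standing in for "tens of parameters"), each sampled once per second for one patient over one day:

```latex
\[
  20 \times (60 \times 60 \times 24) = 20 \times 86\,400 = 1\,728\,000 \text{ readings per patient per day}
\]
```

Even this modest assumption yields well over a million numbers per patient per day, far more than anyone on a ward could read directly.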
Varying purpose/task • Different people use the same data for different purposes/tasks • For example, physiological data is used by • Medical staff on the ward, to monitor the patient • Medical researchers, for scientific exploration • Medical administrative staff, to archive it in patient records
Varying abilities/disabilities • People differ in their ability to use the available data • 1 in 4 adults in the UK has poor numerical skills • 1 in 7 people in the UK has some form of physical disability (such as visual impairment) • Many of us simply don’t have the time to use all the data at our disposal • e.g. data from our credit card bills and utility bills • Many of us lack the domain knowledge required to interpret the data • e.g. data from medical lab tests such as blood tests
What we need • Novel computer technology to • (1) analyse and interpret large volumes of data • (2) communicate to us the required information, suited to our task/purpose and to our abilities/disabilities • In this course we study • (1) the issues involved in developing such technology • (2) currently available techniques that can form part of it • (3) some systems developed with existing technology in limited domains
Data Analysis and Interpretation • Data analysis • Techniques from several fields are used • Statistics • Medical signal processing • Image processing • Data mining, etc. • Issues with reusing data analysis methods • Choosing among the many algorithms available for a task may not be easy • Even when we find an algorithm, it may not be the best fit for a communication context • In other words, we may have to adapt existing data mining algorithms to suit our purpose • Data interpretation • Knowledge-based techniques are used • Context-dependent • Varies from domain to domain
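As a minimal illustration of the simplest kind of trend detection, the sketch below fits a least-squares line to evenly spaced readings and labels the trend by the sign of its slope. Ordinary least squares is a stand-in chosen here for illustration, not a method the course prescribes. `flat_tolerance` is a hypothetical threshold (in data units per sample) that would need tuning for each domain, which is exactly the kind of adaptation described above.

```python
from statistics import mean


def linear_trend(values):
    """Return (slope, intercept) of the least-squares line through (i, values[i])."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)                # readings are assumed to be evenly spaced
    x_bar = (n - 1) / 2          # mean of 0, 1, ..., n-1
    y_bar = mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def describe_trend(values, flat_tolerance=0.5):
    """Label a series as 'rising', 'falling' or 'stable'.

    flat_tolerance is a hypothetical, domain-specific threshold: slopes
    smaller than this (per sample) are treated as no trend at all.
    """
    slope, _ = linear_trend(values)
    if abs(slope) < flat_tolerance:
        return "stable"
    return "rising" if slope > 0 else "falling"


print(describe_trend([70, 72, 75, 79, 84]))  # e.g. heart-rate readings
```

Choosing the threshold is a communication decision as much as a statistical one: a slope that matters to a nurse may be noise to a researcher.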
Communication • Information can be presented to users • Graphically – using visualization technology • Textually – using Natural Language Generation (NLG) technology • As speech – using text-to-speech technology • Or as combinations of the above • Issues with communication • Visualization • A relatively mature technology – a large collection of visualization techniques is available for different kinds of data • Communicating high-dimensional data is hard • Communicating large data sets on low-resolution screens is a challenge • NLG • Communicates messages more directly • Effective over low-bandwidth channels such as SMS • Still under development – a few success stories in limited domains
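The SMS point can be made concrete with a small content-selection sketch: given candidate sentences with priority scores, keep the most important ones that fit in a single 160-character message (the GSM-7 single-SMS limit) and emit them in their original order. The greedy strategy and the `(priority, sentence)` input format are assumptions for illustration, not how any particular NLG system works.

```python
SMS_LIMIT = 160  # characters in a single GSM-7 encoded SMS


def fit_to_sms(messages, limit=SMS_LIMIT):
    """Greedily keep the highest-priority sentences that fit in one SMS.

    messages: list of (priority, sentence) pairs; higher priority wins.
    Hypothetical input format. Kept sentences appear in their original order.
    """
    # Indices sorted by descending priority (stable, so ties keep input order).
    ranked = sorted(range(len(messages)), key=lambda i: -messages[i][0])
    chosen = set()
    length = 0
    for i in ranked:
        sentence = messages[i][1]
        # Every sentence after the first costs one extra character for the space.
        extra = len(sentence) + (1 if chosen else 0)
        if length + extra <= limit:
            chosen.add(i)
            length += extra
    return " ".join(messages[i][1] for i in sorted(chosen))


print(fit_to_sms([(3, "Heart rate rising."), (2, "Blood pressure stable.")]))
```

A graph cannot be squeezed into 160 characters this way, which is one reason text suits low-bandwidth channels.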
Accessibility • Communication works • for an intended audience, with its particular abilities/disabilities • for an intended task/purpose • Therefore communication should be sensitive to users with different abilities and purposes
System Building Life Cycle • Several iterations of the following phases • Knowledge acquisition (requirements collection and analysis) • System design • Implementation • Evaluation • Differs from the normal software development life cycle • Requirements are poorly understood • System design ideas are still under research • Evaluation methods are also still under research
Course Organization – in detail • Lectures: 3 parts • Part 1: Data Analysis & Interpretation • Basic statistics • Data analysis – trend and pattern detection • Part 2: Data Communication • Visualization • NLG • Accessibility • Part 3: Real World Applications • Practicals • Part 1 • Basic data analysis techniques using Excel • Trend and pattern detection in time series and spatial data • Visualization of time series and spatial data • Part 2 • Document planning for summarising time series data • Micro-planning for summarising time series data
In our department • Many projects aim to develop technology for “data interpretation and communication” • It is one of the three research themes in the department • Projects • SumTime – Summarising time series data • RoadSafe – Automatically generating advisory text for road maintenance vehicle routing (new project) • BabyTalk – Generating textual summaries of clinical temporal data (new project) • ScubaText – Generating textual reports of scuba dive computer data • Atlas.txt – Generating textual reports of census data for visually impaired people
Example 1: SumTime-Mousam • Software developed in the department as part of the SumTime project • Task – automatically generate weather forecast texts in English • Input – Numerical Weather Prediction (NWP) data, the output of weather simulation software • Output – English text delivered • as an ASCII file to the client • in spoken form over a telephone line • as a text message to a mobile phone (currently being explored) • Operationally deployed at a weather services company in Aberdeen • Produces around 150 draft forecasts per day • Produces text that is in some ways better than that of human authors
SumTime-Mousam (2) • SumTime technology • (1) analyses NWP data using segmentation techniques developed in the time series data mining community • (2) automatically produces the English forecast text using Natural Language Generation (NLG) technology • The majority of SumTime output texts are used by oil company staff supporting oil rigs in the North Sea • Can we produce weather forecasts for a different purpose/task, say for hill climbers? • In this course, we study how data analysis/interpretation and its communication (presentation) vary with the end user’s task/purpose.
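The two SumTime steps can be sketched in miniature: a sliding-window piecewise-linear segmentation (one standard approach from the time series data mining literature, not necessarily the algorithm SumTime itself uses), followed by a simple template that turns each segment into a phrase. The `max_error` threshold, the phrase templates and the `kt` unit are hypothetical choices for illustration.

```python
def max_interp_error(series, start, end):
    """Largest vertical gap between series[start..end] and the straight line
    joining its two endpoints."""
    if end - start < 2:
        return 0.0
    y0, y1, span = series[start], series[end], end - start
    return max(
        abs(series[i] - (y0 + (y1 - y0) * (i - start) / span))
        for i in range(start + 1, end)
    )


def sliding_window_segments(series, max_error):
    """Split series into (start, end) index pairs; every point of a segment lies
    within max_error of the line joining its endpoints. Adjacent segments
    share an endpoint."""
    segments, start, n = [], 0, len(series)
    while start < n - 1:
        end = start + 1
        # Grow the window while the straight-line approximation stays good enough.
        while end + 1 < n and max_interp_error(series, start, end + 1) <= max_error:
            end += 1
        segments.append((start, end))
        start = end
    return segments


def describe_segments(series, segments, unit="kt"):
    """Turn each segment into a short phrase (hypothetical templates)."""
    phrases = []
    for start, end in segments:
        a, b = series[start], series[end]
        if b > a:
            phrases.append(f"rising to {b} {unit}")
        elif b < a:
            phrases.append(f"falling to {b} {unit}")
        else:
            phrases.append(f"steady at {b} {unit}")
    return ", ".join(phrases)


wind = [10, 12, 14, 16, 16, 16, 10]  # made-up wind speeds, one per time step
print(describe_segments(wind, sliding_window_segments(wind, max_error=0.5)))
```

A larger `max_error` gives fewer, coarser segments and therefore shorter text; tuning it for, say, hill climbers rather than oil rig staff is one concrete way the analysis could vary with the end user's task.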
Example 2: GIS • Technology to store, retrieve, analyse and visualize spatial data on geographic maps • Plot delivery routes on street maps at a level of detail that pinpoints even the locations of manholes and speed cameras • Plot census data such as residents’ ages, gender and income on country or regional maps, so that businesses can target their customers
GIS (2) • GIS technology • (1) analyses/interprets spatial data • (2) presents spatial data in the form of visual maps • Great for sighted users, but useless for visually impaired users • In this course, we study technology based not just on ‘what it does’, but also on ‘for whom it does it’ • Accessibility issues
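One way to make such spatial data accessible is to describe it in words that a screen reader can speak aloud. The sketch below, in the spirit of (but not taken from) the Atlas.txt project, picks out the highest and lowest areas for one census measure and reports the average. The function name, sentence template, area names and values are all hypothetical.

```python
def summarise_regions(values, measure):
    """Describe a per-area census measure in one spoken-friendly sentence.

    values: dict mapping area name -> numeric value (hypothetical input).
    """
    if not values:
        return f"No {measure} data available."
    # Rank areas from highest to lowest value.
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_v), (low, low_v) = ranked[0], ranked[-1]
    mean_v = sum(values.values()) / len(values)
    return (f"Across {len(values)} areas, {measure} is highest in {top} "
            f"({top_v}) and lowest in {low} ({low_v}); the average is {mean_v:.1f}.")


print(summarise_regions({"Area A": 40, "Area B": 35, "Area C": 45}, "median age"))
```

Which facts to mention (extremes, averages, geographic patterns) is a content-selection decision that depends on the user's task, just as it is for the weather texts.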
Summary • You learn novel technology to • analyse and interpret large data sets by adapting data analysis techniques developed in other fields • communicate (present) relevant information to different users with different tasks and abilities • Relevant to e-technologies • All modern organizations • possess large volumes of data and • communicate information to different stakeholders