
CS5545 Data Interpretation and Communication



Presentation Transcript


  1. CS5545 Data Interpretation and Communication • Yaji Sripada, Ehud Reiter • Dept. of Computing Science, University of Aberdeen

  2. Timetable • Lectures • 2 lectures on Mondays in Meston 311 • 9:30-10:30 • 11:00-12:00 • No lectures in Week 6 and Week 12 • Practicals/Tutorials • 1 two-hour practical/tutorial on Mondays in Meston 311 • 14:00-16:00

  3. Assessment • Two components • 25% continuous assessment • 75% end of term exam • Continuous assessment • First assignment • Weight – 12.5% • Issued in Week 5 • Due on the Friday of Week 6 • Second assignment • Weight – 12.5% • Issued in Week 11 • Due on the Thursday of Week 12

  4. Course Organization • Three parts • Weeks 1-4 – YS (Yaji Sripada) • Weeks 5-7 – ER (Ehud Reiter) • Weeks 9-11 – YS+ER

  5. Reading • Weeks 1-4 • Mostly lecture notes and some research papers • Weeks 5-7 • Lecture notes and research papers • Background: Ehud Reiter and Robert Dale, “Building Natural Language Generation Systems”, Cambridge University Press • Weeks 9-11 • Lecture notes and research papers

  6. Introduction • Humans have access to large volumes of data in many domains • Scientific • Complete sequence data of 3 billion DNA units from the Human Genome Project • Medical • Physiological data • Tens of parameters, such as blood pressure and heart rate, measured every second • Engineering • Hundreds of sensors on a gas turbine taking measurements every second • And many more

  7. Varying purpose/task • Different people use data for different purposes/tasks • For example, physiological data is used by • Medical staff on the ward to monitor the patient • Medical researchers for scientific exploration • Medical admin staff to archive it in patient records

  8. Varying abilities/disabilities • Not everyone is equally able to use the available data • 1 in 4 adults in the UK has poor numerical skills • 1 in 7 people in the UK suffers from some form of physical disability (such as visual impairment) • Many of us just don’t have the time to use all the data at our disposal • Data from our credit card bills and utility bills • Many of us don’t have the required domain knowledge to interpret the data • Data from medical lab tests such as blood tests

  9. What we need • Novel computer technology to • (1) analyse and interpret large volumes of data • (2) communicate to us ‘the required’ information suitable to our task/purpose in a way suited to our abilities/disabilities • In this course we study • (1) issues involved in developing such novel technology • (2) currently available techniques to be used as part of the novel technology • (3) some systems in limited domains developed using existing technology

  10. Data Analysis and Interpretation • Data analysis • Techniques from several fields are used • Statistics • Medical signal processing • Image processing • Data mining, etc. • Issues with reusing data analysis methods • Choosing one algorithm from the many available for a task may not be easy • Even when we find an algorithm, it may not be the best fit for use in a communication context • In other words, we may have to adapt available data mining algorithms to suit our purpose • Data interpretation • Knowledge-based techniques are used • Context dependent • Varies from domain to domain

  11. Communication • Information can be presented to users • Graphically – using visualization technology • Textually – using Natural Language Generation (NLG) technology • As speech – using text-to-speech technology • Or as combinations of the above • Issues with communication • Visualization • A relatively mature technology: a large collection of visualization techniques for different kinds of data is available • Communicating high-dimensional data is hard • Communicating large data sets on low-resolution screens is a challenge • NLG • Communicates messages more directly • Effective for communicating over low-bandwidth channels such as SMS • Still under development, with a few success stories in some limited domains

  12. Accessibility • Communication works • for an intended audience with their associated abilities/disabilities • with an intended task/purpose • Therefore communication should be sensitive to different users with different abilities and purposes

  13. System Building Life Cycle • Several iterations of the following phases • Knowledge acquisition (requirements collection and analysis) • System design • Implementation • Evaluation • Differs from the normal software development life cycle • Requirements are poorly understood • System design ideas are still under research • Evaluation ideas are also still under research

  14. Course Organization – in detail • Lectures (3 parts) • Part 1: Data Analysis & Interpretation • Basic Statistics • Data analysis - Trend and pattern detection • Part 2: Data Communication • Visualization • NLG • Accessibility • Part 3: Real World Applications • Practicals • Part 1 • Basic data analysis techniques using Excel • Trend and pattern detection in time series and spatial data • Visualization of time series and spatial data • Part 2 • Document planning for summarising time series data • Micro-planning for summarising time series data

  15. In our department • Many projects aim to develop technology for • “data interpretation and communication” • It is one of the three research themes in the department • Projects • SumTime – Summarising Time Series Data • RoadSafe – Automatically generating advisory text for road maintenance vehicle routing – new project • BabyTalk – Generating textual summaries of clinical temporal data – new project • ScubaText – Generating textual reports of Scuba dive computer data • Atlas.txt – Generating textual reports of Census data for visually impaired people

  16. Example 1: SumTime-Mousam • Software developed in the department as part of the SumTime project • Task – Automatically generate weather forecast texts in English • Input – Numerical Weather Prediction (NWP) data, the output of weather simulation software • Output – English text delivered • As an ASCII file to the client • In spoken form over a telephone line • As a text message over a mobile network (currently being explored) • Operationally deployed at a weather services company in Aberdeen • Produces around 150 draft forecasts/day • Produces text that is in some ways better than that of human authors

  17. SumTime-Mousam (2) • SumTime technology • (1) analyses NWP data using segmentation techniques developed in the time series data mining community • (2) automatically produces the English forecast text using Natural Language Generation (NLG) technology • The majority of SumTime output texts are used by oil company staff supporting oil rigs in the North Sea • Can we produce weather forecasts for a different purpose/task, say for hill climbers? • In this course, we study how data analysis/interpretation and its communication (presentation) vary with the end-user task/purpose.
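The two steps on slide 17 (segment the time series, then turn the segments into text) can be illustrated with a small sketch. This is not SumTime-Mousam's actual algorithm: it assumes a simple sliding-window segmentation, one of several techniques from the time series data mining literature, and a fixed sentence template in place of a real NLG system. The function names, the error threshold, the "knots" unit and the wind-speed values are all made up for illustration.

```python
from typing import List, Tuple


def _fits(values: List[float], start: int, end: int, max_error: float) -> bool:
    """True if the straight line joining values[start] and values[end]
    stays within max_error of every point in between."""
    span = end - start
    for i in range(start, end + 1):
        interpolated = values[start] + (values[end] - values[start]) * (i - start) / span
        if abs(values[i] - interpolated) > max_error:
            return False
    return True


def segment(values: List[float], max_error: float) -> List[Tuple[int, int]]:
    """Sliding-window segmentation (illustrative, not SumTime's method).

    Grows each segment one point at a time until the line between its
    endpoints no longer fits; returns (start_index, end_index) pairs.
    Assumes at least two data points; otherwise returns an empty list.
    """
    segments = []
    start = 0
    n = len(values)
    while start < n - 1:
        end = start + 1
        while end + 1 < n and _fits(values, start, end + 1, max_error):
            end += 1
        segments.append((start, end))
        start = end  # the next segment begins where this one ended
    return segments


def describe(values: List[float], segments: List[Tuple[int, int]],
             hours: List[int], flat_tolerance: float = 1.0) -> str:
    """Template-based text: one phrase per segment (a stand-in for NLG)."""
    phrases = []
    for start, end in segments:
        change = values[end] - values[start]
        if abs(change) <= flat_tolerance:
            phrases.append(f"steady until {hours[end]:02d}:00")
        elif change > 0:
            phrases.append(f"rising to {values[end]:.0f} by {hours[end]:02d}:00")
        else:
            phrases.append(f"falling to {values[end]:.0f} by {hours[end]:02d}:00")
    opening = f"Wind speed {values[0]:.0f} knots at {hours[0]:02d}:00, "
    return opening + ", then ".join(phrases) + "."


if __name__ == "__main__":
    # Hypothetical three-hourly wind speeds; not real NWP output.
    hours = [0, 3, 6, 9, 12, 15, 18, 21]
    wind = [10.0, 12.0, 14.0, 16.0, 16.0, 16.0, 12.0, 8.0]
    segs = segment(wind, max_error=1.0)
    print(segs)
    print(describe(wind, segs, hours))
```

Changing `max_error` changes how many segments, and therefore how many phrases, the text contains. That is one simple way the same data could be summarised at different levels of detail for different readers, such as oil rig staff and hill climbers.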

  18. Example 2: GIS (Geographic Information Systems) • Technology to store, retrieve, analyse and visualize spatial data on geographic maps • Plot delivery routes on street maps to a level of detail pinpointing even the locations of manholes and speed cameras • Plot census data such as residents’ ages, gender, income, etc. on country or regional maps for businesses to target their customers

  19. GIS (2) • GIS technology • (1) analyses/interprets spatial data • (2) presents spatial data in the form of visual maps • Great for sighted users, but useless for visually impaired users • In this course, we study technology based not just on ‘what it does’, but also on ‘for whom it does it’ • Accessibility issues

  20. Summary • You will learn novel technology to • Analyse and interpret large data sets by adapting data analysis techniques developed in other fields • Communicate (present) relevant information to different users with different tasks and abilities • Relevant to E-technologies • All modern organizations • possess large volumes of data and • communicate information to different stakeholders
