Data Mining with AURA


  1. Data Mining with AURA • Jim Austin, University of York & Cybula Ltd

  2. Overview • AURA: background to AURA, a brief overview of its components, its implementation • AURA within UK e-Science: what is e-Science, the DAME pilot project, use of AURA in DAME, Grid issues in data mining

  3. The AURA Technology • Neural-network-based associative storage • Set of tools to build fast pattern recognition systems • Aimed at unstructured data • Aimed at large datasets • Scalable technology

  4. AURA as a basis for search • The aim is to use AURA to remove the chaff, quickly discarding data that cannot match. • Later processes then find the exact match.
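The two-stage idea on slide 4 (a fast, over-inclusive filter followed by an exact but slow match on the survivors) can be sketched as below. This is a minimal illustration, not AURA itself: a plain bigram-overlap test stands in for the CMM filter, Levenshtein edit distance stands in for the "later process", and the names `bigrams`, `levenshtein`, `two_stage_search`, `min_shared` and `max_dist` are all hypothetical. The filter is a heuristic and is not guaranteed to keep every true match.

```python
def bigrams(s: str) -> set[str]:
    """Character bigrams of s, padded with '#' so word edges also contribute."""
    padded = f"#{s}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def levenshtein(a: str, b: str) -> int:
    """Exact edit distance (the 'slow algorithm' applied to the subset)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def two_stage_search(query, records, min_shared=2, max_dist=1):
    """Hypothetical sketch: cheap filter removes the chaff, exact check finds matches."""
    q = bigrams(query)
    # Stage 1: cheap test (stand-in for the CMM) keeps anything sharing enough bigrams.
    candidates = [r for r in records if len(q & bigrams(r)) >= min_shared]
    # Stage 2: the expensive exact comparison runs only on the surviving subset.
    return [r for r in candidates if levenshtein(query, r) <= max_dist]
```

For example, searching `["aura", "aural", "area", "cortex", "grid"]` for `"aura"` lets `"area"` through the filter but rejects it in the exact stage, returning `["aura", "aural"]`. The point of the design is that the costly comparison never sees `"cortex"` or `"grid"`.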

  5. The storage system • Based on Correlation Matrix Memories (CMMs) • Exploits threshold logic methods • Uses distributed encoding of information • Implemented using binary ‘weights’ for efficient software and hardware implementation

  6. [Figure: CMM structure; labels: inputs P, weights M, threshold T, output R]
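The figure on slide 6 names an input P, a weight matrix M, a threshold T and a result R. A common way to write a binary CMM with those symbols is the Willshaw-style formulation below; it is an assumed textbook form, not taken from the slide. Rows of M are indexed by input bits (matching slide 7's "access only rows that are activated by inputs"), training ORs together the outer products of each stored pair (P^{(k)}, R^{(k)}), and recall sums the active rows and thresholds each column sum at T (often T is set to the number of set bits in P).

```latex
M_{ij} = \bigvee_{k} \left( P^{(k)}_i \wedge R^{(k)}_j \right)
\qquad
s_j = \sum_{i} P_i \, M_{ij}
\qquad
R_j = \begin{cases} 1 & \text{if } s_j \ge T \\ 0 & \text{otherwise} \end{cases}
```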

  7. Why is it fast? • Only the rows activated by the inputs are accessed. • Inputs are made as sparse as possible and of fixed weight (a fixed number of set bits). • Only the active rows (bit vectors) need to be summed, which suits most processors. • Ideal for bit vector machines such as the DAP (Distributed Array Processor).
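A small sketch of why recall is cheap: if each row of the binary weight matrix is stored as a bit vector, recall touches only the rows selected by the sparse input and adds up their bits column by column. This is a hypothetical pure-Python illustration (the class `BinaryCMM` and its methods are not the AURA C++ library API); Python integers serve as the bit vectors, and the default threshold equals the number of active inputs, which is the usual choice for fixed-weight inputs.

```python
class BinaryCMM:
    """Hypothetical binary CMM: rows[i] is an int whose bit j is weight M[i][j]."""

    def __init__(self, n_inputs: int, n_outputs: int):
        self.n_outputs = n_outputs
        self.rows = [0] * n_inputs

    def train(self, input_bits, output_bits):
        """Store a pair by OR-ing the output pattern into every row the input selects."""
        mask = 0
        for j in output_bits:
            mask |= 1 << j
        for i in input_bits:
            self.rows[i] |= mask

    def recall(self, input_bits, threshold=None):
        """Sum only the active rows, then threshold each column sum."""
        active = list(input_bits)
        if threshold is None:
            threshold = len(active)  # Willshaw threshold for a fixed-weight input
        sums = [0] * self.n_outputs
        for i in active:  # rows not selected by the input are never read
            row = self.rows[i]
            while row:
                low = row & -row                  # isolate the lowest set bit
                sums[low.bit_length() - 1] += 1   # count it in its column
                row ^= low                        # clear it and continue
        return {j for j, s in enumerate(sums) if s >= threshold}
```

After `cmm.train({0, 3}, {1})` and `cmm.train({2, 5}, {3})` on an 8-input, 4-output memory, `cmm.recall({0, 3})` returns `{1}`, while a mixed input `{0, 5}` returns nothing at the default threshold but `{1, 3}` at `threshold=1`. The cost of recall grows with the number of set input bits and their set weights, not with the full size of the matrix, which is the point made on slide 7.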

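  8. Use of the CMM [Figure: a query is presented to the CMM system, which selects a data subset from the data; a slow algorithm is then run on that subset to give the final data]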

  9. CMM system [Figure: processing chain around the CMM system; stages: prepare data, pre-process, CMM operations, post-process]

  10. Pre-processing • Implements a number of pre-processors • N-grams for text strings • CMAC for numeric data • Graphs for images and graphics • Tokens for logical data • Quantisation for time series
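Two of the pre-processors listed on slide 10 can be sketched as functions that turn raw data into the set-bit positions of a sparse binary input. These are assumed, simplified encodings for illustration (the names `bigram_vector` and `quantise` are hypothetical, and CMAC, graph and token coding are not shown): letter bigrams map to one of 26 × 26 positions, and each time-series sample sets exactly one bit for its (time step, quantisation level), so the series encoding has fixed weight.

```python
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def bigram_vector(text: str) -> set[int]:
    """Set-bit positions for the letter bigrams inside each word (26 * 26 possible bits)."""
    bits = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        for a, b in zip(word, word[1:]):
            bits.add(ALPHABET.index(a) * 26 + ALPHABET.index(b))
    return bits


def quantise(series, lo: float, hi: float, n_bins: int) -> set[int]:
    """One set bit per sample, at position (time step * n_bins + level)."""
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    width = (hi - lo) / n_bins
    bits = set()
    for t, x in enumerate(series):
        level = int((x - lo) / width)
        level = min(max(level, 0), n_bins - 1)  # clamp out-of-range samples to the end bins
        bits.add(t * n_bins + level)
    return bits
```

For instance, `bigram_vector("aura")` gives `{20, 442, 537}` (the bigrams "au", "ra" and "ur"), and `quantise([0.0, 0.5, 1.0], 0.0, 1.0, 4)` gives `{0, 6, 11}`: three samples, three set bits.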

  11. Post processing • Data selected by the CMM must be accessed quickly. • Uses ‘best bit index’ method to match output data and recover stored data.
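The slide does not describe how the ‘best bit index’ method works, so the sketch below shows only one simple scheme, assumed here for illustration: each stored record owns one output bit, the highest-scoring output bits (an L-max style selection) are kept, and a lookup table maps those bits back to records. The names `best_bits`, `recover_records` and `bit_to_record` are hypothetical, and this is not claimed to be AURA's method.

```python
def best_bits(sums, k):
    """Indices of the k highest non-zero column sums; ties go to the lower index."""
    ranked = sorted((j for j, s in enumerate(sums) if s > 0),
                    key=lambda j: (-sums[j], j))
    return ranked[:k]


def recover_records(sums, k, bit_to_record):
    """Map the best-scoring output bits back to stored records via a lookup table."""
    return [bit_to_record[j] for j in best_bits(sums, k) if j in bit_to_record]
```

With column sums `[1, 3, 0, 2]` and a table `{0: "rec-A", 1: "rec-B", 2: "rec-C", 3: "rec-D"}`, asking for the best two bits recovers `["rec-B", "rec-D"]`. Because the lookup is a direct index, recovering stored data costs time proportional to the number of selected bits, in line with the slide's aim that selected data be accessed quickly.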

  12. Implementation • The AURA C++ library • Implemented on PC or workstation • Beowulf parallel cluster • Origin 2000 supercomputer • Bespoke hardware

  13. Cortex-1 • AURA parallel implementation • 28 dedicated PCI-based processors • Beowulf configuration • 3.5Gb memory size

  14. UK e-Science • Aims to build on the concept of Grids • Aims to make computing and data provision as direct and simple as electrical power delivery • £110M initiative started 18 months ago • DAME is a £3.5M pilot project to demonstrate its application in the engineering field.

  15. DAME Objectives • DAME: Distributed Aircraft Maintenance Environment. • Demonstrate diagnostic capability on the GRID • Examine timeliness properties of the GRID • Demonstrate on the Rolls-Royce (RR) aero-engine diagnostic problem

  16. Rolls-Royce • University of Sheffield, P Fleming • University of Leeds, Peter Dew, Alison McKay • York, J Austin, J McDermid, A Wellings • University of Oxford, Lionel Tarassenko • Rolls-Royce, Derby • Data Systems & Solutions • Cybula Ltd

  17. [Figure: DAME Grid scenario; labels: engine flight data, London Airport, New York Airport, airline office, Grid, diagnostics centre, maintenance centre, American data center, European data center]

  18. Diagnostic issues • The system must analyse and report novel engine operation, identify any cause of events, and do this quickly • Data: large (many Tb)

  19. Data – Zmod plots

  20. How does AURA contribute? • Search technology for multimedia data • Parallel pattern-match engine based on neural networks • Built on Correlation Matrix Memories • High-performance Beowulf and dedicated hardware implementations • Commercially sold by Cybula Ltd

  21. [Figure: AURA-G in the DAME diagnostic chain; labels: engine data, data reduction processes, features, novelty indication, data used to identify novelty, diagnostic station, match requests, data to be searched for, pattern match results, data stores/data warehouse, AURA-G, GRID, diagnosis]

  22. Simple example of processing chain: matching previous events [Figure: a data sample passes through DM coding into a CMM]

  23. Typical pre-processing • DM coding (1 up and 0 down), e.g. 01101111011110111 • Fast • Preserves information • Produces a binary vector [Figure: frequency plotted against time]
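The "1 up and 0 down" rule on slide 23 reads like delta modulation, so the sketch below assumes that DM here means a basic delta modulator: the encoder tracks a running estimate, emits 1 and steps the estimate up when the signal is above it, and otherwise emits 0 and steps down. The step size and the names `dm_encode` and `dm_decode` are hypothetical. The decoder shows the sense in which the code preserves information: the estimate can be rebuilt from the start value and the bits alone.

```python
def dm_encode(samples, step):
    """Delta-modulation sketch: 1 = estimate steps up, 0 = estimate steps down."""
    if not samples:
        return []
    estimate = samples[0]
    bits = []
    for x in samples[1:]:
        if x > estimate:
            bits.append(1)
            estimate += step
        else:
            bits.append(0)
            estimate -= step
    return bits


def dm_decode(start, bits, step):
    """Rebuild the staircase estimate from the start value and the bit stream."""
    estimate = start
    out = [estimate]
    for b in bits:
        estimate += step if b else -step
        out.append(estimate)
    return out
```

For the samples `[0, 1, 2, 2, 1, 0]` with step 1, the encoder produces `[1, 1, 0, 0, 0]`, and decoding from 0 gives the staircase `[0, 1, 2, 1, 0, -1]`. The output is already a binary vector, so it can be fed to a CMM without further conversion.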

  24. AURA-G • This is a Globus-enabled AURA implementation. • Developed under DAME • Will be available at the end of 2002 for use in other problems.

  25. AURA-G • Supports scalable pattern matching • Supports distributed search across multiple CMM engines at different sites • OGSA compliant
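Distributed search across several CMM engines can be pictured as scatter-gather: the same query is sent to every site in parallel, each site returns its local hits, and the results are merged and ranked. The sketch below is hypothetical and local only: threads stand in for remote sites, there are no Globus or OGSA calls, and `SiteEngine`, `distributed_search`, `min_score` and `top_k` are illustrative names. Each record's score is the number of query bits it shares, which is what a CMM column sum would give for that record.

```python
from concurrent.futures import ThreadPoolExecutor


class SiteEngine:
    """Stand-in for a CMM engine at one site (hypothetical; runs locally)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # record id -> set of feature bits

    def search(self, query_bits, min_score):
        """Score every local record by shared bits and keep those at or above min_score."""
        hits = []
        for rid, bits in self.records.items():
            score = len(query_bits & bits)
            if score >= min_score:
                hits.append((score, self.name, rid))
        return hits


def distributed_search(sites, query_bits, min_score, top_k):
    """Send the query to all sites in parallel, then merge and rank the hits."""
    if not sites:
        return []
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = [pool.submit(s.search, query_bits, min_score) for s in sites]
        merged = [hit for f in futures for hit in f.result()]
    merged.sort(key=lambda h: (-h[0], h[1], h[2]))  # best score first, stable tie-break
    return merged[:top_k]
```

With a European site holding `{"e1": {1, 2, 3}, "e2": {7, 8}}` and an American site holding `{"u1": {1, 2, 9}, "u2": {2, 3, 4}}`, the query `{1, 2, 3}` with `min_score=2` returns `e1` first (score 3), followed by `u1` and `u2` (score 2 each). Only the ranked hits cross the network, not the stored data, which is what makes this layout attractive when the data sets are large and held at different sites.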

  26. Grid Issues in Data Mining • Data provenance • Standards: • Data transparency independent of location • Managing the database/data mining link in a distributed system • OGSA-DAI

  27. Conclusions • AURA is a mature component for data search and retrieval • Robust software and hardware implementations are available • Grid applications of AURA in e-Science are underway

  28. Contacts • Jim Austin, Dept of Computer Science, University of York, York, YO10 5DD • www.cs.york.ac.uk/arch • austin@cs.york.ac.uk • 01904 432734, 01904 432767 • Cybula Ltd: www.cybula.com, 01377 236382 • DAME: www.cs.york.ac.uk/dame

More Related