ADaM System Architecture
Rahul Ramachandran, Sara Graves and Ken Keiser
Mathematical Challenges in Scientific Data Mining, IPAM, January 14-18, 2002
Information Technology and Systems Center, University of Alabama in Huntsville
rramachandran@itsc.uah.edu
ITSC/University of Alabama in Huntsville

Talk Overview
• Mining System Requirements
• ADaM System Architecture
• ADaM Plan Builder
• Research Directions

Mining System Requirements: When, Where and Who
• WHERE
  • User Workstation
  • Data Archive Center
  • Data Mining Center
• WHEN
  • Real Time
  • On-Ingest
  • On-Demand
  • Repeatedly
• WHO
  • Casual Users
  • Domain Experts
  • Mining Experts

Algorithm Development and Mining (ADaM) System
• The ADaM system was developed under a NASA research grant
• The system provides knowledge discovery, feature detection and content-based searching for data values, as well as for metadata
• It provides over 120 different operations that can be performed on the input data stream
• Operations range from specialized, data-set-specific atmospheric science algorithms to digital image processing techniques and processing modules for automatic pattern recognition, machine perception, neural networks and genetic algorithms

ADaM Features
• Handles science data set variability
• Multiple resolution/multiple scales
• Variability of formats
• Granularity of data
• Includes spatial/temporal dimensions
• Allows addition of new algorithms
• Allows scientists to select and sequence different operations

ADaM Engine Architecture
• Input: HDF, HDF-EOS, GIF, PIP-2, SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Lvl 1B, SSM/I MSFC Brightness Temp, US Rain, Landsat, ASCII, Grass Vectors (ASCII Text), Intergraph Raster, Others...
• Output: GIF Images, HDF-EOS, HDF Raster Images, HDF SDS, Polygons (ASCII, DXF), SSM/I MSFC Brightness Temp, TIFF Images, Others...
• Data stages shown in the diagram: Data, Translated Data, Preprocessed Data, Patterns/Models, Results
• Processing: Preprocessing
  • Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
  • Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
  • Image Processing: Cropping, Inversion, Thresholding
  • Others...
• Processing: Analysis
  • Clustering: K Means, Isodata, Maximum
  • Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
  • Image Analysis: Boundary Detection, Cooccurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
  • Genetic Algorithms
  • Neural Networks
  • Others...

ADaM Mining Environment
• Distributed Clients: Web-based, Workstation based, Other Systems, Analysis/Vis Tools
• Common Client API
• Data Mining Server
  • Mining Engine (ADaM): Input Modules, Analysis Modules, Output Modules
  • Knowledge Base
• Data Stores
• Mining Results
• Event/Relationship Search System

ADaM Architecture

ADaM Miner Engine
• Manages the processing of data through a series of specified operations
• Loads input, processing and output modules dynamically as needed at execution time
• Allows newly developed modules to be added without rebuilding the engine
• Interprets a mining plan script that specifies the operations and the order in which they should be executed

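The engine behavior described above (resolving modules by name at execution time and running the steps of a plan in order) can be illustrated with a minimal Python sketch. This is not ADaM's plan syntax or API: the plan format, the `module:function` naming, and the names `load_operation` and `run_plan` are assumptions made for illustration, and standard-library functions stand in for real mining operations.

```python
import importlib


def load_operation(spec):
    """Resolve a 'module:function' string to a callable at run time.

    Hypothetical naming scheme: it stands in for loading a shared
    library by name, so new modules need no change to the engine.
    """
    module_name, _, func_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def run_plan(plan, data):
    """Execute the plan steps in order, passing each result to the next step."""
    for step in plan:
        operation = load_operation(step["module"])
        data = operation(data, **step.get("params", {}))
    return data


# Hypothetical two-step plan using standard-library stand-ins.
plan = [
    {"module": "builtins:sorted", "params": {"reverse": True}},
    {"module": "statistics:median"},
]

result = run_plan(plan, [3.0, 1.0, 2.0, 10.0])
print(result)
```

Because operations are looked up by name only when a step runs, adding a new operation in this sketch means adding a new importable module, not editing `run_plan`.
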
ADaM Miner Database
• Used to store information that includes the names, locations and related metadata for input data sets available on the server
• Includes information about users, jobs, mining results, and other related information
• Simple relational database

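As a rough illustration of what such a simple relational database might hold, the sketch below uses Python's built-in `sqlite3` module. The table and column names are assumptions, not the actual ADaM schema, and the sample rows are made up.

```python
import sqlite3

# In-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT NOT NULL,
        metadata TEXT
    );
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        dataset_id INTEGER NOT NULL REFERENCES datasets(id),
        status TEXT NOT NULL DEFAULT 'queued'
    );
    CREATE TABLE results (
        id INTEGER PRIMARY KEY,
        job_id INTEGER NOT NULL REFERENCES jobs(id),
        location TEXT NOT NULL
    );
    """
)

# Made-up sample rows: one data set, one user, one queued job.
conn.execute(
    "INSERT INTO datasets (name, location, metadata) VALUES (?, ?, ?)",
    ("sample.hdf", "/data/example", "format=HDF"),
)
conn.execute("INSERT INTO users (username) VALUES (?)", ("analyst",))
conn.execute("INSERT INTO jobs (user_id, dataset_id) VALUES (?, ?)", (1, 1))

# Join the tables to see who queued which data set.
rows = conn.execute(
    """
    SELECT users.username, datasets.name, jobs.status
    FROM jobs
    JOIN users ON users.id = jobs.user_id
    JOIN datasets ON datasets.id = jobs.dataset_id
    """
).fetchall()
print(rows)
```
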
ADaM Daemon and Scheduler
• Scheduler
  • Examines the list of jobs to be executed on the server and determines which job or jobs to execute at any given time
  • Queues the requests and executes them sequentially
• Daemon
  • Handles all network communications with the mining system
  • Is configured to listen on a specific port for any socket communications

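The scheduler's queue-then-execute-sequentially behavior can be sketched as a first-in, first-out queue. The class and method names below are hypothetical, and the daemon's socket handling is deliberately left out of this sketch.

```python
from collections import deque


class SequentialScheduler:
    """Hypothetical FIFO scheduler: jobs run one at a time, in arrival order."""

    def __init__(self):
        self._queue = deque()
        self.completed = []  # (job_id, result) pairs in execution order

    def submit(self, job_id, task):
        """Queue a job; `task` is any zero-argument callable."""
        self._queue.append((job_id, task))

    def run_pending(self):
        """Execute queued jobs sequentially until the queue is empty."""
        while self._queue:
            job_id, task = self._queue.popleft()
            self.completed.append((job_id, task()))


scheduler = SequentialScheduler()
scheduler.submit("job-1", lambda: 2 + 2)
scheduler.submit("job-2", lambda: "done")
scheduler.run_pending()
print(scheduler.completed)
```
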
ADaM Input/Operation Filters
• Input/Output Filters are data readers and writers
• Operations are the algorithms
• Each of the operations and (input/output) filters is implemented as a shared library
• New modules may be added to the system without recompiling or relinking
• All operations/filters either produce or operate on a data collection, which provides a common format for representing scientific data

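The idea of a common data collection shared by filters and operations can be sketched as follows. The `DataCollection` class, the ASCII grid layout, and the function names are assumptions for illustration only; ADaM implements its filters and operations as shared libraries, which plain Python functions merely stand in for here.

```python
from dataclasses import dataclass, field


@dataclass
class DataCollection:
    """Hypothetical common format: a 2-D grid of values plus metadata."""
    values: list
    metadata: dict = field(default_factory=dict)


def ascii_input_filter(text):
    """Input filter: parse whitespace-separated rows into a DataCollection."""
    rows = [[float(tok) for tok in line.split()]
            for line in text.strip().splitlines()]
    return DataCollection(rows, {"source_format": "ASCII"})


def threshold_operation(collection, cutoff):
    """Operation: map each value to 1.0 if it is >= cutoff, else 0.0."""
    values = [[1.0 if v >= cutoff else 0.0 for v in row]
              for row in collection.values]
    return DataCollection(values, {**collection.metadata,
                                   "operation": "threshold"})


def ascii_output_filter(collection):
    """Output filter: write the grid back out as ASCII text."""
    return "\n".join(" ".join(f"{v:g}" for v in row)
                     for row in collection.values)


grid = ascii_input_filter("0.2 0.9\n0.5 0.1")
mask = threshold_operation(grid, cutoff=0.5)
output_text = ascii_output_filter(mask)
print(output_text)
```

Since every stage consumes or produces a `DataCollection`, any input filter can be paired with any operation or output filter in this sketch.
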
General Mining Steps
• Select data files to be mined
• “Check-In” the data files into the Miner Database
• Write a “Mining Plan” consisting of a sequence of input filters and operations
• Execute the Mining Plan using the engine
• Check and save results
• Iterate

What is Check-In?
• The process of encoding information such as the names, locations and related metadata for input data sets available on the server
• Creates a complex data hierarchy in the database

ADaM Plan Builder: Check-In
• Two Modes of Operation
  • General: requires only minimal information
  • Advanced: requires more detailed information and allows the user to set up a structured database
• Check-In form fields
  • Path to the data file
  • Data file name
  • Input Filter associated with the data file
• Load an XML file containing existing Check-In specifications

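The slides do not show the format of the Check-In XML file, so the sketch below invents one purely for illustration: the element names `checkin`, `dataset`, `path`, `filename` and `inputFilter` are assumptions, chosen to mirror the three form fields listed above. It shows how such a specification might be read with Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical Check-In specification; not the real ADaM file format.
CHECKIN_XML = """
<checkin>
  <dataset>
    <path>/data/example</path>
    <filename>sample.hdf</filename>
    <inputFilter>HDF</inputFilter>
  </dataset>
</checkin>
"""


def parse_checkin(xml_text):
    """Return one dict per <dataset> element with the three form fields."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for dataset in root.findall("dataset"):
        records.append({
            "path": dataset.findtext("path"),
            "filename": dataset.findtext("filename"),
            "input_filter": dataset.findtext("inputFilter"),
        })
    return records


records = parse_checkin(CHECKIN_XML)
print(records)
```
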
ADaM Plan Builder – Layout
• Operation Menu contains the list of operations one can select
• Input Menu contains the list of Input Filters one can select
• Plan Menu allows one to:
  • Select a new plan
  • Load existing plan
  • Check-In data
• Panel where Mining Plan can be viewed either as text or a tree
• Description about the Operation/Input Filter can be viewed in this panel
• All the parameters needed for the Operation are described here
• Sample values for Operation’s parameters are shown in this panel
• Allows user to select the operation and add it to the Mining Plan
• Go Mine the data using the Mining Plan

Research Directions
• Generic Data Reader for ADaM
  • ESML (Earth Science Markup Language)
• Programmer's Guide for ADaM
• Distributed Mining
• Grid Mining
  • Successful implementation and testing of the ADaM system on the NASA Information Power Grid
• Mining Onboard the Spacecraft
  • The EnVironmEnt for On-Board Processing (EVE) system

ADaM Information
• Web site: datamining.itsc.uah.edu
• ADaM Lite beta version download
• Contact: rramachandran@itsc.uah.edu