Pattern Discovery Tools for Large Astronomical Surveys
Tin Kam Ho, Bell Labs, Lucent Technologies, tkh@research.bell-labs.com
In collaboration with David Wittman and J. Anthony Tyson (University of California, Davis) and Samuel Carliles, Wil O'Mullane, and Alex Szalay (Johns Hopkins University)
Mirage web site: http://www.cs.bell-labs.com/who/tkh/mirage
VO interface: http://skyservice.pha.jhu.edu/develop/vo/mirage

Mirage (in public release since 2002) is a prototype of an analysis tool that supports pattern discovery across multi-typed data. It is a Java-based tool organized around a command interpreter that receives action commands from textual input or a graphical user interface. The action commands cover loading data, incremental import of new entries and new attributes, simple attribute manipulation, and activation of several embedded classification routines. The most important functions are built on simultaneous visualization of raw image data, extracted feature vectors, and classification results.

The graphical display presents a stack of canvas pages. Each page can be subdivided arbitrarily, via horizontal or vertical splits, into rectangular cells. Each cell can be loaded with any data view module via a simple drag-and-drop operation. Each module provides its own control commands for manipulating its specific method of data presentation. In addition, all view modules implement the same Java interface, "ActivePanel", which contains the following commands. Coupled with view-specific operations, they support powerful exploration operations:
• getSelected()
• clearSelected()
• highlightDataEntry()
• colorDataEntry()
• clearHighlights()
• clearColors()
• changeToMonochrome()
• changeToColor()

Early results from various uses of Mirage have been very encouraging.
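As a rough illustration of how a shared interface lets a selection made in one view propagate to the others, the sketch below declares an ActivePanel interface with the eight command names listed above. Only those names come from Mirage; the parameter and return types, the RecordingPanel class, and the broadcastSelection helper are assumptions made for this example and are not Mirage's actual code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: only the eight method names come from the text above.
// All signatures and helper classes are assumptions for illustration.
class LinkedViewsSketch {

    /** Commands shared by every view module (parameter types assumed). */
    interface ActivePanel {
        Set<Integer> getSelected();          // indices of entries selected in this view
        void clearSelected();
        void highlightDataEntry(int entry);
        void colorDataEntry(int entry, String color);
        void clearHighlights();
        void clearColors();
        void changeToMonochrome();
        void changeToColor();
    }

    /** Toy view that records state instead of drawing anything. */
    static class RecordingPanel implements ActivePanel {
        final Set<Integer> selected = new LinkedHashSet<>();
        final Set<Integer> highlighted = new LinkedHashSet<>();
        final Map<Integer, String> colors = new LinkedHashMap<>();
        boolean monochrome = false;

        /** Stands in for a view-specific gesture such as a mouse click. */
        void select(int entry) { selected.add(entry); }

        @Override public Set<Integer> getSelected() { return new LinkedHashSet<>(selected); }
        @Override public void clearSelected() { selected.clear(); }
        @Override public void highlightDataEntry(int entry) { highlighted.add(entry); }
        @Override public void colorDataEntry(int entry, String color) { colors.put(entry, color); }
        @Override public void clearHighlights() { highlighted.clear(); }
        @Override public void clearColors() { colors.clear(); }
        @Override public void changeToMonochrome() { monochrome = true; }
        @Override public void changeToColor() { monochrome = false; }
    }

    /** Replaces the highlights of every other panel with the source panel's selection. */
    static void broadcastSelection(ActivePanel source, List<? extends ActivePanel> panels) {
        Set<Integer> picked = source.getSelected();
        for (ActivePanel panel : panels) {
            if (panel == source) continue;   // the source already shows its own selection
            panel.clearHighlights();
            for (int entry : picked) panel.highlightDataEntry(entry);
        }
    }

    public static void main(String[] args) {
        RecordingPanel scatter = new RecordingPanel();
        RecordingPanel histogram = new RecordingPanel();
        scatter.select(3);
        scatter.select(7);
        broadcastSelection(scatter, Arrays.asList(scatter, histogram));
        System.out.println(histogram.highlighted);   // prints [3, 7]
    }
}
```

Because every view answers the same small set of commands, a new view type only has to implement the interface to take part in linked selection; the broadcasting code does not need to know what kind of plot it is talking to.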
We plan to refine and generalize the ideas tried out in the software, toward a more versatile tool that can support more advanced analysis of the large-scale imaging databases produced by next-generation astronomical surveys.

• Many large-scale sky surveys are generating data at a rate far beyond the reach of traditional manual analysis. This trend is accelerating: in the near future, the Large Synoptic Survey Telescope (LSST) (http://www.lsst.org/lsst_home.shtml) will repeatedly image the entire sky visible from its site, at multiple wavelengths, producing a time-tagged imaging database of 20 petabytes and a corresponding event catalog of 150 terabytes, with parameters of position, time, intensity, colors, and motion.
• Besides the much increased data volume, databases are no longer collected for a single well-defined purpose, with filters and detectors optimized for known features. Paradigm-shifting discoveries of unexpected events or correlations often result from open-ended exploration. This requires a tool that enables not only detection of the unexpected, but also rapid exploration and visualization of the new phenomenon to determine whether it is scientifically valuable or a previously unidentified systematic error.

Challenges for the Analysis Tool

• Versatile visualization utilities allowing many perspectives
Visualization can help verify the correctness of preprocessing steps, clean up undesirable artifacts, choose relevant samples, spot explicit patterns, select useful features, and suggest algorithms and models. To support all these needs, flexibility in the choice of perspectives is critical. Moreover, a connecting architecture is needed so that data relationships can be easily tracked between different views of the data.
• Support for exploratory discovery across diverse data types
Astronomical surveys contain multiple data types and incomparable groups of variables.
Examples are images, spectra, light curves, and various scalar or vector parameters derived from the raw data. Relationships uncovered in each data type need to be correlated with those from the others. This requires tools for modeling, building index structures, and navigating data distributions in each data type, and methods for tracking correlations between different navigation paths.
• Integration of manual and automatic pattern recognition methods
Human judgement needs to be part of the analysis loop to apply proper domain expertise. Automatic pattern recognition algorithms can process large data volumes efficiently, objectively, and consistently. They can also compensate for deficiencies in manual exploration caused by unreliable human intuition or the inability to comprehend high-dimensional vectors. But "stand-alone" algorithms are not enough. A convenient bridge is needed between manual and automatic exploration tools. This includes support for rapid examination of different sampling options and feature choices, algorithmic alternatives and parameters, and facilities for checking the results for validity and interpretation, in contexts at different levels of abstraction from the raw data.

A good tool should also:
• leverage existing visualization and analysis methods,
• enable continued growth by addition of new visualization or analysis tools,
• support interfacing with existing database access tools,
• be scalable in data volume and processing speed.
Mirage features:
• Data Visualization in Multiple, Linked Views: Show patterns in histograms, scatter plots, parallel coordinates, tables, and images
• Selection and Tracking: Select points in any view and broadcast them to all others with highlights or colors
• Systematic Traversal of Data Structures: Walk in histograms, cluster graphs, or trees, with echoes in all other views
• Flexible Graphics Utilities: Open multiple-page plots easily with arbitrary configuration
• Command Scripts: Run prepared groups of operations as animations
• Remote Database Access: Retrieve data for analysis over the WWW; VO data access via the IVOA client package

Work in progress:
• Images: FITS image panel with World Coordinates support using the JSky package; array of image panels with synchronized zooming and panning; panel for overlay of multiple images and object markers
• Analysis: Connection to external libraries for automatic pattern recognition; data structures for high-dimensional spaces
• Database: Join among different datasets on arbitrary common keys (e.g. RA, DEC); coupling with VO access methods
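To make the planned join on common keys more concrete, here is a minimal sketch of a hash join between two small catalogs. It is not taken from Mirage: the Row class, the positionKey helper, and the choice to build a key from RA and DEC rounded to four decimal places are all assumptions for illustration. Matching on rounded positions can miss pairs that fall on either side of a rounding boundary, so a real cross-match would normally use an angular tolerance instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of joining two datasets on a caller-supplied key.
class CatalogJoinSketch {

    /** Minimal catalog row: a sky position plus one labelled measurement. */
    static class Row {
        final double ra;
        final double dec;
        final String label;

        Row(double ra, double dec, String label) {
            this.ra = ra;
            this.dec = dec;
            this.label = label;
        }
    }

    /** Example key: position rounded to 4 decimals (an assumed tolerance). */
    static String positionKey(Row row) {
        return String.format(Locale.ROOT, "%.4f,%.4f", row.ra, row.dec);
    }

    /** Hash join: index the right rows by key, then probe with each left row. */
    static <K> List<Row[]> join(List<Row> left, List<Row> right, Function<Row, K> keyOf) {
        Map<K, List<Row>> index = new HashMap<>();
        for (Row r : right) {
            index.computeIfAbsent(keyOf.apply(r), k -> new ArrayList<>()).add(r);
        }
        List<Row[]> pairs = new ArrayList<>();
        for (Row l : left) {
            List<Row> matches = index.get(keyOf.apply(l));
            if (matches == null) continue;   // no row on the right shares this key
            for (Row r : matches) {
                pairs.add(new Row[] { l, r });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Row> photometry = Arrays.asList(
                new Row(10.12345, -5.5, "r=21.3"),
                new Row(11.0, 2.0, "r=19.8"));
        List<Row> shapes = Arrays.asList(
                new Row(10.12345, -5.5, "e=0.12"),
                new Row(50.0, 1.0, "e=0.30"));
        for (Row[] pair : join(photometry, shapes, CatalogJoinSketch::positionKey)) {
            System.out.println(pair[0].label + " <-> " + pair[1].label);
        }
    }
}
```

Passing the key as a function keeps the join independent of which attributes serve as the common key, so the same routine could join on an object identifier or any other shared attribute.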