Data Intensive Computing
• Graph algorithms for irregular, unstructured data
  John Feo, Pacific Northwest National Laboratory
• graph500 and data-intensive computing
  Richard Murphy, Sandia National Laboratories
• Large-scale knowledge discovery
  Steve Reinhardt, Microsoft
• Data intensive computing at SNL
  Andrew Wilson, Sandia National Laboratories
• IBM's InfoSphere Streams
  Roger Rea, IBM
• Graph500 and Data Intensive HPC
  Richard Murphy, Sandia National Laboratories
• Data Analytics
  Phillip Morris, Platform Computing
• Data Intensive Computing
  Richard Altmaier, Intel
1. Please provide a definition of "Data Intensive Computing", and explain the difference between "finding a needle in a haystack" and "knowledge discovery".
2. "Data Intensive Computing" generally involves analyses of non-numeric data and the number of combinatorial possibilities grows rapidly. The objective of the analysis is to find in the data, meaningful relationships. How do we (a) test for convergence when not evaluating all possible combinations and (b) test for statistical significance--when the data is non-numeric?
3. "Data Intensive Computing" often involves the use of incomplete data. How does this affect the analysis process?
4. If you could design an ideal computing architecture for Data Intensive Computing, what would it look like?
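Regarding question 2(b), one common way to test significance without numeric assumptions is a permutation test on categorical data: compute an association statistic on the observed labels, then recompute it many times after shuffling one variable, which destroys any real association while preserving the label frequencies. The fraction of shuffles that match or exceed the observed statistic estimates the p-value. The minimal Python sketch below illustrates this idea only; the function names and toy data are illustrative assumptions, not taken from any of the talks listed above.

# Hypothetical sketch of a permutation test for association between two
# categorical (non-numeric) variables. Names and data are illustrative.
import random
from collections import Counter

def association_statistic(xs, ys):
    """Sum of squared deviations of joint category counts from the counts
    expected if the two variables were independent (a chi-square-style
    statistic, used here only to rank permutations, so no distributional
    assumptions are required)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    stat = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            stat += (joint[(x, y)] - expected) ** 2
    return stat

def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Estimate significance by shuffling one variable, which breaks any
    real association while preserving the marginal label frequencies."""
    rng = random.Random(seed)
    observed = association_statistic(xs, ys)
    ys_shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(ys_shuffled)
        if association_statistic(xs, ys_shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate conservative and nonzero.
    return (hits + 1) / (trials + 1)

if __name__ == "__main__":
    # Toy categorical data: does label "B" tend to co-occur with label "X"?
    xs = ["A", "A", "B", "B", "B", "A", "B", "A"]
    ys = ["X", "Y", "X", "X", "X", "Y", "X", "Y"]
    print(f"estimated p-value: {permutation_p_value(xs, ys):.3f}")

The choice of statistic is flexible: any function that captures the relationship of interest can be substituted, because the permutation procedure supplies the null distribution empirically rather than relying on a numeric sampling model.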