Massive Data Analysis Lab (MassDAL)
S. Muthukrishnan, CS Dept.
MassDAL
• Agenda: Gather, manage, and process massive data logs: Web, IP/wireless traffic data, location trajectories of objects, sensor readings of the physical world.
• Key Challenges:
  • Scale: Beyond the traditional “human” scale. E.g., IP data at a single router interface for an hour exceeds total yearly worldwide credit card transactions!
  • Data Collection: Probes/sensors with associated data quality and communication problems.
• Need breakthroughs in Mathematics, Algorithms, Systems, and Engineering to meet these challenges.
• Potential: Major impact in Homeland Security, Telecom, Transportation, and society at large.
State of MassDAL
• Mathematics and Computer Science.
  • Algorithmic tools for embedding vectors, strings, trees, and other objects for “compact” representation.
  • Algorithmic tools for analyzing data summaries for heavy hitters, deviants, clustering, decision trees, etc.
  • Invited talks at ACM, SIAM, and European conferences in Algorithms, Databases, Statistics, and Data Mining on novel models and algorithms.
  • Over a dozen research papers in the last 2 years on experience with massive data analysis.
  • Supported by NSF grants. Partners: MIT, DIMACS.
State of MassDAL
• Science
  • Developing wearable sensors for tracking the location of objects as well as “interactions” between objects. Measuring behavioral data.
  • Current partner: Telcordia. Their initial investment: $300k/3 months (est.). Potential partner in the works: Los Alamos National Lab.
  • Potential: Analysis of social networks for Epidemiology, Homeland Security, and the health industry.
State of MassDAL
• Engineering.
  • Consulting in analysis of wireless network logs. AT&T Wireless: 3rd largest in the US, 20 million customers. Terabytes/month. Fully operational, telco-grade!
  • Incorporated novel algorithms in operational IP network data analysis tools. Partner: Gigascope.
  • Developed a principled approach to data cleaning and data quality monitoring for an operational IP network. Partner: PACMAN.
  • Developed new burst-detection algorithms for text streams. Partner: DIMACS, Monitoring Message Streams.
Future
• See http://cs.rutgers.edu/~muthu/massdal.html
Future of MassDAL
• Research: Need breakthrough research in mathematics, systems, databases, algorithms, and sensor networking.
• Expand data domains.
  • Potential partners: Google, NJ auto insurance fraud data, USPTO patent data, AWS location trajectories, etc.
• Build a state-of-the-art facility at Rutgers.
  • Secure, 24x7 data hosting and analysis infrastructure capable of gathering and processing petabytes of data per month across domains, data sources, etc. Unique in the world!
• Potential.
  • Every wireless, telecom, and internet service provider is looking to farm out this crucial piece of their operations. Estimated market for these services: hundreds of millions of US dollars per year. Crucial for NJ State. Interest from multiple VCs now.