Data Analytics for Big Data Vandana P. Janeja Information Systems Department, University of Maryland, Baltimore County, MD, USA
Big Data • What is Big Data? • Recently much good science, whether physical, biological, or social, has been forced to confront - and has often benefited from - the Big Data phenomenon. • Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115) Diebold, F.X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L.P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122
Big data spans four dimensions: Volume, Velocity, Variety, and Veracity
Volume: Enterprises are awash with ever-growing data of all types • Terabytes, petabytes, and even exabytes of information • Turn 12 terabytes of Tweets created each day into improved product sentiment analysis • Convert 350 billion annual meter readings to better predict power consumption
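To make the Volume figures concrete, the daily and annual totals can be converted into sustained per-second rates. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes), a 365-day year, and evenly spread arrivals; real traffic is burstier than an average suggests.

```python
# Convert the Volume slide's totals into average per-second rates.
# Assumptions: decimal units (1 TB = 10**12 bytes), a 365-day year,
# and data arriving evenly over the period.
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # 31,536,000


def per_second(total: float, period_seconds: int) -> float:
    """Average rate if `total` units arrive evenly over `period_seconds`."""
    return total / period_seconds


tweet_bytes_per_day = 12 * 10**12          # 12 TB of Tweets per day
meter_readings_per_year = 350 * 10**9      # 350 billion meter readings per year

tweet_mb_per_s = per_second(tweet_bytes_per_day, SECONDS_PER_DAY) / 10**6
readings_per_s = per_second(meter_readings_per_year, SECONDS_PER_YEAR)

print(f"Tweets: ~{tweet_mb_per_s:.1f} MB/s")
print(f"Meter readings: ~{readings_per_s:,.0f} readings/s")
```

Under these assumptions, the Tweet stream alone averages roughly 139 MB every second, and the meter data roughly eleven thousand readings per second, around the clock.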
Velocity: Sometimes 2 minutes is too late. • For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. • Scrutinize 5 million trade events created each day to identify potential fraud • Analyze 500 million daily call detail records in real time to predict customer churn faster
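"Used as it streams" means each event is examined once, on arrival, rather than after a nightly batch load. A minimal sketch of that idea, assuming a hypothetical event format of (timestamp in seconds, account id, amount) delivered in time order, flags any account that produces more than a set number of events inside a trailing 2-minute window. The window length, threshold, and function name are illustrative choices, not a fraud model.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event format: (timestamp_seconds, account_id, amount).
# Assumes events arrive in non-decreasing timestamp order.
Event = Tuple[float, str, float]


def flag_bursts(events: Iterable[Event], window_s: float = 120.0,
                max_events: int = 3) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, account) whenever an account has more than
    `max_events` events within the trailing `window_s` seconds.

    Each event is processed once as it arrives; per account, only the
    timestamps still inside the window are kept in memory."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for ts, account, _amount in events:
        window = recent[account]
        window.append(ts)
        # Evict timestamps that have fallen out of the trailing window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) > max_events:
            yield ts, account


stream = [(0, "A", 10.0), (30, "A", 12.0), (60, "B", 5.0),
          (90, "A", 9.0), (100, "A", 300.0), (400, "A", 8.0)]
alerts = list(flag_bursts(stream))
print(alerts)
```

Because `flag_bursts` is a generator, it can sit directly on an unbounded feed and emit alerts while the 2-minute window is still open, instead of waiting for the data to be stored first.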
Variety: Big data is any type of data, structured and unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. • Monitor hundreds of live video feeds from surveillance cameras to target points of interest • Exploit the 80% data growth in images, video and documents to improve customer satisfaction
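Analyzing different data types together usually starts by mapping each source onto a shared schema. The sketch below assumes three hypothetical inputs describing the same kind of user event (a CSV export, a JSON document, and a free-text log line); the field names, log format, and common schema of `user`, `action`, `source` are all invented for illustration, and timestamps are ignored for brevity.

```python
import csv
import io
import json
import re
from collections import Counter
from typing import Dict, List

# Hypothetical inputs: similar events arriving in three different shapes.
CSV_DATA = "user,action,ts\nalice,click,1700000000\n"
JSON_DATA = '{"user": "bob", "event": {"type": "view"}, "time": 1700000060}'
LOG_LINE = "2023-11-14T22:15:20Z user=carol action=purchase"

# Assumed log format: "<timestamp> user=<name> action=<verb>".
LOG_PATTERN = re.compile(r"^\S+ user=(?P<user>\w+) action=(?P<action>\w+)$")

Record = Dict[str, str]


def from_csv(text: str) -> List[Record]:
    """Structured rows: columns map directly onto the common schema."""
    return [{"user": row["user"], "action": row["action"], "source": "csv"}
            for row in csv.DictReader(io.StringIO(text))]


def from_json(text: str) -> List[Record]:
    """Semi-structured document: the action is nested one level down."""
    doc = json.loads(text)
    return [{"user": doc["user"], "action": doc["event"]["type"], "source": "json"}]


def from_log(line: str) -> List[Record]:
    """Unstructured text: fields are recovered with a regular expression."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return []  # unparseable lines are skipped rather than guessed at
    return [{"user": match["user"], "action": match["action"], "source": "log"}]


records = from_csv(CSV_DATA) + from_json(JSON_DATA) + from_log(LOG_LINE)
action_counts = Counter(r["action"] for r in records)
print(records)
print(action_counts)
```

Once every source yields the same record shape, a single analysis (here, a simple count of actions) runs across all of them at once.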
Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions. • How can you act upon information if you don’t trust it? • Establishing trust in big data presents a huge challenge as the variety and number of sources grows.
Will it make a difference if some of this data is from France and some from Maryland? Will it make a difference if some of this data is from LA and some from Baltimore? Will it make a difference if some of this data is from Maryland and some from D.C.? Will it make a difference if some of this data is from Howard County, MD and some from Montgomery County, MD?
According to 2001 statistics (USDOT), NJ ranks 12th in intersection fatalities, which account for 32.1% of all state highway fatalities, and 12th in pedestrian fatalities, which account for 17.7% of all state highway fatalities.
US HIGHWAYS • 42,000 Americans are killed on highways each year • Nearly one-third of all fatal crashes each year are caused by substandard road conditions and roadside hazards. • Motor vehicle crashes cost the United States $231 billion annually, including $21 billion from Federal and State tax revenue. • Americans waste $67 billion each year due to congestion Ref: http://www.house.gov/transportation/press/press2005/release9.html
CDC Officials Confirm Swine Flu Cases Up to 40; Outbreak May Worsen (ABC News, 2/27/09, 1pm)
Dr. William Schaffner, chairman of Preventive Medicine at Vanderbilt University Medical Center in Nashville, Tenn., said doctors like him have been advised by the CDC and state health department to set up a system that would test patients with flu-like symptoms and help define how widespread this outbreak is. He said the severity of the virus is hard to gauge because of the wide discrepancy in how it has affected Mexicans and Americans, and because it is occurring in places that are warm, which is very unusual. "The genetic make up of this virus has influenza experts scratching their heads," he said. "One of the things that has us worried is that could this be a virus that could continue to make mischief during the warmest parts of the year. That would be a big thing. For a respiratory virus to be active during the summer months would be very unique."
Knowledge Discovery (KDD) Process • Data mining: the core of the knowledge discovery process [Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge] (Source: Data Mining: Concepts and Techniques)
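The stages of the KDD process can be sketched as a chain of small functions, one per box in the figure. This is a minimal, hypothetical example: the two in-memory "stores", the basket contents, the thresholds, and every function name are invented for illustration, and counting frequent item pairs by support is a simplified stand-in for a real mining algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical raw databases: transaction id -> items (messy on purpose).
store_a = {1: ["Milk", "bread ", None], 2: ["milk", "Eggs"], 3: ["bread", "milk"]}
store_b = {4: ["MILK", "bread"], 5: ["eggs"], 6: ["bread", "eggs", "milk"]}


def clean(db: Dict[int, List[Optional[str]]]) -> Dict[int, Set[str]]:
    """Data cleaning: drop missing values, normalize case and whitespace."""
    return {tid: {item.strip().lower() for item in items if item}
            for tid, items in db.items()}


def integrate(*dbs: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Data integration: merge cleaned sources into one 'warehouse' table."""
    warehouse: Dict[int, Set[str]] = {}
    for db in dbs:
        warehouse.update(db)
    return warehouse


def select(warehouse: Dict[int, Set[str]], min_items: int = 2) -> List[Set[str]]:
    """Selection: keep task-relevant data (baskets with >= min_items items)."""
    return [items for items in warehouse.values() if len(items) >= min_items]


def mine_pairs(baskets: List[Set[str]]) -> Counter:
    """Data mining: count how often each pair of items occurs together."""
    counts: Counter = Counter()
    for items in baskets:
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(counts: Counter, n_baskets: int,
             min_support: float = 0.5) -> Dict[Tuple[str, str], float]:
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}


baskets = select(integrate(clean(store_a), clean(store_b)))
knowledge = evaluate(mine_pairs(baskets), len(baskets))
print(knowledge)
```

Keeping each stage as a separate function mirrors the figure: cleaning or selection rules can change without touching the mining step, and pattern evaluation decides which mined results count as knowledge.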
Big Data Framework • Automatic parallelization • Run-time handles: data partitioning, task scheduling, handling machine failures, and managing inter-machine communication • Completely transparent to the programmer/analyst/user
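The slide does not name a specific framework; the run-time duties it lists resemble those of MapReduce-style systems such as Hadoop. The sketch below imitates that division of labor on a single machine for a word-count job. Threads stand in for machines, a simple retry loop stands in for failure handling, and grouping values by key stands in for inter-machine communication. The user writes only `map_words` and `reduce_counts`; the hypothetical `run_job` hides partitioning, scheduling, and shuffling, which is the "transparent to the user" property in miniature.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def partition(records: List[str], n_parts: int) -> List[List[str]]:
    """Data partitioning: split the input round-robin into n_parts chunks."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_words(chunk: List[str]) -> List[Tuple[str, int]]:
    """User-supplied map function: emit (word, 1) for every word."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_counts(key: str, values: List[int]) -> int:
    """User-supplied reduce function: sum all counts for one key."""
    return sum(values)


def run_with_retry(fn, arg, retries: int = 2):
    """Handling failures (simplified): re-run a failed task a few times."""
    for attempt in range(retries + 1):
        try:
            return fn(arg)
        except Exception:
            if attempt == retries:
                raise


def run_job(records: List[str], mapper, reducer, n_workers: int = 4) -> Dict[str, int]:
    """Run a map/shuffle/reduce job; the caller never sees these steps."""
    chunks = partition(records, n_workers)
    # Task scheduling: the pool hands one map task per chunk to a worker.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = list(pool.map(lambda chunk: run_with_retry(mapper, chunk), chunks))
    # Shuffle: group intermediate values by key before reducing.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


lines = ["big data big ideas", "data mining", "Big Data frameworks"]
counts = run_job(lines, map_words, reduce_counts)
print(counts)
```

Threads keep the example self-contained but do not give true CPU parallelism in CPython; a real framework distributes the same map and reduce tasks across processes or machines.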
Relevant IS Courses • IS 410 Introduction to Database Design • IS 420 Database Application Development • IS 427 Introduction to Artificial Intelligence: Concepts and Applications • IS 428 Data Mining Techniques and Applications • IS 498 Special Topics • Independent studies