
Data Analytics for Big Data

Data Analytics for Big Data. Vandana P. Janeja Information Systems Department, University of Maryland, Baltimore County, MD, USA. Big Data. What is Big Data?


Presentation Transcript


  1. Data Analytics for Big Data Vandana P. Janeja Information Systems Department, University of Maryland, Baltimore County, MD, USA

  2. Big Data • What is Big Data? • Recently much good science, whether physical, biological, or social, has been forced to confront - and has often benefited from - the Big Data phenomenon. • Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115) Diebold, F.X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L.P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122.

  3. Big data spans four dimensions: Volume, Velocity, Variety, and Veracity

  4. Volume: Enterprises are awash with ever-growing data of all types: terabytes, petabytes, even exabytes of information. • Turn 12 terabytes of Tweets created each day into improved product sentiment analysis • Convert 350 billion annual meter readings to better predict power consumption

  5. Velocity: Sometimes 2 minutes is too late. • For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. • Scrutinize 5 million trade events created each day to identify potential fraud • Analyze 500 million daily call detail records in real time to predict customer churn faster
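The idea of using data as it streams in, rather than after the fact, can be sketched in a few lines of Python. This is a minimal illustration, not a fraud-detection method from the slides: the rule (more than `max_events` trades per account within a trailing window), the function name `flag_bursts`, and the event format are all assumptions made for the example.

```python
from collections import defaultdict, deque


def flag_bursts(events, window_seconds=120, max_events=3):
    """Flag accounts that trade unusually often, one event at a time.

    events: iterable of (timestamp_seconds, account) pairs in time order.
    Yields (timestamp, account) each time an account has more than
    max_events events inside the trailing window. The rule itself is a
    hypothetical stand-in for a real fraud model.
    """
    recent = defaultdict(deque)  # account -> timestamps still inside the window
    for ts, account in events:
        window = recent[account]
        window.append(ts)
        # Drop timestamps that have fallen out of the trailing window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_events:
            yield ts, account


# Hypothetical trade events: (seconds since start, account id).
trades = [(0, "a"), (10, "a"), (20, "b"), (30, "a"), (40, "a"), (500, "a")]
alerts = list(flag_bursts(trades))
```

Because the function is a generator, each alert is available as soon as the triggering event arrives, without waiting for the full day's data.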

  6. Variety: Big data is any type of data, structured or unstructured: • text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. • Monitor hundreds of live video feeds from surveillance cameras to target points of interest • Exploit the 80% data growth in images, video and documents to improve customer satisfaction

  7. Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions. • How can you act upon information if you don’t trust it? • Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

  8. Analytics

  9. Is it all about algorithms?

  10. Will it make a difference if some of this data is from France and some from Maryland? Will it make a difference if some of this data is from LA and some from Baltimore? Will it make a difference if some of this data is from Maryland and some from D.C.? Will it make a difference if some of this data is from Howard County, MD and some from Montgomery County, MD?

  11. According to the 2001 statistics, NJ ranks 12th in intersection fatalities with 32.1% of all state highway fatalities, and ranks 12th in pedestrian fatalities with 17.7% of all state highway fatalities (USDOT). US HIGHWAYS • 42,000 Americans are killed on highways each year • Nearly one-third of all fatal crashes each year are caused by substandard road conditions and roadside hazards. • Motor vehicle crashes cost the United States $231 billion annually, including $21 billion from Federal and State tax revenue. • Americans waste $67 billion each year due to congestion Ref: http://www.house.gov/transportation/press/press2005/release9.html

  12. LA Times 4/27/09 12pm

  13. CDC Officials Confirm Swine Flu Cases Up to 40; Outbreak May Worsen: ABC News 2/27/09 1pm Dr. William Schaffner, chairman of Preventive Medicine at Vanderbilt University Medical Center in Nashville, Tenn., said doctors like him have been advised by the CDC and state health department to set up a system that would test patients with flu-like symptoms and help define how widespread this outbreak is. He said the severity of the virus is hard to gauge because of the wide discrepancy in how it has affected Mexicans and Americans, and because it is occurring in places that are warm, which is very unusual. "The genetic makeup of this virus has influenza experts scratching their heads," he said. "One of the things that has us worried is that could this be a virus that could continue to make mischief during the warmest parts of the year. That would be a big thing. For a respiratory virus to be active during the summer months" would be very unique.

  14. Data Mining: Concepts and Techniques. Knowledge Discovery (KDD) Process • Data mining: core of knowledge discovery process • Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge
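The stages of the KDD process on this slide can be read as a chain of small functions, each feeding the next. The sketch below is illustrative only: the function names, the crash records, and the frequency count used as the "data mining" step are hypothetical stand-ins chosen for the example, not material from the slides.

```python
from collections import Counter


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [record for source in sources for record in source]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def select(records, fields):
    """Selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def mine(records, field):
    """Data mining (toy stand-in): count how often each value occurs."""
    return Counter(r[field] for r in records)


def evaluate(patterns, min_support):
    """Pattern evaluation: keep patterns seen at least min_support times."""
    return {k: v for k, v in patterns.items() if v >= min_support}


# Hypothetical crash records from two separate databases.
source_a = [{"county": "Howard", "crash": "intersection"},
            {"county": "Howard", "crash": None}]
source_b = [{"county": "Montgomery", "crash": "intersection"},
            {"county": "Howard", "crash": "pedestrian"}]

warehouse = clean(integrate(source_a, source_b))   # cleaned, integrated data
relevant = select(warehouse, ["crash"])            # task-relevant data
knowledge = evaluate(mine(relevant, "crash"), min_support=2)
```

In a real project each stage would be far more involved, but the shape stays the same: the mining step is only one link, and its output depends on every stage before it.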

  15. Big Data Framework • Automatic Parallelization • Run-time • Data partitioning • Task scheduling • Handling machine failures • Managing inter-machine communication • Completely transparent to the programmer/analyst/user
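To make the run-time duties listed on this slide concrete, here is a small single-machine sketch of data partitioning, task scheduling, and merging of partial results. It is an assumption-laden toy: worker threads stand in for machines, word counting stands in for the analysis, the names `partition`, `map_count`, and `parallel_word_count` are made up for the example, and failure handling and inter-machine communication are not modeled at all.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Data partitioning: split records into n_parts roughly equal chunks."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_count(chunk):
    """Work done on one partition: count the words it contains."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, n_workers=4):
    """Schedule one task per partition, then merge the partial counts."""
    chunks = partition(records, n_workers)
    # Task scheduling: the executor assigns chunks to worker threads,
    # which stand in here for the machines of a real cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_count, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds the counts from each partition
    return total


# Hypothetical input lines.
lines = ["big data", "Big analytics", "data data"]
totals = parallel_word_count(lines, n_workers=2)
```

The caller only writes `map_count` and calls `parallel_word_count`; the partitioning and scheduling happen out of sight, which is the transparency the slide refers to.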

  16. Relevant IS Courses • IS 410 Introduction to Database Design • IS 420 Database Application Development • IS 427 Introduction to Artificial Intelligence: Concepts and Applications • IS 428 Data Mining Techniques and Applications • IS 498 Special Topics • Independent studies
