Use Case Simulations
CMSC 491/691 Hadoop-Based Distributed Computing
Spring 2014
Adam Shook
Orient yourselves based on your groups
• We will merge/redistribute as necessary
• Groups will be given a use case and 15-20 minutes to design a Hadoop-based architecture to solve the problem
• The fun is in the details
• Each group will then be given 5-10 minutes to present their problem and solution
• Ask questions and stuff so I don't look mean
• PowerPoint/Gsheets/other media: make fun pictures that we can put up on the screen
• Email/share with me and I will project
Questions to Answer
• Where is the main data coming from, and what is the “best” way to get it into my system?
• How will I store the data so it can be accessed most efficiently for analysis or consumption?
• How will this analysis be conducted?
• How will my findings be delivered to address the problem?
• What other data sources can I use to augment and refine my analysis?
• Has my architecture met the customer's most critical needs?
Use Case 1
• An advertising company is building a product to sell to retailers. This product will push coupons and other special offers to consumers who are in or passing near a retailer’s store.
• You’ve been asked to assist the advertising company in designing a data architecture to fully enable this product. Describe the types of data this advertising company would need from public sources and/or the retailer to enable ‘coupon pushing’ and deliver the right coupons to the right customers.
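As a starting point for discussion, the "in or passing near a retailer’s store" condition can be sketched as a simple radius check. This is only one possible approach, not part of the use case itself: the store names, coordinates, and the 200 m radius below are all hypothetical, and a real design would also have to decide how consumer locations arrive in the first place.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stores_in_range(consumer, stores, radius_m=200.0):
    """Return the names of stores within radius_m of the consumer's (lat, lon)."""
    lat, lon = consumer
    return [
        name
        for name, (slat, slon) in stores.items()
        if haversine_m(lat, lon, slat, slon) <= radius_m
    ]


# Hypothetical store locations and consumer position (assumptions for illustration).
stores = {"store_a": (39.2556, -76.7110), "store_b": (39.2904, -76.6122)}
nearby = stores_in_range((39.2560, -76.7105), stores)
```

A linear scan like this is fine for a handful of stores; with many stores and a stream of location updates, a group would likely want some form of spatial index or grid bucketing instead.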
Use Case 2
• Your venture enterprise is looking to more accurately predict the strength and duration of regular seasonal illnesses and outbreaks such as the flu. You’ve already received backing from several health care providers who are (legally) willing to share their masked patient data with you, on the condition that this data be kept safe and secure.
• You are also looking at gathering data from social media and other “unconventional” sources to get accurate predictions, including free and open data from the government. These predictions can then be sold to pharmaceutical companies for proper drug manufacturing and geographical distribution.
• Design a data architecture to collect, store, and analyze information to more accurately predict seasonal illnesses. How could this information be sold to pharmaceutical companies, healthcare providers, and individuals? How would the delivery of this information change your architecture, based on the audience?
Use Case 3
• The City is looking to leverage data to predict when and where crimes are likely to occur. Many reports are now in an electronic format, but contain a lot of unstructured text in addition to their structured codes. Traffic and ‘blue light’ cameras around the city are manually monitored (if at all) and only used to gather evidence if a crime has occurred. Mountains of old hand-written reports lie in the city’s archives.
• Help the City define a data architecture to incorporate historical and present data sources to reduce crime.
Use Case 4
• An automotive insurance company is looking to leverage on-board computer systems to give their customers better discounts. These devices are installed in a customer’s car for a few months and record data about how the vehicle operates and how the customer drives: status of engine parts, speed, acceleration, banking, how often they drive, etc.
• Design a Hadoop architecture to collect and store this data. Describe the type of analysis that could be done to better determine the rates to offer a customer. What other types of product offerings could an insurance company sell using this data? How would your architecture change to meet these goals?
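One example of the kind of analysis this data could support is counting harsh-braking events from the recorded speed. The sketch below is purely illustrative: the `(timestamp, speed)` sample layout and the -3.0 m/s² threshold are assumptions, not something the use case specifies, and a real rating model would combine many such signals.

```python
def count_harsh_braking(samples, threshold_mps2=-3.0):
    """Count consecutive-sample intervals whose deceleration meets the threshold.

    samples: list of (timestamp_s, speed_mps) tuples sorted by time.
    threshold_mps2: hypothetical cutoff; accelerations at or below it count as harsh.
    """
    events = 0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order readings
        if (v1 - v0) / dt <= threshold_mps2:
            events += 1
    return events


# Synthetic trip: one reading per second, with two sharp slowdowns.
trip = [(0, 20.0), (1, 19.5), (2, 15.0), (3, 14.8), (4, 10.0)]
harsh_events = count_harsh_braking(trip)
```

In a Hadoop setting, a per-trip function like this would plausibly run once per vehicle after grouping readings by device ID, which is worth keeping in mind when choosing how the raw data is keyed and stored.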
Use Case 5
• An electric and gas utility company has been installing smart meters across their grid in order to get meter readings (and thus charge their customers) without requiring someone to visit the homes and read the meters. These meters generate readings ten times every second, but the data is only “collected” once a month for billing purposes. The company would like to store all the readings, but they cannot handle the vast volume (~1 terabyte a day) that all of the meters in the grid generate. The company would like to leverage big data to more accurately monitor their power grid.
• How can this time-series data be stored and analyzed, and how can insights be delivered? What analytics could be done to detect failures in the power grid? How can high consumers of power be detected (e.g., someone running a data center in their basement)? What about fraud (tampering with meters, stealing power from a neighbor, etc.)?
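To get a feel for the volume, ten readings per second works out to 864,000 readings per meter per day. One option a group might consider (not the only one) is rolling raw readings up into fixed time windows that keep the minimum, maximum, and mean. The sketch below assumes a hypothetical `(timestamp_s, value)` reading layout and a one-minute window; neither is specified in the use case.

```python
from collections import defaultdict

READINGS_PER_SECOND = 10  # from the use case
SECONDS_PER_DAY = 86_400

readings_per_meter_per_day = READINGS_PER_SECOND * SECONDS_PER_DAY  # 864,000


def downsample(readings, window_s=60):
    """Roll (timestamp_s, value) readings up to one (min, max, mean) tuple per window.

    Keys of the result are window start times in seconds.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[int(ts // window_s) * window_s].append(value)
    return {
        start: (min(vals), max(vals), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    }


# Synthetic data: two minutes at 10 readings/second, flat except for one spike.
readings = [(i / 10, 1.0) for i in range(1200)]
readings[700] = (70.0, 5.0)
summary = downsample(readings)
```

Keeping the per-window maximum means a short spike still shows up after downsampling, which matters for failure and tampering detection; whether one-minute summaries are enough, or whether the raw readings must also be retained, is exactly the kind of trade-off the group should argue about.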
Use Case 6
• A telecommunications provider has numerous cellular towers spread throughout the country. A cell phone connects to the nearest cellular tower in order to send and receive phone calls. For each connected phone call, the data collected at these towers includes:
  • Caller’s phone number
  • Callee’s phone number
  • GPS coordinates for both parties
  • Current length of the in-progress call
• If a cell tower becomes overloaded, the tower will drop a call from a random pair of users. This particular telecommunications company is interested in selectively choosing whose calls to drop when the time to do so arrives.