Go beyond debug: Wire Tap your App for knowledge with Hadoop. Tom McCuch, Solution Engineering @ Hortonworks (Twitter: tmccuch); Oleg Zhurakousky, Principal Architect @ Hortonworks (Twitter: z_oleg)
The Application Development Dilemma • Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications • The other 80% of that data is, at best, lost in rolling log files and, at worst, never collected at all; either way it is never analyzed or accounted for • For the 20% we do collect, application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT operations budgets and left app development teams unable to keep pace with the rate of change in the business
Example: Data Available During Ingest • Record count • Highest/lowest record length • Average record length • Compression ratio. But with a little more work… • Field parsing • Unique values • Unique values per field • Access to values of each field independently from the record • Relatively fast field-based searches, without indexing • Value encoding • Etc. These are cross-cutting concerns!
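The first four statistics on this slide can be collected in a single pass while records stream through, without changing what the downstream consumer receives. A minimal sketch, assuming records arrive as `bytes`; `IngestStats` and `tap` are hypothetical names, and the compression ratio here is simply uncompressed bytes divided by the size of one zlib stream over all records.

```python
import zlib
from typing import Iterable, Iterator, Optional


class IngestStats:
    """Accumulates simple statistics while records stream past during ingest."""

    def __init__(self) -> None:
        self.record_count = 0
        self.min_length: Optional[int] = None
        self.max_length: Optional[int] = None
        self.total_length = 0
        # Streaming compressor: compresses across record boundaries, as a file would be.
        self._compressor = zlib.compressobj()
        self._compressed_bytes = 0
        self._flushed = False

    def observe(self, record: bytes) -> None:
        n = len(record)
        self.record_count += 1
        self.total_length += n
        self.min_length = n if self.min_length is None else min(self.min_length, n)
        self.max_length = n if self.max_length is None else max(self.max_length, n)
        self._compressed_bytes += len(self._compressor.compress(record))

    def finish(self) -> dict:
        """Flush the compressor once (call after the last record) and report."""
        if not self._flushed:
            self._compressed_bytes += len(self._compressor.flush())
            self._flushed = True
        return {
            "record_count": self.record_count,
            "min_length": self.min_length,
            "max_length": self.max_length,
            "avg_length": self.total_length / self.record_count if self.record_count else 0.0,
            # Ratio defined here as uncompressed bytes / compressed bytes.
            "compression_ratio": self.total_length / self._compressed_bytes
            if self._compressed_bytes else 0.0,
        }


def tap(records: Iterable[bytes], stats: IngestStats) -> Iterator[bytes]:
    """Pass records through unchanged, observing each one on the way."""
    for record in records:
        stats.observe(record)
        yield record
```

Wrapping an existing ingest loop as `for r in tap(source, stats): ...` leaves the records untouched, which is the point: the statistics are a side channel, not part of the primary flow.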
How do we address cross-cutting concerns without disturbing the existing process flow?
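The Wire Tap pattern named in the title answers this question: a channel hands a copy of each message to one or more secondary consumers and still delivers the original to its primary consumer. A minimal, framework-free sketch; `Channel`, `add_wire_tap`, and the dict-shaped message are hypothetical names for illustration, not a specific library's API.

```python
import logging
from typing import Callable, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]

log = logging.getLogger(__name__)


class Channel:
    """Point-to-point channel with optional wire taps (hypothetical, not a library API)."""

    def __init__(self, handler: Handler) -> None:
        self._handler = handler          # the existing, primary consumer
        self._taps: List[Handler] = []   # secondary consumers added by wire taps

    def add_wire_tap(self, tap: Handler) -> None:
        self._taps.append(tap)

    def send(self, message: Message) -> None:
        for tap in self._taps:
            try:
                # Shallow copy: a tap that adds or replaces keys does not change
                # the message the primary handler receives.
                tap(dict(message))
            except Exception:
                # A failing tap is logged and skipped; the primary flow continues.
                log.exception("wire tap failed")
        self._handler(message)
```

The primary handler's code never changes; taps are attached from outside, which is what makes the pattern suitable for cross-cutting concerns. In a production system the taps would typically run asynchronously so a slow tap cannot delay the main flow; this sketch runs them inline for simplicity.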
Other Enterprise Integration Patterns • Transformer: convert the payload or modify headers • Filter: discard messages based on a boolean evaluation • Router: determine the next channel based on content • Splitter: generate multiple messages from one • Aggregator: assemble a single message from multiple
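Each of these patterns reduces to a small function over messages. A minimal sketch, assuming a hypothetical message shape of `{"headers": {...}, "payload": ...}`; the function names and the `seq`/`size` correlation headers are illustrative choices, not any framework's API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical message shape for this sketch: {"headers": {...}, "payload": ...}
Message = Dict[str, Any]


def transform(msg: Message, fn: Callable[[Any], Any], **header_updates: Any) -> Message:
    """Transformer: convert the payload and optionally add or replace headers."""
    return {"headers": {**msg["headers"], **header_updates}, "payload": fn(msg["payload"])}


def filter_messages(msgs: List[Message], predicate: Callable[[Message], bool]) -> List[Message]:
    """Filter: keep only messages for which the predicate is True."""
    return [m for m in msgs if predicate(m)]


def route(msg: Message, key: Callable[[Message], Any],
          routes: Dict[Any, str], default: str) -> str:
    """Router: pick the next channel name from something in the message content."""
    return routes.get(key(msg), default)


def split(msg: Message) -> List[Message]:
    """Splitter: one message per element of a list payload, tagged for reassembly."""
    parts = msg["payload"]
    return [
        {"headers": {**msg["headers"], "seq": i, "size": len(parts)}, "payload": p}
        for i, p in enumerate(parts)
    ]


def aggregate(parts: List[Message]) -> Message:
    """Aggregator: reassemble a complete group of split parts in sequence order."""
    ordered = sorted(parts, key=lambda m: m["headers"]["seq"])
    if not ordered or len(ordered) != ordered[0]["headers"]["size"]:
        raise ValueError("incomplete group")
    headers = {k: v for k, v in ordered[0]["headers"].items() if k not in ("seq", "size")}
    return {"headers": headers, "payload": [m["payload"] for m in ordered]}
```

Splitting a message and aggregating its parts (in any arrival order) round-trips to the original, because the splitter records each part's position and the group size in headers that the aggregator later strips.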
6 Key Hadoop Data Types • Sentiment: Understand how your customers feel about your brand and products, right now • Clickstream: Capture and analyze website visitors' data trails and optimize your website • Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines • Geographic: Analyze location-based data to manage operations where they occur • Server Logs: Research logs to diagnose process failures and prevent security breaches • Text: Understand patterns in text across millions of web pages, emails, and documents
Financial Services | Data: Server Logs | Fraud Prevention. Business Problem: • Financial institutions are always at risk of fraud • Fraudsters test bank systems for vulnerabilities • This testing leaves subtle patterns that bank employees and law enforcement often miss • Fraud losses cost banks millions. Solution: • HDP (Hortonworks Data Platform) reduces the cost of detecting fraudulent activity • HDP stores more types of data for longer • Analysis of data in the "data lake" exposes fraudulent patterns that would otherwise go undetected
Credit Request Process Flow (Before): Credit Request Processing • A Credit Request arrives on a Gateway • The Credit Request is sent over a Channel • The Credit Request Processor: receives the Request, processes the Request, and issues a Response
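The "before" flow is a plain Gateway → Channel → Processor pipeline. A minimal synchronous sketch; `Gateway`, `Channel`, `CreditRequestProcessor`, and the fixed approval limit are hypothetical stand-ins, not the presenters' implementation or a real credit decision rule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CreditRequest:
    applicant: str
    amount: float


@dataclass(frozen=True)
class CreditResponse:
    applicant: str
    approved: bool


class Channel:
    """Synchronous channel: hands each request to its single subscriber."""

    def __init__(self, subscriber: Callable[[CreditRequest], CreditResponse]) -> None:
        self._subscriber = subscriber

    def send(self, request: CreditRequest) -> CreditResponse:
        return self._subscriber(request)


class CreditRequestProcessor:
    def __init__(self, limit: float) -> None:
        self.limit = limit  # hypothetical rule: approve anything up to a fixed limit

    def handle(self, request: CreditRequest) -> CreditResponse:
        # Receives the request, processes it, and issues a response.
        return CreditResponse(request.applicant, request.amount <= self.limit)


class Gateway:
    """Entry point: wraps a raw call in a CreditRequest and sends it on the channel."""

    def __init__(self, channel: Channel) -> None:
        self._channel = channel

    def request_credit(self, applicant: str, amount: float) -> CreditResponse:
        return self._channel.send(CreditRequest(applicant, amount))
```

Wiring is a single line, for example `Gateway(Channel(CreditRequestProcessor(limit=10_000).handle))`. Because the caller only sees the gateway and the processor only sees requests, the channel in between is the natural place to attach anything extra later.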
Cross-Cutting Concerns • Credit Scoring • Fraud Detection • Gathering Data Available during Credit Request Process Flow
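Applied to the credit request flow, each cross-cutting concern can be a separate tap on the channel that feeds the unchanged processor. A minimal sketch, assuming a dict-shaped request; `TappedChannel`, the scoring formula, and the fraud threshold are hypothetical placeholders (not real credit or fraud models), and `landing_zone` only stands in for writing the data to Hadoop.

```python
from collections import Counter
from typing import Callable, Dict, List

Request = Dict[str, object]  # hypothetical shape: {"applicant": str, "amount": float}


class TappedChannel:
    """Gives every tap a copy of each request, then runs the processor."""

    def __init__(self, processor: Callable[[Request], bool]) -> None:
        self._processor = processor
        self._taps: List[Callable[[Request], None]] = []

    def add_wire_tap(self, tap: Callable[[Request], None]) -> None:
        self._taps.append(tap)

    def send(self, request: Request) -> bool:
        for tap in self._taps:
            tap(dict(request))  # shallow copy per tap; taps run inline in this sketch
        return self._processor(request)


# Existing processor, unchanged (hypothetical rule: approve up to 10,000).
def process(request: Request) -> bool:
    return float(request["amount"]) <= 10_000


# Concern 1, data gathering: keep every request (stand-in for landing it in Hadoop).
landing_zone: List[Request] = []

# Concern 2, fraud detection: naive signal flagging applicants who probe repeatedly.
request_counts: Counter = Counter()
flagged: set = set()


def fraud_tap(request: Request) -> None:
    request_counts[request["applicant"]] += 1
    if request_counts[request["applicant"]] > 3:  # hypothetical threshold
        flagged.add(request["applicant"])


# Concern 3, credit scoring: record a toy score per applicant for later analysis.
scores: Dict[object, float] = {}


def scoring_tap(request: Request) -> None:
    scores[request["applicant"]] = max(0.0, 850 - float(request["amount"]) / 100)  # toy formula


channel = TappedChannel(process)
for t in (landing_zone.append, fraud_tap, scoring_tap):
    channel.add_wire_tap(t)
```

Adding or removing a concern is a change to the wiring at the bottom, not to `process`, which is the property the slide is after.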
Example: Data Available During Ingest • Record count • Highest/lowest record length • Average record length • Compression ratio. But with a little more work… • Field parsing: unstructured data is not all that unstructured… • Unique values • Unique values per field • Access to values of each field independently from the record • Relatively fast field-based searches, without indexing • Value encoding • Etc. These are cross-cutting concerns!
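The "little more work" items start with field parsing: once a line is split into fields, unique values per field and per-field lookups come almost for free. A minimal sketch, assuming comma-delimited text lines; `FieldProfiler` is a hypothetical name, and it keeps parsed records in memory, which a real Hadoop job would not do.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class FieldProfiler:
    """Parses delimited lines and tracks unique values overall and per field position."""

    def __init__(self, delimiter: str = ",") -> None:
        self.delimiter = delimiter
        self.unique_values: Set[str] = set()
        self.unique_per_field: DefaultDict[int, Set[str]] = defaultdict(set)
        self.records: List[List[str]] = []  # in memory only for this sketch

    def observe(self, line: str) -> None:
        fields = line.rstrip("\n").split(self.delimiter)
        self.records.append(fields)
        for i, value in enumerate(fields):
            self.unique_values.add(value)
            self.unique_per_field[i].add(value)

    def field(self, record_no: int, field_no: int) -> str:
        """Access one field's value independently of the rest of the record."""
        return self.records[record_no][field_no]

    def search(self, field_no: int, value: str) -> List[int]:
        """Field-based search with no index: reject early via the unique set, else scan."""
        if value not in self.unique_per_field.get(field_no, set()):
            return []
        return [n for n, f in enumerate(self.records) if len(f) > field_no and f[field_no] == value]
```

The per-field unique sets double as a cheap filter: a search for a value that never appeared in that field returns immediately without touching any record.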
Thank You! Questions & Answers. Follow: @tmccuch, @z_oleg, @hortonworks