Big Data Challenges in Application Performance Management
Tilmann Rabl, Hans-Arno Jacobsen, Serge Mankovskii
XLDB Conference 2011
Middleware Systems Research Group (MSRG.ORG)
Abstract
Modern Web Data Platforms (WDPs) handle large amounts of data and activity through massively distributed infrastructures. To achieve performance and availability at Internet scale, WDPs restrict querying capability and provide weaker consistency guarantees than traditional ACID transactions. The reduced functionality is sufficient for many web applications. High data and query rates also appear in application performance management (APM). APM has requirements similar to those of current web-based information systems, such as weaker consistency needs, geographical distribution, and asynchronous processing. At the same time, APM has some unique features and requirements that make previously published research and existing architectures inapplicable.
Application Performance Management
• Enterprise system architectures
  • Very complex distributed systems
  • Need for detailed monitoring
  • Service level agreements
• Application performance management
  • How many transactions fail?
  • Where is the root cause of failure?
  • What is the end-to-end response time?
  • Which component is the bottleneck?
  • Which and how many transactions are there?
Enterprise System Architecture
[Figure: diagram of an enterprise system. Components shown: clients, web server, application servers, web service, message queue, message broker, databases, identity manager, SAP, mainframe, and 3rd-party systems.]
Java Byte Code Instrumentation
• JSR-163
  • JVM is augmented with agent
  • Agent can run additional code
• No change of code base
• Trace transactions
• Measure response times
• Other types of measurements
• Huge number of events
  • Potentially for every method invocation
[Figure: a program running in a JVM with an agent; labels: Program, JVM, Agent, Additional Code, Events.]
APM Performance Requirements
• High insert rates
  • Millions of inserts / sec
• High query rates
  • Thousands of queries / sec
• Write ratio: > 99 %
• Agents send data in bulk
  • Different periods (seconds to minutes)
• Big data
  • 250 bytes per record
  • ~ 250 MB / sec
  • ~ 600 TB / month
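The data volume figures follow from the record size and the insert rate. The back-of-the-envelope check below is only a sketch: it assumes exactly one million inserts per second (the slide says only "millions"), decimal units (1 MB = 10^6 bytes), and a 30-day month. The function names are illustrative.

```python
RECORD_BYTES = 250                   # from the slide
INSERTS_PER_SEC = 1_000_000          # assumption: lower bound of "millions of inserts / sec"
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month


def bytes_per_second(record_bytes=RECORD_BYTES, inserts_per_sec=INSERTS_PER_SEC):
    """Raw ingest rate in bytes per second."""
    return record_bytes * inserts_per_sec


def terabytes_per_month(record_bytes=RECORD_BYTES, inserts_per_sec=INSERTS_PER_SEC):
    """Raw ingest volume per 30-day month in decimal terabytes."""
    return bytes_per_second(record_bytes, inserts_per_sec) * SECONDS_PER_MONTH / 1e12


print(bytes_per_second() / 1e6, "MB / sec")
print(terabytes_per_month(), "TB / month")
```

Under these assumptions the rate is 250 MB / sec and the monthly volume is about 648 TB, which is consistent with the rounded figures on the slide.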
MADRID Project
• Current system's performance
  • YCSB results < 15K ops / sec
  • TPC-C results ~ 500K transactions / sec
• Need for a new architecture
• Massive Asynchronous DistRIbuted Data
  • Highly scalable
  • High write throughput
• Apart from measurements, data is mostly static
• Static queries
• Hybrid key-value store
MADRID Architecture
• Materialized Views
  • Static queries
  • Filters
  • Notifications
• Hybrid data store
• All nodes are equal
  • DHT style inserts
  • Replication for static data
• Asynchronous processing
[Figure: node architecture; labels: View Manager, Message Broker, In-Memory Storage, Entry Log, Disk Storage.]
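The slides do not say how MADRID routes inserts. As one illustration of what "DHT style inserts" across equal nodes could look like, here is a minimal consistent-hashing sketch. The `HashRing` class, the node names, the use of `metric_id` as the routing key, and the per-node lists standing in for in-memory storage are all assumptions made for this example, not the project's design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; every node plays the same role."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points to even out the key distribution.
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


NODES = ["node-a", "node-b", "node-c"]     # hypothetical node names
ring = HashRing(NODES)
stores = {n: [] for n in NODES}            # stand-in for per-node in-memory storage


def insert(metric_id, record):
    """Route a measurement to the node that owns its metric_id."""
    node = ring.node_for(str(metric_id))
    stores[node].append(record)
    return node


print(insert(42, {"value": 17.5}))
```

Because the owner is computed from the key alone, any node (or agent) can route an insert without consulting a coordinator, which is the property that makes all nodes equal in this sketch.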
Schema Excerpt
• Transaction types
  • No instances
  • Graph structure
• Metric per transaction
  • Type of measurement
• Measurements
  • Per transaction type
  • Per metric type
  • Can be aggregations
[Figure: schema diagram with the following tables]
• Transaction: transaction_id, transaction_name
• Transition: transaction_id, head_component, tail_component
• Component: component_id, machine, description
• Metric: metric_id, metric_type, transaction_id
• Measurement: value, min_value, max_value, no_points, start_time, end_time, metric_id
Materialized Views I
• What is the average runtime of transaction XY?
  SELECT t.transaction_name, AVG(ms.end_time - ms.start_time)
  FROM Measurement ms, Metric mt, Transaction t
  WHERE ms.metric_id = mt.metric_id
    AND mt.transaction_id = t.transaction_id
    AND mt.metric_type = 'runtime_metric'
    AND ms.start_time BETWEEN '18/10/2011' AND '19/10/2011'
    AND t.transaction_name = 'XY'
  GROUP BY t.transaction_name
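To make the query concrete, the sketch below runs it against a small in-memory SQLite database. Everything beyond the query itself is an assumption for illustration: the sample rows, the use of numeric timestamps (seconds since the epoch, UTC) instead of the date strings on the slide, the reduced set of columns, and the explicit JOIN syntax. The table name is quoted as "Transaction" to avoid a clash with the SQL keyword.

```python
import sqlite3
from datetime import datetime, timezone


def ts(year, month, day, hour=0, minute=0, second=0):
    """Seconds since the epoch (UTC); an assumed encoding for start/end times."""
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc).timestamp()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Transaction" (transaction_id INTEGER PRIMARY KEY, transaction_name TEXT);
CREATE TABLE Metric (metric_id INTEGER PRIMARY KEY, metric_type TEXT, transaction_id INTEGER);
CREATE TABLE Measurement (metric_id INTEGER, start_time REAL, end_time REAL);
""")

# Made-up sample data: two runs inside the queried window, one outside it.
conn.execute('INSERT INTO "Transaction" VALUES (1, ?)', ("XY",))
conn.execute("INSERT INTO Metric VALUES (10, 'runtime_metric', 1)")
conn.executemany(
    "INSERT INTO Measurement VALUES (?, ?, ?)",
    [
        (10, ts(2011, 10, 18, 9), ts(2011, 10, 18, 9, 0, 10)),    # 10 s runtime
        (10, ts(2011, 10, 18, 12), ts(2011, 10, 18, 12, 0, 20)),  # 20 s runtime
        (10, ts(2011, 10, 20, 9), ts(2011, 10, 20, 9, 0, 50)),    # outside the window
    ],
)

QUERY = """
SELECT t.transaction_name, AVG(ms.end_time - ms.start_time)
FROM Measurement ms
JOIN Metric mt ON ms.metric_id = mt.metric_id
JOIN "Transaction" t ON mt.transaction_id = t.transaction_id
WHERE mt.metric_type = 'runtime_metric'
  AND ms.start_time BETWEEN ? AND ?
  AND t.transaction_name = ?
GROUP BY t.transaction_name
"""

result = conn.execute(QUERY, (ts(2011, 10, 18), ts(2011, 10, 19), "XY")).fetchall()
print(result)
```

Every execution of this query joins three tables and scans the measurements in the time window, which is the cost a materialized view is meant to avoid for static queries.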
Materialized Views II
• What is the average runtime of transaction XY?
[Figure: schema diagram extended with the materialized view AVG_Runtime]
• AVG_Runtime: transaction_id, transaction_name, metric_id, avg_value, time_frame
• Transaction: transaction_id, transaction_name
• Transition: transaction_id, head_component, tail_component
• Component: component_id, machine, description
• Metric: metric_id, metric_type, transaction_id
• Measurement: value, min_value, max_value, no_points, start_time, end_time, metric_id
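The slides do not describe how the AVG_Runtime view is kept up to date. One common approach for averages is incremental maintenance: keep a running sum and count per key and update them on every insert. The sketch below illustrates that idea only; the `AvgRuntimeView` class, its method names, and the choice of one-day time frames are assumptions for this example.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400  # assumption: one time_frame per day


class AvgRuntimeView:
    """Hypothetical incrementally maintained view keyed by (transaction_id, time_frame)."""

    def __init__(self):
        # Each entry holds [sum of runtimes, number of measurements].
        self._acc = defaultdict(lambda: [0.0, 0])

    def on_measurement(self, transaction_id, start_time, end_time):
        """Update the view when a new runtime measurement is inserted."""
        frame = int(start_time // SECONDS_PER_DAY)
        acc = self._acc[(transaction_id, frame)]
        acc[0] += end_time - start_time
        acc[1] += 1

    def avg_value(self, transaction_id, frame):
        """Answer the static query with a single lookup; None if no data."""
        total, count = self._acc.get((transaction_id, frame), (0.0, 0))
        return total / count if count else None


view = AvgRuntimeView()
view.on_measurement(1, start_time=100, end_time=110)   # frame 0, 10 s
view.on_measurement(1, start_time=200, end_time=220)   # frame 0, 20 s
print(view.avg_value(1, 0))
```

With this layout the write path does a constant amount of extra work per measurement, and the average is read without touching the Measurement, Metric, or Transaction tables.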
Contact
• Tilmann Rabl
  • University of Toronto
  • tilmann@msrg.utoronto.ca
• Hans-Arno Jacobsen
  • University of Toronto
  • www.msrg.org
• Serge Mankovskii
  • CA Labs
  • mankovskii@ca.com