Real-time Stream Processing Architecture for Comcast IP Video. Strata Conference + Hadoop World 2013 Chris Lintz Gabriel Commeau. Agenda. Comcast VIPER Overview Architecture Overview Q & A. Comcast Video IP Engineering and Research (VIPER).
Real-time Stream Processing Architecture for Comcast IP Video. Strata Conference + Hadoop World 2013. Chris Lintz, Gabriel Commeau
Agenda • Comcast VIPER Overview • Architecture Overview • Q & A
Comcast Video IP Engineering and Research (VIPER) [Diagram labels: Preparation, Delivery, Video Players, Analysis; Transcoding, Packaging, Storage, Origination; players on Samsung, iOS, Xbox Live, Android; analysis with Storm]
Why Do We Focus on Real-time? • Proactively diagnose issues • Form real-time intelligence • Help deliver best possible video experience [Chart: viewership, prime time]
Video Player Analytics Protocol • Live and On Demand • JSON event objects • Key metrics: bitrate, frame rate, fragments, errors. We collect and use all data in accordance with best consumer privacy practices and applicable laws.
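As an illustration of what a JSON event object of this kind might look like, here is a minimal Python sketch. The slides do not show the actual schema, so every field name below (`session_id`, `type`, `timestamp_ms`, `bitrate_kbps`, `frame_rate`) is a hypothetical placeholder, as is the choice of which fields are required.

```python
import json

# Hypothetical field names: the slides only say that events are JSON objects
# carrying metrics such as bitrate, frame rate, fragments and errors.
REQUIRED_FIELDS = {"session_id", "type", "timestamp_ms"}


def parse_event(raw: str) -> dict:
    """Decode one player event and check that the assumed required fields exist."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return event


# A made-up sample event, serialized the way a player might send it.
sample = json.dumps({
    "session_id": "abc-123",
    "type": "bitrate_change",
    "timestamp_ms": 1382990400000,
    "bitrate_kbps": 3500,
    "frame_rate": 29.97,
})
event = parse_event(sample)
```

Validating at the point of ingestion keeps malformed events out of the downstream tiers; where that check actually lives in the real system is not stated beyond "validation" being a mid-tier step.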
Flume: Data collection Tier • Collect, aggregate and move large amounts of data • Distributed, scalable, reliable, customizable • Multi-tier architecture
Player Sessions in Real-time • Sessions in Flume? • Technical issues: consistent hash and exactly-once semantics • Design goals • Separation of concerns • Session write-through rate?
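The consistent-hash issue mentioned above can be sketched as a hash ring that pins every event of a session to the same downstream node. This is a generic textbook sketch in Python, not the presenters' implementation; the class name `HashRing`, the virtual-node count and the use of SHA-256 are all assumptions. It does not address exactly-once semantics.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring `replicas` times to even out load.
        points = []
        for node in nodes:
            for i in range(replicas):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._ring = points
        self._keys = [h for h, _ in points]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, session_id: str) -> str:
        """Return the first node clockwise from the session's hash."""
        idx = bisect.bisect_right(self._keys, self._hash(session_id))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("session-42")
```

The useful property is that removing a node only remaps the sessions that node owned; sessions on the surviving nodes keep their assignment.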
Flume Edge Tier: Video Player Analytics End Point • Analytics events over HTTPS • HTTP Source • Re-batch with inner sink and source
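The re-batching step can be pictured as regrouping many small incoming batches into fixed-size batches before forwarding them. The Python generator below is only a conceptual sketch of that idea; it is not Flume code, and the function name `rebatch` and its `size` parameter are assumptions.

```python
from typing import Iterable, Iterator, List


def rebatch(batches: Iterable[List[object]], size: int) -> Iterator[List[object]]:
    """Regroup incoming batches of any length into batches of `size` events.

    The final batch may be shorter if the input runs out.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    buffer: List[object] = []
    for batch in batches:
        for event in batch:
            buffer.append(event)
            if len(buffer) == size:
                yield buffer
                buffer = []
    if buffer:
        yield buffer


# Three uneven HTTP-sized batches regrouped into batches of three events.
regrouped = list(rebatch([[1], [2, 3], [4, 5, 6, 7]], size=3))
```

Here `regrouped` is `[[1, 2, 3], [4, 5, 6], [7]]`. A production version would also flush on a timeout so that a partially filled batch is not held indefinitely.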
Flume Mid Tier: Processing and Routing Data • Video Player Event processing • Geo-location, asset metadata, validation, to-storm • Replication channel processor: • HDFS sink • Storm sink
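The mid-tier flow described above (process each event, then replicate it to both an HDFS path and a Storm path) can be sketched in Python as a chain of steps followed by a fan-out. This is a conceptual model rather than Flume's API: the step functions, the `GEO_BY_IP` table and the list-based channels are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Step = Callable[[Event], Optional[Event]]

# Hypothetical lookup table standing in for a real geo-location service.
GEO_BY_IP = {"203.0.113.7": "Philadelphia"}


def validate(event: Event) -> Optional[Event]:
    """Drop events that lack the assumed session_id field."""
    return event if "session_id" in event else None


def add_geo(event: Event) -> Optional[Event]:
    """Attach a region looked up from the client IP, or 'unknown'."""
    region = GEO_BY_IP.get(str(event.get("ip")), "unknown")
    return {**event, "region": region}


def process(event: Event, steps: List[Step]) -> Optional[Event]:
    """Run the steps in order; a step returning None drops the event."""
    current: Optional[Event] = event
    for step in steps:
        current = step(current)
        if current is None:
            return None
    return current


def replicate(event: Event, channels: List[List[Event]]) -> None:
    """Give every channel its own copy of the event."""
    for channel in channels:
        channel.append(dict(event))


hdfs_channel: List[Event] = []   # stand-in for the channel feeding the HDFS sink
storm_channel: List[Event] = []  # stand-in for the channel feeding the Storm sink
incoming = [{"session_id": "s1", "ip": "203.0.113.7"}, {"ip": "198.51.100.1"}]
for raw in incoming:
    processed = process(raw, [validate, add_geo])
    if processed is not None:
        replicate(processed, [hdfs_channel, storm_channel])
```

Copying the event per channel means the batch (HDFS) and real-time (Storm) paths cannot affect each other's view of the data.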
Bridging Flume to Storm: Flume2Storm Connector • Service discovery • Distributed, scalable and reliable • Low latency
Requirements for Read/Writes from Storm Bolts • Functionality beyond key/value stores • Real-time and historic window queries • Speed of in-memory writes and durability of disk
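To make "window queries" concrete, the Python sketch below runs a time-window aggregate in SQL, something a plain key/value store cannot express directly. It uses the standard-library `sqlite3` module purely as a local stand-in for a SQL database; the table name `player_events`, its columns and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_events (session_id TEXT, ts INTEGER, bitrate_kbps INTEGER)"
)
conn.executemany(
    "INSERT INTO player_events VALUES (?, ?, ?)",
    [
        ("s1", 100, 3000),
        ("s1", 160, 2000),
        ("s2", 170, 4000),
        ("s3", 400, 1000),
    ],
)


def window_stats(db: sqlite3.Connection, start_ts: int, end_ts: int):
    """Return (distinct sessions, average bitrate) for start_ts <= ts < end_ts."""
    return db.execute(
        "SELECT COUNT(DISTINCT session_id), AVG(bitrate_kbps) "
        "FROM player_events WHERE ts >= ? AND ts < ?",
        (start_ts, end_ts),
    ).fetchone()


recent = window_stats(conn, 100, 200)     # a short "real-time" window
historic = window_stats(conn, 0, 1000)    # a longer "historic" window
```

The same query shape serves both the real-time and the historic case; only the window bounds change.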
Utilizing MemSQL for Persistence • Distributed in-memory SQL database • ACID, highly available, fault tolerant • Aggregators route queries to leaves • Leaves are auto-sharded • Handles our intense read/write load
Achievements in Utilizing MemSQL • Complex queries in milliseconds • Fault-tolerant Storm bolt state • Joins now available outside of Storm bolts • Foreign key shards • Complex data streams • Dynamic alters without locks or downtime • JSON type
Wrapping Up • Real-time at Comcast scale • Millions of video players • Horizontal scale everywhere • Aggregated metrics across the US and complex analysis • Real-time API • Builds foundation • Advanced real-time analytics • Better platform for innovation • Alerts on complex objects • Supplemental real-time data back to clients • Popularity-based CDN
Thank You christopher_lintz@cable.comcast.com gabriel_commea@cable.comcast.com