Multi-user Extensible Virtual Worlds Bandwidth Reduction and Latency Tolerance in Real Time Internet Applications
Communication Optimizations • Fast archiving process • Archives cached and shared with multiple clients • Reduces overhead for many clients in the same city • Client-side transform interpolation • Allows reduced transmission frequency • Work offloaded to the client • Generating deterministic assets during city load • Performing mesh animation locally • Secure handshake authentication on connect
Previous Attempts • Combining common-practice methods • Low-level optimizations • Caching and re-use • Interpolation • Prediction and smooth correction
Client-side Interpolation • Allows the server to send updates less frequently • Maintains a smooth experience (Diagram: positions received from the server vs. positions drawn to the screen)
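A minimal sketch of the client-side interpolation idea above, assuming a simple two-snapshot buffer; Snapshot, lerp, and interpolatedPosition are illustrative names, not the project's actual types:

```cpp
// Sketch: interpolate an object's drawn position between the two most
// recent server snapshots, so a lower server update rate still yields
// smooth on-screen motion. Names are illustrative.
#include <algorithm>

struct Vec3 { float x, y, z; };

struct Snapshot {
    double serverTime;   // virtual time stamped by the server
    Vec3   position;     // object position at that time
};

// Linear blend between two vectors.
static Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}

// Position drawn this frame: blend between the snapshots that bracket the
// (slightly delayed) render time.
Vec3 interpolatedPosition(const Snapshot& prev, const Snapshot& next,
                          double renderTime) {
    double span = next.serverTime - prev.serverTime;
    float  t    = span > 0.0
                ? static_cast<float>((renderTime - prev.serverTime) / span)
                : 1.0f;
    t = std::clamp(t, 0.0f, 1.0f);   // never extrapolate past the last update
    return lerp(prev.position, next.position, t);
}
```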
Primary Goals • Latency tolerance beyond state of the art • Reduce bandwidth beyond state of the art
Previous Attempts • Not good enough for our scale • Massive activity imposes bandwidth and synchronization burden beyond the norm • Need the server to be further “forward in time” for higher latency tolerance • Need to find a way to lower bandwidth further • Simply reduce the rate of update frames further?
Original Server Injection Model of Prediction • Interpolation smooths movement until the server stalls • Client interpolates based upon expected arrival time • If server lag increases, there is nothing to interpolate to! • Inject a copy of server functionality into the client • Performs the same work on a subset of data for prediction • Server state may differ from the prediction • Client interpolates what the user sees during correction (Diagram: server functionality injected into the client alongside the real server)
Failure of Simple Injection Model • Small differences produce large changes • Collision events, House construct selection • Some states have large prediction failure consequences • Corrections become as dramatic as having allowed video to “stutter” • Does not allow us to tolerate severe internet latency in practice
Accomplished Work • Established a persistent server on the IBM z10 enterprise system, running 24/7 • Lowered power consumption during low utilization • Characterized latency across typical wired and wireless networks
Latency Characteristics • Large latency due to software pipelines • Exacerbated by the present interpolation system • Intermittent network latency can triple the values shown • Some data gathered from the Internet Weather Map project
Accomplished Work • Tested the server-in-client injection model for latency tolerance • Results indicate it is an infeasible approach for increasing the time lag between server and client • Designed a new synchronization architecture to accomplish both latency tolerance and bandwidth reduction • Presently being implemented
New Synchronization Architecture • Synchronization is a common problem • All multiplayer systems • Application-specific issues dominate problem • Generalized solution for wider applicability • Can be applied to many client-server systems • Unified design addresses both bandwidth and latency problems in real-time distributed applications
New Synchronization Architecture • States synchronized as separate streams • Server at different virtual time in all streams! • Each stream can be at different clock • Client performs prediction and correction only for streams the clock has overrun • Partial-knowledge prediction • Each stream can be abstracted differently • Values, predictor inputs for values, etc.
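A rough sketch of the per-stream bookkeeping this implies, assuming each stream carries its own virtual time so the client only predicts streams its clock has overrun; StateStream and streamsNeedingPrediction are illustrative names, not the architecture's actual interface:

```cpp
// Sketch: each synchronized state stream has its own virtual clock.
// The client predicts and corrects only the streams it has overrun.
#include <string>
#include <vector>

struct StateStream {
    std::string name;         // e.g. "transforms", "collisions", "selection"
    double      serverTime;   // latest virtual time received for this stream
    bool        predictable;  // can the client compute it locally?
};

// Which streams need local prediction at the client's current virtual time?
std::vector<const StateStream*> streamsNeedingPrediction(
        const std::vector<StateStream>& streams, double clientTime) {
    std::vector<const StateStream*> out;
    for (const auto& s : streams) {
        // A stream is "overrun" when the client clock is ahead of the newest
        // data the server has delivered for it.
        if (clientTime > s.serverTime && s.predictable)
            out.push_back(&s);
    }
    return out;
}
```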
New Synchronization Architecture • Decouples problems of each kind of state • Each stream poses different trade-offs • Taxonomy of properties aids application • Exposes necessary model modifications
New Synchronization Architecture • State properties • Predictability (level of determinism) • How can it be computed locally? • Computational dependencies • What other states are required to compute locally? • Error magnification effect • How do errors in my computational dependencies magnify errors in locally computed state?
New Synchronization Architecture • Events given in terms of a virtual clock • Precise clock synchronization is impossible • Prevents errors from propagating forward • Virtual clock time is adjustable • Accuracy implies all clients view environment accurately at some (relatively close) point in time
Example: Object Transforms • Moving objects affected by forces • Assume no other effects for now • Physical processes accurately predictable • Architecture & executable code may differ • Floating-point drift must still be compensated for • Intermittent object transform updates required • Vastly improves latency & bandwidth • Only a potential gain, due to real dependencies
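As a hedged illustration of local prediction with drift compensation (not the engine's actual integrator), the sketch below forward-integrates a transform locally and blends toward an intermittent authoritative update; Body, integrate, and applyServerCorrection are made-up names, and Vec3/lerp are as in the interpolation sketch above:

```cpp
// Sketch: predict a transform locally under known forces, then gently pull
// it toward the occasional server update so floating-point drift is removed
// without a visible pop. (Vec3 and lerp as in the earlier sketch.)
struct Body {
    Vec3 position;
    Vec3 velocity;
};

void integrate(Body& b, const Vec3& acceleration, float dt) {
    b.velocity.x += acceleration.x * dt;
    b.velocity.y += acceleration.y * dt;
    b.velocity.z += acceleration.z * dt;
    b.position.x += b.velocity.x * dt;
    b.position.y += b.velocity.y * dt;
    b.position.z += b.velocity.z * dt;
}

// Blend a fraction of the way toward the authoritative position rather than
// snapping; the blend factor here is purely illustrative.
void applyServerCorrection(Body& b, const Vec3& serverPos, float blend = 0.1f) {
    b.position = lerp(b.position, serverPos, blend);
}
```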
Complication: Collisions • Also predictable, but errors magnify • Dissimilar collision computations can drastically affect future object transforms • Server notifies clients: collisions & misses • Clients are notified of “close calls” that miss! • Server exists forward in time from clients • Example of predictable computational dependency with error magnification
Animation: Buildings select and grab objects • Objects “pulled into place” in structure • Current selection criteria based upon proximity to building • Coupled to transforms: circular dependency! • Initially developed for selection efficiency • Not imperative, can be modified • Errors have largest magnification yet! • Objects change subsystems, alter paths
Animation: Buildings select and grab objects • Push selection further forward in time • Server notifies clients of selections • Reduces client need to predict selection • Selection criteria modified • Allows server to compute future more easily • Example of untenable error magnification • Simulation model altered to accommodate!
Last Example: Player Location • Player cyclone applies forces to objects • Creates a dependency with object transforms • Problem: clients are not “forward in time” • Player location not well predictable • Solution: loosely couple wind forces to the player's visual representation • Make the physics manifestation more predictable
Last Example: Player Location • Common position predictor function utilized by server and all clients • Deterministic function defines position of the physical manifestation • Function uses intermittent player state as input • Periodic client player velocity updates are enough to maintain it
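A minimal sketch of such a shared predictor, assuming the intermittent update carries a position and velocity stamped with virtual time; PlayerUpdate and predictPlayerPosition are illustrative names (Vec3 as in the earlier sketches):

```cpp
// Sketch: server and every client run the same deterministic function over
// the last intermittent player update, so only occasional velocity reports
// need to be transmitted.
struct PlayerUpdate {
    double virtualTime;   // virtual clock time of the report
    Vec3   position;
    Vec3   velocity;
};

// Deterministic position of the player's physical manifestation at time t.
Vec3 predictPlayerPosition(const PlayerUpdate& u, double t) {
    float dt = static_cast<float>(t - u.virtualTime);
    return { u.position.x + u.velocity.x * dt,
             u.position.y + u.velocity.y * dt,
             u.position.z + u.velocity.z * dt };
}
```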
Last Example: Player Location • Still relatively low-latency: prediction failure results in compensation • Compensation should be soft (low error magnification) • Example of modified simulation with low error magnification • Reduces player position traffic by abstracting to velocity functions
Where Are We On This? • Design complete but undergoing iteration • Implementation underway • No drift in tests using the Verlet integration kernel • Most work yet to be done • Generalizations being developed • Applies to real-time internet applications
Summary • Developing revolutionary methods for synchronization in internet applications to: • Reduce bandwidth requirements • Increase latency tolerance • Involves independent state streams that: • Are synchronized and predicted with different methods according to their properties • Expose necessary model modifications
Multi-user Extensible Virtual Worlds End of Communication Talk Questions?
Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Combining Incremental and Parallel Methods for Large-scale Physics Simulation
Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Review of work to date
ScalableEngine • Built to handle large VR environments efficiently (massive object count, low activity) • Only physics system capable of handling Scalable City in real time • Overhead proportional to level of activity rather than environment scale or object count • Novel broad phase [1] and physics pipeline [2] methods published • [1] Efficient Large-Scale Sweep and Prune Methods with AABB Insertion and Removal. IEEE VR 2009 • [2] Accelerating Physics in Large, Continuous Virtual Environments. Concurrency and Computation: Practice and Experience, 24(2):125–134, 2012
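For context, a compressed sketch of the sweep-and-prune idea behind the cited broad phase work; this shows only the core sort-and-scan over one axis, not the paper's incremental AABB insertion/removal, and the names are illustrative:

```cpp
// Sketch: sort AABBs by their minimum x endpoint, then only test boxes whose
// x-intervals overlap; the y/z overlap test filters the rest.
#include <algorithm>
#include <utility>
#include <vector>

struct AABB { float minX, maxX, minY, maxY, minZ, maxZ; int id; };

static bool overlapYZ(const AABB& a, const AABB& b) {
    return a.minY <= b.maxY && b.minY <= a.maxY &&
           a.minZ <= b.maxZ && b.minZ <= a.maxZ;
}

std::vector<std::pair<int, int>> sweepAndPrune(std::vector<AABB> boxes) {
    std::sort(boxes.begin(), boxes.end(),
              [](const AABB& a, const AABB& b) { return a.minX < b.minX; });
    std::vector<std::pair<int, int>> pairs;
    for (size_t i = 0; i < boxes.size(); ++i) {
        for (size_t j = i + 1; j < boxes.size(); ++j) {
            if (boxes[j].minX > boxes[i].maxX) break;  // no later box overlaps on x
            if (overlapYZ(boxes[i], boxes[j]))
                pairs.emplace_back(boxes[i].id, boxes[j].id);
        }
    }
    return pairs;
}
```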
ScalableEngine: Broad Phase CD (Chart: broad-phase collision detection cost vs. level of activity) Lower asymptotic complexity: an order-of-magnitude performance improvement!
ScalableEngine: Full Physics Pipeline Note: only a constant number of bodies undergo active physics computation. Excluding unnecessary work again yields lower asymptotic complexity.
ScalableEngine: Multi-user System • Scalable City developed into a massively multi-user client-server system • Player count increases activity level • For multi-user, other factors matter as well • Computational efficiency • Parallelism
ScalableEngine • Best engine at handling large environments • Heavy computation similar to other software • As activity increases, the advantage matters less • Regions with high activity see less benefit! • Parallelized ScalableEngine by multi-threading all aspects of computation • Improved performance, but not enough for massively multi-player scale • Traditional physics does not parallelize well
ScalableEngine: Multithreaded Physics (Chart: limited parallelism in traditional physics methods)
CLEngine • Developed a new physics simulation system from scratch, focused on massive parallelism • Based on the work of Thomas Jakobsen [1] • Design modified for parallel application • OpenCL utilized for portability to various compute devices (CPU, GPU, accelerators)
CLEngine: Core • Object representation broken down to particles and stick constraints • Rigid body volume behavior is emergent • All constraints independently solvable • Very fine-grained, highly parallel core
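A minimal, hedged sketch of this Jakobsen-style core (position Verlet plus stick-constraint relaxation); it is not the CLEngine or its OpenCL kernels, the names are made up, and Vec3 is as in the earlier sketches:

```cpp
// Sketch: position Verlet integration of particles and iterative relaxation
// of independent stick constraints; rigid-body behavior emerges from the
// constraint network, and each constraint can be solved independently.
#include <cmath>
#include <vector>

struct Particle { Vec3 pos, prevPos; };
struct Stick    { int a, b; float restLength; };

// Position Verlet: new position from current and previous positions.
void verletStep(std::vector<Particle>& ps, const Vec3& accel, float dt) {
    for (auto& p : ps) {
        Vec3 cur = p.pos;
        p.pos.x += (p.pos.x - p.prevPos.x) + accel.x * dt * dt;
        p.pos.y += (p.pos.y - p.prevPos.y) + accel.y * dt * dt;
        p.pos.z += (p.pos.z - p.prevPos.z) + accel.z * dt * dt;
        p.prevPos = cur;
    }
}

// One relaxation pass: each stick pushes or pulls its two particles halfway
// toward the rest length.
void relaxSticks(std::vector<Particle>& ps, const std::vector<Stick>& sticks) {
    for (const auto& s : sticks) {
        Vec3& p1 = ps[s.a].pos;
        Vec3& p2 = ps[s.b].pos;
        Vec3 d   = { p2.x - p1.x, p2.y - p1.y, p2.z - p1.z };
        float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        if (len == 0.0f) continue;
        float diff = (len - s.restLength) / len * 0.5f;
        p1.x += d.x * diff;  p1.y += d.y * diff;  p1.z += d.z * diff;
        p2.x -= d.x * diff;  p2.y -= d.y * diff;  p2.z -= d.z * diff;
    }
}
```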
CLEngine: Host Interface • OpenCL weakness: expensive communication on dedicated GPUs • Designed to reduce communication by: • Keeping many contiguous stages on the card • Accelerating communication with a transport kernel • Reducing communication to state deltas (Diagram: pipeline stages kept on the device — collision detection, contact graph, integration)
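To illustrate the state-delta idea only (not the actual transport kernel or host interface), the sketch below packs the changed particle positions into one contiguous buffer that could then be sent to the device in a single bulk write; ParticleDelta and gatherDeltas are made-up names, and Particle is as in the previous sketch:

```cpp
// Sketch: only particles dirtied on the host since the last frame cross the
// bus, as one compact buffer instead of many small transfers.
#include <cstdint>
#include <vector>

struct ParticleDelta {
    std::uint32_t index;   // which particle on the device to overwrite
    float x, y, z;         // its new position
};

std::vector<ParticleDelta> gatherDeltas(const std::vector<Particle>& particles,
                                        const std::vector<bool>& dirty) {
    std::vector<ParticleDelta> deltas;
    for (std::uint32_t i = 0; i < particles.size(); ++i) {
        if (!dirty[i]) continue;   // unchanged state never crosses the bus
        deltas.push_back({ i, particles[i].pos.x,
                              particles[i].pos.y,
                              particles[i].pos.z });
    }
    // deltas.data() would then be uploaded in one bulk write and scattered
    // into place on the device by a transport kernel.
    return deltas;
}
```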
CLEngine: Performance • 3–6× single-threaded CPU performance • Higher parallelism acceleration curve • Many optimizations still not done! • GPU targetable for extreme performance • Optimizations are more critical there • Communication, local memory, vector types
CLEngine Prototype Limitations • Designed for an “all active” system • Not state-aware, no incremental processing • Does more total work than current CPU engine • We want both advantages simultaneously! • Multiple ways to achieve this • Challenges imposed by slow communication • Integrating a broad phase solution efficiently • Reporting results usefully and efficiently
Work Finished, Pt 1 • Made CLEngine & Testing Framework portable to OpenCL v1.1 systems generally • Tested on IBM/PowerPC, Ubuntu, Windows, and OS X on Intel • Built high-level services on the CLEngine core • Allows it to be interfaced like a traditional physics engine • Ported the Scalable City server to VS2010 for OpenCL • The previous tools were 7 years old and unsupported by OpenCL vendors • Integrated CLEngine as a run-time option for Scalable City • Not yet operational due to incomplete interfacing options
Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Broad Phase Integration: Where we’re going and why
CLEngine Broad Phase: Options Previously Discussed • Hash Grid • Query stage for medium size (Lots, Cyclones) • Multi-sort Sweep & Prune • Single solution for small-medium • Better performance for object clustering? • Space-filling curves • Reduce S&P sorts from two to one! • All cases: host must deal with large objects
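For reference, a standard way to build the space-filling-curve key mentioned above is Morton (Z-order) bit interleaving of grid-cell coordinates; this is a generic sketch, not the project's implementation, and the next slide notes why curve ordering proved too inaccurate here:

```cpp
// Sketch: interleave the bits of 10-bit x, y, z cell coordinates into one
// 30-bit Morton key, so a single sort over keys can stand in for the two
// sweep-and-prune axis sorts.
#include <cstdint>

// Spread the low 10 bits of v so there are two zero bits between each bit.
static std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x3FF;
    v = (v | (v << 16)) & 0x030000FF;
    v = (v | (v << 8))  & 0x0300F00F;
    v = (v | (v << 4))  & 0x030C30C3;
    v = (v | (v << 2))  & 0x09249249;
    return v;
}

// 30-bit Morton key for 10-bit cell coordinates.
std::uint32_t mortonKey(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return spreadBits(x) | (spreadBits(y) << 1) | (spreadBits(z) << 2);
}
```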
Work Finished, Pt 2 • Space-filling curves with S&P implemented • Morton: massive false positives • Hilbert: high false positives & false negatives • Space-filling curves are generally too inaccurate • Multi-sort S&P has similar limitations to the grid • Parallel last pass is inefficient without similar object sizes • Ideally, objects would be precisely the same size for symmetry • Traversal must stop based on the largest object size • Load balancing is also affected by clustering
CLEngine Option 2: Broad Phase on Host • More flexible, generally performant • Handles all object sizes well • Thread-parallel, incremental processing • CLEngine sees only the relevant object subset • Active objects & objects overlapping them in the broad phase • Maintained by communicating deltas • CLEngine core is simpler: process & report all • Focus optimization on communication
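A small sketch of the "relevant subset" bookkeeping described above, assuming object IDs are tracked in sets on the host; SubsetDeltas and diffRelevantSet are illustrative names, not the actual interface:

```cpp
// Sketch: the host broad phase tracks which objects CLEngine should currently
// see (active objects plus anything overlapping them), and only additions and
// removals since the last frame are communicated to the device.
#include <set>
#include <vector>

struct SubsetDeltas {
    std::vector<int> added;    // objects CLEngine must start simulating
    std::vector<int> removed;  // objects it can drop
};

SubsetDeltas diffRelevantSet(const std::set<int>& previous,
                             const std::set<int>& current) {
    SubsetDeltas d;
    for (int id : current)
        if (!previous.count(id)) d.added.push_back(id);
    for (int id : previous)
        if (!current.count(id)) d.removed.push_back(id);
    return d;
}
```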
Work Finished, Pt 3: Host Broad Phase Design • Designed the host broad phase system • Communication manager being implemented • Broad phase and state system used to consolidate deltas • Broad phase optimized for thread-parallel high activity • Doubled performance under these conditions • Free-threaded interfaces for all operations
Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games End of Physics Talk Questions?
Diagnosing Unbounded Heap Growth in C++ • Project Description: Improper memory management in software can cause performance to decline and eventually lead to program failure. These problems can be difficult to detect during testing because of the unpredictable amount of time it can take for overt symptoms to appear, and those symptoms may seem unrelated to memory management. The purpose of this research project is to identify causes of unbounded heap growth in C++ software beyond traditional memory leaks. • Major Accomplishments: • Heuristic refined to yield low false positives/negatives, with continuously improving accuracy over time • Identified memory problems in Google Chrome, WebKit, and Ogre3D • Fixed growing data structures in Chrome and Ogre3D
Diagnosing Unbounded Heap Growth in C++ Review from last meeting
Diagnosing Unbounded Heap Growth in C++ Motivation • Scalable City needs to run continuously • Many months without intervention or access • Memory grew slowly • Leading to a crash after several weeks • Available analysis tools reported no leaks! • The software frees all memory correctly • A different kind of undetected memory issue
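As a hedged illustration of the kind of problem meant here (not taken from the Scalable City code), the sketch below frees all of its memory correctly at shutdown, so leak detectors stay silent, yet the heap grows without bound while the program runs:

```cpp
// Sketch: every allocation is eventually freed, so traditional leak tools
// report nothing, but a long-lived container keeps accumulating entries and
// the heap grows for as long as the server stays up.
#include <string>
#include <vector>

class EventLog {
public:
    void record(const std::string& msg) {
        entries_.push_back(msg);   // grows forever on a server that runs
                                   // for months without restarting
    }
private:
    std::vector<std::string> entries_;   // freed correctly at shutdown, so
                                         // leak checkers see no leak
};
```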