Scaling Development. Darrin West, Emergent.
Scaling Development: Challenges • Change • Scale of Team • Scale of SW/codebase/content • Time crunch; Long critical path • Software Engineering at scale adds overhead, but allows for parallelism
Team Organization / Software Engineering Processes • Team size – code/content change rate • Red Team, Blue Team (pipelining) • Enforced Modularity (functional areas) • Better/reliable direct parallelism (no locking) • Reduce overhead, retain correctness/causality • Less waste, less blocking. • Configuration/Change management • Builds breaking (probabilities) • A 1% chance of a break (about twice a year) for each of 50 engineers gives only about a 60% chance of a good build (0.99^50 ≈ 0.605)
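The build-break arithmetic on this slide can be checked with a small sketch. It assumes each engineer breaks a given build independently and with the same probability; the function name `good_build_probability` is illustrative, not from the talk.

```cpp
#include <cassert>
#include <cmath>

// Probability that a shared build is good, assuming each of `engineers`
// independently breaks it with probability `p_break`.
// (Independence and a uniform break rate are simplifying assumptions.)
double good_build_probability(double p_break, int engineers) {
    assert(p_break >= 0.0 && p_break <= 1.0 && engineers >= 0);
    return std::pow(1.0 - p_break, engineers);
}
```

`good_build_probability(0.01, 50)` comes out at about 0.605, the 60% quoted on the slide. Under the same assumptions, 200 engineers give about 0.13, which is why team size forces process changes.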
Sample SE Processes • Per-Developer branches, Integration Engineer. • Continuous Integration: tests run during build. • Fast incremental builds, fast load/test times, or engineers might skip them. • Automatic regression • Testing • Pre-Checkin • Tools • Branch and Run. Don’t wait for everyone at end of “phase”.
Nimble (but big) Development • Iteration rate • Build, test, reliability • Lazy (minimal) content loading • Small init time • Don’t wait to optimize this • Build sophisticated asset loader early. It pays every day • Play test – reliable, quick injection of new ideas. • Design Tuning (without making a new Build) • Content iteration • Incremental development – always demoable – small sprints • Scale and break testing • Rapid application of changes to Production.
En-Nimbling Suggestions • Fast Iteration • Don’t restart the app/server just for new content. • Don’t load all content, use lazy loading. • Minimize build times: incremental build/link • Break dependencies on implementation, minimize recompile. • Fast Adaptation to change • Design for change. • Script development. • Very modular. Solid lower layers. Fewer side-effects. • Autotesting for a sure footing; easy refactoring. • Optimize later (Kent Beck – rephrased)... • Make it work at all (and start getting feedback) • Make it work right (well-formed, refactored, optimizable) • Make it work efficiently (and still right)
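The "lazy loading" and "don't restart just for new content" suggestions can be sketched as a small cache. This is a minimal sketch under stated assumptions: `Asset`, `LazyAssetCache`, and the loader callback are hypothetical names, and the callback stands in for real disk or network I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical asset type; a real one would hold meshes, textures, etc.
struct Asset {
    std::string data;
};

// Loads assets on first use and caches them.
class LazyAssetCache {
public:
    using Loader = std::function<Asset(const std::string&)>;

    explicit LazyAssetCache(Loader loader) : loader_(std::move(loader)) {}

    // Returns the cached asset, loading it only if it has not been seen yet.
    std::shared_ptr<const Asset> get(const std::string& name) {
        auto it = cache_.find(name);
        if (it == cache_.end()) {
            it = cache_.emplace(name, std::make_shared<Asset>(loader_(name))).first;
        }
        return it->second;
    }

    // Drops one entry so the next get() reloads it: new content without an
    // app/server restart. Existing holders keep their old copy alive.
    void invalidate(const std::string& name) { cache_.erase(name); }

    std::size_t loaded_count() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::string, std::shared_ptr<const Asset>> cache_;
};
```

Nothing is loaded at startup, so init time stays small. Calling `invalidate("tree")` after an artist saves a new version makes the next `get("tree")` pick it up while the process keeps running.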
Independent Iterations • Individual, parallel iterations require modularity, independence of modules • Ability to merge (even content) • Invest in the overhead (OH) of configuration management (C.M.) to get some parallelism (||ism) • Which OH’s are *really* needed/best? • Content/code locking? • Checkin “token”? • Pre/post checkin code reviews? • Consistent, simple, flexible approach (software architecture)
Technique: Modularity and Interfaces • A change is encapsulated/insulated • Makes system flexible • Makes system resilient to breakage • Parts can be reused • Dependencies • Short include paths, faster builds, dependency checks • Acyclic: compile and link dependencies • Forward declare, not #include • Opaque pointers • Tends to “drift”. How to avoid cycles even in an emergency. • Interfaces/abstractions • Layering/levelizing: “Low level” modules should publish an interface (so impl can change) • Implementation can depend on Interface. Nothing should depend on Implementation. • Pure-virtual classes.
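The techniques named on this slide (forward declaration, opaque pointers, pure-virtual interfaces) can be sketched in one file. The names `ILogger`, `make_logger`, `Session`, and `CountingLogger` are hypothetical, and the comments mark where a real project would split header from implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- would live in a public header ----

// Pure-virtual interface: clients depend on this, never on an implementation.
class ILogger {
public:
    virtual ~ILogger() = default;
    virtual void log(const std::string& msg) = 0;
    virtual int count() const = 0;
};

// Factory: callers get the interface and clear ownership via unique_ptr.
std::unique_ptr<ILogger> make_logger();

// Opaque pointer: Impl is only forward declared here, so this header does
// not need to #include anything the implementation uses.
class Session {
public:
    Session();
    ~Session();
    void record(const std::string& event);
    int events() const;

private:
    struct Impl;  // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- would live in .cpp files; can change without recompiling clients ----

namespace {
class CountingLogger : public ILogger {
public:
    void log(const std::string&) override { ++count_; }
    int count() const override { return count_; }

private:
    int count_ = 0;
};
}  // namespace

std::unique_ptr<ILogger> make_logger() { return std::make_unique<CountingLogger>(); }

struct Session::Impl {
    std::unique_ptr<ILogger> logger = make_logger();
};

Session::Session() : impl_(std::make_unique<Impl>()) {}
Session::~Session() = default;  // defined here, where Impl is a complete type
void Session::record(const std::string& event) { impl_->logger->log(event); }
int Session::events() const { return impl_->logger->count(); }
```

Swapping `CountingLogger` for another implementation, or adding members to `Session::Impl`, touches only the implementation side. Code that includes the header neither recompiles nor gains new dependencies.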
Modularity(2) • DLL/.so • Significant reduction in link times • Encapsulate change. • Plugin, optional loading • With Interfaces, don’t have to export symbols • Faster link and load times • Fewer global symbols • Disentangle dependencies. • Easier to understand code. • Easier to test. Easier to integrate. • One header file change doesn’t rebuild all • Less likely to rebuild base libraries. If they do change, only need to relink upper layers.
Frameworks • Sharing cycles • Message driven. Multiple message handlers. • Message oriented/not locks • Ease of understanding/debugging • Single threaded, single writer • Multi-process same as multi-threaded • Allocate units of work to more or fewer processes based on measurements/policy, not data/chance • Single paradigm: easier for an engineer to move to a new part of the app. No one engineer can know all of the app.
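A message-driven, single-threaded framework with multiple handlers per message can be sketched as a queue plus a handler table. `Message` and `MessageLoop` are hypothetical names; this illustrates the pattern and is not the framework the talk describes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Message {
    std::string type;
    int payload;
};

// Single-threaded dispatcher: all state changes happen inside handlers that
// run one at a time, so no locks are needed.
class MessageLoop {
public:
    using Handler = std::function<void(const Message&)>;

    // Several handlers may subscribe to the same message type.
    void subscribe(const std::string& type, Handler h) {
        handlers_[type].push_back(std::move(h));
    }

    void post(Message m) { queue_.push_back(std::move(m)); }

    // Drains the queue on the calling thread and returns how many messages
    // were processed. Handlers may post() further messages; in this sketch
    // they must not subscribe() while a dispatch is in progress.
    int run() {
        int processed = 0;
        while (!queue_.empty()) {
            Message m = std::move(queue_.front());
            queue_.pop_front();
            auto it = handlers_.find(m.type);
            if (it != handlers_.end()) {
                for (const auto& h : it->second) h(m);
            }
            ++processed;
        }
        return processed;
    }

private:
    std::deque<Message> queue_;
    std::unordered_map<std::string, std::vector<Handler>> handlers_;
};
```

Because handlers only see messages, the same handler code can run in one process or be spread across several. Only the transport behind `post` would need to change.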
Lessons Learned from COM & .NET • Lifetime management (who owns this ptr?) • Interface/Abstraction • Plugin/Optional loading • Factories • Init Phases • Recompile independence • Dependency simplicity • Layers • Peers/App
Pitfalls/Alternatives • Massive Link • Circular dependencies • “Infinite” include paths • Unknowable dependencies • Bad Layering, leading to no reuse
Resources • “Large Scale C++ Software Design”, John Lakos • Some out of date ideas, some obvious, some very good! • Scale has gone up an order of magnitude • “Refactoring”, Martin Fowler • Because you won’t get it right the first time • “The Game Asset Pipeline”, Ben Carter • Good tips • Still need fundamentally new approach • Scrum: http://danube.com
Scaling Performance: Challenges • In distributed systems, not just code efficiency, but interaction patterns/timing. • Compute “Load” varies in time, location, project phase… • OH vs. granularity/parallelism trade • Blocking/Idling and Deadlock
Work/Computation Organization • Variable compute load • Can’t know: changes during development, in real time. • Design changes over life of project • SW impl, structure, function-level changes • Community does the unexpected. • Varies in different areas, time of day. • Heterogeneous host speed and memory (incremental upgrades/purchasing) • Unpredictable # processors deployed on. Cores or Hosts might go offline.
Solve Variability: Flexibility and Policy • Mainly talking about server-side • Lessons do apply to Clients • Particularly symmetric multiprocessors. • Watch out for scheduler assumptions. Where is data? • Decompose into small bits, and map • Allows explicit mapping (tunable – metrics, tools) • Automatic (if you have development time) • Separate mechanism from Policy. • Policy easy to adapt • New mechanism, or decomposition is hard. • Break it into a gazillion pieces, but re-aggregate it parametrically to minimize overheads. Be in control of ||ism, not at the mercy of your data.
Logical Processes and Messages • [Figure: logical processes (LPs) mapped onto Physical Processor 1 and Physical Processor 2] • Entities communicate using Messages only. • Entities can therefore be mapped to any processor.
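Since entities communicate by messages only, the choice of which processor hosts each logical process can be a swappable policy, separate from the routing mechanism. In this minimal sketch, `Router`, `round_robin`, and `block_policy` are hypothetical names, and each in-memory inbox stands in for a real transport to that processor.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using LpId = int;
using ProcessorId = int;

struct Envelope {
    LpId to;
    int payload;
};

// Policy: decides which physical processor hosts a logical process (LP).
using MappingPolicy = std::function<ProcessorId(LpId lp, int processor_count)>;

// Policy A: spread LPs evenly, ignoring locality.
ProcessorId round_robin(LpId lp, int processor_count) { return lp % processor_count; }

// Policy B: keep runs of neighbouring LPs together on one processor.
MappingPolicy block_policy(int lps_per_block) {
    return [lps_per_block](LpId lp, int processor_count) {
        return std::min(lp / lps_per_block, processor_count - 1);
    };
}

// Mechanism: delivers a message to whichever processor hosts the target LP.
class Router {
public:
    Router(int lp_count, int processor_count, const MappingPolicy& policy)
        : host_(lp_count), inboxes_(processor_count) {
        assert(processor_count > 0);
        for (LpId lp = 0; lp < lp_count; ++lp) host_[lp] = policy(lp, processor_count);
    }

    ProcessorId host_of(LpId lp) const { return host_.at(lp); }

    void send(LpId to, int payload) {
        inboxes_.at(host_.at(to)).push_back(Envelope{to, payload});
    }

    const std::vector<Envelope>& inbox(ProcessorId p) const { return inboxes_.at(p); }

private:
    std::vector<ProcessorId> host_;
    std::vector<std::vector<Envelope>> inboxes_;
};
```

Senders address an LP and never a processor, so changing from `round_robin` to `block_policy(3)` is a configuration change and requires no edits to entity code.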
Maximize Flexibility • Communicating Sequential Processes • Single-writer • Disjoint memory • Lock-free • Decompose into small bits • Uniform interaction pattern • Recombine • Beware the performance saddle - overheads • Too many processors. • Too many processes.
Early Decisions on Decomposition/Mapping • What fraction of problem in each “unit” of work, and how many processors? • % to graphics, other functions? • Interaction patterns: sparseness in space vs. even mapping to processors
Tradeoffs in the Decision • Concurrency and non-determinism to get performance. • Sacrifice serial, logical, repeatable computations. • Correctness vs. efficiency. • Level of detail • Causality • Overheads: amount of ||ism is traded against overheads. • These affect intrinsic, vs. achieved ||ism • Synthetic ||ism • Critical path
Effect of Overhead: Performance Saddle • Amdahl limit, plus scheduling/communication overhead • Ideal/real speedup (adds OH’s). • Intrinsic, synthetic, achieved …
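The shape of the performance saddle can be illustrated with a toy model: the Amdahl limit plus an overhead term that grows with processor count. The linear overhead term and the parameter values below are assumptions chosen for illustration, not measurements from the talk.

```cpp
#include <cassert>
#include <cmath>

// Modeled speedup on n processors.
//   serial_fraction   : part of the work that cannot be parallelised (Amdahl)
//   overhead_per_proc : scheduling/communication cost added per extra
//                       processor, as a fraction of the one-processor runtime
//                       (assumed linear for this sketch)
double modeled_speedup(double serial_fraction, double overhead_per_proc, int n) {
    assert(n >= 1);
    const double time = serial_fraction
                      + (1.0 - serial_fraction) / n
                      + overhead_per_proc * (n - 1);
    return 1.0 / time;
}

// Scans 1..max_n and returns the processor count with the best modeled speedup.
int best_processor_count(double serial_fraction, double overhead_per_proc, int max_n) {
    int best = 1;
    for (int n = 2; n <= max_n; ++n) {
        if (modeled_speedup(serial_fraction, overhead_per_proc, n) >
            modeled_speedup(serial_fraction, overhead_per_proc, best)) {
            best = n;
        }
    }
    return best;
}
```

With a 5% serial fraction and 1% overhead per added processor, this model peaks at 10 processors with a speedup of about 4.3. At 64 processors the modeled speedup falls to about 1.4, so past the peak, adding processors makes things slower.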
Efficiency: Examples of Overhead • Linear increases in overhead, not logarithmic • Shared bus, shared network • Global locks (affect other processors) • Blocking/Idling • Other MMU/cache coherency limits • Ultimately, communication/synchronization OH. • Speed of light in 3d, real scale limited to O(N^3) • Yes, Amdahl says break it into a gazillion pieces to “not limit parallelism”, but that adds overhead. • What is the ratio of real work to overhead? • Optimize OH on the critical path by running PingPong (all overhead application)
Best Policy: Measuring, Reacting • Sharing “nicely”: allows multiple processes per machine. • Waste: constant looping vs. instrumentation/measurement • See remaining/available processing. • Vs. dynamic Level of Detail? • Locating bottlenecks • Change mapping (dynamic) • Load balance vs. nearness of communication
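One simple way to "measure and react" is to remap units of work from measured costs. The sketch below uses a greedy heaviest-first assignment, a swapped-in textbook heuristic and not necessarily what the talk used. It deliberately ignores nearness of communication, which the slide names as the competing concern. Function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Given a measured cost per unit of work, returns host[unit] = processor index.
// Greedy heuristic: place the heaviest remaining unit on the least-loaded processor.
std::vector<int> rebalance(const std::vector<double>& measured_cost, int processors) {
    assert(processors > 0);
    std::vector<int> order(measured_cost.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return measured_cost[a] > measured_cost[b];
    });

    std::vector<double> load(processors, 0.0);
    std::vector<int> host(measured_cost.size(), 0);
    for (int unit : order) {
        const int target =
            static_cast<int>(std::min_element(load.begin(), load.end()) - load.begin());
        host[unit] = target;
        load[target] += measured_cost[unit];
    }
    return host;
}

// Load on the busiest processor under a given mapping (the bottleneck).
double max_load(const std::vector<double>& measured_cost,
                const std::vector<int>& host, int processors) {
    std::vector<double> load(processors, 0.0);
    for (std::size_t i = 0; i < host.size(); ++i) load[host[i]] += measured_cost[i];
    return *std::max_element(load.begin(), load.end());
}
```

For measured costs `{8, 1, 1, 1, 1, 4}` on two processors, the heavy unit gets a processor to itself and the other five share the second, for a bottleneck load of 8 on each. A production policy would also weigh which units talk to each other most.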
Summary: Why Message-Passing Frameworks • Extracting ||ism • Efficiency: OH, lock-free, single-writer • Tunability: granular, mapped • Cost and time to build: simple, flexible… • Plan for change • Be in control. Configure. • Adding processors can make it slower, adding processes can make it slower. Find best, tune for that.
||ism Alternatives • Multi-threaded, multiple r/w • Critical sections: priority inversion, deadlock • Data access locks: slow when no contention, multi-resource races. • Stream processing • Can cost in extra data copying • Scheduling effectiveness/efficiency • Granularity and performance saddle • Functional • Limited # pieces • Long-pole, Amdahl • Pipelining (added latency) • Optimistic • Memory, complexity, debugging, I/O and GVT • Parallelizing compilers/small scale ||ism • Communication OH swamps computing, except in special cases (e.g. with lots of pipelining) • Think: Ease of adapting to change
Development Similarity to Parallel Processing • Can’t tell where time is going to be spent. • Enough small pieces to keep everyone busy. Mapping. • Keep parts independent (single writer) • Avoid communication and synchronization overheads. • Note immediately when someone is blocked or idle (no busy-work). • Only work on “real” stuff. No rollback. (maybe “some”) • Avoid “global” activities • Meetings, checkin “semaphores”, consensus • Don’t wait for everyone to catch up. • Measure and react • Independent teams coordinated with “messages”. • General-purpose “processors”, reassigned to today’s task