750 likes | 922 Views
Building Scalable Web Architectures. Aaron Bannert aaron@apache.org / aaron@codemass.com. Goal. To build a reliable , scalable , cheap , flexible , extendable internet application. The Age of LAMP. What does a LAMP architecture give us?. Scalability. Grows in small steps
E N D
BuildingScalable Web Architectures Aaron Bannert aaron@apache.org / aaron@codemass.com
Goal To build a reliable, scalable, cheap, flexible, extendable internet application.
The Age of LAMP What does a LAMP architecture give us?
Scalability • Grows in small steps • Stays up when it counts • Can grow with your traffic • Room for the future
Reliability • High Quality of Service • Minimal Downtime • Stability • Redundancy • Resilience
Low Cost • Little or no software licensing costs • Minimal hardware requirements • Abundance of talent • Reduced maintenance costs
Flexible • Modular Components • Public APIs • Open Architecture • Vendor Neutral • Many options at all levels
Extendable • Free/Open Source Licensing • Right to Use • Right to Inspect • Right to Improve • Plugins • Some Free • Some Commercial • Can always customize
Free as in Beer? Price Speed Quality Pick any two.
The Glue • Routers • Switches • Firewalls • Load Balancers
Software Choices Building LAMP Software
External Caching Tier • What is this? • Squid • Apache’s mod_proxy • Commercial HTTP Accelerator
External Caching Tier • What does it do? • Caches outbound HTTP objects • Images, CSS, XML, HTML, etc… • Flushes Connections • Useful for modem users, frees up web tier • Denial of Service Defense
External Caching Tier • Hardware Requirements • Lots of Memory • Moderate to little CPU • Fast Network • Moderate Disk Capacity • Room for cache, logs, etc… (disks are cheap) • One slow disk is OK • Two Cheapies > One Expensive
External Caching Tier • Other Questions • What to cache? • How much to cache? • Where to cache (internal vs. external)?
Web Serving Tier • What is this? • Apache • thttpd • Tux Web Server • IIS • Netscape
Web Serving Tier • What does it do? • HTTP, HTTPS • Serves Static Content from disk • Generates Dynamic Content • CGI/PHP/Python/mod_perl/etc… • Dispatches requests to the App Server Tier • Tomcat, Weblogic, Websphere, JRun, etc…
Web Serving Tier • Hardware Requirements • Lots and lots of Memory • Memory is main bottleneck in web serving • Memory determines max number of users • Fast Network • CPU depends on usage • Dynamic content needs CPU • Static file serving requires very little CPU • Cheap slow disk, enough to hold your content
Web Serving Tier: Zero-copy • Performance Hint • Dedicated static content servers • Modern web servers are very good at serving static content such as • HTML • CSS • Images • Zip/GZ/Tar files
Web Serving Tier • Performance Hint • Stateless Sessions • Each connection is a fresh start • Server remembers nothing • Benefits? • Allows Better Caching • Scales Horizontally
Web Serving Tier • Choices • How much dynamic content? • When to offload dynamic processing? • When to offload database operations? • When to add more web servers?
Application Server Tier • What does it do? • Dynamic Page Processing • JSP • Servlets • Standalone mod_perl/PHP/Python engines • Internal Services • Eg. Search, Shopping Cart, Credit Card Processing
Application Server Tier • How does it work? • Web Tier generates the request using • HTTP (aka “REST”, sortof) • RPC/Corba • Java RMI • XMLRPC/Soap • (or something homebrewed) • App Server processes request and responds
Application Server Tier • Caveats • Decoupling of services is GOOD • Manage Complexity using well-defined APIs • Don’t decouple for scaling, change your algorithms! • Remote Calling overhead can be expensive • Marshaling of data • Sockets, net latency, throughput constraints… • XML, Soap, XMLRPC, yuck (don’t scale well) • Better to use Java’s RMI, good old RPC or even Corba
Application Server Tier • More Caveats • Remote Calling can introduce new failure scenarios • Classic Distributed Problems • How to detect remote failures? • How long to wait until deciding it’s failed? • How to react to remote failures? • What do we do when all app servers have failed?
Application Server Tier • Hardware Requirements • Lots and Lots and Lots of Memory • App Servers are very memory hungry • Java was hungry to being with • Consider going to 64bit for larger memory-space • Disk depends on application, typically minimal needed • FAST CPU required, and lots of them • (This will be an expensive machine.)
Database Tier • Available DB Products • Free/Open Source DBs • PostgreSQL • GNU DBM • Ingres • SQLite • Commercial • Oracle • MS SQL • IBM DB2 • Sybase • SleepyCat • MySQL • SQLite • mSQL • Berkeley DB
Database Tier • What does it do? • Data Storage and Retrieval • Data Aggregation and Computation • Sorting • Filtering • ACID properties • (Atomic, Consistent, Isolated, Durable)
Database Tier • Choices • How much logic to place inside the DB? • Use Connection Pooling? • Data Partitioning? • Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
Database Tier • Hardware Requirements • Entirely dependent upon application. • Likely to be your most expensive machine(s). • Tons of Memory • Spindles galore • RAID is useful (in software or hardware) • Reliability usually trumps Speed • RAID levels 0, 5, 1+0, and 5+0 are useful • CPU also important • Dual power supplies • Dual Network
Internal Cache Tier • What is this? • Object Cache • What Applications? • Memcache • Local Lookup Tables • BDB, GDBM, SQL-based • Application-local Caching (eg. LRU tables) • Homebrew Caching (disk or memory)
Internal Cache Tier • What does it do? • Caches objects closer to the Application or Web Tiers • Tuned for your application • Very Fast Access • Scales Horizontally
Internal Cache Tier • Hardware Requirements • Lots of Memory • Note that 32bit processes are typically limited to 2GB of RAM • Little or no disk • Moderate to low CPU • Fast Network
Misc. Services (DNS, Mail, etc…) • Why mention these? • Every LAMP system has them • Crucial but often overlooked • Source of hidden problems
Misc. Services: DNS • Important Points • Always have an offsite NS slave • Always have an onsite NS slave • Minimize network latency • Don’t use NAT, load balancers, etc…
Misc. Services: Time Synchronization • Synchronize the clocks on your systems! • Hints: • Use NTPDATE at boot time to set clock • Use NTPD to stay in synch • Don’t ever change the clock on a running system!