Building Scalable Web Architectures Aaron Bannert aaron@apache.org / aaron@codemass.com http://www.codemass.com/~aaron/presentations/apachecon2005/ac2005scalablewebarch.ppt
Goal • How do we build a massive web system out of commodity parts and open source?
Agenda • LAMP Overview • LAMP Features • Performance • Surviving your first Slashdotting • Growing above a single box • Avoiding Bottlenecks
LAMP Overview Architecture
L A M P Linux Apache MySQL PHP (Perl?)
External Caching Tier • What is this? • Squid • Apache’s mod_proxy • Commercial HTTP Accelerator
External Caching Tier • What does it do? • Caches outbound HTTP objects • Images, CSS, XML, HTML, etc… • Flushes Connections • Useful for modem users, frees up web tier • Denial of Service Defense
External Caching Tier • Hardware Requirements • Lots of Memory • Fast Network • Moderate to little CPU • Moderate Disk Capacity • Room for cache, logs, etc… (disks are cheap) • One slow disk is OK • Two Cheapies > One Expensive
External Caching Tier • Other Questions • What to cache? • How much to cache? • Where to cache (internal vs. external)?
Web Serving Tier • What is this? • Apache • thttpd • Tux Web Server • IIS • Netscape
Web Serving Tier • What does it do? • HTTP, HTTPS • Serves Static Content from disk • Generates Dynamic Content • CGI/PHP/Python/mod_perl/etc… • Dispatches requests to the App Server Tier • Tomcat, Weblogic, Websphere, JRun, etc…
Web Serving Tier • Hardware Requirements • Lots and lots of Memory • Memory is main bottleneck in web serving • Memory determines max number of users • Fast Network • CPU depends on usage • Dynamic content needs CPU • Static file serving requires very little CPU • Cheap slow disk, enough to hold your content
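The claim that memory determines the maximum number of users can be made concrete with simple arithmetic. The sketch below assumes a process-per-connection server; all figures (4 GB of RAM, 512 MB reserved for the OS, 4 MB and 25 MB per process) are hypothetical and would need to be measured on a real system.

```python
def max_concurrent_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Estimate how many server processes fit in RAM without swapping."""
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0 or per_process_mb <= 0:
        return 0
    return usable_mb // per_process_mb


# Hypothetical box: 4 GB of RAM, 512 MB held back for the OS and file cache.
static_only = max_concurrent_clients(4096, 512, 4)    # lean static-file process
with_dynamic = max_concurrent_clients(4096, 512, 25)  # process with an embedded interpreter

print(static_only, with_dynamic)  # prints: 896 143
```

A number like this is the sort of value one might use as an upper bound for a process-count limit such as Apache's MaxClients directive; going above it risks swapping, which is far slower than refusing a connection.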
Web Serving Tier • Choices • How much dynamic content? • When to offload dynamic processing? • When to offload database operations? • When to add more web servers?
Application Server Tier • What does it do? • Dynamic Page Processing • JSP • Servlets • Standalone mod_perl/PHP/Python engines • Internal Services • E.g. Search, Shopping Cart, Credit Card Processing
Application Server Tier • How does it work? • Web Tier generates the request using • HTTP (aka “REST”, sort of) • RPC/CORBA • Java RMI • XML-RPC/SOAP • (or something homebrewed) • App Server processes request and responds
Application Server Tier • Caveats • Decoupling of services is GOOD • Manage Complexity using well-defined APIs • Don’t decouple for scaling, change your algorithms! • Remote Calling overhead can be expensive • Marshaling of data • Sockets, net latency, throughput constraints… • XML, SOAP, XML-RPC, yuck (they don’t scale well) • Better to use Java’s RMI, good old RPC or even CORBA
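One way to see the marshaling cost is to encode the same data twice and compare sizes. This is a rough sketch only: `struct` stands in for a compact binary encoding and is not the wire format of RMI, RPC, or CORBA, and the method name `search` is made up for illustration. Size is also only one part of the overhead; parsing time matters too.

```python
import struct
import xmlrpc.client


def marshaled_sizes(numbers):
    """Return (xml_bytes, binary_bytes) for the same list of 32-bit integers."""
    # XML-RPC wraps every value in tags; "search" is a hypothetical method name.
    xml_payload = xmlrpc.client.dumps((numbers,), methodname="search")
    # Fixed-width network-order encoding: exactly 4 bytes per integer.
    binary_payload = struct.pack("!%di" % len(numbers), *numbers)
    return len(xml_payload.encode("utf-8")), len(binary_payload)


xml_size, binary_size = marshaled_sizes(list(range(1000)))
print("XML-RPC bytes:", xml_size, "binary bytes:", binary_size)
```

Both payloads carry the same thousand integers, but the XML form is several times larger and every byte of it has to be sent, received, and parsed on each call.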
Application Server Tier • More Caveats • Remote Calling introduces new failure scenarios • Classic Distributed Problems • How to detect remote failures? • How long to wait until deciding it’s failed? • How to react to remote failures? • What do we do when all app servers have failed?
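The questions above (how to detect a failure, how long to wait, how to react, what to do when everything is down) can be sketched as a small failover loop. This is a minimal illustration, not a production client: the 2-second timeout, the host names, and the raw-socket transport are all assumptions.

```python
import socket


def send_over_tcp(host, port, request, timeout):
    """Send one request and read one reply; raises OSError on failure or timeout."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        return conn.recv(65536)


def call_with_failover(servers, request, timeout=2.0, send=send_over_tcp):
    """Try each app server in turn; return the first reply, or None if all fail."""
    for host, port in servers:
        try:
            return send(host, port, request, timeout)
        except OSError:
            # Covers refused connections, unreachable hosts, and timeouts.
            continue
    return None  # every app server failed; the caller must degrade gracefully


# Demonstration with a fake transport so the sketch runs without a network.
def fake_send(host, port, request, timeout):
    if host == "app1":  # hypothetical dead server
        raise ConnectionRefusedError("app1 is down")
    return b"reply from " + host.encode()


reply = call_with_failover([("app1", 8009), ("app2", 8009)], b"GET /cart", send=fake_send)
print(reply)  # prints: b'reply from app2'
```

The timeout is the answer to "how long to wait", and it is a trade-off: too short and slow-but-healthy servers are skipped, too long and web tier processes pile up waiting. Returning `None` leaves the "all app servers failed" decision (error page, cached copy, reduced feature set) to the web tier.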
Application Server Tier • Hardware Requirements • Lots and Lots and Lots of Memory • App Servers are very memory hungry • Java was hungry to begin with • Consider going to 64bit for larger memory-space • Disk depends on application, typically minimal needed • FAST CPU required, and lots of them • (This will be an expensive machine.)
Database Tier • Available DB Products • Free/Open Source DBs • MySQL • PostgreSQL • SQLite • mSQL • Berkeley DB • GNU DBM • Ingres • Commercial • Oracle • MS SQL • IBM DB2 • Sybase • SleepyCat
Database Tier • What does it do? • Data Storage and Retrieval • Data Aggregation and Computation • Sorting • Filtering • ACID properties • (Atomic, Consistent, Isolated, Durable)
Database Tier • Choices • How much logic to place inside the DB? • Use Connection Pooling? • Data Partitioning? • Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
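Data partitioning as described above can be sketched as a function that maps each key to one of the database slices. This is one simple scheme (hash modulo slice count), not the only one; the slice names `db0` to `db3` and the use of a user ID as the partitioning key are assumptions for illustration.

```python
import hashlib

# Hypothetical names for four logical database slices.
SLICES = ["db0", "db1", "db2", "db3"]


def slice_for(key, num_slices):
    """Map a key to a slice index that is stable across processes and restarts."""
    # hashlib is used instead of the built-in hash(), which is randomized per process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def database_for(user_id):
    """Pick the slice that holds all rows for this user."""
    return SLICES[slice_for(user_id, len(SLICES))]


for user in ("alice", "bob", "carol"):
    print(user, "->", database_for(user))
```

Every query for a given user goes to the same slice, so each database handles only a fraction of the data and load. The caveat with plain modulo is that changing the number of slices remaps most keys, so the slice count is worth choosing generously up front.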
Database Tier • Hardware Requirements • Entirely dependent upon application. • Likely to be your most expensive machine(s). • Tons of Memory • Spindles galore • RAID is useful (in software or hardware) • Reliability usually trumps Speed • RAID levels 0, 5, 1+0, and 5+0 are useful • CPU also important • Dual power supplies • Dual Network
Internal Cache Tier • What is this? • Object Cache • What Applications? • Memcache • Local Lookup Tables • BDB, GDBM, SQL-based • Application-local Caching (e.g. LRU tables) • Homebrew Caching (disk or memory)
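An application-local LRU table, as mentioned above, can be sketched in a few lines. This is a minimal single-process illustration with a fixed capacity; the class name `LRUTable` is invented here, and a real deployment would also need to think about thread safety and invalidation.

```python
from collections import OrderedDict


class LRUTable:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUTable(capacity=2)
cache.put("user:1", "Alice")
cache.put("user:2", "Bob")
cache.get("user:1")            # touch user:1 so user:2 becomes the oldest
cache.put("user:3", "Carol")   # evicts user:2
print(cache.get("user:2"))     # prints: None
```

The typical pattern is to call `get` first and fall back to the database only on a miss, then `put` the result so the next request is served from memory.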
Internal Cache Tier • What does it do? • Caches objects closer to the Application or Web Tiers • Tuned for your application • Very Fast Access • Scales Horizontally
Internal Cache Tier • Hardware Requirements • Lots of Memory • Note that 32bit processes are typically limited to 2GB of RAM • Little or no disk • Moderate to low CPU • Fast Network
Misc. Services (DNS, Mail, etc…) • Why mention these? • Every LAMP system has them • Crucial but often overlooked • Source of hidden problems
Misc. Services: DNS • Important Points • Always have an offsite NS slave • Always have an onsite NS slave • Minimize network latency • Don’t use NAT, load balancers, etc…
Misc. Services: Time Synchronization • Synchronize the clocks on your systems! • Hints: • Use ntpdate at boot time to set the clock • Use ntpd to stay in sync • Don’t ever change the clock on a running system!
Misc. Services: Monitoring • System Health Monitoring • Nagios • Big Brother • Orcalator • Ganglia • Fault Notification
The Glue • Routers • Switches • Firewalls • Load Balancers
Routers and Switches • Expensive • Complex • Crucial Piece of the System • Hints • Use GigE if you can • Jumbo Frames are GOOD • VLANs to manage complexity • LACP (802.3ad) for failover/redundancy
Load Balancers • What services to balance? • HTTP Caches and Servers, App Servers, DB Slaves • What NOT to balance? • DNS • LDAP • NIS • Memcache • Spread • Anything with its own built-in balancing
Message Busses • What is out there? • Spread • JMS • MQSeries • Tibco Rendezvous • What does it do? • Various forms of distributed message delivery. • Guaranteed Delivery, Broadcasting, etc… • Useful for heterogeneous distributed systems
What about the OS? Operating System Selection
Lots of OS choices • Linux • FreeBSD • NetBSD • OpenBSD • OpenSolaris • Commercial Unix
What’s Important? • Maintainability • Upgrade Path • Security Updates • Bug Fixes • Usability • Do your engineers like it? • Cost • Hardware Requirements • (you don’t need a commercial Unix anymore)
Features to look for • Multi-processor Support • 64bit Capable • Mature Thread Support • Vibrant User Community • Support for your devices
The Age of LAMP What does LAMP provide?
Scalability • Grows in small steps • Stays up when it counts • Can grow with your traffic • Room for the future
Reliability • High Quality of Service • Minimal Downtime • Stability • Redundancy • Resilience
Low Cost • Little or no software licensing costs • Minimal hardware requirements • Abundance of talent • Reduced maintenance costs
Flexible • Modular Components • Public APIs • Open Architecture • Vendor Neutral • Many options at all levels
Extendable • Free/Open Source Licensing • Right to Use • Right to Inspect • Right to Improve • Plugins • Some Free • Some Commercial • Can always customize