Multi-tier Architectures & Distributed Databases CP3410 Daryle Niedermayer, I.S.P., PMP
Topics: • A history of database processing • Dumb Terminals & Mainframes • Client-Server • Multi-tier Configurations • The Need for Reliability • New Hardware Configurations • E-commerce Considerations • Distributed Systems
A Brief History of Database Processing • Computers only took off as a tool of modern business in the late 1950s and early 1960s. • For the first 20 years (~1960-1980), databases sat on a large mainframe computer. Users connected directly to the mainframe using “dumb terminals.”
What are Dumb Terminals? • They consist of a monitor, a keyboard, and a network connection • There is no hard drive and no processor for running applications • They can’t do work on their own • They know just enough to connect to the mainframe • Data entered by a user is sent to the mainframe for processing • The mainframe sends the results back to the terminal to draw on the screen
They are a way for users to work on the mainframe while sitting in their own offices • All processing was done by the mainframe. The terminal was just an input/output device.
What were Dumb Terminals like? • Pros: • Very fast (for their day) • Easy to use • Good enough for the amount of data required (which wasn’t much) • Cons: • Reports were simple and not well formatted • Everyone got to watch a black screen with green printing all day.
Client-Server Architectures (aka 2-Tier Architectures) • 1980-present • With the introduction of smart workstations and PCs, processing could be shared between the central server and the local machine: • Early machines included Sun workstations and DEC minicomputers such as the PDP-11 • Eventually PCs built on Intel 386, 486, and Pentium processors
Server Roles • Store the data • Organize, index and manipulate data • Manage contention and data concurrency • Receive and process queries and other operations
Client Roles • Decide what operation to ask the server to perform • Display and format data
Some notes on Client-Server • The client has some “smarts” (unlike the dumb terminals): • Using software it decides what data it needs from the server. • It asks for that data and receives the results. • It formats or uses the results for further processing. • Examples: MySQL Browser, MS-SQL
Pros: • Shares the processing between the server and client • Both sides can play to their strengths • Only the data that is needed goes over the network • Cons: • Takes more expensive hardware at the client end • Software can be more expensive (a copy for every workstation)
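The division of labour described above can be sketched in a few lines of Python. This is only an illustration: an in-memory SQLite database stands in for the database server (a real client would connect over the network to something like MySQL or MS-SQL), and the `customers` table, its rows, and both function names are hypothetical.

```python
import sqlite3

# Stand-in for the database server. Assumption: a real client would
# connect over the network to a server such as MySQL or MS-SQL.
server = sqlite3.connect(":memory:")
server.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
server.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Aisha", "Doha"), ("Ben", "London"), ("Chika", "Tokyo"), ("Dana", "Doha")],
)
server.commit()


def customers_in_city(conn, city):
    """Client role: decide what is needed and ask the server for only that."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    return [name for (name,) in rows]


def format_report(city, names):
    """Client role: format the returned data for display."""
    return f"{city}: " + ", ".join(names)


doha_names = customers_in_city(server, "Doha")
report = format_report("Doha", doha_names)
print(report)
```

The server does the storing, indexing, and filtering; only the two matching rows come back to the client, which then formats them.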
A note on MS Access • MS Access can look like a Client-Server, but it usually isn’t. • Most of the time, the database sits on a fileserver, not a database server. This means that the database file’s contents must be pulled across the network to the local machine, where Access does all of the processing itself. This is not the way for a client-server to behave!
However, it is possible… • MS Access can be used as a client front-end with a full Database Server handling the server side. • MS-SQL or MySQL can serve as the “back-end” server. • MS Access on the client then connects to the server using an ODBC connection.
Multi-Tier Architectures (aka n-Tier Architectures) • 1990-Present • Came with the growth of the Internet and the World Wide Web, both built on TCP/IP • TCP/IP gives us a way for machines to communicate regardless of what application they are using • N-Tier means more than 2-Tier
Internet Applications • Internet Applications are almost always N-Tier • Need to be very scalable (quickly grow capacity) • Need to have high-availability (it’s always business hours somewhere around the world) • Need to have strong security
The Need for Reliability • In the previous slide, there were multiple Application Servers • This allows for: • The system to respond to huge differences in traffic volumes • The system to still be available even if one server crashes • More servers can be added to meet demand
Other Redundancies • Although the diagram does not show it, additional firewalls and Proxy Servers can be added for redundancy as well.
High Availability Configurations • Hardware can be configured to have “High Availability” • HA means that the hardware itself will recover from a system problem without having to wait for human intervention. • Recovery typically takes under 15 seconds.
High Availability Appliances • Firewall Appliances and Proxy Servers usually have static configurations: • Their content and configurations do not change often; • Their content and configuration only change as a result of operator input
HA Firewalls • Both firewalls are powered on with identical configurations • A “heartbeat” signal is shared between them every few seconds • If the Standby Firewall does not get a heartbeat when expected, it takes over the IP address and traffic of the Active Firewall until an operator fixes the problem
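A minimal sketch of the heartbeat logic on the standby side, under stated assumptions: the class name, the 2-second interval, and the rule of tolerating three missed heartbeats are all hypothetical, and timestamps are passed in explicitly so the example is deterministic. A real appliance would also take over the IP address, which is not modelled here.

```python
class StandbyFirewall:
    """Hypothetical standby node watching heartbeats from the active node."""

    def __init__(self, heartbeat_interval=2.0, missed_allowed=3):
        # Assumption: declare the active node dead only after several
        # consecutive heartbeats are missed, to tolerate a single lost packet.
        self.timeout = heartbeat_interval * missed_allowed
        self.last_heartbeat = 0.0
        self.active = False  # becomes True once this node has taken over

    def receive_heartbeat(self, now):
        """Record the time of the latest heartbeat from the active node."""
        self.last_heartbeat = now

    def check(self, now):
        """Take over if no heartbeat has arrived within the timeout."""
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True  # here the real device would claim the IP address
        return self.active


standby = StandbyFirewall()
for t in (0.0, 2.0, 4.0):          # heartbeats arrive on schedule
    standby.receive_heartbeat(t)

state_while_healthy = standby.check(5.0)   # last heartbeat was 1 second ago
state_after_failure = standby.check(11.0)  # 7 seconds of silence
print(state_while_healthy, state_after_failure)
```

Once `active` is set it stays set, matching the slide's point that the standby keeps the traffic until an operator fixes the problem.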
HA Databases • HA Databases are much more difficult: • How do you take over the data when it changes all the time? • How do you take over in the middle of a transaction? • How do you take over the data if it is on a hard drive inside a disabled server?
SAN to the Rescue • Storage Area Networks (SANs) store data outside of a server. • They are huge racks of disk drives that are connected to a SAN controller. • The SAN controller along with its switches is known as “the Fabric” (you’ll see why). • The SAN controller itself is also mirrored in an HA configuration.
Together with its servers, a SAN is more of a “Fabric” than a network. Any failure is immediately recoverable through other connections.
Capacity • A SAN can hold terabytes (1,000 GB) or even petabytes (1,000,000 GB) of data for dozens or even hundreds of servers at the same time. • SAN disks are usually configured in a RAID array so that data is stored redundantly, for example by mirroring. This way, if one disk fails, the data is still available from at least one other disk. • Connections are usually fibre-optic rather than copper wire to ensure high bandwidth and transmission speeds.
Back to HA Databases… • If we put our data on a SAN rather than on a hard drive inside the DB server, we can still access the data even if the DB server itself fails. • A Stand-by server then just takes over the Fabric connections of the sick server as well as its IP Address and Network connections.
HA Clusters • Because we’re not failing over everything (since the data is on the SAN), the DB servers only need enough disk space to boot themselves up. • We call this configuration a “Cluster” and each physical server is a “Node” in the “Cluster”
Other Advantages of Clusters • Multiple Database servers can provide load balancing for each other • We can even have 3 or more nodes with 2 or more active and the last one serving as a spare for any of the others • By manually switching in the Standby server, the Active Server can be upgraded without taking a system outage
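The cluster behaviour above can be illustrated with a small sketch. It assumes simple round-robin load balancing and a single list of spare nodes; the `Cluster` class and the node names `db1` to `db3` are hypothetical, and real cluster software handles far more (health checks, Fabric and IP takeover).

```python
class Cluster:
    """Hypothetical DB cluster: several active nodes plus spare nodes."""

    def __init__(self, active, spares):
        self.active = list(active)
        self.spares = list(spares)
        self._next = 0

    def route(self):
        """Round-robin load balancing across the active nodes."""
        node = self.active[self._next % len(self.active)]
        self._next += 1
        return node

    def replace(self, node):
        """Swap a spare in for `node`, after a failure or before an upgrade."""
        if not self.spares:
            raise RuntimeError("no spare node available")
        spare = self.spares.pop(0)
        self.active[self.active.index(node)] = spare
        return spare


cluster = Cluster(active=["db1", "db2"], spares=["db3"])
first_three = [cluster.route() for _ in range(3)]

# Manually switch in the spare so db1 can be upgraded without an outage.
cluster.replace("db1")
after_switch = [cluster.route() for _ in range(2)]
print(first_three, cluster.active, after_switch)
```

Because the data lives on the SAN, swapping nodes only changes which server answers queries; nothing has to be copied.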
E-Commerce Considerations • In planning an E-commerce system, you need to consider the following: • If your customers are all over the world, you can never unplug your system for maintenance without losing customers. • You need to manage transactional integrity across multiple Application Servers.
Transactions need to be managed across multiple web pages: • During the first dozen pages, the user puts together their shopping cart • Then the user goes to “check-out.” This involves a few more pages as they input their identity, their shipping information, and their payment details. • What if they abandon the transaction? When do you roll back?
How do you protect customers’ data? • What personal information do you store about your customers in your database? • Do you store this information “in the clear” (plaintext) or encrypted so that no one else can make use of it if your system is cracked? • How do you protect your customers’ information from your own employees?
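One possible answer to the rollback question, sketched under assumptions: the shopping cart is kept outside the database while the user browses, and a database transaction is opened only at the final check-out step, so an abandoned or failed check-out leaves no partial order behind. The table names, the `check_out` function, and the `payment_ok` flag are hypothetical, and SQLite stands in for the real database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_items (order_id INTEGER, product TEXT, qty INTEGER)"
)
conn.commit()


def check_out(conn, customer, cart, payment_ok):
    """Write the whole order in one transaction, or nothing at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "INSERT INTO orders (customer) VALUES (?)", (customer,)
            )
            conn.executemany(
                "INSERT INTO order_items VALUES (?, ?, ?)",
                [(cur.lastrowid, product, qty) for product, qty in cart.items()],
            )
            if not payment_ok:
                raise RuntimeError("payment declined or check-out abandoned")
        return True
    except RuntimeError:
        return False


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


cart = {"widget": 2, "gadget": 1}  # built up over many pages, outside the DB

abandoned = check_out(conn, "Aisha", cart, payment_ok=False)
orders_after_abandon = count(conn, "orders")

completed = check_out(conn, "Aisha", cart, payment_ok=True)
orders_after_success = count(conn, "orders")
items_after_success = count(conn, "order_items")
```

Keeping the transaction short avoids holding locks while the user clicks through pages, at the cost of having to manage the cart state somewhere else (for example, in the session).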
Distributed Systems • Imagine a Database Cluster that spans the globe: • One node is in London • One node is in Tokyo • One node is in Toronto • One node is in Doha • This is a Distributed Database Management System or DDBMS
Why DDBMS? • Communications used to be expensive • Rather than have 1000 employees all over the world connect over a 56K modem to a DBMS in London, we would pay for high speed connections between each DBMS node and then have users connect to their local node (at cheaper rates)
For Example: • A modem call over a telephone line from Doha to a non-GCC country costs about $0.90/minute. • If there are 100 users in Doha, this would cost $90/minute or $5400/hour for these users to connect. • It may be cheaper to put a database server in Doha and then synchronize the data over a high-speed line.
Why This Doesn’t Work… • Outside of Qatar, international telephone line charges are now about $0.02/minute. For 100 users, this works out to $120/hour which is certainly affordable if users need dial-up.
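The dial-up arithmetic in this example is easy to check. The sketch below uses the per-minute rates quoted on the slides ($0.90 then, $0.02 now) for 100 users; the function name is hypothetical, and rates are held in whole cents to avoid floating-point rounding.

```python
def hourly_cost_dollars(users, rate_cents_per_minute):
    """Cost in dollars for `users` simultaneous callers over one hour."""
    return users * rate_cents_per_minute * 60 / 100


old_cost = hourly_cost_dollars(100, 90)  # $0.90 per minute per user
new_cost = hourly_cost_dollars(100, 2)   # $0.02 per minute per user

print(f"Then: ${old_cost:,.0f}/hour  Now: ${new_cost:,.0f}/hour")
# prints "Then: $5,400/hour  Now: $120/hour"
```

The 45-fold drop in the per-minute rate is what removes the original cost argument for placing a database node in every region.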
Why This Doesn’t Work (2) • High Speed Internet costs have dropped as well: • 512 Kbps (effectively about 380 Kbps) costs about US$60/month in Qatar. • In Canada, 6,500 Kbps costs about US$45/month. • So, it’s not a problem for everyone to connect to the database in London.
Why This Doesn’t Work (3) • DDBMS also have a great deal of difficulty with: • Synchronizing data: How do you manage concurrency across thousands of miles and different networks and telephone companies? (You thought database locking on a local machine was hard)
Networking: • DDBMS require very high speed networks. There is a lot of data to be synchronized constantly • DDBMS need very fault tolerant networks. Network paths between nodes need to be redundant and reliable • These networks are very, very expensive • Security: How do you make sure the data is being transmitted between nodes securely?
Increased Storage: You are copying data in every location. This requires duplicate hardware (SANs are not cheap) and a lot of extra disk space. • Increased demand for very specialized expertise. The knowledge of how to look after a DDBMS is not easy to come by. These people are in demand.
Where a DDBMS Makes Sense • When you can copy the same metadata across all systems but the actual data is geographically specific. • E.g. the customer and employee data for Qatar is stored in Doha and nowhere else; the customer and employee data for Europe is stored in London and nowhere else. If an employee transfers from London to Doha, his record is physically moved from the London database server to the Doha database server.
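A rough sketch of this geographic split, assuming each node is just a dictionary of records keyed by employee ID: the schema (metadata) is shared by every node, while each record lives on exactly one node. The class, the field names, and the sample record are hypothetical.

```python
# Shared metadata: every node uses the same record layout.
EMPLOYEE_SCHEMA = ("emp_id", "name", "office")


class RegionalDirectory:
    """Hypothetical DDBMS in which each record lives on exactly one node."""

    def __init__(self, regions):
        self.nodes = {region: {} for region in regions}

    def add(self, region, record):
        assert tuple(record) == EMPLOYEE_SCHEMA, "record must match the schema"
        self.nodes[region][record["emp_id"]] = record

    def locate(self, emp_id):
        """Return the single region that holds this employee's record."""
        for region, records in self.nodes.items():
            if emp_id in records:
                return region
        return None

    def transfer(self, emp_id, src, dst):
        """Physically move a record: remove it from src, store it in dst."""
        record = self.nodes[src].pop(emp_id)
        record["office"] = dst
        self.nodes[dst][emp_id] = record


directory = RegionalDirectory(["London", "Doha"])
directory.add("London", {"emp_id": 17, "name": "Sam", "office": "London"})

before = directory.locate(17)
directory.transfer(17, "London", "Doha")
after = directory.locate(17)
print(before, after)
```

Since no record is duplicated across nodes, there is nothing to keep synchronized between London and Doha apart from the shared schema.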
Other uses of DDBMS • Disaster Recovery planning for some HA Financial Systems as well as public health and safety systems: • Credit Card authorizations (Visa, MasterCard) • Banking Systems (ATMs) • Public Utilities (999 service and Telephone companies) • Air Traffic Control Systems
Assessment of DDBMS • There are very few reasons to have a DDBMS • They are expensive to set up and run • They have problems in managing data synchronization (making sure that all the data is up to date in all nodes) • There are usually better, cheaper options to share the data across a large geographic area.
Acknowledgements • SAN photograph: www.nasi.com/images/IBM_SAN256M.jpg • SAN Configuration: http://www.microsoft.com/library/media/1033/technet/images/itsolutions/wssra/raguide/storagedevices/igsdpg03_big.gif