Chapter 3 Distributed Data Processing
Centralized Data Processing • Data processing is done on one computer or on a cluster of computers located in a central data processing facility • Users transmit data to the centralized data processing facility, where it is processed by applications running on the computers located there • The data processing for an application does not take place on the user’s computing device
Centralized Data Processing • Centralized Computers • One or more computers are located in a central facility • Centralized Processing • All applications are run on computers in the central data processing facility • Centralized Data • Most data are stored in files and databases at the central facility • Centralized Control • The central facility is managed by a data processing or information security manager • Centralized Support Staff • Must include a technical support staff to operate and maintain the data center hardware and applications
Distributed Data Processing (DDP) • Computers are dispersed throughout an organization • Objective is to process information in a way that is most effective based on operational, economic, and/or geographic considerations • May include a central data center plus satellite data centers or it may resemble a community of peer computing facilities • Various computers in the system must be connected to one another • A DDP facility involves the distribution of computers, processing, and data
Table 3.2 Potential Benefits of Distributed Data Processing (page 1 of 2)
Table 3.2 Potential Benefits of Distributed Data Processing (page 2 of 2)
Data Center Computing and Storage Technologies • Mainframes • Sales continue to be strong and they are increasingly being used as a hub for enterprise infrastructure because of their potential to enhance security, ensure availability, and improve manageability • In-memory computing systems • Processors include terabyte-plus RAM capable of storing large data sets • Have the potential to revolutionize business intelligence (BI) by making it possible to bring the equivalent of a data warehouse into memory to enable real-time data mining and business analytics
Virtualization • The creation of a virtual (rather than actual) version of something • In computing this means creating virtual versions of operating systems, servers, storage devices, and networks • Categories: • Operating system virtualization • Server virtualization • Storage virtualization • Network virtualization
Intranets • Provides users of client devices with applications associated with the Internet but isolated within the organization • Key features: • Uses Internet-based standards such as HyperText Markup Language (HTML) and the Simple Mail Transfer Protocol (SMTP) • Uses the TCP/IP protocol suite applications and services • Includes wholly owned content that is not accessible to external users over the public Internet • Such content can also be accessed by authorized internal users even though the corporation has Internet connections and runs a Web server on the Internet
Extranets • Makes use of TCP/IP protocols and applications, especially the Web • Distinguishing feature is that it provides access to corporate resources by authorized outside clients • This outside access can be provided via the company’s connections to the Internet or through other data communications networks • Gives authorized outside clients fairly extensive access to corporate resources • Typical model of operation is client/server
Cloud Computing • Encompasses any subscription-based or pay-per-use service that extends an organization’s existing IT capabilities over the Internet in real time • Enables businesses to increase capabilities or capacity without investing in new infrastructure, licensing new software, or training personnel • Forms of cloud computing: • Software as a service (SaaS) • Infrastructure as a service (IaaS) • Platform as a service (PaaS) • Managed service providers (MSP)
Distributed Applications • Two dimensions characterize the distribution of applications • Allocation of application functions within the network • One application may be split up into components that are dispersed among multiple computers • One application may be replicated on different computers • Different applications may be distributed among different computers • Whether the distribution of the application is vertical or horizontal
Other Forms of DDP • Distributed devices • Automated teller machines (ATMs) • Factory automation • Network management • Centralized systems provide management and control of distributed nodes • At least some of the computers in the distributed system must include some management and control logic to enable them to interact with the central network management system
Database Management Systems (DBMS) • Database • A structured collection of data stored for use in one or more applications • In addition to data, a database contains the relationships between data items and groups of data items • DBMS • A suite of programs for constructing and maintaining the database and for offering ad hoc query capabilities to multiple users and applications • Query language • Provides a uniform interface to the database
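To make the "uniform interface" point concrete, here is a minimal sketch using SQL as the query language and Python's built-in sqlite3 module as a stand-in DBMS. The slides do not name a particular query language or product, so SQL, the table names, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    # The database stores data items and the relationship between them:
    # each employee row points at a department row through dept_id.
    conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE employee ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "dept_id INTEGER REFERENCES department(id))"
    )
    conn.executemany(
        "INSERT INTO department VALUES (?, ?)",
        [(1, "Sales"), (2, "Engineering")],
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Ana", 1), (2, "Raj", 2), (3, "Li", 2)],
    )
    return conn


def employees_in(conn: sqlite3.Connection, dept_name: str) -> list:
    """Ad hoc query: names of employees in a department, sorted by name."""
    rows = conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN department d ON e.dept_id = d.id "
        "WHERE d.name = ? ORDER BY e.name",
        (dept_name,),
    )
    return [row[0] for row in rows]
```

Any application that speaks the query language can ask a new question of the same data without knowing how the DBMS stores it; only the query text changes.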
Database Organization • Distributed database • A collection of several different databases, distributed among multiple computers, that looks like a single database to the user • DBMS controls access • Three ways of organizing data for use by an organization: • Centralized • Replicated • Partitioned
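The difference between replicated and partitioned organization can be sketched in a few lines of Python. This is an illustration only: the site names are hypothetical, plain dictionaries stand in for the per-site databases, and choosing a partition by hashing the key is an assumption (the slides do not say how data is assigned to sites).

```python
from zlib import crc32

# Hypothetical sites, each holding one local database.
SITES = ["site_a", "site_b", "site_c"]


def new_stores() -> dict:
    """One empty in-memory store per site."""
    return {site: {} for site in SITES}


def partition_site(key: str) -> str:
    """Partitioned: pick the single site that owns this key (hash-based, assumed)."""
    return SITES[crc32(key.encode("utf-8")) % len(SITES)]


def partitioned_write(stores: dict, key: str, value) -> None:
    """Store the record at exactly one site, so sites hold disjoint data."""
    stores[partition_site(key)][key] = value


def replicated_write(stores: dict, key: str, value) -> None:
    """Replicated: store a copy of the record at every site."""
    for site in SITES:
        stores[site][key] = value


def partitioned_read(stores: dict, key: str):
    """Reads go to the owning site; returns None if the key is unknown."""
    return stores[partition_site(key)].get(key)
```

In this sketch a replicated write touches every site, which keeps reads local but multiplies update traffic; a partitioned write touches one site, so a remote read may need the network.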
Centralized Versus Distributed Databases • Centralized • Housed in a central computer facility • Users and applications can be at a remote location • Desirable when the security and integrity of the data are paramount • Often used with a vertical DDP organization • Distributed • Design of data organization is more understandable and easier to implement • Data can be stored locally under local control • Confines the effects of a computer breakdown to its point of occurrence • Collection of data and the number of users is not limited by a single computer’s size and processing power
Table 3.6 Advantages and Disadvantages of Database Distribution Methods
Availability and Performance • Availability • The percentage of time that a particular function or application is available for users • Can be “desirable” or “essential” • High availability requirements • Distributed system must be designed so that the failure of a single computer or device within the network does not deny access to the application • Communications links and equipment must be highly available • Some form of link and communication equipment redundancy and backup is needed • Performance • Response time is critically important for highly interactive applications • Network must have sufficient capacity and flexibility to provide the required response time • If time is not critical, the major network performance concern is throughput • The network must be designed to handle large volumes of data
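The effect of redundancy on availability can be illustrated with a short calculation. This sketch assumes that redundant links or devices fail independently and that one working copy is enough to keep the application reachable; neither assumption comes from the slides, and the function names are hypothetical.

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Fraction of time a function or application was available to users."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return uptime_hours / total_hours


def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of a group of redundant components.

    Assumes independent failures: the group is unavailable only when
    every copy is down at the same time.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component_availability) ** copies
```

Under these assumptions, a link that is available 90 percent of the time reaches roughly 99 percent when a second, independent backup link is added, which is why high availability requirements usually imply redundancy.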
Summary • Centralized and distributed organization • Technical trends leading to distributed data processing • Management and organizational considerations • Data center evolution • Client/server architecture • Intranets and extranets • Web services and cloud computing • Distributed applications • Other forms of DDP • Database management systems • Centralized versus distributed databases • Replicated and partitioned databases • Networking implications of DDP • Big data infrastructure considerations