Distributed Information Systems - an introductory discussion DIS
Topics
• What is a Distributed Information System (DIS)?
• Is a distributed information system a distributed database system?
• Is a distributed database system a distributed information system?
• Related systems (parallel systems, cloud systems, multidatabase systems, virtual systems, …)
Distributed Information System
https://www.igi-global.com/dictionary/overview-ontology-driven-data-integration/8063
• A set of information systems physically distributed over multiple sites, which are connected by some kind of communication network.
• A system in which cooperating applications run on different processing nodes, and the information asset, although logically unique, is hosted across different processing nodes.
Distributed Computing
https://en.m.wikipedia.org/wiki/Distributed_computing
• Distributed computing is a field of computer science that studies distributed systems.
• A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
• A computer program that runs within a distributed system is called a distributed program.
• Distributed programming is the process of writing such programs.
Three significant characteristics of distributed systems
• Concurrency of components
• Lack of a global clock
• Independent failure of components
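Because there is no global clock, distributed programs often order events with logical clocks instead of wall-clock time. The following is a minimal sketch of a Lamport logical clock, a standard technique that the slides do not name; the class and method names are illustrative assumptions.

```python
class LamportClock:
    """Logical clock: a per-process counter, not wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                    # local event on process a
stamp = a.send()            # a sends a message carrying its timestamp
b_time = b.receive(stamp)   # b's clock moves ahead of the send timestamp
print(stamp, b_time)
```

The two processes never consult a shared clock; the only guarantee is that a receive is always stamped later than the matching send.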
https://www.ibm.com/support/knowledgecenter/en/SSAL2T_8.2.0/com.ibm.cics.tx.doc/concepts/c_wht_is_distd_comptg.html
• A distributed computer system consists of multiple software components that are on multiple computers, but run as a single system.
• A distributed system can consist of any number of possible configurations, such as mainframes, personal computers, workstations, minicomputers, and so on.
• The goal of distributed computing is to make such a network work as a single computer.
Benefits of DIS over centralized systems
• Scalability
  • The system can easily be expanded by adding more machines as needed.
• Redundancy
  • Several machines can provide the same services, so if one is unavailable, work does not stop. Additionally, because many smaller machines can be used, this redundancy does not need to be prohibitively expensive.
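The redundancy point can be sketched as client-side failover: if one machine is unavailable, the request goes to the next one offering the same service. This is a minimal illustration, not a production pattern; `call_with_failover`, `down`, and `up` are hypothetical names, and each replica is modeled as a plain function.

```python
def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next machine
    raise ConnectionError(f"all {len(replicas)} replicas failed") from last_error


def down(request):
    # Hypothetical replica that is currently unavailable.
    raise ConnectionError("replica unavailable")


def up(request):
    # Hypothetical healthy replica offering the same service.
    return f"handled {request}"


result = call_with_failover([down, up], "GET /status")
print(result)
```

Work stops only when every replica fails, which is the case the final `raise` covers.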
Distributed Database Management System (DDBMS)
https://www.techopedia.com/definition/14686/distributed-database-management-system-ddbms
• A distributed database management system (DDBMS) manages a set of multiple, logically interrelated databases distributed over a network. It provides a mechanism that makes the distribution of data transparent to users.
• A DDBMS may be designed to work across heterogeneous database platforms, that is, sites that run different database management systems.
A good tutorial on DIS
https://www.freecodecamp.org/news/a-thorough-introduction-to-distributed-systems-3b91562c9b3c/
• In its simplest definition, a distributed system is a group of computers working together so as to appear as a single computer to the end user.
• These machines have a shared state, operate concurrently, and can fail independently without affecting the whole system’s uptime.
• A distributed system enables you to scale horizontally.
• Scaling horizontally means adding more computers rather than upgrading the hardware of a single one.
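One common way to put added machines to work is to partition data across them by key. The sketch below assumes simple hash-modulo partitioning, which the slides do not prescribe; `node_for` and the node names are hypothetical.

```python
import hashlib


def node_for(key, nodes):
    """Map a key to one node using a stable hash (same key, same node)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-a", "node-b"]
keys = [f"user:{i}" for i in range(1000)]
before = {k: node_for(k, nodes) for k in keys}

# Scale horizontally: add a third machine and recompute the placement.
nodes_after = nodes + ["node-c"]
after = {k: node_for(k, nodes_after) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved after adding a node")
```

With plain modulo placement, many keys change nodes whenever a machine is added; schemes such as consistent hashing exist to reduce that movement.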
Limits of Vertical Scaling
• Vertical scaling can raise performance only as far as the latest hardware’s capabilities allow.
• These capabilities prove insufficient for technology companies with moderate to large workloads.
Benefits of DIS
• Easy scaling
• Fault tolerance
  • A cluster of ten machines across two data centers is inherently more fault-tolerant than a single machine. Even if one data center catches fire, your application would still work.
• Low latency
  • Distributed systems allow you to place a node closer to the client, which reduces the round-trip time of the traffic.
Issues and Challenges in Building Distributed Systems
• Maintaining consistency of information across the multiple nodes
  • For example, propagating new information from the master to the slave does not happen instantaneously.
• Trade-offs among consistency, availability, and partition tolerance, that is, the CAP theorem in distributed databases.
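The propagation delay mentioned above can be illustrated with a toy model in which writes reach the second node only when a replication step runs. This is a sketch under simplifying assumptions (in-memory dictionaries, no real network); the `Primary` and `Replica` classes are hypothetical and stand for the master and slave on the slide.

```python
from collections import deque


class Primary:
    def __init__(self):
        self.data = {}
        self.log = deque()  # writes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def apply_from(self, primary):
        # Drain pending writes; a real system does this after a network delay.
        while primary.log:
            key, value = primary.log.popleft()
            self.data[key] = value


primary, replica = Primary(), Replica()
primary.write("x", 1)
stale = replica.read("x")    # the write has not propagated yet
replica.apply_from(primary)
fresh = replica.read("x")    # the replica has caught up
print(stale, fresh)
```

Between the write and the replication step, a client reading from the replica sees old data, which is exactly the consistency problem the slide describes.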
CAP Theorem
• The CAP theorem states that a distributed system cannot simultaneously be consistent, available, and partition-tolerant.
• See https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/ for the illustrated proof.
CAP Theorem
• Consistency: Any read operation that begins after a write operation completes must return the value written by that operation, or the result of a later write operation.
• Once a client writes a value to any server and gets a response, it expects to get that value (or a fresher value) back from any server it reads from.
CAP Theorem
• Availability: Every request received by a non-failing node in the system must result in a response.
• In an available system, if a client sends a request to a server and the server has not crashed, then the server must eventually respond to the client. The server is not allowed to ignore the client's requests.
CAP Theorem
• Partition tolerance: The network is allowed to lose arbitrarily many messages sent from one node to another.
• That is, any messages that G1 and G2 (the two servers in the linked proof) send to one another can be dropped.
• To be partition-tolerant, the system has to function correctly despite arbitrary network partitions.
• The CAP Theorem: A system cannot simultaneously have all three.
• Proof by contradiction: see https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/
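The shape of the argument can be sketched with two servers, named G1 and G2 after the linked proof, that start with the same value and then lose every message between them. This toy model is an illustration rather than the proof itself; the `Server` class, the `prefer` setting, and the use of `None` to stand for "no response" are assumptions.

```python
class Server:
    def __init__(self, value, prefer):
        self.value = value
        self.prefer = prefer      # "availability" or "consistency"
        self.peer = None
        self.partitioned = False  # True: every message to the peer is dropped

    def write(self, value):
        if self.partitioned:
            if self.prefer == "consistency":
                return None       # no response: the write cannot be replicated
            self.value = value    # accept locally; the peer never hears of it
            return "ok"
        self.value = value
        self.peer.value = value   # replicate while the network works
        return "ok"

    def read(self):
        return self.value


def run(prefer):
    g1, g2 = Server("v0", prefer), Server("v0", prefer)
    g1.peer, g2.peer = g2, g1
    g1.partitioned = g2.partitioned = True   # the network partitions
    ack = g1.write("v1")                     # client writes to G1
    return ack, g2.read()                    # then reads from G2


print(run("availability"))
print(run("consistency"))
```

When the servers favor availability, the write is acknowledged but G2 still returns the old value, so consistency is lost. When they favor consistency, the write gets no response, so availability is lost.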
False assumptions that everyone makes when developing a distributed application for the first time:
http://barbie.uta.edu/~jli/Resources/MapReduce&Hadoop/Distributed%20Systems%20Principles%20and%20Paradigms.pdf
• The network is reliable.
• The network is secure.
• The network is homogeneous.
• The topology does not change.
• Latency is zero.
• Bandwidth is infinite.
• Transport cost is zero.
• There is one administrator.
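Dropping the first assumption, that the network is reliable, usually means a caller must expect lost messages and retry. The sketch below shows retry with exponential backoff against a simulated link; `send_with_retry` and `FlakyLink` are hypothetical names, and the delays are arbitrary illustrative values.

```python
import time


def send_with_retry(send, payload, attempts=4, base_delay=0.01):
    """Call send(payload); on TimeoutError, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return send(payload)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(base_delay * 2 ** attempt)


class FlakyLink:
    """Hypothetical stand-in for a network call that drops the first few messages."""

    def __init__(self, drops):
        self.drops = drops
        self.calls = 0

    def __call__(self, payload):
        self.calls += 1
        if self.calls <= self.drops:
            raise TimeoutError("no reply")
        return f"ack:{payload}"


link = FlakyLink(drops=2)
reply = send_with_retry(link, "ping")
print(reply, link.calls)
```

Retrying means a request may be delivered more than once, so operations sent this way should be safe to repeat.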
TYPES OF DISTRIBUTED SYSTEMS
• Distributed Computing Systems
  • Cluster Computing Systems
  • Grid Computing Systems
• Distributed Information Systems
  • Transaction Processing Systems
  • Enterprise Application Integration
• Distributed Pervasive Systems
  • Distributed systems of mobile and embedded computing devices
  • Instability is the default behavior.
  • Examples: Home Systems, (Personal) Electronic Health Care Systems, Sensor Networks
• Others?