COSC 3P93 Seminar: Distributed Computing Brandon Visser
Distributed Computing Seminar Overview: • Distributed Computing: What is it? • Why it’s so useful • Its relation to the world of Parallel Computing • How Distributed Computing Works • Different DC ‘architectures’ • Good DC Problems; Bad DC Problems • Applications of Distributed Computing • The Future of DC
Before we start: • Distributed Computing vs Grid Computing vs P2P • Grid Computing: • Computational grid • Usually focused on dedicated workstations, servers, and mainframes • Jobs on huge datasets that run for days • Distributed Computing: • ‘Subset’ of Grid Computing • Geared to pooling the resources of networked end-user PCs • Much more limited in memory/CPU power • Primary use of these PCs is serving their users, not distributed computing • Peer to Peer (P2P): • Network computing system in which all PCs are treated as equals on the network • May share resources such as hard drives, CD-ROMs, etc. • Best known as a way of sharing MP3s and other media over the Internet • Kazaa/Napster
Distributed Computing: What is it? • Many ways to define Distributed Computing • Been around for years • Various vendors • General DC Definition: “Distributed Computing is any computing that involves multiple computers remote from each other that each have a role in a computation problem or information processing.” • Seminar will focus on DC systems distributed across the Internet • Recent technological jumps have made DC more attractive • Increased bandwidths • Extremely fast CPUs
Why Distributed Computing? • Distributing a problem over a large network has many advantages: • Easy on the wallet • SETI@home FAQ • Reliability • Raw Performance
Why Distributed Computing? (contd) • Case Study: Brock University • Utilization of Brock’s User Services Lab Computers: • Several computer labs, all containing P4 computers with speeds between 1.6 and 2.4 GHz
Why Distributed Computing? (contd) • 417 Intel P4 computers with an average speed of 1.8 GHz • Using commercial DC software, equivalent to the speed of: Source: http://www.ud.com ; United Devices
Why Distributed Computing? (contd) Source: www.extremetech.com
Why Not Distributed Computing? • However, we must not fool ourselves into thinking it’s the best parallel solution for every application • Central server still needed for coordination • Finding client machines is not an automatic process • Data dependencies • Slow communication channels compared to typical parallel architectures
DC’s Relation to Parallel Computing • Similar in concept to Parallel Computing, but we must distinguish between the two • Parallel computing has the advantage over Distributed computing because of the close range of the processors • Communication between processors is much faster • Better suited than DC for problems requiring inter-processor communication and dependent variables
DC’s Relation to Parallel Computing (contd) • Beginning to see support for parallel machines • Windows XP now has support for up to 2 CPUs • Linux/Unix: many CPUs • For now, not widespread, and applications must be programmed with multiple CPUs in mind. This can create platform dependencies. • Doom 3, Quake 3, Adobe Photoshop • Clustering: grouping of workstations connected together in a local-area network with applied middleware to make them act like a parallel machine.
DC’s Relation to Parallel Computing (contd) • Beowulf is the most popular example of a clustering system • Runs on Linux/Unix systems • Inexpensive form of parallel computing • Support for these systems is still fairly limited • Closest parallel architecture example to DC
How Distributed Computing Works • DC Systems Today consist of: • Lightweight software agents • Dedicated DC Management Servers • Role of Client End: • Agent notifies server when system is idle (often a screen saver) • Agent requests data from server • Computes when it has spare CPU cycles • Control given back to user immediately upon input from mouse or keyboard
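The client-side behaviour on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not any vendor's agent: `process_unit`, the `user_active` callback, and the sum-of-squares workload are hypothetical names chosen for the example. It shows the agent checking for user input between small steps of work, so control can be handed back at once and the partial result resumed later.

```python
from typing import Callable, List, Tuple


def process_unit(
    unit: List[int],
    user_active: Callable[[], bool],
    start: int = 0,
    partial: int = 0,
) -> Tuple[bool, int, int]:
    """Work through one data package in small steps.

    Returns (finished, next_index, partial_result) so that an
    interrupted package can be resumed when the machine is idle again.
    The workload (a sum of squares) is only a stand-in for real work.
    """
    total = partial
    for i in range(start, len(unit)):
        # Hypothetical check for mouse/keyboard input before each step:
        # if the user is back, stop immediately and keep the progress.
        if user_active():
            return False, i, total
        total += unit[i] * unit[i]
    return True, len(unit), total


if __name__ == "__main__":
    # Simulated input events: idle, idle, then the user returns.
    events = iter([False, False, True])
    print(process_unit([1, 2, 3], lambda: next(events)))
```

A real agent would replace `user_active` with an operating-system idle check and would send the finished result back to the management server.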
How Distributed Computing Works (contd) • Important that control is returned as soon as user requests • Any delay would likely be unacceptable • Role of Distributed Computing Management Server • Divide large tasks into smaller tasks • Monitor jobs currently being run • Receive results from clients and assemble • Usually a database would help with this • If a server doesn’t hear from a client for a long time, it can: • Assume the user is on the machine • Send the same package to another client
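The server's role described on this slide (divide, monitor, reassign after a long silence, assemble) might look something like the sketch below. Everything here is an assumption made for illustration: the `WorkServer` class, its method names, the in-memory dictionaries standing in for a database, and the summing workload. The current time is passed in explicitly so the example stays deterministic.

```python
from typing import Dict, List, Optional, Tuple


class WorkServer:
    """Toy management server: splits a job, tracks units, reassigns stale ones."""

    def __init__(self, data: List[int], unit_size: int, timeout: float) -> None:
        # Divide the large task into smaller packages keyed by unit id.
        starts = range(0, len(data), unit_size)
        self.units: Dict[int, List[int]] = {
            uid: data[s:s + unit_size] for uid, s in enumerate(starts)
        }
        self.timeout = timeout
        self.assigned: Dict[int, float] = {}  # unit id -> time it was handed out
        self.results: Dict[int, int] = {}     # unit id -> result from a client

    def request(self, now: float) -> Optional[Tuple[int, List[int]]]:
        """Hand out a unit that is unassigned or whose client has gone quiet."""
        for uid, chunk in self.units.items():
            if uid in self.results:
                continue
            handed_out = self.assigned.get(uid)
            if handed_out is None or now - handed_out > self.timeout:
                self.assigned[uid] = now
                return uid, chunk
        return None

    def submit(self, uid: int, value: int) -> None:
        # Keep the first result; a late duplicate from a slow client is ignored.
        self.results.setdefault(uid, value)

    def assemble(self) -> Optional[int]:
        """Combine the results once every unit has reported back."""
        if len(self.results) < len(self.units):
            return None
        return sum(self.results.values())
```

In this sketch a client that never reports back simply has its package handed to the next client that asks after `timeout` has elapsed, which mirrors the "send the same package to another client" bullet.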
How Distributed Computing Works (contd) • Other things to keep in mind • Architecture requirements increase with size of network • Server • Client • Network • Security and authentication • Resource identification • Know client PC characteristics
Distributed Computing Architectures: • Several different solutions for DC available • Some commercial, some open source • Current Vendors of DC Systems: • Entropia • DataSynapse • Sun • Parabon • Avaki • United Devices
Distributed Computing Architectures: • We will take a look at two types of Architectures • Entropia • DataSynapse’s LiveCluster • Entropia’s System: • Known as a “Hub and Spoke” with the Server at the hub. • No communication between individual nodes • Data communicated back and forth between server and clients as batch jobs • Works on virtually any computer with a connection to the internet (Dial up or dedicated line)
Distributed Computing Architectures (contd): Picture from www.entropia.com
Distributed Computing Architectures (contd): • LiveCluster • Inter-client communication as well as communication between client and server • Inter-client communication comes in 20 ms ‘bursts’ • Advantage of this: • Applications can be divided into tasks that have mutual dependencies • Takes some load off server • Drawbacks • Most effective on an internal network or broadband Internet.
Distributed Computing Architectures (contd): Picture from www.datasynapse.com
Distributed Computing Problems: Bad DC Problems: • “The closer an application is to running in real time, the less appropriate DC is” http://www.extremetech.com • Systems that run for only a couple of hours may not see much of a benefit from DC because of overhead
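To see why short jobs gain little, a rough back-of-the-envelope model helps. The sketch below is an assumption, not something taken from the cited sources: it treats overhead (splitting the job, shipping data, collecting results) as a fixed cost and assumes the remaining work divides evenly across clients. The function name and all numbers are made up for illustration.

```python
def estimated_speedup(serial_hours: float, clients: int, overhead_hours: float) -> float:
    """Speedup under a simple model: fixed overhead plus evenly divided work."""
    distributed_hours = overhead_hours + serial_hours / clients
    return serial_hours / distributed_hours


if __name__ == "__main__":
    # Hypothetical figures: 100 clients, 1 hour of fixed overhead.
    print(estimated_speedup(2.0, 100, 1.0))     # short job
    print(estimated_speedup(2000.0, 100, 1.0))  # long job
```

With these made-up numbers, the 2-hour job is sped up by a factor of just under 2, while the 2000-hour job gets a factor of roughly 95 from the same 100 clients.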
Distributed Computing Problems (contd): • Good DC Problems • Most appropriate applications are those which exhibit “loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio." www.entropia.com • High compute-to-communication ratio also important • Any problem that fully extends the Coarse-Grain Parallelism principle: “it should be possible to partition the application into independent tasks or processes that can be computed concurrently” http://www.extremetech.com
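As a small illustration of partitioning a problem into independent tasks, the sketch below counts primes in a range by splitting the range into chunks that need no communication with each other. It is only a stand-in: local threads play the part of remote clients, and the names `partition`, `count_primes`, and `distributed_count` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def is_prime(n: int) -> bool:
    """Trial division; deliberately simple, compute-heavy relative to its input."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds: Tuple[int, int]) -> int:
    """One independent task: count primes in [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def partition(lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Split [lo, hi) into at most `parts` contiguous, non-overlapping chunks."""
    step = max(1, -(-(hi - lo) // parts))  # ceiling division, at least 1
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]


def distributed_count(lo: int, hi: int, parts: int) -> int:
    """Run each chunk concurrently and combine the partial counts."""
    tasks = partition(lo, hi, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(count_primes, tasks))


if __name__ == "__main__":
    print(distributed_count(0, 100, 4))
```

Each task receives only two integers and returns one, while doing many divisions in between, which is the kind of high compute-to-communication ratio the slide describes.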
Distributed Computing Problems (contd): • Examples of good DC Problems: • Complex Modeling and Simulation techniques • Car crash simulations • Weather forecasting • AI: Exhaustive Search techniques • Life Sciences: • Sequencing the human genome “As a result of sequencing the human genome, the number of identifiable biological targets for today's drugs is expected to increase from about 500 to about 10,000. Pharmaceutical firms have repositories of millions of different molecules and compounds, some of which may have characteristics that make them appropriate for inhibiting newly found proteins. The process of matching all these "ligands" to their appropriate targets is an ideal task for distributed computing, and the quicker it's done, the quicker and greater the benefits will be. Another related application is the recent trend of generating new types of drugs solely on computers.” http://www.extremetech.com
Applications of Distributed Computing • Commercial and Non-Commercial • Commercial • Market themselves to any corporation, engineer or scientist who needs to crunch huge amounts of numbers but cannot afford a supercomputer • Often the company promises to “pay” the end systems’ users for borrowing wasted CPU cycles • Several Commercial DC Companies: • United Devices • http://www.ud.com • Parabon Computation • http://www.parabon.com/
Applications of Distributed Computing (contd) • A quick word on For-Profit or Commercial DC: • Concerns as to viability of for-profit Distributed Computing • Anyone would choose to “get paid” for running DC software • Process Tree Network • Started a for-pay DC system in January 2001 • Paid clients $12.50 per month to run their software • Parent company went bankrupt in May 2001 (lack of funding) • Perhaps Distributed Computing is best suited to non-profit purposes • Personal hobbies and interests.
Applications of Distributed Computing (contd) • DC has seen most success in volunteer-based projects • SETI@Home • Arguably most successful active project • Search for Extraterrestrial Intelligence • Google Toolbar! • Folding@Home • Simulates protein folding • Supported by Intel • 150 000 active CPUs • United Devices Cancer Research • distributed.net • Cryptography • Complete List of current and finished Projects: http://distributedcomputing.info/projects.html
Applications of Distributed Computing (contd) • Let’s take a closer look at SETI@Home • Homepage: http://setiathome.ssl.berkeley.edu/ • Project Statistics:
Applications of Distributed Computing (contd) • Notable Completed Projects: • RSA Factoring By Web • First large scale project to factor a 130 digit number • Completed on April 10, 1996 • Internet Animation ’99 • Proof of concept • Using DC system as a render farm • Used nothing more than email and a web page • Completed August, 1999
Applications of Distributed Computing (contd) • Safer Markets Project • Ran on the Entropia platform • April 2001-Jan 2002 • Goal was to find a formula which could predict stock market volatility • As soon as the project ended, the site was taken down and the URL was forwarded to Entropia’s homepage.
Future of Distributed Computing • Distributed Computing is becoming recognized as a practical platform for solving large computational problems • Some of the biggest names in the industry are getting their feet wet and are currently in the news: • IBM • World Community Grid Project • Intel • Intel Peer-to-Peer Accelerator Kit • Middleware for DC Applications
Future of Distributed Computing (contd) • Eventually, inter-communication between client nodes on large projects over the Internet • Currently we share information over the net with projects such as SETI@home, but not computational resources: too risky • Thanks for your time!
Useful Links and Resources • DC Central • http://library.thinkquest.org/C007645/english/0-welcome.htm • Wikipedia • http://en.wikipedia.org/wiki/Distributed_computing • Distributed.net • http://www.distributed.net • Extreme Tech website • http://www.extremetech.com/ • Distributed Computing • http://distributedcomputing.info/