Cluster Computing in the Classroom: Topics, Guidelines, and Experiences Amy Apon Department of Computer Science & Computer Engineering University of Arkansas
Clusters and Data Engineering • A cluster is a set of whole computers connected via a network and used as an integrated resource to run a single application • Increases throughput for massive data processing • Inexpensive: uses commodity computers with many disks and ample disk space
Teaching Challenges • Prerequisites are difficult to establish • One course does not fit all! We propose: • Cluster teaching material organized as modules • Usable in a variety of situations
Outline • Overview of target audience for the proposed teaching materials • Description of course modules • Problem areas • Conclusions • Acknowledgements and references
Courseware developed with: Dr. Amy Apon Dr. Jens Mache Dr. Hai Jin Dr. Rajkumar Buyya
Who our students are • Juniors, seniors, graduate students • With a variety of preparation • Operating Systems? Maybe haven’t seen threads • Computer Networks? Maybe haven’t seen sockets • Computer Architecture? Maybe don’t understand how caches work
Course Units • Needed because of the diversity of institutions and student preparation • Matched to the Computing Curricula 2001 to avoid overlap with existing courses • Basic Units (have overlap with ACM Core) • Core Units (essential to cluster computing) • Extended Units (more advanced, optional)
Course Units Can Be Combined • We propose sample courses with an emphasis on one of • Architecture • Programming • Algorithms and Applications
Five Basic Units • Programming Fundamentals (PF2, PF5): algorithms and problem-solving, event-driven programming (3 hours total) • Architecture and Organization 4 (AR4): memory system organization (1 hour) • Architecture and Organization 7 (AR7): multiprocessing architectures (1 hour) • Operating Systems 3 (OS3): concurrency (1 hour) • Net-Centric Computing 2 (NC2): communication and networking (2 hours)
Ten Core Units • Algorithms and Complexity 4 (AL4): distributed algorithms (1 hour) • Algorithms and Complexity 11 (AL11): parallel algorithms (3-7 hours) • Architecture and Organization 7 (AR7): multiprocessing and alternative architectures (2 hours) • Architecture and Organization 9 (AR9): architectures for networks and distributed systems (1-4 hours) • Operating Systems 11 (OS11): system performance evaluation (1-2 hours)
Ten Core Units, continued • Net-Centric Computing 2 (NC2): communication and networking (1 hour) • Net-Centric Computing 6 (NC6): network management (1-2 hours) • Social and Professional Issues 9 (SP9): economic issues in computing (2 hours) • Software Engineering 2 (SE2): using APIs, such as basic MPI or PVM and basic PVFS (2 hours) • Computational Science 4 (CN4): high-performance computing (6 or more hours)
Many Choices for Extended Units! • Software Engineering (SE3), software tools and environments: debugging tools • Operating Systems (OS8): parallel file systems • Algorithms (AL11): advanced parallel algorithms • Architecture and Organization (AR9): architecture for networks and distributed systems • Graphics and Visualization (GV9) • Intelligent Systems 4 (IS4): advanced search • Information Management (IM8, IM9, IM10, IM11): distributed databases, physical database design, data mining, and information storage and retrieval on clusters • Computational Science (CN1, CN3)
Cluster Architecture Emphasis • Requirements similar to those for a course in advanced computer architecture • Suited for advanced undergraduates and graduate students who have completed • Computer organization • Computer networks • Operating systems • Programming
Programming Emphasis • Suited for undergraduates with exposure to • Data structures and algorithms • Computer organization • Can use a general-access computer lab/LAN (if performance is not an issue) • Can use generally available programming environments
Cluster Programming Topics • Shared memory programming, leading to a discussion of NUMA • Sockets, leading to a discussion of network overhead and low-latency protocols • Parallel programming using MPI • Middleware: Java RMI, CORBA
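The sockets topic, and the network overhead discussion it leads to, could be introduced with a small ping-pong exercise. The sketch below is only an illustration and not part of the original courseware: it assumes Python, runs both ends on the loopback interface of one machine, and uses the hypothetical helper names `echo_once` and `measure_round_trips`. On a real cluster the server would run on another node, and the measured time would include actual network latency.

```python
import socket
import threading
import time


def echo_once(server_sock):
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)


def measure_round_trips(message=b"ping", trips=100):
    """Return (last_reply, mean_round_trip_seconds) over a loopback TCP link."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    server = threading.Thread(target=echo_once, args=(server_sock,))
    server.start()

    reply = b""
    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so small messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(trips):
            client.sendall(message)
            reply = b""
            # TCP is a byte stream, so keep reading until the full echo arrives.
            while len(reply) < len(message):
                chunk = client.recv(1024)
                if not chunk:
                    raise ConnectionError("server closed early")
                reply += chunk
        elapsed = time.perf_counter() - start

    server.join()
    server_sock.close()
    return reply, elapsed / trips
```

Calling `measure_round_trips(b"ping", trips=1000)` returns the echoed message and the mean round-trip time in seconds. Repeating the measurement with larger messages is one way to let students separate per-message overhead from per-byte cost.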
Algorithms and Applications • Suited for • Advanced undergraduates with a strong algorithms and programming background • Graduate students • Can be • Parallel algorithms • With a focus on topics from a particular domain
Algorithms and Applications Topics • Application overview: compression, data mining, image rendering, genetic algorithms, … • Techniques of algorithm design: partitioning, divide and conquer, communication and synchronization, … • Modeling and visualization • Performance tuning
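Partitioning, the first design technique listed above, can be illustrated with a block decomposition of an index range. This is a minimal sketch under stated assumptions rather than material from the courseware: the function names `block_partition` and `partitioned_sum` are hypothetical, and the "workers" are simulated sequentially so the example runs without a cluster.

```python
def block_partition(n, p):
    """Split indices 0..n-1 into p contiguous blocks whose sizes differ by at most 1."""
    if p <= 0:
        raise ValueError("p must be positive")
    base, extra = divmod(n, p)
    blocks = []
    start = 0
    for rank in range(p):
        # The first `extra` ranks each take one additional element.
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def partitioned_sum(values, p):
    """Sum each block independently, then combine the partial results."""
    partials = [sum(values[lo:hi]) for lo, hi in block_partition(len(values), p)]
    return sum(partials)
```

For example, `block_partition(10, 3)` yields `[(0, 4), (4, 7), (7, 10)]`. In a message-passing version, each half-open range would be handled by one process and the final combination would become a reduction step.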
Classroom Favorites • Build your own cluster: using old lab machines, install PVM or MPI • Parallel matrix multiply, sort: implement these using MPI, evaluate the performance using data of varying size, present results graphically • Term programming project: can have students select their own!
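The matrix multiply exercise is described as an MPI assignment. As a stand-in that runs without an MPI installation, the sketch below shows the same row-block decomposition and the timing loop over varying sizes. It rests on several assumptions: it uses Python threads in place of MPI processes, the names `multiply_rows`, `parallel_matmul`, and `time_sizes` are hypothetical, and because of CPython's global interpreter lock it is not expected to show real speedup. It illustrates the structure of the assignment, not its performance.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B, so each dot product walks two sequences
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_rows]


def parallel_matmul(a, b, workers=2):
    """Row-block decomposition: each worker computes a contiguous slice of C = A * B."""
    n = len(a)
    step = max(1, -(-n // workers))  # ceiling division, at least one row per block
    blocks = [a[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves block order, so the slices reassemble correctly.
        parts = list(pool.map(multiply_rows, blocks, [b] * len(blocks)))
    return [row for part in parts for row in part]


def time_sizes(sizes, workers=2):
    """Return (n, seconds) pairs for multiplying n x n matrices of ones."""
    results = []
    for n in sizes:
        a = [[1] * n for _ in range(n)]
        start = time.perf_counter()
        parallel_matmul(a, a, workers)
        results.append((n, time.perf_counter() - start))
    return results
```

The pairs returned by `time_sizes([50, 100, 200])` can be plotted directly, which matches the "present results graphically" step. In an MPI version, each row block would be scattered to a rank and the result slices gathered on one process.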
Problem Areas • Cluster setup and administration • Cluster usage (especially for performance experiments) • Security
Conclusions • Cluster computing is a low-cost approach to massive data processing • Cluster computing can be taught at the undergraduate level • Modules help to organize the material so that it is appropriate for your institution • Modules can be mixed and matched
References and Acknowledgements • Amy Apon, Rajkumar Buyya, Hai Jin, and Jens Mache, "Cluster Computing in the Classroom: Topics, Guidelines, and Experiences," First International Workshop on Cluster Computing Education (Cluster.Edu 2001) • See http://citeseer.nj.nec.com/395286.html