Computer Modeling and Visualization in Chemistry, Spring 2007. 211 Havemeyer, Columbia University. Sat 10:00am – 12:30pm. Teachers: Eric Knoll, Li Tian and Robert Abel. Class Format: The 1st half of each class will consist of a lecture introducing you to the new material. The 2nd half will be a lab section where you can perform some of the calculations and visualizations on a computer. Many of the classes will introduce a new subject, and the course will briefly touch on a wide range of subjects in chemistry. Homework: Mostly NONE! Grading: No letter grade, but you do need to attend the classes to receive the certificate and remain in the program. Topics covered (tentative): 1. Structure of Molecules: atoms, electronic structure, bonding, molecular conformations. 2. Chemical Reactions: energies of reactions and reaction mechanisms. 3. Proteins and DNA: the Protein Data Bank, protein folding, classification of proteins, gene therapy. 4. Molecular Modeling: molecular mechanics, quantum mechanics, nanotechnology, supercomputers. The above topics may be altered depending on student interest in these or other related subjects.
Class Description • This survey course is for students who are interested in chemistry, medicine, nanotechnology, computer science, or biotechnology and who want to discover real-world applications of computer technology that go beyond typical undergraduate chemistry. The class will touch on advanced topics such as molecular mechanics and quantum chemistry, which are the foundations of the simulation software packages that are now standard research tools in areas such as organic chemistry, biochemistry and drug discovery. In most classes, students will get hands-on experience using these software packages to visualize and study the structure and reactivity of organic molecules, proteins and DNA.
Science Honors Program: Computer Modeling and Visualization in Chemistry. Computational Science, Eric Knoll
This presentation is for educational, non-profit purposes only. Please do not post or distribute this presentation to anyone outside of this course. Most of the slides in this presentation are from a course called “Parallel Computing” taught by Prof. David Keyes at Columbia University.
Computational simulation := “a means of scientific discovery that employs a computer system to simulate a physical system according to laws derived from theory and experiment” Three “pillars” of scientific investigation • Experiment • Theory • Simulation (“theoretical experiments”)
“There will be opened a gateway and a road to a large and excellent science into which minds more piercing than mine shall penetrate to recesses still deeper.” Galileo (1564-1642) (on ‘experimental mathematical analysis of nature’, appropriated here for ‘simulation science’)
An early visionary: L. F. Richardson, from his book on numerical weather prediction (1922)
How well can we predict the weather? [Figure: path of Hurricane Katrina, 2005]
Ocean simulations: tsunami. [Figure: Sumatra-Andaman earthquake tsunami; more than 275,000 killed]
Experimental publication: except for the team that did the experiments, everybody believes it. Computational publication: except for the team that did the computations, nobody believes it. The third pillar • Computational science has been the stuff of fantasy for generations (Galileo, Richardson, etc.) • The modern version, hosted on digital computers, has been foreseen for about 60 years (von Neumann, 1946) • Aggressively promoted as a national agenda for about 15 years (Wilson, 1989) • Just now beginning to earn acceptance beyond the circle of practitioners
The “Grand Challenges” of Wilson • A Magna Carta of high performance computing (1989) • Supercomputer as “scientific instrument” • Attention to quality research indicators in computational science • Sample “grand challenge” – electronic structure • Prospects for computer technology • Why the NSF supercomputer centers
Wilson’s burden “In this paper, I address some of the tougher requirements on … grand challenge research to ensure that it has enduring value.” • Algorithm development • Error control • Software productivity • Fostering technological advances in computers
Wilson’s fear “… Often advocated is that because computers of a fixed performance are dropping rapidly in price, one should only buy inexpensive computers … expecting that today’s supercomputer performance will be achieved … in a few years’ time. This … would be terrible… It would violate the whole spirit of science, of pushing at the frontiers of knowledge and technology simultaneously.”
Wilson’s six examples • Weather prediction (curse of dimensionality, with r the factor by which resolution is refined: r^3 in space, r^4 in time; chaotic behavior) • Astronomy (need to escape the limits of the observational record; curse of dimensionality) • Materials science (the electronic structure problem is 3N-dimensional; solving Schrödinger’s equation lags far behind solving Newton’s and Maxwell’s) • Molecular biology (the conformation problem is combinatorial; protein folding is “stiff”) • Aerodynamics (turbulence; full system analysis, full envelope analysis) • Quantum field theory (QED is perturbative; QCD is fundamentally nonlinear)
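The scaling arguments behind these examples can be made concrete with a few lines of arithmetic. This is a minimal sketch under toy assumptions: the function names are hypothetical, the r^4 weather factor follows the slide (r^3 more cells, one more factor of r from a smaller time step), and the "three states per residue" conformation model is an illustrative simplification, not a real force field.

```python
# Hypothetical helpers illustrating why Wilson's examples are "grand challenges".

def weather_cost_factor(r: float) -> float:
    """Work multiplier when grid resolution is refined by a factor r.

    Assumes r**3 more cells in 3D space and one more factor of r from the
    correspondingly smaller time step, i.e. r**4 in total.
    """
    return r ** 4


def wavefunction_grid_points(points_per_axis: int, n_electrons: int) -> int:
    """Values needed to tabulate an N-electron wavefunction on a grid.

    The wavefunction depends on 3N coordinates, so a grid with k points
    per axis needs k**(3N) values.
    """
    return points_per_axis ** (3 * n_electrons)


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Conformations of a chain if each residue independently takes one of a
    few discrete states (a toy combinatorial model)."""
    return states_per_residue ** n_residues


if __name__ == "__main__":
    print(weather_cost_factor(2))           # halving the grid spacing: 16
    print(wavefunction_grid_points(10, 2))  # 10 points/axis, 2 electrons: 1000000
    print(conformation_count(100))          # 100-residue toy chain: astronomically large
```

Even two electrons on a coarse 10-point grid already need a million values, and each added electron multiplies that by another factor of 1,000.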
A “perfect storm” for simulation (dates are symbolic): scientific models (1686), numerical algorithms (1947), computer architecture (1976), scientific software engineering. [Figure labels: Hardware, Infrastructure, Applications, Architectures] “Computational science is undergoing a phase transition.” – D. Hitchcock, DOE
Movement towards simulation science • Standards • Tools: languages, libraries, interfaces, formats, templates • Results: validation and verification • Publications • Journals, e.g., IEEE/APS Computing in Science and Engineering • Book series, e.g., Springer’s LNCS&E • Degree programs • Approximately 50 US-based programs http://www.nhse.org/cse_edu.html • Birds-of-a-feather meetings at conferences
HPCC Bluebook (1992, OSTP) • Proposed 30% increase in federal support of HPCC (to $638M/yr) • Four major components: • High performance computing systems • Advanced Software Technology and Algorithms • National Research and Education Network • Basic Research and Human Resources
It’s not just government… • 200 of the “Top 500” computer systems in the world are operated by industry http://www.top500.org/ • 15 “Fortune 500” companies were sponsors of the NCSA • Banking: J.P. Morgan • Information: The Tribune Company • Insurance: Allstate • Manufacturing: Caterpillar, FMC, Kodak, Motorola • Merchandising: Sears • Petroleum: Phillips, Schlumberger, Shell • Pharmaceuticals: Lilly • Transportation: Boeing, Ford, SABRE
Computation vs. Theory • Computation is usually better for: • Generality (dimension, geometry, properties, boundary conditions) • Transferability of technique (to less expert users) • Theory is usually better for: • Compactness • Generalizability • Insight “The purpose of computing is insight, not numbers.” – R. W. Hamming
Computation vs. Experiment • Computation is usually better for: • Economy • Feasibility • Latency • Idealizations • Safety and/or political repercussions • Experiment is usually better for: • Reliability • Reality
Lexical soup of related terms • Computer science: the science of organizing and operating computers, including algorithms • Information science: the science of acquiring, converting, storing, retrieving, and conceptualizing information • Computational mathematics/numerical analysis: mathematics of computation, esp. focused on the practical difference between real arithmetic and computer arithmetic and on other resolution limitations of computers in performing well-defined mathematical operations • Computational Science (& Engineering): the science of using computers in pursuit of the natural sciences (& engineering), especially those aspects that are not specific to a particular discipline
Lexical soup of related terms, cont. • Scientific computing: a combination of computational science, numerical analysis, and computer architecture primarily concentrating on efficient and accurate algorithms for approximating the solution of operator (and other) equations • Computational “X” (where “X” is a particular natural or engineering science, such as physics, chemistry, biology, geophysics, fluid dynamics, structural mechanics, electrodynamics, etc.): a specialized subset of scientific computing concentrating on techniques and practices particular to problems from “X”, together with support technologies from CS&E
Clarifying examples • Computer science: architecture, systems software, data structures, algorithmic complexity, networks, software engineering, intelligent agents, profiling, benchmarking, performance modeling, performance tuning • Information science: databases, data mining, data compression, pattern recognition • Computational mathematics: error analysis, algorithmic stability, convergence • Computational science: scientific visualization, computational steering, parallel partitioning and mapping, multidisciplinary computing
Case studies from O’Leary (1997) • Cellular radio transmission: placement of transmitters in a building to avoid dead spots (50% physics/engineering, 10% numerical analysis, 40% computer science) • Ray tracing, attenuation modeling • Image processing: correction of Hubble images (25% astronomy, 25% signal processing, 25% mathematics, 25% computer science) • Large, ill-conditioned inverse problem • Information retrieval: latent semantic indexing of large databases (50% disciplinary field, 10% mathematics, 40% computer science) • Singular value decomposition • Smoke plume modeling: predicting the spread of smoke and heat in a burning building (25% physics/engineering, 50% mathematics, 25% computer science) • Large-scale parallel, uncertainty quantification
What does Computational Chemistry Involve?
Moore’s Law In 1965, Gordon Moore (later a co-founder of Intel) observed exponential growth in the number of transistors per integrated circuit and optimistically predicted that this trend would continue. It has. “Moore’s Law” refers to a doubling of transistors per chip every 18 months, which translates into increased performance, though not quite at the same rate.
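What an 18-month doubling period means over a few decades is easy to underestimate. A minimal sketch, using the slide's 18-month figure as the default (the function name is hypothetical):

```python
def moore_growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Growth factor after `years` if the count doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    for years in (3, 15, 30):
        print(years, round(moore_growth_factor(years)))
    # prints:
    # 3 4
    # 15 1024
    # 30 1048576
```

Fifteen years is ten doublings, roughly a thousandfold; thirty years is roughly a millionfold.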
Prefix review • “flop/s” means “floating point operations per second” [Figure: table of prefixes, with markers for “your laptop” and “lab engine”]
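The standard SI prefixes used with flop/s are Mega (10^6), Giga (10^9), Tera (10^12) and Peta (10^15). A small sketch of a formatter that picks the largest prefix that fits (the function and table names are hypothetical):

```python
# SI prefixes commonly used for flop/s, largest first.
PREFIXES = [(1e15, "Peta"), (1e12, "Tera"), (1e9, "Giga"), (1e6, "Mega")]


def format_flops(rate: float) -> str:
    """Express a rate in flop/s using the largest prefix not exceeding it."""
    for scale, name in PREFIXES:
        if rate >= scale:
            return f"{rate / scale:g} {name}flop/s"
    return f"{rate:g} flop/s"


if __name__ == "__main__":
    print(format_flops(35.6e12))  # 35.6 Teraflop/s
    print(format_flops(8e9))      # 8 Gigaflop/s
```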
Pipelining • Often, an operation (e.g., a multiplication of two floating point numbers) is done in several stages: input → stage 1 → stage 2 → output • Each stage occupies different hardware and can be operating on a different multiplication • Like assembly lines for airplanes, cars, and many other products
Consider laundry pipelining Anne, Bing, Cassandra, and Dinesh must each wash (30 min), dry (40 min), and fold (20 min) laundry. If each waits until the previous is finished, the four loads require 6 hours.
Laundry pipelining, cont. If Bing starts his wash as soon as Anne finishes hers, and then Cassandra starts her wash as soon as Bing finishes his, etc., the four loads require only 3.5 hours. Note that in the middle of the task set, all three stations are in use simultaneously. For long streams, the ideal speed-up approaches three, the number of available stations. Imbalance between the stages and the effects of filling and draining the pipe make the actual speed-up smaller.
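The laundry numbers can be checked with a small schedule simulation. This sketch assumes all loads are ready at time zero and each load starts a station as soon as both the load and the station are free (a simple flow-shop schedule); the function names are hypothetical.

```python
def sequential_time(stage_minutes, n_loads):
    """Each load runs through all stages before the next load starts."""
    return n_loads * sum(stage_minutes)


def pipelined_time(stage_minutes, n_loads):
    """Finish time when loads overlap across stations."""
    station_free = [0] * len(stage_minutes)  # when each station next becomes free
    for _ in range(n_loads):
        load_ready = 0  # when this load finished its previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(load_ready, station_free[s])
            station_free[s] = start + minutes
            load_ready = station_free[s]
    return station_free[-1]


if __name__ == "__main__":
    stages = [30, 40, 20]  # wash, dry, fold (minutes)
    print(sequential_time(stages, 4) / 60)  # 6.0 hours
    print(pipelined_time(stages, 4) / 60)   # 3.5 hours
    n = 1000
    print(sequential_time(stages, n) / pipelined_time(stages, n))  # about 2.25
```

The long-stream speed-up levels off near 90/40 = 2.25 rather than three, because the 40-minute dryer is the slowest station and sets the pace: this is the stage imbalance the slide mentions. With three equal 30-minute stages the same simulation gives 180 minutes for four loads.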
Arithmetic pipelining • An arithmetic operation may have 5 stages: • Instruction fetch (IF) • Read operands from registers (RD) • Execute operation (OP) • Access memory (AM) • Write back to registers (WB) [Figure: staggered IF, RD, OP, AM, WB stages of successive instructions plotted against time] Actually, each of these stages may be superpipelined further!
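The staggered diagram can be regenerated as text. A minimal sketch, assuming one new instruction enters the pipe each clock cycle and nothing stalls (the function name is hypothetical):

```python
STAGES = ("IF", "RD", "OP", "AM", "WB")


def pipeline_chart(n_instructions, stages=STAGES):
    """Return one text row per instruction showing which stage it occupies
    in each clock cycle (one issue per cycle, no stalls)."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * n_cycles
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(" ".join(cells))
    return rows


if __name__ == "__main__":
    for row in pipeline_chart(3):
        print(row)
```

Three instructions finish in 3 + 5 - 1 = 7 cycles instead of 3 × 5 = 15; in general n instructions through s stages take n + s - 1 cycles, so throughput approaches one instruction per cycle.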
Benefits of pipelining • Allows the computer to be physically larger • Signals need to travel only from one stage to the next per clock cycle, not over the entire computer
Problems with pipelining • Must find many operations to do independently, since results of earlier scheduled operations are not immediately available for the next; waiting may stall the pipe • Conditionals may require partial results to be discarded • If the pipe is not kept full, the extra hardware is wasted, and the machine is slow [Figure: one instruction creates “x”, the next consumes “x”, and the pipe stalls until “x” is ready]
Parallelism • Often, a large group of operations can be done concurrently, without memory conflicts • In our airplane example, each cell update involves only cells on neighboring faces • Cells that do not share a face can be updated simultaneously [Figure caption: no purple cell quantities are involved in each other’s updates]
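One common way to exploit "cells that do not share a face" is a red-black (checkerboard) ordering: color each cell by the parity of its indices, so no two same-colored cells are face neighbors. The sketch below is an illustration on a 2D grid standing in for the 3D airplane mesh; the neighbor-averaging update and the function names are assumptions chosen for simplicity, not the update used in the original example.

```python
def color(i, j):
    """Checkerboard color: face-sharing neighbors always differ."""
    return (i + j) % 2


def red_black_sweep(u):
    """One sweep of neighbor averaging over the interior of grid u (in place).

    Cells of one color read only cells of the other color, so all updates
    within a color are independent and could run in parallel.
    """
    rows, cols = len(u), len(u[0])
    for c in (0, 1):
        new_values = {}
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if color(i, j) == c:
                    new_values[(i, j)] = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    )
        # Apply all same-color updates "simultaneously".
        for (i, j), v in new_values.items():
            u[i][j] = v
    return u
```

Collecting each color's updates before applying them makes the independence explicit: applying them one at a time in any order would give the same result, because no cell of that color reads another cell of the same color.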
Parallelism in building a wall Each worker has an interior “chunk” of independent work, but workers require periodic coordination with their neighbors at their boundaries. One slow worker will eventually stall the rest. Potential speedup is proportional to the number of workers, less coordination overhead.
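The wall-building argument can be written as a tiny cost model. This is a sketch under a simplifying assumption (not from the slides): all workers run concurrently, a step ends when the slowest worker finishes, and coordination adds a fixed time; the function names are hypothetical.

```python
def parallel_time(chunk_times, coordination_time):
    """Workers build their chunks at once, so the step ends when the slowest
    worker finishes, plus time spent coordinating at the boundaries."""
    return max(chunk_times) + coordination_time


def speedup(serial_time, chunk_times, coordination_time):
    """Time for one worker to do everything, divided by the parallel time."""
    return serial_time / parallel_time(chunk_times, coordination_time)


if __name__ == "__main__":
    print(speedup(100, [25, 25, 25, 25], 0))  # ideal: 4.0
    print(speedup(100, [25, 25, 25, 25], 5))  # with coordination: 100/30
    print(speedup(100, [25, 25, 25, 40], 5))  # one slow worker: 100/45
```

Because of the `max`, a single slow worker drags everyone down, and the coordination term keeps the speed-up below the number of workers.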
Benefits of parallelism • Allows the computer to be physically larger • If we had one million computers, then each computer would only have to do 8×10^9 operations per second • This would allow the computers to be about 3 cm apart
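The "about 3 cm" figure is consistent with a light-speed bound. The slide does not state its method, so this sketch assumes that a signal must cross between components once per operation at the vacuum speed of light; real signals in wires are slower, so the practical distance is smaller. The function name is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second, in vacuum


def max_separation_m(ops_per_second):
    """Distance light travels during one operation: an upper bound on how far
    apart components can be if a signal must cross between them each time."""
    return SPEED_OF_LIGHT / ops_per_second


if __name__ == "__main__":
    print(max_separation_m(8e9) * 100)  # in centimeters, a little under 4
```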
Parallel processor configurations In the airplane example, each processor in the 3D array (left) can be made responsible for a 3D chunk of space. The global cross-bar switch is overkill in this case. A mesh network (below) is sufficient.
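Counting connections shows why a crossbar is overkill for nearest-neighbor work. A full crossbar switch for p processors needs a crosspoint for every (input, output) pair, i.e. p^2, while a 3D mesh without wraparound only links face neighbors. The 8 × 8 × 8 array below is a hypothetical size chosen for illustration, and the function names are also hypothetical.

```python
def crossbar_crosspoints(p):
    """A full p x p crossbar needs one crosspoint per (input, output) pair."""
    return p * p


def mesh3d_links(a, b, c):
    """Nearest-neighbor links in an a x b x c 3D mesh (no wraparound)."""
    return (a - 1) * b * c + a * (b - 1) * c + a * b * (c - 1)


if __name__ == "__main__":
    n = 8  # hypothetical 8 x 8 x 8 processor array
    p = n ** 3
    print(p, crossbar_crosspoints(p), mesh3d_links(n, n, n))  # 512 262144 1344
```

The mesh grows roughly linearly in the number of processors, the crossbar quadratically, and a chunk-of-space decomposition only ever uses the neighbor links.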
SMP & MPP paradigms • Symmetric Multi-Processor (SMP): two to hundreds of processors, shared memory, global addressing [Figure: CPUs connected through a fast interconnect to a shared memory] • Massively Parallel Processor (MPP): thousands of processors, distributed memory, local addressing [Figure: CPU-memory pairs connected through an interconnect]
Concurrency has also grown • DOE’s ASCI roadmap is to go to 100 Teraflop/s by 2006 • Variety of vendors engaged • Compaq • Cray • Intel • IBM • SGI • Up to 8,192 processors • Relies on commodity processor/memory units, with tightly coupled network
Japan’s Earth Simulator • 35.6 Tflop/s LINPACK [Figure: bird’s-eye view of the Earth Simulator system, roughly 65 m by 50 m, showing processor node (PN) cabinets, interconnection network (IN) cabinets, disks, a cartridge tape library system, the air conditioning and power supply systems, and a double floor for IN cables]
New architecture on the horizon: Blue Gene/L • 180 Tflop/s configuration (65,536 dual-processor chips) • To be delivered to LLNL in 2004 by IBM
Gordon Bell Prize “peak performance”: four orders of magnitude in 13 years
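It is instructive to compare this growth rate with Moore's Law. A minimal sketch of the arithmetic (the function name is hypothetical): a factor of 10^4 in 13 years is about 13.3 doublings, so the implied doubling time is just under a year.

```python
import math


def doubling_time_years(growth_factor, years):
    """Doubling time implied by growing by `growth_factor` over `years`."""
    return years / math.log2(growth_factor)


if __name__ == "__main__":
    t = doubling_time_years(1e4, 13)
    print(round(t * 12, 1))  # months per doubling
```

That is noticeably faster than the 18-month doubling quoted for transistor counts (which would give only about a 400-fold gain over 13 years). This suggests, though the chart alone does not prove it, that much of the extra gain came from growth in concurrency rather than from faster individual chips.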