Workshop: Scientific Computing at CIRA* John Forsythe / Andy Jones

Presentation Transcript


  1. Workshop: Scientific Computing at CIRA* John Forsythe / Andy Jones

  2. Purpose: To Provide Answers to…
     • Part I: Lecture / Discussion
       • CIRA Policies
       • What are some good programming habits?
       • What is the right language for the job?
       • How do I read [XYZ] data?
       • Why is my program so slow?
       • What are some recent and future developments in scientific programming?
     • Part II: Interactive Discussion
       • How do I create a program under Windows?
       • Why would I want to use the debugger? I’ve never needed it before.
       • IDL and graphics examples
       • Where can I find a library or numerical algorithm I need?
     This is not to learn the syntax of a language!

  3. People:
     • Karll Renken – Software, systems, IDL license
     • Steve Finley – Linux administration
     • Mike Hiatt – Hardware, software
     • Steve Miller – Policy questions
     CIRA Policies:
     • CIRA infrastructure pages online (?)
     • YOU ARE RESPONSIBLE FOR BACKING UP YOUR SYSTEM!!!
     • YOU ARE RESPONSIBLE FOR BACKING UP YOUR SYSTEM!!!
     • Use DVDs for the entire system, automated tools for code / writing.

  4. Technical Skills List for CIRA Research Associates
     • What is a color table? How do I change it? Where do I get color tables?
     • What is a digital image? What is the difference between a satellite image for display and the actual data?
     • How do I draw a map? How do I add it over an image? Where do I get a map database?
     • Plotting numerical data: IDL vs. Excel, which is the right tool at the right time? In particular, know how to plot: scatter plot; regression with statistics; histogram with different bin sizes; plot only data where some condition is met
     • What is the difference between vector and raster graphics?
     • How do I use the debugger? In Developer Studio? In IDL? How to set breakpoints.
     • How do I view an HDF-EOS file? How do I look at the data as ASCII and as an image?
     • What does it mean to remap data? What is the difference between swath and remapped data?
     • How do I run Unix commands using MKS Toolkit?
     • FTP, moving data, reading various media (DVD, 8mm, DLT…).
     • What are these data formats used for and how do I read them: HDF-EOS; McIDAS; GIF; JPEG; GRIB; BUFR; ASCII; netCDF

  5. Rules for Good Programming – Regardless of Language
     - Separate computations and visualization!!!
     - Write functions and subroutines when possible.
     - Make it right before you make it faster.
     - Make it fail-safe before you make it faster.
     - Make it clear before you make it faster.
     - Don't sacrifice clarity for small gains in "efficiency".
     - Let your compiler do the simple optimizations.
     - Don't strain to reuse code; reorganize instead.
     - Make sure special cases are truly special.
     - Keep it simple to make it faster.
     - Don't fiddle with code to make it faster; find a better algorithm.
     - Instrument your programs. Measure before making "efficiency" changes.
     You will get more life from your code by following these rules.
     Some adapted from Donald Knuth

  6. What is the right language for the job? A) Computation
     • Fortran is used for: forecast models, satellite data processing, radiative transfer, DPEAS. WRF is written in Fortran 90…
     • C/C++ is used for: groundstation code, linking with Fortran for special libraries (e.g. HDF-EOS). Common in the “commercial” world, where employees with CS degrees are hired.
     • Java is used for: web pages, image animation tool
     • Excel is used for: simple plots of ASCII data
     • Mathematica… symbolic math (code generation?); (COMMENTS?)
     Under Windows, it is easy to use operating system functions from all languages (unlike Unix).
     Fortran and C/C++ have superior development and debugging environments at CIRA through a common Microsoft Developer Studio interface; the process to create / debug a C++ or Fortran program is the same.

  7. FORmula TRANslator
     • FORTRAN – 66
     • FORTRAN – 77 (“fixed form”)
     • Fortran – 90 (“free form”)
     • Fortran – 95
     • Fortran 2000
     Note: “Fortran” is now a word, not an uppercase acronym (like “radar”).
     John Backus: leader of the team that invented Fortran in the 1950s
     Files often named .f, .f77, .f90
     Ellis, Philips and Lahey, “Fortran 90 Programming” book. (Loretta Wilson has a few copies)
     http://www.sis.pitt.edu/~mbsclass/hall_of_fame/backus.htm

  8. The New York Times March 20, 2007 John W. Backus, 82, Fortran Developer, Dies By Steve Lohr John W. Backus, who assembled and led the I.B.M. team that created Fortran, the first widely used programming language, which helped open the door to modern computing, died on Saturday at his home in Ashland, Ore. He was 82. His daughter Karen Backus announced the death, saying the family did not know the cause, other than age. Fortran, released in 1957, was “the turning point” in computer software, much as the microprocessor was a giant step forward in hardware, according to J.A.N. Lee, a leading computer historian. Fortran changed the terms of communication between humans and computers, moving up a level to a language that was more comprehensible by humans. So Fortran, in computing vernacular, is considered the first successful higher-level language. Mr. Backus and his youthful team, then all in their 20s and 30s, devised a programming language that resembled a combination of English shorthand and algebra. Fortran, short for Formula Translator, was very similar to the algebraic formulas that scientists and engineers used in their daily work. With some training, they were no longer dependent on a programming priesthood to translate their science and engineering problems into a language a computer would understand. In an interview several years ago, Ken Thompson, who developed the Unix operating system at Bell Labs in 1969, observed that “95 percent of the people who programmed in the early years would never have done it without Fortran.” He added: “It was a massive step.” Fortran was also extremely efficient, running as fast as programs painstakingly hand-coded by the programming elite, who worked in arcane machine languages. This was a feat considered impossible before Fortran. It was achieved by the masterful design of the Fortran compiler, a program that captures the human intent of a program and recasts it in a way that a computer can process. 
In the Fortran project, Mr. Backus tackled two fundamental problems in computing — how to make programming easier for humans, and how to structure the underlying code to make that possible. Mr. Backus continued to work on those challenges for much of his career, and he encouraged others as well. “His contribution was immense, and it influenced the work of many, including me,” Frances Allen, a retired research fellow at I.B.M., said yesterday. Mr. Backus was a bit of a maverick even as a teenager. He grew up in an affluent family in Wilmington, Del., the son of a stockbroker. He had a complicated, difficult relationship with his family, and he was a wayward student. In a series of interviews in 2000 and 2001 in San Francisco, where he lived at the time, Mr. Backus recalled that his family had sent him to an exclusive private high school, the Hill School in Pennsylvania. “The delight of that place was all the rules you could break,” he recalled. After flunking out of the University of Virginia, Mr. Backus was drafted in 1943. But his scores on Army aptitude tests were so high that he was dispatched on government-financed programs to three universities, with his studies ranging from engineering to medicine. After the war, Mr. Backus found his footing as a student at Columbia University and pursued an interest in mathematics, receiving his master’s degree in 1950. Shortly before he graduated, Mr. Backus wandered by the I.B.M. headquarters on Madison Avenue in New York, where one of its room-size electronic calculators was on display. When a tour guide inquired, Mr. Backus mentioned that he was a graduate student in math; he was whisked upstairs and asked a series of questions Mr. Backus described as math “brain teasers.” It was an informal oral exam, with no recorded score. He was hired on the spot. As what? “As a programmer,” Mr. Backus replied, shrugging. 
“That was the way it was done in those days.” Back then, there was no field of computer science, no courses or schools. The first written reference to “software” as a computer term, as something distinct from hardware, did not come until 1958.

  9. In 1953, frustrated by his experience of “hand-to-hand combat with the machine,” Mr. Backus was eager to somehow simplify programming. He wrote a brief note to his superior, asking to be allowed to head a research project with that goal. “I figured there had to be a better way,” he said. Mr. Backus got approval and began hiring, one by one, until the team reached 10. It was an eclectic bunch that included a crystallographer, a cryptographer, a chess wizard, an employee on loan from United Aircraft, a researcher from the Massachusetts Institute of Technology and a young woman who joined the project straight out of Vassar College. “They took anyone who seemed to have an aptitude for problem-solving skills — bridge players, chess players, even women,” Lois Haibt, the Vassar graduate, recalled in an interview in 2000. Mr. Backus, colleagues said, managed the research team with a light hand. The hours were long but informal. Snowball fights relieved lengthy days of work in winter. I.B.M. had a system of rigid yearly performance reviews, which Mr. Backus deemed ill-suited for his programmers, so he ignored it. “We were the hackers of those days,” Richard Goldberg, a member of the Fortran team, recalled in an interview in 2000. After Fortran, Mr. Backus developed, with Peter Naur, a Danish computer scientist, a notation for describing the structure of programming languages, much like grammar for natural languages. It became known as Backus-Naur form. Later, Mr. Backus worked for years with a group at I.B.M. in an area called functional programming. The notion, Mr. Backus said, was to develop a system of programming that would focus more on describing the problem a person wanted the computer to solve and less on giving the computer step-by-step instructions. “That field owes a lot to John Backus and his early efforts to promote it,” said Alex Aiken, a former researcher at I.B.M. who is now a professor at Stanford University. 
In addition to his daughter Karen, of New York, Mr. Backus is survived by another daughter, Paula Backus, of Ashland, Ore.; and a brother, Cecil Backus, of Easton, Md. His second wife, Barbara Stannard, died in 2004. His first marriage, to Marjorie Jamison, ended in divorce. It was Mr. Backus who set the tone for the Fortran team. Yet if the style was informal, the work was intense, a four-year venture with no guarantee of success and many small setbacks along the way. Innovation, Mr. Backus said, was a constant process of trial and error. “You need the willingness to fail all the time,” he said. “You have to generate many ideas and then you have to work very hard only to discover that they don’t work. And you keep doing that over and over until you find one that does work.”

  10. Why is my program slow?
     • Is my job I/O bound? CPU bound? RAM bound?
     • Did I use optimizations when I built the executable, or is it a debug version? Is it interpreted?
     • If it’s CPU bound, where is it spending all of its time? Use the profiler in the Developer Studio to find out.
     • Am I reading ASCII? (slow). Use large reads of binary data into RAM if possible.
     • Right click on the bottom toolbar to get the task manager and performance monitor.
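The "large reads of binary data" advice on this slide can be sketched in Fortran as follows. This is a minimal illustration, not code from the workshop: the program name, the file name sketch_data.bin and the unit number are assumptions made for the example, and the unit is hardcoded only to keep the sketch short. The whole array is written as one unformatted record and read back with a single READ, with no text conversion.

```fortran
program binary_io_sketch
  implicit none
  integer, parameter :: n = 100000
  integer, parameter :: u = 21      ! hypothetical unit number, hardcoded only for brevity
  real    :: a(n), b(n)
  integer :: i

  ! Fill a test array with known values.
  do i = 1, n
     a(i) = real(i) * 0.5
  end do

  ! Write the whole array as one unformatted (binary) record.
  open (unit=u, file='sketch_data.bin', form='unformatted', &
        status='replace', action='write')
  write (u) a
  close (u)

  ! Read it back with a single large read straight into memory.
  open (unit=u, file='sketch_data.bin', form='unformatted', &
        status='old', action='read')
  read (u) b
  close (u, status='delete')        ! remove the temporary file

  if (all(a == b)) then
     write (*,*) 'binary round trip OK'
  else
     write (*,*) 'binary round trip FAILED'
  end if
end program binary_io_sketch
```

Unformatted files written this way are generally not portable between compilers or byte orders, which is the "byte swapping" caveat raised later in the talk.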

  11. Why is my program slow?
     • Usually, an algorithm redesign is the best way to increase speed, within reason.
     • Avoid page faults (exceeding the amount of RAM)! If you hear the hard drive working during a program run, suspect page faults.
     • Do I really need double precision?
     • Look at a listing and map file for diagnosis.
     • If it’s RAM bound, can I store certain large arrays as bytes or short integers to achieve a factor of 2–4 RAM reduction? (Common trick with lat/lon)
     • Use the profiler if it’s really important to speed up.
     • Buying hardware should be a last resort (sometimes)…
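The lat/lon trick mentioned above might look like the sketch below. The scale factor of 100 (0.01 degree resolution) and all names are assumptions chosen for illustration; the slides do not say what scaling CIRA code uses. A 16-bit integer takes half the space of a default real and a quarter of a double, at the cost of a rounding error of up to half the resolution.

```fortran
program pack_latlon_sketch
  implicit none
  ! Smallest integer kind holding at least 4 decimal digits (16 bits on most compilers).
  integer, parameter :: i2 = selected_int_kind(4)
  integer, parameter :: n = 5
  real, parameter    :: scale_factor = 100.0   ! assumed: store hundredths of a degree
  real               :: lat(n), lat_back(n)
  integer(kind=i2)   :: lat_packed(n)

  lat = (/ -90.0, -45.25, 0.0, 40.57, 90.0 /)  ! latitude [degrees north]

  ! Pack: scale and round to the nearest short integer.
  lat_packed = nint(lat * scale_factor, kind=i2)

  ! Unpack: convert back to real degrees when the values are needed.
  lat_back = real(lat_packed) / scale_factor

  write (*,'(a,f8.4)') 'max abs error (deg): ', maxval(abs(lat - lat_back))
end program pack_latlon_sketch
```

Whether 0.01 degree is fine enough depends on the sensor footprint, so the scale factor is a per-dataset decision.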

  12. Am I using an optimizing compiler or a “parentheses counter”?
     • A good compiler will do more than increase speed (argument type checking, conflicting usage, standard compliance…)
     • Debug versions of code are not normally optimized.
     • Fortran and C use optimizing compilers; each is faster in certain circumstances.
     • In IDL, things like “for loops” or “if statements” kill performance; not so in an optimized language.
     Once you’ve gotten your code to compile, you’ve only made the bugs harder to find.
        B = 1
        A = 2
        C = 3
        Do k = 1, 1000
           a = a + k * 2
        Enddo
        B = C * B * 2
     A good compiler will remove the loop, since “a” has no effect on the result.

  13. Example of Performance: math_arc.for vs math_arc.pro (source code on FTP)
     • Calculates distance between many latitude / longitude pairs using spherical geometry
     • CPU intensive (lots of trig functions called)
     • Run as IDL, Debug build and Release build
     • Performance difference varies; IDL performance is more comparable using whole-array operations
        ….
        cos_phi = cos (phi)
        cos2_phi = cos_phi * cos_phi
        sin_phi = sin (phi)
        sin2_phi = sin_phi * sin_phi
        a = sqrt (cos2_theta * cos2_phi + sin2_phi)
        beta = atan2 (cos_theta * cos_phi, sin_phi)
        x = a * cos (beta)
        ….
     Run time on a 1.6 GHz Pentium IV:
     • IDL: 81 seconds
     • Fortran debug build: 17 seconds
     • Fortran release (optimized) build: 12 seconds
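For readers without access to math_arc.for, here is a rough stand-in for the same kind of benchmark. It is not the workshop's source: it uses the haversine formula instead of the formulation excerpted above, and the module name, the mean Earth radius and the reference point are assumptions. The function is ELEMENTAL (Fortran 95), so it is applied to whole arrays in one statement, and CPU_TIME brackets the call so that timing is measured rather than guessed.

```fortran
module arc_sketch
  implicit none
  private
  public :: arc_distance_km
  real, parameter :: earth_radius_km = 6371.0        ! assumed mean Earth radius [km]
  real, parameter :: deg2rad = 3.14159265 / 180.0    ! degrees -> radians
contains
  ! Great-circle distance [km] between two points given in degrees (haversine formula).
  elemental function arc_distance_km(lat1, lon1, lat2, lon2) result(d)
    real, intent(in) :: lat1, lon1, lat2, lon2
    real :: d
    real :: dlat, dlon, h
    dlat = (lat2 - lat1) * deg2rad
    dlon = (lon2 - lon1) * deg2rad
    h = sin(0.5 * dlat)**2 + &
        cos(lat1 * deg2rad) * cos(lat2 * deg2rad) * sin(0.5 * dlon)**2
    ! min() guards against rounding pushing h slightly above 1.
    d = 2.0 * earth_radius_km * asin(sqrt(min(1.0, h)))
  end function arc_distance_km
end module arc_sketch

program arc_demo
  use arc_sketch
  implicit none
  integer, parameter :: n = 1000000
  real, allocatable  :: lat(:), lon(:), d(:)
  real    :: t0, t1
  integer :: i

  allocate (lat(n), lon(n), d(n))
  do i = 1, n
     lat(i) =  -90.0 + 180.0 * real(i - 1) / real(n - 1)
     lon(i) = -180.0 + 360.0 * real(i - 1) / real(n - 1)
  end do

  call cpu_time(t0)
  ! Distance from every point to one hypothetical reference point (40.57 N, 105.08 W).
  d = arc_distance_km(lat, lon, 40.57, -105.08)
  call cpu_time(t1)

  write (*,'(a,f10.1)') 'max distance (km): ', maxval(d)
  write (*,'(a,f8.3)')  'CPU seconds:       ', t1 - t0
end program arc_demo
```

Building this once with run-time checks and no optimization and once as an optimized release build should show a debug vs. release gap of the kind the slide reports, though the actual numbers will depend on the compiler and machine.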

  14. Fortran 90 specific advice:
     • Use explicit interfaces (module with CONTAINS statement)
     • Always add “implicit none”
     • Use the INTENT attribute with subroutine variables
     • Be systematic about variable names and comment what they are; in particular, what are the units of scientific variables?
     • Don’t use hardcoded filenames or unit numbers. See the sys_get_unit.f90 file for an example.
     • Build the debug version first, even if you don’t look at it in the debugger. The debug build gives you an .exe file just like the release build, only larger.
     • Turn on compiler warnings for uninitialized / unused variables; enforce Fortran 90 or 95 standards checking.
     • Turn on all run-time error checks initially; then turn them off as performance is proven and speed is desired (i.e. debug vs release)
     • DOS: Set for_ignore_exceptions=true enables “just in time debugging”
     • Set the linker to generate traceback information (line # where crashed)
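Several of the points above fit into one short sketch: a module with CONTAINS (which gives callers an explicit interface), IMPLICIT NONE, INTENT on every dummy argument, units noted in comments, and a unit number obtained at run time instead of hardcoded. The get_free_unit routine is a hypothetical helper written for this example; it is not the sys_get_unit.f90 file the slide refers to, and the search range 10 to 99 is an arbitrary assumption.

```fortran
module advice_sketch
  implicit none
  private
  public :: get_free_unit, celsius_to_kelvin
contains

  ! Hypothetical helper: find a unit number that exists and is not yet open.
  subroutine get_free_unit(lun, ok)
    integer, intent(out) :: lun     ! free unit number, or -1 if none was found
    logical, intent(out) :: ok      ! .true. if a free unit was found
    logical :: exists, in_use
    integer :: u
    lun = -1
    ok  = .false.
    do u = 10, 99
       inquire (unit=u, exist=exists, opened=in_use)
       if (exists .and. .not. in_use) then
          lun = u
          ok  = .true.
          return
       end if
    end do
  end subroutine get_free_unit

  ! Assumed-shape arguments need the explicit interface that the module provides.
  subroutine celsius_to_kelvin(temp_c, temp_k)
    real, intent(in)  :: temp_c(:)  ! air temperature [deg C]
    real, intent(out) :: temp_k(:)  ! air temperature [K]
    temp_k = temp_c + 273.15
  end subroutine celsius_to_kelvin

end module advice_sketch

program advice_demo
  use advice_sketch
  implicit none
  real    :: t_c(3), t_k(3)
  integer :: lun
  logical :: ok

  t_c = (/ -40.0, 0.0, 25.0 /)
  call celsius_to_kelvin(t_c, t_k)

  call get_free_unit(lun, ok)
  if (.not. ok) stop 'no free unit available'

  ! A scratch file avoids a hardcoded filename in this demonstration.
  open (unit=lun, status='scratch', form='formatted')
  write (lun,'(3f8.2)') t_k
  close (lun)

  write (*,'(a,3f8.2)') 'temperatures [K]: ', t_k
end program advice_demo
```

With the interface coming from the module, the compiler can catch a call that passes the wrong number or type of arguments, which is the practical payoff of the first bullet.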

  15. Fortran Rules:
     • < 80 characters / line of source
     • No hardcoded units
     • Write (*,*)
     • implicit none
     • Use Modules
     • Run time checking
     • Build debug version first, release later if sure it works, or perhaps not at all.
     • Use run time checking, just in time debugging, traceback.

  16. B) Visualization
     • IDL / Matlab is used for analysis / prototyping / plotting / display / complex displays (e.g. contours, 3D surfaces…) / animations:
       • IDL has a 10-year heritage at CIRA, with applications developed (for instance, displaying McIDAS files with IMGBAR). Kankiewicz, Forsythe, Dostalek users among others…
       • Matlab is commonly used by those coming into CIRA from other institutions, and by the EE group (e.g. Neural Net). CIRA does not have a large Matlab software repository. The data assimilation group is using Matlab more… (COMMENTS?)
     • IDL and Matlab require licenses, which are expensive, and their code is less portable to other institutions than Fortran or C.
     • We used to use NCAR Graphics; some groups still do…
     • Excel is used for “simple” plots; it is not as capable as IDL / Matlab.
     • Most of the final results are put in PowerPoint for presentation.
     The RAMM Team uses McIDAS heavily; if you are working with them you may use that…

  17. CIRA Applications
     http://www.cira.colostate.edu/Infrastructure/Intranet/ciraapps.htm [GONE – IS THIS STILL ONLINE ?]
     \\BDC\Common\CIRAApps
     Somewhat dated but still some very useful tools (e.g. Mcviewer).
     • CIRA (FCL) has ~150 PCs and a ~24-node Linux cluster
     • Linux cluster geared towards RAMS modeling (RAMDAS) efforts
     • PCs used for multiple science roles, including realtime data collection; more storage and I/O devices available on PCs
     • Good science code may be portable between systems, but beware of things like compiler settings, byte swapping…

  18. http://www.cira.colostate.edu/Infrastructure/Intranet/IMGBAR.htm Image Bar can read many satellite formats. It can do things Paint Shop, Photoshop, etc. can’t (e.g. overlay a map).

  19. Suggested Basic Scientific Toolkit for Windows:
     • In addition to standard MS Office, Acrobat, IE…
     • Visual C++ 6.0, Compaq Visual Fortran 6.6b
     • Paint Shop Pro (v 7.0) (Basic image display)
     • MKS Toolkit v 7.0 (Unix commands for the DOS command line, including the tar command)
     • Web editor if needed (I’ve had good experiences with Dreamweaver; MS Office does an amazing job (save as HTML)).
     • IDL or Matlab (availability depends on the licensing situation; these cost hundreds to thousands of dollars, and you may need to share a license with other users). Current IDL version is 5.6 (?), 6.X is now available.
     • Compaq array viewer or HDF browser
     • Winzip (uncompress files)
     • Visio (draw flow charts easily)
     • Batch Job Server (or MS Scheduled Tasks)
     Karll Renken is the CIRA POC for software.
     NOTE: Versions are current as of August 2003, under Windows 2000. They will change. Sometimes an upgrade is not a good idea!

  20. Resources
     • IMSL (available on Windows) (use numerical_libraries)
     • www.netlib.org – Huge library of mathematical routines
     • www.compaq.com/fortran – Upgrades (6.6b) / Patches…
     • http://intel.com/IDS/forums/fortran – Bulletin board
     • comp.lang.fortran [.c] – General Fortran [.c] newsgroup
     • sci.math.numerical-analysis – Language independent
     • sci.image.processing – Useful for satellite data
     • comp.lang.idl-pvwave – IDL newsgroup
     • Numerical Recipes – Classic book on algorithms (F90 or C++)
     • www.dfanning.com – IDL advice and routines; and book

  21. Sample from the comp.lang.fortran newsgroup. Do a group search on Google to search the archives for answers. http://softwareforums.intel.com/ids/board?board.id=5 (enable cookies)

  22. What are some recent and future developments to look for?
     • Fortran, C++, IDL and other languages can be linked together. Try to avoid this if possible; it is an advanced topic, though it must be done for some applications.
     • Interface between Fortran and C will be standardized (use ISO_C_BINDING)
     • Fortran 2000 standard approved April 2003 (see article)
     • More usage of MATLAB?
     • Peer-to-peer job distribution (DPEAS – Andy Jones; massive processing of satellite data), and parallel processing (especially in modeling)
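As a small taste of the standardized Fortran/C interface, the sketch below calls the C library function strlen from Fortran through the intrinsic ISO_C_BINDING module. It assumes a compiler that supports this part of the newer standard and that links the C runtime by default, which most current compilers do; the local name c_strlen is an assumption made for the example.

```fortran
program c_interop_sketch
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  ! Explicit interface to the C library routine:  size_t strlen(const char *s);
  interface
     function c_strlen(s) bind(C, name='strlen') result(n)
       import :: c_char, c_size_t
       character(kind=c_char), dimension(*), intent(in) :: s
       integer(c_size_t) :: n
     end function c_strlen
  end interface

  ! C strings end with a null character, so one is appended explicitly.
  character(kind=c_char, len=*), parameter :: text = c_char_'CIRA' // c_null_char

  ! strlen counts the characters before the null terminator: 4 for "CIRA".
  write (*,*) 'strlen reports', c_strlen(text)
end program c_interop_sketch
```

The BIND(C) attribute and the C-compatible kinds replace the compiler-specific name-mangling and argument-passing conventions that made mixed-language linking an advanced topic.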

  23. Interactive session

  24. Defining Your Project
     • To define your project, you need to:
       • Create the project
       • Populate the project with files
       • Choose a configuration
       • Define build options, including project settings
       • Build (compile and link) the project
     • To create a new project:
       • Click the File menu and select New. A dialog box opens that has the following tabs: Files, Projects, Workspaces, Other Documents
       • The Projects tab displays various project types. Click the type of Fortran project to be created. If you have other Visual tools installed, make sure you select a Fortran project type. You can set the Create New Workspace check box to create a new Workspace.
       • Specify the project name and location.
       • Click OK to create the new project. Depending on the type of project being created, one or more dialog boxes may appear allowing you to only create the project without source files or create a template-like source file.
       • If a saved Fortran environment exists for the Fortran project type being created, you can also import a Fortran environment to provide default project settings for the new project (see Saving and Using the Project Setting Environment for Different Projects).
     • This action creates a project workspace and one project. It also leaves the project workspace open.
     • To discontinue using this project workspace, click Close Workspace from the File menu.
     • To open the project workspace later, in the File menu, click either Open Workspace or Recent Workspaces.
     Creating a New Project (taken from online help)
     Win32 Console Application almost always
     Will build in release or debug directory; can change configuration in workspace

  25. How do I create a program?
     • On Windows you create a project (Win32 console application almost always); point and click. This creates a makefile which the user shouldn’t have to work with.
     • The makefile describes the files to build, their dependencies, compiler settings, etc.
     • On Unix / Linux, you edit a makefile directly, including compiler settings [might have evolved…]

  26. Important Compiler / Linker Settings Files in Project

  27. The Developer Studio Environment and Debugger is Powerful!
     • Features
       • “Live” cursor
       • Set / remove breakpoints, run to cursor
       • Drag and drop variable view in watch windows
       • Exception settings
       • Visual array viewer!
       • All point and click
     The editor can also be useful for non-coding editing (e.g. deleting a column of text)

  28. Watch window If you get confusing assembly language windows, make sure the correct context is selected

  29. Student Interests
     • Fortran (Maclay, Mazur, D’Onofrio, Rapp)
     • IDL
     • Matlab
     • HDFEOS data (Seaman, Nielsen, D’Onofrio)
     • Display of RAMDAS on Linux cluster (Seaman)
     • netCDF (Nielsen)
     • Managing large datasets (Rapp)
