Software Engineering Tools – can they be of any practical use? Dr Chris Greenough, Software Engineering Group, Computational Science & Engineering, Rutherford Appleton Laboratory, c.greenough@rl.ac.uk
Elements of this presentation • Motivation and background • The CCP Review • The EPSRC SLA Review • The CCP scoping study • SESP - software engineering for CSE! • Overview of programme • Interactions within CSE • A starting point – legacy software and immediate requirements • A transformation process – guided tour • Access to resources • Software tools server • Web site (www.sesp.cse.clrc.ac.uk) • QA Portal (www.qaportal.cse.clrc.ac.uk) • Priorities for the CSE science programme
Motivation - a CSE perspective "To ensure that UK researchers benefit from the best computational methods, supporting them through collaboration, software development, facilities and education" (CSE Mission statement). What does this software development require? • doing excellent science and engineering (often in collaboration) • devising or enhancing complex computational models • developing advanced & efficient numerical techniques • designing, implementing & supporting quality software What does quality software mean to CSE (or what should it mean)?
The CCP Review CCP Review - Recommendations 9 & 10 • CCPs should expand their use of modern software development tools, in collaboration with computer scientists and software developers in academia and in industry. • CCPs should respond to the challenge of "GRID-enabling" existing applications and designing new projects with the GRID in mind from the outset.
The EPSRC SLA Review SLA Review - Recommendations 3 & 12 • CSED should develop mechanisms to disseminate best practice across all of the CCPs • CSED and EPSRC should work together to encourage greater involvement of the Computer Science and Mathematics academic communities.
The CCP scoping study recommendations • Develop a common and extensible framework and a technology set for CCP GUIs and data visualisation. • Use component architectures and other mechanisms for increasing code interoperability and for managing and transitioning legacy codes. • Establish a common SourceForge-like resource. • Refresh the software skills and tools employed by the computational science community, including the standards of software development. • Accelerate the use of performance tools and parallel programming tools in scientific applications on parallel computers. • Define the CCP requirements for problem-solving environments and work-flow management in GRID applications.
CCP Scoping Study - Summary • The recommendations have as common factors the adoption of new technologies and interfacing with products of the latest computer science research. • In order to retain a competitive advantage, it is crucial to reduce the lag between development in such research and associated test beds, and exploitation in actual applications.
Thoughts from Huub van Dam (apologies - taken from an Email!) There should be plenty of issues to discuss. Just to start with I include a list of things that bother me: Software tools: - Which development environments to use? - Which code analysis and verification tools? - Code version tools? We currently use CVS however this does have some problems ... - An alternative package suggested ... is Subversion ...what are the alternatives? - Which installer packages are available for which platforms…
More thoughts from Huub…. Programming languages: - Fortran77 vs Fortran90/95/03, C++, C#, Java ...? - Clearly there may be advantages to higher level programming languages as they allow the implied structure of a code to be made explicit through the use of classes or modules. - But how well are those languages supported in terms of tools and compilers? - How do they compare on run-time efficiency? - How do they compare on development time and maintenance? - And how do you transform legacy codes?
Yet more thoughts from Huub…. Scripting languages: - These languages can be useful to tie various software components together to build high level applications. - Currently we are using both Tcl and Python. Of course there are Perl and Ruby as well. The nice thing about Python is that it can be extended with native C-code compiled in dynamically linked shared libraries without having to rebuild or even relink the Python interpreter. Less clear to me is how objects in the libraries compare with native Python objects when using them in scripts.
...and more thoughts from Huub... GUI toolkits: - CSE is developing at least 4 GUIs for working with atomistic models. - Unfortunately they are all quite different. Tools used are AVS, Java, Python/Tkinter, Tcl/Tk, Perl/Tk. The problem is that Tk is rather limited in the widgets it offers ... - Alternatives include Qt and wxWindows, but how good are those? - Above and beyond this, GUIs typically require multi-threading and an object-oriented approach. How well are these supported?
…and more…. Parallel programming toolkits: - Support for parallel programming can come at different levels. In short there are three different aspects to parallel programming: 1. communication 2. data distribution 3. algorithms - MPI offers support only for communication. - ScaLAPACK supports communication and algorithms but lets the user sort out the distribution. - Global Arrays support all three. - What else is out there and what is best to use?
…and finally... Management level tools: - Documentation recommendations - Programmers guides - Development processes - Software testing and validation processes - User requirements and user process analysis I think traditionally this is probably the weakest point within the department...
Software Engineering Support Programme (SESP) SESP is an activity to provide and encourage the use of up-to-date software engineering techniques and tools within computational science and engineering. The main goals of SESP are to: • accelerate the introduction and widespread use of high-payoff software engineering practices and technology by identifying, evaluating, and maturing promising or underused technology and practices; • maintain a long-term competency in software engineering and technology transition; • enable the UK academic community to make measured improvements in their software engineering practices by working with them directly; • encourage the adoption and sustained use of standards of excellence for software engineering practice; • foster collaborations with other groups, in the UK, Europe and the US, that have an interest in the applications of advanced software engineering techniques in computational science.
Elements of SESP • Software Quality Assurance • Processes for Legacy Software • Evaluation of Methodologies, Tools and Technology • Integrated Design Environments • Parallelisation & vectorisation software • Symbolic Algebra Systems • Problem Solving Environments (PSE) • GUIs and user interfaces • Component technologies • ...
Software Quality Assurance • Software Quality Assurance is the basis of the software engineering processes that should be undertaken by all software developers. • The process should include: • Requirements gathering • Design - software and testing • Implementation • Testing • Deployment • The initial target language for most applications is now Fortran 95 or even Fortran 2003. Although the commercial world of Software QA is dominated by C, C++ and Java, there are good Fortran tools available. • plusFORT, FORCHECK and NAGWare are but three examples of QA tools for use in implementation and testing. • Clearly CVS is the current tool of choice to track versions - but there are others. (Using gCVS and WinCVS can ease the pain.)
Processes for Legacy Software • For many applications within the science and engineering community the root language has been Fortran 77 and for some - even Fortran 66. • Software engineering has developed and languages have grown and now Fortran 95 and C provide the main modern vehicles for these applications. • To maintain and continue to develop the science encapsulated in these legacy codes a process of transformation and re-engineering must be formalised. • This can be broken into three basic steps: • standardisation, • transformation and • re-engineering. • SESP will develop a process and gather tools to aid this transformation.
Evaluation of Methodologies, Tools and Technology • The computer science community has a long history of developing new methodologies, tools and technologies to aid the development of computing applications. • These range from new languages, such as C# or Java, to frameworks and environments that gather these tools and processes together in an integrated form, such as Microsoft Visual Studio or CodeForge from the Unix world. • There is a growth in the use of languages and programming models other than the procedural style of Fortran. Languages such as C++ and object-oriented techniques are becoming more common in numerical software.
Dissemination of Software and Results • All the results of the activity will be disseminated through a CSE Software Engineering Support Programme web site and through seminars and workshops. The SESP web site will contain: • An overview of the aims and objectives of SESP • Details of contacts in the programme • Summary pages on the programme's activities and findings • All technology watch and assessment reports (pdf, ps, html) • Selected software • Links to software and other software engineering pages of interest to computational scientists. • Seminars and workshops will be arranged to disseminate the results of the activity or to provide hands-on experience with specific software tools. These may be arranged in association with the software vendors.
SESP Web - www.sesp.cse.clrc.ac.uk The SESP web site provides access to: • Information on software tools • Documentation on the SESP tool set • Reports & publications on software engineering • Links to public domain tools that may be of use
Legacy software - an opportunity? • Our problem: lots of software from many years of development, so what do we do with it? • Two options: • throw it away and start again, or • rescue, transform and reuse. • We will consider a pragmatic approach that could be taken to retain the functionality of legacy software whilst moving its maintenance and future development processes forward. • We draw heavily on the work of Charles Norton and Viktor Decyk at the NASA Jet Propulsion Laboratory, who have made great strides in developing a systematic approach for the rescue of intellectual property embedded in legacy systems.
What is Legacy Software? • Legacy software systems are programs that: • are still well used by the community • have some potential inherent value • but were developed years ago • using early versions of languages. • Often these programs have been maintained and developed for many years by hundreds of programmers, and while many changes have been made to the software, the supporting documentation may not be current and the programming style may be from the dark ages. • Legacy software can be characterised informally as old software that is still performing a useful job for the community.
Software replacement? • Expensive - in staff and time! • The software represents years of accumulated experience, which is not represented elsewhere, so discarding the software will also discard this knowledge, however inconveniently it is represented. • The software may actually work well, and its behaviour may be well understood. A replacement system may perform much worse, at least in the early days. Hence it may be worth recovering some of the good features of the legacy system. • A typical large legacy software system has many users, who typically have exploited undocumented features and side effects in the software. It may be important to retain the interfaces and exact functionality of the legacy code, both explicit and implicit. • Users may prefer an evolutionary rather than a revolutionary approach to modernising their software.
A pragmatic process for legacy software • To maintain and continue to develop the science encapsulated in legacy codes a process of transformation and re-engineering must be formalised and undertaken. • This process can be broken into three basic steps: • standardisation, • transformation and • re-engineering. • The re-engineering step is broken down into four additional steps to help stage the development. • This follows Norton and Decyk at JPL.
A Step-by-step process for legacy software • Legacy software • Standard-base compilation: transform software into standard compliance • Undesirable features: COMMON blocks, implicit typing, #def/#include • Add new capabilities: dynamic memory, interoperability, array operations • Components & OO: abstraction, integration • Create interfaces: wrappers for legacy code, interfaces for all routines
A Step-by-step process for legacy software • Legacy software • Standard-base compilation: transform software into standard compliance
Standardisation • Get the base code into a conforming form. • As mentioned above, legacy codes are often in Fortran 77 or 66 or, even worse, a mixture of all standards, dialects and languages. • In general the standard of research programming is limited and most often leads to software that is not particularly portable. • The developers often adopt mechanisms from other languages. • The curse of cpp. The # directives for the pre-processor cpp are not in the Fortran standard! #include, #define and #ifdef are the three main culprits. • It is fortunate that most other directives which are added to Fortran programs are expressed as comment lines, e.g. HPF. You say: if it compiles on the machine I am interested in - it's OK!
Some examples of typical software • Program A: source lines 16505, comment lines 2447, statements 11485, subprograms 84, source files 84, include files 6, error messages 397, warnings 520, informative messages 69 • Program B: source lines 65737, comment lines 38829, statements 55447, subprograms 626, source files 155, include files 1, modules used 37, error messages 3988, warnings 2732, informative messages 1201
Some examples of typical software (cont.) • Program C: source lines 17846, comment lines 15829, statements 15912, subprograms 110, source files 77, include files 2, error messages 533, warnings 1964, informative messages 377 • Program D: source lines 5497, comment lines 4401, statements 5276, subprograms 125, source files 12, warnings 318, informative messages 15
Some typical errors, warnings and information (FORCHECK) • 88x[ 43 E] illegal characters in front of continuation line • 16x[ 45 W] too many continuation lines • 31x[ 46 E] undefined characters at end of statement • 690x[ 48 E] undefined characters in label field of statement • 3x[ 53 W] tab(s) used • 3x[ 60 W] fixed source form used • 1704x[ 69 E] unrecognized statement • 853x[ 71 W] nonstandard Fortran statement • 1x[ 76 E] this statement can only be used within a loop construct • 3x[ 78 E] statement out of order • 6x[ 81 E] undimensioned, or statement function out of order • 1x[ 84 I] no path to this statement • 7x[ 86 E] program unit END missing • 5x[ 90 E] unmatched parentheses • 12x[ 94 E] syntax error • 259x[ 96 W] obsolescent Fortran feature • 11x[ 98 W] deleted Fortran feature • 1x[134 E] missing apostrophe or quote
Some more.. • 120x[145 I] implicit conversion of scalar to complex • 7x[167 E] illegal usage of subscripts • 1x[183 E] illegal length or kind specification, default assumed • 1x[187 E] illegal to (re)define attribute: object is imported from module • 1x[202 E] multiple specification of data type, this one ignored • 1x[228 W] size of common block inconsistent with first declaration • 3x[241 W] nonstandard mixing of data types in EQUIVALENCE • 72x[287 E] scalar integer constant name expected • 3x[303 E] scalar logical expression expected • 31x[312 E] no value assigned to this variable • 107x[313 I] possibly no value assigned to this variable • 79x[315 I] redefined before referenced • 573x[316 W] not locally defined, specify SAVE in the module to retain data • 373x[319 W] not locally allocated, specify SAVE in the module to retain data • 6x[331 E] illegal usage of substring • 4x[335 E] data type conflict • 25x[340 I] equality or inequality comparison of floating point data • 69x[342 I] eq.or ineq. comparison of floating point data with zero constant
…and some more • 73x[343 I] implicit conversion of complex to scalar • 24x[344 I] implicit conversion of constant (expression) to higher accuracy • 4x[345 I] implicit conversion to less accurate data type • 4x[348 E] illegal usage of logical operator • 3x[351 E] illegal usage of operator • 4x[352 W] nonstandard Fortran 77 operator • 2x[378 W] pointer not locally associated, specify SAVE in the module • 5x[387 E] non-matching construct name • 1x[393 E] missing ENDIF('s) • 139x[590 E] array versus scalar conflict • 9x[616 E] input or input/output argument is not defined • 621x[625 W] nonstandard Fortran intrinsic procedure • 1x[630 E] illegal argument data type for intrinsic procedure • 3x[651 I] already imported from module • 106x[665 I] eq.or ineq. comparison of floating point data with constant • 6x[675 I] named constant not used • 3x[691 I] data-type length inconsistent with specification • 19x[699 I] implicit conversion of real or complex to integer
Tools for conformance checking • Three main tools have been used: • FORCHECK from Leiden University • NAGWare Tools • pfort for Fortran 77 • nag_modules95 for Fortran 90/95 • These tools are very fussy • They do not like cpp directives - #ifdef is not Fortran
Compilers as conformance checkers • Some compilers have conformance options. • However, all compilers have extensions. • Compilers will pre-process cpp directives. • Good ones for syntax and conformance are: • f95 - Numerical Algorithms Group • ifc (ifort) - Intel • lf95 - Lahey • You should be looking for the "-f95" or "-rigorous" flags. • Include checks for undeclared, un-initialised and unused variables.
Conditional compilation and pre-processors such as cpp • Conditional compilation is a very useful feature for developers who need to install their software on multiple platforms. • The use of a pre-processor as a configuration tool allows the maintenance of multiple configurations in a single file. • As far as software engineering tools are concerned, statements such as #define and #if, which drive pre-processors such as cpp and fpp, are not part of the Fortran language and lead to syntax errors being flagged. • Even the pre-processor coco, now being developed as a component of the Fortran 2003 standard, has the same difficulty. • Pre-processors provide the programmer with an excellent configuration tool but they are a nightmare for any syntax checking or standard conformance tool. • Pre-processing with cpp can cause interesting problems on compilation.
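To make the difficulty concrete, here is a minimal sketch of cpp-guarded Fortran. The macro name USE_DOUBLE and the program itself are hypothetical, chosen only for illustration. A compiler that runs the pre-processor first (for example gfortran with -cpp, or a .F90 file suffix) sees one clean configuration, whereas a conformance tool reading the raw file sees three non-Fortran lines and two competing declarations of the same named constant.

```fortran
! Illustrative only: USE_DOUBLE is a hypothetical configuration macro.
! This file must be pre-processed before compilation, e.g.
!   gfortran -cpp -DUSE_DOUBLE cpp_demo.f90
program cpp_demo
  implicit none
#ifdef USE_DOUBLE
  integer, parameter :: wp = kind(1.0d0)   ! double precision configuration
#else
  integer, parameter :: wp = kind(1.0)     ! default real configuration
#endif
  real(wp) :: x

  x = 1.0_wp / 3.0_wp
  print *, 'decimal digits of precision:', precision(x)
end program cpp_demo
```

The type specification inside the #ifdef block is exactly the kind of construct that a tool cannot resolve without knowing which configuration is intended.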
cpp directives • For transformation and analysis tools to work correctly, cpp pre-processor commands must be removed. • There are two ways to do this. Either the source is pre-processed for one specific configuration, run through the tool, and the cpp directives re-introduced as a post-processing step; or the effects of the directives are neutralised before re-engineering, with the information they contain preserved in some way. • Pre-processing with cpp and suitable sets of -D options will lead to multiple sources - as is generally the intention. • If the processing is only a standard conformance check or portability check then processing multiple sources is not a problem. • However, if the processing is a transformation of the source, then merging the transformed sources and re-introducing the cpp directives is a serious problem.
What can be done? • Some simple filters are being developed to aid in the assessment of cpp directives and their effect on the transformation tools. These perform only simple single-pass processing, so they are currently very limited in their scope. • cpp-analyse : this script scans the Fortran file, detects cpp directives and outputs diagnostics where possible conflicts could arise. These areas are those in which there are variable type specifications, INCLUDE statements or USE statements within a cpp directive block. In general, executable statements within a cpp directive block will not present a problem provided the tool processing the software does not move comment statements. • cpp-remove : this script will remove (comment out) cpp directives and non-executable statements that cause specification conflicts. • cpp-restore : this script will restore cpp directives within the source on the assumption that the relative position of the comment statement has not moved. This is true in most situations in the body of a source file but may not be true at the file's head.
Initial transformation • Fortran 77 is a subset of Fortran 90/95 • Initial transformation could be simply from the fixed form of Fortran 77 to the free form of Fortran 90/95 (There are a number of public domain tools which will do this for you - or you could do it yourself with an editor or awk!) • Automatic transformation is possible once a standard source form is adopted (Some tools are more lenient than others). • Clearly changing source form gives you little but it can present an interesting and difficult problem - Author Recognition.
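As a small illustration of a pure source-form change, the sketch below shows a hypothetical fixed-form fragment (as comments) and a free-form equivalent. Only the layout changes: comment markers become !, the continuation character in column 6 becomes a trailing &, and column positions stop mattering. The program and its names are invented for the example.

```fortran
! Fixed-form original, shown as comments (each line is shifted right by
! the "! " prefix; in the real file "c" is in column 1, the label in
! columns 1-5 and the continuation mark "&" in column 6):
!
! c     Sum the first n integers
!       program sumn
!       integer i, n, total
!       n = 10
!       total = 0
!       do 10 i = 1, n
!          total = total
!      &         + i
!    10 continue
!       print *, total
!       end
!
! Free-form equivalent with the same statements and the same structure:
program sumn
  ! Sum the first n integers
  integer i, n, total
  n = 10
  total = 0
  do 10 i = 1, n
    total = total &
          + i
10 continue
  print *, total
end
```

Nothing has been modernised here: the labelled DO loop and the old-style declarations are untouched, which is why this step on its own gives little benefit.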
Author Recognition • An interesting by-product of the transformation process is the problem of author recognition. • Not surprisingly, developers and authors have a mental image of their software. • They are often able to recognise where they are in a code by the appearance and structure of the code on the screen - they have a set of visual bookmarks. • These help greatly in the development of the software. • After transformation the developers of the code find that they no longer recognise the software. • For the developers this may reduce their productivity, and hence they may well be reluctant to use the newer version as the basis of future development. • Fortunately, in most tools, it is possible to set the control parameters to produce a layout that is not too dissimilar to that of the original.
A Step-by-step process for legacy software • Legacy software • Standard-base compilation: transform software into standard compliance • Undesirable features: COMMON blocks, implicit typing, #def/#include
Serious transformation • Once the code is in a standard form, automated transformation tools can do: • Fixed-form Fortran 77 to free-form Fortran 95 - pretty printing. • Some transformation tools will perform re-structuring of the program: • Replacing computed GOTOs with IF-THEN-ELSE statements • Moving COMMON blocks into MODULES • Moving INCLUDE blocks into MODULES • Explicitly typing all variables - IMPLICIT NONE only • Generating INTERFACE blocks for all routines • Identifying where array syntax can be used
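The first of these re-structurings can be sketched as follows. This is a hand-written illustration, not the output of any particular tool; the function names and values are hypothetical. Both functions return the same result for every k, but the second makes the control flow visible without labels.

```fortran
program goto_demo
  implicit none
  integer :: k

  ! Print both versions side by side for in-range and out-of-range k.
  do k = 1, 4
    print *, k, legacy_scale(k), structured_scale(k)
  end do

contains

  ! Legacy style: a computed GOTO jumps to label 10, 20 or 30 according
  ! to k. If k lies outside 1..3, control falls through to the RETURN.
  function legacy_scale(k) result(s)
    integer, intent(in) :: k
    real :: s
    s = 0.0
    go to (10, 20, 30), k
    return
10  s = 1.0
    return
20  s = 10.0
    return
30  s = 100.0
  end function legacy_scale

  ! Re-structured form: the same branches as an IF-THEN-ELSE chain.
  function structured_scale(k) result(s)
    integer, intent(in) :: k
    real :: s
    if (k == 1) then
      s = 1.0
    else if (k == 2) then
      s = 10.0
    else if (k == 3) then
      s = 100.0
    else
      s = 0.0
    end if
  end function structured_scale

end program goto_demo
```

A compiler asked for strict Fortran 95 conformance will typically warn that the computed GOTO is obsolescent, which is one reason tools offer to replace it.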
Serious transformation • What is not automated as yet: • use of pointers/allocatable arrays • designing and implementing a module structure • providing dynamic memory allocation • parallelisation • vectorisation • wrappers and encapsulation • object orientation and user data types
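As an example of a change that still has to be made by hand, the sketch below replaces a fixed-size workspace array with an allocatable one. The array name work and the size n are hypothetical; in a legacy code the size would usually be a compile-time PARAMETER that forces a rebuild whenever the problem grows.

```fortran
program alloc_demo
  implicit none
  integer :: n, ierr
  real, allocatable :: work(:)

  n = 1000                       ! in a real code this would come from input

  ! Request the workspace at run time and check that it was granted.
  allocate (work(n), stat=ierr)
  if (ierr /= 0) stop 'allocation of work failed'

  work = 0.0                     ! array syntax replaces an explicit DO loop
  work(1:n:2) = 1.0              ! set every other element
  print *, 'sum =', sum(work)

  deallocate (work)
end program alloc_demo
```

Deciding which arrays should become allocatable, and where they should be allocated and released, is a design decision, which is why tools do not attempt it.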
An example of COMMON and INCLUDE
Subroutine:
      subroutine sub1
      include 'example.inc'
      angle=2.0*pi/4.0
c ....
c Body of subroutine
c ...
      end
Include file:
      real*8 pi,sqrpi,boltz
      integer kmaxa,kmaxb,kmaxc,minnode,msbad,mxfix
      common/params/kmaxa,kmaxb,kmaxc,minnode,msbad,mxfix
      parameter (pi =3.141592653589793d0,sqrpi =1.7724538509055159d0)
      parameter (boltz =8.31451115d-1)
An example of COMMON and INCLUDE (using plusFORT)
Common & include:
      MODULE I_example
      USE F77KINDS
      IMPLICIT NONE
! PARAMETER definitions
      REAL(R8KIND) , PARAMETER :: PI = 3.141592653589793D0 , &
     & SQRPI = 1.7724538509055159D0 , &
     & BOLTZ = 8.31451115D-1
! COMMON /PARAMS/
      INTEGER :: KMAXA , KMAXB , KMAXC , MINNODE , MSBAD , MXFIX
      END MODULE I_example
Subroutine:
      SUBROUTINE SUB1
      USE I_example
      IMPLICIT NONE
      REAL :: angle
      angle = 2.D0*PI/4.D0
! Body of subroutine
      END SUBROUTINE SUB1
INTERFACE blocks • Many Fortran compilers have provided additional debugging options to compare the types and sizes of subprogram dummy arguments against the actual arguments. • These have provided a very important debugging tool, as the correspondence of dummy to actual arguments in a subprogram call harbours many run-time errors - pass an integer into a real and strange things can subsequently happen. • Fortran 90/95 provides a mechanism which can eliminate these types of run-time problems by providing the compiler with sufficient information to check the correspondence between actual and dummy arguments. • In general most interfaces in well designed Fortran 90/95 software will be explicit. • The compilation system will organise the information to perform actual and dummy argument checking.
INTERFACE blocks • The only problem occurs when a subprogram is defined outside the scope of the current program - most commonly in an external library. • During compilation the compiler has no automatic information about the subprogram’s interface. • In using such external libraries the programmer can do one of two things for the compiler. Either: • declare the subprograms as EXTERNAL or • provide an INTERFACE block. • The interface to these external subprograms can be provided to the Fortran 90/95 compilation systems through the INTERFACE block. • By providing an interface block the programmer describes the subprogram’s argument list to the compiler. • This re-instates the potential for actual and dummy argument checking.
INTERFACE block example
      FUNCTION lafind(it,nnn,lsi,lsa)
!
!*********************************************************************
      IMPLICIT NONE
! Dummy arguments
      INTEGER :: it, nnn
      INTEGER :: lafind
      INTEGER, DIMENSION (nnn) :: lsa, lsi
      INTENT (IN) :: it, lsa, lsi, nnn
! Local variables
      INTEGER :: i, i1, iz, k
!*** End of declarations rewritten by SPAG
      ..............................…Body of function
      END FUNCTION lafind
INTERFACE block example - using NAGWare nag_mkintf95
      MODULE nag_mkintf95_mod
        INTERFACE
          FUNCTION lafind(it,nnn,lsi,lsa)
            INTEGER :: lafind
            INTEGER, INTENT (IN) :: it
            INTEGER, INTENT (IN) :: nnn
            INTEGER, INTENT (IN) :: lsi(nnn)
            INTEGER, INTENT (IN) :: lsa(nnn)
          END FUNCTION lafind
        END INTERFACE
      END MODULE nag_mkintf95_mod
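To show what an INTERFACE block buys the caller, here is a minimal self-contained sketch. The routine scale_vec is hypothetical and stands in for something that would normally live in an external library; it is included in the same file only so the example can be built and run.

```fortran
! Stand-in for a routine from an external library (hypothetical).
subroutine scale_vec(n, x, factor)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: x(n)
  real, intent(in) :: factor
  x = factor * x
end subroutine scale_vec

program interface_demo
  implicit none

  ! The interface block describes the argument list to the compiler,
  ! so every call in this program unit can be checked against it.
  interface
    subroutine scale_vec(n, x, factor)
      integer, intent(in) :: n
      real, intent(inout) :: x(n)
      real, intent(in) :: factor
    end subroutine scale_vec
  end interface

  real :: v(3)

  v = (/ 1.0, 2.0, 3.0 /)
  call scale_vec(3, v, 2.0)
  ! call scale_vec(3, v, 2)   ! integer passed for a real: with the
  !                           ! interface visible this is a compile-time error
  print *, v
end program interface_demo
```

Had scale_vec merely been declared EXTERNAL, the commented-out call would compile and the error would surface only at run time, if at all.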
IMPLICIT variable typing • At the time, the IMPLICIT statement and the implicit variable type conventions must have seemed a good thing. • All the evidence in software maintenance and re-engineering points to the fact that the IMPLICIT type statement and the implicit typing conventions obscure and confuse. • Although the IMPLICIT type statement is still part of Fortran 90/95, it is clear that the only IMPLICIT statement that really should appear in all code is: IMPLICIT NONE • IMPLICIT typing should not be used!
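A short sketch of why implicit typing obscures and confuses. The variable names are invented for the example. With IMPLICIT NONE in force, the misspelt name in the commented-out line would be rejected by the compiler; under the implicit typing rules it would compile silently and the loop would quietly stop accumulating into the intended variable.

```fortran
program implicit_demo
  implicit none
  real :: total
  integer :: i

  total = 0.0
  do i = 1, 10
    total = total + real(i)
    ! Under implicit typing the misspelling below would be accepted,
    ! silently creating a new REAL variable "totl" and leaving "total"
    ! unchanged. With IMPLICIT NONE it is a compile-time error:
    ! totl = total + real(i)
  end do
  print *, 'total =', total
end program implicit_demo
```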
Not all KINDS are the same! • REAL*8, INTEGER*2 and LOGICAL*1 are all extensions to Fortran 77 and were not part of the formal standard. • In Fortran 90/95 these can be replaced with equivalents: REAL(KIND=DP) or LOGICAL(KIND=1). • However, it should be noted that the KIND values are implementation dependent - they are not defined in the standard. • For example: R8KIND=2 (REAL*8) for the NAGWare f95 compiler, whereas R8KIND=8 (REAL*8) for the Intel ifc compiler. • It is a common mistake to assume that REAL*8 is equivalent to REAL(8), or that DOUBLE PRECISION is equivalent to REAL(8). • REAL(8) specifies a real variable with KIND=8, and what KIND=8 means depends on the compiler.
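One portable way to avoid hard-wired KIND numbers is to ask the compiler for a kind by the precision and range required, using the standard intrinsic SELECTED_REAL_KIND. The sketch below assumes a hypothetical module name kinds_mod and uses DP as the named constant, as in REAL(KIND=DP) above.

```fortran
module kinds_mod   ! hypothetical module name
  implicit none
  ! Request at least 15 decimal digits and a decimal exponent range of
  ! 307 (typically IEEE double) and 6 digits / range 37 (typically single).
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: sp = selected_real_kind(6, 37)
end module kinds_mod

program kind_demo
  use kinds_mod
  implicit none
  real(kind=dp) :: pi

  pi = 3.141592653589793_dp

  ! The numeric kind values printed here differ between compilers;
  ! the precision actually delivered is what the code relies on.
  print *, 'kind value for dp on this compiler:', dp
  print *, 'kind value of a d0 constant       :', kind(1.0d0)
  print *, 'decimal digits of precision       :', precision(pi)
end program kind_demo
```

Because every declaration refers to dp rather than to a literal number, the same source gives the intended precision whether the compiler numbers that kind 2 or 8.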