Comments on Software Systems. HATC Corporation, Beijing December 6 2005 Geoffrey Fox CTO Anabas Corporation and Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 gcf@indiana.edu http://www.infomall.org.
Design, analysis, and management of a BIG software project I • General Principles • Quality control in software development • documentation/archives • codes • Design of the architecture of a large-scale or complicated system • How to start • Methodology • Decomposition • Subtask and goal • How to choose programming language and development environment • Trend of programming languages (C, C++, and Java) • Platforms (Windows, Unix, and Linux) • Are there de facto programming languages for certain types of applications (e.g. C and C++ used to be popular in real-time systems)
Design, analysis, and management of a BIG software project II • How to design a client-side (stand alone) air traffic control – a real-time client-side monitor system • Principles • Reliability • Performance • Interface between subsystem and main framework • How to design a large-scale distributed air traffic control system • Architecture • Modularity • Reusability (difficulty for us) • Design model (two-tier or three-tier) • Algorithm and performance of air traffic flow control • Training of senior system architect
Overall Remarks • Talk based on my experience which is very different from that of your company • I have developed software in a small company and in university setting with a mix of students and staff • I watched other large software activities including Apache and other open source • Preferred software model changes faster than software engineering techniques • C++ • Corba • Java • Web Services • Maybe some software engineering
General Principles I • Have a clear management structure with one person in charge of important decisions • Decisions can and should be debated • Communicate electronically and preserve records in a searchable fashion • Email possible if a clean master list but probably Wikis and Blogs are better • Equip with Search – Google web or desktop better than most built in search capabilities • Obviously use CVS or equivalent for preserving version control • Document all actions in Wiki/Blog/email
General Principles II • Computers are getting faster, which implies we do not have to worry about efficiency as much • Build smaller modules • As modules decrease in size, the overhead of interacting with them increases • But smaller modules with simple functionality are much easier to build and test • So avoid pointers even more and prefer to communicate data, not pointers thereto, when communicating between modules • Use databases, not ad-hoc storage mechanisms, wherever the performance cost can be tolerated
General Principles III • Test as much as you can by having others (Q/A) exercise code – especially where you need to evaluate system results (output) to see if correct • Use tools like JUnit to provide automated repeatable tests • The harder tests are “where you don’t know the answer” • For those I used to prepare two codes and compare their results • One was the “production system” with all the bells and whistles • The other had few options and just did the main problem • Always test incrementally • Each module separately • Full system as it builds up
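The "two codes" idea can be sketched as follows. This is only an illustration under assumptions: the minimum-separation problem, the function names and the random test data are invented here and are not taken from the talk. A stripped-down reference code checks every pair, a "production" code adds an optimization, and a repeatable test compares the two on many random inputs.

```python
import math
import random


def min_separation_reference(points):
    """Stripped-down code: compare every pair, no options."""
    best = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            best = min(best, math.dist(points[i], points[j]))
    return best


def min_separation_production(points, presorted=False):
    """'Production' code: sort by x and skip pairs that cannot beat the best."""
    pts = points if presorted else sorted(points)
    best = math.inf
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            if q[0] - p[0] >= best:
                break  # every later point is at least this far away in x
            best = min(best, math.dist(p, q))
    return best


def cross_check(trials=200, seed=0):
    """Repeatable test: both codes must agree on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        count = rng.randint(2, 30)
        pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(count)]
        if not math.isclose(min_separation_reference(pts),
                            min_separation_production(pts)):
            return False
    return True
```

The reference code is slow but easy to trust, so a disagreement points at the production code first.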
General Principles IV • Minimize configuration variables that must be changed for each installation • Rather provide a message-based and user-based interface that system can use to set operating parameters • Make each module as independent as possible; build together • Module • Documentation • User interface (portlets are an example) • Configuration interface • Store configuration data in a database that is independent of system
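A minimal sketch of keeping configuration in a database rather than in per-installation files, assuming SQLite as the store. The table layout, the module name `radar_display` and the parameter `refresh_ms` are hypothetical and chosen only for illustration.

```python
import sqlite3


def open_config_store(path=":memory:"):
    """Open (or create) a configuration store that lives outside the system code."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config ("
        "module TEXT, key TEXT, value TEXT, PRIMARY KEY (module, key))"
    )
    return conn


def set_param(conn, module, key, value):
    """Set one operating parameter for one module."""
    conn.execute(
        "INSERT OR REPLACE INTO config (module, key, value) VALUES (?, ?, ?)",
        (module, key, str(value)),
    )
    conn.commit()


def get_param(conn, module, key, default=None):
    """Read a parameter, falling back to a default when it has not been set."""
    row = conn.execute(
        "SELECT value FROM config WHERE module = ? AND key = ?", (module, key)
    ).fetchone()
    return row[0] if row else default
```

For example, `set_param(conn, "radar_display", "refresh_ms", 250)` followed by `get_param(conn, "radar_display", "refresh_ms")` returns the stored value as text; each module only needs to know its own keys.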
Web services • Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles. • Web Services interact by exchanging messages in SOAP format • The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
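As a rough illustration of a message in SOAP format, the sketch below builds and reads a SOAP 1.1 envelope with the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace `urn:example:atc` and the `MessageId`, `PositionReport`, `Flight` and `AltitudeFt` elements are invented for this example and do not come from any real WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:atc"  # hypothetical application namespace


def build_envelope(message_id, flight, altitude_ft):
    """Build an envelope: routing data in the Header, payload in the Body."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{APP_NS}}}MessageId").text = message_id
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    report = ET.SubElement(body, f"{{{APP_NS}}}PositionReport")
    ET.SubElement(report, f"{{{APP_NS}}}Flight").text = flight
    ET.SubElement(report, f"{{{APP_NS}}}AltitudeFt").text = str(altitude_ft)
    return ET.tostring(env, encoding="unicode")


def read_envelope(xml_text):
    """Parse the envelope back into plain Python values."""
    ns = {"soap": SOAP_NS, "atc": APP_NS}
    env = ET.fromstring(xml_text)
    report = "soap:Body/atc:PositionReport/"
    return {
        "message_id": env.findtext("soap:Header/atc:MessageId", namespaces=ns),
        "flight": env.findtext(report + "atc:Flight", namespaces=ns),
        "altitude_ft": int(env.findtext(report + "atc:AltitudeFt", namespaces=ns)),
    }
```

In a real deployment the message shape would be fixed by the WSDL contract rather than by hand-written code like this.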
A typical Web Service • In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining) • The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python [Figure labels: Portal Service; Security; Catalog; Payment (Credit Card); Warehouse (Shipping control); Web Services; WSDL interfaces]
Messaging Structure • Web Service communication is messaging (transport protocol, routing) using the SOAP protocol [Figure labels: Messaging; Process SOAP Header; Process SOAP Body; Service itself; Customizable Handler Chain processes SOAP Header; Invoke Other Services from Header or Body]
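One reading of the figure labels is that a chain of handlers works on the SOAP header before the service itself works on the body. The sketch below models that reading with plain dictionaries instead of real SOAP; the two handlers, the `token` and `trace` header fields and the `dispatch` function are hypothetical.

```python
def auth_handler(header):
    """Hypothetical security handler: reject messages without the expected token."""
    if header.get("token") != "secret":
        raise PermissionError("bad token")
    return header


def trace_handler(header):
    """Hypothetical tracing handler: record that the container saw the message."""
    return {**header, "trace": header.get("trace", []) + ["seen-by-container"]}


def dispatch(message, handlers, service):
    """Run each handler on the header in order, then give the body to the service."""
    header = message["header"]
    for handler in handlers:
        header = handler(header)
    return {"header": header, "body": service(message["body"])}
```

Because handlers only see the header, concerns such as security or tracing can be added or removed without touching the service code.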
Merging the OSI Levels • All messages pass through multiple operating systems and each O/S thinks of message as a header and a body • Important message processing is done at • Network • Client (UNIX, Windows, J2ME etc) • Web Service Header • Application • EACH is < 1 ms (except for small sensor clients and except for complex security) • But network transmission time is often 100 ms or worse • Thus no performance reason not to mix up places processing done [Figure labels: IP, TCP, SOAP, App]
Linking Modules • From method based to RPC to message based to event-based publish-subscribe [Figure labels: Closely coupled Java/Python: Module A and Module B, Method Calls, 0.001 to 1 millisecond; Coarse Grain Service Model: Service A and Service B, Messages, 0.1 to 1000 millisecond latency; Message Oriented Middleware, “Message Queue in the Sky”: Publisher, Post Events; “Listener”, Subscribe to Events]
Each Service has its own portlet [Figure labels: Individual portlet for the Proxy Manager; Use tabs or choose different portlets to navigate through interfaces to different services; 2 Other Portlets]
Portal Architecture [Figure labels, hierarchical arrangement: Clients (Pure HTML, Java Applet ..); Portal (Aggregation and Rendering, Portal Internal Services); Portlets (Local Portlets, Remote or Proxy Portlets, Portlet Class: WebForm, Portlet Class); Libraries (SERVOGrid (IU), GridPort etc., (Java) COG Kit); Services (Web/Grid service); Resources (Computing, Data Stores, Instruments)]
General Principles V • Do not spend too long documenting and prefer methods like javadoc that again are naturally associated with code • Do describe actions (as opposed to code functionality) in your Wiki/Blog/email • The quality and speed of different people varies a lot • Evaluate this and assign responsibilities accordingly • Do not let anybody take decisions into their own hands • Debate goals and processes but once decision is made all must adhere to it • Decisions can be changed and should be if needed
General Principles VI • Evaluate carefully timing constraints • Use simplest most robust approach that satisfies time constraints • That’s why I recommend databases for configuration as this is not a time critical part of system • Note computer does one instruction in 10^-6 milliseconds but a network communication takes 1-100 milliseconds • Invoking a process has about 1 millisecond overhead • Method calls 0.001 to 0.01 milliseconds • Using a database a few milliseconds • People only notice delays of 30 milliseconds or more
Consequences of Rule of the Millisecond Classic Programming • Useful to remember critical time scales • 1) 0.000001 ms – CPU does a calculation • 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency • 2b) 0.001 to 0.01 ms – Overhead of a Method Call • 3) 1 ms – wake up either a thread or a process • 4) 10 to 1000 ms – Internet delay: Workflow • So use pointers and the computer memory system when latencies of ≤ 1 millisecond are needed, but use a URI looked up in a context store when longer delays are allowed • Transfer data when read-only and long latency allowed • Always choose the slowest allowed methodology and remember, when in doubt, that Moore’s law favors computer performance while systems always get more complex and harder to maintain.
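The rule "always choose the slowest allowed methodology" can be sketched as a small lookup. The three mechanism names and the single worst-case figure attached to each are simplifying assumptions drawn from the upper ends of the time scales listed above; a real project would measure its own numbers.

```python
# (mechanism, assumed worst-case latency in ms), slowest and most loosely coupled first.
MECHANISMS = [
    ("wide-area message between services", 1000.0),
    ("local message or separate process", 1.0),
    ("method call inside one process", 0.01),
]


def choose_linkage(latency_budget_ms):
    """Return the slowest (most loosely coupled) mechanism that fits the budget."""
    for name, worst_case_ms in MECHANISMS:
        if worst_case_ms <= latency_budget_ms:
            return name
    raise ValueError("budget is tighter than a method call; keep the logic inline")
```

With these assumed figures, a 50 ms budget selects a local message rather than a method call, which matches the advice to prefer modularity whenever the timing constraint allows it.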
Architecture of a large System • Divide system hierarchically into parts • Interaction between parts will be messages with no conventional pointers • Can have URI’s that need to be looked up in a database (essentially) • Keep doing this until overhead prohibitive • Overhead is “surface”/“volume” for ALL systems – people, software … – and always decreases in relative importance as system gets bigger • Remember computers are going to get faster, not slower, so err on side of modularity versus performance • It is rarely worth optimizing performance; rather make a good design that has no bad aspects making performance unnecessarily bad • Specify data structures in XML first, NOT in Java or C++ • Design ATCML first specifying data structures needed in Air Traffic Control • Map to SQL for databases (don’t use XML databases) • Map to C++ or Java for programming
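A small sketch of the "XML first, then map to SQL and to a programming language" idea. The talk only proposes that an ATCML be designed; the `track` element, its fields and the sample values below are invented placeholders, and Python with SQLite stands in for the C++/Java and SQL targets.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical ATCML fragment: element and attribute names are made up for illustration.
TRACK_XML = (
    '<track id="T42"><callsign>TEST123</callsign>'
    "<lat>40.08</lat><lon>116.58</lon><altitudeFt>32000</altitudeFt></track>"
)


@dataclass
class Track:
    """Programming-language mapping of the XML structure."""
    track_id: str
    callsign: str
    lat: float
    lon: float
    altitude_ft: int


def track_from_xml(xml_text):
    """Map the XML message to an in-memory object."""
    el = ET.fromstring(xml_text)
    return Track(
        el.get("id"),
        el.findtext("callsign"),
        float(el.findtext("lat")),
        float(el.findtext("lon")),
        int(el.findtext("altitudeFt")),
    )


def store_track(conn, track):
    """SQL mapping of the same structure: one column per XML field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS track (track_id TEXT PRIMARY KEY, "
        "callsign TEXT, lat REAL, lon REAL, altitude_ft INTEGER)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO track VALUES (?, ?, ?, ?, ?)",
        (track.track_id, track.callsign, track.lat, track.lon, track.altitude_ft),
    )
```

The XML definition stays the single source of truth; the class and the table are derived views of it.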
Philosophy of Web Service Grids • Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines • This leads to the distributed object (DO) model represented by Java and CORBA • RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java • Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly • Distributed Objects Replaced by Services • Note CORBA was considered too complicated in both organization and proposed infrastructure • and Java was considered as “tightly coupled to Sun” • So there were other reasons to discard • Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages
What is a Simple Service? • Take any system – it has multiple functionalities • We can implement each functionality as an independent distributed service • Or we can bundle multiple functionalities in a single service • Whether functionality is an independent service or one of many method calls into a “glob of software”, we can always make them into Web services by converting the interface to WSDL • Simple services are obtained by taking functionalities and making them as small as possible subject to the “rule of millisecond” • Distributed services incur messaging overhead of one (local) to 100’s (far apart) of milliseconds to use message rather than method call • Use scripting or compiled integration of functionalities ONLY when you require <1 millisecond interaction latency • Apache web site has many (pre Web Service) projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services • Makes it hard to integrate sharing common security, user profile, file access .. services
Grids of Grids of Simple Services • Link via methods, messages, streams • Services and Grids are linked by messages • Internally to service, functionalities are linked by methods • A simple service is the smallest Grid • We are familiar with method-linked hierarchy: Lines of Code, Methods, Objects, Programs, Packages [Figure labels: Methods, Services, Component Grids; CPUs, Clusters, MPPs, Compute Resource Grids; Databases, Federated Databases, Data Resource Grids; Sensor, Sensor Nets; Overlay and Compose Grids of Grids]
Choice of languages • One needs to evaluate real-time version but I would prefer Java to C++ or C • Java has good software development tools and the current generation of programmers is well trained in it • C++ allows higher performance but find out if you need this • Prefer Web Service model if performance allows • Use message-based interaction not method based where possible • Use Web services if you require messages and interoperability with the outside world • JDBC is message based interaction with external database • Aim at supporting both Windows and Linux platforms if possible
Client Side Air Traffic Control • Analyze all performance requirements • Remember life cycle costs are larger than build costs • Difficult consequences if contract just to build – not to maintain • Use Model View Controller architecture and separate Model and View • Control is often the interaction between Model and View • So client is not same as user module; always separate business logic from user interface • Use GIS!
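A minimal Model View Controller sketch showing business logic kept apart from the user interface. The classes, the altitude-floor rule and the text "view" are hypothetical stand-ins; a real client would render to a display and hold far richer track data.

```python
class TrackModel:
    """Model: holds the data and the business logic, knows nothing about display."""

    def __init__(self):
        self._altitudes = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def update(self, callsign, altitude_ft):
        self._altitudes[callsign] = altitude_ft
        for listener in self._listeners:
            listener(self)

    def below(self, floor_ft):
        """Business rule (illustrative): which aircraft are under the floor?"""
        return sorted(c for c, alt in self._altitudes.items() if alt < floor_ft)


class TextView:
    """View: only renders what it is given."""

    def __init__(self):
        self.lines = []

    def render(self, callsigns):
        text = ", ".join(callsigns) if callsigns else "none"
        self.lines.append("LOW: " + text)


class Controller:
    """Controller: the interaction between Model and View."""

    def __init__(self, model, view, floor_ft):
        self.view = view
        self.floor_ft = floor_ft
        model.subscribe(self.on_change)

    def on_change(self, model):
        self.view.render(model.below(self.floor_ft))
```

Swapping `TextView` for a graphical or map-based view would not require any change to `TrackModel`.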
Web Services and M-MVC • Web Services are naturally M-MVC – Message based Model View Controller with • Model is Web Service • Controller is Messages (NaradaBrokering) • View is rendering As Controller
I: Data Mining and GIS Grid [Figure labels: Data Mining Grid; Databases with NASA, USGS features; SERVOGrid Faults; NASA WMS; WFS1, WFS2, WFS3; WMS handling Client requests; UDDI; SOAP; HTTP; WMS Client, WMS Client]
Typical use of Grid Messaging in NASA [Figure labels: Sensor Grid; GIS Grid; Grid Eventing; Datamining Grid]
Typical use of Grid Messaging [Figure labels: Sensor Grid; Filter or Datamining; Post before Processing; Post after Processing; NaradaBrokering; Notify; Subscribe; Web Feature Service, WFS (GIS data); Grid Database; Archives; HPSearch Manages; GIS Grid; WS-Context Stores dynamic data; Geographical Information System]
I: Data Mining Grid [Figure labels: Filter; PI Data Mining; Filter; Pipeline; WS-Context; WFS3, WFS4; GIS Grid; Databases with NASA, USGS features; SERVOGrid Faults; SOAP; UDDI; HPSearch Workflow; NaradaBrokering; System Services]
Architecture • Consider requirements of application alongside performance of computers and networks • Remember performance of hardware will increase as will cost of people • Don’t fix number of tiers but rather build system from entities linked by messages such as services linked by SOAP • Messaging good even if not SOAP • SOAP has “container overhead” • Build a data architecture in XML for all information that will be in messages • Use pointers internally to entities • Things in messages use system metadata to look up references • i.e. database lookup not hardware memory model • As before use the slowest most general method possible • Avoid unnecessary performance optimization • Build a fault tolerance model into initial architecture
ATC Performance and Algorithm • Find size (in latency, bandwidth) of critical requirements • Use publish-subscribe technology to support link between data sources and programs • Introduces a delay of a few (1-5) milliseconds but is much easier to build and more fault tolerant • Prefer asynchronous links as this makes the system more modular and more robust • Performance requirements drive architecture • Build hierarchical algorithm to match hierarchical architecture
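A toy in-process sketch of a publish-subscribe link between data sources and programs. It is not NaradaBrokering or any real message-oriented middleware; the `Broker` class and the topic names are hypothetical. The point it illustrates is that the publisher only posts events and never waits for, or even knows about, the consumers.

```python
import queue
from collections import defaultdict


class Broker:
    """Topic-based broker: each subscriber gets its own queue of events."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        """Register interest in a topic and return the queue to read from."""
        inbox = queue.Queue()
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, event):
        """Post an event to every subscriber; return how many received it."""
        inboxes = self._subscribers[topic]
        for inbox in inboxes:
            inbox.put(event)
        return len(inboxes)
```

Consumers drain their queues at their own pace, for example with `inbox.get()` in a worker thread, so a slow or restarted consumer does not block the data source.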
How to become a Software Architect • Work hard! • Understand modern technologies and their trends so that future developments enhance your design choices • Be able to understand system (requirements) in a clear fashion • Be able to decompose systems in a clear methodical fashion • Isolate detail into modules and use two or three level programming model
Two-level Programming I • The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model • We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies • C++ Java or Fortran Monte Carlo module • Data streaming from a sensor or Satellite • Specialized (JDBC) database access • Such services accept and produce data from users, files and databases • The Grid is built by coordinating such services assuming we have solved problem of programming the service [Figure labels: Service, Data]
Two-level Programming II • The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams • Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs • Such interpretative environments are the single processor analog of Grid Programming • Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately [Figure labels: Service1, Service2, Service3, Service4]
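The single-processor analog mentioned above can be sketched as a script that composes independent "services" the way a shell pipeline composes programs. The three functions and the `callsign,altitude` line format are hypothetical; in a Grid setting each stage would be a remote service reached by messages rather than a local function.

```python
from functools import reduce


def compose(*services):
    """Second programming level: chain services so each output feeds the next."""
    def workflow(data):
        return reduce(lambda value, service: service(value), services, data)
    return workflow


# First programming level: each "service" is written on its own.
def decode(lines):
    reports = []
    for line in lines:
        callsign, altitude = line.split(",")
        reports.append({"callsign": callsign, "altitude_ft": int(altitude)})
    return reports


def keep_low(reports, floor_ft=10000):
    return [r for r in reports if r["altitude_ft"] < floor_ft]


def summarize(reports):
    return [r["callsign"] for r in reports]


pipeline = compose(decode, keep_low, summarize)
```

Calling `pipeline(["A1,9000", "B2,12000"])` runs the three stages in order, much as `decode | keep_low | summarize` would in a shell.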
3 Layer Programming Model • Level 1 Programming: inside services • Application expressed in Java, Fortran, C++, MPI etc. • WS-* Infrastructure • Level 2 Programming: choosing services by virtualization • Application Semantics (Metadata, Ontology), Semantic Grid • Level 3 Grid Programming: composing multiple services • Service Workflow, Transactions, Mediation • Substantial work in UK e-Science program, international semantic web community [Figure labels: Web Service 1, WS 2, WS N-1, Web Service N]
Plethora of Standards • Java is very powerful partly due to its many “frameworks” that generalize libraries e.g. • Java Media Framework • Java Database Connectivity JDBC • Web Services have a corresponding collection of specifications that represent critical features of the distributed operating systems for “Grids of Simple Services” • About 60 WS-* specifications introduced in last 2-3 years • These are low level with higher level standards such as access database (OGSA-DAI) or “Submit a job” built on top of these • Many battles both between standard bodies and between companies as each tries to set standards they consider best; thus there are multiple standards for many of the key Web Service functionalities • Microsoft is a key player and stands to benefit as Web Services open up enterprise software space to all participants • e.g. MQSeries (IBM) and Tibco have to change their messaging systems to support new open standards
The WS-* Infrastructure • Core Grid Services build on and/or extend the 60 or so WS-* Infrastructure specifications which define • Container Model, XML, WSDL … • Service Internet ( (Reliable) Messaging, Addressing) including extensions for high performance transport and representation. This is natural basis for streaming applications • Service Discovery • Workflow and Transactions • Security • Metadata and State including lifetime • Notification • Policy, Agreements • Management (service interactions) • Portals and User Interfaces