Explore long-term system issues affecting statistical applications. Discuss design approaches, problems, solutions, and object-oriented design in statistical software development. Learn about the Object Paradigm, system modeling methodologies like UML, and UML features for systematic design and development. Discover UML Class Diagrams, interfaces, and components for enhanced system robustness and management of requirements.
Systems Architecture for Statistical Applications: Introduction and Overview
Andrew Westlake, Survey & Statistical Computing
Wednesday 25th January 2006
Introduction
• Systems Architecture for Statistical Applications
  • Not Features or Usability
  • Long-term issues that affect Statistical Systems
    • Ease of maintenance and enhancement
    • Responsiveness to developments in operating environments
    • Portability between computing environments
    • Interoperability with other related systems
    • Extensions by Users
• Programme
  • Papers from developers of statistical systems
    • Describing different approaches
    • Discussing problems and solutions
RSS/ASC Systems Architecture: Introduction and Overview
Some Issues
• Statistical software has a small market
  • Limited development budgets
  • Early design and implementation decisions can be critical
  • Re-engineering is a major step
• Statistical Software is different
  • Provides functionality for solving (a class of) problems
  • Not automation of tasks
  • More generalised than traditional application design
• Need to exploit ideas and developments
  • Objects, components, standards, services, …
  • Open source, Windows, Linux, Internet
  • Data warehouses & OLAP, Data mining, …
• Levels of Abstraction/Generalisation
  • Different levels needed at different times in design and discussion
  • Confusion often due to discussion at the wrong (or different) levels
Object-Oriented Design
• Alternative way of thinking about software structure
  • An abstract model of programming
  • Developed in the 1960s and 1970s
• Greater Reliability, Ease of Maintenance
  • Objects have behaviour and own data
  • Avoidance of ‘side-effects’
• Compiler and Run-time system support
  • C++, Java, VB(?) …
• Big influence on design of S
• Academic and Commercial input
  • Ideas and concepts from abstract work by academics
  • Developed, extended and realised by commercial developers
The Object Paradigm
• Objects are Instances of Classes
  • Classes define shared structure (attributes) and behaviour (methods)
  • Objects have Identity, Information and State (attribute values)
  • Created and destroyed dynamically at run time, can be persistent
• Encapsulation
  • Objects receive Messages invoking Behaviour
  • Includes changing and returning attribute values
  • Can only access the attributes of an object through its public methods
• Inheritance
  • New classes can be defined as specialisations of others
  • Inherit structure and methods, but can alter and extend
• Polymorphic Methods
  • Methods behave differently for different classes, so response depends on type of object receiving message
  • E.g. an object knows how to Display itself
  • Object sending message does not need to worry (much)
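The four ideas on this slide can be sketched in a few lines of Python. The class names (`Table`, `Crosstab`) and the `display` method are hypothetical, chosen only to echo the "object knows how to Display itself" example; they are not taken from any statistical package.

```python
class Table:
    """A class defines shared structure (attributes) and behaviour (methods)."""

    def __init__(self, name, values):
        # Leading underscores mark the attributes as private by convention;
        # Python does not enforce this, unlike C++ or Java.
        self._name = name
        self._values = list(values)

    def total(self):
        # Public method: the supported way to get at the object's state.
        return sum(self._values)

    def display(self):
        return f"Table {self._name}: {self._values}"


class Crosstab(Table):
    """Inheritance: a specialisation that reuses and extends Table."""

    def __init__(self, name, rows):
        # Flatten the rows so the inherited total() still works.
        super().__init__(name, [v for row in rows for v in row])
        self._rows = [list(row) for row in rows]

    def display(self):
        # Polymorphism: same message, class-specific response.
        body = "; ".join(" ".join(str(v) for v in row) for row in self._rows)
        return f"Crosstab {self._name}: {body}"


def show_all(objects):
    # The sender does not need to know which class each object belongs to.
    return [obj.display() for obj in objects]


objects = [Table("ages", [3, 4]), Crosstab("sex by region", [[1, 2], [3, 4]])]
lines = show_all(objects)
```

`show_all` sends the same `display` message to every object and each one responds according to its own class, which is the sense in which the sender "does not need to worry (much)".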
System Modelling Methodologies: UML
• Need recognised for systematic design and development methods
  • Management of complexity
  • Identification and control of requirements
  • Ease of maintenance
  • Feedback and validation from Users
• Various conflicting systems proposed
• Task force of Object Management Group: OMG
  • Produced the Unified Modelling Language: UML
  • Rumbaugh, Jacobson and Booch
  • Supports design from User Requirements to Code Production
• Development Methodologies built around UML
  • Agile-, Extreme-, Feature-Driven-, Iterative-, Unified-, … Development
UML Features
• Formal specification of Language and Semantics for design of systems (now version 2.0)
• Includes formalised diagram types and elements
  • Activity, Class, Component, Deployment, Sequence, State, Use Case, … Diagrams
  • Aggregation, Generalisation, Cardinality, Classification, Concurrency, Constraints, Dependency, Interfaces, Synchronicity, Visibility, … elements, attributes, facets
• Various packages support complete development from design to code generation
  • Poseidon, Rational (IBM), Together (Borland), Visual Studio, …
  • Essentially independent of implementation language
• Can be used informally for early design stages (e.g. Visio)
• Difficult to learn thoroughly
  • Good overview in UML Distilled, Martin Fowler (Addison-Wesley, 2004)
• Not perfect – some areas under-developed, some omissions
  • No alternative is as well established or supported
A UML Class Diagram
Interfaces and Components
• Formal definition of Interfaces is an aspect of Encapsulation
  • Straightforward within a single system
  • Improves robustness of the system
• Idea extended to distributed components and systems
  • Independent components on the same system, e.g. COM objects, ActiveX
  • Servers and Clients on same or different systems, e.g. database servers (ODBC), web servers (HTML), distributed data archives (RDF)
  • Distributed processing on specialised servers, e.g. DCOM, Web Services, Grid
• Difficult issues for management of communication channels
  • Message language, message structure and protocol, service discovery
  • All being resolved through industry collaboration building on academic ideas
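As a minimal sketch of what a formally defined interface looks like within a single system, the Python below separates an abstract interface from one implementation. The names `DataSource`, `InMemorySource` and `mean_of` are hypothetical and serve only as illustration.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical interface: what any data-providing component must offer."""

    @abstractmethod
    def variable_names(self):
        """Return the names of the available variables."""

    @abstractmethod
    def column(self, name):
        """Return the values of one variable as a list."""


class InMemorySource(DataSource):
    """One implementation; a database-backed one could replace it unchanged."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def variable_names(self):
        return sorted(self._columns)

    def column(self, name):
        return list(self._columns[name])


def mean_of(source, name):
    # Written against the interface only, not against any implementation.
    values = source.column(name)
    return sum(values) / len(values)


source = InMemorySource({"age": [20, 30, 40], "income": [10, 20, 60]})
result = mean_of(source, "age")
```

Because `mean_of` depends only on the interface, the implementation behind it can change without touching the calling code, which is where the gain in robustness comes from.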
Distributed Architecture
• Construct system from components that communicate through messages
  • May be remote – message security and transport handled by Internet (for example)
• Use the best components for the job, only develop the bits no one else does
  • For example, use SQL Server for data store, with access control, Apache to deliver displays to users, R for statistical calculations and charts, …
• Can distribute almost anything: processing power, algorithms, data, knowledge, metadata, …
• Benefits
  • Cheaper – you only have to build your bits
  • Better – get the best products for the other bits
• Problems
  • Overheads in communication – can be avoided with clever design
  • Have to agree on message mechanisms – or follow a standard
  • Cost of other components – but many are effectively free
• The future of Computing Systems
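The idea of components that interact only through messages can be sketched as follows. This is an illustration under stated assumptions, not a real deployment: JSON is used for the message text purely for brevity, the message fields (`op`, `data`, `ok`, `result`) are hypothetical, and a direct function call stands in for the network transport.

```python
import json
import statistics


def stats_service(request_text):
    """Hypothetical statistics component: text message in, text message out."""
    request = json.loads(request_text)
    if request.get("op") == "mean":
        reply = {"ok": True, "result": statistics.mean(request["data"])}
    else:
        reply = {"ok": False, "error": f"unknown op: {request.get('op')}"}
    return json.dumps(reply)


def client_mean(send, data):
    """Client side: builds the message and knows nothing about the service's
    internals. `send` is whatever transport delivers text and returns text."""
    reply = json.loads(send(json.dumps({"op": "mean", "data": data})))
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply["result"]


# Here the "transport" is a direct function call; in a distributed system it
# could be a request to another machine without changing client_mean.
value = client_mean(stats_service, [2, 4, 9])
```

The only thing the two sides share is the agreed message format, which is the "have to agree on message mechanisms" point in the list of problems.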
XML – eXtensible Markup Language
• Markup Language
  • Text with Tags (<Field>field contents</Field>)
  • Identifies an Element of type Field with content field contents
  • Content of an element can be simple or complex
    • Numbers, strings, etc., or combinations of other elements
  • Nested Tags (elements) => multiple hierarchies
• Generic syntax for languages
  • Tags not defined, only the language structure
  • XML instance document contains complex structure of information as linear text – ideal for messages and other interchange
• XML is a Standard from W3C (based on SGML)
  • Generic tools to read and write XML in programs
  • Schema (XSD) for defining rules about Tag names and structure
  • Style sheets (XSL/T) for transforming XML to some other text form
    • For example, HTML for display, text script to drive a program, a different (equivalent) XML structure for another context
• Can use UML to design the logical structure and specify the semantics
  • Can generate XML schema (XSD)
  • For example, hyperModel workbench, by David Carlson, www.xmlmodeling.com
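As one example of the generic tools mentioned above, Python's standard `xml.etree.ElementTree` module can write a nested structure out as linear text and read it back. The element names other than `Field` (`Record`, `Address`, `Town`, `Country`) are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Write: build a nested structure and serialise it as linear text.
record = ET.Element("Record")
field = ET.SubElement(record, "Field")
field.text = "field contents"               # simple content
address = ET.SubElement(record, "Address")  # complex content: nested elements
ET.SubElement(address, "Town").text = "London"
ET.SubElement(address, "Country").text = "UK"

message = ET.tostring(record, encoding="unicode")

# Read: parse the text back into a tree and navigate it by tag name.
parsed = ET.fromstring(message)
field_text = parsed.findtext("Field")
town = parsed.findtext("Address/Town")
```

Nothing in the module knows about `Record` or `Address`; the same calls work for any tag vocabulary, which is what "tags not defined, only the language structure" means in practice.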
XML Fragment – Metadata for a model
    <Parameter Name="FlowWithin" ElementType="Matrix" Terminal="true">
      <Tag TagName="Description">Factors associated with flow within Zones (so Destination is the same as Origin).</Tag>
      <Dimension ClassificationName="OriginZones"/>
    </Parameter>
  </Parameters>
  <Relationships>
    <Relationship RelType="Stochastic" Name="Estimate 1 distribution">
      <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
      <RelInput>
        <ParRef Name="Flow"/>
      </RelInput>
      <RelOutput>
        <VarRef Name="FlowEstimate1"/>
      </RelOutput>
      <RelStochastic>
        <DistPoisson>
          <Rate>
            <ParRef Name="Flow"/>
          </Rate>
        </DistPoisson>
      </RelStochastic>
    </Relationship>
XML processed to HTML
Relationships:
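The kind of transformation this slide illustrates can be sketched in Python. Two assumptions apply: the standard library has no XSLT processor, so `xml.etree.ElementTree` stands in for the XSL/T style sheet, and the HTML layout produced here is invented, since the original rendering is not reproduced in the text. The input is the Relationship element from the metadata fragment, with its `RelStochastic` part left out for brevity.

```python
import xml.etree.ElementTree as ET
from html import escape

# The Relationship element from the fragment, wrapped so that it parses as a
# complete document (the slide shows only part of a larger file).
SOURCE = """
<Relationships>
  <Relationship RelType="Stochastic" Name="Estimate 1 distribution">
    <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
    <RelInput><ParRef Name="Flow"/></RelInput>
    <RelOutput><VarRef Name="FlowEstimate1"/></RelOutput>
  </Relationship>
</Relationships>
"""


def names(rel, path):
    # Collect the Name attribute of every element matched by the path.
    return ", ".join(el.get("Name") for el in rel.findall(path))


def relationship_to_html(rel):
    # Hypothetical layout: a heading, the description, then inputs and outputs.
    description = rel.findtext("Tag[@TagName='Description']", default="")
    return (
        f"<h3>{escape(rel.get('Name'))} ({escape(rel.get('RelType'))})</h3>\n"
        f"<p>{escape(description)}</p>\n"
        f"<p>Input: {escape(names(rel, 'RelInput/*'))}; "
        f"Output: {escape(names(rel, 'RelOutput/*'))}</p>"
    )


root = ET.fromstring(SOURCE)
html_text = "<h2>Relationships:</h2>\n" + "\n".join(
    relationship_to_html(rel) for rel in root.findall("Relationship")
)
```

A style sheet would express the same mapping declaratively; the point is only that the display form is derived mechanically from the metadata rather than maintained by hand.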
Programming Languages
• Is Fortran dead?
  • Not according to Microsoft
• Have rediscovered the idea of language-independent intermediate code (runtime – LIR)
  • Ideal for UML modelling approach
  • System functionality provided at runtime level, so the same for all languages
  • New compilers only have to do language translation
• Requires a common programming model
  • Or at least a subset of the runtime model
• Allows closely coupled components to be written in different languages
  • May be the answer for legacy systems