Systems Architecture for Statistical Applications: Introduction and Overview. Andrew Westlake, Survey & Statistical Computing. Wednesday 25th January 2006.
Systems Architecture for Statistical Applications: Introduction and Overview
Andrew Westlake
Survey & Statistical Computing
Wednesday 25th January 2006
Introduction
• Systems Architecture for Statistical Applications
  • Not Features or Usability
• Long-term issues that affect Statistical Systems
  • Ease of maintenance and enhancement
  • Responsiveness to developments in operating environments
  • Portability between computing environments
  • Interoperability with other related systems
  • Extensions by Users
• Programme
  • Papers from developers of statistical systems
  • Describing different approaches
  • Discussing problems and solutions
RSS/ASC Systems Architecture: Introduction and Overview
Some Issues
• Statistical software has a small market
  • Limited development budgets
  • Early design and implementation decisions can be critical
  • Re-engineering is a major step
• Statistical Software is different
  • Provides functionality for solving (a class of) problems
  • Not automation of tasks
  • More generalised than traditional application design
• Need to exploit ideas and developments
  • Objects, components, standards, services, … *
  • Open source, Windows, Linux, Internet
  • Data warehouses & OLAP, Data mining, …
• Levels of Abstraction/Generalisation
  • Different levels needed at different times in design and discussion
  • Confusion often due to discussion at the wrong (or different) levels
Object-Oriented Design
• Alternative way of thinking about software structure
  • An abstract model of programming
  • Developed in the 1960s and 1970s
• Greater Reliability, Ease of Maintenance
  • Objects have behaviour and their own data
  • Avoidance of ‘side-effects’
• Compiler and Run-time system support
  • C++, Java, VB(?) …
  • Big influence on design of S
• Academic and Commercial input
  • Ideas and concepts from abstract work by academics
  • Developed, extended and realised by commercial developers
The Object Paradigm
• Objects are Instances of Classes
  • Classes define shared structure (attributes) and behaviour (methods)
  • Objects have Identity, Information and State (attribute values)
  • Created and destroyed dynamically at run time, can be persistent
• Encapsulation
  • Objects receive Messages invoking Behaviour
  • Includes changing and returning attribute values
  • Can only access the attributes of an object through its public methods
• Inheritance
  • New classes can be defined as specialisations of others
  • Inherit structure and methods, but can alter and extend
• Polymorphic Methods
  • Methods behave differently for different classes, so the response depends on the type of object receiving the message
  • E.g. an object knows how to Display itself
  • Object sending the message does not need to worry (much)
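The ideas on this slide can be sketched in a few lines of Python. The class names (Chart, BarChart) and the render_all helper are hypothetical, chosen only for illustration; note also that Python marks attributes as private by convention (a leading underscore) rather than enforcing it, so encapsulation here is weaker than in C++ or Java.

```python
class Chart:
    """Base class: defines shared structure (attributes) and behaviour (methods)."""

    def __init__(self, title):
        self._title = title  # leading underscore: private by convention only

    def title(self):
        """Public method giving read access to the attribute."""
        return self._title

    def display(self):
        return f"Chart: {self._title}"


class BarChart(Chart):
    """Specialisation: inherits from Chart, extends its structure, alters display."""

    def __init__(self, title, heights):
        super().__init__(title)
        self._heights = list(heights)

    def display(self):
        bars = ", ".join(str(h) for h in self._heights)
        return f"BarChart: {self._title} [{bars}]"


def render_all(objects):
    # The sender does not check types; each object knows how to display itself.
    return [obj.display() for obj in objects]


if __name__ == "__main__":
    for line in render_all([Chart("Summary"), BarChart("Counts", [3, 1, 4])]):
        print(line)
```

The same display message produces different behaviour depending on the class of the receiving object, which is the polymorphism the slide describes.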
System Modelling Methodologies: UML
• Need recognised for systematic design and development methods
  • Management of complexity
  • Identification and control of requirements
  • Ease of maintenance
  • Feedback and validation from Users
• Various conflicting systems proposed
• Task force of Object Management Group: OMG
  • Produced the Unified Modelling Language: UML
  • Rumbaugh, Jacobson and Booch
  • Supports design from User Requirements to Code Production
• Development Methodologies built around UML
  • Agile-, Extreme-, Feature-Driven-, Iterative-, Unified-, … Development
UML Features
• Formal specification of Language and Semantics for design of systems (now version 2.0)
• Includes formalised diagram types and elements
  • Activity, Class, Component, Deployment, Sequence, State, Use Case, … Diagrams
  • Aggregation, Generalisation, Cardinality, Classification, Concurrency, Constraints, Dependency, Interfaces, Synchronicity, Visibility, … elements, attributes, facets
• Various packages support complete development from design to code generation
  • Poseidon, Rational (IBM), Together (Borland), Visual Studio, …
• Essentially independent of implementation language*
• Can be used informally for early design stages (e.g. Visio)
• Difficult to learn thoroughly
  • Good overview in UML Distilled, Martin Fowler (A-W, 2004)
• Not perfect – some areas under-developed, some omissions
  • No alternative is as well established or supported
A UML Class Diagram
Interfaces and Components
• Formal definition of Interfaces is an aspect of Encapsulation
  • Straightforward within a single system
  • Improves robustness of the system
• Idea extended to distributed components and systems
  • Independent components on the same system, e.g. COM objects, ActiveX
  • Servers and Clients on same or different systems, e.g. database servers (ODBC), web servers (HTML), distributed data archives (RDF)
  • Distributed processing on specialised servers, e.g. DCOM, Web Services, Grid
• Difficult issues for management of communication channels
  • Message language, message structure and protocol, service discovery
  • All being resolved through industry collaboration building on academic ideas
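Within a single system, a formally defined interface might look like the following Python sketch. The DataSource interface and both implementations are hypothetical names invented for this illustration; the point is only that the calling code depends on the interface, not on any particular component behind it.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical interface: the only contract that callers depend on."""

    @abstractmethod
    def fetch(self, variable):
        """Return the values of the named variable as a list of numbers."""


class InMemorySource(DataSource):
    """One component implementing the interface."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def fetch(self, variable):
        return list(self._columns[variable])


class ScaledSource(DataSource):
    """A second component behind the same interface; wraps another source."""

    def __init__(self, inner, factor):
        self._inner = inner
        self._factor = factor

    def fetch(self, variable):
        return [v * self._factor for v in self._inner.fetch(variable)]


def mean(source, variable):
    # Written against the interface only, so any DataSource can be swapped in.
    values = source.fetch(variable)
    return sum(values) / len(values)
```

For example, mean(InMemorySource({"flow": [1.0, 2.0, 3.0]}), "flow") and the same call on a ScaledSource wrapper both work without any change to mean.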
Distributed Architecture
• Construct system from components that communicate through messages
  • May be remote – message security and transport handled by Internet (for example)
• Use the best components for the job, only develop the bits no one else does
  • For example, use SQL Server for data store, with access control, Apache to deliver displays to users, R for statistical calculations and charts, …
• Can distribute almost anything: processing power, algorithms, data, knowledge, metadata, …
• Benefits
  • Cheaper – you only have to build your bits
  • Better – get the best products for the other bits
• Problems
  • Overheads in communication – can be avoided with clever design
  • Have to agree on message mechanisms – or follow a standard
  • Cost of other components – but many are effectively free
• The future of Computing Systems
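A minimal sketch of components that communicate only through messages, assuming JSON as the agreed message format. The message fields (op, values, ok, result, error) and both function names are hypothetical, and the transport is replaced by a direct function call; in a real deployment the same text messages could travel over a network protocol instead.

```python
import json
import statistics


def stats_service(request_text):
    """Hypothetical component: receives a JSON message, returns a JSON reply."""
    request = json.loads(request_text)
    if request.get("op") == "mean":
        reply = {"ok": True, "result": statistics.mean(request["values"])}
    else:
        reply = {"ok": False, "error": "unknown op"}
    return json.dumps(reply)


def client_mean(values, send=stats_service):
    # 'send' stands in for the transport layer; here it is a plain function call.
    request_text = json.dumps({"op": "mean", "values": values})
    reply = json.loads(send(request_text))
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply["result"]


if __name__ == "__main__":
    print(client_mean([1.0, 2.0, 3.0, 6.0]))
```

Because the client and the service share nothing except the message format, either side could be replaced by another product, which is the benefit the slide claims; the cost is that both sides have to agree on that format.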
XML – eXtensible Markup Language
• Markup Language
  • Text with Tags (<Field>field contents</Field>)
  • Identifies an Element of type Field with content "field contents"
  • Content of an element can be simple or complex
    • Numbers, strings, etc., or combinations of other elements
  • Nested Tags (elements) => multiple hierarchies
• Generic syntax for languages
  • Tags not defined, only the language structure
  • XML instance document contains complex structure of information as linear text – ideal for messages and other interchange
• XML is a Standard from W3C (based on SGML)
  • Generic tools to read and write XML in programs
  • Schema (XSD) for defining rules about Tag names and structure
  • Style sheets (XSL/T) for transforming XML to some other text form
    • For example, HTML for display, text script to drive a program, a different (equivalent) XML structure for another context
• Can use UML to design the logical structure and specify the semantics
  • Can generate XML schema (XSD)
  • For example, hyperModel workbench, by David Carlson, www.xmlmodeling.com
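As an illustration of the "generic tools to read and write XML in programs", the sketch below uses Python's standard xml.etree.ElementTree module. The element and attribute names (Record, Field, Name) are made up for this example and are not taken from any schema in the talk.

```python
import xml.etree.ElementTree as ET

# Build a small document; element and attribute names are illustrative only.
record = ET.Element("Record")
field = ET.SubElement(record, "Field", {"Name": "age"})
field.text = "42"

# Serialise the nested structure as linear text, suitable for a message.
text = ET.tostring(record, encoding="unicode")

# Parse the text back into a tree and read the nested element.
parsed = ET.fromstring(text)
first = parsed.find("Field")
name, value = first.get("Name"), int(first.text)

if __name__ == "__main__":
    print(text)
    print(name, value)
```

The library knows nothing about the tags themselves, only the generic XML syntax, which is what lets one tool serve many different XML-based languages.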
XML Fragment – Metadata for a model
    <Parameter Name="FlowWithin" ElementType="Matrix" Terminal="true">
      <Tag TagName="Description">Factors associated with flow within Zones (so Destination is the same as Origin).</Tag>
      <Dimension ClassificationName="OriginZones"/>
    </Parameter>
  </Parameters>
  <Relationships>
    <Relationship RelType="Stochastic" Name="Estimate 1 distribution">
      <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
      <RelInput>
        <ParRef Name="Flow"/>
      </RelInput>
      <RelOutput>
        <VarRef Name="FlowEstimate1"/>
      </RelOutput>
      <RelStochastic>
        <DistPoisson>
          <Rate>
            <ParRef Name="Flow"/>
          </Rate>
        </DistPoisson>
      </RelStochastic>
    </Relationship>
XML processed to HTML
(Figure not reproduced: the Relationships section of the fragment rendered as HTML.)
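The rendered page from this slide is not available, so the following is only a guess at the kind of transformation involved. The talk refers to XSL/T style sheets; Python's standard library has no XSLT processor, so this sketch substitutes ordinary code using xml.etree.ElementTree. The HTML layout and the function name relationships_to_html are assumptions, and the input is a cut-down copy of the Relationships part of the fragment, with a closing tag added so that it parses on its own.

```python
import html
import xml.etree.ElementTree as ET

# Cut-down copy of the slide's fragment, closed so it is well-formed by itself.
FRAGMENT = """
<Relationships>
  <Relationship RelType="Stochastic" Name="Estimate 1 distribution">
    <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
    <RelInput><ParRef Name="Flow"/></RelInput>
    <RelOutput><VarRef Name="FlowEstimate1"/></RelOutput>
  </Relationship>
</Relationships>
"""


def relationships_to_html(xml_text):
    """Render each Relationship as a heading, a description and a short list."""
    root = ET.fromstring(xml_text)
    parts = ["<h2>Relationships:</h2>"]
    for rel in root.findall("Relationship"):
        desc = rel.findtext("Tag[@TagName='Description']", default="")
        inputs = [p.get("Name") for p in rel.findall("RelInput/ParRef")]
        outputs = [v.get("Name") for v in rel.findall("RelOutput/VarRef")]
        heading = f"{rel.get('Name')} ({rel.get('RelType')})"
        parts.append(f"<h3>{html.escape(heading)}</h3>")
        parts.append(f"<p>{html.escape(desc)}</p>")
        parts.append("<ul>")
        parts.append(f"<li>Input: {html.escape(', '.join(inputs))}</li>")
        parts.append(f"<li>Output: {html.escape(', '.join(outputs))}</li>")
        parts.append("</ul>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(relationships_to_html(FRAGMENT))
```

An XSLT style sheet would express the same mapping declaratively and keep it outside the program, which is closer to the approach the slides describe.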
Programming Languages
• Is Fortran dead?
  • Not according to Microsoft
• Have rediscovered the idea of language-independent intermediate code (runtime – LIR)
  • Ideal for UML modelling approach
  • System functionality provided at runtime level, so the same for all languages
  • New compilers only have to do language translation
• Requires a common programming model
  • Or at least a subset of the runtime model
• Allows closely coupled components to be written in different languages
  • May be the answer for legacy systems