380 likes | 392 Views
Spreadsheet Design Concepts. Simon Murphy simon.murphy@codematic.net Developer – Codematic Ltd. Patrick. Very disappointed not to be here Has made his session available for EUC attendees only at: www.sysmod.com/pobssauditaug05.pdf Be quick I guess he’ll remove it soon.
E N D
Spreadsheet Design Concepts Simon Murphy simon.murphy@codematic.net Developer – Codematic Ltd
Patrick • Very disappointed not to be here • Has made his session available for EUC attendees only at: • www.sysmod.com/pobssauditaug05.pdf • Be quick I guess he’ll remove it soon
Spreadsheet background • 5-60 Mb size • 20-200,000 formulas • 1-10,000 unique formulas • 5-10,000 lines of VBA • £1M to Billions in values • Often linked to other technologies such as OLAP, ADO, COM or .net etc. • Finance, Banking and Sales and Marketing areas • Development cost up to $1M • Active member of Eusprig – European Spreadsheet Risk group – dedicated to raising awareness of dangers and error rates in commercial spreadsheets www.eusprig.org
Agenda • Fundamental Imperative • Security • Software Development Lifecycle • Design • Technology choice • Inputs • Data • Logic/Formulas • Documentation • Summary • Any Questions
Fundamental Imperative • Manage complexity(McConnell) • Solution complexity grows at 4 x the rate of problem complexity. (Glass) • [Things] should be as simple as they can be, but no simpler (Einstein) • K.I.S.S. • This principle should drive all other work. • Easier to build, easier to test, easier to document etc…
Security • Is everybody’s concern • Spreadsheets can be used as a staging board for privilege escalation (with your login details!) • Consider SD3 +C • Secure by • Design • Default • Deployment • Communication • Threat Modelling – Assets, Threats • Threat Types – STRIDE • (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) • Threats – rate with DREAD • (Damage potential, Reproducibility, Exploitability, Affected Users, Discoverability) • Spreadsheets (all flavours) are fairly insecure • Compiled UDFs (.net, COM, XLL) and Database servers can help • Set macro security to high and use code signing certificates. • See Microsoft MOC 2840A – Implementing security for more info.
Software Development Lifecycle • Systems Development lifecycle • Requirements, • Analysis, • Logical Design, • <Technology Choice>, • Physical Design, • Construction, • Test, • Release, • Maintain. • In some shape or form. • Spreadsheet Development lifecycle – “Oh! I need a model” – clickety-click, reasonableness check, release, (Test & Maintain in live environment).
Big Design Issues • Project Scope • Business needs • Security • Performance • Maintainability • Extensibility • Availability • Scalability • Human factors • Integration • Methodologies
Design – Excel or not • Excel/VBA is often not technically the best solution • Databases better for large volumes of data • Compiled languages better for security • Spreadsheets are a 2 dimensional tool where most business problems are 5+ dimensions • It is good when considering • Initial speed of development • Cost of initial development • Current skill sets • Simple initial deployment Note: If you need to restrict Excel functionality you may be better with .net and a grid control
Assuming Spreadsheet • Design • N-Tier • Defensive Designs • Inputs • Data • Layout • Formulas • VBA • Extending Excel • Documentation
N-Tier • Basic version is 3 tier • Data layer • Logic (or analysis) layer • Presentation layer • Each of these may be further broken out (into N tiers) • Easily implemented in worksheets • Suitable for most non trivial workbooks • Powerful and flexible but adds weight
Basic design (N-tier) Model Support structure Inputs Assumptions Reference Data WorkbookStructure Logic Accountability Reports Revision history Demo
Defensive Designs • How will I test this? • Clearly identify Input areas • Group items that need updating with similar frequency • Place formulas where they are safe from accidents • Don’t use sheet protection, its weak, annoying, counter productive, and reduces peoples ability to check and understand your logic (it creates more problems than it solves) Use a compiled component if security is a genuine concern. Demo • Use cross checks • Use a (simple) consistent convention to separate items that should be treated differently (inputs/formulas or actual/forecast) • Use blanks cells around different blocks (enables ‘Current Region’) • Consider using a status sheet with summary error reports on it • Consider support structure – Index, assumptions, accountability. • Consider personal and tool comfort zone
Inputs • Never trust inputs • Always test for correctness as soon as possible • Limit inputs wherever possible (eg option buttons, or drop downs) • Use data validation but be aware of its limits demo • Help users by clearly identifying what is needed • Give clear feedback on errors or problems • Minimise input – Let Excel do the work • Have validation formulas nearby • If test is ok say “ok” (or “Row/Column check ok”) not “”
Database (Relational) • Useful for flexible and efficient storage and updating • Identify Entities (nouns), Attributes (adjectives) and relationships (verbs) in problem statement • 2 approaches • Top down (identify entities then attributes) • Bottom up (group attributes to describe entities) • Data normalisation • Basically keep related items together • Provides design flexibility • 3rd Normal Form (TNF/3NF) • All items depend on the key (a unique identifier), the whole key and nothing but the key. • More useful at the transaction processing end rather than analysis and reporting. (eg sales recording) • Analysis performance can be poor • PL demo
Database (OLAP) • Useful for flexible/powerful reporting. • The most useful concept of Dimensionality – the number of ways to describe something • In a spreadsheet it’s the row, column, worksheet, workbook, and maybe directory. • Very useful for things that may ordinarily be missed • How to describe the numbers (dimensionality) • Eg Inflation <Time?, Country?, Cost Type?> • Eg. P&L value <P&L line description, Time, Business unit, value type (actual, budget etc), currency, Company etc> • Useful for layout – time in cols or time in sheets? • Hierarchies are ways of adding up dimension elements • There may be more than one hierarchy for any given dimension • Eg time • day > week > quarter > financial year • day > calendar month > calendar year • Reporting performance can be excellent as many calculations pre-aggregated
Spreadsheet eg. of Dimensionality 2 338 is Period 8 2005 Actual Figure For Employment Costs For the North For Large Co
Pivot Tables • Probably the most important feature in spreadsheets • A superb way to manage complexity • Most users can’t access them because their data is already half pivoted. • Repeated blocks are a strong hint to use pivots • Pivot source data should have 1 one column with numbers in (slightly simplified) • Demo
Software Development Principles • Modularisation • Cohesion • Coupling • Fan in / Fan out
Modularisation • Basic idea – to break down complexity into understandable chunks (note Millers Theorem (7+/-2)) • Advantages • Simplifies and adds analysis layers • Adds flexibility • Improves robustness • Reduces dependencies • Improves testability • Disadvantages/Limitations • All Cells can be read from everywhere • Can add redundancy • Not really applicable for very small models • Use of a block of cells for one (single) task Volume Summary Volume Calcs (eg Geographic) Sales Price Total Revenue Total Volumes
Modular blocks A ‘block’ is a area of cells surrounded by empty cells, that performs some analysis
Cohesion • How interrelated a unit is • High cohesion means all elements are highly interrelated – this is good, it aids understanding and reduces range of influence • Low cohesion makes things harder to understand like random letters • Cohesion is like well normalised data – but also considers what the unit does • Cohesion test…
Cohesion 2 • A<kfj\s =]’#\kdjw487 • Give me all your chocolate
Cohesion 3 • Who can remember the first string? • The second?
Cohesion 4 • Keep different shaped data blocks apart • Either diagonally or on separate sheets • Try to make all formulas in a block similar
Coupling • How strongly 2 separate elements depend on each other • Low coupling is better, especially through clearly defined interfaces • High coupling often means hidden dependencies which generally leads to incorrect modifications (side effects) • Example: hard coded cell addresses in VBA code
Fan in / fan out • Low fan in – a ‘unit’ (cell/worksheet/VBA routine) depends on only a few other units • Good because it minimises dependencies and reduces complexity • High fan out – a ‘unit’ (cell/worksheet/VBA routine) is used by many others • Good because it minimises duplication • Example: putting VAT (Sales tax) rates in their own cells, and referring there in calculations • Example: calculating an offset once and using the result many times
Logic/Formulas • Sketch design on a whiteboard first • Don’t use IF(ISERROR(, be specific demo • Put expected part first in Ifs • Don’t start in A1 – Give yourself room to manoeuvre. Try D10 – hide the unused. • Line sheets up on first data cell rather than headers • Don’t hide rows and columns, use a separate sheet if possible • Use Goal seek and or VBA rather than circular references • Avoid more than 2 or 3 levels of formula nesting – break it out across several cells. • Use tools – many pay for themselves on first use. But don’t totally rely on them. • Place totals above and to the left of details (more flexible and robust eg links, filtering) • Build for testability
VBA Connection • VBA UDFs should get all range info via parameters • Where (non UDF) VBA uses worksheet ranges, these should almost certainly be named ranges. • Sheet1.[Inflation].value = 0.02, rather than • Sheet1.[C5].value = 0.02, which may become invalid if rows or columns are inserted or deleted • strInflationRange = “C5” is just as bad – it creates a hidden dependency that must be manually updated when worksheet changes (poor coupling) • Variable and routine names: • Use very meaningful names (8-30 chars length) • Use a simple naming convention (matched pairs) • Use scope prefixes (g, m) • Don’t use data type variable prefixes (s, str, l, lng etc) (see .net advice) • avoid abbreviations • Don’t use code comments demo make code ultra clear instead • Option Explicit on • Avoid Application.Run (non VBA – breaks error management) • routines 1 screen long max • 3-4 levels of nesting max • Use source control
Extending Excel • Excel is powerful not perfect • Leverage benefits whilst managing weaknesses by using complimentary technologies • Data: VBA, ADO, ODBC, OLAP, .net, COM, Info Bridge, XML, DDE, Web Queries, SOX Solution Accelerator • Logic: VBA, COM, .net, xll, Pivots, Filters • Search the web for vast array of free or cheap tips and tools.
Documentation • If you design for simplicity only very minor additional documentation is needed • Design and build the user Interface with the Users needs primary, the documentation will automatically be there • Design and build the business logic parts with the maintainers needs primary, the main documentation will automatically be there, expand as required. • Integral tests should explicitly clarify intent. • Reports should contain enough description to be meaningful. • External documentation is almost always so out of date its worse than useless. • Excessive documentation is too hard to plod through • Poorly targeted documentation is pointless • Working software is more useful than documentation • Sometimes documentation is important
Summary • Manage complexity • Consider security carefully • Excel/VBA not the best tool for everything • Be defensive, especially with inputs • Understand your data • Aim for cohesive models with low coupling • Use names to connect VBA to worksheets • Use complimentary technologies where appropriate • Manage documentation
Discussion points • VBA comments – Good or Bad? • My view - bad • Test before release – Good or Bad? • My view - bad
Questions? • simon.murphy@codematic.net • Spreadsheet consulting, reviewing, maintaining, rescuing, migrating, add-in development etc. • Websites • www.codematic.net • www.xlanalyst.co.uk