GETTING unstuck: working with legacy code and data. Cory Foy – http://www.cornetdesign.com. Goals. What is Legacy Code? How do we change Legacy Code? Common patterns for code bases
GETTING unstuck: working with legacy code and data Cory Foy – http://www.cornetdesign.com
Goals • What is Legacy Code? • How do we change Legacy Code? • Common patterns for code bases • Does Legacy Code have to be code, or can it be something else like a really long bullet on a PowerPoint slide, or perhaps a database? • Next Steps
Legacy Code • How do you define Legacy Code? • Several definitions possible • Code we’ve gotten from somewhere else • Code you have to change, but don’t understand • Demoralizing code (Big ball of mud) • Code without unit tests
Legacy Code • Code that needs to have behavior preserved • What is behavior? • The way in which someone behaves • The way in which a person, organism, or group responds to a specific set of conditions • The way that a machine operates or a substance reacts under a specific set of conditions
Legacy Code • What’s the behavior of the following code?
Legacy Code • Does the following code add behavior?
Legacy Code • Now have we changed the behavior?
How do we change Legacy Code? • Why would we want to change the code? • Four reasons to change software • Adding a feature • Fixing a bug • Improving the design • Optimizing resource usage • Each has unique attributes
Adding a feature / Fixing a bug • Causes the following changes • Structure • Functionality (adding or replacing) • Need to be able to know the new functionality works • Need to be able to know that the system as a whole is still functioning appropriately
Improving the Design • Causes the following changes: • Structure • Note that functionality is not listed above • Important to be able to know that all functionality works before and after the change
Optimizing Resource Usage • Changes • Resource usage • May cause structure change • Again note that functionality is ideally not in the above list • Need to have a way to make sure functionality was not changed • Need to have a way to verify the optimization goals have been met (and stay met)
Edit and Pray • Carefully plan the changes you are going to make • Make sure you understand the code to be modified • Make the changes • Run the system to make sure the change was made • Do some additional testing to smoke test that everything seems to be functioning • Pray you don’t get a call at 2am that the system doesn’t work anymore
Cover and Modify • Verify that the system is working by running the tests • Write tests to expose the behavior you want to add or change • Write code to make the test pass • Refactor duplication • Wash, rinse, repeat • Verify the system is still working by running the tests
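The cycle above might look roughly like this in Python's unittest. This is only a sketch: `shipping_cost`, its rates, and the `express` option are hypothetical stand-ins for a piece of legacy code and the feature being added.

```python
import unittest


def shipping_cost(weight_kg, express=False):
    """Hypothetical legacy function; `express` is the behavior being added."""
    cost = 5.0 + 2.0 * weight_kg
    if express:
        cost *= 2  # new behavior, written only after the express test failed
    return cost


class ShippingCostTests(unittest.TestCase):
    def test_existing_behavior_is_preserved(self):
        # Covering test: pins down what the code did before the change.
        self.assertEqual(shipping_cost(10), 25.0)

    def test_express_doubles_the_cost(self):
        # Written first, to expose the behavior being added.
        self.assertEqual(shipping_cost(10, express=True), 50.0)


def run_tests():
    """Run the suite and report whether the system is still working."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShippingCostTests)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Calling `run_tests()` before and after each edit corresponds to the first and last bullets: the same suite brackets every change.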
Feathers’ Legacy Change Algorithm • Michael Feathers discusses a Legacy Code Change Algorithm in Working Effectively with Legacy Code • Five steps • Identify change points • Find test points • Break dependencies • Write tests • Make changes and refactor • These steps share common patterns and scenarios
Patterns for the Change Algorithm • Identify Change Points • One of the key areas where architects and architecture come into play • If you aren’t sure where a change belongs, put it in – you can refactor later (with unit test support)
Patterns for the Change Algorithm • Identify Change Points • Scenarios • I don’t understand the code well enough to change it • Notes / Sketching • Listing Markup • Separate Responsibilities • Understand method structure • Extract Methods • Effect Sketch • Scratch Refactoring • Delete Unused Code
Patterns for the Change Algorithm • Identify Change Points • Scenarios • My application has no structure • Tell the story of the system • Naked CRC (Class, Responsibility, and Collaborations) • Conversation Scrutiny
Patterns for the Change Algorithm • Find Test Points • Where can you write tests to exercise the behavior you want to add/change? • Important to have team standards for where unit tests should go
Patterns for the Change Algorithm • Find Test Points • Scenarios • I need to make a change, what methods should I test? • Reason about effects (Effect Sketch) • Reasoning Forward (TDD) • Effect propagation • Effect reasoning • Effect analysis
Patterns for the Change Algorithm • Find Test Points • Scenarios • I need to make many changes in one area – do I have to break all dependencies? • Interception Points • Higher-Level interception points • Pinch Points (encapsulation boundary) • Pinch Point Traps
Patterns for the Change Algorithm • Break Dependencies • Generally the most difficult part of the process • Usually don’t have tests to tell if breaking dependencies will cause problems
Patterns for the Change Algorithm • Break Dependencies • Scenarios • How do I know I’m not breaking anything? • Hyperaware editing • Single-goal editing • Preserve Signatures • Lean on the compiler • Pair Programming (aka Real-Time Code Reviews)
Patterns for the Change Algorithm • Break Dependencies • Scenarios • I can’t get this class into a test harness • Irritating Parameters • Hidden Dependencies • Construction Blob • Irritating Global Dependency • Horrible Include Dependencies • Onion Parameter • Aliased Parameter
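One way to deal with the Hidden Dependencies case is the Parameterize Constructor technique from Feathers' book. The sketch below assumes a hypothetical `InvoiceNotifier` that used to build its own mailer inside the constructor; every class name here is invented for illustration.

```python
class SmtpMailer:
    """Stand-in for a dependency that is awkward to use in a test (hypothetical)."""

    def send(self, to, body):
        raise RuntimeError("would contact a real mail server")


class InvoiceNotifier:
    # Before: __init__ built SmtpMailer() itself, hiding the dependency.
    # After: the collaborator can be passed in; the default keeps the old
    # behavior for existing callers, so their signatures are preserved.
    def __init__(self, mailer=None):
        self._mailer = mailer if mailer is not None else SmtpMailer()

    def notify(self, customer_email, amount):
        self._mailer.send(customer_email, f"Invoice due: {amount:.2f}")


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
```

A test can now construct `InvoiceNotifier(FakeMailer())` and inspect `sent`, while production code keeps calling `InvoiceNotifier()` unchanged.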
Patterns for the Change Algorithm • Break Dependencies • Scenarios • I can’t run this method in a test harness • Hidden Methods • “Helpful” language features • Undetectable Side Effect • Sensing variables • Command/Query Separation
Patterns for the Change Algorithm • Break Dependencies • Scenarios • I need to change a monster method and can’t write tests • Introduce sensing variables • Extract what you know • Break out a method object • Skeletonize Methods • Find Sequences • Extract to the current class first • Extract small pieces • Be prepared to redo extractions
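A sensing variable, the first item above, can be sketched as follows. `OrderProcessor` and its discount rule are hypothetical; the point is only that the extra attribute lets a test observe which branch of a long method ran.

```python
class OrderProcessor:
    """Hypothetical class with a long method whose effects are hard to observe."""

    def __init__(self):
        # Sensing variable: exists only so a test can see which path ran.
        self.discount_applied = False

    def process(self, items, is_member):
        total = 0.0
        for price, quantity in items:
            total += price * quantity
        # ... imagine many more lines of legacy logic here ...
        if is_member and total > 100:
            total *= 0.9
            self.discount_applied = True
        return round(total, 2)
```

Once tests are in place and the discount logic has been extracted into its own method, the sensing variable can be deleted again.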
Patterns for the Change Algorithm • Break Dependencies • Scenarios • It takes forever to make a change • Understanding • Lag Time • Breaking Dependencies • Build Dependencies
Patterns for the Change Algorithm • Write Tests • Tests may be more difficult to write than normal unit tests • May have less-than-ideal scenarios
Patterns for the Change Algorithm • Write Tests • Scenarios • I need to make a change, but don’t know what tests to write • Characterization Tests • Characterizing Classes • Targeted Testing • Writing Characterization Tests • Write tests for the area you’ll be making the change. Write as many as you need to understand the code. • Then write tests for the things you need to change • If converting or moving functionality, write tests to verify the behavior on a case-by-case basis
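A characterization test records what the code does today, not what anyone thinks it should do. In this sketch `format_name` is a hypothetical legacy function, and each expected value was obtained by running the code and copying the result into the assertion.

```python
def format_name(first, last, title=None):
    """Hypothetical legacy function whose exact behavior nobody remembers."""
    name = last.upper() + ", " + first
    if title:
        name = title + " " + name
    return name.strip()


def test_characterize_plain_name():
    # Expected value copied from what the code actually returned.
    assert format_name("Ada", "Lovelace") == "LOVELACE, Ada"


def test_characterize_title():
    assert format_name("Ada", "Lovelace", "Dr.") == "Dr. LOVELACE, Ada"


def test_characterize_empty_first_name():
    # Surprising, but this is current behavior, so the test records it.
    assert format_name("", "Lovelace") == "LOVELACE,"
```

The trailing comma in the last case may well be a bug, but the test pins it down so that any later change to it is a deliberate decision.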
DEMO: Change Algorithm at Work • Step through a common scenario, implementing the tests as we go
Legacy Code isn’t just Code • Most applications aren’t just simple console apps • They deal with many dependencies • File Systems • Registries • Databases • Hardware
Legacy Code isn’t just Code • These dependencies can cause legacy problems of their own • Database schemas • Existing data in the tables • Business logic in the database • No access to development data that mirrors production • In other words, Legacy Data
Legacy Data • So where does this Legacy Data come from? • Flat Files • XML Documents • RDBs • Object DBs • Other DBs • Application Wrappers • Your DB • Many, many sources
Legacy Data • Legacy data produces its own unique set of challenges • Data quality • Data architecture problems • Database design problems • Process-related challenges
Data Quality • Common Data Quality problems http://www.agiledata.org/essays/legacyDatabases.html#DataProblems
Data Architecture Problems • Common Architectural Problems may include: • Applications responsible for data cleansing (instead of DB) • Different database paradigms • Different hardware platforms / storage • Fragmented / Redundant / Inaccessible data sources • Inconsistent semantics • Inflexible architecture • Lack of event notification • No or inefficient security • Varying timeliness of data sources
Design Problems • There may be key design issues with the database • Database encapsulation scheme exists, but it’s difficult to use • Ineffective (or no) naming conventions • Inadequate documentation • Original design goals at odds with current project needs • Inconsistent key strategy • Design goals at odds with data storage (treating relational DBs as object DBs, etc)
Design Problems • Example • Application which presented custom forms to users • Implementers could create custom forms with custom questions and validations • Beautiful OO architecture – Forms had Groups which had Items • Everything was rendered dynamically and could be updated on the fly
Design Problems • Example • The Form, Group, Item and other “objects” were all stored as individual records in one database table • A user in the system had on average 74 forms with an average of 30 questions. With a target of 20,000 users in the database, this would lead to over 50 million rows in the one table. • We identified one stored proc as one of the main culprits. It had something like the following
Design Problems • Example • INSERT INTO @tmpTable SELECT ot.myCol FROM OtherTable ot WHERE ot.bitMask & (144567 | 99435) = 0 • Because the bitwise expression is applied to the column, the predicate could not use an index • This led to a full table scan for one of their most heavily used procs – degrading performance significantly (average page load time of over 7 seconds)
Working with Legacy Data • So how do you deal with legacy data? • Strategies • Avoid it • Develop Error Handling Strategy • Work Iteratively and Incrementally • Prefer Read-Only Legacy Access • Encapsulate Legacy Data Access • Introduce Data Adapters for Simple Data Access • Introduce a staging database for complex access • Adopt Existing Tools
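Several of these strategies (read-only access, encapsulation, a data adapter, an error-handling strategy) can be combined in one small class. The sketch below assumes a hypothetical pipe-delimited legacy export with an id and a name per line; the format, the `Customer` type, and the skip-bad-rows policy are all invented for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


class LegacyCustomerAdapter:
    """Read-only adapter over a hypothetical pipe-delimited legacy export."""

    def __init__(self, stream):
        self._stream = stream

    def customers(self):
        for row in csv.reader(self._stream, delimiter="|"):
            if len(row) != 2 or not row[0].strip().isdigit():
                continue  # error-handling strategy in this sketch: skip bad rows
            yield Customer(int(row[0]), row[1].strip().title())


# Usage: the rest of the application sees clean Customer objects only.
legacy_export = io.StringIO("1|ADA LOVELACE\nbad|row\n2| grace hopper \n")
clean = list(LegacyCustomerAdapter(legacy_export).customers())
```

Keeping the cleansing rules in one adapter means a later schema refactoring touches one class, not every caller.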
Working with Legacy Data • We couldn’t avoid the data – the proc had to be changed • So we developed an incremental five-step plan • Add an IsValidRecord column to the table • Update the column based on the bitmask for each row • Change the proc to use the column instead of the bitmask • Make sure all tests are still passing • Introduce Update and Insert Triggers to automatically populate the column
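The five steps can be rehearsed end to end against a throwaway database. This sketch uses Python's built-in sqlite3 module, not the original SQL Server environment, so the trigger syntax differs from what the team would have written; the table contents, the `id` column, and the trigger names are assumptions made for illustration.

```python
import sqlite3

MASK = 144567 | 99435  # same constants as the proc's WHERE clause

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, myCol TEXT, bitMask INTEGER);
    INSERT INTO OtherTable (myCol, bitMask) VALUES ('a', 0), ('b', 1), ('c', 256);
""")

OLD_QUERY = f"SELECT myCol FROM OtherTable WHERE (bitMask & {MASK}) = 0 ORDER BY id"
NEW_QUERY = "SELECT myCol FROM OtherTable WHERE IsValidRecord = 1 ORDER BY id"

before = conn.execute(OLD_QUERY).fetchall()  # baseline for the regression check

# Step 1: add the column.
conn.execute("ALTER TABLE OtherTable ADD COLUMN IsValidRecord INTEGER")
# Step 2: populate it from the bitmask for each existing row.
conn.execute(f"UPDATE OtherTable SET IsValidRecord = ((bitMask & {MASK}) = 0)")
# Steps 3 and 4: the query reads the column; results must match the baseline.
assert conn.execute(NEW_QUERY).fetchall() == before
# Step 5: triggers keep the column current for inserted and updated rows.
conn.executescript(f"""
    CREATE TRIGGER trg_valid_ins AFTER INSERT ON OtherTable BEGIN
        UPDATE OtherTable SET IsValidRecord = ((NEW.bitMask & {MASK}) = 0)
        WHERE id = NEW.id;
    END;
    CREATE TRIGGER trg_valid_upd AFTER UPDATE OF bitMask ON OtherTable BEGIN
        UPDATE OtherTable SET IsValidRecord = ((NEW.bitMask & {MASK}) = 0)
        WHERE id = NEW.id;
    END;
""")
conn.execute("INSERT INTO OtherTable (myCol, bitMask) VALUES ('d', 0)")
conn.execute("UPDATE OtherTable SET bitMask = 1 WHERE myCol = 'a'")
after = conn.execute(NEW_QUERY).fetchall()
```

Comparing the old and new queries after every step is the regression test from the plan: the application keeps seeing the same rows while the storage underneath changes.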
Working with Legacy Data • Advantages • Required no change to application code • We could rapidly test the application • We could make incremental changes to see improvements • What made it work • Testing/QA Database with production-like data • Regression tests to ensure functionality • Timing tests to show performance improvement
Process Problems • All the issues aren’t technical • Working with legacy data when you don’t have to • Data design drives your object model • Legacy data issues overshadow everything else • App developers ignore legacy issues • You choose not to refactor the legacy data sources • Politics • You are too focused on the data to see the software
Refactoring Databases • Databases should not be left out of the refactoring process • “An interesting observation is that when you take a big design up front (BDUF) approach to development where your database schema is created early in the life of your project you are effectively inflicting a legacy schema on yourself. Don’t do this.” • Scott Ambler maintains a catalog of database refactorings • How do you refactor a database?
Refactoring Databases • Implementing Database Refactoring in your organization • Start simple • Accept that iterative and incremental development is the norm • Accept that there is no magic solution to get you out of your existing mess • Adopt a 100% regression testing policy • Try it
Next Steps • Dealing with legacy code is hard • Integration issues • Code Issues • Political Issues • There are ways out • Important to address pain points first
Next Steps • So where can you go from here? • Working Effectively with Legacy Code by Michael Feathers • Agile Database Techniques by Scott Ambler • Refactoring Databases by Scott Ambler • http://www.agiledata.org • NUnit, JUnit, CppUnit, CppUnitLite, DbFit, FitNesse • http://www.cornetdesign.com