Learn the truth about data migration and the challenges involved in starting new IT processes from scratch. Understand the importance of preserving and transferring data accurately.
Where’s My Data? Application Migration & Integration CPTE 440 John Beckett
The “Green Field” Myth • Myth: There is no difference between starting a new IT process from scratch and migrating to a new process. • Truth: Migrating is often more difficult, because you must carry forward your data. • Truth: Managers actually know this about buildings, but fail to apply it to data.
For example: • Install new email client. • Oops: All your saved mail is lost. • Can’t convert, find, or move folders for some reason. • Oops: Your address book is gone. • Same problem • And this is an easy task! • If you put it into your work flow • Imagine a real-life scenario with multiple databases.
What Applications Need • Processing Power • Libraries & APIs • Read access to stored reference data • Write access to stored output data • Interaction with users • Maintenance
Processing Power Roughly in chronological order • Exclusive control of a computer • Control moderated by a “Monitor” or “OS” • OS time-slices between apps and users • May offload some chores to intelligent I/O or GPU • Virtualize multiple OSs in a single computer • Often includes multiple “Cores” or CPUs • Container system (e.g. Docker) Moore’s Law makes it possible. IoT is an alternate migration path.
Libraries & APIs Roughly in chronological order • Every instruction needed was contained within the executable code • System Intrinsics provide access to protected devices and file system to arbitrate between apps • Additional packages/APIs could be added to the system to host a given application’s needs • Virtualization protected apps from conflicting requirements • Containers provide another protection mechanism – providing OS-facing side and app-facing side
Read Access to Reference Data Roughly in chronological order • Apps accessed devices directly (cyl/sector) • OS mediates access to make it easier and less dangerous • “File System” concept arrives • Database provides a meta-data layer & security • File system extended to include sources outside the box • IoT/Services mentality extends further
Write Access to Stored Output Data • Parallels Read access • We’ll discuss some important differences later
Interaction With Users Roughly in chronological order • Bring your typed and hand-written data • Submit your punched cards • Use the terminal we’ve provided • Let your PC emulate a terminal • Run a Distributed Systems client on your PC • Move to Web/http technologies
Maintenance We’ll talk about this in CPTE 433
Solving the Email Client Issue • Find the former assets • Look in c:/Users/username/AppData • Probably the “roaming” profile • Look for the app’s name • Is the mailbox format compatible? • Perhaps you can import • Or…Copy to the new location • A messy process – but experience helps
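The "find the former assets" step can be sketched in Python. This is only a minimal sketch under assumptions: the function name `find_app_data` is hypothetical, and the profile folder and application name are supplied by the caller rather than known in advance.

```python
import os
from pathlib import Path

def find_app_data(profile_root, app_name):
    """Return folders under profile_root whose name contains app_name.

    The match is case-insensitive. Folders that cannot be read are
    skipped, because os.walk ignores listing errors by default.
    """
    needle = app_name.lower()
    matches = []
    for dirpath, dirnames, _filenames in os.walk(profile_root):
        for name in dirnames:
            if needle in name.lower():
                matches.append(Path(dirpath) / name)
    return sorted(matches)
```

A call such as `find_app_data("c:/Users/username/AppData", "OldMail")`, where `OldMail` stands in for the old client's name, lists candidate folders to inspect. Whether the mailbox format found there can be imported or copied still has to be checked by hand.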
The Bad News • You probably aren’t going to do anything “green field” unless you are starting a new company. • Even then, you’ll probably start small and do a larger-scale version later. • So you’ll almost always be migrating data at implementation time.
Data Migration Approaches: It’s all in the Timing • Static • Hand Entry • OCR • Semi-Dynamic • Phased Roll • Process-Based • Live • Screen Scraping • EDI/EAI Connection
Migration Auditing: How Do You Know It’s Right? • Totals • Number of transactions, amount of money • Codas • Identifiable batches that you can compare • May have to be time periods • Statistical Methods • Are we seeing reasonable ratios (inc/expense, etc) • Select sample transactions at random, follow the full path
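The three audit ideas above (totals, comparable batches, random samples) can be sketched in Python. This is a sketch under assumptions, not a prescribed procedure: each transaction is assumed to be a dict with hypothetical `amount` and `period` fields, and the function names are made up for illustration.

```python
import random
from decimal import Decimal

def audit_totals(old_rows, new_rows):
    """Compare transaction count and money total between the two systems."""
    old_total = sum((Decimal(r["amount"]) for r in old_rows), Decimal("0"))
    new_total = sum((Decimal(r["amount"]) for r in new_rows), Decimal("0"))
    return {
        "count_matches": len(old_rows) == len(new_rows),
        "amount_matches": old_total == new_total,
        "old_total": old_total,
        "new_total": new_total,
    }

def audit_batches(old_rows, new_rows, key="period"):
    """Return the batch keys (e.g. time periods) whose totals differ."""
    def totals(rows):
        out = {}
        for r in rows:
            out[r[key]] = out.get(r[key], Decimal("0")) + Decimal(r["amount"])
        return out
    old_t, new_t = totals(old_rows), totals(new_rows)
    return sorted(k for k in old_t.keys() | new_t.keys()
                  if old_t.get(k) != new_t.get(k))

def pick_sample(rows, k, seed=None):
    """Pick up to k transactions at random to trace by hand."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))
```

Amounts are held as `Decimal` so that money totals compare exactly. The batch check narrows a mismatch in the grand total down to the period where it occurred, and the sampled transactions are the ones to follow through the full path manually.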
Method: Hand Entry • Read from paper, enter into keyboard • No electronic connection to old system needed. • Cleanup of old data done at data entry. • Appears to be simplest for computer people. • What about data entry error cleanup? • May be lots of work for clients. • Maybe not as much as you think • Default if other methods fail. • Probably used for some components anyhow. • How are you going to “prove” it is correct?
Hand Entry Choice Factors When to use • Small amount of data • Data is stable during conversion period • No other way to do it • Conceptually difficult to map former data onto new system • Perhaps the way the new system works is why you are changing! When not to use • Lots of data • Data changes during conversion period • Quality typing unavailable
Method: OCR • Optical Character Recognition • Depends on clear printouts • Extremely difficult to detect errors since they don’t come in any predictable pattern • Beckett’s take: Consider this a variant of hand entry requiring extensive error checking
Method: Phased Roll • “Roll” data from old system to new system on a per-component basis • For example: This week we’ll “roll” Payroll. Next month we’ll “roll” Accounts Payable. • Requires building a roll process for each set of data • Inter-Relationships can be complex • Key issue is which goes first and what data must be folded back to original system meanwhile
Campus Shop Roll (circa 1977) • Administrative system was being copied to new system every night • Campus Shop system was ready to go on new system (and old was unusable) • “Your ID card will work at the Campus Shop tomorrow” • Individual could be entered manually • Campus shop transactions copied back to old system at end-of-month statement time This is actually a very simple example. Real life is usually far more complex.
Phased Roll Choice Factors When to use • Data and data relationships are stable during conversion period • People could tolerate the ID card delay • Data is loosely coupled between systems • Only one file needed to come back from Campus Shop, monthly When not to use • Data relationships change during conversion period • Data is tightly coupled between systems • E.g., if the Campus Shop data needed to be available on the old system up-to-the-minute
Method: Iterative • Like phased roll, except that: • A complete conversion is done repeatedly (daily?) • New systems are available only for read-access until all systems are complete • This is a Beckett-designed approach. Life is rarely so simple that you can do it this way – but I’ve done it twice
Iterative Method Choice Factors When to use • Large amount of data • Especially if large part is foundational to others (e.g. Academic Records) • Codas available • You are needing to do performance balancing • Data is not stable during conversion period • Need to show early success When not to use • Conversion matrix is too complex • No specific foundational data • Time required for conversion run is long • Extensive hand-manipulation is necessary
Method: Screen-Scraping • “Live” process emulates old workstations, captures data. • Usually requires some sort of “middle-ware” product • Requires former method to be character-mode (less likely in the future) • You could, however, use an HTML Mash-up
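The data-capture half of screen scraping can be sketched in Python. This sketch assumes the terminal emulation or middleware has already delivered the character-mode screen as a list of text lines; the field layout and the names `read_field` and `scrape_screen` are hypothetical.

```python
def read_field(screen_lines, row, col, width):
    """Return the text at a fixed screen position, stripped of padding.

    row and col are 0-based. A row missing from the screen yields an
    empty string rather than an error.
    """
    if row >= len(screen_lines):
        return ""
    return screen_lines[row][col:col + width].strip()

def scrape_screen(screen_lines, layout):
    """Map field names to values using a {name: (row, col, width)} layout."""
    return {name: read_field(screen_lines, *pos)
            for name, pos in layout.items()}

# Illustrative captured screen and layout (both invented for this sketch).
screen = [
    "CUSTOMER INQUIRY",
    "ID: 00042   NAME: SMITH, J.       ",
    "BALANCE:    125.50",
]
layout = {"id": (1, 4, 5), "name": (1, 18, 15), "balance": (2, 9, 12)}
record = scrape_screen(screen, layout)
```

Because fields are located purely by position, any change to the old screen layout silently yields wrong values, which is one reason this method suits a frozen former system.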
Screen-Scraping Choice Factors When to use • No alternative access is available • Former system is frozen • Occasional errors are: • Detectable • Correctable • Acceptable if detected & corrected When not to use • Direct access to data is available • Former system is under development • Zero tolerance for data errors • Problem: No feedback • Any time you have a better alternative
Method: Hand-Coded Interface • Write a program for each connection between source and destination • Or add module into existing program • Have the program invoked either continuously or periodically to transport across
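A hand-coded interface of this kind can be sketched in Python with the standard `sqlite3` module. Everything specific here is an assumption for illustration: the table names `old_customers` and `customers`, their columns, and the name-joining rule are invented, and a real source and destination would rarely both be SQLite.

```python
import sqlite3

def transfer_new_rows(src, dst):
    """Copy rows not yet transferred from the source to the destination.

    Returns the number of rows copied. It is safe to call repeatedly:
    only rows with an id above the highest id already in the
    destination are moved.
    """
    dst.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, full_name TEXT)"
    )
    last_id = dst.execute(
        "SELECT COALESCE(MAX(id), 0) FROM customers"
    ).fetchone()[0]
    rows = src.execute(
        "SELECT id, first, last FROM old_customers WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    with dst:  # commit on success, roll back on error
        dst.executemany(
            "INSERT INTO customers (id, full_name) VALUES (?, ?)",
            [(rid, f"{first} {last}") for rid, first, last in rows],
        )
    return len(rows)
```

Running `transfer_new_rows` from a scheduler gives the periodic invocation; calling it in a loop with a short sleep approximates the continuous one. Note that this sketch only carries new rows across. Changes to rows already transferred would need a separate mechanism, such as a modification timestamp.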
Hand-Coded Choice Factors When to use • Very few interfaces • You can achieve the “liveliness” needed When not to use • Many interfaces are required
Method: EAI Enterprise Application Integration • “Middleware” product used to keep current data on both sides. • Use EAI language (or modules you program) to define interfaces
EAI Choice Factors When to use • Integrating with system under development • You need live data • Bandwidth is available • Interfaces work When not to use • Bandwidth and/or server overhead is intolerable • “Snapshot” view is preferred • Extreme security concerns about updating
Myths of EAI • Myth: • (N(N-1))/2 interfaces required for hand coding. • Only N interfaces required for EAI • Truth: • You don’t have all the modules sending data to each other. (e.g. Grades to Billing) • EAI doesn’t excuse you from designing all your interfaces. • But…EAI systematizes the process, and that is a very good thing – probably as important as the magic that was promised!
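The arithmetic behind the myth is easy to check in Python. This is a small sketch; the function names and the list of module pairs are illustrative assumptions, not figures from any real installation.

```python
def point_to_point_interfaces(n):
    """Interfaces needed if every one of n modules talks to every other."""
    return n * (n - 1) // 2

def hub_interfaces(n):
    """Interfaces claimed for EAI: one per module, to the middleware."""
    return n

def interfaces_actually_needed(pairs):
    """Count distinct module pairs that really exchange data.

    The order within a pair is ignored, so ("A", "B") and ("B", "A")
    count as one interface.
    """
    return len({frozenset(p) for p in pairs})

# Hypothetical site with 6 modules, only some of which exchange data.
real_pairs = [
    ("Payroll", "General Ledger"),
    ("Accounts Payable", "General Ledger"),
    ("Academic Records", "Grades"),
    ("Academic Records", "Billing"),
]
worst_case = point_to_point_interfaces(6)
claimed = hub_interfaces(6)
needed = interfaces_actually_needed(real_pairs)
```

With six modules the worst case is 15 hand-coded interfaces against a claimed 6 for EAI, but in this made-up example only 4 pairs exchange data at all, so the gap the myth advertises never materializes. Each of those 4 still has to be designed either way.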
HTML Mash-Up • Write server-side code that reads a Web site • Parse the site to get the data you want • Present it in a useful manner to users
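The read-and-parse steps can be sketched with Python's standard library. This is a minimal sketch assuming the wanted data sits in plain `<td>` table cells; the class and function names are hypothetical, and a real source site may need a more tolerant parser.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells (e.g. headers)
                self.rows.append(self._row)
            self._row = None
            self._cell = None

def parse_rows(html_text):
    """Return the table data in html_text as a list of rows of strings."""
    parser = CellCollector()
    parser.feed(html_text)
    parser.close()
    return parser.rows

def fetch_rows(url):
    """Read a page from the source site and parse its table rows."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return parse_rows(resp.read().decode(charset, errors="replace"))
```

Server-side code would call `fetch_rows` with the source site's address and then present the rows to users. If the source page is redesigned, `parse_rows` simply returns an empty list, so the caller should treat an unexpectedly empty result as a failure to report.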
Choice Factors for Mash-Up Yes • The source data site is stable • The mash-up is not more complex than you care to maintain • Downtime is tolerable if the source changes • Until you fix the problem, because you have no warning No • The source data site changes often • You have no further resources to spend on the project • Downtime is intolerable If a Mash-Up smells like screen scraping, you’re right!
What’s the Best Way? • It’s a designer’s choice: • Meet the needs • Accept the risks • Accept the costs
The Good News • This is in the middle of a CSA/CIS person’s expertise. • Modest programming skills will help. • That’s what we mean by “scripting” • Success at data migration is a key to moving up in the organization. • It’s a bullet point you can add to your résumé.
Case Study: Hazel & Co. “Because you have other things to do” • Maid service • Began with three maids working from one location • Now has 500 locations with 3,000 maids • 200 locations company-owned • 300 locations franchised
Business Challenges • Quality service from minimum-wage people • Connect with external sources of info • Arrest records • Payroll data • Out-source (or in-source by our co. for franchisees)
History • Ran out of a checkbook register • Excel • Quickbooks • Got a payroll system • Got a General Ledger/Accts Pay system • Wrote conversion system to grab data from PR, GL, AP -> put into Quickbooks
• Wrote a Web-based system that analyzes current info from PR, GL, AP – makes it look like Quickbooks • Requests for more / better info: • Write add’l modules for new system that gives that information • Write modules that create Excel sheets with desired analyses