460 likes | 804 Views
Data Conversion. Managing the Human Side of a Technical Process. CS474 – Mar 2, 2006. Introduction and Objectives. Primary objectives Help you to understand the process of converting from one database form to another Add some strategies to your toolkit for handling data conversion issues
E N D
Data Conversion Managing the Human Side of a Technical Process CS474 – Mar 2, 2006
Introduction and Objectives • Primary objectives • Help you to understand the process of converting from one database form to another • Add some strategies to your toolkit for handling data conversion issues • Provide you with an appreciation of the challenges of implementing a database system.
Conversion is not magic. What is data conversion? • Conversion is change. • Data conversion is change from one data form or format to another.
Data conversion • Requires thought and planning. • Requires asking some difficult questions • Value • Cost • Effort • Time • Requires work!
Data conversion vs data migration • Migration usually involves moving files/tables from one computer platform to another or one operating system to another. • Migration usually leaves the data intact. For example, if I was storing the name of a school as a 30 character value, it will remain a 30 character value. • Usually, underlying data structure remains the same.
Data conversion • Data conversion involves changing some of those underlying values. • Data conversion may pull apart or draw together data stored in different data files or different tables within the same database. • Data conversion may involve substantial changes to the way the data looks and behaves • Data conversion is involved when moving from one database application to another or one application system to another.
Data versus Information • Data is the raw values stored on the computer. • Information has meaning. • A programmer can write a program to move data. • An knowledgeable user must guide the programmer to insure that the data is moved in a way that has meaning.
Consider the datum – 3.172 • What is it? • In an academic setting it might be a GPA. • In weather, it might be a temperature. • In physics, it might be a measurement. • Data is simply a value…Information gives that value meaning.
Conversion • Involves manipulating data in one system and trying to preserves its information value in another.
Exercise • Do a manual data conversion.
What do you need to design the conversion? • Legacy (or source) data. • New (or target) database. • What else?
Tasks • Review the legacy data. Look at each of the samples and look for similarities but also differences in the data. • Begin with the setup tables. Assume that where appropriate we have foreign key constraints on the data. • Then fill in the other tables. Id and StudentId are synonyms. • Key fields are underlined. Notice the only surrogate key is the id number. • Ask questions where you are not sure of the semantic (or meaning) of the data.
What do you need for design? • Legacy (or source) data. • New (or target) database. • Data dictionary for both systems. • Knowledgeable user on the legacy system. • System support people on the legacy system.
Challenges you encountered. • Data in several different format. • Different values for the same item. • Data “forced” because it didn’t fit. • We had description, but what codes to use? • Or maybe we had codes, but what descriptions to use?
Data Conversion is a Cooperative Effort • The vendor who is charged with carrying out the conversion brings in a thorough knowledge of the new software system and its data requirements. • The user community brings in a (sometimes) thorough knowledge of what data they capture and how they use it.
Vendor goals in a conversion • Control size of project • Keep the process profitable • Keep the customer satisfied
Organization goals in a conversion • Keep costs low • Have minimal impact on the organization • Come out with a much better system • Have all the needed data in place
Conversion truism The very best job the conversion specialist can do is the very minimum that the customer expects.
A generic organization chart – Administrative Staff Executive Staff President/VP level Project Manager Departmental Directors – functional offices Departmental Directors – IT Support Staff Office Managers Clerical Staff Key User Key IT staffers
Executive Staff • What do they want out of the project? • $$$$ • Service/effectiveness increases • Must authorize the expenditures for the major projects. • Usually are directly involved in the purchase decision.
Director Level • Need more detailed information • Want little disruption in operation of their areas • Will be held to their budgets • Must authorize overtime • Usually involved in the purchase decision
Office manager • Usually is very close to the data. • Bridge between money people and “line staff” • Concerned about getting the day to day work done. • May not be involved in the purchase decision
Clerical staff • Limited resources. • Limited time. • WIIFM • Typically not involved in the purchase decision
How do we get them all to work together • Initial presentation (Conversion and project) • All are invited to attend • Includes president on down to individual end user • All hear the same message(s) • Schedule time for each group individually • Provide attention • Listen to their needs • Provide some solutions • Follow-up with written reports – There is no question later on about who heard what.
Data conversion provides a unique opportunity • To make data more useful • consistent code use • cross department data sharing • To clean house • remove data that you are not using • remove inconsistent data • change values which are undesirable (i.e.. random number vs. SSN as id’s).
Message • Cooperation among departments is necessary • Loss of data is okay if it is planned, and leads to a good outcome for the school.
Types of conversion • Manual (Time and effort) • Records are re-keyed into the system • Mechanical ($$) • Programs move and manipulate data • No conversion • Conscious decision • Typical (balance) • All three in some part
Message • In some cases you will need to re-key information. (Workload effect) • In some cases, you may decide to pay someone to write programs. (Cost effect) • And in some cases not converting some data is the best choice. • These are all possible choices to handle the needs across the institution.
Who does conversion? • You • Hired temps • 3rd party programmers • Combination • MESSAGE – There will be work involved of the line staff, the people that know the data and how it is used.
Some important terms • Legacy system – The old system or systems which are being converted. May be called “source” data. • New system is called the “target”. • Mapping – Process of developing the specification for the conversion. The Design • Translation – Process of changing one value to a new value on the basis of a cross referencing table.
Determine scope – under organization control Mapping (specification) – roadmap of the process Programming (implementation) – done off-site First full pass – validation activities involve the entire staff Validation and correction – its okay to find errors Final “go-live” run – we also talk about downtime, again so that all players know the ground rules Follow-up support Conversion life cycle
Show the places in the process where the organization must plan resources. Remind people of “down-time”. Sometimes go-live is not planned around all of the offices activity. Primary message is work! Plan for people to be needed to perform tasks periodically in the process. Message
How do we decide what to convert? • Cost effectiveness • Data integrity • Cross institutional consistency and concerns
Cost effectiveness • How many records are in each file/table? • Do you really use all of your data? • Do you really need all of your data? • How much time/money are you willing to invest? $$$ Scope Time
This message goes a long way • primarily with the executive and director level. • puts the customer in control • each person who is concerned begins to understand that they will have a say in how things are done • each person also begins to realize that they may have to trade off some data for value in the process
Do you havedata integrity? • Standards for names, addresses, use of id numbers, etc.? • Standards enforced? • Cross-institutional consistency? • Major codes • School codes • Prefix/suffix • System enforced consistency (codes, foreign keys).
What can we do about problems? • Data removal – Identify the bad values and do not bring those over. • Data combination and data translation – Use translation tables (old value to new value mapping) to take differing values and bring them under one approved set of values. • Example, multiple major codes or free format values. • Different uses of prefixes in different offices.
In Summary • To co-opt the user community – invite all to become involved. • Provide some consistent simple messages that they can hold on to throughout the process. Reinforce those messages • Provide good rationale for why the hard decisions need to be made (provide instruction and ammunition for the arguments). • Put the responsibility for decisions into the hands of the organization.
This is not being done TO them but WITH them.
Technical solutions to some problems • Mixture of values – translation tables. • Example, course names – put out all of the variants and let users choose. • Duplicate rows – algorithm for determining potential duplicates and combining their information. • Columns used for multiple purposes – translation tables • Setup data inconsistent in tables – usually we do setup as a manual conversion and then build translations to fit.
Go live process considerations • Criticality of the system • Normal “downtime” of the organization • Plan for the worst case • What do we do if?..... Backup and recovery • Preparation – confidence level with process time • Phased or all at once?