How to migrate your data in seven steps
Finding a data migration process that balances cost and rapid delivery
Historically, data migration projects have had a tendency to fail. According to Bloor Research, as many as 60 percent do not succeed.* How can you prevent failure when migrating your data? Follow this seven-step process, then download this SAS white paper to learn more.
1) Source system exploration
The first phase of a data migration project is to identify and explore the source data. The most practical route is to group the data (customer names, addresses, product descriptions and so on) according to the target model.
Although the source systems may contain thousands of fields, some might be duplicates or not be applicable to the target system. In this stage, it is critical to identify which data is required and where it is, as well as what data is redundant and not required for the migration.
Conversely, if the initially identified sources do not contain all of the data required for the target model, a gap is identified. In this case, you may have to consolidate data from multiple sources to create a record with the correct set of data to fulfill the requirements of the target.
Using multiple data sources also lets you cross-check records, adding another layer of validation and confidence in your data.
At the end of this phase, you will have identified the source data that will populate the target model. You will also have identified any gaps in the data and, if possible, included extra sources to compensate. Optimally, you will have broken the data down into categories that enable you to work on manageable and, where possible, parallel tasks.
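The gap check at the heart of this phase can be sketched as a set comparison between the fields the target model requires and the fields the identified sources supply. A minimal sketch, with hypothetical system and field names:

```python
# Phase-1 gap check: compare the fields the target model requires
# against the fields the identified source systems supply.
# All system and field names below are hypothetical.

def find_gaps(target_fields, sources):
    """Given a target field list and a dict mapping each source system
    to the fields it provides, return (covered, gaps, redundant)."""
    available = set()
    for fields in sources.values():
        available.update(fields)
    required = set(target_fields)
    covered = required & available    # fields the sources can populate
    gaps = required - available      # need an extra source to fill
    redundant = available - required # in the sources, not migrated
    return covered, gaps, redundant

crm = {"cust_name", "cust_addr", "fax_number"}
erp = {"cust_name", "product_desc"}
target = ["cust_name", "cust_addr", "product_desc", "credit_limit"]

covered, gaps, redundant = find_gaps(target, {"CRM": crm, "ERP": erp})
```

Here `credit_limit` surfaces as a gap (a further source is needed) and `fax_number` as redundant data that need not be migrated.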
2) Data assessment
The next logical phase is to assess the quality of this source data. There is very limited value in migrating data if the new system then fails because of inconsistent, incorrect or duplicate data. To assess the data, we recommend data profiling.
Data profiling is the process of systematically scanning and analyzing the contents of all the columns in tables of interest. Profiling identifies data defects at the table and column level. It is integral to evaluating the conformity of the data and ensuring compliance with the requirements of the target system.
Profiling examines both the actual record values and their metadata. Too many data migration initiatives begin without first examining the quality of the source data. By including data profiling early in the migration process, you reduce the risk of project overruns, delays and outright failure.
Through the use of data profiling, you can:
- Immediately identify whether the data will fit the business purpose.
- Accurately plan the integration strategy by identifying data anomalies up front.
- Successfully integrate the source data using an automated data quality process.
The output of this phase of the project is a thorough understanding of the data quality in the source systems, identification of data issues and a list of defined rules to be built to correct them. You will have identified and defined your data quality rules and mappings from the sources to the target model. At this point you will also have a good idea, at a high level, of the design of the integration processes.
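The column-level scan that profiling performs can be illustrated with a small sketch: for each column, count nulls and distinct values and record value lengths, the kinds of facts a profiling step surfaces before any migration code is written. The data and column names are hypothetical:

```python
# Minimal column-profiling sketch over rows represented as dicts.
# Real profiling tools go much further (patterns, ranges, cross-column
# rules); this only shows the shape of the exercise.

def profile(rows):
    """rows: list of dicts sharing the same keys.
    Returns per-column null counts, cardinality and value lengths."""
    stats = {}
    if not rows:
        return stats
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v not in (None, "")]
        stats[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min_len": min((len(str(v)) for v in non_null), default=0),
            "max_len": max((len(str(v)) for v in non_null), default=0),
        }
    return stats

customers = [
    {"id": "1", "postcode": "AB1 2CD"},
    {"id": "2", "postcode": ""},
    {"id": "3", "postcode": "AB1 2CD"},
]
report = profile(customers)
```

A report like this immediately flags the missing postcode and shows that `id` is a candidate unique key, exactly the defects and rules the next phase turns into cleansing logic.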
3) Migration design
In the migration design phase, the main tasks are to define the technical architecture and the design of the migration processes. In addition, you will define the testing processes and how to transition them to the production system. You will also determine whether there will be a parallel run, a zero-downtime migration, or whether you will simply complete the migration and "decommission" the old system.
In this phase, it’s important to put your plans for the next four steps down on paper. Include timelines, technical details and any other concerns or approval requirements so that the entire project will be documented.
4) Migration build
It is tempting to take a "just enough" development approach to a data migration, on the grounds that the migration will be executed only once and the code will see limited reuse. These assumptions help explain why data migrations are so prone to failure.
A typical sequence to follow when developing a migration is to subset the data and test one category of data at a time (e.g., product or customer). In the case of larger projects, you can develop and test each category in parallel. Testing the migration solution is usually an iterative approach. Start by checking the components individually in small subsets to ensure the mappings, transformations and data cleansing routines are working. Then, increase the data volumes and eventually link all of the components together into a single migration job.
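The iterative subset-then-scale loop described above can be sketched as follows. The pipeline, category and field names are hypothetical; the point is the pattern of validating a small batch before increasing volume:

```python
# Build-phase testing sketch: run the migration pipeline for one
# category at a time, on a small subset first, then on the full volume,
# validating the transformations at each step. All names hypothetical.

def migrate_record(record):
    """Toy transformation: trim and upper-case the name field."""
    return {**record, "name": record["name"].strip().upper()}

def run_category(records, subset_sizes=(2, None)):
    """Run the pipeline on increasing subsets; return the final output.
    None means 'all records'."""
    out = []
    for size in subset_sizes:
        batch = records if size is None else records[:size]
        out = [migrate_record(r) for r in batch]
        # In a real build you would validate mappings, transformations
        # and cleansing routines here before scaling up.
        assert all(r["name"] == r["name"].strip().upper() for r in out)
    return out

customers = [{"name": " alice "}, {"name": "bob"}, {"name": " Carol"}]
migrated = run_category(customers)
```

On a larger project, each category (product, customer, and so on) would get its own `run_category`-style job, developed and tested in parallel and only then linked into a single migration job.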
The output of this phase is a fully tested data migration process that is scalable and reliable and can deliver the migration within the allocated time.
5) Migration execution
After comprehensive testing, the time comes to run the migration. In the majority of cases, the source systems are shut down while the migration executes. To minimize impact, this is likely to occur over a weekend or public holiday. In some cases, where the source applications must run 24/7, 365 days a year, a zero-downtime migration approach may be needed. This requires a tool to provide the initial load processes, plus data synchronization technology to capture any changes that take place during the initial load and then to synchronize the source and target data after the initial load finishes.
6) Transition and validation
At some point after the data has been migrated, decide when to move to the new system and, where appropriate, retire the old one. During the execution phase, audit trails and logs will have been created to verify that all data was correctly migrated and, where relevant, that the correct synchronization was achieved. After reviewing these audit trails and logs, you will be in a position to transition users to the new system. Note that although audit trails and logs are invaluable, it may also be worth profiling the current source and target systems to confirm that they are correctly synchronized.
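A common form of the audit described above is a reconciliation check comparing record counts between source and target. A minimal sketch, with hypothetical table names and figures:

```python
# Post-load reconciliation sketch: compare per-table row counts between
# source and target and report any disagreement. A real audit would
# also compare checksums or sampled values, not just counts.

def reconcile(source_counts, target_counts):
    """Return {table: (source_rows, target_rows)} for every table
    whose source and target row counts disagree."""
    return {
        table: (source_counts[table], target_counts.get(table, 0))
        for table in source_counts
        if source_counts[table] != target_counts.get(table, 0)
    }

src = {"customer": 10_000, "product": 2_500}
tgt = {"customer": 10_000, "product": 2_499}  # one product row missing
mismatches = reconcile(src, tgt)
```

An empty result supports the decision to transition users; any mismatch points at exactly which table needs investigation before cutover.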
In the case of a zero-downtime migration, verification becomes more complex because updates may still be occurring while a synchronization check runs, so there may always be some lag in the synchronization. In this case, tools can verify synchronization by taking into account the delay between source system changes and their replication into the target system.
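The lag-aware check described above can be sketched simply: a target row only counts as out of sync if it differs from the source *and* the source change is older than the allowed replication lag. Timestamps are in seconds and all data is hypothetical:

```python
# Lag-aware synchronization check for a zero-downtime migration.
# A difference younger than max_lag is assumed to still be "in flight"
# through the replication pipeline, not an error.

def out_of_sync(source, target, now, max_lag=60):
    """source/target: dicts mapping key -> (value, last_update_ts).
    Returns keys that differ and whose source change exceeds max_lag."""
    stale = []
    for key, (value, ts) in source.items():
        target_value = target.get(key, (None, None))[0]
        if target_value != value and now - ts > max_lag:
            stale.append(key)  # genuinely missed by replication
    return stale

source = {"c1": ("A", 100), "c2": ("B", 950)}
target = {"c1": ("OLD", 0), "c2": ("OLD", 0)}
stale = out_of_sync(source, target, now=1000, max_lag=60)
```

Here `c1` is reported (its source change is 900 seconds old) while `c2`, changed only 50 seconds ago, is treated as replication still in progress.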
7) System retirement and ongoing improvement
As part of the design process, a system retirement policy will have been created to address the old system. There will also be ongoing data quality enhancements. Remember that because not all source systems will be retired, data quality issues may be identified that need rectifying in several of the source systems. You will also need to manage ongoing improvements and monitor the data quality of the new system.
Before you know it, there will be the inevitable next migration project. At this point, it is well worth identifying existing components that can be reused, as well as possible improvements that can be applied to the next migration solution.
David Barkaway is the Data Integration Solution Manager for the SAS Global Technology Practice.
THE R’S OF DATA MIGRATION
A migration solution must have the following characteristics:
Robust and resilient: manage all aspects of the data extraction, transformation, cleansing, validation and loading into the target – and manage high volumes of data, errors in source and target connections, and disk space and memory problems.
Rapid: execute efficiently and take advantage of existing source or target facilities to enable rapid processing.
Reporting: provide progress indicators during migration and reconcile the completed process.
Recoverable: recover from the point of failure when necessary.
Reusable: reuse components of the migration in other projects, including transformation functions, error handling and data cleansing routines.