The Right Way to Migrate Data to the Cloud

By: Morpheus Data

More and more of your organization’s data will reside on public cloud servers, so have a plan in mind for getting it there.

When you’re migrating your company’s data from your premises to the cloud, the standard IT model is to do it halfway: Move some of your data to public cloud services, and leave some of it in the data center. After all, IT has a well-earned reputation for doing things carefully, deliberately, and in stages.

And that’s exactly how your data-migration project will fail … Carefully. Deliberately. In stages.

Since the dawn of computers, people have been moving data between discrete systems. What has changed is the tremendous amount of data being migrated, and the incredible diversity in the type and structure of data elements. We’ve come a long, long way from paper tape and ASCII text. The traditional elements of data migration are evident in SAP’s description of its BusinessObjects migration service (steps 2 and 3 are sketched in code after the list):

  1. Purge and cleanse data.

  2. Devise mapping/conversion rules.

  3. Apply rules to extract and load data.

  4. Adjust rules and programs as testing dictates.

  5. Load test using a subset of data selected manually.

  6. Test and validate using a large amount of automatically extracted data.

  7. Load all data into the “acceptance system.”

  8. Load all data into the pre-production system.

  9. Validate the converted data, and get sign-off from users/owners.

  10. Load all data into the production system, and get final sign-off from all stakeholders.
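To make steps 2 and 3 concrete, here is a minimal Python sketch of what a set of mapping/conversion rules and the code that applies them might look like. The field names, formats, and rules are hypothetical illustrations, not part of SAP’s service:

    # Step 2: mapping/conversion rules -- source field -> (target field, converter).
    # All field names and formats here are hypothetical.
    from datetime import datetime

    RULES = {
        "CUST_NO":   ("customer_id", lambda v: v.strip().zfill(10)),
        "CUST_NAME": ("name",        lambda v: v.strip().title()),
        "OPEN_DT":   ("opened_on",   lambda v: datetime.strptime(v, "%d.%m.%Y").date().isoformat()),
    }

    # Step 3: apply the rules to each extracted record.
    def apply_rules(source_record):
        return {dst: convert(source_record[src])
                for src, (dst, convert) in RULES.items()}

    legacy = {"CUST_NO": "4711", "CUST_NAME": "  ACME corp ", "OPEN_DT": "03.01.2015"}
    print(apply_rules(legacy))
    # {'customer_id': '0000004711', 'name': 'Acme Corp', 'opened_on': '2015-01-03'}

Keeping the rules as data rather than burying them in program logic makes step 4 (adjusting rules as testing dictates) a matter of editing a table instead of rewriting code.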

Avoid the ‘Code, Load, Explode’ cycle of expensive migration failures

Cloud migration smashes the old-style iterative conversion process to bits. The reason is simple: The frequency of data migrations is increasing, and costs and migration failure rates are rising just as fast.

A May 5, 2015, article by Andre Nieuwendam on PropertyCasualty360 describes the “Code, Load, Explode” cycle of data-migration and extract-transform-load (ETL) projects: You develop the programming logic and run a test load; it fails; you adjust your data assumptions and retest; it fails again; and so on. The problem with the ETL approach is that it can’t accommodate the almost-infinite number of data variables the migration is likely to encounter.
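One common defense against the “explode” step is to validate each record during the transform and quarantine the variants your rules don’t yet cover, rather than letting a single bad record abort the entire load. A minimal sketch, with a hypothetical record layout:

    # Quarantine unexpected records instead of aborting the load.
    # Record layout and validation rules are hypothetical.
    def transform(record):
        amount = float(record["amount"])          # fails on "12,50", "N/A", ...
        if record["currency"] not in {"USD", "EUR"}:
            raise ValueError("unknown currency: %r" % record["currency"])
        return {"amount": amount, "currency": record["currency"]}

    def run_load(records):
        loaded, rejected = [], []
        for rec in records:
            try:
                loaded.append(transform(rec))
            except (KeyError, ValueError) as exc:
                # Keep the bad record and the reason, so the mapping rules
                # can be adjusted without re-running the whole migration.
                rejected.append({"record": rec, "error": str(exc)})
        return loaded, rejected

    loaded, rejected = run_load([
        {"amount": "19.99", "currency": "USD"},
        {"amount": "12,50", "currency": "EUR"},   # locale-formatted: quarantined
    ])
    print(len(loaded), "loaded;", len(rejected), "rejected")   # 1 loaded; 1 rejected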

[Figure: The ETL process for migrating data from source databases to JSON files. Source: Code Project]
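As a concrete (and hypothetical) illustration of that pipeline, the extract step can be as simple as reading rows from the source database and serializing each one as a JSON object:

    # Minimal sketch of the extract step: source database rows -> JSON lines.
    # The database file, table, and columns are hypothetical.
    import json
    import sqlite3

    conn = sqlite3.connect("source.db")
    conn.row_factory = sqlite3.Row            # rows become dict-like

    with open("customers.json", "w") as out:
        for row in conn.execute("SELECT customer_id, name, opened_on FROM customers"):
            out.write(json.dumps(dict(row)) + "\n")   # one JSON object per line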

Even with the increasing prevalence of unstructured data, migrations deal more with records, or groups of related data elements, than with the discrete elements themselves. Many migration problems are caused by identical data elements that have very different meanings depending on the unique context of each record.
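For instance, the same status code can mean one thing in a policy record and something else in a claim record, so the conversion logic has to dispatch on the record’s type. A hypothetical sketch:

    # The same "status" element means different things per record type.
    # Record types and codes are hypothetical.
    STATUS_BY_TYPE = {
        "policy": {"A": "active", "C": "cancelled"},
        "claim":  {"O": "open",   "C": "closed"},
    }

    def convert_status(record):
        return STATUS_BY_TYPE[record["type"]][record["status"]]

    print(convert_status({"type": "policy", "status": "C"}))  # cancelled
    print(convert_status({"type": "claim",  "status": "C"}))  # closed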

Say good-bye to what you know and hello to what you don’t

The growing complexity of data migration is compounded by a lack of easy-to-use migration tools. As Computerworld’s Ben Kepes explains in an October 7, 2015, article, building a public cloud infrastructure bears little resemblance to devising a data-center architecture. The quintessential example is Netflix, whose public cloud is based on massive redundancies, planning for failure, agile development, and “nimble” monitoring and management. Kepes points out that these attributes are sorely lacking in the typical in-house network architecture.

Netflix is far from the only cloud-migration success story, however. In an October 8, 2015, post, Data Center Knowledge’s Yevgeniy Sverdlik describes successful cloud-migration projects at General Electric and Capital One. One thing the two companies have in common is that they both have to collect data from millions of widely distributed end points. This demonstrates a principal benefit of a public-cloud architecture over in-house IT: Your data lives closer to your users/clients/customers.

[Figure: Two big advantages of a public-cloud infrastructure are elasticity (you pay for only the resources you use) and faster access to your data. Source: AutomationDirect.com]

In GE’s case, the network end points are wind turbines, aircraft engines, and manufacturing equipment of every description. During the company’s three-year migration to a cloud infrastructure, it will reduce the number of data centers it maintains from 34 to only four, all of which will be used to house GE’s most sensitive, valuable data. The data from the other centers will be migrated to Amazon Web Services.

Sverdlik quotes IDC researcher Richard Villars speaking at the recent AWS re:Invent conference in Las Vegas: “People just want to get out of the data center business.” This fact is evident in Capital One’s plans to reduce its data-center count from eight in 2014 to five in 2016 and three in 2018. Two principal benefits of migrating data and apps to the cloud are faster software deployments, and the elasticity to increase or decrease usage on demand (such as for Cyber Monday).
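On AWS, for example, that elasticity can be exercised with a couple of API calls. A minimal sketch using boto3, where the Auto Scaling group name and capacity numbers are hypothetical and the group itself is assumed to already exist:

    # Scale a fleet out before a demand spike and back in afterwards.
    # Group name and capacities are hypothetical; AWS credentials assumed.
    import boto3

    autoscaling = boto3.client("autoscaling")

    # Scale out ahead of the rush ...
    autoscaling.set_desired_capacity(AutoScalingGroupName="web-fleet",
                                     DesiredCapacity=20)

    # ... then scale back in, so you stop paying for idle servers.
    autoscaling.set_desired_capacity(AutoScalingGroupName="web-fleet",
                                     DesiredCapacity=4)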