Data migration projects are journeys, so as a starting point we categorise our checklists into three stages: Scoping, Discovery and Delivery.
The first question to answer concerns objectives and scoping: how does a cloud data migration enable data transformation?
Does it deliver:
For example, this line of questioning might yield an output such as: “our fraud detection model would deliver an additional £3m per annum of benefit if we could run it over 24 months of data rather than 6 months”.
Following stage one, we can look in more depth by asking:
For example, “solving for use case A will deliver the requirements for 5 other use cases and allow us to track sales on our eCommerce store within 3 seconds, letting us manage inventory more effectively and saving £5m in working capital”.
Finally, we should make sure to:
Focus on building the end-to-end use case and only build the capabilities required to support it. For example, if the first use case doesn’t involve PII, some of the data governance and security capabilities may not be required initially; you can introduce them only when a later use case needs them.
By following the above strategy, you should be able to achieve the results you need from your data migration. However, it is still worth bearing in mind the three main challenges you may face, and considering whether they apply to you or your migration strategy:
We can take a closer look at these three areas to address how businesses can adopt a more successful mindset when looking to migrate data.
Too often, data migration projects are technology-led activities, in which teams want to experiment with the latest and greatest tools.
Given the rapidly changing technological landscape over the past 10 years, it has been easy for organisations to fall into the trap of migrating multiple times to the latest technologies. For example, they may have first moved from a Data Warehouse to Hadoop on-premises (in itself a time-consuming activity), then from Hadoop to a cloud provider, and then repeatedly between different offerings from that same provider, such as a database on IaaS, then a database on PaaS, and then serverless database options.
Whilst it is perfectly sensible to keep abreast of technologies, it is far more important to keep a finger on the pulse of how the business currently uses technology, what pain points arise from that usage, and how technologies may solve them going forward.
We’ve seen several organisations operate Hadoop-on-IaaS as their target state for a data migration activity. The two most commonly cited reasons for this are that the Hadoop-as-a-Service offerings from cloud providers lack the enterprise-grade security required, or that the organisation wants to be able to port its data platform with ease.
This can be problematic, as it effectively reduces the cloud provider to little more than an outsourced infrastructure provider and ignores the large number of innovations in the public cloud in this very space, such as serverless data pipelines with Dataflow, PB-scale data warehousing with BigQuery, and effectively limitless, scalable storage with Google Cloud Storage.
This might sound a little odd, but many organisations plan out their data migration activities by looking at the breadth of their data, or at least the range of their data platforms, and iteratively working their way through this data backlog. This poses a number of challenges:
Remaining aware of these challenges while following the checklist template can help ensure the success of your data migration.