6 things to consider when migrating your data warehouse from Netezza
IBM recently announced that it was ending support for IBM PureData for Analytics, its name for Netezza, forcing clients to migrate onto a different data warehouse solution. Migrating between data warehouses is a potentially complicated business.
Here are our top six tips for migrating off Netezza, or any data warehouse, and onto a new solution such as our analytics database:
1. Migrating data warehouses is a process of two halves – data and code. And you need to carefully consider both.
On one hand, you have to migrate the data in the database with a laser focus on data integrity. On the other, you need to migrate the codebase, which can be tricky if the database makes significant use of procedural languages. This is because, unlike SQL, database procedural languages have no common standard. Even where there is a shared ancestry, as with Netezza’s stored procedure language, NZPLSQL, which is derived from PostgreSQL’s PL/pgSQL, there are differences and extensions that would make a lift-and-shift migration fail. Which leads us onto testing.
2. Test, test, test before you proceed. Prepare a run-book for the migration.
It’s important to test every interaction and touch-point that external applications have with the database before migrating. The most robust way to do this is to build a QA or development server and test your applications against it. Factor in some load testing too, so you can be confident there won’t be any surprises when you move into production.
Once testing is complete and the process is understood, ensure there’s a full run-book for the migration, and that each stage of the process is documented, logged, and auditable.
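As a rough illustration of what a logged, auditable run-book could look like in code, here is a minimal Python sketch. The names Step and RunBook are hypothetical and not part of any migration tool. The sketch assumes each stage can be wrapped in a plain function, and that the migration should stop at the first failed stage.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List

logger = logging.getLogger("migration")


@dataclass
class Step:
    """One documented stage of the run-book (hypothetical structure)."""
    name: str
    action: Callable[[], None]


@dataclass
class RunBook:
    steps: List[Step]
    audit_log: List[dict] = field(default_factory=list)

    def run(self) -> bool:
        """Run the steps in order, recording each outcome.

        Stops at the first failure and returns False; returns True
        only if every step completed.
        """
        for number, step in enumerate(self.steps, start=1):
            entry = {
                "step": number,
                "name": step.name,
                "started": datetime.now(timezone.utc).isoformat(),
            }
            try:
                step.action()
                entry["status"] = "ok"
            except Exception as exc:  # record the failure, then stop
                entry["status"] = "failed"
                entry["error"] = str(exc)
            entry["finished"] = datetime.now(timezone.utc).isoformat()
            self.audit_log.append(entry)
            logger.info("step %d (%s): %s", number, step.name, entry["status"])
            if entry["status"] == "failed":
                return False
        return True
```

After a run, audit_log holds one timestamped record per attempted stage, which could be written to a file or a table as the audit trail.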
3. Anticipate potential reasons for failure.
By anticipating failures, you can take steps to mitigate them. Failures could happen for a number of reasons: there could be problems with the data transfer – the copy process could fail, the server could crash, the target storage device could become unreachable, or data could be corrupted during the migration. This could result in a partial copy, or data integrity issues.
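One way to catch a partial copy or corrupted rows is to compare a row count and a checksum for each table on both sides once the transfer finishes. The sketch below is illustrative only. It uses Python’s built-in sqlite3 module as a stand-in for the real source and target connections, and the function names table_fingerprint and verify_copy are hypothetical. It also assumes both drivers return values with identical Python representations, which will not always hold between two different database products.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) for a table, read in key order."""
    digest = hashlib.sha256()
    count = 0
    # Identifiers are interpolated into the SQL, so they must come from
    # a trusted list of table names, never from user input.
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"')
    for row in cursor:
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def verify_copy(source, target, table, key_column):
    """True if both sides hold the same rows in the same key order."""
    return (table_fingerprint(source, table, key_column)
            == table_fingerprint(target, table, key_column))


if __name__ == "__main__":
    # Two in-memory databases stand in for the old and new warehouses.
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn in (source, target):
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)",
                         [(1, 9.5), (2, 20.0)])
    print(verify_copy(source, target, "orders", "id"))
```

On very large tables a full scan like this may be too slow, in which case the same comparison could be run per partition or per key range.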
If you’ve robustly tested the new codebase, you shouldn’t have too many problems there. But developers should be on standby in case a new bug crops up.
Finally, you should run the migration when it’s most likely to succeed and least likely to cause disruption if there is a problem. Running the data migration on Monday morning, when the network is at maximum capacity and reports are at their most urgent, probably isn’t the best idea. Find a quiet period outside business hours, such as the weekend, to give it the best chance of success, but double-check that any maintenance jobs that might interfere with the migration are disabled or delayed.
4. You don’t have to do it all at once.
Sometimes it’s a better solution to migrate the data gradually, a database or a table at a time, and run the two systems in parallel. You don’t have to lift-and-shift the whole data warehouse all at once, and you may find it’s lower risk to schedule it in a series of moves over a number of weeks – or even months. This also means you can choose slow tables or queries to tackle first and start enjoying performance benefits straight away.
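To make the phased approach concrete, here is a small Python sketch that groups tables into migration waves, putting the tables with the slowest queries first. Everything in it is an assumption made for illustration: the TableStats fields, the plan_waves function, and the idea of capping each wave by data volume are not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableStats:
    name: str
    size_gb: float
    avg_query_seconds: float  # assumed to come from your own query logs


def plan_waves(tables: List[TableStats],
               max_gb_per_wave: float) -> List[List[TableStats]]:
    """Group tables into waves, slowest-queried tables first.

    A new wave starts whenever adding the next table would push the
    current wave over max_gb_per_wave. A single table larger than the
    cap still gets a wave of its own.
    """
    ordered = sorted(tables, key=lambda t: t.avg_query_seconds, reverse=True)
    waves: List[List[TableStats]] = []
    current: List[TableStats] = []
    current_gb = 0.0
    for table in ordered:
        if current and current_gb + table.size_gb > max_gb_per_wave:
            waves.append(current)
            current, current_gb = [], 0.0
        current.append(table)
        current_gb += table.size_gb
    if current:
        waves.append(current)
    return waves
```

Each wave can then be migrated, verified, and switched over while the remaining tables stay on the old system.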
5. Make the most of the move. Clean up the data and reduce database vendor lock-in.
Moving the data is a great excuse to take a good look at it. Do you have the most appropriate database schema? Is there duplication in the data, or are there data quality issues? Use the migration to give the data a spring clean.
In the process of rewriting queries, it’s worth considering how you can make a future migration easier. For instance, consider sticking to ANSI SQL queries, or using UDFs written in a standard, open language like R, Python or Java. This reduces your lock-in to the platform, which means you stay agile in the future.
6. Use commodity hardware – it’s cheaper in both the short and the long term.
Enjoy the benefits of using off-the-shelf commodity hardware. Although specialised appliances can offer innovative tricks to get around various performance bottlenecks, hardware development moves at such speed that your custom appliance will become obsolete in a few years.
With commodity hardware you enjoy multiple benefits:
- It’s cheaper when you initially buy it
- It’s cheaper to upgrade or repair it
- It’s cheaper to replace with the next generation of hardware
With careful planning, you can ensure your data migration project is brought in on time and on budget, and with the right decisions, you can reduce the total cost of ownership.