Article updated: 20 May, 2021
IBM has ended support for IBM PureData for Analytics, its name for Netezza. At Exasol, we’d like to help lessen the upcoming migration pain.
Here are our essential tips for migrating off Netezza, or any data warehouse, and onto a new solution such as the Exasol analytics database.
1. There are two parts to migration: data and code. Both deserve careful consideration
Data needs to be migrated with a laser focus on data integrity, while code migration is all about procedural languages. Unlike SQL, database procedural languages have no common standard. Even where there's a shared code base, as with Netezza's stored procedure language (NZPLSQL), a derivative of Postgres' PL/pgSQL, differences and extensions will make a straight lift-and-shift fail. This leads us to the next item: testing.
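Before committing to a lift-and-shift of stored procedures, it can help to inventory the constructs that won't port cleanly. A minimal sketch, assuming a hand-maintained list of dialect-specific patterns to flag for manual review (the two patterns shown are illustrative, not a complete NZPLSQL audit list):

```python
import re

# Hypothetical patterns that often need attention when porting stored
# procedures between dialects; adjust the list for your source/target pair.
NONPORTABLE_PATTERNS = {
    "EXECUTE IMMEDIATE": r"\bEXECUTE\s+IMMEDIATE\b",
    "RAISE NOTICE": r"\bRAISE\s+NOTICE\b",
}

def flag_nonportable(source: str) -> list:
    """Return (line_number, pattern_name) pairs for lines needing review."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in NONPORTABLE_PATTERNS.items():
            if re.search(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, name))
    return hits

proc = """\
BEGIN
    EXECUTE IMMEDIATE 'TRUNCATE TABLE staging';
    RAISE NOTICE 'staging cleared';
END;
"""
print(flag_nonportable(proc))  # → [(2, 'EXECUTE IMMEDIATE'), (3, 'RAISE NOTICE')]
```

Running a scan like this over every procedure gives you a worklist for the rewrite, rather than discovering incompatibilities one failed deployment at a time.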
2. Test, test, test before you proceed. Prepare a runbook for the migration
Before migration, document and test every database interaction and touchpoint with external applications. The most robust way to do this is to build a QA or development server to test your applications. Also factor in some load/performance testing so you can be confident there won't be any surprises when you move into production.
Once the test process is complete and agreed upon, ensure there’s a full runbook for the migration, and that each stage of the process is documented, logged, and auditable.
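As part of that runbook, each table move can finish with an automated integrity check. A minimal sketch using Python's built-in sqlite3 module as a stand-in for the source and target warehouses (the `sales` table and its columns are invented for the example):

```python
import sqlite3

def table_fingerprint(conn, table):
    """Row count plus an order-independent checksum over all rows."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    checksum = sum(hash(row) for row in rows) & 0xFFFFFFFF
    return len(rows), checksum

# Two in-memory databases stand in for the source and target warehouses.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.99), (2, 24.50)])

assert table_fingerprint(source, "sales") == table_fingerprint(target, "sales")
print("sales: counts and checksums match")
```

Logging the fingerprints of every migrated table gives you the auditable trail the runbook calls for, and an immediate alarm if the copies ever diverge.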
3. Anticipate potential reasons for failure
By anticipating failures, you can take steps to mitigate them. Failures happen for several reasons, each of which can leave you with a partial copy or data integrity issues:
- Copy process failure
- Server crash or unreachable target storage device
- Data corruption
Comprehensive testing will mitigate these issues, but developers should be on standby in case a new problem crops up.
Run the migration when it's least likely to cause disruption should a problem occur. Running the data migration on a Monday morning, when the network is at maximum capacity and reports are due, isn't the best timing. Find a quiet period after business hours or on the weekend. Make sure maintenance jobs that might interfere with the migration are disabled or delayed.
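The mitigation for a failed or partial copy can itself be scripted: copy a table, verify the row count, and retry with back-off if anything goes wrong. A hedged sketch, again using sqlite3 in place of the real source and target (the `copy_table` helper and the `sales` table are hypothetical):

```python
import sqlite3
import time

def copy_table(source, target, table, retries=3):
    """Copy one table, verify the row count, and retry with back-off on failure."""
    rows = source.execute(f"SELECT * FROM {table}").fetchall()
    placeholders = ",".join("?" * len(rows[0]))
    for attempt in range(1, retries + 1):
        try:
            target.execute(f"DELETE FROM {table}")  # restart from a clean slate
            target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
            target.commit()
            copied = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            if copied == len(rows):
                return copied
            raise RuntimeError(f"partial copy: {copied}/{len(rows)} rows")
        except Exception:
            if attempt == retries:
                raise
            time.sleep(0.1 * attempt)  # brief back-off before the next attempt

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.9), (2, 5.0), (3, 7.5)])
target.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
print(copy_table(source, target, "sales"))  # → 3
```

Because each attempt starts from a clean slate, a crash mid-copy leaves nothing half-migrated that a later retry could silently build on.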
4. You don’t have to do it all at once
It's better to migrate the data gradually, a database or a table at a time, and run the two systems in parallel. You don't have to lift-and-shift the whole data warehouse at once, and you may find it's lower risk to schedule the move as a series of steps over several weeks, or even months. This also means you can tackle the slowest-querying tables first and start enjoying the performance benefits straight away.
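Planning those phases can be as simple as ranking tables by query pain and slicing the list into waves. A toy sketch, where the table names and timings are invented placeholders for numbers you would pull from your own query history:

```python
# Hypothetical catalogue of tables with average query time in seconds;
# in practice you'd pull these figures from your warehouse's query history.
tables = {
    "web_clickstream": 41.0,
    "orders": 12.5,
    "customers": 3.2,
}

# Tackle the slowest-querying tables first, in waves of two per quiet period.
wave_size = 2
ordered = sorted(tables, key=tables.get, reverse=True)
waves = [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Each wave then gets its own test cycle and runbook entry, so a problem in one wave never blocks the tables already migrated.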
5. Make the most of the move. Clean up the data and reduce database vendor lock-in
Moving the data is a great excuse to take a good look at it. Do you have the optimal database schema? Is there duplication in the data, or are there data quality issues? Treat the migration as a chance for a “spring cleaning” of your data.
In the process of rewriting queries, consider how you can make future migrations easier. For instance, stick to ANSI SQL queries or use UDFs written in a standard, open language like R, Python, or Java. This reduces your lock-in to the platform, which means you stay agile in the future.
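One practical pattern is to keep the business logic in plain, platform-free Python and confine platform specifics to a thin wrapper. The `run(ctx)` adapter below is a hypothetical sketch of such a wrapper, not any vendor's actual UDF API:

```python
# The portable core: plain Python with no database-specific imports, so the
# same logic can be registered as a UDF on whichever platform hosts it.
def normalize_phone(raw: str) -> str:
    """Keep digits only, then format as a plain international-style string."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return "+" + digits if digits else ""

# Hypothetical per-platform adapter: only this thin layer would need
# rewriting in a future migration; the core function moves unchanged.
def run(ctx):
    ctx.emit(normalize_phone(ctx.value))

print(normalize_phone("(030) 12 34 56"))  # → +030123456
```

Keeping the core testable outside any database also means your QA suite for the logic survives every future platform change intact.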
6. Use commodity hardware – it’s cheaper both in the short and the long-term
Although specialized appliances can offer clever tricks to get around performance bottlenecks, hardware development moves so fast that a custom appliance becomes obsolete within a few years.
With commodity hardware you enjoy a couple of benefits:
- It’s less expensive when you initially buy it
- You don’t need specialized system administrators
- It’s easier to replace with the next generation of hardware