This question comes more from an "enterprise architecture" point of view. (The reason being that our company is only now getting into data management and actually having DBAs. I know, it is horrific that it took this long, but bear with me.)
I'll lay out some background first.
I am researching how to move data from a legacy system to the new replacement system we are building. This data is more "lookup" data than what I would call "business", "customer", or "domain" data (if that makes sense). The way I think of it, this is data that gives end users a set of choices and is then associated with the other types of data.
The challenge is that until we retire our legacy systems, we don't want to maintain the (legacy) data in two systems. Data that the legacy system cannot support will have to be maintained in the new system (so in that sense we will have dual maintenance). But for the data the two systems have in common, we initially want to push it from, or pull it from, the legacy system.
This means some form of ETL, whether hand-coded or done with a product. From a code perspective, for us that means either RPG (since the legacy system is iSeries with DB2 for i) or Java (since the new system is Java EE-based with DB2 on AIX). From a product standpoint, it probably means IBM InfoSphere DataStage, as that is the product we have chosen for data warehouse ETL.
Now to the actual point of my question: the legacy system keeps its data in denormalized tables that rely on natural keys. The new system uses normalized tables whose design looks more "object-oriented" (because of the heavy use of Hibernate). These tables rely on traditional integer-based (surrogate) primary keys.
How does one best map from a system with natural keys to one with integer-based keys? And how do you maintain that mapping so you know when to update existing records and when to insert new records into the integer-based system?
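One common way to frame this is a cross-reference ("key map") that records, for each legacy natural key, the integer surrogate key assigned in the new system: a lookup hit means update, a miss means insert and remember the new ID. The Java below is only a minimal sketch of that idea under stated assumptions: `KeyXrefSketch`, `NaturalKey`, and `Decision` are hypothetical names, the `HashMap` stands in for what would really be a persisted table, and the `nextId` counter stands in for a DB2 identity column or sequence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Minimal sketch of a natural-key -> surrogate-key cross-reference.
 * ASSUMPTION: in a real migration this map would be a persisted table
 * on the new system's side, not an in-memory HashMap.
 */
public class KeyXrefSketch {

    /** Hypothetical composite natural key taken from a legacy lookup table. */
    record NaturalKey(String tableCode, String code) {
        NaturalKey {
            Objects.requireNonNull(tableCode);
            Objects.requireNonNull(code);
            // Strip fixed-width CHAR padding so "WI  " and "WI" map to the same key.
            tableCode = tableCode.strip();
            code = code.strip();
        }
    }

    enum Action { INSERT, UPDATE }

    /** What the load step should do with a row, plus the surrogate ID to use. */
    record Decision(Action action, long surrogateId) {}

    private final Map<NaturalKey, Long> xref = new HashMap<>();
    private long nextId = 1; // ASSUMPTION: stands in for an identity column or sequence

    /** Returns UPDATE with the existing ID if the key is already mapped, else assigns a new ID. */
    Decision resolve(NaturalKey key) {
        Long existing = xref.get(key);
        if (existing != null) {
            return new Decision(Action.UPDATE, existing);
        }
        long id = nextId++;
        xref.put(key, id);
        return new Decision(Action.INSERT, id);
    }

    public static void main(String[] args) {
        KeyXrefSketch sketch = new KeyXrefSketch();
        System.out.println(sketch.resolve(new NaturalKey("STATE", "WI")));   // first sighting: INSERT, ID 1
        System.out.println(sketch.resolve(new NaturalKey("STATE", "MN")));   // first sighting: INSERT, ID 2
        System.out.println(sketch.resolve(new NaturalKey("STATE", "WI  "))); // padded repeat: UPDATE, ID 1
    }
}
```

An alternative design is to carry the legacy natural key as a unique (alternate key) column on the new table itself, so the lookup becomes a query against that column rather than a separate table. Either way, the mapping has to be persisted so that rerunning the load does not create duplicate rows.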
Any recommendations or best practices in general?
And as an add-on: anything specific to RPG, Java, or InfoSphere DataStage?
Any ideas are appreciated, as they may be incorporated into the best practices our company is trying to build.