Traditionally ETL (extract/transform/load) products such as Informatica, Ascential and others have fulfilled the role of getting data out of source systems, dealing with inconsistencies between these source systems (transform) and then loading the resultant transformed data into a set of database tables (perhaps an operational data store, data marts or directly to a data warehouse).
However in the process of doing the "transform" a number of issues crop up. Firstly, you are embedding essentially what is a set of business rules (how different business hierarchies like product classifications actually relate) directly into the transformation rules. This is a dark place should you want to make sense of them in other contexts. If the rules are complex, which they may well be, then you can create a Frankenstein's monster of transform rules that become difficult to maintain, in a set of metadata that may be hard to share with other applications.
Moreover this is a one-way process. Once you have taken your various product hierarchies (say) and reduced them to a lowest common denominator form, then you can certainly start to analyze the data in this new form, but you have lost the component elements to all intents and purposes. These different product hierarchies did not end up different without some reason; they may reflect genuine market differences in different countries, for example. Moreover they may contain a level of richness that is lost when you strip everything down to a simpler form.
Ideally in a data warehouse you would like to be able to take an enterprise view, but also retain the individual perspectives of different business units or countries. For example it may be interesting to see the overall figures in the format of a particular business line or country. Now of course there are limitations here, since data from other businesses have not have sufficient granularity to support the views required, but in some cases this can be fixed (for example by providing additional allocation rules) and at least you have a sporting chance of doing something useful with the data if you have retained its original richness. You have no chance if it is gone.
Hence there is a strong argument to be made for an "ELT" approach, whereby data is copied from source systems pretty much untouched into a staging area, and then only from there is transformation work done on it to produce cross-enterprise views. If this staging area is controlled by the data warehouse then it is possible to provide other, alternate views and perspectives, possibly involving additional business metadata at this stage. The only real cost in this approach is some extra storage, which is hardly a major issue these days. Crucially, the transformation logic is held within the data warehouse, which is open to interrogation by other applications, and not buried away in the depths of a proprietary ETL format. Moreover, the DBMS vendors themselves have added more capability over the last few years to deal with certain transformations; let's face it, a SQL SELECT statement can do a lot of things. Since the DBMS processing is likely to be pretty efficient compared to a transformation engine, there may be performance benefits also.
This approach has been taken by more modern ETL tools like Sunopsis
, which is explicitly ELT in nature. Intriguingly, Informatica added an ELT option in Powercenter 8 (called the "PowerCenter 8 pushdown optimization"), which suggests that this approach indeed is gaining traction. So far, good on Sunopsis for taking the ELT approach, which I believe is inherently superior to ETL in most cases. It will be interesting to see whether Ascential also respond in a future release.