Wednesday, April 05, 2006

Integration or management?

I enjoyed a thoughtful article by Colin White, long-time database expert and general guru, about data integration and MDM (master data management). In it he defines an architecture for data integration that separates the underlying technologies from the different techniques and the applications that support them. He also makes a couple of important points. One is that CDI (customer data integration) is not a very useful term, since it implies that the job is only about integrating customer data. This is very true, and applies to MDM in general. CDI does not simply mean taking half a dozen separate customer data sources and somehow banging them together. In reality it will not be practical to end up with one master source for customer (or pretty much any master) data, so what matters is being able to catalog what is out there, map the differences in definitions so that sense can be made of them, and define a process so that changes can be propagated in a controlled way to the various sources. This is much more than just synchronizing updates between SAP and Siebel.

Beyond simple things like names and addresses, you also need to consider how customer data is to be classified, e.g. into different market segments or demographic groups. Changing this classification is a non-trivial business process that will require input from various people within (and possibly beyond) marketing, and will likely involve multiple versions that need to be discussed, tested and modified before being published and then propagated into the various operational systems. As Colin says, "management" is a much more appropriate term here than "integration". This may seem an esoteric point, but names matter (if you doubt this, ask Vauxhall, whose "Nova" car means "no go" in Spanish).
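To make the catalog, map and propagate cycle a little more concrete, here is a minimal Python sketch. It assumes a hypothetical hub that holds, for each source system, a mapping from master segment codes to that system's own local codes (the "differences in definitions"), plus a list of classification versions that must be approved before they can be propagated. The class names, segment codes, local codes and customer ids are all invented for illustration; this is not how any particular MDM product works.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentVersion:
    """One draft (or approved) version of the master segment classification."""
    version: int
    segments: dict[str, str]  # master customer id -> master segment code
    approved: bool = False


@dataclass
class MasterDataHub:
    # Hypothetical catalog: for each source system, how each master segment
    # code translates into that system's local code.
    code_maps: dict[str, dict[str, str]]
    versions: list[SegmentVersion] = field(default_factory=list)

    def draft(self, segments: dict[str, str]) -> SegmentVersion:
        """Record a new candidate classification for discussion and testing."""
        v = SegmentVersion(version=len(self.versions) + 1, segments=dict(segments))
        self.versions.append(v)
        return v

    def approve(self, version: int) -> None:
        """Mark a version as signed off by the business (e.g. marketing)."""
        self.versions[version - 1].approved = True

    def propagate(self, version: int) -> dict[str, dict[str, str]]:
        """Translate an approved version into each source system's local codes.

        Unapproved versions are refused; a segment with no mapping for some
        system raises KeyError, flagging a gap in the catalog.
        """
        v = self.versions[version - 1]
        if not v.approved:
            raise ValueError(f"version {version} has not been approved")
        return {
            system: {cust: mapping[seg] for cust, seg in v.segments.items()}
            for system, mapping in self.code_maps.items()
        }


# Illustrative usage with made-up codes for two operational systems.
hub = MasterDataHub(code_maps={
    "SAP": {"SMB": "Z01", "ENT": "Z02"},
    "Siebel": {"SMB": "Small Business", "ENT": "Enterprise"},
})
candidate = hub.draft({"C001": "SMB", "C002": "ENT"})
hub.approve(candidate.version)  # only after discussion and testing
print(hub.propagate(candidate.version))
```

The point of the design is that the classification lives in one governed place with explicit versions, while each operational system keeps its own vocabulary; propagation is a controlled translation step rather than a direct copy between systems.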

Another point Colin makes well is how regularly the term "real time" is abused. Since most business intelligence involves some form of analysis, it rarely needs to be truly "real time". For example, looking at a retailer's buying patterns by branch may usefully reveal all sorts of things (which items are moving, which promotions are working, etc.), yet this information means no more if you get it at 14:15 than at 14:05, or indeed at 11:32. Getting it a few minutes closer to "real time" adds no meaning, yet will cost dramatically more in IT complexity and expense. I would argue that very few things in business intelligence truly require real-time data feeds. Certain operational queries may, e.g. checking a customer's credit rating or looking at overall trading exposure before placing a trade, but these are a small subset of what is usually termed business intelligence.
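As a small illustration of the retail example, the Python sketch below assumes a hypothetical batch job that totals units sold per branch and item for every sale recorded up to a cutoff time. The branches, items and timestamps are made up. Running it with a 14:05 cutoff and again with a 14:15 cutoff changes the totals slightly, but in this toy data the answer to "what is moving?" is the same either way.

```python
from collections import Counter
from datetime import datetime

# Hypothetical till records: (branch, item, units, timestamp).
SALES = [
    ("Leeds", "umbrella", 5, datetime(2006, 4, 5, 13, 50)),
    ("Leeds", "sunscreen", 1, datetime(2006, 4, 5, 13, 55)),
    ("York", "umbrella", 3, datetime(2006, 4, 5, 14, 0)),
    ("Leeds", "umbrella", 2, datetime(2006, 4, 5, 14, 8)),
]


def branch_summary(sales, cutoff):
    """Total units per (branch, item) for every sale recorded up to `cutoff`.

    The calculation is identical whatever the refresh interval of the batch
    job that calls it; only the cutoff moves.
    """
    totals = Counter()
    for branch, item, units, ts in sales:
        if ts <= cutoff:
            totals[(branch, item)] += units
    return totals


def top_mover(summary):
    """The (branch, item) pair with the most units sold so far."""
    return summary.most_common(1)[0][0]


early = branch_summary(SALES, datetime(2006, 4, 5, 14, 5))
late = branch_summary(SALES, datetime(2006, 4, 5, 14, 15))
print(top_mover(early), top_mover(late))
```

A scheduled batch refresh like this is cheap to build and run; the operational checks mentioned above (credit rating, trading exposure) are different in kind, because they must read the current state at the moment of the decision.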


