Tuesday, March 14, 2006

The unbearable brittleness of data models

An article in CRM Buyer makes an important point: a key reason why customer data integration projects fail is the inflexibility of the data model that is often implemented. Although the article turns out to be a thinly disguised advert for Siperian, the point is very valid. Traditional entity-relationship modeling is typically done at too low a level of abstraction. For example, courses on data modeling frequently give examples like "customer" and "supplier" as separate logical entities. If your design is based on such an assumption, then applications built on it will struggle if one day a customer becomes a supplier, or vice versa. Better to have a higher-level entity called "organization", which can have varying roles, such as customer or supplier, or indeed others that you may not have thought of at the time of the modeling. Similarly, rather than having an entity called "employee" it is better to have one called "person", which itself can have a role of "employee" but also other roles, perhaps "customer" for example.
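To make the organization-with-roles idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and helper functions are my own assumptions for illustration; they are not taken from the CRM Buyer article or from any product. The point it tries to show is that a role is stored as a data value rather than as a separate table, so a customer can become a supplier (or take on a role nobody thought of at design time) without any change to the schema.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
SCHEMA = """
CREATE TABLE organization (
    org_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE organization_role (
    org_id INTEGER NOT NULL REFERENCES organization(org_id),
    role   TEXT NOT NULL,            -- e.g. 'customer', 'supplier'
    PRIMARY KEY (org_id, role)
);
"""


def create_db():
    """Create an in-memory database holding the generic schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_organization(conn, name):
    """Insert an organization and return its generated id."""
    cur = conn.execute("INSERT INTO organization (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_role(conn, org_id, role):
    """Give an organization a role; assigning the same role twice is harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO organization_role (org_id, role) VALUES (?, ?)",
        (org_id, role),
    )


def roles_of(conn, org_id):
    """Return the organization's roles in alphabetical order."""
    rows = conn.execute(
        "SELECT role FROM organization_role WHERE org_id = ? ORDER BY role",
        (org_id,),
    )
    return [row[0] for row in rows]


conn = create_db()
acme = add_organization(conn, "Acme Ltd")
add_role(conn, acme, "customer")
add_role(conn, acme, "supplier")   # the customer becomes a supplier: no schema change
add_role(conn, acme, "partner")    # a role not anticipated at modeling time
print(roles_of(conn, acme))        # ['customer', 'partner', 'supplier']
```

With separate "customer" and "supplier" tables, the second and third calls to add_role would each have meant a new table, or a duplicate record for the same organization. The same pattern would apply to a "person" entity with roles such as employee or customer.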

This higher level of data modeling is critical to retaining flexibility in systems, removing the "brittleness" that so often causes problems in reality. If you have not seen it, I highly recommend a paper on business modeling produced by Bruce Ottmann, one of the world's leading data modelers, whose work has found its way into a number of ISO standards. Although Bruce works for Kalido, this whitepaper is not specific to Kalido but rather discusses the implications of a more generic approach to data models.

I very much hope that the so-called "generic modeling" approach that Bruce recommends will find its way into more software technologies. Examples where it already has are Kalido and Lazy Software, and, in idea rather than product form, the ISO standard 10303-11, which covers a modeling language called EXPRESS that can be used to represent generic data models. That standard came about through work originated at Shell and then extended to a broader community of data modelers, including various academics, and was particularly aimed at addressing the problem of exchanging product models; the wider standard is known as STEP. However, the generic modeling ideas developed alongside it have much broader application than product data. Given the very real advantages that generic modeling offers, it is to be hoped that more software vendors pick up on these notions, which make a real difference to the flexibility of data models, and hence improve the chances of projects, such as CDI projects, actually working in practice.

2 Comments:

Anonymous Karen Lopez said...

The generic concept you are looking for is Business Party / Party Role in a data model. This generalization allows modellers to support customers that become suppliers and vice versa.

However, the 2 biggest problems modellers have in getting this generalization into their database designs are:

1) Business change approval: Changing business processes to actually recognize that the supplier is an existing customer is not just a technical problem -- it is much more of a business challenge.

2) Developer resistance: Many developers prefer very specific, application or window-driven tables. When presented with a generalization, they will resist the added complexity of having a concept transformed from a database object to a data value.

In my two decades of experience, I have always found that it is not the modeller who wants more specific structures; it is the business and the developers who feel more comfortable with a very specific, tightly coupled database structure.

I'd love to produce models that follow your post.

3:11 PM  
Blogger Andy Hayler said...

You are certainly right in that developers will always prefer to follow a less generic model. This is even more the case when the model is highly generic, as in the generic entity framework used in the ISO work that I reference in the blog. In Kalido we store master data in an ultra generic form, but then generate a "normal" star schema from this so that developers can see something familiar for reporting purposes. It may be that the resistance you accurately describe will make it hard for generic models to take root outside software packages, where their generic nature can effectively be shielded from developers.

1:53 AM  
