Friday, March 31, 2006

A bit poor

You may recall my blog on SAP's farcical claims about its software's impact on company profitability. It looks like someone with more time on their hands than me actually checked up on the figures and found them lacking, in addition to the lack of logic in the original claim. Nucleus Research, who are noted for their rigor with numbers, found that SAP customers (identified by being listed on SAP's web site) were in fact 20% less profitable than their peers, rather than 32% more profitable. Of course this is not quite the same thing, but it is amusing: it suggests that only SAP's identified reference customers are relatively unprofitable. Perhaps the ones who keep quiet are doing OK? As I noted earlier, the SAP claim was deliberately skewed to exclude all financial institutions, which share the twin characteristics of being highly profitable and rarely using SAP. In any case, the notion that the choice of your ERP provider is a cause of either good or bad profits is both logically flawed and deeply amusing to those of us who have watched companies spend billions implementing SAP to little obvious effect in terms of hard business benefits.

Good on Nucleus for poking further holes in this especially egregious piece of over-marketing. Bruce Brien, CEO of Stratascope, the company that did the market research for SAP, reacted by saying: “They’re making an implication that my numbers can’t prove, but it’s a marketing message. Companies do that all the time.” Oh well, that's all right then.

Cognos recovers somewhat

Cognos announced its full year results, notably seeing a recovery in license revenues to USD 118M in its fourth quarter (i.e. calendar Q1 2006) after the disappointing previous quarter. It was also important to note that the company closed 18 deals over a million dollars in size, another marked improvement on the previous quarter. Profit margins were a healthy 18%. Still, license revenue was actually down compared to the same quarter a year ago (USD 130M), while overall revenues of USD 253M for the quarter were slightly down on the same period last year. Actually shrinking is not generally a cause for celebration in a software company, so it is a measure of just how bad Cognos' previous quarter was that these results were generally greeted with relief.

This (relative) recovery all bodes well for the broader sector, and indicates that Cognos' stumble at the end of 2005 was to do more with company-specific issues (limited deployment of its new product line) than with any general slow-down in the business intelligence market (which just about every analyst predicts will grow at a healthy clip in 2006). In the medium term, Cognos faces the same issues as other BI suppliers: the relative saturation of the market, and the ever-growing threat from Microsoft.

The ratchet goes up a notch

Back last year I wrote about the creeping progress of Microsoft into the business intelligence arena. In CBR Madan Sheina (one of the smartest analysts in the industry, by the way) examines the latest move in this direction, the SQL Server 2005 suite's enhanced business intelligence offerings. The new ETL offering SSIS (previously DTS) will be of interest, although its SQL Server ties may limit its take-up relative to database-neutral offerings. However the new Analysis Services and Reporting Services promise to ratchet up the pressure on the pure-play BI players, Business Objects, Cognos and the rest. I have long argued that the most ubiquitous BI tool is actually Excel, and that, given people already know it, an ideal BI tool for many users would be one which magically got the data they wanted out of a data warehouse directly into an Excel pivot table. Yes, there will always be a subset of power users for whom this is not enough, but in the vast majority of cases this will actually do the trick. Other tools (visualization, data mining etc) would be relegated to niches if this were to happen: significant niches perhaps, but niches nonetheless.
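To make the "warehouse straight into a pivot table" idea concrete, here is a minimal sketch in Python, using pandas as a stand-in for Excel's pivot engine; the table, column names and figures are all invented for illustration.

```python
import pandas as pd

# Hypothetical extract from a warehouse fact table, already joined to its dimensions.
sales = pd.DataFrame({
    "region":  ["EMEA", "EMEA", "Americas", "Americas", "APAC", "APAC"],
    "product": ["Lubricants", "Fuels", "Lubricants", "Fuels", "Lubricants", "Fuels"],
    "revenue": [120.0, 340.0, 200.0, 410.0, 90.0, 150.0],
})

# The Excel-style pivot: regions down the side, products across the top,
# revenue summed in the cells, exactly what a business user builds by hand.
pivot = sales.pivot_table(index="region", columns="product",
                          values="revenue", aggfunc="sum")
print(pivot)
```

The point is that everything a typical user wants here is one familiar pivot operation away, once the extract itself is taken care of.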

Business Objects has done well because of its semantic layer, the "universe", which overlays something closer to a business view on top of data marts and warehouses; this imposes some maintenance overhead, but users accept it because it presents the data in a more business-like form. However Business Objects has always struggled with its OLAP capability relative to competitors. Cognos, by contrast, had the best OLAP tool out there in Powerplay, but a rather ordinary reporting offering. These two vendors pretty much carved up the market between them, though in a growing market there was enough room for other tools like Microstrategy, Actuate etc as well. Microsoft's new suite poses a potent threat to most of these BI vendors: since most users exercise only a tiny fraction of the features of a BI tool, adding yet more features just to stay ahead of Microsoft is ineffective; the end users simply don't need them. With its low price point and "good enough" features, the Microsoft tools are likely to gradually eat into the market share of the independent vendors. Nothing dramatic will happen overnight, and Microsoft's curious restraint from serious marketing of its tools to the enterprise will also slow progress. When did you last see a webinar or advert for Analysis Services? Compare and contrast with Business Objects, which is a marketing machine.

However, just like a pack of hunting dogs wearing down a large prey animal, the Microsoft tools can edge closer to the BI vendors with each release, secure in the knowledge that their Office base controls what users really want: Excel.

Thursday, March 30, 2006

Iteration is the key

Ken Pohl writes a thoughtful article on the issues of project management of a data warehouse project, and how this can differ from other IT projects. As he points out, a data warehouse project is unusual in that it is essentially never finished - there are always new sources to add, new types of analysis the customers want etc (at least there are if the project is a success: if it failed then at least you won't have too many of those pesky customer enhancement requests).

As the article points out, a data warehouse project is ideal for an iterative approach to development. The traditional "waterfall" approach, whereby the requirements are documented at ever greater levels of detail, from feasibility through to requirements through to functional specification etc, is an awkward fit. I have observed that in some companies the IT departments have a rigid approach to project management, demanding that all types of projects follow a waterfall structure. This is unfortunate in the case of data warehouse projects, where end-users are often hazy on requirements until they see the data, and where changing requirements will inevitably derail the neatest functional specification document (see diagram).
Given a 16 month average elapsed time for a data warehouse project (TDWI) it is almost certain that at least one, and possibly several, major changes will come along that have significant impact on the project, which in a waterfall approach will at the very least cause delays and may put the entire project at risk.

By contrast a data warehouse project that bites off scope in limited chunks, while retaining a broad and robust enterprise business model, can deliver incremental value to its customers, fixing things as needed before the end users become cynical, and gradually building political credibility for the warehouse project. Of course the more responsive to change your data warehouse is the better, but even for a traditional custom build it should be possible to segment the project delivery into manageable chunks and deliver incrementally. The data warehouse projects which I have seen go wrong are very often those which have stuck to a rigid waterfall approach, which makes perfect sense for a transaction processing system (where requirements are much more stable) but is asking for trouble in a data warehouse project. Ken Pohl's article contains some useful tips, and is well worth reading.

Unifying data

I can recall back in the early 1990s hearing that the worlds of structured and unstructured data were about to converge. A decade on, and despite the advent of XML, that prospect still looks a long way off. It is like watching two people who have known each other for years and are attracted to each other, yet never seem to find a way of getting together. Some have argued that the data warehouse should simply open up to store unstructured data, but does this really make sense? When DBMS vendors brought out features allowing them to store BLOBS (binary large objects) the question should have been asked: why is this useful? Can I query this and combine it usefully with other data? Data warehouses deal with numbers (usually business transactions) that can be added up in a variety of ways, according to various sets of business rules (such as cost allocation rules, or the sequence of a hierarchy), which these days can be termed master data. The master data gives the transaction data "structure". A Powerpoint slide or a Word document or an audio clip tends not to have much in the way of structure, which is why document management systems place emphasis on attaching keywords or tags to such files in order to give them structure (just as web pages are given similar tags, or at least they are if you want them to appear high up in the search engines).

You could store files of this type in a data warehouse, but given that these things cannot be added up there is little point in treating them as transactions. Instead we can consider them to be master data of a sort. Hence it is reasonable to want to manage them from a master data repository, though this may or may not be relevant to a data warehouse application.

I am grateful to Chris Angus for pointing out that there is a problem with the terms 'structured data' and 'unstructured data'. Historically the terms came into being to differentiate between data that could at that time be stuffed in a database and data that could not. That distinction is nothing like as important now and the semantics have shifted. The distinction is now more between data constrained by some form of fixed schema and whose structure is dictated by a computer application v data/documents not constrained in the same way. An interesting example of "unstructured data" that is a subject in its own right and needs managing is a health and safety notice. This is certainly not just a set of numbers, but it does have structure, and may well be related to other structured data e.g. HSE statistics. Hence this type of data may well need to be managed in a master data management application. Another example is the technical data sheets that go with some products, such as lubricants; again, these have structure and are clearly related to a traditional type of master data, in this case "product", which will have transactions associated with it. Yet another would be a pharmaceutical regulatory document. Hence "structure" is more of a continuum than a "yes/no" state.

So, while the lines are blurring, the place to reconcile these two worlds may not be in the data warehouse, but in the master data repository. Just as in the case of other master data, for practical purposes you may want to store the data itself elsewhere and maintain links to it e.g. a DBMS might not be an efficient place to store a video clip, but you would want to keep track of it from within your master data repository.
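As a sketch of that "store elsewhere, link from the repository" idea, here is a hypothetical master data record in Python; the record shape, field names and the docstore URI scheme are all invented for illustration.

```python
from dataclasses import dataclass, field

# A document treated as master data: the repository holds the structured
# attributes and a link, not the binary content itself.
@dataclass
class MasterDataRecord:
    code: str                   # e.g. a product code
    name: str
    attributes: dict = field(default_factory=dict)
    related_documents: list = field(default_factory=list)  # URIs; files live elsewhere

lubricant = MasterDataRecord(
    code="LUB-0042",
    name="Marine lubricant 40W",
    attributes={"viscosity_grade": "40", "category": "lubricant"},
)
# The technical data sheet sits in a document store; the repository keeps the link.
lubricant.related_documents.append("docstore://datasheets/LUB-0042.pdf")

print(lubricant.related_documents[0])
```

The data sheet itself never enters the repository; queries against the "product" master data can still find it via the link.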

Wednesday, March 29, 2006

Microsoft MDM? Don't hold your breath

At a conference this week at which Microsoft explained how it intends to unify its rambling applications offerings, Mike Ehrenberg (architect for Microsoft's MBS products) mentioned that Microsoft was "investigating" an MDM product offering. It should be said that Microsoft should be in an excellent position to understand the problem of inconsistent master data, at least within their own portfolio of business software products. Through a series of acquisitions they have assembled no fewer than five distinctly overlapping products for SMEs, and have manifestly failed to explain how any of this resembles a strategy. This mess has enabled innovative newcomers like Ataio to make steady progress in what should really be Microsoft's natural turf, as customers have been bemused by Microsoft's seeming inability to articulate which technologies they were really intending to invest in. The answer, it seems, is all of them - MSFT will "converge" their five products "no sooner than 2009" (unofficially, 2011 is a target date I have heard from an insider). The most amusing line in the article was: "The MBS products, Gates said, "have more head room for growth than just about any business we're in." This is about as backhanded a compliment as one can think of: I have heard that Microsoft management is very unhappy about the lack of progress in this division, so this comment is like saying to a sports team that just came bottom of the league "we now have more room to improve than anyone".

Microsoft seems perennially to struggle in the enterprise software market, despite its vast resources, huge brand and marketing clout. It essentially stumbled into the DBMS marketplace; I have it on good authority that Gates originally approached Larry Ellison with a view to bundling Oracle as the DBMS on Windows NT, and it was only after being spurned that Microsoft decided to launch SQL Server out of the ashes of the Sybase code-base it had purchased (this is a piece of hubris that Oracle may live to regret). In Excel and Analysis Services Microsoft has the most ubiquitous business intelligence software out there, yet has hardly any mind-share in this market. Perhaps it is just not in Microsoft's DNA to really relish the enterprise software market, when its business model is above all about high volume, and large enterprises demand endless tinkering and specialization of software to their specific needs.
Based on the train-wreck that is Microsoft's enterprise applications strategy, I wouldn't count on a strong MDM product entry any time soon.

Tuesday, March 28, 2006

Data warehouse v master data repository

Bill Inmon notes that "Second-generation data warehouses recognize the need for tying metadata closely and intimately with the actual data in the data warehouse". This is indeed a critical point, and is at the heart of why all those enterprise data dictionary projects in the 1990s (and even 1980s; sad to say I am old enough to have been involved with one in the 1980s) failed. Because the dictionaries were just passive catalogs, they were of some use to data modelers but otherwise there was little incentive to keep them up to date. In particular, the business people could not see any direct benefit to them, so after the initial project went live the things quietly got out of date. In order for such initiatives to succeed it is critical that the business metadata (more important than the technical metadata) is tied into the actual instances of master data, so that the repository does not just list the product hierarchy structure (say) but also lists the product codes that reside within this structure. Ideally, the repository would act as the primary source of master data for the enterprise, and serve up this data to the various applications that need it, probably via an automated link using middleware such as Tibco or IBM Websphere. Not many companies have taken it to this stage, but there are applications at BP and Unilever that do, for example.

However one important architectural point is that you may not want the data warehouse to actually manage all the master data directly; instead it may be better to have a separate master data repository. The reason for this apparently odd approach is that in a data warehouse you want the data to be "clean" i.e. validated, conforming to the company business model etc. On the other hand master data may have separate versions, drafts (e.g. draft three of the planned new product catalog) that need to be managed, and potentially "dirty" master data that is in the process of being improved or cleaned up. Such data has no place in a data warehouse, where you are relying on the integrity of the numbers.

Hence a broader picture may see an enterprise data warehouse alongside a master data repository, the latter feeding a "golden copy" of master data to the warehouse, just as it will feed the same golden copy to other applications that need it. With such an approach, and current technology, those old enterprise modeling skills might just come in handy.
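A toy illustration of the golden copy idea, with invented record shapes and statuses: the repository can hold drafts and "dirty" records mid-cleanup, but only validated records are fed onward to the warehouse and other consuming applications.

```python
# A master data repository sketch: records carry a status, and only the
# "golden" copy is published downstream. All names here are invented.
repository = [
    {"code": "PROD-1", "name": "Gear oil",    "status": "golden"},
    {"code": "PROD-2", "name": "Gear oill",   "status": "draft"},   # awaiting cleanup
    {"code": "PROD-3", "name": "Brake fluid", "status": "golden"},
]

def golden_copy(records):
    """Return only validated records: the feed the data warehouse should see."""
    return [r for r in records if r["status"] == "golden"]

warehouse_feed = golden_copy(repository)
print([r["code"] for r in warehouse_feed])  # the draft never reaches the warehouse
```

The warehouse keeps its integrity because the draft record, with its typo still in place, stays inside the repository until it is promoted.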

Incidentally, spring is definitely in the air in Europe. The sun is out in London, there is a spring in people's step, and the French have called a general strike.

Monday, March 27, 2006

Putting lipstick on a caterpillar does not make a butterfly

Oracle's recent repackaging of its BI offerings appears to be just that: a repackaging of existing technologies, of which of course it now has a lot. Peoplesoft had EPM, which had a mediocre reputation, but Oracle did better with Siebel, which had astutely acquired nQuire, a good product that was relabeled Siebel Analytics. Oracle also has Discoverer, a fairly blatant rip-off of Business Objects, a series of pre-built data marts for Oracle apps, as well as assorted older reporting tools developed along the way, like Oracle ReportBuilder, which seems to me strictly for those who secretly dislike graphical user interfaces and yearn for a return to a command prompt and "proper" programming. This assortment of technologies has been placed into three "editions", but you can scour the Oracle website in vain for anything which talks about the actual integration of these technologies at anything below the marketing/pricing level. Hence it would seem that customers will still essentially be presented with a mish-mash of tools of varying quality. Perhaps more R&D is in the works to integrate the various BI offerings properly, but it seems that for now Oracle still has some work to do in presenting a coherent BI picture. Business Objects and Cognos will not be quaking in their boots.

Tuesday, March 21, 2006

When did "tactical" become a dirty word?

A new report from Butler Group bemoans the "tactical" use of business intelligence tools on a piecemeal, departmental basis, calling for an enterprise-wide approach. However it rather misses the point about why this state of affairs exists. The author reckons: "Business will only be able to improve its information services, and obtain real value from the ever-increasing data silos that it continues to generate, when it accepts the significant advantages to be gained from integrating and standardizing its approach to the management of its BI technology services." Or, to paraphrase: "why on earth are you deploying separate departmental solutions, you bunch of dimwits?"

As I have discussed previously on this blog, there are actually several reasons why most BI initiatives are departmental, often using different tools. It is not that the business people are a crowd of masochists. The first reason is that a lot of BI needs are legitimately local in nature, specific to a department or operating unit. It is dramatically easier for a department to set up a data mart that has just its own data, and stick a reporting tool like Business Objects or Cognos on top of it, than it is to wait for the IT department to build an enterprise warehouse, which takes 16 months on average to do, costs 72% of its build cost every year to support, and then usually struggles to keep up with changing business requirements.

So it is not a matter of "accepting the significant advantages" of an enterprise approach. Everyone accepts that such an approach would be preferable, but the IT industry has made it very, very difficult to actually deliver on this promise, and people naturally fall back on "tactical" (i.e. working) solutions when grandiose ones fail. Ideally you would like an enterprise data warehouse, deployed in a few months, that can deal with business change instantly, and can at the same time both take an enterprise view and respect local departmental business definitions and needs, which will differ from those of central office. The trouble is, most companies are not deploying data warehouses like this, but are still stuck in a "build" timewarp, despite the existence of multiple packaged data warehouses which can be deployed more rapidly, and in at least one case can deal with change properly. Until this mindset changes, get used to a world with plenty of "tactical" solutions.

Monday, March 20, 2006

The data warehouse market breaks into a trot

The latest figures from IDC (who, by the way, are by far the most reliable of the analyst firms when it comes to quantitative estimates) suggest that the data warehouse market will grow at a 9% compound rate from now through to 2009, reaching USD 13.5 billion in size (up from USD 10 billion today), as reported in an article on the 17th of March. Gartner also reckon that this market is growing at twice the pace of the overall IT market (their estimates are slightly lower, but I would trust IDC more when it comes to figures). It would be interesting to see the proportion of this that is packaged data warehouse software (see the recent report by Bloor) but unfortunately they don't split out the data in this way. This figure does not include services, but based on other analyst estimates that market is at least three times this size; there never seems to be any shortage of need for systems integrators.

Given all the billions spent on ERP systems in the last ten years or so, it is about time that more attention was paid to actually trying to make sense of the data captured in these and other transaction processing systems, which for a long time have consumed the lion's share of IT development budgets. After all, there is likely to be more value in spotting trends and anomalies in the business than in merely automating processes that were previously manual, or in just shifting from one transaction processing system to another.

Thursday, March 16, 2006

Should ETL really be ELT?

Traditionally ETL (extract/transform/load) products such as Informatica, Ascential and others have fulfilled the role of getting data out of source systems, dealing with inconsistencies between these source systems (transform) and then loading the resultant transformed data into a set of database tables (perhaps an operational data store, data marts or directly to a data warehouse).
However in the process of doing the "transform" a number of issues crop up. Firstly, you are embedding what is essentially a set of business rules (how different business hierarchies like product classifications actually relate) directly into the transformation rules, which is a dark place for them should you want to make sense of them in other contexts. If the rules are complex, which they may well be, then you can create a Frankenstein's monster of transform rules that becomes difficult to maintain, held in a set of metadata that may be hard to share with other applications.

Moreover this is a one-way process. Once you have taken your various product hierarchies (say) and reduced them to a lowest common denominator form, then you can certainly start to analyze the data in this new form, but you have lost the component elements to all intents and purposes. These different product hierarchies did not end up different without some reason; they may reflect genuine market differences in different countries, for example. Moreover they may contain a level of richness that is lost when you strip everything down to a simpler form.

Ideally in a data warehouse you would like to be able to take an enterprise view, but also retain the individual perspectives of different business units or countries. For example it may be interesting to see the overall figures in the format of a particular business line or country. Now of course there are limitations here, since data from other businesses may not have sufficient granularity to support the views required, but in some cases this can be fixed (for example by providing additional allocation rules) and at least you have a sporting chance of doing something useful with the data if you have retained its original richness. You have no chance if it is gone.

Hence there is a strong argument to be made for an "ELT" approach, whereby data is copied from source systems pretty much untouched into a staging area, and then only from there is transformation work done on it to produce cross-enterprise views. If this staging area is controlled by the data warehouse then it is possible to provide other, alternate views and perspectives, possibly involving additional business metadata at this stage. The only real cost in this approach is some extra storage, which is hardly a major issue these days. Crucially, the transformation logic is held within the data warehouse, which is open to interrogation by other applications, and not buried away in the depths of a proprietary ETL format. Moreover, the DBMS vendors themselves have added more capability over the last few years to deal with certain transformations; let's face it, a SQL SELECT statement can do a lot of things. Since the DBMS processing is likely to be pretty efficient compared to a transformation engine, there may be performance benefits also.
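A minimal ELT sketch, using Python's stdlib sqlite3 module to play the role of the warehouse DBMS; the table names, categories and figures are all invented. Source rows land untouched in a staging table, and the cross-enterprise view is then just a SQL transform run inside the database, while the local categories remain available for country-specific views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (country TEXT, local_category TEXT, amount REAL)")

# "EL": copy from the source systems as-is, preserving each country's own hierarchy.
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", [
    ("UK", "Motor Oils", 100.0),
    ("UK", "Greases",     40.0),
    ("DE", "Motoroele",   80.0),
])

# "T": the cross-enterprise view is plain SQL over the staging area, so the
# mapping to a common hierarchy is visible and queryable, not buried in a tool.
conn.execute("""
    CREATE TABLE dw_sales AS
    SELECT CASE WHEN local_category IN ('Motor Oils', 'Motoroele')
                THEN 'Lubricants' ELSE 'Other' END AS global_category,
           SUM(amount) AS amount
    FROM staging_sales
    GROUP BY global_category
""")
rows = conn.execute(
    "SELECT global_category, amount FROM dw_sales ORDER BY global_category"
).fetchall()
print(rows)
```

Because the untouched staging rows survive, an alternative roll-up (say, mapping "Greases" into "Lubricants" for one country's view) is just another SELECT, rather than a change to opaque transformation metadata.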

This approach has been taken by more modern ETL tools like Sunopsis, which is explicitly ELT in nature. Intriguingly, Informatica added an ELT option in Powercenter 8 (called the "PowerCenter 8 pushdown optimization"), which suggests that this approach indeed is gaining traction. So far, good on Sunopsis for taking the ELT approach, which I believe is inherently superior to ETL in most cases. It will be interesting to see whether Ascential also respond in a future release.

Tuesday, March 14, 2006

The unbearable brittleness of data models

An article in CRM Buyer makes an important point. It highlights that a key reason why customer data integration projects fail is the inflexibility of the data model that is often implemented. Although the article turns out to be a thinly disguised advert for Siperian, the point is very valid. Traditional entity-relationship modeling is typically at too low a level of abstraction. For example, courses on data modeling frequently give examples like "customer" and "supplier" as separate logical entities. If your design is based on such an assumption, then applications built on it will struggle if one day a customer becomes a supplier, or vice versa. Better to have a higher-level entity called "organization", which can have varying roles, such as customer or supplier, or indeed others that you may not have thought of at the time of the modeling. Similarly, rather than having an entity called "employee" it is better to have one called "person", which itself can have a role of "employee" but also other roles, perhaps "customer" for example.
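A small sketch of that higher level of abstraction in Python; the entity and role names are invented for illustration. A single "party" carries a set of roles, so the same organization can become both customer and supplier without any change to the model.

```python
from dataclasses import dataclass, field

# Instead of separate Customer and Supplier entities, one Party entity
# whose roles can grow over time.
@dataclass
class Party:
    name: str
    roles: set = field(default_factory=set)

acme = Party("Acme Ltd")
acme.roles.add("customer")

# Later the same organization starts supplying us: no schema change needed,
# where a rigid customer/supplier split would force an awkward migration.
acme.roles.add("supplier")

print(sorted(acme.roles))
```

The same pattern applies to "person" with roles of "employee", "customer" and so on, which is the flexibility the article is arguing for.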

This higher level of data modeling is critical to retaining flexibility in systems, removing the "brittleness" that so often causes problems in reality. If you have not seen it, I highly recommend a paper on business modeling produced by Bruce Ottmann, one of the world's leading data modelers, whose work has found its way into a number of ISO standards. Although Bruce works for Kalido, this whitepaper is not specific to Kalido but rather discusses the implications of a more generic approach to data models.

I very much hope that the so-called "generic modeling" approach that Bruce recommends will find its way into more software technologies. Examples where it does are Kalido and Lazy Software, and, although in idea rather than product form, in the ISO standard 10303-11, which covers a modeling language called Express that can be used to represent generic data models. It came about through work originated at Shell and then extended to a broader community of data modelers, including various academics, and was particularly aimed at addressing the problem of exchanging product models; it is known as STEP. However the generic modeling ideas developed with this have much broader application than product data. Given the very real advantages that generic modeling offers, it is to be hoped that more software vendors pick up on these notions, which make a real difference to the flexibility of data models, and hence improve the chances of projects, such as CDI projects, actually working in practice.

Monday, March 13, 2006

ETL moves into the database

With SQL Server 2005 Microsoft has replaced its somewhat limited DTS ETL offering with SQL Server Integration Services (SSIS), which bears comparison with IBM's offerings (based on the Ascential acquisition). I have written previously about the shrinking of the ETL vendor space, and the enhanced Microsoft offering will merely accelerate this. Oracle too has its Warehouse Builder technology (despite its name, this is really an ETL tool), and as these tools from Microsoft, IBM and Oracle improve it will be tough for the remaining ETL vendors. Informatica has broadened into the general data integration space, and seems to be doing quite well, but there are not many others.

Sunopsis is innovative with its "ELT" approach, which sensibly relies on, rather than competes with, the native DBMS capabilities, but it remains to be seen how long it can flourish, given that the DBMS ETL capabilities will just keep getting better and eat away at its value. The surreal Ab Initio is reportedly doing well at the high volume end, but given its secretive nature it is hard to say anything with certainty about this company other than that its business practices and CEO are truly eccentric (a fascinating account of its predecessor Thinking Machines can be found at the following link). Data Junction has a strong reputation and is OEMed by many companies (it is now part of Pervasive Software). There are a few other survivors, like ETI, who have just recapitalised their company after struggling for some years, but it is hard to see how ETL can remain a sustainable separate market in the long term. Indeed Gartner has recently stated that it is to drop its "magic quadrant" for ETL entirely.

The future of ETL would appear to be in broader offerings, either as part of wider integration software or as just a feature of the DBMS.

Thursday, March 09, 2006

The hollowing out of ERP

Now that there are effectively two enterprise ERP vendors bestriding the world, it may seem that they can just sit back and count the spoils. Both have huge net profit margins derived from their market leadership, so it may seem churlish to contemplate their eventual demise, yet a number of factors are combining that should cause a few flutters in Redwood City and Walldorf. Consider for a moment what a transaction system application such as ERP actually does, or used to do:
  • business rules/workflow
  • master data store
  • transaction data store
  • transaction processing
  • user interface
  • (and perhaps some business content e.g. pre-built reports)
This edifice is under attack, like a house being undermined by termites. Transaction processing itself has long been mostly taken care of elsewhere, by old-fashioned TP monitors like IBM CICS or by new-fashioned TP monitors like BEA's Weblogic or IBM Websphere. These days there are alternate workflow engines popping up, like Biztalk from Microsoft, or even a slew of open source ones. Moreover, more than half of the ERP functionality purchased is unused. The storage of data itself is of course done in the DBMS these days (though SAP tries hard to blur this line with its clustered table concept). As the idea of separate master data hubs catches on e.g. customer data hubs like Siperian's, or product data hubs, or more general ones, and the serving up of such data is possible through EAI technology, then this element too is starting to slip away from the ERP vendors. The user interface for update screens should hardly be that complicated (though you'd never guess it if you have ever had the joy of using SAP as an end user), and these days can be generated from applications e.g. from a workflow engine or a master data application. This does not leave a great deal.

If, and it is a big if, SOA architecture takes off, then you will also be able to plug in your favorite cost allocation module (say) from a best of breed vendor, rather than relying on the probably mediocre one of your ERP supplier. Combine this with the emergence of "on demand" hosted ERP services from emerging companies like Ataio and Intacct as alternatives, and the vast ERP behemoth looks a lot less secure up close than it may do from a distance. If the master data hubs and business workflow engines continue to grow in acceptance and chip away further at key control points of ERP vendors, then at some point might it be reasonable to ask: exactly what is it that I am paying all those dollars to ERP vendors for?
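To make the plug-in idea concrete, here is a minimal sketch (all names and logic hypothetical, not any vendor's actual API): if the ERP core codes against a cost-allocation interface rather than a built-in module, a best-of-breed implementation can be swapped in without touching the rest of the system.

```python
from abc import ABC, abstractmethod

class CostAllocator(ABC):
    """Hypothetical service interface the ERP core would code against."""
    @abstractmethod
    def allocate(self, cost: float, weights: dict[str, float]) -> dict[str, float]:
        ...

class BuiltInAllocator(CostAllocator):
    """The ERP vendor's own (possibly mediocre) module: a plain pro-rata split."""
    def allocate(self, cost, weights):
        total = sum(weights.values())
        return {cc: cost * w / total for cc, w in weights.items()}

class BestOfBreedAllocator(CostAllocator):
    """A third-party module behind the same interface: rounds each share to
    whole cents and books any rounding drift to the largest cost centre,
    so the allocated amounts always sum exactly to the original cost."""
    def allocate(self, cost, weights):
        total = sum(weights.values())
        shares = {cc: round(cost * w / total, 2) for cc, w in weights.items()}
        drift = round(cost - sum(shares.values()), 2)
        biggest = max(weights, key=weights.get)
        shares[biggest] = round(shares[biggest] + drift, 2)
        return shares

def post_overhead(allocator: CostAllocator, cost: float,
                  weights: dict[str, float]) -> dict[str, float]:
    """The ERP core depends only on the interface, not the implementation."""
    return allocator.allocate(cost, weights)
```

The point of the sketch is the seam, not the arithmetic: once such an interface exists, the choice of module becomes a procurement decision rather than a platform decision.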

This line of reasoning, even if it is very early days, explains why SAP and Oracle have been so anxious to extend their product offerings into the middleware space, with Netweaver and Fusion respectively. It is also why SAP has been trying, falteringly, to launch an MDM application (the rumor is that after the botched initial SAP MDM, the buy-in of A2i isn't going that well either; maybe a third attempt is in the works?) and why Oracle has been keen to promote its customer hub.

Of course it is too soon to be writing the obituaries of ERP yet, but a combination of evolving technologies is starting to illuminate a path for how you would eventually migrate away from dependence on the giant ERP vendors, rather than endlessly trying to consolidate on fewer vendors, and fewer instances of each. Now that would be radical thinking.

Wednesday, March 08, 2006

Information as a service?

I see in our customer base the stirrings of a movement to take a more strategic view of corporate information. At present there is rarely a central point of responsibility for a company's information assets; perhaps finance has a team that owns "the numbers" in terms of high-level corporate performance, but information needed in marketing and manufacturing will typically be devolved to analysts in those organizations. Internal IT groups may have a database team that looks after the physical storage of corporate data, but this group rarely has responsibility for even the logical data models used within business applications, let alone for how those data models are supposed to interact with one another. Of course things are complicated by the fact that application packages will have their own versions of key data, and may be the system of record for some of it. So how do you take a view across the whole enterprise?

Organizationally, what is needed is a business-led (not IT-led) group with enough clout to start to get a grip on key corporate data. This team would be responsible for the core definitions of corporate data and for its quality, and would be the place people come to when corporate information is needed. In practice, if this is not to become another incarnation of a 1980s data dictionary team, the group should also have responsibility for applications that serve up information to multiple applications, and this last point will be an interesting political battle. The reason such a team may actually succeed this time around is that the technologies now exist to avoid the "repository" (or whatever you want to call it) of master data being a passive copy. The advent of EAI tools, enterprise buses, and the more recent master data technologies (from Oracle, Kalido, Siperian, IBM etc.) means that master data can become "live", synchronized back to the underlying transaction systems. Pioneers in this area were Shell Lubricants and Unilever, for example.

However, technology is necessary but not sufficient. The team needs to be granted ownership of the data, a notion sometimes called "data stewardship". Even if this ownership is virtual, it is key that someone can arbitrate disputes over whose definition of gross margin is the "correct" one, and who can drive the implementation of a new product hierarchy (say) despite the fact that such a hierarchy touches a number of different business applications. It is logical that such a group would also own the enterprise data warehouse, since that (if it exists) is the place where much corporate-wide data ends up right now. This combination of owning the data warehouse and the master data hub(s) would allow infrastructure applications to be developed that can serve up the "golden copy" data back to applications that need it. The messaging infrastructure already exists to allow this to happen.
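A toy sketch of the "live" golden copy idea (all names hypothetical, a stand-in for real EAI or message-bus infrastructure): the hub holds the golden record and pushes every change back out to the transaction systems that subscribe to it, so their copies never silently drift.

```python
from typing import Callable

class MasterDataHub:
    """Holds the golden copy of master records and notifies subscribing
    systems whenever a record changes -- a toy publish/subscribe stand-in
    for the messaging infrastructure described above."""
    def __init__(self):
        self._golden: dict[str, dict] = {}
        self._subscribers: list[Callable[[str, dict], None]] = []

    def subscribe(self, callback: Callable[[str, dict], None]) -> None:
        """Register a downstream system's update handler."""
        self._subscribers.append(callback)

    def update(self, key: str, record: dict) -> None:
        """Change the golden record and push the change to every subscriber."""
        self._golden[key] = record
        for notify in self._subscribers:
            notify(key, record)

    def get(self, key: str) -> dict:
        return self._golden[key]

# Two downstream systems keeping their local copies in sync with the hub.
erp_copy: dict = {}
crm_copy: dict = {}
hub = MasterDataHub()
hub.subscribe(lambda k, r: erp_copy.update({k: r}))
hub.subscribe(lambda k, r: crm_copy.update({k: r}))
hub.update("CUST-001", {"name": "Acme Ltd", "segment": "Industrial"})
```

The passive 1980s repository had only the `get` half of this; it is the `update`-and-notify half that makes the master data live rather than a stale copy.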

A few companies are establishing such groups now, and I feel it is a very positive thing. It is time that information came out of its back-room closet and moved to centre stage. Given the political hurdles that exist in large companies, the ride will not be smooth, but the goal is a noble one.

Monday, March 06, 2006

Broaden your horizons

At a talk at the recent TDWI show, consultant Joshua Greenbaum, an analyst with Enterprise Applications Consulting (who?), managed to bemoan the cost of data warehouses, but then demonstrated a seeming ignorance of what one actually is by claiming that the alternative is to do "simple analyses of transactional data". Well Joshua, that is called an operational data store, and indeed it has a perfectly respectable role if all you want to do is look at a single operational system for operational purposes. However, a data warehouse fulfils quite a different role: it takes data from many different sources, allows analysis across these inconsistent sources, and should also provide historical context, e.g. allowing comparisons of trends over time. You can't do these things with an operational data store.
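The distinction can be sketched with a toy example (all data invented): an ODS holds only the current state of one operational system, while a warehouse accumulates dated facts from several sources on common keys, which is what makes trend-over-time and cross-source questions answerable at all.

```python
# ODS-style: the current state of a single operational system. No history,
# no other sources -- fine for operational lookups, nothing more.
ods_orders = {"O-1001": {"customer": "Acme", "value": 650}}

# Warehouse-style: dated facts from multiple source systems, reconciled
# onto common keys (customer, month) so history and cross-source
# comparisons are possible.
warehouse = [
    {"source": "orders",   "month": "2006-01", "customer": "Acme", "value": 500},
    {"source": "orders",   "month": "2006-02", "customer": "Acme", "value": 650},
    {"source": "shipping", "month": "2006-01", "customer": "Acme", "value": 480},
    {"source": "shipping", "month": "2006-02", "customer": "Acme", "value": 640},
]

def monthly_total(rows, source, month):
    """Sum the facts for one source system in one month."""
    return sum(r["value"] for r in rows
               if r["source"] == source and r["month"] == month)

# A trend-over-time question the ODS cannot answer:
growth = (monthly_total(warehouse, "orders", "2006-02")
          - monthly_total(warehouse, "orders", "2006-01"))

# A cross-source reconciliation question, equally out of the ODS's reach:
gap = (monthly_total(warehouse, "orders", "2006-02")
       - monthly_total(warehouse, "shipping", "2006-02"))
```

Both questions require exactly what the ODS discards: prior months, and a second source keyed the same way.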

Hence it is not a case of "ODS good, data warehouse bad" - both structures have their uses. Of course Joshua is right in saying that the data warehouse success rate is not great, but as I have written elsewhere, it is not clear whether data warehouse projects really fare any worse than IT projects in general (admittedly, that is not setting the bar very high). Perhaps Joshua was misquoted, but I would have expected something more thoughtful from a former Hurwitz analyst. Admittedly he was an ERP (specifically SAP) analyst, so perhaps he has a tendency to think in operational terms rather than looking wider than ERP. Perhaps he is suffering from the same disease that seems to afflict people who spend too much time on SAP.

Interest in MDM grows

Last week I was a speaker at the first CDI (customer data integration) conference, held in San Francisco. Although the CDI Institute (set up by Aaron Zornes, ex META Group) started off with customer data integration, looking at products like Siperian and DWL, the general movement towards MDM as a more generic subject has overtaken it, and indeed Aaron mused in his introductory speech about whether to change the name to the MDM Institute. For a first conference it was well attended, with 400 people there and supposedly 80 turned away due to unexpectedly high demand. There was the usual crowd of consultants happy to advise expertly on a topic they had never heard of a year ago. Most of the main MDM vendors put in an appearance, e.g. IBM, Oracle and i2 (but no SAP), as well as specialists like Siperian and Purisma, plus those like HP who just have too big a marketing budget and so have a booth everywhere, whether or not they have a product (those printer cartridges generate an awful lot of profit).

The conference had a rather coin-operated feel, as sponsoring vendors duly got speaker slots in proportion to the money they put in, with IBM getting two plenary slots; but there were at least a few customer case studies tucked away amongst the six concurrent conference tracks. My overall impression was that MDM is a bit like teenage sex: everyone is talking about it, people are eager to know all about it, but not that many are actually doing it. As time passes and MDM moves into adolescence there will presumably be less foreplay and more consummation.
Further conferences are planned in London, Sydney and Amsterdam, demonstrating if nothing else that plenty of vendors are willing to pay Aaron to speak at the shows.