Thursday, June 29, 2006

A multiple view of the truth

Ragy Thomas highlights the plight of marketers in his recent article in DM News. As he points out, marketers are usually poorly served by corporate IT systems. A critical thing for them is to get a good understanding of their customers, so that they can craft carefully targeted campaigns, yet corporate CRM systems have failed to deliver the much talked about, yet elusive "single customer view". He sets out a sensible approach which is essentially a recipe for master data management in the specific context of customer data. In particular he talks about the need to document the existing data sources, to map the differences between them, to set up processes to improve data quality and then to consider how to deal with integrating and automating the processes.

I would add that such an initiative should sit within the broader context of an enterprise-wide approach to master data management, else we will see a new generation of silos developing, with a customer hub, a product hub, a hub for various other types of data, all in themselves duplicating and so having to be synchronized with data in underlying transaction systems. Nobody wants to see different technologies and approaches used for the marketing data, for the finance master data, for the supply chain data etc. That marketing departments are having to resort to such initiatives shows just how hollow the "single view of the customer" promises from giant application vendors have turned out to be.

Wednesday, June 28, 2006

Operational BI and master data

In an article in the beye network Mike Ferguson makes an interesting observation which many seem to have missed. A current "theme" is that business intelligence needs to be embedded into operational systems. This innocent sounding notion is of course entirely unrelated to the need for BI vendors to sell more licenses of their software in what has become a slightly saturated market. As I have written about earlier, it is by no means clear that everyone in an organization needs a BI tool, whatever the wishes of BI vendor sales staff. So trying this new tack of bundling BI in with operational systems (which many people do need, or at least have to use) is a cunning ploy. However, as Mike Ferguson notes, if this is done without a basis for common master data and common business rules, then all that will happen is that we will get lots of competing BI reports, all with different numbers and results. The whole notion of data warehousing was created in order to resolve the differences in definitions that exist between operational systems. Each separate ERP implementation will usually have slightly different data, never mind all the other operational systems. By going through a process of trying to get to an enterprise-wide common subset of definitions (Mike calls it a "shared business vocabulary") then these differences can be understood, and data quality improved back in the source systems. Without such an underlying basis we merely have snapshots of operational data without resolving the data quality issues, and without resolving the inconsistencies. In other words, we will be pretty much back where we were before data warehouses.

There certainly may be valid examples where it makes sense to embed some simple BI capability on the top of operational data, especially in the context of operational reporting where you are only interested in the data in that particular system. However as soon as you want to take a cross functional or enterprise view, then that pesky data inconsistency and data quality has to be dealt with somehow. Putting complex BI tools in the hands of the front-line staff doing order entry is not going to resolve this issue; it may just confuse it further.

Tuesday, June 27, 2006

Classifying MDM products

In his article on the IBM MDM conference Philip Howard makes the distinction between different MDM approaches, which is useful but I feel does not go quite far enough. As he says, at one extreme you have "hubs" where you actually store (say) customer data, and hope that this is treated as the master source of such data (a non-trivial project then usually ensues). At the other end you have a registry, which just lists the various places where master data is stored and maps the links to the applications which act as the master, and a repository, which goes further in picking out what he calls the "best record" and is sometimes called the "golden copy" of data. The distinction between a repository and a registry is a subtle one which Bloor makes. I feel that there are other aspects which are useful to categorize though, beyond just the storage mechanism. Some MDM products are clearly intended for machine to machine interaction, synchronizing master data from a hub back to other systems e.g. DWL (now bought by IBM). However there are other products which focus on the management of the workflow around master data (managing drafts, authorizing changes, publishing a golden copy etc), and so deal more with human interaction around master data. Kalido MDM is one example of the latter. This is another dimension which it is useful to classify tools by, since a customer need for synchronized customer name and address records between operational systems is very different from workflow management.

The article notes that IBM does not score well in the recent Bloor report on MDM, but hopes for better things in the future. Certainly IBM did something of a shopping spree once they decided to tackle MDM, and bought a PIM product, a CDI product and a few others while they were at it, so it is perhaps not surprising that it is difficult to see an overall strategy. I absolutely concur with Philip Howard in that MDM needs to be treated as an overall subject and not artificially segmented by technologies that deal with product, customer or whatever. In one project at BP we manage 350 different types of master data, and it is hard to see why a customer can reasonably be expected to buy 348 more technologies to go beyond product and customer. This example illustrates the absurdity of the technology per data type approach which is surprisingly common amongst vendors.

Software is hard to rearchitect, and customers should always look carefully at vendor claims of some overall gloss on top of multiple products, compared to something which was designed to handle master data in a generic fashion in the first place.

Monday, June 26, 2006

Mergers and Measurement

Margaret Harvey points out in a recent article that the effort of integrating the IT systems of two merged companies can be a major constraint and affect the success of the merger. Certainly this is an area that is often neglected in the heat of the deal. But once the investment bankers have collected their fees and an acquisition or merger is done, what is the best approach to integrating IT systems? What is often missed is that, in addition to different systems e.g. one company might use SAP for ERP and the other Oracle, the immediate problem is that the two companies will have completely different coding systems and terminology for everything, from the chart of accounts, through to product and asset hierarchies, customer segmentation, procurement supplier structures and even HR classifications. Even if you have many systems from the same vendor, this will not help you much given that all the business rules and definitions will be different in the two systems.

To begin with the priority should be to understand business performance across the combined new entity, and this does not necessarily involve ripping out half the operational systems. When HBOS did their merger, both Halifax and Bank of Scotland had the same procurement system, but it was soon discovered that this helped little in taking a single view of suppliers across the new group given the different classification of suppliers in each system. To convert all the data from one system into the other was estimated to take well over a year, but instead they put in a data warehouse system which mapped the two supplier hierarchies together, enabling a single view to be taken even though the two underlying systems were still in place. This system was deployed in just three months, giving an immediate view of combined procurement and enabling large savings to be rapidly made. A similar approach was taken when Shell bought Pennzoil, and when Intelsat bought Loral.
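As a rough illustration of the mapping idea described above, here is a minimal Python sketch. It is not how the HBOS warehouse was built; the category names, the spend figures and the `combined_spend` function are all hypothetical, invented purely to show two local supplier classifications being mapped onto one shared hierarchy while the source systems stay untouched.

```python
from collections import defaultdict

# Hypothetical mappings from each system's local supplier categories
# to a shared, group-wide category. None of these names come from a real system.
SYSTEM_A_TO_GROUP = {"Stationery": "Office Supplies", "PCs": "IT Hardware"}
SYSTEM_B_TO_GROUP = {"Office consumables": "Office Supplies", "Computers": "IT Hardware"}


def combined_spend(sources):
    """Sum spend per group-wide category across several source systems.

    `sources` is a list of (spend_by_local_category, mapping) pairs.
    Local categories with no mapping are reported under "Unmapped"
    so that gaps in the mapping stay visible rather than being dropped.
    """
    totals = defaultdict(float)
    for spend_by_category, mapping in sources:
        for local_category, amount in spend_by_category.items():
            totals[mapping.get(local_category, "Unmapped")] += amount
    return dict(totals)


# Made-up spend figures, as each system would report them locally.
spend_a = {"Stationery": 120.0, "PCs": 900.0}
spend_b = {"Office consumables": 80.0, "Computers": 600.0, "Catering": 50.0}

group_view = combined_spend([(spend_a, SYSTEM_A_TO_GROUP), (spend_b, SYSTEM_B_TO_GROUP)])
print(group_view)
```

The point of the sketch is that only the mapping tables need to be agreed; neither source system has to be converted before a combined view is available.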

It makes sense initially to follow this approach so that a picture of operating performance can quickly be made, but at some point you will want to rationalize the operational systems of the two companies, in order to reduce support costs and eliminate duplicated skill sets. It would be helpful to draw up an asset register of the IT systems of the two companies, but just listing the names and broad functional areas of the systems covered is only of limited use. You also need to know the depth of coverage of the systems, and the likely cost of replacement. Clearly, each company may have some systems in much better shape than others, so unless it is a case of a whale swallowing a minnow, it is likely that some selection of systems from both sides will be in order. To be able to have a stab at estimating replacement costs, you could use a fairly old but useful technique to estimate application size: function points.

Function points are a measure of system "size" that does not depend on knowing about the underlying technology used to build the system, so applies equally to packages and custom-built systems. Once you know that a system is, say, 2000 function points in size, then there are well established metrics on how much effort it takes to replace such a system e.g. for transaction systems, a ballpark figure of 25-30 function points per man month can be delivered, which does not really seem to change much whether it is a package or in-house. Hence a 2000 function point transaction system will take about 80 man-months to build or implement at the lower end of that range, as a first pass estimate. MIS systems are less demanding technically than transaction systems (as they are generally read only) and better productivity figures can be achieved here. These industry averages turned out to be about right when I was involved in a metrics program at Shell in the mid 1990s. At that time a number of Shell companies counted function points and discovered productivity of around 15 - 30 function points per man month delivered for medium sized transaction systems, irrespective of whether these were in-house systems or packages. Larger projects had lower productivity and smaller projects had higher productivity, so delivering a 20,000 function point system will be a lot worse than a 2,000 function point system in terms of productivity i.e. fewer function points per man month will be delivered on the larger system. Counting function points in full is tedious, and indeed this is the single factor that has relegated it to something of a geek niche, yet there are short cut estimating techniques that are fairly accurate and are vastly quicker to do than counting in full. By using these short-cut techniques a broadly accurate picture of an application inventory can be pulled together quite quickly, and this should be good enough for a first pass estimate.
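The first pass arithmetic above can be written down in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes a constant productivity rate, whereas (as noted above) productivity drops on larger systems, so the 25-30 range should not be applied blindly to something ten times the size.

```python
def estimate_man_months(function_points, fp_per_man_month):
    """First-pass effort estimate: system size divided by delivery rate."""
    if function_points <= 0 or fp_per_man_month <= 0:
        raise ValueError("size and productivity must both be positive")
    return function_points / fp_per_man_month


def estimate_range(function_points, low_rate=25, high_rate=30):
    """Return (optimistic, pessimistic) man-months for a productivity range.

    The default rates are the ballpark 25-30 function points per man month
    quoted above for transaction systems.
    """
    optimistic = estimate_man_months(function_points, high_rate)
    pessimistic = estimate_man_months(function_points, low_rate)
    return optimistic, pessimistic


best_case, worst_case = estimate_range(2000)
print(f"2000 function points: {best_case:.0f} to {worst_case:.0f} man-months")
```

For a 2000 function point system this gives a range of roughly 67 to 80 man-months, which is where the figure of about 80 comes from.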

There are a host of good books that discuss project metrics and productivity factors which you can read for more detailed guidance. The point here is that by constructing an inventory of the IT applications of both companies involved in a merger you can get a better feel for the likely cost of replacing those systems, and hence make a business case for doing this. In this way you can have a structured approach to deciding which systems to retire, and avoid the two parties on either side of the merger just defending their own systems without regard to functionality or cost of replacement. Knowing the true costs involved of systems integration should be part of the merger due diligence.

Further reading:

Software Engineering Economics
Controlling Software Projects
Function Points

Friday, June 23, 2006

Conferences and clocks

Those who are getting on a bit (like me) may recall John Cleese's character in the 1986 movie Clockwise, who was obsessed with punctuality. I am less neurotic, but what does distress me is when conference organizers let their schedule slip due to speaker overruns. I speak regularly at conferences, and this is a recurring problem. At a conference in Madrid a few weeks ago they managed to be well over an hour behind schedule by the time they resumed the afternoon session, while the otherwise very useful ETRE conferences are famed for their "flexible" schedule. At a large conference this is beyond just irritating, as you scramble to find speaker tracks in different rooms, all of which may be running to varying degrees behind schedule and starting to overlap.

This poor timekeeping is depressingly normal at conferences, which makes it all the nicer when you see how it should be done. I spoke yesterday at the IDC Business Performance Conference in London, which had an ambitious looking 14 speakers and two panels squeezed into a single day. If this was ETRE they would have barely been halfway through by dinner time, yet the IDC line-up ran almost precisely to time throughout the day. It was achieved by the simple device of having a clock in front of the speaker podium ticking away a countdown, so making speakers very visibly aware of the time they had left. I recall a similar device when I spoke at a Citigroup conference in New York a couple of years ago, which also ran like clockwork.

The conference was a case study in competent organization, with good pre-event arrangements, an audio run-through for each speaker on site, and speaker evaluation forms (some conferences don't even bother with this). The attendees actually bore a distinct resemblance to those promised, both in quality and number; recently some conference organizers seem to have had all the integrity of estate agents when quoting expected numbers. The day itself featured some interesting case studies (Glaxo, Royal Bank of Scotland, Royal Sun Alliance, Comet) and a line-up of other speakers who mostly managed to avoid shamelessly plugging their own products and services (mostly). Even the lunch time buffet was edible.

In terms of memorable points, it seems that the worlds of structured and unstructured data are as far apart as ever based on the case studies, whatever vendor hype says to the contrary. Data volumes in general continue to rise, while the advent of RFID presents new opportunities and challenges for BI vendors. RFID generates an avalanche of raw data, and a presenter working with early projects in this area reckoned that vendors were completely unable to take advantage of RFID so far. Common themes of successful projects were around the need for active business sponsorship and involvement in projects, the need for data governance and stewardship and for iterative approaches giving incremental and early results. Specific technologies were (refreshingly) in the background in most of the speeches, though the gentleman from Lucent seemed not to have got the memo to sponsor speakers about not delivering direct sales pitches. With Steve Gallagher from Accenture reckoning that BI skills were getting hard to find, even in Bangalore, it would suggest that performance management is moving up the business agenda.

Well done to Nick White of IDC for steering the day through so successfully. If only all conferences ran like this.

Wednesday, June 21, 2006

Wilde Abstraction

Eric Kavanagh makes some very astute points in an article on TDWI regarding abstraction. As he rightly points out, a computer system that models the real world will have to deal with business hierarchies such as general ledgers, asset hierarchies etc that are complex in several ways. To start with there are multiple valid views. Different business people have a different perspective on "Product" for example: a marketer will be interested in the brand, price and packaging, but from the point of view of someone in distribution the physical dimensions of the product are important: what size container it comes in, how it should be stacked etc. Moreover, as Eric points out, many hierarchies are "ragged" in nature, something that not all systems are good at dealing with.

The key point he makes, in my view, is that business people should be presented with a level of abstraction that can be put in their own terms. In other words the business model should drive the computer system, not the other way around. Moreover, as the article notes, if you maintain this abstraction layer properly then historical comparison becomes possible e.g. comparing values over time as the hierarchies change. Indeed the ability to reconstruct past hierarchies is something that I believe is increasingly important in these days of greater regulatory compliance, yet it is often neglected in many systems, both packages and custom-built. The key points he makes on the value of an abstraction layer:

- the abstraction layer shields the application from business change
- business-model driven, with the ability to have multiple views on the same underlying data
- time variance built in
- the layer can be a platform for master data management

neatly sum up the key advantages of the Kalido technology, and indeed sum up why I set up Kalido in the first place, since I felt that existing packages and approaches failed in these key areas. It is encouraging to me that these points are starting to gain wider acceptance as genuine issues that the industry needs to better address if it is to give its customers what they really need. To quote Oscar Wilde "There is only one thing in the world worse than being talked about, and that is not being talked about." I hope these key issues, which most designers of computer systems seem not to grasp, get talked about a lot more.

Monday, June 19, 2006

How to eat an elephant

Robert Farris makes some good observations in his recent article in DM Review. He points out that many companies have ended up with business intelligence being distributed throughout the company e.g. in various subsidiaries and departments, and this makes it very difficult to take a joined up view across the enterprise. As he notes, disjointed initiatives can result in poor investments. Hence it is critical to take an overall view to business intelligence, yet to do so is such a large task that it seems daunting.

In my experience there are a number of things that can at least improve the chances of an enterprise-wide BI initiative succeeding. It sounds like motherhood and apple pie, but the project needs business sponsorship if it is to succeed; IT is rarely in a position to drive such a project. However that statement on its own is of limited help. How can you find this elusive sponsorship?

The place to start is to find the people that actually have the problem, which is usually either the CFO or the head of marketing. The CFO has the job of answering the questions of the executive team about how the company is performing, so knows what a pain it is to get reliable numbers out of all of those bickering departments and subsidiaries. The head of marketing is the one who most needs data to drive his or her business, usually involving looking at trends over time and involving data from outside the corporate systems, and this is usually poorly provided for by internal systems. The CEO might be a sponsor, but often the CFO will be carefully feeding impressive looking charts to the CEO to give the impression that finance is in control of things, so the CEO might not be aware of how difficult data is to get hold of. The head of operations or manufacturing is another candidate, though this person may be too bogged down in operational problems to give you much time. If there is someone responsible for logistics and supply chain then this is often a fruitful area. Sales people usually hate numbers unless they are connected with their commissions (where they demonstrate previously unsuspected numerical ability), and HR usually doesn't have any money or political clout, so marketing and finance are probably your best bet.

So, you have a sponsor. The next step is to begin to sort out the cross-enterprise data that actually causes all the problems in taking a holistic view, which is these days being termed master data. If you have multiple charts of accounts, inconsistent cost allocation rules, multiple sources of product definition or customer segmentation (and almost all companies do) then this is a barrier in the way of your BI initiative succeeding. There is no quick fix here, but get backing to set up a master data management improvement project, driven by someone keen on the business side. Justifying this is easier than you may think.

In parallel with this you will want a corporate-wide data warehouse. Of course you may already have one, but it is almost certainly filled with out of date data of variable quality, and may be groaning under a backlog of change requests. If it is not, then it is probably not being used much and may be ripe for replacement. To find out, do a health check. There is a bit of a renaissance in data warehouses these days, and you can now buy a solution rather than having to build everything from scratch.

In truth your company probably has numerous warehouses already, perhaps on a departmental or country basis, so it is probably a matter of linking these up properly rather than having to do everything from the beginning. This will enable you to take an iterative approach, picking off areas that have high business value and fixing these up first. Once you can demonstrate some early success then you will find it much easier to continue getting sponsorship.

In one of the early Shell data warehouse projects I was involved with we had a very successful initial project in one business line and subsidiary, and this success led to a broader roll-out in other countries, and then finally other business lines came willingly into the project because they could see the earlier successes. This may seem like a longer route to take, but as noted by Robert Farris, this is a journey not a project, and if you start with something vast in scope it will most likely sink under its own weight. Much better to have a series of achievable goals, picking off one business area at a time, or one country at a time, delivering incrementally better solutions and so building credibility with the people that count: the business users.

Elephants need to be eaten in small chunks.

The Informatica recovery story

The data integration market has previously split between the EAI tools like Tibco and Webmethods, and the ETL space with tools like Informatica and Ascential (now part of IBM). The ETL space has seen significant retrenchment over recent years, with many of the early pioneers being bought or disappearing (e.g. ETI Extract still lives on, but is practically invisible now). Mostly this functionality is being folded into the database or other applications e.g. MSFT with SSIS (previously DTS) and Business Objects having bought ACTA. Still in this space are Sunopsis (the only "new" vendor making some progress) and older players like Iway and Pervasive, whose tools are usually sold inside other products. Others like Sagent and Constellar have gone to the wall.

The integration market is surprisingly flat, with Tibco showing 9% growth last year but a 10% shrinkage in license revenues, while Webmethods grew just 4%, with 1% growth in license revenues. Hardly the stuff investor dreams are made of. BEA is doing better, with 13% overall growth last year and 10% license growth, but this is still hardly stellar. Informatica is the odd one out here, having extracted itself from its aberrant venture into the analytics world and now having repositioned itself as a pure play integration vendor. It had excellent 31% license growth and 27% overall growth last year. The logical acquisition of Similarity Systems broadens Informatica's offering into data quality, which makes sense for an integration vendor. When IBM bought Ascential some pundits reckoned the game would be up for Informatica, but so far that is not proving the case at all.

Friday, June 16, 2006

Microsoft builds out its BI offerings

A week ago Microsoft announced Performance Point Server 2007. This product contains scorecard, planning and analytics software, and complements the functionality in Excel and in its SQL Server Analysis and Reporting Services tools. With Proclarity also within the Microsoft fold now, it is clear that Microsoft is serious about extending its reach in the BI market.

I have argued for some time that rivals Cognos and Business Objects should be a lot more worried about Microsoft than about each other in the long term. Most business users prefer an Excel-centric environment to do their analysis, and as Microsoft adds more and more ways into this it will be increasingly uncomfortable for the pure-play reporting vendors. As ever, Microsoft will go for high volume and low price, so will probably never match BO or Cognos in functionality, but that is not the point. Most users only take advantage of a fraction of the features of a BI tool anyway.

Microsoft is playing a long game here, and the pure-play tools will continue to do well in what is an expanding market. But the ratchet just got tightened another notch.

Wednesday, June 14, 2006

Uniting Data

In an article in DM Review Malcolm Chisholm discusses different types of metadata. He sets out a definition which distinguishes between metadata, master data and reference data (separate from “transaction activity” data). I believe that the argument is flawed in several important ways.

Firstly, I believe that the distinction between metadata, master data, enterprise structure data and reference data as made in the article is actually spurious. One point made about master data is the notion that “Customer A is just Customer A” and there is no more to it than that. However, to the account manager looking after the customer there is a complex semantic which needs data to define it. What if that customer is, say, “Unilever”? There are all kinds of embedded meaning about the definition of Unilever that are not directly implied by the row itself, but are defined elsewhere e.g. is that the whole Unilever group of companies, or Unilever in the US, a Unilever factory or what? This type of definitional problem occurs for row level entries just as it does for the generic class of things called “customer”. Master data can have semantic meaning at the row level, just as “reference data” as used in the article can. This point is illustrated further if we use the article’s own example: the USA having multiple meanings. Both are valid perspectives for the USA but they are different things; they are defined and differentiated by the states that make them up i.e. their composition. This is the semantic of the two objects.

The article seems to want to create ever more classification of data, including “enterprise structure data”. It argues that “Enterprise structure data is often a problem because when it changes it becomes difficult to do historical reporting”. This is really just another type of master data. The problem of change can be dealt with by ensuring that all the data like this (and indeed all master data) has a “valid from” and “valid to” date. Hence if an organisation splits into two, then we want to be able to view data as it was at a point in time: for example before and after the reorganisation. Time stamping the data in this way addresses this problem; having yet another type of master data classification does not help.
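To make the “valid from” and “valid to” idea concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular product: the `OrgUnitVersion` class, the `parent_as_of` function and the unit names are all hypothetical, and the sketch assumes half-open intervals (a record applies from its “valid from” date up to, but not including, its “valid to” date).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OrgUnitVersion:
    """One time-stamped version of where a unit sits in the hierarchy."""
    unit: str
    parent: str
    valid_from: date
    valid_to: Optional[date]  # None means "still current"


def parent_as_of(versions, unit, as_of):
    """Return the parent of `unit` on the date `as_of`, or None if no version applies."""
    for v in versions:
        starts_in_time = v.valid_from <= as_of
        not_yet_ended = v.valid_to is None or as_of < v.valid_to
        if v.unit == unit and starts_in_time and not_yet_ended:
            return v.parent
    return None


# Hypothetical reorganisation taking effect on 1 January 2006.
history = [
    OrgUnitVersion("Lubricants", "Downstream", date(2000, 1, 1), date(2006, 1, 1)),
    OrgUnitVersion("Lubricants", "Chemicals", date(2006, 1, 1), None),
]

print(parent_as_of(history, "Lubricants", date(2005, 6, 30)))  # before the reorganisation
print(parent_as_of(history, "Lubricants", date(2006, 6, 30)))  # after the reorganisation
```

Because old versions are closed off rather than overwritten, reports can be run against the structure as it was on any date, which is what historical comparison needs.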

The distinction between “reference data” and “master data” made in the article seems to be both false and also misleading. Just because “volumes of reference data are much lower than what is involved in master data and because reference data changes more slowly” in no way means that it needs be treated differently. In fact, it is a very difficult line to draw, since while typically master data may be more volatile, “reference data” also can change, with major effect, and so systems that store and classify it need to be able to expect and to deal with these changes.

In fact, one man’s transaction is another man’s reference data. A transaction like "payment" has reference data like Payment Delivery, Customer, Product, Payment Type. A transaction "delivery" from the point of view of a driver might consist of Order, Product, Location, Mode of Delivery. Similarly an "order" could be viewed by a clerk as Contract, Product, Customer, Priority. Where is the line between master and reference data to be drawn?

The article argues that identification is a major difference between master and reference data, that it is better to have meaningful rather than meaningless surrogate keys for things, which he acknowledges is contrary to perceived wisdom. In fact there are very good reasons to not embed the meaning of something in its coding structure. The article states that: “In reality, they are causing more problems because reference data is even more widely shared than master data, and when surrogate keys pass across system boundaries, their values must be changed to whatever identification scheme is used in the receiving system.”

But this is mistaken. Take the very real world example of article numbering. Consider the Standard Industry Codes (SIC) and European Article Number (EAN) codes; the latter are attached to products like pharmaceuticals to enable pharmacists to uniquely identify a product. Here a high level part of the key is assigned to represent, e.g., the European v. the US v. the Australian organization (GlaxoSmithKline in Europe, say), and then the rest of the key is defined as Glaxo wishes. If the article is referred to by another system e.g. a supplier of Glaxo, then it can be identified as one of Glaxo’s products. This is an example of what is called a “global or universal unique identifier” (GUID or UUID), and for which indeed there are emerging standards.

A complication is that when the packaging changes, even because of changed wording on the conditions of use, then a new EAN code has to be assigned. The codes themselves are structured, often considered bad practice in the IT world, but the idea is to ensure global uniqueness and not give meaning to the code. Before Glaxo Wellcome and SmithKline Beecham merged they each had separate identifiers and so the ownership of the codes changed when the merger took place.

Another point I disagree with in the article is “we will be working with a much narrower scope” in the first paragraph. Surely we are trying to integrate information across the company to get a complete perspective. It is only small transactional applets which only need a worm's-eye view of what they are doing.

The article says “Reference data is any kind of data that is used solely to categorize other data in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise”. But someone in the organization does have to manage this data even if it comes from outside the company and that person’s transaction may be the set up of this data and making it available to others.

For example, consider the setting of a customer’s credit rating. Someone in Finance has to review a new customer’s credit rating against a list of externally defined credit ratings, say from D&B. Someone in the company spends time lobbying D&B (or parliament/congress) to have additional credit classifications (the article defines them as Gold, Silver, Bronze etc., but D&B call them AAA, AA etc.). Data is always created through someone carrying out some business function (or transaction); even standards have to be managed somewhere.

A good example of this type of external data where a computer system is used to support the process is the Engineering parts library. It uses the ISO 15926 standard. It is a collaborative process between specialists from multiple engineering companies. It is a high level classification scheme which is used to create a library of spare parts for cars, aircraft, electronics etc. This is a changing world and there are always new and changing classifications. Groups of engineers who are skilled in some engineering domain define the types and groups of parts. One group defines pumps, another piping. Someone proposes a change and others review it to see if it will impact their business; it then goes through a review process and ultimately gets authorized as part of the standard.

This example is about reference data, in the terms of the article, but it clearly has the problem the article attributes to master data. There are multiple versions and name changes and a full history of change has to be maintained if you wish to relate things from last year with things for this year.
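A minimal sketch of what keeping "a full history of change" might look like is shown here. The class identifier, names and dates are invented for illustration; the point is only that each version carries a validity period, so that a question about last year gets last year's answer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClassVersion:
    class_id: str                     # stable identifier that never changes
    name: str                         # the name in force during this period
    valid_from: date
    valid_to: Optional[date] = None   # None means still current


# Invented history: one class that was renamed at the start of 2006.
HISTORY = [
    ClassVersion("C-100", "Centrifugal pump", date(2004, 1, 1), date(2006, 1, 1)),
    ClassVersion("C-100", "Pump, centrifugal", date(2006, 1, 1)),
]


def name_as_of(class_id: str, when: date) -> Optional[str]:
    """Return the name the class had on the given date, if it existed."""
    for v in HISTORY:
        if (v.class_id == class_id and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)):
            return v.name
    return None
```

The same time-stamping works whether the thing being versioned is labelled master data or reference data, which is rather the point.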

The article has an example concerning the marketing department’s view of customer v. the accounts department’s view of customer. It says this is a master data management issue and is semantic, but that this doesn’t apply to reference data. It clearly does relate to reference data (see the definition of USA above and the ISO example above). But what is more important is that the issue can be resolved for both master and reference data by adopting the standards for integration defined in ISO 15926. Instead of trying to define customer in a way that satisfies everyone, it is best to find what is common and what is different. Customers in both definitions are Companies – it is just that some of them have done business with us and others have not (yet). Signed up customers are a subset of all potential customers.

At the end of the section on The Problem of Meaning the article says “These diverse challenges require very different solutions” then in the section on Links between Master and Reference data it says “If there is a complete separation of master and reference data management, this can be a nightmare” and then says “we must think carefully about enterprise information as a whole”. I agree with this final statement but it is critical that we do not put up artificial boundaries and try to solve specific problems with some generic rules which differentiate according to some rather arbitrary definition such as Master and Reference data.

The line between master and reference data is really fuzzy in the definition used. Clearly “Product” is master data, but if I have a retail gasoline customer which has only three products (Unleaded, Super and Diesel) I guess that means this is reference data. The engineering parts library classification scheme is a complex structure with high volumes (1000’s) of classes, so that makes it master data, but it is outside the company, so does that make it reference data?

In summary, the article takes a very IT-centric transactional view of the world. By trying to create separate classifications where in fact none exist, the approach suggested, far from simplifying things, will in fact cause serious problems if implemented, as when these artificial dividing lines blur (which they will) then the systems relying on them will break. Instead what is needed is not separation, but unity. Master data is master data is master data, whether it refers to the structure of an enterprise, a class of thing or an instance of a thing. It needs to be time-stamped and treated in a consistent way with other types of master data, not treated arbitrarily differently. Consistency works best here.

I am indebted to Bruce Ottmann, one of the world's leading data modelers, for some of the examples used in this blog.

Tuesday, June 13, 2006

Through a rule, darkly

David Stodder makes a good point in an article in Intelligent Enterprise. Business rules take many forms in a large corporation but today they are quite opaque. Rules that define even basic terms like "gross margin" may not only be buried away in complex spreadsheet models or ERP systems, but are in practice usually held in many different places, with no guarantee of consistency. I know of one company where an internal audit revealed twenty different definitions of "gross margin", and that was within just one subsidiary of the company! In these days of stricter compliance such things are no longer merely annoying.

My observation is that business customers need to take ownership of, and be heavily engaged with, any process to try and improve this situation. It cannot be an IT-driven project. It is not critical whether the ultimate repository of this is a data warehouse, a master data repository or some different business rules repository entirely, but it is key that the exercise actually happens. At present the opaqueness and lack of consistency of business rules is not something that most companies care to own up to, yet it is a major controls issue as well as a source of a great deal of rework and difficulty in presenting accurate data in many contexts.

I was amused by the readership poll quoted that said that 61% of respondents say that they have "no standard process or practice" for business rules management. This might imply that 39% actually did, a number I would treat with considerable caution. Personally I have yet to encounter any company that does so on a global basis.

Wednesday, June 07, 2006

A marketing tale

Marketing is a tricky thing. One lesson that I have begun to learn over time is that simplicity and consistency always seem to triumph over a more comprehensive, but more complex story. Take the case of Tivo in the UK. A couple of my friends bought Tivo when it first appeared in Britain and started to have that kind of scary, glazed expression normally associated with religious fanatics or users of interesting pharmaceutical products. I then saw a cinema ad for Tivo and it seemed great: it would find TV programs for you without you having to know when they were scheduled - how cool was that?! It would learn what programs that you liked and record them speculatively for you; you then ranked how much you liked or disliked them and it would get better and better at finding things you enjoyed. You could turn the whole TV experience from being a passive broadcast experience into one where you effectively had your own TV channel, just with all your favorite programs. Oh, and it looked like you could skip past adverts, though of course the Tivo commercial politely glossed over that.

Well, I bought one and I was like a kid in some kind of store. I soon acquired the same crazed look in my eyes as my fellow Tivo owners, and waited smug in the knowledge that I was at the crest of a wave that would revolutionize broadcasting. My friend at the BBC confirmed that every single engineer there was a Tivo fanatic. And then: nothing happened. Those BBC engineers, myself and a few others constituted the entire UK Tivo market - just 30,000 boxes were sold in the UK. Eventually Tivo gave up and, although Tivo is still (just about) supported in the UK, you can't even buy Tivo 2, or even a new Tivo 1 except on eBay.

What happened? The message was too complex. Years later Sky caught on to the DVR concept and brought out the vastly functionally inferior Sky+. How did they advertise it? They just showed a few exciting clips with the viewer freezing and then replaying: "you can replay live TV" was all that was said. This was a fairly minor option on a Tivo that the Tivo commercial barely mentioned, yet it was simple to understand. Sky+ sales took off, and some BBC sound engineers and I are left with our beloved Tivos, praying that they don't go wrong. It is another Betamax v VHS story, but this time the issue was a marketing one. Tivo still limps on in the US, still growing slowly in subscriber numbers through sheer product brilliance (helped by being boosted on "Sex and the City"), but has clearly not fulfilled its potential.

What this little parable should teach us is that a key to successful marketing is simplicity, stripping everything down to the core thing that represents value to the customer, and then shutting up. With a simple message people can describe the product to their friends or colleagues, and so spread the word. With a complex, multi-part message they get bogged down and so cannot clearly articulate what the product does at its heart. It is so tempting to describe the many things that your product does well, but it is probably a mistake to do so. Find the one core thing that matters to customers, explain this as simply as possible, and repeat as often and as loudly as you can.

Size still isn't everything

Madan Sheina, who is one of the smarter analysts out there, has written an excellent piece in Computer Business Review on an old hobby horse of mine: data warehouses that are unnecessarily large. I won't rehash the arguments that are made in the article here (in which Madan is kind enough to quote me) as you can read it for yourself, but you can be sure that bigger is not necessarily better when it comes to making sense of your business performance: indeed the opposite is usually true.

Giant data warehouses certainly benefit storage vendors, hardware vendors, consultants who build and tune them and DBAs, who love to discuss their largest database as if it were a proxy for their, er, masculinity (apologies to those female DBAs out there, but you know what I mean; it is good for your resume to have worked on very large databases). The trouble is that high volumes of data make it harder to quickly analyse data in a meaningful way, and in most cases this sort of data warehouse elephantitis can be avoided by careful consideration of the use cases, probably saving a lot of money to boot. Of course that would involve IT people actually talking to the business users, so I won't be holding my breath for this more thoughtful approach to take off as a trend. Well done Madan for another thoughtful article.

Revenue recognition blues

Cognos shares have slid nearly 20% in recent weeks as an SEC probe into their accounting continues. The questions raised are in the notoriously tricky area of US GAAP rules, specifically on "VSOE" (or vendor specific objective evidence), which determines how much revenue can be credited for a deal in current figures, and what amount should be deferred. The post-Enron climate has ushered in a much harsher review of software industry practices than was normal in the past, and such esoteric sounding accounting rules can seriously impact a company, as Cognos is now seeing.

Word in the market is that the underlying business is actually quite robust at present, so hopefully this will be a blip for the company rather than anything more serious. Cognos 8 means that there is quite a lot of potential for Cognos to gain revenue as customers upgrade to the new software, which features much better integration between ReportNet and Powerplay, and a complete revamp of Metrics Manager, which is retitled Metrics Studio. These improvements should see Cognos customers steadily upgrading, and so having a positive impact on the company's already pretty healthy finances. However perhaps some more conservative interpretation of US GAAP on their part would be wise.

Tuesday, June 06, 2006

MDM: comply

James Kobielus makes an important point regarding master data management - its role in compliance. We know that large companies today generally have many different versions of master data scattered around their organizations: 11 different definitions of "product" on average for example, according to one survey from research firm Tower Group. This of course makes any business performance management question hard to answer: "how much of product X was sold last week" is tricky to discover if there are eleven systems that think they are the master source for information of products. However it may be worse than that: if you are having to produce some report for reasons of regulatory compliance, then such ambiguity may have serious consequences.

In the article James says that "without an unimpeachable official system of records, your lawyers will have to work twice as hard to prove your organization is complying with the letter of the law". Of course the lawyers won't be too troubled about that (all those juicy billable hours) but business executives certainly need to consider the possible compliance implications of poor master data, as well as its consequences elsewhere.

Monday, June 05, 2006

The patter of tiny pitfalls

There are some sensible tips from Jane Griffin on MDM pitfalls in a recent article. As she points out, improving your master data is a journey, not a destination, so it makes sense to avoid trying to boil the ocean and instead concentrate on a few high priority areas, perhaps in one or two business units. It would make sense to me to start by identifying areas where MDM problems are causing the most operational difficulties, e.g. misplaced orders. By starting where there is a real problem you will have less difficulty in getting business buy-in to the initiative. Be clear that there are lots of different types of master data, e.g. we are involved with a project at BP which manages 350 different master data types, and clearly some of these will be more pressing an issue than others.

I have seen some articles where people are struggling to justify an MDM initiative, yet really such initiatives should be much easier to justify than many IT projects. For a start IT people can put the issues in business terms. Master data problems cause very real, practical issues that cost money. For example poor or duplicated customer data can increase failed deliveries, and issues with invoicing. Poor product data can result in duplicated marketing costs, and in some cases even cause issues with health and safety. Problems with chart of accounts data can delay the time needed to close the books. These are all things that have a cost, and so can be assigned a dollar value to fix.

Successful MDM projects will be heavily business-led, driven by the need to improve operational performance. IT staff need to educate business people that there is now an emerging set of solutions that can help, and get those business people involved in owning the data. It is the lack of data governance in many companies that contributed to the poor state of master data in the first place.

Thursday, June 01, 2006

How healthy is your data warehouse?

Not all data warehouses are created equal. Indeed both custom-built and some packaged data warehouse products can have surprising limitations in terms of their functionality. Just as I referred recently to Dave Waddington's excellent checklist of things that would indicate a master data management problem, I would like to propose a series of questions that could be used to assess the depth of functionality of your data warehouse, whether it is custom built or packaged. For this list I am indebted to Dr Steve Lerner (until recently IS Director, Global Finance Applications and Integration at pharmaceutical firm Merial), who was kind enough to set out a series of symptoms that he had found indicated a problem with a data warehouse application. What I like about these is that they are all real business problems, and not a series of features defined by a software vendor or database designer. They are as follows.

1. Do you have difficulty conducting what-if analysis for a variety of business or product or geographical hierarchies?

2. Would it be hard for your current system to determine the impact of a business organization change on Operating Income?

3. Would it be hard for your current system to determine the impact of realigning geographical associations on regional profitability estimates?

4. Do you have difficulty restating historical data?

5. Can you view historic data using both a time-of-transaction basis and a current basis?

6. Can you currently restate historical data using new account structures?

7. Do you have difficulty viewing composites of data from sources with different granularities along key dimensions (i.e., comparing daily sales for a month, to forecast sales done monthly, to your annual profit plan, and to a five year long-range projection)?

8. Do you have difficulty with "bad data" getting into your current data warehouse?

9. Do you have difficulty maintaining the accuracy of your reference data?

10. Do you have difficulty with traceability from source to report?

So, how did your data warehouse application score? If it did not do well (i.e. failed on several of these ten points) then you should be concerned, because business users are likely to do exactly these types of things with the data in the warehouses, if not today then at some point. When they struggle, they will come looking for you.
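To make a couple of the questions above concrete (restating history, and viewing it on both a time-of-transaction and a current basis), here is a small sketch. All of the data, the region names and the function names are invented for the example; it assumes only that the hierarchy is stored with a start date for each assignment.

```python
from collections import defaultdict
from datetime import date

# Invented data: each sale records the country, the date and an amount.
SALES = [
    ("Turkey", date(2005, 3, 1), 100),
    ("Turkey", date(2006, 3, 1), 150),
    ("France", date(2005, 5, 1), 200),
]

# Invented hierarchy history: (country, region, valid_from). Turkey moves
# from the "Europe" region to "Middle East" at the start of 2006.
REGION_HISTORY = [
    ("Turkey", "Europe", date(2000, 1, 1)),
    ("Turkey", "Middle East", date(2006, 1, 1)),
    ("France", "Europe", date(2000, 1, 1)),
]


def region_as_of(country, when):
    """Return the region the country belonged to on the given date."""
    candidates = [(start, region) for c, region, start in REGION_HISTORY
                  if c == country and start <= when]
    return max(candidates)[1]  # the latest assignment that had started


def totals_by_region(basis_date=None):
    """Sum sales by region.

    With no basis_date each sale uses the hierarchy in force when it was
    made (time-of-transaction basis). With a basis_date every sale is
    restated using the hierarchy in force on that date (current basis).
    """
    totals = defaultdict(int)
    for country, sold_on, amount in SALES:
        when = sold_on if basis_date is None else basis_date
        totals[region_as_of(country, when)] += amount
    return dict(totals)
```

Calling `totals_by_region()` gives the figures as originally reported, while `totals_by_region(date(2006, 6, 1))` restates all of history under the new structure. A warehouse that overwrites the hierarchy in place can only ever answer the second question.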

A potential application of this checklist would be to identify the best and worst data warehouses in your company. This type of "health check" could be useful in prioritising future investment, e.g. it may highlight that some systems are in urgent need of overhaul or replacement. If you work in an IT department then going to business users with this kind of health check could be seen as being very pro-active and enhance the IT department's credibility with the business. If you are a systems integrator then creating a process for measuring the health of a data warehouse along these lines could be a useful tool that could be sold as a consulting engagement for clients.

Casting spells

I write this column using some software called Blogger, which is fairly simple to use but is rather limited in some ways, so I am probably going to switch to a more flexible blog editor soon. However one cause of constant entertainment is the Blogger spell check function, which almost makes the thing worthwhile. My typing is erratic at best, so I frequently encounter the Blogger spell checker. At first I found its eccentric suggestions annoying, or even inept, but now I find that they have a certain charm of their own. It takes me back to the early days of word processing, when spell checkers were crude, and their alternative suggestions for one's typographical errors were sometimes wildly inappropriate. Blogger's spell checker recalls that era, as it presents sometimes surreal suggestions for what to a human eye is a pretty easy mistake to spot. For example, if you misspell:

"management" as "managemnet"

then you are presented with two alternatives. Its best guess is "mincemeat", which is somehow appropriate in a couple of cases of managers I can recall, but not really a very likely error. Its only other attempt is "mankind". This is not an isolated case. If you write about federated databases then it is endearing to see the typo:

"federaion" have the two alternatives: "bedroom" or "veteran"

proposed by the beastie. "Bedroom"? I would love to understand the algorithm that came up with that one. I was also impressed by:


Instead of the pretty obvious "performance" it rather sweetly suggests "peppermints".

However my favorite is that if you type "Blogger" itself then the spell checker does not recognize it. The term "blog" is also, sadly, a complete mystery to it, which might seem an omission given that it is intended as a spell checker for, er, blogs, or perhaps "blocs" as the spell checker so helpfully proposes. For "blogger" it then suggests the wonderfully ironic:


How true, how true. I would be interested to hear of your worst spell check horror, or indeed of a spell checker whose ineptness rivals Blogger's. Any offers?