Wednesday, May 31, 2006

Woods and Trees

Janet Kersnar writes a well-balanced article which highlights some of the ways in which business intelligence has the potential to become more strategic, but sensibly also points out some of the barriers to this occurring. Certainly today most BI solutions deployed are within departmental silos, and it is difficult for most companies to get a true enterprise-wide view. The insight that an up-to-date, cross-departmental view of business performance can give may lead to dramatic benefits, so we see a renewed interest in deploying data warehouse and related technology to address this problem. Companies that are acquisitive have even greater difficulty than most, since each time they buy a company, it may take years for that company's systems and data to be fully integrated into the new corporation. Again, the latest business intelligence technologies and approaches can help here.

However, as the article rightly points out, it is easy to get carried away with the latest tools, and overlook the very real issues of getting people to buy in to new systems. One of the case studies acknowledges that it is easy to go crazy with BI technology and end up with an unfathomable morass of data: "Our stores get ranked on well over 150 metrics on a daily basis", which is clearly a recipe for inaction and confusion. All the modern technology in the world will not deliver benefit unless it is married to useful business metrics, and unless the business users of the information are fully engaged in the process.

Tuesday, May 30, 2006

Are the media revolting?

Joshua Greenbaum writes a thoughtful piece on the clash of the "new media" (blogs, wikis etc) with the mainstream media. He correctly concludes that revolutions rarely go in the directions that are originally intended, and he comes down on the side of the mainstream media camp, who he predicts will subsume the newer media. I agree with his analysis. It is exciting to see new content appearing in blogs on many subjects, but if you actually want to know whether something is true you'd be advised to look at the BBC or CNN. It is positive that the barriers to entry to creating content have dropped away, but media brands will be critical in ensuring reliable, truthful content, as distinct from individuals just spouting off on their latest hobbyhorses.

In fact very few industries have been really demolished by the internet. I heard that there are 10% fewer people working as travel agents than a few years ago, but there aren't too many others that spring to mind. Even that despised breed, realtors (estate agents in the UK) who essentially just control privileged information, are still very much in business. If the internet couldn't displace them, what chance does it have with journalists?

Friday, May 26, 2006

Data warehouse architectures

Rick Sherman writes an interesting article about how data warehousing, despite being quite venerable in IT terms, is still poorly understood. He makes a good point, discussing various typical implementation approaches and how they fail to get to the "single version of the truth" dream. Let's consider for a moment a few architectural choices:

(a) direct access (EII)
(b) data marts only - no pesky warehouse
(c) a single warehouse for the enterprise
(d) a federation of linked warehouses.

The first approach is limited to only a small subset of the reporting needs, and is insufficient to meet most enterprise reporting requirements. To have only single subject data marts was still surprisingly commonly advocated as late as the mid 1990s (born mainly out of the frustration of lengthy or failed data warehouse projects) yet pretty clearly is not going to scale for a company of any size. The sheer number of combinations of data sources required to build the marts means that the problem of resolving inconsistency is being done every time a mart is built, rather than being dealt with in the warehouse, so each mart either becomes a major project in itself, or (more likely) people just give up and go with some data source without getting a complete or even accurate picture.

The single giant warehouse certainly has a lot of appeal, as it resolves the semantic differences of source systems just once, allowing dependent data marts to be deployed easily. The trouble is one of practicality: for a large corporation the sheer scale of the task is scary. Large enterprises have hundreds (and usually thousands if they are counting properly) of applications where data is being captured, and these applications are often duplicated by country or major business lines. Hence getting hold of all these sources and bringing them into line is going to be a massive challenge. In the cases of certain industries (retail, Telco, retail banking) the scale of the data itself is also daunting, bringing major technical challenges.

Hence for any large corporation it seems to me that a federated warehouse approach is what you will end up with, whether you like it or not. Few companies will have the energy or resources to deliver the single giant warehouse, and even those few that do will, in reality, have a series of skunk works data marts/warehouses dotted around the corporation since such a behemoth warehouse will be a bottleneck, hard to change and inevitably slow to respond to rapidly changing business needs.

The most pragmatic approach would seem to me to acknowledge this reality and architect for a federated approach, rather than staying in denial. It is practical to build a warehouse for either a country-level subsidiary (or groups of countries) or each business line, let that deal with the needs of that particular country or business line, and then link these together to a global warehouse which deals at the summary level. The global warehouse does not need to store every transaction in the enterprise; at that level you need to know what the sales were in Germany yesterday by product, channel and perhaps customer, but not that a particular customer bought a specific item at 14:25 at a store in Rhine-Westphalia. The detailed information like this is the domain of the country-level warehouse. Because the transaction detail is not needed at the enterprise level, you avoid the problems of technical scale that may otherwise occur, and only deal with the data that makes sense to look at across the enterprise as a whole.
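To make the split between detail and summary concrete, here is a minimal sketch in Python. The Sale record, the field names and the sample figures are all invented for illustration and do not describe any particular product; the point is only that the country-level warehouse keeps every transaction while the global warehouse receives one row per country, day, product and channel.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sale:
    """One transaction as held in a country-level warehouse (hypothetical shape)."""
    country: str
    product: str
    channel: str
    sold_at: datetime
    amount: float


def summarize_for_global(sales):
    """Roll transactions up to (country, date, product, channel) totals.

    Only these summary rows would be sent to the global warehouse;
    the individual transactions stay in the country-level warehouse.
    """
    totals = defaultdict(float)
    for s in sales:
        key = (s.country, s.sold_at.date(), s.product, s.channel)
        totals[key] += s.amount
    return dict(totals)


# Invented sample data: three German transactions on the same day.
german_sales = [
    Sale("DE", "lubricant", "retail", datetime(2006, 5, 25, 14, 25), 40.0),
    Sale("DE", "lubricant", "retail", datetime(2006, 5, 25, 16, 10), 60.0),
    Sale("DE", "fuel", "wholesale", datetime(2006, 5, 25, 9, 0), 500.0),
]

global_rows = summarize_for_global(german_sales)
for key, total in sorted(global_rows.items()):
    print(key, total)
```

Three transactions collapse into two summary rows here; the 14:25 purchase is no longer visible at the global level, which is exactly the intent.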

Wednesday, May 24, 2006

Putting a little glitz into data warehousing

Data warehouse technology is rarely associated with the glamorous world of mergers and acquisitions, usually the domain of sharp-suited investment bankers, late night board meetings, top lawyers and sometimes bizarre behaviour (see Barbarians at the Gate). But once the deal is signed, the party is over and the bankers and lawyers get their modest fees, what happens next? You can be sure that there is a hard-pressed person, possibly the CFO, who is put in charge of delivering all the vast synergy benefits that were promised by the chief executives to their shareholders. Do not envy this person. According to Deloitte: "Between 50-70% of mergers fail to deliver shareholder value after the deal." Moreover, to emphasize just how important quick results are, according to Accenture: "For an acquirer expecting to reap $500 million in yearly cost savings from an M&A transaction, a one-month delay reduces the net present value of the deal by more than $150 million (assuming a 10 percent cost of capital). A seven-month delay costs nearly $1 billion in lost value, or approximately $3.5 million per day."

Given this type of background, it is perhaps understandable that the first thought from the woefully underpaid consultants from the big systems integrators is not "let's build a data warehouse then". Yet understanding the cross-enterprise picture is immediately critical to realizing benefits. For example, when HBOS merged, one of the key areas for quick savings was identified as procurement. Yet to just pick one of the existing procurement systems, switch off the other and convert all the data from one to the other was estimated at taking well over a year. Instead, what they did was to implement a packaged data warehouse, map the two sets of data from each bank, and in this way get a single view of the procurement spend across the organization without having to convert all the data in the underlying systems. This was achieved in just three months, giving an immediate view of post-merger procurement that allowed huge business savings to be achieved. For more on this award-winning project click here.

In this way HBOS cleverly avoided a common trap, neatly summed up by McKinsey: "To succeed, a merger requires the smooth integration of IT systems and services, but the task often plunges the CFO responsible for ensuring the savings into uncharted territory. Confronted by an immediate technical challenge, companies typically choose one of two questionable routes. Some, fearing costs and complexity, never fully integrate their acquisition's systems and thus gain few synergies. Others focus on the promise of synergy gains and improved performance but, in their haste, simply choose one system over another, often alienating both customers and employees."

Other companies that have successfully used a data warehouse in this fashion are Shell, Intelsat, Unilever and Cadbury Schweppes. What is critical in such cases is that the warehouse can be rapidly deployed, so that the business can see quick results. For those toiling away on an unrewarding data warehouse project, remember that next time your company buys another, a data warehouse could be a key part of the solution. Just ask HBOS.

But how do I explain MDM to a business user?

There is an excellent article today by Ventana analyst Dave Waddington on how to tell if you have a master data management problem in your company. He sets out no fewer than 17 symptoms that would indicate that your master data is not fully under control. The beauty of this article is that it takes a business viewpoint and lists a series of different issues that will resonate with business executives; so many articles on MDM are written by people who have a pure technology problem, but Dave is that rare breed: someone who is an expert in technology but who worked for many years at Unilever, and so has an excellent business grasp. Dave also happens to have an unusually sharp mind.

His checklist is an excellent way of engaging with business people to try and put across the concepts of master data management in language that they will understand, rather than discussing hubs and metadata repositories. There has been much written on how difficult it is to justify master data initiatives, and yet if you run through this list of potential issues it should be possible to at least estimate a dollar cost associated with these problems, which is the first step to justifying a project that the business will support. The sorts of issues listed e.g.

"You struggle to determine total product sales to global customers"

is exactly the kind of problem that I recall the business struggling with when I worked at Shell. Shell sold products to Ford Motor Company, which of course in reality trades under multiple subsidiaries, and moreover different IT systems have different codes to describe "Ford". This is not just an abstract issue: an account can be lost if the customer does not feel that you are able to deal with them consistently on a global basis, yet doing so is a major challenge for most companies. Working out the loss of revenue if a major global account defected to a rival should rapidly justify an MDM initiative.
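The "different codes for Ford" problem can be sketched in a few lines of Python. Everything here is hypothetical: the system names, the local customer codes and the sales figures are invented, and a real cross-reference would live in a managed master data repository rather than a dictionary. The sketch shows why a total per global customer only becomes possible once each local code has been mapped to one agreed identity.

```python
from collections import defaultdict

# Hypothetical cross-reference: (source system, local customer code) -> global account.
CUSTOMER_XREF = {
    ("erp_uk", "C-1001"): "Ford",
    ("erp_de", "FORD-WERKE"): "Ford",
    ("crm_us", "88231"): "Ford",
    ("erp_uk", "C-2040"): "Unilever",
}


def sales_by_global_customer(sales_rows, xref):
    """Sum sales per global customer and report codes nobody has mapped yet."""
    totals = defaultdict(float)
    unmapped = []
    for system, local_code, amount in sales_rows:
        global_name = xref.get((system, local_code))
        if global_name is None:
            # Unmapped codes need a person to look at them, not a guess.
            unmapped.append((system, local_code))
        else:
            totals[global_name] += amount
    return dict(totals), unmapped


# Invented sales rows: (source system, local customer code, amount).
rows = [
    ("erp_uk", "C-1001", 1200.0),
    ("erp_de", "FORD-WERKE", 800.0),
    ("crm_us", "88231", 500.0),
    ("erp_uk", "C-2040", 300.0),
    ("erp_fr", "X-77", 50.0),
]

totals, unmapped = sales_by_global_customer(rows, CUSTOMER_XREF)
print(totals)
print(unmapped)
```

The list of unmapped codes is as useful as the totals: it is a rough measure of how much of the business is still invisible at the global level.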

Tuesday, May 23, 2006

Complexity conservation

There is a thoughtful article by Mike Garrett which has the original notion that the amount of complexity in an information system is constant, and that you can only move around the complexity e.g. by putting more effort into the back-end system to shield some complexity from the end user. His article is discussing SAP BW but I like the idea. It is certainly true that in a truly bespoke reporting system everything is shielded from the user, but that requires huge IT resources, while just plonking a complex reporting tool on the users' desktops and hoping they will make sense of the data reduces demand on IT but is pretty much doomed because most users won't be able to navigate the complexity of the database (especially for something as complex as SAP, with 32,000 tables and counting).

However I do believe that a reasonably happy medium can be found if a layer is presented to the end user that uses their own terminology and hides the physical database implementation. This, after all, was what made Business Objects so successful with its "semantic layer". However this certainly passed some complexity back to the IT department, who then had to spend significant effort in building and maintaining the Business Objects "universes". If you go one step further and have a data warehouse that is driven from a business model, then this itself can generate a meaningful environment (such as a Business Objects universe) from the warehouse. This is what Kalido does, for example. Again, it is fair to say that the complexity has not entirely disappeared, but now some effort is needed to build the business model in the data warehouse. However where I think the article's neat idea falls down is in assuming that the magnitude of the complexity is always the same. It seems clear to me that there is more effort in customizing every report to an individual user than there is in delivering an environment like a Business Objects universe (or even a carefully built Excel pivot table) which gives the end user a fair degree of freedom of formatting and exploration. There is still less effort involved if you push the effort back one layer into the warehouse, since the warehouse can then generate multiple Business Objects universes, and not require each one to be customized by hand. Hence the further back in the stack you start the business modeling, the more ripple-through benefits you get by having fewer separate things to customize by hand. The end user still gets a flexible environment in terms that are meaningful, but there is less effort in total in delivering this environment. Further benefit would occur if all the operational systems were business-model driven, but this is just a pipe-dream today.
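As a rough illustration of what "driven from a business model" means, here is a small Python sketch. It is not the Business Objects universe format or anything Kalido actually produces; the table name, the column names and the build_query helper are all invented. The business terms are defined once, and every query a user asks for in those terms is generated from that single definition rather than written by hand.

```python
# Hypothetical business model: business terms mapped once to physical names.
BUSINESS_MODEL = {
    "table": "T_SLS_FCT_01",
    "dimensions": {"Product": "PRD_CD", "Country": "CTRY_CD"},
    "measures": {"Net Sales": "SUM(NET_AMT)"},
}


def build_query(model, dimensions, measures):
    """Generate SQL from business terms so the user never sees physical names."""
    dim_cols = [model["dimensions"][d] for d in dimensions]
    select_parts = [f'{model["dimensions"][d]} AS "{d}"' for d in dimensions]
    select_parts += [f'{model["measures"][m]} AS "{m}"' for m in measures]
    sql = f'SELECT {", ".join(select_parts)} FROM {model["table"]}'
    if dim_cols:
        sql += f' GROUP BY {", ".join(dim_cols)}'
    return sql


# The user asks for "Net Sales" by "Country"; the physical names stay hidden.
print(build_query(BUSINESS_MODEL, ["Country"], ["Net Sales"]))
```

If a physical column is renamed, only the model changes; every generated query picks up the change, which is the ripple-through benefit described above.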

Complexity is one thing that should not be conserved if at all possible. Driving things from a business-model as far back in the stack as practical won't make complexity extinct, but may at least make it a little endangered.

Friday, May 19, 2006

The weakest data link

There is a thoughtful article in the McKinsey Quarterly on managing supply chains. It highlights the problem that even if you have perfectly consistent and accessible information in your company, in many situations e.g. with mobile phones, there is a web of separate companies between the designer and the customer e.g.

components supplier -> distributor -> ODM -> OEM -> distributor -> customer

Each of these is dependent to some extent on the others, and so if you want to know how your sales are going or how product quality is holding up, you will want to interact with information from other companies further back in the chain. This presents the problem that the systems in other companies will not use the same terminology and coding structures as yours, meaning that you will need to resolve these differences in some way e.g. through a data warehouse project. The article points out that in many cases companies have not built these links and so have no visibility up and down the supply chain. This information is not just nice to have:

"Bridging these gaps pays off. In one case, a leading enterprise-computing company started gathering better data from field services, which gave it information on the incidence of failures and their costs. By feeding that data to design teams, the company developed products that could be serviced and repaired more easily. The result: total costs over the product life cycle fell by 10 to 20 percent."

Clearly such savings are worth having. The article is an excellent illustration that the issues of dealing with multiple semantics are not confined to internal systems, and indeed in such cases standardization is literally unattainable. Instead software solutions are required that can map multiple business structures together and make sense of them. Companies that invest in such data warehouse solutions are, as this article shows, getting very tangible results.

Enter the Dragon

I spent the last two weeks on holiday in China. Apart from the awesome sense of history (the Great Wall is 3,000 miles long and was completed in 220 BC) it was intriguing to get a sense of one of the world's two great emerging economies. Shanghai was striking in this regard. In just 20 years since Deng Xiaoping's reforms Shanghai has been transformed into the most dynamic of ultra-modern cities. There is a striking symbolism in standing on the Bund (one side of the city's Huangpu river) amongst the fine 1920s and 1930s buildings built mainly by the British, and looking out across the river at the future. On the opposite bank is Pudong, a sort of Canary Wharf on steroids, a city of gleaming steel and glass. The sheer scale of Pudong is best appreciated from the Grand Hyatt hotel, the tallest hotel in the world at 1,380 feet. From either the 54th floor lobby or the 88th floor (8 is a lucky number in Chinese culture) bar you look out across at the old Shanghai, but also at the forest of skyscrapers that is Pudong. A quarter of the world's cranes are at work here, to give some sense of scale.

The desire to create an image of progress is epitomized by the Maglev train, which whisks you from the town to the international airport at a top speed of 268 mph (431 km/h). It can go at 311 mph (501 km/h), but at its slower cruising speed still does the 30 km journey in well under eight minutes. Symbols are important, and the Maglev stands in striking contrast to the shambolic infrastructure of India's airports and trains. India does have the key advantage of widely spoken English, but China's modern infrastructure wins hands down. One danger to Western companies is also apparent in the Maglev. Built on German technology, China now intends to build a far longer Maglev track to Hangzhou, but will build it on Chinese technology: quick learners, or intellectual property theft?

Conversations I had when in China suggested that intellectual property rights are an alien notion in China, at least for now; our guide in Beijing ran a web site selling fake Rolex watch mechanisms which can be made up into expensive replica watches. He was simply bewildered at the notion that there could be anything wrong with this.

However, despite this, China is now the world's largest exporter of hi-tech products. When you are there you can sense the sheer dynamism of the place in the air. As an example, just today Teradata announced that their new R&D centre was to be based in Beijing.

Wednesday, May 03, 2006

A short intermission

I am just off on vacation for a couple of weeks, so the blog will be quiet for a while. Normal service will be resumed on my return.

Searching for an MDM strategy

I saw a curious article called "Informatica addresses master data management" in which I expected some sort of product announcement or acquisition that would launch Informatica into the MDM space. Yet you can scour the article for as long as you like in search of anything resembling a product announcement. It seems that Informatica "supports" MDM, which is fair enough in that they of course provide one of the main data integration technologies out there, and so indeed can move master data (amongst other data) around. However they had the exact same technology yesterday, so exactly what had changed?

It seems to me that Informatica is crying out for an MDM strategy of some kind, perhaps via a partnership or even an acquisition (though most of the juicy MDM titbits like Razza have already been gobbled up). Given that Informatica has data quality capability via the Similarity Systems acquisition, and its focus on data integration, it would be a natural extension to move into MDM proper. So, when will the other shoe drop?

Tuesday, May 02, 2006

CDI compared to other master data

There is a good article on CDI by Jill Dyche, a co-founder of Baseline Consulting and someone who has clearly seen a lot of real-world CDI projects. She does a good job of explaining how CDI projects have traditionally been quite transaction-oriented, with hubs serving up customer data via middleware to other applications. CDI hubs are at one end of the MDM spectrum, firmly at the "operational" level. At the other end are "analytic" MDM applications, which enable companies to take a cross-enterprise view of key information like assets, people, products, channels etc. Getting to understand the differences between the multiple, conflicting definitions embedded in the source systems is a major job in itself, and will usually result in a master data repository. This in turn can be a feed into a corporate warehouse. A few pioneering companies have taken the final logical step and hooked up their master data repositories, via middleware like Tibco or IBM WebSphere, to their operational systems, so that the master data repository becomes the true master source, driving changes as required back down into the operational systems like ERP and CRM.

CDI hubs have started at the other end, linking up to systems providing customer data, often in real-time. Customer data represents a high-value area of MDM, as in the case of consumers the customer data is often quite simple, but is in high volume, and requires fairly simple processing to match a customer record in one system to one in another (e.g. matching "A. Hayler" v "Andy Hayler"). However, this is only part of the answer, as even in the case of "customer" things can get more complex. Suppose you are a company like Shell and you want to treat Unilever as a key global account. Finding out all the information about Unilever is not just a simple keyword matching exercise, since Unilever trades under many different subsidiary names and brands around the world e.g. its main Indian subsidiary is not called Unilever but Hindustan Lever; it also owns a company called Algida, and I defy even the cleverest fuzzy logic algorithm to associate "Algida" with "Unilever" (such examples are why you should always be sceptical about vendors selling matching algorithms). It can be seen that, for more complex situations like this, human intervention is required in order to correctly add up all the elements of Unilever's business.
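The contrast between the two cases is easy to demonstrate. The Python sketch below uses the standard library's difflib as a stand-in for a string-matching algorithm (it is not what any CDI vendor ships), and the ownership table, the resolve_parent helper and the 0.75 threshold are invented for illustration. String similarity copes with the two spellings of one person's name, but only a table maintained by people who know the ownership structure links Algida to Unilever.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude string similarity between 0 and 1 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical, hand-maintained ownership table: the human intervention.
CURATED_PARENT = {
    "Hindustan Lever": "Unilever",
    "Algida": "Unilever",
}


def resolve_parent(name, known_parents, curated, threshold=0.75):
    """Prefer the curated table; fall back to string matching; else give up."""
    if name in curated:
        return curated[name]
    best = max(known_parents, key=lambda p: similarity(name, p))
    return best if similarity(name, best) >= threshold else None


parents = ["Unilever", "Ford"]

# Two spellings of one consumer name score highly...
print(similarity("A. Hayler", "Andy Hayler"))
# ...but a subsidiary and its parent share almost nothing as strings.
print(similarity("Algida", "Unilever"))

print(resolve_parent("Algida", parents, CURATED_PARENT))  # found via the curated table
print(resolve_parent("Algida", parents, {}))              # string matching alone gives up
```

Returning None rather than the nearest guess is deliberate: a wrong automatic match quietly corrupts the global account total, whereas an unresolved name can be queued for someone to review.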

This issue can become considerably more complex with things like "asset" or "product", which can have a whole hierarchy of sub-types. This is why CDI hub technology tends to be used specifically for consumer information. Other types of MDM technology are required to manage more complex data and the workflows that surround the updating of this e.g. no automated system is going to just create a new brand; this requires numerous approvals and has various knock-on effects to other master data.

I would argue that, at least at present, you are likely to require one kind of technology to handle general purpose MDM data, whether customer or asset or whatever, from an analytical viewpoint, and potentially a separate technology to handle operational updates, perhaps in real-time. Of course it would be nice if a single product did everything, but at present nobody can truly claim this. What does seem a missed opportunity is the way that vendors have made their technology so very specific to particular types of master data e.g. PIM and CDI. While operational and analytic needs are inherently different, there is no reason at all not to take a generic approach to all types of master data. Customers can hardly be expected to buy a separate hub for every type of master data.

One more TLA to remember

EIM is a recent Gartner market positioning which is an umbrella term for business intelligence, master data management and content management. While there is a certain inevitable "not another acronym" reaction, this particular one makes quite a lot of sense to me. Gartner have sensibly made the term explicitly cover business processes rather than just technology, so that data governance and stewardship would be part of this broad area. As the Gartner notes say, data integration is at the heart of this.

I think this is positive because the industry has taken an overly technology-centric perspective on this so far. Technologies such as ETL are necessary but not sufficient to deliver a broad-based understanding of corporate information. I have observed some forward-looking companies setting up new organizations to manage information, staffed with mainly business rather than IT staff. The groups have the remit to cover the provision of data as a service to the rest of the enterprise, and so they have to worry about data quality, data warehouses, master data, integration middleware and all the processes that go along with it: indeed, this is pretty much a definition of what EIM is all about. Taking a holistic, business-led approach is the right thing to do, since providing high quality, timely data requires a level of business ownership that cannot just be delegated to the internal IT department, or out-sourced to India. The various supporting technologies need to do just that: support business rather than being ends in themselves.

It will be interesting to see how this new terminology catches on, but I think it has legs since it seems to me to incorporate a lot of common sense.