Wednesday, November 30, 2005

The supply chain gang

There is a thoughtful article today by Colin Snow of Ventana in Intelligent Enterprise. In it he points out some of the current limitations in trying to analyze a supply chain. At first sight this may seem odd, since there are well-established supply chain vendors like Manugistics and I2, as well as the capabilities of the large ERP vendors like SAP and Oracle. However, just as with ERP, there are inherent limitations with the built-in analytic capabilities of the supply chain vendors. They may do a reasonable job of very operational reporting ("where is my delivery") but struggle when it comes to analyzing data from a broader perspective ("what are my fully loaded distribution costs by delivery type"). In particular he hits the nail on the head as to one key barrier: "Reconciling disparate data definitions". This is a problem even within the supply chain vendors' software, some of which has grown through acquisition and so does not have a unified technology platform or single data model underneath the marketing veneer. We have one client who uses Kalido just to make sense of the data within I2's many modules, for example.

More broadly, in order to make sense of data across a complete supply chain you need to reconcile information about suppliers with that in your in-house systems. These will rarely have consistent master data definitions i.e. what is "packed product" in your supply chain system may not be exactly the same as "packed product" in your ERP system, or within your marketing database. The packaged application vendors don't control every data definition within an enterprise, and the picture worsens if the customer needs to work with external suppliers more closely e.g. some supermarkets have their inventory restocked by their suppliers when stocks fall below certain levels. Even if your own master data is in pristine condition, you can be sure that your particular classification structure is not the same as any of your suppliers'. Hence making sense of the high-level picture becomes complex, since it involves reconciling separate business models. Application vendors assume that their own model is the only one that makes sense, while BI vendors assume that such reconciliation is somehow done for them in a corporate data warehouse. What is needed is an application-neutral data warehouse in which the multiple business models can be reconciled and managed, preferably in a way that allows analysis over time e.g. as business structures change. Only with this robust infrastructure in place can the full value of the information be exploited by the BI tools.
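To make the reconciliation problem concrete, here is a minimal Python sketch - every system name, code and classification in it is invented for illustration - of mapping two systems' differing product vocabularies onto one canonical definition, with disagreements flagged rather than silently resolved:

```python
# Hypothetical master data from two systems that disagree on what
# "packed product" means; the cross-reference table is the
# reconciliation step that neither system performs on its own.

# Each source system's own classification of the same physical items
erp_products = {"P-1001": "PACKED", "P-1002": "BULK"}
scm_products = {"SKU-77": "Packed Product", "SKU-78": "Packed Product"}

# Cross-reference table: which source keys refer to the same item
canonical = {
    "item-A": {"erp": "P-1001", "scm": "SKU-77"},
    "item-B": {"erp": "P-1002", "scm": "SKU-78"},
}

def canonical_class(item):
    """Map both vocabularies onto one agreed definition, or flag a
    conflict for manual reconciliation."""
    refs = canonical[item]
    erp_is_packed = erp_products[refs["erp"]] == "PACKED"
    scm_is_packed = scm_products[refs["scm"]] == "Packed Product"
    if erp_is_packed == scm_is_packed:
        return "packed" if erp_is_packed else "unpacked"
    return "CONFLICT"

print(canonical_class("item-A"))  # packed - both systems agree
print(canonical_class("item-B"))  # CONFLICT - ERP says bulk, SCM says packed
```

A real warehouse has to do this at scale, across dozens of sources and thousands of codes, which is exactly why the mapping needs to be managed as data in its own right rather than buried in ETL scripts.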

Tuesday, November 29, 2005

The shrinking band of ETL players

Informatica has had a couple of good quarters and is about the last independent ETL player left standing, now that Ascential has disappeared into the IBM Websphere maw. The only other player out there now is the quirky Ab Initio in the high-volume niche, hamstrung by surreal business practices (customers must sign an NDA even to get a demo) and, er, well that's about it. Sunopsis has taken a smart approach by using the capabilities within the database engines, but who else is left? Sagent had the last rites read before being bought by Group 1, while early pioneer ETI seems to have shrunk almost to oblivion, at least in terms of market presence. The trouble is that the features that ETL tools provide are increasingly being built into the underlying database engines, as Microsoft has done with DTS (which it is about to revamp substantially in SQL Server 2005). This makes it harder for companies to justify a high price tag for more functional tools. Informatica has correctly abandoned its mad strategy of "analytics" and has broadened into EAI, where there is a larger market. Even this market is pretty competitive though, with Tibco battling it out with IBM Websphere, and with third-placed Web Methods actually shrinking in revenue in 2004. The remaining ETL vendors can take comfort from the fact that a lot of companies still "do it yourself", so there are plenty of sales opportunities still and the market is a long way from saturated. However the pricing pressure that the "free" offerings in the database engines put on vendors makes it a struggle to make a good living. This pressure will increase as these capabilities become more functional and scalable.

Monday, November 28, 2005

Open up your email - it's the feds!

I recall a few days ago being sent an email that was so transparently a virus that I thought "who on earth would click on such an obviously dodgy-looking attachment?" It was an email purportedly from the FBI (yeah, right, email being the FBI's most likely form of communication to me) saying that "your IP address had been logged looking at illegal web sites" and then inviting you to click on an attachment of unknown file type asking you to "fill in a form about this activity". I'm guessing that if the FBI were troubled about my web site browsing they would be more likely to burst through the front door than send an email. At the time I chuckled to myself and deleted it, but apparently millions of people actually did decide to fill in the form about their dubious web browsing, immediately infecting their PCs with the "Sober" virus.

Set against this depressing demonstration of human gullibility was at least the entertainment value of the deadpan statement from the real FBI, which read: "Recipients of this or similar solicitations should know that the FBI does not engage in the practice of sending unsolicited e-mails to the public in this manner." Quite so.

Wiping the slate CleAn?

When a company or organization changes its name, you know that its troubles are big. A few days ago Computer Associates made the radical name change to "CA" (I congratulate the brand naming people on their imagination; I imagine their fee was suitably modest). This follows a series of accounting scandals that has accounted for most of the executive team, seen its previous CFO plead guilty to criminal charges and its previous CEO Sanjay Kumar leave the company. The name change is in a fine tradition of hoping to cover up the past: in the UK we had a nuclear power station called Windscale which had an unfortunate tendency to leak a bit, and a safety reputation that would have troubled Homer Simpson. A swift bit of PR damage control and voila - Sellafield was born. We all felt much safer that day, I can tell you. Of course some name changes are just good sense e.g. Nintendo was previously called Marafuku (I kid you not).

However the big question for CA is whether its rebirth will be superficial or deep-rooted. CA was the body-snatcher of the IT industry, picking up ailing companies that had decent technology and maintenance revenues for a song, stripping out most of the costs and milking the maintenance revenue stream. It does have some strong systems management technologies like Unicenter, and is a very large company, with USD 942 million in revenues last quarter. However its famously antagonistic relationships with its customers have not helped as it has had to weather a series of scandals and management changes. John Swainson, formerly of IBM, is the latest person charged with turning the company around, in his new role as CEO. I wish him luck, as a company with troubles that deep-seated will not be fixed by a PR blitz and a name change.
I hope he has better fortune than PWC did with "Monday".

Friday, November 25, 2005

Overcoming (some of) the BI barriers

A new survey from Gartner has some interesting findings. Business intelligence in its broadest sense has moved from #10 in CIO priorities to #2, quite a jump. Spending in this area is set to rise sharply, with companies on average spending 5-10% of their overall software budgets on BI, but with some sectors such as finance spending 16% of their software budgets on business intelligence (more can be found in research note G00129236 for those of you who are Gartner clients). This is obviously good news for vendors in the space, but it seems to me that CIOs have been very slow to grasp that providing business insight is surely a high priority for their customers. Once the Y2K madness was over and everyone had rushed in their shiny new ERP, CRM and supply chain systems, just what else was it that CIOs should be doing rather than exploiting the wealth of information now being captured by these systems? CIOs are increasingly under pressure to deliver value to their companies, and what better way to do this than by providing improved insight into the performance of the company's operations? Surely there is more bang for the buck for a CIO here than in fiddling with new middleware or upgrading the network, activities in which most business people have little interest and regard as "business as usual". Anyway, the penny now seems to be dropping, so given that it is finally on their radar, CIOs should also consider what might stop them delivering this value, with its inherent career-enhancing kudos.

The main barriers to adoption of business intelligence, in Gartner's view, are:
  • a lack of skills
  • difficulty in getting business intelligence out of enterprise applications (e.g. ERP)
  • perceived high cost of ownership
  • resistance due to some managers viewing enterprise-wide transparency as a threat to their personal power
I recently wrote on this last point. Let's consider the others.

The skills one is the easiest: "organizations lack the analytical skills, causing them to have difficulty using the tools." The answer is that most people in a business simply do not need a sophisticated analytical tool. It is not their job to be creating whizzy charts or mining through data - most business managers just need a regular report telling them their key performance information e.g. production throughput, sales figures etc. This requires at most Excel, and probably not even that. As I have argued elsewhere, the answer to BI vendors trying to sell you thousands of copies of their software is simple: just say no, to quote Nancy Reagan. In my experience perhaps 5%-10% of end users of data warehouse applications actually need true ad hoc analytic capability - the rest need a report or at most an Excel pivot table. Putting a complex, powerful but unfamiliar tool on business people's desks and then wondering why usage rates are low is a self-inflicted problem.

The second barrier is the difficulty of getting BI linked to enterprise applications. This is a real issue, with the big application vendors either providing weak capability or, where they do provide it, tying it too heavily to their own data structures. While there is a place for operational reporting, enterprise-wide performance management requires information from a wide range of sources, some of it in spreadsheets and external data sources. Obsessed with trying to broaden their own footprint, application vendors seem unable to let go and realize that customers have a diverse set of applications and are not going to simply ditch everything they have from everyone else. The answer here is to adopt an enterprise data warehouse approach that is separate from the application vendors and neutral to the applications. Leave the operational reporting to the app vendors if you must, but at the enterprise level you need something that is truly application-neutral. Cutting this link, and using app vendors' analytic tools for what they are good at, rather than trying to shoe-horn them into roles they were never designed for, will save you a lot of pain and suffering here.

The third issue is cost of ownership, and here too the issue is very real. Recent work by The Data Warehousing Institute (TDWI) shows that data warehouses have very high costs of maintenance and support. Indeed, according to a major TDWI survey, the annual support costs are 72% of the implementation costs, on average. For people used to traditional "15% of build" costs this may seem outlandish, but it is not. The reason that maintenance costs are often around 15% of build costs for transaction systems is that this is roughly the amount of code that is impacted by change each year. Most transaction systems operate in a specific area of the business that does not change radically every week, so much of the application code is stable. By contrast, a data warehouse (by definition) takes as sources many different transaction systems, and every one of these is subject to change. So if you had a data warehouse with six sources, each of which had a 15% degree of change per year, then your warehouse is subject to a 6 * 15% = 90% level of change stress. Of course this is too simplistic, but you can see the general idea: with many sources, each undergoing some change, the warehouse encounters more change issues than any specific transaction system. Hence custom-built data warehouses do indeed have these very high levels of maintenance costs. There is not a lot to be done about this unless you use a data warehouse design specifically built to address this issue, and unfortunately the mainstream approaches (3NF, star schema, snowflake schema) do not.
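The back-of-the-envelope arithmetic can be sketched as follows; the additive figure is the deliberately simplistic measure used above, and a compounding variant (my own addition, assuming the sources change independently) is shown for comparison:

```python
# "Change stress" on a warehouse fed by several sources, each changing
# independently at some annual rate.

def naive_change_stress(num_sources, annual_change_rate):
    """The simple additive estimate used in the text: 6 * 15% = 90%."""
    return num_sources * annual_change_rate

def prob_any_source_changes(num_sources, annual_change_rate):
    """A slightly less simplistic view: the probability that at least
    one source changes in a given year, assuming independence."""
    return 1 - (1 - annual_change_rate) ** num_sources

print(round(naive_change_stress(6, 0.15), 2))       # 0.9, i.e. 90%
print(round(prob_any_source_changes(6, 0.15), 3))   # 0.623, i.e. 62%
```

Either way the point stands: a six-source warehouse is exposed to change far more often than any single transaction system changing at 15% a year.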

So to summarize, you can take steps to circumvent at least three of Gartner's four barriers. The fourth one involves tackling human nature, which is a more difficult problem than software design or architecture.

Thursday, November 24, 2005

Pure as the driven data

In a recent conference speech, IDC analyst Robert Blumstein had some interesting observations about linking business intelligence applications to corporate profitability. Noting how many business decisions are still made based on spurious, incomplete or entirely absent data, he observes that "It's easier to shoot from the hip, in many ways". I found this comment intriguing because it echoes similar ones I have heard before in my corporate career. I remember one of my managers saying that many corporate managers didn't seek data to support their decisions because they felt that using their "judgment" and "instincts" was mainly what they were being paid for. This syndrome was summarized elegantly by historian James Harvey Robinson, who said: "Most of our so-called reasoning consists in finding arguments for going on believing as we already do."

I personally believe that there are very, very few managers who are so gifted that their instincts are always right. The world was always a complex place, but it is ever more so now with a greater pace of change in so many ways. Hence I believe that being "data driven" is not only a more rational way of responding to a complex world, but that it will lead to greater success in most cases. As the economist John Maynard Keynes said on being questioned over a change of his opinion: "When the facts change, I change my mind -- what do you do, sir?". I have observed that the most impressive managers I have seen are prepared to modify their decision in the face of compelling new information, even if that contradicts their "experience", which was often built up many years ago in quite different situations.

Making business decisions is hard, all the more so in large organizations where there are many moving parts. There are many insights that good quality data can give that contradict "experience". One customer of ours discovered that some of their transactions were actually unprofitable, which had never come to light since the true costs of manufacturing and distribution were opaque prior to their implementing a modern data warehouse system. All the people involved were experienced, but they were unable to see their way through the data jungle. At another customer, what should have been the most profitable product line in one country was actually being sold at a loss through one channel, but again the true gross margin by channel was opaque prior to their new data warehouse system; in this case the problem was a poorly designed commission plan that was rewarding salesmen on volume rather than profitability. "Data driven" managers will seek to root out such business anomalies through the analysis of hard data, fact rather than opinion.

It is often noted that data warehouse projects have a high failure rate. Of course there are many reasons for this, such as the difficulty most have in keeping up with business change, and the vagaries that beset any IT project. Yet could part of the problem be that, at least in some cases, the people for whom the systems are supposed to provide information simply would prefer to wing it?

Tuesday, November 22, 2005

Why are most job interviews so bad?

We have all been to job interviews, but has it struck you how remarkably random the whole process is? Some companies do put effort in, but I recall interviews where the person interviewing me was clearly bored, had been asked to fill in for someone else, didn't really know what the job was about etc. If it is a big company they might have a session about the company e.g. I recall as a graduate going to an interview at Plessey and hearing about the pension plan for an hour; just what every 21 year old is dying to listen to. On the other side of the fence, most of us will have interviewed some surreal people. I had one guy who was clearly on drugs, and one CFO candidate who considered that answering my accounting questions was "beneath him". My colleague had one candidate for a technical author role who, when asked about his prior work experience, jumped up, grabbed and opened the large suitcase he was carrying, revealing a cloud of dust, a horrible musty smell and two maintenance manuals for the ejector seat of an aircraft, which he proceeded to read out loud.

Does it really have to be this way?

I have been studying this in some depth recently, and was pleased to find that there is at least some science around. If you look at various selection techniques, it turns out that "unstructured interviews" i.e. the ones most of us are used to, are actually a pretty dismal way to select people. A major 2001 study looked into various selection techniques and tracked performance back to selection e.g. how good a candidate's interview was vs. how well they were performing in the job a few years later. It turns out that unstructured interviews manage just a 0.15 correlation between interview results and job success i.e. only a bit better than random (a correlation of 1 is perfect, zero is random, while -1 is perfect inverse correlation). Highly structured interviews, based on job competencies and evidence-based questions (which can be trained) manage a 0.45 correlation. Ability tests on their own manage a correlation of 0.4, and if combined with structured interviews take the number up to 0.65, which although still not perfect was the best score that had been achieved.
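For anyone wanting the correlation figures made concrete, here is a minimal Pearson correlation calculation; the scores are entirely invented toy data, illustrating the metric itself rather than the study's findings:

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation: 1 is perfect, 0 is random, -1 is perfect inverse."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented scores: interview rating vs. job performance rating years later
interview = [4, 7, 5, 8, 6, 3, 9, 5]
job_perf = [5, 6, 4, 9, 5, 4, 8, 6]

print(round(pearson(interview, job_perf), 2))  # 0.85 on this toy data
```

On this (made up) data the interview would be a far better predictor than anything in the study; real unstructured interviews, at 0.15, barely separate the pairs of numbers at all.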

Ability tests take various forms. The best of all are ones that are directly related to the job in hand e.g. actually getting someone to do a sales pitch if they are a salesman, or to write some code if they are a programmer. For more on one of these see an earlier blog. The most creative of these I heard about was an interviewer for a sales position who asked each candidate to sell him the glass of water in front of him on his desk. One candidate went to the waste-paper bin by the desk, pulled out a match, set fire to the paper inside and then said "how much do you want for the water now?". Generally less creative approaches are adequate, and at Kalido we use a software design test for all our software developers, which enables us to screen out a lot of less gifted candidates, saving time both for us and the candidates.

General intelligence tests also score well since, all other things being equal, bright people do better in a job than those less bright; studies show that this applies across all job disciplines (yes, yes, you can always think of some individual exception, but we are talking averages here). The 0.4 correlation with job success that these tests provide is a lot better than the 0.15 which most interviewing manages. Personality profiles can be used to supplement these, as for some types of job research has been done which shows that certain personality types will find the work more comfortable than others. For example a salesman who hated rejection, didn't enjoy negotiating, disliked working on his own and was pessimistic might still be a good salesman, but would probably not be a very happy one. You don't have to invent such profiles and tests: there are several commercially available ones, such as the ones we use at Kalido from SHL.

The cost/benefit case for employing proper interview training and such tests is an easy one to make: the cost of a bad hire is huge just in terms of recruitment fees, never mind the cost of management time in sorting it out, the opportunity cost of the wasted time etc. Yet still most software companies don't employ these inexpensive techniques. Perhaps we all like to think our judgment of people is so great that other tools are irrelevant, yet remember that 0.15 correlation score. There may be a few great interviewers out there, but most people are not, and by supplementing interviews with other tools like job-related tests and good interview training we can improve the odds of hiring the best people. I used to work at Shell, who did a superb job of structured interview training, and I recall being trained for several days, including video playback of test interviews, on how to conduct a half-hour graduate interview. This may sound like a lot of effort, but it is trivial compared to the cost of a bad hire.

Many software companies seem to be missing a trick here. When I applied for jobs as a graduate I recall virtually every large multi-national had an extensive selection process including ability tests, yet in the software industry, where almost all the assets are its people, such things seem rare. I was amused to hear a recruitment agency whining at me for our use of screening tests at Kalido: "but only software companies like Microsoft and Google do tests like that". I rest my case.

Monday, November 21, 2005

A bit rich

You may have seen SAP's latest advertising campaign, which breathlessly claims that "A recent study of companies listed on NASDAQ and NYSE found that companies that run SAP are 32% more profitable than those that don't*. Based on a 2005 Stratascope Inc. Analysis". There are at least three interesting features in this advert. The first is that little asterisk at the end. If you read the (small font) footnote you will see that this excludes all financial services companies. A little odd until you realize that financial services companies these days have two particular characteristics: they make a lot of money, and they rarely use SAP. Could their inclusion have, perhaps, changed the results somewhat? Of course one could ask Stratascope, the market research firm whose Chairman and President, Juergen Kuebler, is a nine-year SAP veteran, and I'm sure they will give an unbiased and objective opinion. I am going to take a wild guess and say that including financial services companies would not make the figure look better.

However by far the most interesting aspect of this advert is its sheer chutzpah, with its implication that if you use SAP then you will be more profitable: 32% more in fact. Lest the subtleties of statistics have escaped the denizens of Walldorf, I would like to remind them that the fact that two datasets are positively correlated does not mean that one causes the other. For example, I observe that my increasing age is well correlated with the steady rise in global temperatures. As far as I know, there is no direct link. Similarly, one could observe: "the stork population has gone up, as has the human population. Hence storks must create human babies". Closer to home, I can tell you that four of the UK's five most admired companies last year were Kalido customers (true), yet to say that one implies the other is absurd.

It is particularly implausible to make such bold claims in the case of IT systems of any kind, which may well have distinct and real benefits in individual cases and projects but whose influence on overall productivity has largely eluded economists. Such studies as exist are controversial e.g. the 2002 McKinsey study showed that, other than in a few sectors (retail, securities, telco, wholesale, semiconductors, IT), there had been no productivity growth whatever between 1995 and 2000 in the US despite heavy IT investment. That study was looking at all IT investment, of which ERP is only a small fraction.

So overall, SAP's claim excludes a key industry sector to selectively improve its results and in any case makes a claim that is logically spurious and has no supporting evidence. Other than that, excellent. All par for the course in software industry marketing.

Friday, November 18, 2005

Uncomfortable bedfellows

It is rare to find the words "ethical" and "software company" in the same sentence. The industry has managed to become a byword for snake oil, aggressive pricing and sneaky contract terms. Years ago when working at Exxon I recall one vendor who sold Esso UK some software, rebadged the product as two separate products and then tried to charge Esso for the "other" product, which of course they had already bought. Needless to say I was having none of that, but the very notion that they would try this spoke volumes about their contempt for the customer.

The prize (so far) goes to one of my colleagues, who used to work for a software company that once sold a financial package to a customer on the basis that it had a particular module. The only problem was that it did not exist. He was asked to set up a "demo" of the software for the customer which sounds like something out of "The Office". In one room sat the customer at a screen, who typed various data into the system and requested a report from an (entirely fictitious) pick list of reports that the vendor was supposed to have built but had not. In the next room was a programmer. When the customer pressed "enter" the data would appear in a table, and the programmer quickly hand-edited a report format using the customer's data, which was then "off to the printer". A couple of minutes later the report was brought in to the customer, who could then see the new reporting module in action. The slow response time was explained away by an "old server". Lest you think this was some fly-by-night operation, this major provider of financial software had over USD 100 million in revenue back in the early 1990s, which was when this particular scam was perpetrated. And yes, they closed the deal.

As if to prove that enterprise software companies are still amateurs when it comes to dubious behavior, Sony has just made all the wrong headlines by placing what is essentially a clever virus on its CDs, purportedly to prevent digital copyright violations. The software installs itself deep in your PC and, quite apart from preventing unauthorized copying of music, also reports back to Sony what music you have been listening to. Apparently millions of PCs may have been infected, and only after several refusals has Sony now agreed to stop producing the spyware. Just which corporate manager at Sony thought this was a really bright idea that no-one would figure out is yet to emerge. However it is safe to say that Sony's PR agency is not having a quiet run-up to Christmas right now.

I'd be interested to hear about any reader's experiences of outrageous software company behavior.

Less is more when it comes to innovation

A survey by the Economist Intelligence Unit (sponsored by PWC), released today, has a very interesting finding that backs up something I have written about before: when it comes to innovation, don't look for it in large companies.

In answer to the question:

"Small or start up competitors are more likely than large, established companies to create breakthrough products or business models" no less than 70% of senior executives "agreed" or "strongly agreed", with only 10% disagreeing. Given the vastly greater resources and R&D budgets available to large companies, why the dearth of innovation there?

It is easy to argue that bureaucracy is the cause, but I think there is another reason that I have not seen written about. I had some dealings with Oracle in the 1990s when they were concerned about the emergence of object databases, and they wanted customer input as to whether this was a real threat to them. What struck me in several meetings in Redwood City, as I met with a range of senior Oracle technologists, was that the most impressive people were the ones working on the database kernel, the core of the Oracle product. Less impressive were those working on the applications, and least of all were some working on the tools layer above. This makes sense: if you are a top developer and join Oracle then you probably want to work on the crown jewels. Similarly in my dealings with my favorite Walldorf-based ERP vendor I have found the best people to have worked on the Basis layer, the next best on the modules, and the least impressive on the peripheral tools. Again, the key to SAP's success has been its integrated ERP system, so it is hardly surprising that the top people gravitate there. Moreover the area which made the company initially successful is probably the one where the greatest understanding of the customer issues resides. The farther you move away from this, the less likely it is that the best people will be working there, and also the less likely it is that the senior executives (who built the company in the first place around a core technology) will grasp the opportunity and back innovation. Hence the ideas leak out of the company as those passionate about them leave to set up start-ups.

Thursday, November 17, 2005

Nine women can't produce a baby in a month

The software industry is not good at learning from previous lessons and mistakes. We seem to re-invent the wheel at fairly regular intervals, perhaps because a lot of people working in the technology industry are quite young, and perhaps because we assume that anything done ten or more years ago is inherently outdated. One area in which I regularly observe this collective blind spot is estimating and project management. Software projects have a poor track record of coming in on time and budget, and this has a number of causes. One is unrealistic expectations. A wise aeronautical engineer once said: "better, cheaper, faster: pick any two, but never all three", yet we all still encounter projects where the end date seems to be set by some surreal remit ("the CEO wants it in time for Christmas" syndrome) without regard to the feasibility or the effect on the project deliverable. Moreover, when a project does hit problems, as all too many do, there still seems to be the impression that throwing more resources at it will claw back the time lost. Sadly this is rarely the case.

There is some useful theory on this subject. A number of writers in the 1980s published algorithms to help estimate project duration and team size, based on the observation of many software projects. You can read more about this subject in books such as "Controlling Software Projects" by Tom DeMarco, "Software Engineering Economics" by Barry Boehm and "Measures for Excellence" by Putnam and Myers. These sources agree that the evidence shows that in order to bring a project end-date forward you need to deploy exponentially more resources. The theory actually shows an equation relating elapsed time to effort as:

effort = constant / (time ^ 4)

which for the less mathematically inclined means that the end date (project time) has a BIG effect on the effort needed. For example a project that was estimated at 18 months elapsed (a nice round number selected by IT management) could, if it was extended to 19 months, be done with 20% less effort. That's right: by extending your elapsed time by 5% you need 20% less effort. When I first saw this it seemed almost absurd, until I got involved briefly in project estimating when I worked at Shell and observed two projects in the upstream business. They were that rare thing: the same project, but being done in different Shell subsidiaries. They were independent, and were the same size and scope. One project was estimated at 13 months, and the other was to take the same, but in one a decision was taken to bring the elapsed time forward to 12 months in order to fit in with another project. Money was not a major factor on this particular project, and more resource was piled in to bring the date forward. Remarkably, the compressed project took 50% more effort to bring in than the one which ran its "natural" course, something that caused general bewilderment at the time but which actually fits tolerably well with the software equation above.
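As a sanity check, both of these examples can be computed directly from the equation, since the constant cancels when comparing two schedules for the same project (a small Python sketch, using only the numbers quoted above):

```python
# Relative effort implied by effort = constant / time**4: the constant
# cancels out, so only the ratio of the two schedules matters.

def effort_ratio(new_months, old_months):
    """Effort on the new schedule as a fraction of the old schedule's."""
    return (old_months / new_months) ** 4

# Stretching an 18-month plan to 19 months
print(round(effort_ratio(19, 18), 2))   # 0.81, i.e. roughly 20% less effort

# Compressing a 13-month plan to 12 months
print(round(effort_ratio(12, 13), 2))   # 1.38, i.e. nearly 40% more effort
```

The model predicts nearly 40% extra effort for the compressed Shell project, against the 50% actually observed - not exact, but in the right territory.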

Why is the effect so dramatic? When you add new people to a project, the people already working on it have to stop what they are doing to bring the newcomers up to speed. With a bigger team, communication becomes trickier, as more people need to be involved and the project specification is open to interpretation by more people. As you add more and more people the problem worsens: now the team doesn't fit into one room any more, so people have to call a meeting to solve a problem rather than just turning to their neighbor.

The message is that if a project is having problems with its schedule, it is much easier to reduce the scope slightly, and deliver the remaining functionality in a later phase, than it is to pile on more resources to cram it into the original schedule. If you can't reduce the scope then you can only make the people on the project more productive (good luck with that) or add a LOT of resources. At least there is a formula you can look up to tell you how many more resources you need, even if management won't like the answer.

Wednesday, November 16, 2005

Software vendors can learn something from Eskimos

Now that master data management is gaining attention as an issue, it is interesting to observe the stances of the industry giants. As might be expected, each claims to have an all-encompassing solution to the issue (though they curiously had no product offering at all in this area two years ago, so presumably must count as quick learners) - all you have to do is adopt their middleware stack. Oracle have their Data Hub, SAP have MDME or whatever it is called this week, IBM have an offering crafted from their acquisitions of DWL, Trigo and Ascential, and Microsoft is, well, Microsoft. All of them seem to be missing a key point. Intent on expanding their own infrastructure footprint at the expense of their rivals, they do not seem to grasp that large enterprise customers simply aren't in a position to move wholesale to one middleware stack. Large global companies have SAP, and IBM WebSphere, and Oracle, and Microsoft, and have a huge base of deployed applications using all of these, so any solution which says "just scrap the others and move to ours" is really not going to fly for the vast majority of customers.

By contrast, what customers need is software that can, in a "stack neutral" way, deal with the semantic inconsistency of their business definitions, whatever technology those definitions reside in. Surely by now, after the billions of dollars spent on ERP, it is clear that "just standardize" is a doomed approach for companies of any scale. Large companies can just about manage a common chart of accounts at the highest level, but as soon as you drill down into the lower-level details the definitions diverge, even in finance. In marketing (which by definition has to respond to local markets), manufacturing and other functions there is even less chance of agreeing more than a small subset of common high-level business definitions. Just as the Eskimos are said to have fifty-two words for snow, you'd be surprised at how many different ways large companies can describe a product, or even something apparently unambiguous like "gross margin" (26 different ways in one company I worked for). Hence you need technology that can help resolve the semantic differences, and support the workflow required to maintain the definitions. For example, DWL was strong at customer data integration at the detailed level, but many types of MDM problem require complex human interaction: a new international product hierarchy does not just get issued; there are versions of it, and people need to review, modify and finally publish a golden copy. Most MDM tools today simply ignore this human interaction and workflow issue.

I think IBM have the best chance of figuring this out of the big four, simply because unlike SAP and Oracle they don't have an applications business to defend, while Microsoft has never really figured out how to deal with the complex end of large enterprises. IBM's acquisitions in this area may have been multiple but they have been shrewd. Ascential was the strongest technology of the ETL vendors, while DWL and especially Trigo were well respected. Ironically, IBM may need yet another leg to their strategy, since they too have yet to really address the "semantic integration" problem that is at the heart of MDM.


A big thank you to those of you who nominated this blog for the blog awards. I am pleased to say that it has been short-listed as one of the ten finalists for the independent tech blog of the year. I am so glad that you are finding the blog interesting. Please register your vote for the final round.

Tuesday, November 15, 2005

Size isn't everything

An October 2005 survey by IT Toolbox shows that, even amongst large companies, the size of the corporate data warehouse is, er, not that big. Out of 156 responses (40% US), only 12% had enterprise data warehouses larger than 4TB, 18% had ones between 1TB and 4TB, and the rest had data warehouses smaller than 1TB. Indeed 25% had data warehouses of less than half a terabyte. Admittedly only 20% of customers had just one data warehouse, with 26% having more than five, but these figures may seem odd when you hear about gigantic data warehouses in the trade press. Winter Group publish a carefully checked list of the 10 largest data warehouses in the world, and their 2005 survey shows the winner, at Yahoo, weighing in at 100TB. The tenth largest, however (at Nielsen), is 17TB, which shows that such mammoths are still a rarity.

Why are IT folks obsessed about this? I can recall speaking at a data warehouse conference a few years ago and speaker after speaker eagerly quoted the size of his data warehouse as some sort of badge of courage: "Well, you should see how big mine is...". Of course companies that sell hardware and disk storage love such things, but why is there such a big discrepancy between the behemoths in the Winter Group survey and the less than a terabyte brigade? The answer is quite simple: business to business companies don't have large transaction volumes. If you are a large retailer or a high-street bank, then you may have thousands of branches, each one contributing thousands of individual transactions a day. These add up, and constitute the vast majority of the volume in a data warehouse (perhaps 99% of the volume). The rest of the data is the pesky master data (or reference data, or dimension data - choose your jargon) such as "product", "customer", "location", "brand", "person", "time" etc that provides the context of these business transactions. You may have millions of transactions a day as a retailer, but how many different products do you stock? 80,000 for a convenience store chain? 300,000 for a department store? Certainly not tens of millions. Similarly McDonalds has 27,000 retail outlets, not millions. The same for organizational units, employees etc. One exception that can be very large is "customer" but again this is true only for business to consumer enterprises e.g. retailers or Telcos. Companies like Unilever are very large indeed, but primarily sell to other businesses, so the number of direct customers they deal with is measured in the many thousands, but not millions.

So B2B enterprises usually have quite small data warehouses in volume, even though they may have extremely sophisticated and complex master data e.g. elaborate customer segmentation or product or asset classification. One way to measure such complexity is by adding up the types of business entity in the data model e.g. each level of a product hierarchy might count as one "class of business entity" (CBE), "customer" as another. Some very large data warehouses in volume terms often have very simple business models to support, perhaps with 50 CBEs. On the other hand a marketing system for a company like BP may have 400 or more CBEs. This dimension of complexity is actually just as important as raw transaction size when looking at likely data warehouse performance. A data warehouse with 1TB of data but 50 CBEs may be a lot less demanding than one with 200GB of data but 350 CBEs (just think of all those database joins). Oddly, this complexity measure never seems to feature in league tables of data warehouse size, perhaps because it doesn't sell much disk storage. I feel a new league table coming on. Anyone out there got a model with more than 500 CBEs?

Data quality blues

A 2005 research note from Gartner says that more than 50% of data warehouse projects through 2007 will be either outright failures or will achieve only limited acceptance. This does not surprise me in the least. Data warehouse projects are under several kinds of unusual strain, on top of the normal problems that can beset any significant project. Data warehouses take in data from several separate sources (ERP, supply chain, CRM etc) and consolidate it. Consequently they are dependent upon both the quality of the data and the stability of those source systems: if any underlying source system undergoes a major structural change (e.g. a new general ledger structure or customer segmentation) then it will affect the warehouse. You might think that data quality was a minor problem these days with all those shiny new ERP and CRM systems, but you'd be wrong. In Kalido projects in the field we constantly encounter major data quality issues, including with data captured in ERP systems. Why is this?

An inherent problem is that systems typically capture information that is directly needed by the person entering the data, plus other things that are useful to someone else but not to that person. I remember doing some work in Malaysia and seeing a row of staff entering invoice data into a J.D. Edwards system. I was puzzled to see them carefully typing in a few fields of data, and then just crashing their fingers at random into the keyboard. After a while, they would resume normal typing. After seeing this a few times my curiosity got the better of me and I asked one of them what was going on. The person explained that there were about 40 fields that they were expected to enter, many of them unnecessary, and they could not move to the next screen without tabbing through each field in turn - unless they entered some gibberish in one of the main fields, at which point the system conveniently took them to the last field. So by typing nonsense data into a field that turned out to be quite relevant (but not to them) they could save lots of keystrokes.

Of course this is an extreme case, but have you ever filled out an on-line survey, got bored or frustrated because it asked for something you didn't have, and started answering any old thing just to get to the end? The point is that people care about data quality when they are going to get something back. You can be sure they enter their address correctly on a prize draw form. But in many IT systems people are asked to enter information that doesn't affect them, and human nature says that they will be less accurate with this than with something that matters to them directly. Some data quality issues can be dramatic. In the North Sea one oil company drilled through an existing pipe because, according to the system that recorded the co-ordinates of undersea pipes, it was not there: this merely cost a few million dollars to fix, and fortunately the pipe was not in active use that day or the consequences would have been much worse. Another company discovered that it was making no profit margin on a major brand in one market due to a pricing slip in its ERP system that had gone undetected for two years.

The reason data warehouses suffer so much from data quality issues is that they not only inherit the data problems of each source system they deal with, but, because they bring all the information together, the problems often only become visible at that point. The pricing problem above, for example, became apparent because the data warehouse showed zero gross margin on the brand; this was not visible inside the ERP system, since the margin calculation combined data from several systems. It is the data warehouse that shines light on such issues, yet it is often wrongly blamed when the project is delayed as a result.

Data quality is one major issue for which there is no magic solution. Data quality tools can help, but this is a people and process issue rather than a technology issue. Another reason data warehouse projects are perceived to fail is that they take a long time to build, and cost a lot to maintain. Since it takes 16 months to build an average data warehouse (according to a TDWI survey) it is not surprising that the business changes during that time. The only real way to address this is to use a packaged data warehouse solution, which takes less time to implement (typically less than 6 months for Kalido). Maintenance costs are another major problem, and here again there are modern design techniques that can improve the situation. See my earlier post "The data warehouse carousel".

It is only by making use of the most modern design approaches, iterative implementation approaches that show customers early results, and the most productive technologies that data warehouse project success rates will improve. There will always be projects that run into trouble due to poor project management, political issues and lack of customer commitment, but data warehouse projects at least need to stop making life harder for themselves than they need be.

Monday, November 14, 2005

Lies, damned lies, and Excel formulae

I made a discovery the other day. Not one of those "eureka" moments beloved of bathing Greeks, but something that prompted me to wonder about the accuracy of some of the figures we take for granted. We are so used to having Excel on every desktop that we trust it implicitly, and so when I needed to work out the standard deviation of some figures, I naturally turned to Excel. For those of you whose maths is rusty, standard deviation measures how spread out a set of numbers is. For example: the average of 1,3,5,7,9 is 5, and so is that of 3,4,5,6,7, but it can be seen that the latter sequence has its numbers more closely bunched. Standard deviation is just a mathematical measure of how close or otherwise that bunching is (in the examples above the standard deviation of the first set is 2.83 and of the second set 1.41, i.e. the second set is more closely bunched than the first).

In Excel to use a function you just type into a cell something like "=average(1,3,5,7,9)" and magically you get the answer (5 in this case). So, what could be easier to do than type in:
"=stdev(1,3,5,7,9)" and see the answer appear? The trouble is that the answer pops up as 3.16, not the 2.83 I was expecting. Just in case you doubt my ability to calculate a standard deviation, feel free to do it the old-fashioned way by hand, and you will see that you get 2.83, not 3.16. So what is going on? After digging around the Excel help and chatting to a mathematician friend to check I was not going completely mad, I discovered that there are actually two different standard deviation functions in Excel: one for when you want to measure the whole set of numbers you have, and one, with a slightly different formula, for when you want to estimate the spread of a large population from a sample of it. Now I may be getting a bit slow these days, but I did do a maths degree and yet this distinction had eluded me all these years, so I doubt I'm the only person out there unaware of the difference. If you were the person at Microsoft naming Excel functions, which would you expect people to think was the "normal" version: "STDEV", or "STDEVP", which is what they actually called the function that calculates the standard deviation of a whole population? I am guessing that not too many of us go "aha, I'll try =STDEVP, I expect that will be it".
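If you want to see the two formulas side by side without firing up Excel, Python's standard library makes the distinction explicit. A small illustration (not tied to Excel itself) that reproduces both answers:

```python
import statistics

first = [1, 3, 5, 7, 9]
second = [3, 4, 5, 6, 7]

# Population standard deviation (what Excel's STDEVP computes):
# squared deviations divided by n. These match the 2.83 and 1.41
# quoted above.
print(statistics.pstdev(first))   # 2.828...
print(statistics.pstdev(second))  # 1.414...

# Sample standard deviation (what Excel's STDEV computes): divide by
# n - 1 instead, to estimate the spread of a larger population from
# a sample. This is the 3.16 that =STDEV(1,3,5,7,9) returns.
print(statistics.stdev(first))    # 3.162...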

Now this may seem like a lot of fuss about an esoteric mathematical function, but be aware that standard deviation is one of the most commonly used statistical measures, applied to population samples, mechanical failure rates, delivery errors, temperatures, patient response rates, you name it. People take serious decisions based on statistics: which drug to put forward for clinical trials, traffic planning, machine maintenance and endless others; standard deviation is the most commonly used tool in "statistical process control", widespread in the manufacturing industry. Given that most of the modern world uses Excel, I find it pretty surprising that a sizeable proportion of it has been using the wrong standard deviation function for the last twenty years, all because some idiot in Seattle chose a "precise" name rather than the obvious name most of us would have chosen.

I suppose this is only what I should have expected from a product that thinks 1900 is a leap year. Try the formula "=DATE(1900,2,29)" and watch it happily display the 29th February 1900. As we should all be aware after the fuss over Y2K, 1900 is NOT a leap year (leap years are every four years, except centuries, which are not, except every fourth century, which is; so 1600 and 2000 are leap years, but not 1800 or 1900). The moral of this little story: don't take everything on trust!
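The Gregorian rule is easy enough to get right in a couple of lines; a small Python sketch for illustration:

```python
def is_leap(year):
    # Gregorian rule: every fourth year is a leap year, except
    # centuries, except every fourth century.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# 1600 and 2000 are leap years; 1800 and 1900 are not.
for y in (1600, 1800, 1900, 2000):
    print(y, is_leap(y))
```

(Excel's 1900 quirk is, famously, inherited deliberately from Lotus 1-2-3 for file compatibility, which is cold comfort if you are doing date arithmetic in early 1900.)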

Friday, November 11, 2005

"You want a system to do what now?"

Tony Lock writes an excellent article in this week's newsletter, highlighting the communication gap between IT departments and their customers. A new survey by Coleman-Parkes found that amongst 214 FTSE 750 organizations, only 18 percent held weekly meetings between business managers and the IT teams. The research also indicated that 31 percent of those surveyed claim they never or hardly ever have such meetings. In large corporate IT departments there can be a culture of avoiding contact with "users", who always seem to have strange and unreasonable demands that don't fit the IT department's perceptions. The atmosphere can become quite hostile if IT departments set themselves up as "consultancy" organizations that charge themselves out to their internal customers. The internal customers resent being forced to use an internal service that they often perceive as unresponsive, and can be outraged to find themselves being charged at rates similar to external service providers'. Some of this resentment is not entirely reasonable - those same customers are forced to use internal legal counsel and are charged through the nose, whether they like it or not. However there is a peculiar frustration among many business users with their IT departments that can boil over when discussing charge-back mechanisms and service level agreements.

Over-elaborate internal billing systems can cause unnecessary cost and frustration. I recall when I was at Exxon seeing an instructive project to review internal data centre charges. The existing system was extremely elaborate and charged based on mainframe usage, disk storage, communications costs and a whole raft of items. Most users didn't understand half the items on their bills, or played games to try and avoid hitting arbitrary pricing thresholds. None of this added one iota of revenue to Exxon. The project manager, a very gifted gentleman called Graham Nichols (on his way rapidly up the organization), successfully recommended replacing the entire system with a single charge once per year. This saved a few million pounds in administration and endless arguments, and people's tempers were much improved all round.

Perhaps some of the problem is that when an organization grows very large, it is difficult to keep perspective. Shell employed around 10,000 IT staff in the 1990s, directly or indirectly, so it is perhaps not surprising that the IT staff concentrated on their own internal targets and objectives, rather than troubling too much to align themselves with the objectives of the core energy business. At a time when the oil industry was struggling with oil prices heading down towards 10 dollars, and hence serious cost-cutting was going on all round, the internal IT group, living through the internet boom, was hiring to keep up with demand, e.g. dealing with the Y2K problem. With redundancies going on in engineering and marketing at the same time as a hiring boom in internal IT, tempers became frayed, to put it mildly.

Clearly senior internal IT staff do need to spend more time with their business customers, and find out how they can help them achieve their objectives. Moreover they need to communicate this throughout their organizations. How many internal IT staff know the top three business objectives of their company this year? Without even a vague idea of the goals that the business is pursuing, it is hardly surprising that business leaders become frustrated with internal IT groups. Those 31% of internal IT groups who never or hardly ever meet with their customers need to change this attitude or get used to living on a Bangalore salary in the future.

A big industry, but still a cottage industry

IDC today announced the results of their annual survey sizing the data warehousing market, which they put at USD 8.8 billion in 2004. The "access" part of the market (e.g. Business Objects, Cognos) was USD 3.3 billion; "data warehouse management tools" (which includes databases like Teradata, and data warehouse appliances) was USD 4.5 billion; and data warehouse generation software (which includes data quality) was sized at USD 1 billion. This was 12% growth over 2003, the fastest for years, and IDC expect compound annual growth of 9% for the next five years.

One feature of this analysis is how small the "data warehouse generation" part of the market is relative to databases and data access tools. It is in some ways curious how much emphasis has been placed on displaying data in pretty ways (the access market) and on the storage mechanism (the data warehouse management market) rather than on how to actually construct the source of the data that feeds these tools. This is because that central piece is still at the cottage-industry stage of custom-build. Indeed, with an overall market size of USD 35 billion (Ovum), it can be seen that the bulk of spending in this large market still goes to systems integrators. Only a few products live in the "data warehouse generation" space, e.g. SAP BW and Kalido (data quality tools should really be considered a separate sub-market). Hence the bulk of the industry is still locked in a "build" mentality, worrying about religious design wars (Inmon v Kimball), when one would have expected it to move to a "buy" mentality. This will inevitably happen, as it did with financial applications. Twenty or so years ago it was entirely normal to design and build a general ledger system; who would do that today? As large markets mature, applications gradually replace custom builds, but it is a slow process, as these figures show.

The average data warehouse costs USD 3 million to build (according to Gartner), and only a small fraction of that is the cost of software and hardware; the majority is people costs. It also takes 16 months to deliver (per a TDWI survey), which is an awfully long time for projects that are supposedly delivering critical management information. To take the example of Kalido, the same size of project takes less than 6 months instead of 16, so for that reason alone people will eventually come around to buying rather than building warehouses. Custom data warehouses also have very high maintenance costs, which is another reason to consider buy rather than build.

The rapid growth in the market should not be surprising. As companies have bedded down their ERP, supply chain and CRM investments it was surely inevitable that they would start to pay attention to exploiting the data captured within those core transaction systems. The diversity of those systems means that most large companies today still have great difficulty answering even simple questions ("who is my most profitable customer?", "what is the gross margin on product X in France v Canada?"), which causes senior management frustration. Indeed, a conversation I had at the CEFI conference this week with a gentleman from McKinsey was revealing. He explained that in recent conversations with CEOs, McKinsey were struck by how intensely frustrated CEOs were at the speed of response of their IT departments to business needs, above all in the area of management reporting. 16-month projects will not do any longer, but IT departments are still stuck in old delivery models that are not satisfying their business customers - the ones who actually pay their salaries.

Thursday, November 10, 2005

The decline of the trade show

Douglas Adams once remarked that mathematics is universal, with its rules being consistent across the universe with the exception of the numbers on an Italian waiter's billpad. To this he might have added the supposed attendee numbers at IT industry trade shows. Ever been to a show recently where the organizers claim 500 paying attendees, yet you can only count half that? Some trade show exhibitor sections in the last few years have become depressing affairs, with a thin trickle of conference attendees registering for prize draws, or in some cases just handing over their resumes at the booths. Vendors have been increasingly desperate to drum up attendance. At one trade show recently I was handed a brochure about a database appliance by a pretty girl. I asked her about the technology and she said "actually, I don't know anything about this - I'm a model". Is this what things have come to?

I had assumed that the malaise would have eased with the general modest recovery in the IT industry's fortunes, but this barely seems to have happened. I am unaware of any proper data on the subject, but anecdotally the punters seem to be staying away in droves. Where have they gone?

For one thing they are attending webinars. Enabled by modern technology such as Webex, webinars are a more targeted way of reaching people interested in your message. They are attractive to vendors because they are relatively cheap (a trade show exhibit can cost USD 10-30k in fees, plus travel and people costs) and you get people attending who are genuinely interested. Instead of a trickle of bored looking geeks in search of free giveaways, with the odd five minute conversation thrown in, at a webinar people log in and listen to you for perhaps an hour. The number of contacts can compare well also. At a regular trade show you might get (say) 40-50 contacts, but perhaps only 5-10 of these will be of any real level of interest. By contrast, at a webinar you have people who have bothered to take an hour of their time to listen to you. At Kalido we have run webinars with over 300 attendees, so it can be seen that this compares very favorably to trade shows.

The other forum that people still attend are user groups e.g. there were 2,000 attendees at last week's Business Objects user group. Customers still want to hear about product directions, meet other customers and get a bit of free education. While trade shows are not yet an endangered species, I wonder whether the rise of the webinar will gradually cast them in the role of the slide rule against the pocket calculator.

Monday, November 07, 2005

Elephants rarely dance

A recent Business Week article gave a good example of a small software company providing an important solution to the retailer Circuit City in the face of competition from industry giants. The dot-com madness made CIOs understandably wary of small software companies bearing gifts, yet it is important for these same CIOs to realize that they do their shareholders no favours by adopting an ultra-conservative "buy only from giants" policy. For a start, this option is by no means always safe. It is also a flawed strategy.

Industry giants rarely produce innovative software. For example, the founders of both Siebel and Salesforce were Oracle executives, but were unable to create what they knew the market wanted at Oracle itself. Large companies inevitably become less fast-moving as they grow, and frequently become more inwardly focused and stop listening to their customers, the very people who made them successful when they were small. In my years as a strategic technology planner I learned that the key to success in software portfolio planning was a twin approach: standardize on commodity infrastructure, yet encourage innovation above that layer. For example, it is pretty clear by now that the major relational databases (Oracle, DB2, SQL Server) are all functionally rich and basically work. Nobody uses more than a fraction of their features, and which one you choose is largely a matter of taste. However it is best if you can standardize on one of them, since you then get easy interoperability, and your IT staff build up skills in that technology that transfer when they switch departments. This is an example of a layer of infrastructure that has matured to the point where the benefits of standardizing outweigh a few features at the edges.

At the application layer, though, this is by no means the case, except perhaps in finance, where no one is really likely to produce a deeply innovative general ledger system any more. It is clearly not the case in marketing, sales and many other business areas, where exciting applications are still popping up and companies like Salesforce can radically change an existing application area. Here it makes no sense to try to second-guess the market, where evolution is still working its magic to see which technologies work best. Trying to standardize too quickly in an area that is still evolving will not only most likely make you look foolish when you get it wrong, but also miss real opportunities to take advantage of new and exciting offerings.

Giant software behemoths are not the place where innovation flourishes, and the further they get from their core area of competence, the less likely they are to succeed. As a strategic technology planner, your job is to solidify the core infrastructure but to enable your customers to take advantage of innovation in fast-moving or evolving areas. This state of affairs is not going to change in software, where fairly low barriers to entry enable innovation to be created without massive capital investment. Best of breed software does indeed live!

The fullness of time

Supposedly "timing is everything", yet analysis across time is a surprisingly neglected topic in many data warehouse implementations. If you are a marketer, it is clear that time is a critical issue: you want to be able to compare seasonal sales patterns, for example. A retailer may even be interested in the pattern of buying at different times of the day, and change stock layout in response. Yet in many data warehouse designs, time is an afterthought. For example in SAP BW you can only analyze a field for date/time reporting if you specify this up-front at implementation time, and it carries a performance penalty. Even this is an improvement on many custom-built warehouses, where data is not routinely date-stamped, so even basic reporting over time is impractical.

Advanced data warehouse technology should enable you not only to do simple time-based analysis like "last summer's sales v this summer's sales" but also to keep track of past business hierarchies. For example, you may want to see sales profitability before and after a reorganization, and so want to look at a whole year's data as if the reorg never happened, or as if it had always been in place. One major UK retailer has a whole team of staff who take historic data and manually edit a copy of it in order to make such like-for-like comparisons, yet this type of analysis should be something their data warehouse can provide automatically. An example of doing it right is Labatt, where the marketing team now has access to a full range of time-based analysis, enabling it to take more data-driven decisions.

Another sophisticated user of time-based analysis is Intelsat, who used it to improve their understanding of future satellite capacity. Satellite time is sold in blocks, usually in recurring contracts to news agencies such as CNN or the BBC e.g. "two hours every Friday at 16:00 GMT". Each of these contracts has a probability of being renewed, and of course there are also prospective contracts that salesmen are trying to land but which may or may not be inked. Hence working out the amount of satellite inventory actually available next Tuesday is a non-trivial task, involving analysis that was previously so awkward that it was only done occasionally. After implementing a data warehouse that inherently understands time-variance, Intelsat were able to identify no less than USD 150 million of additional capacity, and immediately sell USD 3 million of this: a handsome return on investment for a project that was live in just three months and cost less in total than even the immediate savings.
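The shape of the underlying calculation can be illustrated with a toy model (all customers, hours and probabilities below are invented, not Intelsat's actual figures): committed hours are contracted hours weighted by renewal probability, plus pipeline hours weighted by the chance of closing, and what remains is the expected free inventory.

```python
# Toy model of expected free satellite capacity for one transponder-week.
# All names and numbers are illustrative only.
total_hours = 168.0  # hours in one week

existing_contracts = [
    # (customer, hours_per_week, probability_of_renewal)
    ("NewsAgencyA", 2.0, 0.95),
    ("NewsAgencyB", 4.0, 0.80),
]

pipeline = [
    # (prospect, hours_per_week, probability_of_closing)
    ("ProspectX", 3.0, 0.50),
]

# Expected committed hours: weight each block of hours by its probability.
expected_committed = sum(h * p for _, h, p in existing_contracts)
expected_committed += sum(h * p for _, h, p in pipeline)

expected_available = total_hours - expected_committed
print(f"Expected free capacity: {expected_available:.1f} hours")
```

Trivial on three rows; the difficulty Intelsat faced comes from running this over thousands of recurring contracts whose terms, renewal dates and probabilities all change over time, which is precisely where a time-variant warehouse earns its keep.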

If your data warehouse can't automatically give you sophisticated time-based analysis then you should look at best-practice cases like this. Make time to do it.

Thursday, November 03, 2005

Look before you leap, but look in the right place

How should customers go about "due diligence" prior to buying software? Certainly enterprise software is a major, multi-year commitment, and the overall costs of it will be many times the actual purchase price, so it is worth looking carefully before you leap. However many companies seem to look in the wrong place.

Firstly, there is an assumption that if you buy software from an industry behemoth then this is "safe", whereas buying from a smaller vendor is inherently more dangerous. This is not necessarily the case. While the actual finances of an industry giant are rarely in doubt, the question to ask is not the size of the balance sheet, but how committed they are to this particular product. When I was working at Exxon in the 1980s we discovered that the "strategic" 4GL called ADF that IBM sold was to be dropped in favor of another tool they had built called CSP. The fact that we were a big oil company and they were ultra-safe IBM did not help us one bit. Migration? We could hire their consultants to help us rewrite all the applications: thanks a lot. Or consider all the technologies that Oracle has acquired over the years and quietly dropped when they failed to perform. When looking at products from large vendors I believe the key to risk assessment is to see how far the vendor is straying from its core competence. For example, Oracle is hardly likely to abandon its core database product, which still accounts for a huge share of its profits, but just how committed will it be to something a long way from this core area of expertise? SAP has come to dominate the ERP space, but its execution on products away from its core competence has been shaky, to say the least. The most recent example was its dropping of its MDM offering after two years, now promising a new product based around an acquisition. Cold comfort to those loyal customers who pioneered SAP MDM thinking that it was the "safe" choice. Vendors tend to misfire the further they stray from their core area of business, and customers should factor this into their risk assessment.

Assuming the software you are interested in is not from an industry giant, how then do you assess the risks? Small software vendors always dread the following sentence: "We like your software but we will need to bring in our financial due diligence team before we go any further". This is partly because large corporations frequently lack experience in understanding how software companies are financed, and end up asking the wrong questions. Financial analysts used to dealing with large, stable public companies are often surprised at how small, and how apparently shaky, the balance sheets of privately held software companies are. This is partly because most are venture funded, and venture capital firms are careful to dole out their capital as their portfolio companies need it, rather than investing cash just to bolster balance sheets.

Before looking at the right questions to ask, here is a true story to illustrate the wrong way. When working at Shell I was asked to look at a small company called Dodge Software (the name in itself did not inspire confidence), a general ledger vendor with some innovative technology that a subsidiary of Shell had already purchased. Before making a deeper commitment it was decided that due diligence should be done, so I was teamed up with a banking type with a very posh accent to look at the company. The company was very reluctant to share its accounts, but as it wanted the business it had little choice. There was a clear problem in terms of cash, with the company having less than six months of cash left at the rate it was burning. The finance analyst called the company's VCs, who, hardly surprisingly, sang its praises and talked up its rosy future: well, what else were they going to do? The banker then met with the company CFO, who assured him that everything was fine and that further funds could be raised as needed. This actually comforted the banker, but not me, because I could see that, while the company had built up 16 customers with good names, there was no momentum: there were few recent names on the list. This meant that the company was in fact stalling, and so would very likely struggle to raise another round of capital. Even if it did, why would the market situation improve for this company? It was pig-headedly selling a best-of-breed general ledger package at a time when broad integrated finance packages were all the rage, and it seemed unlikely to change this mindset. Consequently I wrote a negative assessment and the banker a guarded but positive one. The company duly folded about six months later, unable to raise new money.

The key here is not to focus entirely on the cash position of the vendor. If the company is growing fast and acquiring prestigious customers at a steady clip then it will very likely be able to raise more cash when it needs it. However there is a saying in venture capital: "raise money when you don't need it, because it is hard when you do need it". When things are going well VCs flock around, but when there are problems they stay away in droves.

The message is that there are certainly risks in buying software, but a risk assessment should be carried out even if buying from the largest vendors. For smaller vendors their market momentum is critical, and needs to be assessed just as much as their cash reserves.

Wednesday, November 02, 2005

Awash with appliances

It is interesting how success attracts competition. Teradata have built up a billion dollar business from selling high end hardware and proprietary database technology to handle extremely large transaction-based data warehouses, such as those found in retail, telcos and retail banking. Netezza has done an excellent job of raising its profile as a start-up in clear competition to Teradata, while there are now even newer start-ups such as DATAllegro (with a new offering out today) and Calpont, offering data warehouse appliances in competition to both. This is a healthy sign in an industry that is undeniably very large (business intelligence is variously estimated at USD 25-35 billion in size, though the vast bulk of this is consulting services) yet has remained extremely fragmented in terms of vendors. Software vendors, other than the DBMS vendors, are few and far between in the data warehouse space, since the industry is mostly locked into a custom-build mentality, with Kimball v Inmon design religion wars being the order of the day. SAP have, after some false starts, brought their BW product to a wide (SAP) audience but, other than Kalido, there are few data warehouse software companies. Of course there are ETL vendors such as Informatica and Ascential (now bought by IBM) and the reporting tools of Business Objects, Cognos and Hyperion, but the data warehouse itself has lacked much in the way of software automation.

Teradata have succeeded despite an apparently major obstacle: the highly proprietary nature of their offering. Large companies' CIO departments generally loathe proprietary infrastructure, especially when they have just spent years trying to (just about) standardize on a particular database or hardware platform, so it is an uphill struggle for the appliance vendors. Red Brick briefly did well selling a database tuned for data warehouse applications, but eventually it could not shake off the idea that Oracle or IBM could just add a "star join" feature to their products and make it redundant. Hence it is to Teradata's credit that they have maintained clear blue water between themselves and Oracle/IBM/Microsoft at the high end of large data warehouses. This in turn has created a market large enough to attract new entrants such as Netezza and DATAllegro, who can offer an easy to understand "like Teradata, but cheaper" message to customers who have giant transaction datasets to analyze but balk at Teradata's high price tag and opaque pricing when it comes to maintenance payments. It will be very interesting to see whether IT departments will turn a blind eye to the proprietary nature of these offerings (after all, this objection was essentially what killed off object databases) in the way they have with Teradata, though rumor has it that Netezza at least is making good early progress.

Of course only a small subset of data warehouses have the kind of volumes and processing requirements that demand such technology. A TDWI survey showed Teradata at just 3% market penetration of deployed data warehouse databases, but of course this is a very attractive 3%, with typical deals in the million dollar range. Teradata has managed to overcome the proprietary stigma that bedeviled object databases in the 1990s and carved out an attractive high end niche that Oracle et al seem unable to really compete with. Its challenge now is growth, with competitors like Netezza nibbling into its margins and general-purpose databases growing more powerful with each release. However the boom in raw data e.g. from RFID seems likely to mean there is plenty of demand yet for raw power.

Tuesday, November 01, 2005

Information Management Enlightenment

Gartner have recently been using the term "enterprise information management" as a blanket term to describe the technology and processes around a company's efforts to control and best use its information assets. The term extends beyond structured data into text, and even to digital content such as movies or music. As they identify, a key to making progress in such a potentially monumental task is to resolve semantic inconsistencies across the technical boundaries. The problem will be familiar to anyone who has worked in a large company. A product code used in the ERP system has a different code in the CRM system, and a different one again in the manufacturing system. There are good reasons why such differences have emerged. If we take a physical product, then a manufacturing group will care about the materials that go into that product, its manufacturing process, and perhaps the health and safety information associated with it. From a distribution perspective, its dimensions are important e.g. how many will fit in a container. From a marketing viewpoint it is important to understand the branding used, the packaging (perhaps the product is marketed in different ways in different countries) and the pricing. Each business unit cares about certain aspects of the product, but has limited direct interest in other aspects, so it is hardly surprising that the ways in which the product is classified differ depending on whether you have a manufacturing, distribution or marketing viewpoint. For example, even something as familiar as a Big Mac actually has quite different recipes in different countries; in some cases it is no longer even made from beef (e.g. in India, where the cow is sacred). A branded automotive lubricant will have quite different technical specifications if you buy a can of it in a hot country like Vietnam than if you buy apparently the same product in, say, Iceland.
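One minimal way to picture the reconciliation this implies is a cross-reference from each system's local product code to a single enterprise-wide ("golden") record. The sketch below uses entirely hypothetical system names and codes:

```python
# Hypothetical cross-reference: the same physical product carries a
# different identifier in the ERP, CRM and manufacturing systems.
cross_reference = {
    # golden_id: {system: local_code}
    "PROD-0001": {"erp": "MAT-10045", "crm": "SKU-889", "mfg": "BOM-7712"},
    "PROD-0002": {"erp": "MAT-10046", "crm": "SKU-890", "mfg": "BOM-7713"},
}

def to_golden(system, local_code):
    """Translate a system-local product code to the enterprise id,
    or return None if the code is unmapped."""
    for golden_id, codes in cross_reference.items():
        if codes.get(system) == local_code:
            return golden_id
    return None

print(to_golden("crm", "SKU-889"))  # -> PROD-0001
```

The table itself is the easy part; the hard part, as the article argues, is the business process of agreeing what the golden record means and keeping the mapping current as products, brands and organizations change.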

These different perspectives cause a complex web of differing classifications and semantics to grow up around products in a large enterprise, and it is a similar story for other terms like "customer", where again the key points of interest vary dramatically depending on whether you are selling to that customer, trying to deliver a consignment to them, or trying to collect a payment from them. This is not just of academic interest: according to a Reuters survey, 30% of operational errors (e.g. incorrect deliveries) can be traced back to poor quality data. Anyone who has ever had the joy of trying to change their address on a bank or savings account will be familiar with the issue that, at many banks, your details do not live in just one system.

Trying to manage the various types of data in a large company is a mammoth task, and one which is on an uncertain footing since brands and customer details do not stay constant forever. This underlying pattern of change means that initiatives which seek to standardize (say) product codes "once and for all" are doomed to failure, because the things themselves are changing. However progress is possible. The first stage is to gain an understanding of the data across the company, then to describe the processes used to update this master data, and finally to bring automation to these processes. There is no single technology silver bullet, since business processes are just as important as integration technology, but a number of technologies do help matters e.g. data quality tools to help identify issues, emerging master data management products to assist with process automation, data warehouse technology to help understand and classify reference data, and EAI technology to actually link up and automate processes once they are under control. "Think big, start small" is the mantra, starting with a manageable scope and going through this process: identify data -> capture the processes -> automate the processes. Modern technologies that are better able to deal with change, along with universal access to the internet and so to applications that can automate workflow, are starting to make it possible to begin this enterprise information management journey. As Confucius said: "A journey of a thousand miles begins with a single step", and companies can set about this journey with more confidence than ever before.