Månadjuli 2010

Business Intelligence: Sometimes a problematic term

I often find myself in between the world of military language and the completely different language used in the information technology domain. Naturally it didn’t take long before I understood that term mapping or translation was the only way around it and that I often can act like a bridge in discussions. Understanding that when one side says one thing it needs to be translated or explained to make sense in the other domain.

Being an intelligence officer the term Business Intelligence is of course extremely problematic. The CIA has a good article that dives into the importance of defining intelligence but also some of the problems. In short I think the definition used in the Department of Defense (DoD) Dictionary of Miltary and Associated Terms can illustrate the core components:

The product resulting from the collection, processing, integration, evaluation, analysis, and interpretation of available information concerning foreign nations, hostile or potentially hostile forces or elements, or areas of actual or potential operations. The term is also applied to the activity which results in the product and to the organizations engaged in such activity (p.234).

The important thing is that in order to be intelligence (in my area of work) it both has to gone through some sort of processing and analysis AND only cover things foreign – that is information of a certain category.

When I first encountered the term business intelligence at the University of Lund in southern Sweden it then represented activities done in a commercial corporation to analyse the market and competitor. It almost sounded like a way to take the methods and procedures from military intelligence and just apply it in a corporate environment. Still, it was not at all focused on structured data gathering and statistics/data mining.

So when speaking about Business Intelligence (BI) in a military of governmental context it can often cause some confusion. From an IT-perspective it involves a set of technical products doing Extract-Transform-Load, Data Warehousing as well as the products in the front-end used by analysts to query and visualise the data. Here comes the first issue of a more philophical issue when seeing this in the light of the definition of intelligence above. As long as the main output is to gather data and visualising it using Enterprise Reporting or Dashboards directly to the end user it ends up in a grey area whether or not I would consider that something that is processed. In that use case Business Intelligence sometimes claims to be more (in terms of analytical ambitions) than a person with an Intelligence-background would expect.

Ok, so just displaying some data is not the same thing as doing indepth analysis of the data and use statistical and data mining technology to find patterns, correlations and trends. One of the major players in the market, SAS Institute, has seen exactly that and has tried to market what they can offer as something more than ”just” Business Intelligence by renaming it to Business Analytics. That means that the idea is to achieve ”proactive, predictive, and fact-based decision-making” where the important word is predictive I believe. That means that Business Analytics claims to not just visualise historic data but also claim to make predictions into the future.

An article from BeyeNETWORK also highlights the problematic nature of the term business intelligene because it is often so connected with data warehousing technology and more importantly that only part of an organisation’s information is structured data stored in a data warehouse. Coming from the ECM-domain I completely agree but it says something about the problems of thinking that BI covers both all data we need to do something with but also that BI is all we need to support decision-makers. The article also discusses what analysis and analytics really mean. Looking at Wikipedia it says this about data analysis:

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.

The question is then what the difference is between analysis and analytics. The word business is in these terms also and that is because a common application of business intelligence is around an ability to measure the performance through an organisation through processess that are being automated and therefore to a larger degree measurable. The BeyeNETWORK article suggests the following definition of business analytics:

“Business analysis is the process of analyzing trusted data with the goal of highlighting useful information, supporting decision making, suggesting solutions to business problems, and improving business processes. A business intelligence environment helps organizations and business users move from manual to automated business analysis. Important results from business analysis include historical, current and predictive metrics, and indicators of business performance. These results are often called analytics.”

Looking at what the suite of products that is covered under the BI-umbrella that approach downplays that these tools and methods have applications beyond process optimization. In law enforcement, intelligence, pharmaceutical and other applications there is huge potential to use these technologies to not only understand and optimize the internal processes but more importantly the world around them that they are trying to understand. Seeing patterns and trends in crime rates over time and geography, using data mining and statistics to improve understanding of a conflict area or understanding the results of years of scientific experiments. Sure there are toolsets that is marketed more along words like statistics for use in economics and political science but those applications can really use the capabilities of the BI-platform rather than something run on an individual researchers notebook.

In this article from Forbes it seems that IBM is also using business analytics instead of business intelligence to move from more simple dashboard visualizations towards predictive analytics. This can of course be related to the IBM acquisition of SPPS which is focused on that area of work.

From the book written by Davenport and Harris, 2007

However, the notion of neither Business Intelligence nor Business Analytics says anything about what kind of data that is actually being displayed or analyzed. From a military intelligence perspective it means that BI/BA-tools/methods are just one out of many analytical methods employed on data describing ”things foreign”.

In my experience I have seen that misunderstandings can come from the other end as well. Consider a military intelligence branch using…here it comes BI-software…to analyse incoming reports. From an outsider’s perspective it can of course seem like what makes their activity into (military) intelligence is that they use some form of BI-tools and then present graphs, charts and statistical results to the end user. Resulting from that I have heard over and over again that people believe that we should also ”conduct intelligence” in for instance our own logistics systems to uncover trends, patterns and correlations. That is wrong because an intelligence specialists are both skilled in analytics methods (in this case BI) and the area or subject they are studying. However, since these tools are called Business Intelligence the risk for confusion is high of course just because of the Intelligence word in there. What that person means is of course that it seems like BI/BA-tools can be useful in analysing logistics data as well as data of ”things foreign”. A person doing analysis of logistics should of course be a logistics expert rather than an expert in insurgency activities in failed states.

So lets say that what we currently know as the BI-market evolves even more and really assumes a claim to be predictive. A logical argument on the executive level to argue that the investment must provide something more than just self-serve dashboards. From a military intelligence perspective that becomes problematic since all those activities does not need to be predictive. In fact it can be very dangerous if someone is let to believe that everything can be predictive in contemporary complex and dynamic conflict environment. The smart intelligence officer rather need to understand when to use predictive BI/BA and when she or he definitely should not.

So Business Intelligence is a problematic term because:

  • It is a very wide term for both a set of software products and a set of methods
  • It is closely related to data warehousing technology
  • It includes the term intelligence which suggests doing something more than just showing data
  • Military Intelligence only covers ”things foreign”.
  • The move towards expecting prediction (by renaming it to Business Analytics) is logical but dangerous in a military domain.
  • BI still can be a term for open source analysis of competitors in commercial companies.

I am not an native English speaker but I do argue that we must be careful to use such as strong word as intelligence when it is really justifiable. Of course it is still late for that, but still worth reflecting on.

Enhanced by Zemanta

EMC & Greenplum: Why it can be important for Documentum and ECM

It's the logotype of Greenplum
Image via Wikipedia

Recently EMC announced it was acquiring the company Greenplum which many people interpret as EMC is putting more emphasis on the software side of the house. Greenplum is a company that focuses on data warehousing technology for the very large datasets that is called ”big data” applications where the most public examples are Google, FaceBook, Twitter and such. Immediate reactions to this move from ECM is of course it is a sign of market consolidation and a desire play among the largest players like Oracle/Sun, IBM and HP by being able to offer a more complete hardware/software combo stack to its customers. Oracle/Sun of course has its Exadata machine as an appliance-based model to get data warehousing capability. Chuck Hollis comments on this move by hightlighting how it is a logic move that fits nicely both with EMC storage techonology but also of course the virtualisation technology coming out of VMWare. To highlight the importance EMC will create a new Data Computing Product Division out of Greenplum. As a side note I think it is better to keep the old name to keep the ”feeling” around the product just as Documentum is a better name than the previous Content Management & Archiving Division. After an initial glance of Greenplum it seems to be an innovative company that can solve business problems where established big RDBM vendors does not seem to be able to scale enough.

With my obvious focus is on Enterprise Content Management I would like to reflect how I think or maybe hope this move will matter to that area of business. In our project we started looking deeper into the data warehousing and business intelligence issues in January this year. Before we had our focus in implementing a service-oriented architecture with Documentum as a core component. We already knew that in order to meet our challenges around advanced information management there was a need to use AND integrate different kind of products to solve different business needs. ECM for the unstructured content, Enterprise Search to provide a more advanced search infrastructure, GIS-technology to handle maps and all spatial visualisation and so on. Accepting that there is no silver bullet but instead try to use the right tool for the right problem and let each vendor do what it is best at.

Replicate data but stored differently for different use cases
SOA-fanatics has a tendency to want very elegant solutions where everything is a service and all pieces of information is requested as needed. That works fine for more steady near realtime solutions where the assumption is that there is a small piece of information needed in each moment. However, it breaks down when you get larger sets of data that is needed to do longer term analysis, something which is fairly common for intelligence in support of military operations. If each analyst requests all that data over a SOAP-interface it does not scale well and the specialised tool that each analyst needs is not used to its full potential. The solution to this is to accept that the same data needs to be replicated in the architecture for performance reasons sometimes as a cache – common for GIS-solution to get responsive maps. However, there is often a need for different storage and information models depending on the use case. A massive audit trail stored in an OLTP-system based on a SQL-database like Documentum will grow big and accessing it for analysis can be cumbersome. The result is that the whole system can be slowed down just because of one analysis. Instead we quickly understood the need for a more BI-optmized information model to be able to do massive user behaviour analytics with acceptable performance. It is in the end a usability issue I think. Hence the need for a data warehouse to offload data form an ECM-system like Documentum. In fact it not only applies for the audit trail part of the database but also makes up for excellent content analytics by analysing the sum of all metadata on the actual content objects. The audit trail reveals the user interaction and behaviour while the content analytics parts gives a helicopter perspective on what kind of data is stored in the platform. Together the joined information provide quite powerful content/information and social analytics.

Add a DW-store to Documentum?
The technology coming from X-hive now has become both the stand-alone XML database xDB but also of course the Documentum XML Store that sits beside the File Store and the relational database manager. That provides a choice to store information as a document/file in the file store, as structured information in the SQL-database or as XML Documents in the XML Store. Depending on use case we have the choice to choose the optimal storage together with different ways of accessing it. There are some remarkable performance numbers looking at running Xqueries on XML Documents in the XML Store as being presented at EMC World 2010. Without knowing how it makes any sense from an architecture perspective I think it would be interesting to have a Data Warehouse Store as yet another component of the Documentum platform. To some degree it is already in there within the Business Process Management components where the Business Activity Monitor in reality is a data warehouse for process analytics. Analysis is off-loaded from the SQL-database and the information is stored in a different way to power the dashboards in Taskspace.

Other potential pieces in the puzzle for EMC
I realize that Greenplum technology is mainly about scalability and big data applications but to me it would make sense to also use the technology just as xDb in Documentum to become a data warehousing store for the platform. A store focused on taking care of the structured data in a coherent platform with the unstructured that is already in there. Of course it would need a good front-end to enable using the data in the warehouse for viualisation, statistics and data mining. Interestingly Rob Karel has an interesting take on that in hist blog post. During EMC World 2010 EMC announced a partnership with Informatica around Master Data Management (MDM) and Information Lifecycle Management (ILM) which also was a move towards the structured data area. Rob Karel suggest that Informatica could be the next logical acquisition for EMC althought there seem to be more potential buyers for them. Finally he suggests picking up TIBCO both to strengthen EMCs BPM offering but of course also to get access to the Spotfire data visualisation, statistics and data mining platform.

We have recently started working with Spotfire to see how we can use their easy-to-use technology to provide visualisations of content and audit trail data in Documentum. So far we are amazed over how powerful it is but still very easy to use. In a matter of days we have even been able to create a statistics server powered visualisation showing likelyhood of pairs of document being accessed together. Spotfire can then be used to replace Documentum Reporting Services and the BAM solution in Taskspace. Their server components are made in Java but the GUI is based on .Net which is somewhat a limitation but maybe something EMC can live with on the GUI-side. The Spotfire Web Player runs fine on Macs with Safari at least.

An opportunity to create great Social Analytics based on ECM
I hope the newly created Information Intelligence Group (IIG) at EMC sees this opportunity and can convince the management at EMC that there are these synergies except going for the the expanding big data and cloud computing market that is on the rise. In the booming Enterprise 2.0 market upcomers like Jive Software have added Social Analytics to their offering. Powering Centerstage with real enterprise class BI is one way of staying at the front of competitors with much less depth in their platform from an ECM perspective. Less advanced social analytics solutions based on dashboards will probably satisfy the market for  while but I agree with James Kobielus that there will be a need to analysts in the loop and these analysts expect more capable BI-tools just like Spotfire. It resonates well with our conceptual development which suggests that a serious approach to advanced information managements requires some specialists focusing on governing and facilitating the information in the larger enterprise. It is not something I would leave for the IT-department, it is part of the business side but with the focus on information rather than the information technology.

Enhanced by Zemanta