Month: April 2010

Dave Kellogg on Palantir

I recently began reading the blog written by Dave Kellogg, the CEO of Mark Logic, a company devoted to XML-based content management. I think I came to notice them when I discovered what cool technology EMC got when it bought X-hive, which has now become Documentum xDb/XML Store. Mark Logic and X-hive were of course competitors in the XML database market. In a recent blog post he reflects on the Palantir product after attending their Government Conference.

The main scope of his blog post is different business models for a startup. That is not my area of expertise and I don’t have any particular opinion on it, although I tend to agree, and it was interesting to read his reflections on how other companies such as Oracle (yet another competitor to Mark Logic and xDb) have approached this.

Instead, my thinking is based on his analysis of the product that Palantir offers and how that technology relates to other technology. I think most people (including Kellogg) mainly view Palantir as a visualisation tool, because you see all these nice graphs, bars, timelines and maps displaying information. What they tend to forget is that there is a huge difference between a tool that ONLY does visualisation and one that actually lets you modify the data (or rather the contextual data around it, such as metadata and relations) within those perspectives. There are many tools around for Social Network Analysis, for instance, but many of them assume that you already have databases full of data just waiting to be visualised and explored. Nothing new here. This is also what many people use Business Intelligence toolkits for: accessing data that is already in the warehouse, although the effort of getting it there from transaction-oriented systems (as in retail) is not small in any way. However, the analyst using these visualisation-heavy toolkits accesses data read-only and only adds analysis on top of data that is already structured.

Here is why Palantir is different. It provides access to raw data such as police reports, military reports and open source data, most of it in unstructured or semi-structured form. When it comes into the system it is not viewable in all these fancy visualisation windows Palantir has. Instead, the whole system rests on a collaborative process where people perform basic analysis, which includes manual annotation of words in reports. This digital marker pen allows users to create database objects or connect to existing ones. Sure, this is supported by automatic features such as entity extraction, but if you care about data quality you do not dare to run them fully automatically. After all this is done you can start exploring the annotated data and the linkages between objects.
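To make that workflow a bit more concrete, here is a minimal sketch of the kind of data model implied above: annotations pointing from spans of report text to entity objects, and relations stored between those objects so they can later be explored in graphs and timelines. The class and field names are my own illustration, not Palantir's actual API.

```python
# Hypothetical sketch: an analyst marks up a span of text in a raw report,
# either creating a new entity object or linking to an existing one, and
# relations between entities are stored separately for later exploration.
from dataclasses import dataclass, field

@dataclass
class Entity:
    entity_id: str
    entity_type: str          # e.g. "person", "vehicle", "location"
    name: str

@dataclass
class Annotation:
    report_id: str
    span: tuple               # (start, end) character offsets in the report text
    entity_id: str            # the object this "marker pen" stroke points to

@dataclass
class Relation:
    source_id: str
    target_id: str
    relation_type: str        # e.g. "seen_with", "owns", "located_at"

@dataclass
class Repository:
    entities: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def annotate(self, report_id, span, entity):
        """Create the entity if it is new, then record the annotation."""
        self.entities.setdefault(entity.entity_id, entity)
        self.annotations.append(Annotation(report_id, span, entity.entity_id))

    def link(self, source_id, target_id, relation_type):
        self.relations.append(Relation(source_id, target_id, relation_type))

# Usage: mark a person and a vehicle in a police report and link them.
repo = Repository()
repo.annotate("report-42", (10, 18), Entity("p1", "person", "John Doe"))
repo.annotate("report-42", (55, 62), Entity("v1", "vehicle", "ABC 123"))
repo.link("p1", "v1", "owns")
```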

However, I do agree with Dave Kellogg that if people think BI is hard, this is harder. The main reason is that you have to have a method or process to do this kind of work. There are no free lunches, and no point in dreaming about full automation here. People also need training and the right mindset to be able to work efficiently. Having played around with TIBCO Spotfire lately, I feel that there is a choice between integrated solutions like Palantir, which has features from many software areas (BI, GIS, ECM, search, etc.), and dedicated toolkits with your own integration. Powerful BI with data mining is best done in BI systems, but those will probably never provide the integration between features that vendors like Palantir offer. An open architecture based on SOA can probably make that integration easier in many ways.

Why iPhone OS (iPad) is ECM…

I like Twitter. It exposes me to a lot of interesting thoughts from interesting and smart people that I follow. Today I read a post called Why the iPad Matters – It’s the Beginning of the End by Carl Frappaolo. It talks a lot about why the iPad brings a new promise for content delivery – a complete digital chain. It made me think about one of the things that is unique about the iPod/iPhone/iPad: the lack of a folder-based file system exposed to users. Surprisingly (maybe), it is that very absence that makes the whole user experience much better.

So how does this relate to ECM then? Well, I guess many of us ECM evangelists (or “Ninjas”, as I heard today) have been in endless meetings and briefings explaining the value of metadata and the whole “context infrastructure” around each object in an ECM system that can hold fine-grained permissions, lifecycles, processes, renditions and so forth. I have even found myself explaining the ECM concept using iTunes as an analogy. You tag the songs with metadata and access them through playlists, which are in essence virtual folders where each song can be viewable in many playlists. That is the same concept as the “Show in folder” flag in Documentum. Metadata can even power Smart Playlists, which are in essence just saved search queries – something we have added as a customization in Documentum Digital Asset Manager (DAM). So in essence the iTunes Library (should we call it a repository? 🙂) is a light version of an ECM system. Before continuing, I really wonder why I have to customize Documentum to get the GUI features that iTunes provides…?
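As a rough illustration of that analogy, here is a minimal sketch in which a playlist is just a named list of references (a virtual folder, so one song can appear in many playlists) and a Smart Playlist is a saved query evaluated against metadata. The data and names are hypothetical and have nothing to do with iTunes’ or Documentum’s actual APIs.

```python
# Hypothetical sketch: items carry metadata, playlists hold references (not
# copies), and smart playlists are saved queries re-evaluated on demand.
songs = [
    {"id": 1, "title": "Track A", "genre": "Jazz", "rating": 5, "year": 2001},
    {"id": 2, "title": "Track B", "genre": "Rock", "rating": 3, "year": 2009},
    {"id": 3, "title": "Track C", "genre": "Jazz", "rating": 4, "year": 1998},
]

# A plain playlist: a virtual folder holding references to songs.
playlists = {"Favourites": [1, 3], "Workout": [2, 3]}

# A smart playlist: a saved search query over metadata.
smart_playlists = {
    "Top Jazz": lambda s: s["genre"] == "Jazz" and s["rating"] >= 4,
}

def resolve(name):
    """Return the songs in a playlist, whether static or query-based."""
    if name in playlists:
        return [s for s in songs if s["id"] in playlists[name]]
    return [s for s in songs if smart_playlists[name](s)]

print([s["title"] for s in resolve("Favourites")])  # ['Track A', 'Track C']
print([s["title"] for s in resolve("Top Jazz")])    # ['Track A', 'Track C']
```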

So iTunes abstracts away the folder-based file system on a Mac or Windows PC, but as long as you are using Mac OS X or Windows the file system is still there, right? Some people even get really frustrated by iTunes and just can’t get their heads around the fact that there is no need to move files around manually when syncing them to iPhone OS-powered devices. And here comes the beauty: on these devices there is no folder-based file system to access. Just the iPod App for music, the Photos App for photos and so forth. All your content is suddenly displayed in context and filtered based on metadata and that App’s specific usage.

To some degree that means that iPhone OS-based devices not only can make content delivery digital, they can also provide a much better user interface powered by all these ECM features that we love (and have a hard time explaining). Suddenly we have an information flow entirely based on metadata instead of folder names and file names. Maybe that will make ECM not only fun but also able to answer the dreaded “What’s in it for me?” question much more quickly.

Now, can someone quickly write an iPad App for Documentum so I can make my point? 🙂 It will be a killer app, believe me!

CPU, Cores and software licenses

In an article in ComputerWorld there is a good discussion of license models for different software vendors. There seems to be a mix of per-socket pricing and some notion of a CPU, where each CPU license corresponds to a number of processor cores. In EMC’s case, for instance, a CPU license corresponds to 2 cores, and Oracle has a similar model. The number of processor cores is steadily increasing, and soon 6-8 cores per socket will be common on server hardware. I agree with the article that these models need some kind of revision. This is especially true if you sign longer contracts, where this development can lead to some interesting issues. Server hardware needs to be replaced sooner or later for power, storage or simply performance reasons. It is not uncommon that the idea is to get fewer but more powerful servers in order to save power and cooling.
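To make the arithmetic concrete, here is a back-of-the-envelope sketch assuming the “one CPU license covers 2 cores” model mentioned above; the server configurations are purely illustrative.

```python
import math

# Assumed model: one CPU license covers 2 cores (as described for EMC, with
# Oracle having a similar scheme). Server configs below are illustrative only.
CORES_PER_CPU_LICENSE = 2

def cpu_licenses_needed(sockets, cores_per_socket):
    total_cores = sockets * cores_per_socket
    return math.ceil(total_cores / CORES_PER_CPU_LICENSE)

# An older dual-socket server with dual-core CPUs...
print(cpu_licenses_needed(sockets=2, cores_per_socket=2))  # 2 licenses
# ...replaced by a dual-socket server with 8-core CPUs.
print(cpu_licenses_needed(sockets=2, cores_per_socket=8))  # 8 licenses
```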

The interesting effect then is that even if you can consolidate software applications onto fewer machines, each of them can overstep its license in terms of server cores. What about virtualisation then? Well, that is of course also the future, so computing power can be load-balanced between applications more easily. However, that means that the license model must allow for using virtualisation to throttle down to any number of cores per licensed application. In Oracle’s case, again, that usually means a requirement to run their own virtualisation product even if you have a VMware investment.