Category: Search Technologies

Getting social with EMC Documentum?

EMC World 2014 and Momentum, the sub-conference for EMC’s Information Intelligence Group, are hours from kicking off here in Las Vegas. Last week’s announcement from Cisco that it is dropping its WebEx Social (formerly Cisco Quad) product and instead partnering with Jive Software makes me want to examine this space in relation to EMC Documentum. We are of course talking about Enterprise Social or Enterprise 2.0. When it comes to Enterprise Content Management systems, we like to see it as a social layer on top of the established technologies for storing unstructured information, one that makes interaction and collaboration more seamless.

What is this enterprise social thing then?

I would argue that what we need is support for asynchronous, or non-realtime, collaboration added to the features we can offer enterprise users. It should not be offered as a separate silo of information, often run in the cloud, but as an integrated solution (with an on-premise option for some of us) where we can collaborate asynchronously around documents that we already have stored in ECM systems like EMC Documentum.

  • Enterprise Social = Asynchronous collaboration = Non-realtime collaboration

Why does it matter for enterprise content management?

ECM is all about storing unstructured information in a way that provides context around it so it can be used efficiently in our business processes. That context includes metadata based on various taxonomies, security, lifecycles, workflow capabilities and alternative formats, but also information extracted from the content itself, which allows for “discovered metadata”, classification and of course search. To me Enterprise Social is just about adding another set of context around information, based on people collaborating around it, quite often without any formalized workflow support. That context is helpful because it provides additional perspectives or views on the information based on what people have thought of it rather than on what it actually contains.

What kind of features are we talking about then?

The social features that we usually expect from enterprise social applications are not just the pieces of added collaborative context but also the visualizations based on those attributes. They include:

  • Comments
  • Sharing (links to content in ECM or general links which is social bookmarking in a sense)
  • Likes
  • Favorites
  • Tags (ad hoc metadata)
  • Tag clouds
  • Status updates
  • Questions/answers
  • Wikis
  • Blogs
  • @messaging
  • Private messaging (hence replacing email)
  • Document collections (again based on links to ECM)
  • Activity Streams (both based on social actions but also from ECM like recently uploaded documents, workflow tasks, versioned documents etc)
  • Ideas and Thanks
  • Related content (others have also viewed or similar content to this)
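To make the activity stream item in the list above a bit more concrete, here is a minimal Python sketch that merges social actions and ECM events into one newest-first stream. The `Event` structure, its field names and the sample data are all hypothetical; nothing here corresponds to an actual Documentum or vendor API.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge


@dataclass(frozen=True)
class Event:
    """One entry in an activity stream (hypothetical structure)."""
    when: datetime
    actor: str
    action: str   # e.g. "commented", "uploaded", "versioned"
    target: str   # name of the document the event concerns
    source: str   # "social" or "ecm"


def activity_stream(social, ecm, limit=10):
    """Merge two event lists that are each already sorted newest first."""
    merged = merge(social, ecm, key=lambda e: e.when, reverse=True)
    return list(merged)[:limit]


# Made-up sample data, newest first in each list.
social_events = [
    Event(datetime(2014, 5, 5, 10, 30), "anna", "commented", "budget.xlsx", "social"),
    Event(datetime(2014, 5, 5, 9, 0), "erik", "liked", "roadmap.pptx", "social"),
]
ecm_events = [
    Event(datetime(2014, 5, 5, 10, 0), "erik", "versioned", "budget.xlsx", "ecm"),
    Event(datetime(2014, 5, 4, 16, 45), "anna", "uploaded", "roadmap.pptx", "ecm"),
]

stream = activity_stream(social_events, ecm_events)
for e in stream:
    print(f"{e.when:%Y-%m-%d %H:%M} {e.actor} {e.action} {e.target} [{e.source}]")
```

The point of the sketch is only that a single stream can draw on both kinds of events as long as they share a timestamp and a reference to the same piece of content.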

What about synchronous or real-time collaboration?

Having some kind of asynchronous collaboration features in place of course makes it natural to include integration with real-time collaboration support, with features like:

  • Presence
  • Text chat
  • Audio chat (or VOIP)
  • Video chat
  • Desktop sharing
  • Meeting rooms with text, audio, video, presentation sharing

Being able to see a comment on a document from a specific user, notice the green presence icon next to the name and launch a text or audio chat right there is of course powerful and helpful, compared to relying on three different applications to do the same (ECM, Social and chat).

What has EMC Documentum done in the social area before then?

Quite a lot to be honest, but to some degree also too much, in the sense that there has not been a coordinated product effort over the years. First of all, eRoom was/is a product based on just that: collaboration. It provided a way to set up spaces for collaboration, often used for projects. After it was bought, an integration effort was made to make eRoom use the same repository technology as the rest of the Documentum stack, which made a lot of sense for those wanting to eliminate the information silo. A popular feature in eRoom was data tables. Their social nature is not that obvious, but providing “basic Excel” on a web page is a useful feature.

Second, applications based on Web Development Kit (WDK), and especially Digital Asset Manager, included a fairly cool set of additions called Documentum Collaborative Services. It provided that asynchronous collaboration layer both in the user interface and in the information model of Content Server. It included features like:

  • Rooms (basically supercharged folders)
  • Rich media formatted texts that appeared as “banners” on top of every folder view (could be used to explain the purpose of the folder)
  • Discussion threads
  • Data tables
  • Calendars

After that came the first true effort to provide a modern social layer for Documentum in its Centerstage client which provided a modern AJAX-based interface with features like:

  • comments
  • tags
  • tag clouds
  • activity streams
  • new way of previewing content inspired by coverflow in iTunes.
  • wikis
  • blogs
  • discussions

In addition to that it also featured a new search capability based on xPlore and nice faceting based on discovered metadata. From a technology standpoint, so-called Centerstage spaces included document libraries and discussion components, which were placed in a Spaces cabinet and looked like folders in other clients. Even more powerful, these collaboration objects could be set up as the result of an activity in a process.

Finally, we are now in the era of the next-generation client called Documentum D2, with an even more modern architecture and configurable workspaces that can be set up differently for different groups of people. It actually includes some social capabilities, like being able to view collaborative workspaces created in Centerstage, but also comments and favorites on pieces of content. Another collaborative feature is support for annotations on/in documents, which is enhanced further when using any of the third-party viewers that work with Documentum D2. Finally, the feature that allows you to subscribe to changes to a document is back again (it was available in Webtop/DAM), and it is actually a powerful collaboration feature.

So there has been a fair set of collaborative features in the Documentum product over the years. The catch, however, is that not all of them are available today in the current set of products.

What about Documentum and other providers of social platforms?

In May 2011 there was an announcement from EMC and Cisco about teaming up around social features by providing an integration between Documentum and what was then called Cisco Quad. In essence it meant being able to connect to Documentum from document library sets within Cisco Quad using the CMIS interface. Cisco Quad then provided seamless integration with Cisco’s support for real-time collaboration such as WebEx Meeting and Jabber.

Around the same time it was announced that VMware had acquired the microblogging platform Socialcast, which provides internal activity streams to customers but without features like wikis and blogs. We have also seen a few official signs that Socialcast technology is making its way into the platform, and even prototypes of Socialcast-powered activity streams within Documentum D2. CMS-Wire has in fact publicly asked Chairman Joe Tucci to give Socialcast to the Information Intelligence Group, a request that was discussed again this weekend in a new article from CMS-Wire.

Finally, it is worth mentioning that EMC the company is using social software from Jive Software to power its own socially enabled community called ECN, which means that experience with that product exists within EMC, although not necessarily in the Information Intelligence Group where Documentum belongs. The hard part is to both surface Documentum content in Jive and provide social interface components in D2 and xCP.

What are IBM and Alfresco doing then?

I spent last week at IBM Impact 2014 and had the opportunity to speak with some executives about both their Enterprise Content Management software based on IBM Filenet and their collaboration software IBM Connections and IBM Sametime. They currently provide an integration for both their Content Navigator product and Case Manager, where social features like comments can be added to content stored in Filenet. There is also presence and chat integration from Content Navigator to Sametime, as well as the possibility to go to Sametime Meetings directly from Connections. A lot of the integration is based on CMIS, but it is of course possible to use REST integration for all three of these clients. I guess the interesting observation is that even though the integration exists, there are still three (or four if you count Case Manager) web interfaces for ECM, Social and real-time collaboration.

Since Cisco is now abandoning WebEx Social, an integration with a product like IBM Connections (and therefore also IBM Sametime) makes a lot of sense if this thing with VMware Socialcast does not pay off. All of these have REST interfaces, which would make integration feasible.

I guess I have to mention Alfresco as well here, since it has a tendency to surface as an alternative to Documentum and Filenet from time to time. Alfresco is sometimes marketed as Social ECM, and the community edition offers a basic set of social features on top of the documents. You can favorite, like and comment (without notifications), and there is an activity stream that lets you see what has happened in the sites you are a member of. In addition to that, each site contains support for discussions, wikis, bookmarks and data lists (similar to data tables). Finally, you can do “Dropbox”-like sharing of content, but the integration with desktop and mobile apps is nowhere near where EMC has come with Syncplicity. It is a decent benchmark for social features in ECM, but not a dedicated social interface like Socialcast, WebEx Social, Connections and Jive. There is also no integration with enterprise real-time collaboration tools like WebEx, Lync and Sametime. Still, Alfresco seems like something you use if you can leverage other web/cloud-based services.

EMC World 2013 & Momentum 12: EMC Documentum Roadmap Session

Presented by Patrick Walsh, Principal Product Manager Documentum Platform and Aaron Aubrecht VP Product Management & XPO, IIG.

This session will focus on the core platform. Last year they tried to fit in everything, ran 20 minutes over time and still left half of the slides unshown.

Talks about the need for IT to deliver business capability, not just apply patches to Documentum. My personal reflection is that many IT shops do not have a business perspective today. Maybe that is because they have budget efficiency requirements on them, making cost reduction the main priority.

Few people in the room had actually upgraded to Documentum 7. That is a problem, because too much modern capability is left unused in many organisations. That is why the separation of upgrades to platform and clients is pushed now.

What’s new in Documentum 7

So repeating the same message of what is new in D7. Performance improvements due to intelligent session management (ISM) and type caching. ISM reduces memory usage up to 65 % by multiplexing communications between application server, content server and the database. Similar memory usage improvements with type caching.

Talks about xMS with automated deployment of a new D7 and xCP 2.0 stack for a private VMware cloud environment. Deployment time is down to hours via XML-based blueprints describing the deployment parameters. Includes embedded deployment and configuration of Hyperic agents. We have yet to try this, but I really hope that the blueprints represent a best-practice starting point for developing our own blueprints.

Also improved content intelligence with xPlore 1.3. Includes large file support through partial indexing, inline content classification, added date-range search capability & metadata and of course the recommendation engine. It also features a scriptable command line interface for automation, and you can control xPlore from third-party tools via Admin UI.

Crypto algorithms switched from DES to AES, which seems about time! This also leads to improved performance.

Finally the EMC Syncplicity Connector for Documentum which allows for external sharing of information with security enforced at the endpoint.

What’s next for Documentum 7.1

Will come in Q4 2013. Full minor version.

Expanded infrastructure certifications:

  • Solaris 11 (with Oracle 11g R2)
  • AIX 7.1 TL2 (with DB2 Enterprise 9.7 FP7)
  • Windows Server 2012 (with SQL Server 2012 and Oracle 11g R2)
  • WebSphere 8.5 is supported in D7.1, while D7 supported vFabric tc Server, Tomcat 7 and Oracle WebLogic
  • RHEL 6.x, x64 in D7.1, native 64-bit, multithreaded architecture, Intelligent Session Management & Type Caching.

xMS 1.1 is coming in D7.1

Smarter deployments (automatic discovery of services and components for existing environments). Orchestration for externally managed VMs or physical hosts. Clustering support with HA and load balancing.

Web administration UI is coming with automated software patching.

Documentum REST Services API (Q3 2013):

  • Standards based
  • Consumer-agnostic
  • Mobile friendly
  • Everything is a resource
  • Scalability

Enhanced trust and security. Continue to harden Documentum.

  • Stronger authentication security. Non-anonymous SSL.
  • Authentication plug-in for Jasig Central Authentication Service (CAS)
  • SSL Option for internal JMS – Content Server Communication

Great to see CAS support coming.

xPlore 1.4 is coming with faster response times for large result sets, improved diagnostics and automation for easier deployment.

Upgrading to Documentum 7

Talks about the strategy going forward. Wants to reduce the number of configurations to test code against. Expect a narrower set of combinations of operating systems, app servers and databases.

Talks about the possibility to move upgrades for platform and clients separately.

The Enterprise Migration Appliance (EMA) is a response to the fact that migrations are complex projects. The migration happens not on the API level, as with traditional API-based methods, but on the database level. It is a virtual appliance with a complete server running in a vSphere/ESXi environment. Also promotes migration solutions from both fme and euroscript.

There is an EMC Documentum 7.0 Rapid Success Program. To register go to: http://bit.ly/D70RSP by May 17, 2013.

Vision for the Documentum platform:

Best in-class ECM:

  • ViPR integration
  • Rapid content access through addressable caching

Trusted Platform

  • Mobile SSO via SAML and OAuth
  • Federated Identity Management and Dynamic User Enrollment for virtual trust zones

As much cloud as you need:

  • Dynamic Scaling with xMS
  • Cloud-based performance management and monitoring
  • Content contribution and bi-directional sync with Syncplicity

At the Apache Lucene Eurocon in Prague

Today I am in Prague to attend the Apache Lucene Eurocon conference hosted by Lucid Imagination, the commercial company behind the Lucene/Solr project. There seem to be almost 200 people here attending the conference. I am looking forward to speaking tomorrow, Friday afternoon, about our experiences of using search in general and Solr in particular. The main part of our integration is a connector for Solr that listens to a queue on our Enterprise Service Bus, which is a part of Oracle SOA Suite 10g (the Oracle Middleware platform).
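As a rough illustration of the connector idea, the Python sketch below takes messages off a queue and turns each into a JSON body of the kind Solr's update handler accepts (a list of documents to add, or a `delete` command). It is not our actual connector: the in-memory `queue.Queue` merely stands in for the service bus queue, the message layout (`operation`, `id`, `fields`) is invented for the example, and nothing is actually sent to Solr.

```python
import json
import queue


def to_solr_update(message):
    """Translate one queue message into a Solr JSON update body.

    The message layout (operation/id/fields) is a made-up example format.
    """
    if message["operation"] == "delete":
        return json.dumps({"delete": {"id": message["id"]}})
    # Anything else is treated as an add/update of a single document.
    doc = {"id": message["id"], **message.get("fields", {})}
    return json.dumps([doc])


def drain(q):
    """Read all waiting messages and return the update bodies in order."""
    bodies = []
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return bodies
        bodies.append(to_solr_update(message))


# An in-memory queue stands in for the service bus queue here.
bus = queue.Queue()
bus.put({"operation": "index", "id": "doc-1", "fields": {"title": "Staff report"}})
bus.put({"operation": "delete", "id": "doc-0"})

updates = drain(bus)
for body in updates:
    print(body)
```

Keeping the translation step separate from the queue handling means the same function could be reused regardless of which messaging technology delivers the messages.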

EMC World 2010: Chiming in with Word of Pie about the future of Documentum

We have got a written reaction to Mark Lewis’ keynote held at EMC World 2010 in Boston. I both feel and have the passion around Enterprise Content Management and it is great that Laurence Hart spent so much time and effort on talking to people to craft this post. Someone needs to say things even if they are not always easy to hear. So I will try not to repeat what he said in this blog post but rather provide my perspective, which comes from what I have learned about Information and Knowledge Management over the past years. ECM and Documentum are a very critical component in moving that IKM vision from the Powerpoint stage into reality. In our case that means an experimentation platform that allows us to turn our ideas for improving the “business” of staff work in a large military HQ into something people can try, learn and be inspired from. Also, this turned out to be a long blog post, which calls for a summary on top:

The Executive Summary (or message to EMC IIG) of this blog post:

  • Good name change but make sure You live up to your name.
  • A greater degree of agility is very much needed but do not simplify the platform so much that implementing an ECM-strategy is impossible.
  • Case Management is not the umbrella term, it is just one of many solutions on top of Documentum xCP
  • The whole web has gone Social Media and Rich Media. The Enterprise is next. Develop what You have and stay relevant in the 2010s!
  • Be more precise when it comes to the term “collaboration”. There is a whole spectrum to support here.
  • Be more bold and tell people that Documentum offers a unique architectural approach to information management – stop comparing clients.
  • Tell people that enabling Rich Media, Case Management, E 2.0 and (Team) Collaboration on one platform is both important and possible.
  • I am repeating myself here: You want to sell storage, right? Make sure Video Management is really good in Documentum!

The name change

Before I start I just need to reflect on the name change from Content Management and Archiving into Information Intelligence Group (IIG). I agree with Pie…the name had to be changed to make it more relevant in 2010, and a focus on information (as in information management, which is more than storage ILM) is the right way to go. The intelligence part of it is of course a bit fun because of my own profession, but still it implies doing smart things with information, and that should include everything from building context with Enterprise 2.0 features to advanced Content and Information Analytics. You have the repository to store all of that – now make sure you continue to invest in an analytics engine to generate structure and a visualisation toolkit to make use of all the metadata and audit trails. Maybe do something with TIBCO Spotfire.

Documentum xCP – lowering the threshold and creating a more agile platform

Great. Documentum needs to be easier to deploy, configure and monitor. That is needed to get new customers on board more easily and to let existing ones do smarter things with it in less time. However, it is easy to fall into the trap of simplifying things too much here. To me there is nothing simple about implementing Enterprise Content Management (ECM) as a concept and as a method in an organization. One major problem with Sharepoint and other solutions is that they are way too easy to install, so people are actually fooled into skipping the THINKING part of implementing ECM and think it is just “next-next-finish”. All ECM systems need to be configured and adapted to fit the business needs of the organisation. Without that they will fail. xCP can offer a way to do that vital configuration (preceded by THINKING) a lot more easily and also more often. We often stress how important it is to have the technical configuration follow any changes in Standard Operating Procedures (SOP) as closely as possible. If Generals want to change the way they work and the software does not support it, they will move away from using the software. Agility is the key.

In our vision the datamodel needs to be much more agile. Value lists need to be updated often – sometimes based on ad hoc folksonomy tagging. Monitoring of the use of metadata and tags will drive that. Attributes or even object types need to be updated more often. Content needs to be ingested quickly, with structure provided later on (think XML Store with new schemas here). xCP is therefore a welcome thing, but make sure it does not compromise the core of what makes Documentum unique today.
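A small Python sketch of what such monitoring could look like: count how often ad hoc tags are used and suggest the frequent ones as additions to a controlled value list. The function name, the threshold and the sample tags are all made up for illustration, and the sketch is not tied to any Documentum API.

```python
from collections import Counter


def suggest_value_list_additions(tag_events, value_list, min_uses=3):
    """Return ad hoc tags used at least min_uses times that are not yet
    part of the controlled value list, most used first."""
    known = {v.lower() for v in value_list}
    # Normalise case and whitespace so "Medevac" and "medevac" count as one tag.
    counts = Counter(t.strip().lower() for t in tag_events)
    return [tag for tag, n in counts.most_common()
            if n >= min_uses and tag not in known]


# Made-up controlled value list and a log of tags applied by users.
value_list = ["Logistics", "Operations", "Intelligence"]
tag_events = ["medevac", "Logistics", "medevac", "cimic", "Medevac",
              "cimic", "logistics", "medevac", "cimic", "weather"]

print(suggest_value_list_additions(tag_events, value_list))
```

With this sample data the suggestions are "medevac" and "cimic". A person would still decide whether a suggested tag belongs in the value list; the sketch only shows how usage data can drive that decision.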

The whole Case Management thing

Probably the thing that most of us reacted against in the Mark Lewis keynote was the notion that ECM people in reality have just been doing Case Management all the time. I recently spent some time reflecting on that in another blog post here called “Can BPM meet Enterprise 2.0 over Adaptive Case Management?“. There is clearly a continuum here between supporting very formal process flows and very ad hoc Knowledge Worker-style work. They clearly seem different, and while they likely meet over Adaptive Case Management, to me it makes no sense to have that term cover the whole spectrum – even for EMC Marketing 🙂

I immediately saw that Public Sector Investigative work is often used as an example of Case Management. Case Management as done by law enforcement agencies in particular is fundamentally different from the work done by Intelligence Agencies, because in case-based Police investigations there is usually some legal requirement to NOT share information between cases unless authorised by managers. This is of course not the case (!) for all Case Management applications, but from a cultural perspective it is important that Case Management work by the Police is not used as an example of information sharing. The underlying concept is even at odds with any unified enterprise content management strategy where information should be shared. That is why workgroup-oriented tools such as i2 Analyst’s Workstation have become so popular there.

The point here is that it is important not to disable sharing at the architectural level, because what constitutes a good ECM system is that content can be managed in a unified way. Don’t be fooled by requirements for that – use the powerful security model to make it possible. Then Law Enforcement Agencies can use it as well. However, there must be more to ECM than Case Management – as Word of Pie suggests, it is just ONE of many solutions on top of the Documentum xCP platform. A platform which is agile enough to quickly build advanced solutions for ECM on top.

Collaboration vs Sharing and E 2.0

So, Collaboration is used everywhere now, but the real meaning of it actually varies a bit. First, there are two kinds of collaboration modes:

  • Synchronous (real-time)
  • Asynchronous (non-real time – “leave info and pick up later”)

Obviously neither Documentum nor Sharepoint is in the real-time part of the business. For that you will need Lotus Sametime, Office Communications Server, Adobe Connect Pro or similar products. However, Google Wave adds a bit of confusion here since it integrates instant messaging and collaborative document editing/writing.

However, I am a bit bothered by how casually anything, such as Sharepoint and for that matter eRoom, gets labelled a collaboration tool. To break this down further, I believe there is a directness factor in collaboration. Team collaboration has a lot of directness, where you collaborate on a given task with colleagues. That is not the same as many of the Social Media/Enterprise 2.0 features, which do not have a clear recipient of the thing you are sharing. And sharing is the key, since you are basically providing a piece of information in case anyone wants/needs it. That is fundamentally different from sending an email to project members or uploading the latest revision to the project’s space. Andrew McAfee has written about this and uses a bullseye representing strong and weak ties to illustrate the effect.

My point is that it is important that tools for team collaboration can, from an information architecture standpoint, become part of the weaker, indirect sharing concept. That is the vehicle to utilize the Enterprise 2.0 effect in a large enterprise. Otherwise we have just created another set of stove-pipes or bubbles of information that is restricted to team members. I am not saying that all information should be this transparent, but I will argue that, based on a “responsibility to provide” concept (see US Intel Community Information Sharing Policy), restricting that sharing of information should be the exception – not the norm.

Sure, as Word of Pie points out in his article “CenterStage, the Latest ex-Collaboration Tool from EMC”, there are definitely things missing from the current Centerstage release compared to both Sharepoint and EMC’s old tool eRoom. However, as Andrew Goodale points out in the comments, I also think it is a bit unfair, because both eRoom and at least previous versions of Sharepoint (which many are using) actually lack all these important social media features that serve to lower the threshold and increase participation by users. They also provide critical new context around the information objects that was not available before in DAM, WebTop or Taskspace. Centerstage also provides a way to consume them in terms of activity streams, RSS feeds and faceted search. Remember that Centerstage is the only way to surface those facets from Documentum Search Server today.

So, I am also a bit disappointed that things are missing in Centerstage that should be there, and I also really want to stress the importance of putting resources into that development. Those features are critical for any serious implementation of an ECM strategy, and the power of Documentum is that they all sit in the same repository architecture with a service layer to access them. Maybe partner with Socialcast to provide a best-practice implementation to support a more extensive profile page and microblogging. Choose a partner for Instant Messaging in order to connect the real-time part of collaboration into the platform. Again, use your experience from records management and retention policies to make sure those real-time collaboration activities are saved and managed in the repository.

Be bold enough to say you are a Sharepoint alternative – but for the right reasons

I’m not an IT person. I come into this business with a vision to change the way a military HQ handles information, so I see Enterprise Content Management more as a concept than a technology platform. However, when I have tried to execute our vision it has become very clear that there is a difference between technology vendors, and I like to think that difference comes from the internal culture, experience and vision of the company. It is the “why” behind why the platform looks like it does and has the features it has. So as long as you are not building everything from scratch yourself, it actually matters a lot which company you choose to deliver the platform that makes your ECM vision happen. That means that there IS a difference between Documentum and Sharepoint in the way the platform works, and we need to be able to talk about that. However, what I see now is that most people focus on the client side of it and try to embrace it as a popular collaboration tool. Note that I say tool – not platform. All of them focus on the client side, where the simplified requirement is basically a need for a digital space to share some documents in. However, the differentiator is not whether Centerstage or Sharepoint meets that requirement – both do. The differentiator is whether you have a conceptual vision of how to manage the sum of all information that an organization has, and to what degree those concepts can be implemented in technology. That is where the Documentum platform is different from other vendors and why it is different from Sharepoint. Sharepoint is sometimes a little bit too easy to get started with, which unfortunately means there is no ECM strategy behind the implementation, and when the organisation has thousands of Sharepoint sites (silos) after a year or so, that is when the choice of platform really starts to matter.

This week at EMC World has been a great one as usual and there is no shortage of brilliant technical skills and development of features in the platform. What I guess bothers me and some other passionate ECM/Documentum people is the message coming out from the executive level at IIG. In the end, that is where the strategic resource decisions are made and where the marketing message is constructed. I think there is now a lot more to do on the vision and marketing level than actually needs to be done on the platform itself. The hard part seems to be to be proud of what the platform is today, realize its potential to remain the most capable and advanced on the market, and use that to stay relevant in many applications of ECM – not just Case Management.

Rich Media – A lot of content to manage and storage to sell

One of the strong points of Documentum is that it can manage ALL kinds of content in a good way, and that of course includes rich media assets such as photos, videos and audio files. Don’t look upon this as some kind of specialised market only needed by traditional “creative” markets. This is something everybody needs now. All companies (and military units for that matter) have an abundance of digital still and video cameras, producing a massive amount of content that needs to be managed just like all the rest of the content. There is a need for platform technologies that actually “understand” that content and can extract metadata from it so that it can be navigated and found easily. It is also important to assist users in repurposing this content so it can be displayed easily without consuming all bandwidth and also easily be included in presentations and other documents. This is also very much relevant from a training and learning perspective, where screencams and recorded presentations have so much potential. It does not have to be a full Learning Management System, but at least an easy way to provide it. Maybe have a look at your dear friend Cisco and their Show and Share application. Oh, it is marketed as a Social Video System – the connection to Centerstage (and not just MediaWorkspace) is a bit too obvious. Make sure you can provide Flickr and Youtube for the Enterprise real soon. People will love it. Again, on one very capable platform.

Media Workspace is a really cool application now. Even if it does not have all the features of DAM yet (either), it is such a sexy interface on Documentum. The new capabilities for handling presentations and video are just great. Be sure to look more at Apple iPhoto and learn how to leverage (and create) metadata to support management of content based on locations, people and events. A piece of cake on top of a Documentum repository. Now it is a bit stuck in the Cabinet/Folder hierarchy as the main browsing interface.

Summary

I agree with Word of Pie that there is a lack of vision – an engaging one that we all can buy into and sell back home to our management. In my project we seem to have such a vision and for us Documentum is a key part of that. I just hoped that EMC IIG would share that to a greater degree. From our responses back home in Sweden and here at EMC World people seem to both want and like it (have a look at my EMC World presentation and see what you think). We can do seriously cool and fun stuff that will make management of content so much more efficient which should be of critical importance for every organisation today. At least in the military one thing is for sure and that is that we won’t get more people. We really have to work smarter and that is what a vision like this will provide a roadmap towards.

So be proud of what you do best, EMC IIG, and make sure to deliver INTEGRATED solutions on top of that. For those who care, that will mean a world of difference in the long run and will gather looks of envy from those who did not get it.

The Long Tail of Enterprise Content Management

Question: Can we expect a much larger amount of the available content to be consumed or used by at least a few people in the organisations?

Shifting focus from bestsellers to niche markets
In 2006 the editor-in-chief of Wired magazine, Chris Anderson, published his book “The Long Tail – Why the Future of Business is Selling Less of More”. Maybe the text printed at the top of the cover, “How Endless Choice is Creating Unlimited Demand”, is the best summary of the book. This might have been said many times before, but I felt a strong need to put my reflections into text after reading it. It put a vital piece of the puzzle in place when I saw the connections to our efforts to implement Enterprise 2.0 within an ECM context.

Basically, Chris Anderson sets out to explain why companies like Amazon, Netflix, Apple iTunes and several others make a lot of money selling small amounts of a very large set of products. It turns out that out of even millions of songs/books/movies, nearly all of them are rented or bought at least once. Three things make this possible:

– Democratization of production, which means that the tools and means to produce songs, books and movies are available to almost everybody at a relatively low cost.
– Democratization of distribution, where companies can broker large amounts of digital content because the cost of holding a large stock of digital content is very low compared to real products on real shelves in real warehouses.
– Connecting supply and demand so that all this created content meets its potential buyers. The tools for that are search functions, rankings and collaborative reviews.

What this effectively means is that the hit culture, where everything is focused on a small set of bestsellers, is replaced with vast amounts of small niches. That probably has an effect on society as a whole, since the time when a significant part of the population was exposed to the same thing at the same time is over. That is also reflected in the explosion in the number of specialised TV channels and TV/video-on-demand services that let viewers choose not only which show to watch but also when to watch it.
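To make the shift from hits to niches a bit more concrete, here is a minimal sketch. It assumes that demand falls off as 1/rank (a simple power-law model chosen purely for illustration; it is not a figure from the book), and the function name `head_share` is hypothetical.

```python
def head_share(catalogue_size: int, head_size: int) -> float:
    """Fraction of total demand captured by the top `head_size` titles.

    Assumption: demand for the title at a given rank is proportional
    to 1/rank. This is an illustrative model, not measured data.
    """
    total = sum(1.0 / rank for rank in range(1, catalogue_size + 1))
    head = sum(1.0 / rank for rank in range(1, head_size + 1))
    return head / total


# Same 100 "bestsellers", but a shelf-limited store versus a huge digital catalogue.
small_store = head_share(catalogue_size=1_000, head_size=100)
big_catalogue = head_share(catalogue_size=1_000_000, head_size=100)
print(f"hits' share with 1,000 titles:     {small_store:.0%}")
print(f"hits' share with 1,000,000 titles: {big_catalogue:.0%}")
```

Under this assumed model the same set of hits accounts for a smaller share of total demand as the catalogue grows, which is the effect the book describes.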

Early Knowledge Management and the rise of Web 2.0
Back in the late 1990s, Knowledge Management efforts thrived, with great aspirations of getting a grip on the knowledge assets of companies and organisations. Although there are many views and definitions of Knowledge Management, many of them focused on increasing the capture of knowledge, on the assumption that applying that captured knowledge would lead to better efficiency and better business. However, partly because of technical immaturity, many of these projects did not reach their ambitious goals.

Five or six years later the landscape had changed completely on the web with the rise of YouTube, Flickr, Google, Facebook and many other Web 2.0 services. They provided a radically lowered threshold for contributing information, and the whole web changed from a focus on consuming information to producing and contributing information. This was in fact just democratization of production, but in this case not only of products to sell but of information of all kinds.

With the large-scale hubs of YouTube, Flickr and Facebook, the distribution aspect of the Long Tail was covered, since all this new content was also spread in clever ways to friends in our networks or to niche “consumers” finding information based on tagging and recommendations. Maybe my friend network in Facebook is in essence a representation of a small niche market that is interested in following what I am contributing (doing).

Social media goes Enterprise
When this effect started spreading beyond the public internet into the corporate network, the term Enterprise 2.0 was coined by Andrew McAfee. Inside the enterprise, people were starting to share information on a much wider scale than before, and in some respects made the old KM dreams finally come true. This time not because of formal management plans but more because of social factors and networking that really inspired people to contribute.

From an Enterprise Content Management perspective this also means that if we can put all this social interaction and generated content on top of an ECM infrastructure, we can achieve far more than just supporting formal workflows, records management and retention demands. The ECM repository has the potential to become the backbone that provides all kinds of captured knowledge within the enterprise.

The interesting question is whether this also marks a cultural change in what types of information people devote their attention to. One could argue that traditional ECM systems provide more of a limited “hit-oriented” consumption of information. The absence of good search interfaces, recommendation engines and collaboration probably left most of the information unseen.

Implications for Enterprise Content Management
The social features in Enterprise 2.0 change all that. Suddenly the same effect on exposure can be seen on enterprise content as we have seen on consumer goods. There is no shortage of storage space today. The number of objects stored is already large but will increase a lot, since it is so much easier to contribute. Social features allow exposure of things that are linked to interests, competencies and networks instead of what the management wants to push. People interested in learning have somewhere to go even for niche interests, and those wanting to share can get affirmation when their content is read and commented on by others, even if it is a small number. Advanced searching and exploitation of social and content analytics can create personalised mashup portals and push notifications of interesting content or people.

Could this long tail effect possibly make a difference to the whole knowledge management perspective? This time not from the management aspect of it but rather the learning aspect. Can we expect a much larger amount of the available content to be consumed or used by at least a few people in the organisation? Large organisations have a fairly large number of roles and responsibilities, so there must reasonably be a great difference in what information they need and with whom they need to share it. The Long Tail effect in ECM terms could be a way to illustrate how a much larger percentage of the enterprise content is used and reused. It is not necessarily the case that more information is better, but this can mean more of the right information to more of the right people. Add to that the creative effect of being constantly stimulated by ideas and reflections from others around you, and it could be a winning concept.

Sources

Anderson, Chris, ”The Long Tail – Why the Future of Business is Selling Less of More”, 2006
Koerner, Brendan I, ”Driven by Distraction – How Twitter and Facebook make us more productive workers” in Wired Magazine March 20

EMC World 2009: Enterprise Search Server (ESS)

To me one of the biggest pieces of news delivered during the conference was the new generation of Documentum full-text indexing called the Enterprise Search Server (ESS). This marks the first official message that EMC Documentum will move away from the OEM version of FAST ESP, which has been in use since Documentum 5.3 (2005). The inclusion of FAST back then meant that Documentum got a solution where metadata from the relational database was merged with text from the content file into an XML file (FTXML) that could be queried using DQL. Before diving into the features of the new technology, I guess everyone wonders about the reason for this decision. The main reasons are said to be:

  • Performance. One FAST full-text node supports up to around 20 million objects in the repository (some customers commented that their experience was closer to 10 million…) and it requires in-memory indices. With Documentum installations containing billions of objects, that means 100+ nodes, and that has been a hard sell in terms of hardware requirements.
  • Virtualisation. Apparently talks with Microsoft/FAST about the requirement to support all Documentum products on VMWare made no progress. This has been a customer demand for some time. MS/FAST cites intensive I/O demands as a reason why they were not interested in certifying the full-text index on virtualisation.
  • NAS-support.
  • More flexible High Availability (HA) options. Today FAST can be clustered by adding new nodes, which leads to a requirement of having the same number of nodes for backup/high availability.
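The performance point in the list above is simple arithmetic, and a small sketch makes the "100+ nodes" figure easy to check. The per-node capacities are the ones quoted above; the repository size of two billion objects and the function name `nodes_needed` are assumptions picked only for illustration.

```python
def nodes_needed(total_objects: int, objects_per_node: int) -> int:
    """Smallest number of index nodes that can hold `total_objects` (ceiling division)."""
    return (total_objects + objects_per_node - 1) // objects_per_node


# Hypothetical repository size, chosen only to illustrate "billions of objects".
repository_objects = 2_000_000_000

# Per-node capacities quoted above: about 20 million, or closer to 10 million
# according to some customers.
for per_node in (20_000_000, 10_000_000):
    print(f"{per_node:>12,} objects/node -> {nodes_needed(repository_objects, per_node)} nodes")
```

With the more pessimistic 10 million objects per node, the node count doubles, which shows why hardware requirements become a hard sell at this scale.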

From a performance standpoint I personally think that the current implementation of FAST leads to a slow end-user experience when searching in Documentum. One reason for this is that a search is first sent to FAST, which then delivers a search result set irrespective of my permissions. The whole result set must then be filtered by querying it against the relational database. That takes time. This is also a reason why we have integrated an external search engine based on the more modern FAST ESP 5.x server with the Security Access Module, which means that ACLs are indexed and filtering can be done in one step when searching in the external FAST Search Front-End (SFE). More about how that is solved in ESS later on.
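The difference between the two approaches described above can be sketched in a few lines. This is a toy model under stated assumptions, not how FAST or Documentum is implemented: the `Doc` class, the in-memory `DOCS` list and the `PERMISSIONS` dictionary (standing in for the relational database) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_users: frozenset  # simplified stand-in for an ACL


DOCS = [
    Doc("d1", "budget report 2009", frozenset({"anna", "erik"})),
    Doc("d2", "budget draft", frozenset({"erik"})),
    Doc("d3", "travel report", frozenset({"anna"})),
]

# Stand-in for the permission data held in the relational database.
PERMISSIONS = {d.doc_id: d.allowed_users for d in DOCS}


def search_then_filter(term, user):
    """Two steps: full-text match first, then one permission lookup per hit."""
    hits = [d.doc_id for d in DOCS if term in d.text]
    lookups = 0
    visible = []
    for doc_id in hits:
        lookups += 1  # each hit costs a trip to the permission store
        if user in PERMISSIONS[doc_id]:
            visible.append(doc_id)
    return visible, lookups


def search_with_indexed_acl(term, user):
    """One step: the ACL is part of the index entry and is checked while matching."""
    return [d.doc_id for d in DOCS if term in d.text and user in d.allowed_users]


visible, lookups = search_then_filter("budget", "anna")
print(visible, "after", lookups, "permission lookups")
print(search_with_indexed_acl("budget", "anna"))
```

Both functions return the same documents; the point of the sketch is that the two-step variant pays for a permission lookup on every raw hit, including the ones the user is not allowed to see.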

From a business perspective EMC outlines these challenges that it sees a need to address:

  • End users expect Google/Yahoo search paradigms
  • IT-managers want low cost, scalability, ease of deployment and easy administration.
  • Requirements for large scale, distributed deployments with multilingual support.
  • Enterprise requirements such as low cost HA, backup/restore and SAN/NAS-support.

The new ESS is based on the xDB technology coming from the acquisition of the company X-Hive, and it leverages the open source full-text indexing technology from the Lucene project. The goal for ESS is to leverage the existing open indexing architecture in Documentum. The idea is to create a solution that really scales, though of course with some trade-offs when it comes to space vs query performance.

ESS supports structured and unstructured search by leveraging best-of-breed XML database and XQuery standards. It is designed with enterprise readiness, scalability, ingestion throughput and high quality of search as core features. It also provides the Advanced Data Management functionality (control over where data is placed on disk) that is necessary for large scale systems. The intention is to let EMC continue to develop and provide the new search features and functionality required by their customer base.

It is architected for greater scalability and a smaller footprint than the current full-text search, and it scales both horizontally (more nodes) and vertically (more servers on the same node). It is designed to support tens to hundreds of millions of objects per node.

This allows for solutions such as archiving, where there can be a billion or more emails/documents, that achieve scale while preserving a high quality of search. The query response time can be throttled up or down based on needs – priority can be shifted between indexing and querying.

The installation procedure is also simplified, and EMC promises that a two-node deployment can be up and running in less than 20 minutes. The solution is also designed so that new nodes can easily be added to an installation.

ESS is much more than a simple replacement of the full-text engine. It will focus on delivering these additional features compared to existing solutions:
– Low cost HA (n+1 Server based)
– Disaster Recovery
– Data Management
– VMWare Support
– NAS Support
– New Administration Framework

The new admin features include a new ESS Admin interface which has a look and feel very similar to CenterStage. Since the intention is to support ESS on non-Documentum installations, it is a separate web client. The framework also supports Web Services, a Java API and JMX, and it is open for administration using OpenView, Tivoli, MMC etc.

The server consists of:

  • ESS API
  • Indexing Services will have document batching capability, callback support for searchable indication and a Content Processing Pipeline with text extraction and linguistic analysis via CPS.
  • Search Services. This will provide search for metadata, content or both (XQuery based) as well as multiple search options such as batching, spooling, filters, language, analyser etc. It will return results in an XML format and provide term highlighting, summary and relevancy. The thread execution management supports multi-query and parallel query. It also includes low level security filtering.
  • Content Processing Services is responsible for language detection, text extraction and linguistic analysis. The CPS can be local or remote (co-located with content for improved performance). It will have a pluggable architecture to support various analysers and/or text extractors. It will include out of the box support for Basis RLP and Apache SnowBall analysers. However only one analyser can be configured per ESS. (My question: Can I have different analysers on different nodes?). Content Processing can be extended by plugins.
  • Node and Data Management Services is the primary interface for all data and node management within ESS. It provides the ability to control routing of documents and placement of collections and indices on disk. It deals with index management and supports bind, detach, attach, merge, freeze, read-only etc.
  • Analytics includes API’s and Data model for logging, metrics and auditing, ingestion and search analysis and facet computation services.
  • Admin Services. The example shown was really powerful: an admin could view all searches made by a user by time and see how long it took to get the first result set. Those with a longer time could be explored by viewing the query to analyse why it took so long.

Below that sits xDB, and at the bottom the Lucene indices. The whole solution is 100% Java. xDB stores XML documents in a persistent DOM format and supports XQuery and XPath. Indices consist of a combination of native B-tree indices and Lucene. xDB supports single and multi-node architectures and has support for multi-statement transactions and full ACID support. In addition it supports XQFT (see an introduction to it here), which is a proposed standard extension to XQuery that includes:

  • LQL via a full text extension
  • Logical full-text operator
  • Wildcard option
  • Anyall options
  • Positional filters
  • Score variables

ESS includes native security which means that security is replicated into the search server and security filtering is done on a low level in the xDb database. This means effective searches on large result sets and enables facet computation on entire result sets.

Native facet computation is a key feature in ESS which is of course linked to the new search interface in CenterStage which is based on facets in an iTunes-like interface. Facets are of course nothing new but it is good that EMC has finally realised that it is a powerful but still easy way to give users “advanced search”.
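As a rough illustration of what facet computation means in practice, the sketch below counts attribute values over a complete result set and then drills down on one of them. It is a generic example in plain Python, not ESS code; the sample records, the field names and the functions `compute_facets` and `drill_down` are all made up for the illustration.

```python
from collections import Counter

# Hypothetical search results; in a real system these would come from the index.
RESULTS = [
    {"id": "d1", "author": "anna", "format": "pdf"},
    {"id": "d2", "author": "erik", "format": "ppt"},
    {"id": "d3", "author": "anna", "format": "ppt"},
    {"id": "d4", "author": "anna", "format": "pdf"},
]


def compute_facets(results, fields):
    """Count how often each value of each field occurs in the whole result set."""
    return {f: Counter(r[f] for r in results if f in r) for f in fields}


def drill_down(results, field, value):
    """Narrow the result set to the entries matching one chosen facet value."""
    return [r for r in results if r.get(field) == value]


facets = compute_facets(RESULTS, ["author", "format"])
print(facets["author"].most_common())

# The user clicks the facet value "anna"; the remaining facets are recomputed.
narrowed = drill_down(RESULTS, "author", "anna")
print(compute_facets(narrowed, ["format"])["format"].most_common())
```

The counts are only trustworthy if they are computed over the entire permitted result set, which is why doing the security filtering inside the search server matters for facets.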

ESS leverages a Distributed Content Architecture (for instance using BOCS) by only sending the raw text (DFTXML) over the network instead of the binary file, which can be very much larger in many cases (such as big PowerPoint files). ESS also utilizes the new Content Processing Services (CPS) as well as ACS.

The new solution also makes it possible to do hot backups without first taking the index server down, as is required today. Backup and restore can be done on a sub-index level. The new options for High Availability include:

  • Active/active shared data (the only one available for FAST)
  • Active/passive with clusters
  • N+1 Server based
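To see why the N+1 option is described as low cost, compare the hardware it needs with the current situation, where the same number of nodes is required again for backup/high availability. The sketch assumes that "N+1" means one spare server covering the failure of any single active node, which is my reading of the term rather than something EMC spelled out; the function names are hypothetical.

```python
def mirrored_nodes(active_nodes: int) -> int:
    """Every active node has its own standby: the node count doubles."""
    return 2 * active_nodes


def n_plus_one_nodes(active_nodes: int) -> int:
    """Assumed meaning of N+1: a single spare shared by all active nodes."""
    return active_nodes + 1


for active in (2, 10, 100):
    print(f"{active:>3} active nodes: mirrored={mirrored_nodes(active)}, "
          f"N+1={n_plus_one_nodes(active)}")
```

The trade-off is that a single shared spare only protects against one node failing at a time.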

Things I would like to see but have not heard about yet:

  • Word frequency analysis (word clouds based on document content)
  • Clustering and categorisation (maybe done by Content Intelligence Services)
  • Synonym management
  • Query-expansion management
  • How document similarity is handled by vector-space search (I guess done by Lucene?)
  • Boosting & Blocking of specific content connected to a query
  • Multiple search-views (different settings for synonyms, boost&blocking etc)
  • Visualisation of entity extraction and other annotations
  • Functionality or at least an API to manually edit entity extraction within the index. Semi-automatic solutions are the best.
  • Freshness management.
  • Speech-to-text integration (maybe from Audio/Video Transformation Services)

Personally I think this is a much needed move to really improve the internal search in Documentum and make much better use of the underlying information infrastructure in Documentum. It will be interesting to see what effect this has on Microsoft/FAST’s ambitions to support the Documentum connector. Maybe the remaining resources (no OEM to develop) can focus on bringing the connector from an old 5.3 API to a modern 6.5 API. I still see a need for utilising multiple search engines, but as ESS gains more advanced features the rationale for an expensive external solution can change. The beta for Content Intelligence Studio will be one important step in outlining the overall enterprise search architecture for big ECM solutions. This of course also includes tracking what Autonomy brings to market in the near future.

Another thing worth mentioning is that during the past four conferences I have heard quite a few complaints about the stability of the current FAST-based full-text index. It crashes/stops regularly, and often without letting anybody know before users start complaining about strange search results.

A public beta will be released in Q3 2009 and customers are invited to participate. Participants will receive a piece of hardware with ESS pre-installed and pre-configured, and after a few configuration changes in Content Server it should be up and running.

Customers will have the option of upgrading the existing FAST full-text index or running the new ESS side-by-side with FAST. EMC will also market ESS for non-Documentum solutions.

Be sure to also read Word of Pie’s notes as well as my previous notes from FAST Forward 09 around the future of FAST ESP.

Where the FAST Enterprise Search Platform (ESP) is going now…

I have spent the last week in Las Vegas attending the FAST Forward 09 conference. About a year ago the Norwegian company FAST Search & Transfer was acquired by Microsoft, and like me, customers all over the world wondered what would happen. Some thought it was great to have a huge company with its R&D resources take the platform forward, while others like me feared a technology transition which would include cancelling support for other operating systems and integration with nothing but Microsoft technology.

It was very clear that the Microsoft marketing department had a lot to say about the conference and what messages were to be conveyed. Somewhere behind all that you could still see some of the old FAST mentality, but it was really toned down. To me the conference was about convincing existing customers that MS is committed to Enterprise Search and giving Sharepoint customers some idea of what Enterprise Search is all about.

It is clear that the product line is diversifying in a common Microsoft strategy:

Solutions for Internet Business

  • FAST Search for Internet Business
  • FAST Search for Sharepoint Internet sites
  • FAST AdMomentum

Solutions for Business Productivity

  • FAST Search for Sharepoint
  • FAST Search for Internal Application

FAST Search for Sharepoint won’t be available until Office Wave 14 (incl. Sharepoint) is released, so in the meantime there will be a product called FAST ESP for Sharepoint that can be used today and will have a license migration path towards FAST Search for Sharepoint. That product will have a product license of around 25 000 USD, and additional Client Access Licenses (CALs) will then follow in the standard MS manner.

So what does all of this mean for those of us who would like to see FAST ESP continue as an enterprise component in a heterogeneous environment? Well, MS has committed to 10 years of support for current customers, I guess as a gesture towards those who are worried. Over and over again I heard representatives talking about how important those high-end installations on other operating systems are. The same message appeared when it came to connectors and integration with Enterprise Content Management systems like EMC Documentum. Still, most if not all demos were connected to Sharepoint and/or other MS-specific technologies.

The technical roadmap means that the past year has been devoted to rewriting their next generation search platform from Java to .Net. The first product to be released is the Content Integration Studio (CIS), which consists of a Visual Studio component (I guess it was earlier in Eclipse) and a server-side execution engine. This will only be available on Windows since it is deeply connected to the .Net environment. It looks like a promising product, with support for flows instead of a linear pipeline to handle the processing of information before it is handed off to the index engine. CIS therefore sits in front of FAST ESP, and a combination of actions in flows and in old pipelines can be executed. Information from CIS is written to ESP, which then creates the index and also processes queries against it.

What I think we can expect is that new innovation is focused on creating a modular architecture where CIS is the first module. Features in ESP will then be gradually reengineered in a .Net environment, thus creating a common search platform some years into the future. It will likely mean that we will still see one or two upgrades to the core ESP as we know it today to enable it to function together with the new components. Content Fusion will most likely be the next module that will extend ESP, but on a .Net architecture.

When it comes to the presentation logic, where we today have the FAST Search Front-End (SFE), we will see them either as Web parts for Sharepoint or as AJAX Aerogel from MS. These are currently developed using Javascript but will include Silverlight later on.

These will initially be offered in both an IIS and a Tomcat flavour, and possibly others if there is demand. They will initially be integrated with ESP and Unity, thus opening up a new approach to developing a search experience on top of them.

In general I don’t like the Microsoft approach of insisting on owning the whole technology stack themselves and refusing to invest in other standards-based projects. Instead of developing their own AJAX libraries they could have used ExtJS or even Google Web Toolkit. While it is not open source, MS argues that it comes with a very permissive licence that has many of the same qualities. A good thing is that MS was committed to making sure that this framework works on all major browsers including Firefox, Safari and Chrome. It is interoperable with jQuery.

In summary I think it is kind of a mixed experience. The new features being developed are truly needed to keep FAST one of the most advanced search engines available. I think many of the features look really promising and I can’t wait to get my hands on them. On the other hand it is clear that things are going proprietary (FAST ESP had a lot of open source in it) and that it is being aligned with a Microsoft stack, thus gradually minimizing options. That includes how new technologies are being implemented (MS ones instead of open source), what operating systems it will run on and what the support for developing presentation logic looks like. It means I have to have people who know both Java and .Net, both Flash and Silverlight (possibly JavaFX) and both ExtJS/GWT and MS AJAX/Aerogel.

We are deeply invested in the EMC Documentum platform and would of course like to continue to use ESP as a way to add advanced capabilities and performance to our architecture. However, I think I will over time get sick and tired of Microsoft sales people trying to convince me to use Sharepoint instead of Documentum. For anybody who knows how both platforms work it is almost a joke, but I will most likely have to keep explaining and explaining. I just hope that we can have a decent connector developed for Documentum.

To read more you can go to the FAST Forward Blog, which has many interviews, look at videos at the Microsoft Press Room and check out the chatter on ffc09 tagged tweets on Twitter. And finally, here is what CMS Watch has to say about it.

My speech at the Software Development Track at Momentum 08

On Tuesday it was time for my speech at Momentum 08. It was a first for me at a conference of this kind. I found my room at the top of the conference and it was huge, with seats for around 200 people I would guess. The projector screen was of the back-projection kind and built into a wall with some elaborate theme in blue/green and yellow details. There was a speaker’s booth but I almost immediately decided not to use it. I know my voice isn’t the strongest one, so I was happy to see a wireless Sennheiser transmitter lying there on the booth. The computer was preloaded with my presentation so all I had to do was get going. I did not actually feel nervous this time and it feels great to be able to enjoy these situations now. I guess it helps that I am completely passionate about the subject and feel we are doing really cool stuff with Documentum.

My speech started with outlining how Sweden has changed its focus from a neutral country preparing for a Soviet invasion to an expeditionary force taking part in multi-national peace support operations all over the world. Next I wanted to stress that our project is not driven by the IT department but rather by the “business” side, although that concept is a bit strange in the Armed Forces, where we do not actually make money, but anyway.

Because of that we base our ideas and the architecture on challenges that we have identified at an operational level military HQ and on what we think needs improvement. The main idea is to be able to put context around all the information that needs to be handled in the HQ and use that context to provide a flexible structure for all pieces of information. That structure allows us to handle a vast amount of information in “all” formats and ensure traceability and reuse of it. To do that, an advanced ECM system like Documentum is crucial.

So what have we done to do that then?

Digital Asset Manager
DAM 6.5 is our main client and we have made a couple of customizations to it. Some small ones include filtering away object types that users don’t need to see, putting a number in brackets on the Inbox in the left tree structure and simplifying the setting of object level security. The bigger ones include a new relationship object that extends dm_relation but with a lot more attributes, which we expose in a new relationship dialogue. If there are relationships on objects, that is also visible in the folder view, and clicking on the link called “View Rel” allows the user to explore related objects, edit relationship properties and continue “surfing” the relationship paths. One big nuisance is that the current release of DAM has a bug (possibly related to the IP-rights integration) which has a significant impact on performance. It issues at least ten times more queries than WebTop to view the contents of a folder. This will be fixed with DAM 6.5 SP1 in January 2009.

Documentum Reporting Services
So far the information or content analytics in Documentum have not been exposed in any obvious way. However, thanks to the advanced repository all the information is available – it is just a matter of visualizing it. Reports, which use Crystal Reports technology, provide a really powerful way to create great-looking outputs based on queries to the repository. We use it to report on user behaviour, which can tell us how users or groups interact with the content. Another set of reports focuses on the characteristics of the content, such as what formats are used, but also draws on our custom attributes to see trends in how different kinds of content are used and created. Finally, reports provide a way to create nice looking template-based outputs of the actual values of all the attributes. For instance, an object type whose attributes represent organisations can, through reports, provide documents listing all organisations in alphabetical order. In discussions with EMC representatives I found out that information analytics is one of the areas that EMC is now investing a lot of resources in. Based on what we were talking about, I really look forward to seeing the first outcomes of that effort. It will truly make the power of advanced ECM much more evident for everyday users.

First impressions of Documentum Digital Asset Manager 6.5

Before going on my planned sick-leave I played around with DAM 6.5 for a while. I will try to summarize a few reflections I have on this brand new release.

Good things
The interface has got yet another refresh, but with rather small modifications that I guess I won’t even notice in a couple of weeks. The biggest change is that some functions have got modal windows, meaning that when you click on properties you no longer see the big full screen page but instead a new browser window that allows you to see where you were when you clicked. A great improvement I think. The import/export/check-in process also has small modal windows with a nice looking update progress bar.

A thing that I just love is the new clusters/facets feature which appears when performing a search. Your results can then be drilled down based on user, topic, date and so forth. It will improve findability hugely. We had these installed in D6 SP0 but they did not work then and seemed to be connected more closely to ECI Services back then.

In general the interface is prettier and looks more distinct and modern. The icons have been slightly improved as well.
Another small improvement is that attributes which both have value assistance (dropdowns) and allow entering of an own value now have the correct width.

I guess it is not really connected to this upgrade, but I finally managed to find out how one creates Presets (rules) for specific folders and users, which was great. Look at the tree structure in DA – not in the menu.

Bad things
The left tree structure has been cleaned up with clearer icons and the update is based on AJAX (or should be). This works fine in Documentum Administrator 6.5, but for some reason they seem to have missed something in compiling DAM because there is a small refresh anyway when you click on a folder. Our partner suggests that they have simply inherited from the wrong WDK class.

Another interesting thing is that some features that were highly marketed at EMC World are turned off by default in the configuration files. Those include Deep Export and OLE-linking support (resolving links in Office documents and importing associated files if desired). That is rather strange I think, since those are really handy features. The OLE-linking can also be toggled on/off in Preferences. The effect of that is that there was no folder export available at all, which is fairly strange. We also had some issues with getting import of more than two folders working.

We also have an irritating issue around thumbnails. It seems that those cannot be created for PDF files at all, which also means no storyboarding. When reading through the release notes this is noted as a known bug, and it seems that despite our bug report from earlier this year nothing has been done to fix it. From a usability standpoint that is not so good.

EMC Documentum CenterStage

If you haven’t done it already, I recommend a look at the site for the beta of EMC Documentum’s new web client called CenterStage. This modern Web 2.0 client has earlier been called both Magellan and IntelliSpace, but EMC now seems to have settled on the name CenterStage. It is kind of funny because I associate CenterStage with a TV application for Mac OS X which is found at the CenterStage Project site. Anyway, it is interesting to see how the interfaces and features look for the free CenterStage Essentials (included in any Content Server license) and the paid version called CenterStage Pro. I have long waited for a good application that can do “Facebook for the Enterprise” while still having all the features of an advanced and full-fledged Enterprise Content Management platform. This seems to be a big step towards that. The key thing is to be able to collaborate both around content (documents etc) and around people, groups and projects. Although there are good collaboration platforms out there, such as Clearspace which has some basic integration with Documentum, they still create a lot of duplicate information in separate “stove-pipes”. I want the content objects found in the Clearspace platform stored in Documentum, but this is not the case today. What we are looking at for the immediate future is being able to search Documentum content from Clearspace.

Again, back to CenterStage. I believe it will provide a lot of organisations with a client that will be a lot more intuitive and useful out of the box than anything we have ever seen from Documentum before. This is thanks to an ambitious usability project run by Gideon Ansell in the Documentum User Experience group. However, after having had a look at the project release matrix found in the beta community, it looks like the beta of CenterStage Essentials will not have enough features to be the flexible collaboration client I need. Those features will be added later on this year. We just have to wait for the full CenterStage Pro version I think. I also hope that the few missing pieces like a full-fledged personal profile, expert location and integration with external presence/Instant Messaging systems will be on the schedule for the next update of it.

This week I will be able to play with the Documentum 6.5 release for the first time since it is being installed by our Documentum partners at work. I especially look forward to seeing the new Digital Asset Manager (DAM) 6.5 client and TaskSpace 6.5. I also hope that I can get some further information about what release of the embedded FAST InStream search engine is used in this release.