Reflections from Momentum 2011 Berlin

EMC (IIG) the company

  • A real tech company
  • Responsive employees
  • Easy to get access inside the company
  • Willing to share information
  • Sometimes hard to figure out ”who is who” in EMC Information Intelligence Group (IIG)

As a customer, it matters how the company feels to work with. My experience is that EMC is a company where you can find tech-savvy people who really like what they are doing. And they are good at it. The general experience is that employees are interested in listening to us and very responsive to our needs. It is easy to quickly get access to both key business people and people in engineering. On the other hand, that is often required because the product is quite complicated. On the negative side, the company is big, which means that things are not always coordinated and it can sometimes be difficult to figure out who is who among all the different product managers, general managers, solutions directors and architects.

EMC IIG seems open and transparent to me. Sure, there are disclaimers, but they talk openly about most things and there is no NDA at the conference.

 

Strategy

I feel a big difference this year – maybe because I have been away for over a year due to my year at the National Defense College. The big difference is that EMC Information Intelligence Group finally seems to get it. For real. Moving away from the idea that Case Management is something different from Enterprise Content Management. Realizing that nice-looking, usable user interfaces are key. Understanding that the cloud is a key component of EMC IIG's future. Communicating that configuration instead of coding is the real power of xCP, and not just for the interfaces but for the whole application. Finally working to get decent analytics that make use of the contextual information that already exists around objects in the repository. Somehow it feels like there is a new executive team in place that wants to be a little bolder and wants to move IIG in a certain direction.

EMC has made numerous acquisitions since it bought Documentum, and now it feels like they are discovering that they have lots of different pieces of technology within the company that together can make a bigger whole.

Working with EMC-owned VMWare not only to certify all Documentum components but also to leverage the power of their virtualization infrastructure, both to ease deployment and to enable efficient use of infrastructure.

Working with VMWare-owned Socialcast to include activity streams in Documentum user interfaces.

Working with RSA to enhance the security features of the platform.

Working with Greenplum to power analytics, but also to provide a new perspective on handling big data with intelligence on top of it: big information.

 

Towards a unified client

  • Client situation is a mess today
  • C6 acquisition was a good move
  • A unified client is coming along
  • Wonderful to see the focus on iOS apps

The user interface of Documentum is frankly a mess nowadays. It is the result of too many teams working in their own bubbles, creating user interfaces for different customer groups. There is WDK-based Webtop with its DAM cousin. Taskspace, which is also WDK but gains some power from Forms Builder and xCP technologies. ExtJS-based Centerstage, which looks great but is a bit late and light on features. Feature-rich Media Workspace, which is based on Flex in a world where Adobe Flash is obviously losing traction and HTML5 is taking off. Steve Jobs really made a difference here, it turns out. On top of that there are desktop applications for OS X and Windows as well as an Outlook client. It is not that I think there is no need for different clients. There is. Especially from a training perspective, where some companies require almost zero training whereas others can accept more extensive training.

The inclusion of C6 Technologies into Documentum is a welcome move and I heard lots of positive reactions to it. However, the key thing is that EMC IIG is now firmly committed to unifying all clients on one technology stack, which of course will focus a lot on configurability. So in the end the number of clients could very well be much bigger, but they will just be different configurations based on very specific user needs. The unified client will most likely be based on C6 and ExtJS technologies, which means that Flex is going away quickly. So are WDK and Taskspace, but over a longer time frame. So think of D2 as a Webtop replacement and X3 as the new Centerstage, with lots of widgets including ones for rich media management. We will probably see the C6 iPad client replace the existing Documentum client as well. Expect an iPhone client soon too.

Speaking of iOS: to me it is almost like a new world compared to my first EMC World in 2007. Everybody at EMC was using Blackberries and Macs were hardly seen. Now the iPad app is out, Peggy talks about “everybody loves their iPads”, Macs are in booths and on stage, there are several Documentum apps and almost all contest prizes are iPads. Macgirlsweden is both happy and astonished at this development :)

 

Policy-based deployment with monitoring

Ok, so Documentum is not easy to deploy. It takes a while, but as Jeroen put it: “You guys want to do complicated stuff!”. I think he is right, and it might sometimes be a good thing since you have to stop and think (unlike Sharepoint, which is way too easy to install in that sense). You choose Documentum because you have complicated processes to support, large amounts of content and an ECM vision. Still, agility really needs to be improved, and that will also simplify deployment. So improvement is important for several reasons.

The first part of that is the xCelerated Management System, which in essence lets you describe and model your applications and your deployment needs. Tools then translate these policies onto your VMWare-powered infrastructure and deploy the whole Documentum platform based on your needs, taking into account the number of users, the type of content, the type of processes and what kind of high-availability demands you have. Finally, all of this is monitored using a combination of open-source Hyperic and the Integrien engine that VMWare got through an acquisition. Integrien now seems to have become VMWare vCenter Operations. In my opinion that architecture will set EMC Documentum way ahead of its competitors, especially if it can provide some additional agility when the Next-Generation Information Server (NGIS) arrives.

 

Analytics and Search

  • xPlore is looking good
  • Thesaurus-support is a good thing
  • QBS is great
  • Custom-pipeline support based on UIMA is great

A dear subject of mine, where EMC IIG finally seems to get its act together. They have their own search server called xPlore, which is based on open-source Lucene and their own powerful XML database xDB. A really smart move now that FAST, Autonomy and Endeca have been bought by the other IT giants.

xPlore 1.2 provides some really cool features, both in terms of baseline search capabilities like thesaurus support and more text-analytics-oriented features. The content processing pipeline now supports extensions based on UIMA, which opens up for connecting other entity extraction engines into xPlore. Another really cool feature is Query-Based Subscriptions (QBS), which really leverages the Documentum repository. Create a search query based on a combination of free text and metadata. Save it, set it up to run at different intervals and have it notify you of any new content that has been ingested. You can even use it to fire off a workflow in order to have somebody take action. Hopefully we will see some xCP integration in the xPlore 1.3 release, where the search experience and indexing are linked to the characteristics of the xCP Application Model.
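To make the Query-Based Subscription idea concrete, here is a minimal Python sketch of the mechanism as I understand it: a saved query combining free-text terms and metadata constraints that runs at an interval and notifies about matching content ingested since the last run. This is not EMC's implementation or API. The `Doc`, `SavedQuery` and `run_subscription` names are hypothetical, and the naive substring matching stands in for a real xPlore full-text query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Doc:
    """A hypothetical repository object: id, full text, metadata, ingest time."""
    object_id: str
    text: str
    metadata: dict
    ingested: datetime


@dataclass
class SavedQuery:
    """A hypothetical saved query: free-text terms plus metadata constraints."""
    name: str
    terms: list
    metadata: dict
    interval: timedelta
    last_run: datetime

    def matches(self, doc: Doc) -> bool:
        # Every free-text term must occur in the text (naive substring match).
        text = doc.text.lower()
        if not all(term.lower() in text for term in self.terms):
            return False
        # Every metadata constraint must match exactly.
        return all(doc.metadata.get(k) == v for k, v in self.metadata.items())


def run_subscription(query: SavedQuery, repository: list, now: datetime,
                     notify: Callable[[str, Doc], None]) -> list:
    """Run the saved query if its interval has elapsed and notify about
    matching content ingested since the previous run."""
    if now - query.last_run < query.interval:
        return []
    hits = [d for d in repository
            if d.ingested > query.last_run and query.matches(d)]
    for doc in hits:
        notify(query.name, doc)  # e.g. send an email or start a workflow
    query.last_run = now
    return hits


# Usage: only the new contract matches; the memo has the wrong type and the
# old contract was ingested before the last run.
start = datetime(2011, 10, 1, 8, 0)
repo = [
    Doc("0901", "Contract renewal for supplier", {"type": "contract"}, start + timedelta(hours=2)),
    Doc("0902", "Supplier newsletter", {"type": "memo"}, start + timedelta(hours=3)),
    Doc("0903", "Old contract with supplier", {"type": "contract"}, start - timedelta(days=1)),
]
subscription = SavedQuery("supplier-contracts", ["supplier"], {"type": "contract"},
                          timedelta(hours=1), start)
run_subscription(subscription, repo, start + timedelta(hours=4),
                 lambda name, doc: print(f"{name}: new object {doc.object_id}"))
```

The `notify` callback is where a real system could send an email or, as mentioned above, kick off a workflow.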

In his Innovation Speech, Chief Architect Jeroen van Rotterdam also showcased a modified Centerstage which used a recommendation engine based on a Hidden Markov model to suggest similar content to users based on similarity in context and similarity in content. A really powerful feature that makes EMC live up to its name: Information Intelligence Group (IIG). Jeroen also mentioned that they are working on video and audio analytics, including speech-to-text, which is then indexed into xPlore. That will most likely arrive in the iPad client first.
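I don't have the details of the Hidden Markov model Jeroen showed, so the sketch below swaps in something much simpler to illustrate the general idea of mixing similarity in content with similarity in context: cosine similarity over word counts for content, and Jaccard overlap of context attributes (tags, folders, projects) for context. All names, fields and the 50/50 weighting are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (lower-cased alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of context attributes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: dict, candidates: list, k: int = 3, content_weight: float = 0.5) -> list:
    """Rank candidates by a weighted mix of content and context similarity."""
    target_vec = term_vector(target["text"])
    scored = []
    for doc in candidates:
        if doc["id"] == target["id"]:
            continue  # never recommend the document itself
        content = cosine(target_vec, term_vector(doc["text"]))
        context = jaccard(set(target["context"]), set(doc["context"]))
        scored.append((content_weight * content + (1 - content_weight) * context, doc["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


docs = [
    {"id": "a", "text": "xPlore search pipeline with UIMA", "context": {"search", "project-x"}},
    {"id": "b", "text": "UIMA pipeline for xPlore search", "context": {"search"}},
    {"id": "c", "text": "Quarterly budget", "context": {"finance"}},
]
print(recommend(docs[0], docs, k=2))  # ['b', 'c']
```

A real engine would learn from user behaviour rather than use fixed weights, which is presumably where the probabilistic model comes in.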

Another cool thing coming for the Content Intelligence Services (CIS) component is automated metadata extraction based on rules, and taxonomy cold-start, which means that you could start generating a taxonomy based on your existing content.

Next-Generation Information Server (NGIS)

It seems that there has been a big investment in the xDB technology, and therefore it is a key component in NGIS. No surprise there, since Jeroen is one of the founders of the company that EMC bought. That could also mean that future installations of Documentum will not require a traditional SQL RDBMS, which would not be such a bad thing: one less license and one less skill set to manage. NGIS is being designed with both the cloud and ”big information” in mind. The idea is to be able to use different datastores such as Atmos, Greenplum, Isilon etc. together with NGIS. I really like the term ”big information”, which is a way to take what we now know as ”big data” to the next level where it also covers unstructured data and documents. Since there is a wave of information coming over us now, it seems smart to design for huge datasets from the beginning. After all, we need to manage it whether we like it or not. As Peter Hinssen put it at the final keynote: ”It is not information overload – it is a filter failure”. We CAN handle vast amounts of data if we design the architecture right. Another interesting concept is to bring processing to the data (nodes) instead of what we do today, where we have a central processing node that we pipe all data through. Everybody realises that the first releases of NGIS will not be feature-complete in comparison with Documentum Content Server, but I also wonder what the cloud focus really means for NGIS. I hope it means cloud as a technical concept and not only public cloud, meaning that NGIS would only be available for OnDemand at first. On the other hand, an early access program is now opening up and that will most likely be run on premise. NGIS will be important for Documentum to retain its position as the leader in ECM technology. In the light of the other innovation going on, it can be a bright future.
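Bringing processing to the data is essentially the map/reduce pattern. A minimal sketch, assuming hypothetical data nodes that each hold a shard of object metadata: instead of shipping every record to one central node, each node computes a small partial result locally and only those summaries are merged. Threads stand in for separate machines here, and none of this describes how NGIS actually does it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data nodes, each holding its own shard of object metadata.
NODES = {
    "node-1": [{"type": "contract"}, {"type": "memo"}, {"type": "contract"}],
    "node-2": [{"type": "image"}, {"type": "contract"}],
    "node-3": [{"type": "memo"}],
}


def central_count(nodes: dict) -> Counter:
    """Today's pattern: ship every record to one central node, then process."""
    all_records = [obj for shard in nodes.values() for obj in shard]
    return Counter(obj["type"] for obj in all_records)


def local_count(shard: list) -> Counter:
    """Runs next to the data and returns only a small summary."""
    return Counter(obj["type"] for obj in shard)


def distributed_count(nodes: dict) -> Counter:
    """Push the computation out to the nodes, then merge the partial results."""
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_count, nodes.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


print(distributed_count(NODES) == central_count(NODES))  # True
```

Both give the same answer; the difference is that only a few counters travel over the network instead of every record, which is what makes the approach attractive for ”big information”.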

Cloud and EMC OnDemand

So now you can run a complete Documentum stack in the cloud. A great thing, which I think will broaden the market a bit. It is much easier to get up and running, and you can focus on core ECM capabilities instead of installing server OS, DBMS and managing storage. Another good thing is the ability to have extra power available if needed. Provisioning of a full platform is said to take 6-8 hours depending on configuration. Deployment will be in a vCube where all Documentum servers are managed as images. Each customer gets its own vCube. It will be possible to run a vCube on premise, but that means that EMC still manages the configuration over the internet even though it is running on your hardware. There will be some limitations on the level of customization you can do in order to have EMC take responsibility for the vCube. Remember that all server OS and DBMS licenses are included in the vCube. All together, the cloud initiative is driving huge improvements in configuration and deployment that all aspects of Documentum will gain from.

 

Venue and atmosphere

  • Keep working on the IIG and Documentum community feeling

Another Momentum conference has ended and it is time to reflect on our experiences from this event. This was my second European conference, but I have attended four EMC World conferences. I keep hearing that they are different, as well as stories from the old Momentum conferences before EMC acquired Documentum. During my first EMC World events I really felt that the Documentum community was lost among a wave of storage people roaming around. However, the Momentum brand has been strengthened and I believe the difference between the US and the European conference is much smaller now. I think the main difference is the crowd and the atmosphere. The locations in Europe are a bit smaller in scale, and the event sites also look physically different. In all, EMC IIG did a very good job organizing this event, with no visible friction from my point of view.

 

Practical things

  • More power outlets
  • Dedicated wifi in the keynote area (to allow use of Social media)
  • Set up a blogger’s lounge based on the EMC World concept

In general EMC created a very well-organized event, but there is still some room for improvement. One thing is the meals area. For some reason the Americans prefer round tables ”en masse”, whereas at this event meals were served in the hotel's ordinary breakfast restaurant. Tables were straight ones with 2-8 seats each. To me that did not invite as many spontaneous lunch encounters as I experience at EMC World. People tend to stay in their small groups and eat in those as well.

Another recurring issue is of course the shortage of power outlets, which I found really strange at an IT conference, especially given EMC’s strong push for social media interactions. Even though iPads are much more common now (even at EMC events), I think the conference experience would be more productive with a decent number of outlets and a capable Wifi network. My best experience so far is still a Microsoft conference around FAST Search in Vegas where all 1200 participants had tables with outlets.

There was a social media center, but I felt it was way too small compared to the spacious EMC World blogger’s lounge. There are still quite few people who are using social media during the conference, and a good lounge would encourage interaction IRL between us. Consider creating badges with your Twitter name and blog address printed on them.

 

Social events

  • Make them about networking
  • Make it possible to talk – have areas without very loud music
  • Make sure those with allergies can eat and eat safely.

First of all, I don’t drink alcohol at all, so in that sense I may not be representative of the group at large. Still, since this is a professional conference, I do have some opinions about what the utility of these social events could be. Of course, it should be a more relaxed time and a chance to have some fun. However, I like to see these events as a very good opportunity for networking between all of us at the conference. Locating them in nightclubs with very loud music is therefore not an ideal setting for networking. I think the EMC World social events in the US are better in that respect. Spending the night in Universal Studios, for instance, was a very different experience from Ewerke in Berlin. Not just because there are terrific and fun rides there, but also because there were lots of places to sit down, eat good food and talk a lot. I had a great evening there last time, talking a lot about the future of content analytics with EMC staff and customers. So at least provide areas where people can talk to each other. Make the events more of a continuation of the conference day. Make sure they are on theme: any entertainment should have some connection to ECM. Maybe a stand-up act about our community or a music show with dedicated lyrics about us. It would also be great to have more non-alcoholic alternatives than orange juice, Coke and Fanta. Finally, I am allergic to nuts and I had a small incident where I accidentally ate something with nuts in it. Provide good information and possibly alternatives for those of us with allergies.

DISCLAIMER: All opinions here are my own and do not represent any official view of my employer. The information is based on notes and conversations and may contain errors.


Interesting thoughts around the Information Continuum

In a blog post called ”The Information Continuum and the Three Types of Subtly Semi-Structured Information”, Mark Kellogg discusses what we really mean by unstructured, semi-structured and structured information. In my project we have constant discussions around this and around how to approach chunking content down into reusable pieces that in themselves need some structure in order to be just that: reusable. At first we were ecstatic over the metadata capabilities in our Documentum platform, because we had made our unstructured content semi-structured, which in itself is a huge improvement. However, it is important to see this as a continuum instead of three fixed positions.

One example is of course the PowerPoint/Keynote/Impress presentation, which actually is not one piece. Mark Kellogg reminded me of the discussions we have had around those slides being bits of content in a composite document structure. It is easy to focus on the more traditional text-based editing that you see in Technical Publications and forget that presentations already have that aspect. To be honest, when we first got Documentum Digital Asset Manager (DAM) in 2006 and saw the PowerPoint Assembly tool, we became very enthusiastic about content reuse. However, we found that feature a bit too hard to use and it never really took off. What we see in Documentum MediaWorkSpace now is a very much revamped version of it, which I look forward to playing around with. I guess the whole thing comes back to the semi-structured aspect of those slides, because in order to facilitate reuse they somehow need some additional metadata and tags. Otherwise the sheer number of available slides will easily become too much if you can’t filter them down by how they are categorized and by who created them.

Last year we decided to take another stab at composite document management, to be able to construct templates referring to both static and dynamic (query-based) pieces of content. We have built a rather cool dynamic document composition tool on top of our SOA platform, with Documentum in it. It is based on DITA and we use XMetaL Author Enterprise as the authoring tool to construct the templates; the service bus resolves the dynamic queries, and Documentum stores the large DITA file and transforms it into a PDF. What we quickly saw was yet another aspect of semi-structured information, since we need a large team to be able to work in parallel to ”connect” information into the finished product. Again, there is a need for context in terms of metadata around these pieces of reusable content that will end up in the finished product based on the template. Since we depend on a lot of information coming in from outside the organisation, we can’t strictly enforce the structure of the content. It will arrive as Word, PDF, text, HTML, PPT etc. So there is a need to transform content into XML, chunk it up into reusable pieces and tag it so we can refer to it in the template or use queries to include content with a particular set of tags.
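A stripped-down sketch of the template resolution step, assuming a hypothetical in-memory chunk store where each chunk is a DITA topic with a set of tags. Static slots point at one chunk, query slots pull in every chunk carrying all the given tags (a stand-in for the queries our service bus resolves against the repository), and the result is written out as a minimal DITA map. The chunk ids, paths and tags are made up, and the PDF rendering step is left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A hypothetical reusable piece of content stored as a DITA topic."""
    chunk_id: str
    href: str
    tags: frozenset


CHUNKS = [
    Chunk("intro", "topics/intro.dita", frozenset({"boilerplate"})),
    Chunk("threat-a", "topics/threat-a.dita", frozenset({"threat", "region-north"})),
    Chunk("threat-b", "topics/threat-b.dita", frozenset({"threat", "region-south"})),
    Chunk("logistics", "topics/logistics.dita", frozenset({"logistics", "region-north"})),
]

# A template is an ordered list of slots: a static reference to one chunk,
# or a dynamic query that pulls in every chunk carrying all the given tags.
TEMPLATE = [
    ("static", "intro"),
    ("query", {"region-north"}),
]


def resolve(template: list, chunks: list) -> list:
    """Turn template slots into an ordered, de-duplicated list of chunks."""
    by_id = {c.chunk_id: c for c in chunks}
    resolved, seen = [], set()
    for kind, value in template:
        if kind == "static":
            matches = [by_id[value]]
        elif kind == "query":
            matches = [c for c in chunks if value <= c.tags]  # tag subset match
        else:
            raise ValueError(f"unknown slot type: {kind}")
        for chunk in matches:
            if chunk.chunk_id not in seen:  # a chunk appears only once
                seen.add(chunk.chunk_id)
                resolved.append(chunk)
    return resolved


def to_ditamap(title: str, resolved: list) -> str:
    """Serialize the resolved chunks as a minimal DITA map."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for chunk in resolved:
        ET.SubElement(root, "topicref", href=chunk.href)
    return ET.tostring(root, encoding="unicode")


print(to_ditamap("Situation report", resolve(TEMPLATE, CHUNKS)))
```

Keeping the query slots separate from the static ones means the same template can be re-resolved later and pick up newly tagged content without anyone editing it.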

This of course brings up the whole problem with the editing/authoring client. The whole concept of a document is being questioned, as it is itself part of this continuum. Collaborative writing in the same document has been offered by CoWord, TextFlow and the recently open-sourced Google tool Etherpad, and will now be part of the next version of Microsoft Office. Google Wave is a bit of a disruptive force here, since it merges the concepts of instant messaging, asynchronous messaging (email) and collaborative document editing. Based on the Google Wave Federation Protocol, it is also being implemented in enterprise applications such as Novell Pulse.

So why not just use a wiki then? Well, the layout tools are nowhere near as rich as what you will find in word processors and presentation software, and since we depend on being able to handle real documents in these common formats, it becomes a hassle to convert them into wiki format or, even worse, to attach them to a wiki page. More importantly, a wiki is asynchronous in nature, which is probably not as user friendly as live updates. The XML vendors have also gone into this market with tools like XMetaL Reviewer, which leverages the XML infrastructure in a web-based tool that lets users see changes almost in real time and review them collaboratively.

This leads us to the importance of the format we choose as the baseline for both collaborative writing and the chunk-based reusable content handling that we would like to leverage. Everybody I talk to is pleased with the new Office XML formats but says in the next breath that the format is complex and a bit nasty. So do we choose OpenOffice, DITA or what? What we choose has a real impact on the tool end of our solutions, because you probably get the most out of a tool when it is handling its native format, or at least the one it is certified to support. Since it is all XML, we can always transform back and forth using XSLT or XProc.

Ok, we have the toolset and some infrastructure in place for that. Now comes my desire not to stove-pipe this information in some closed system only used to store ”collaborative content”. Somehow we need to be able to ”commit” those ”snapshots” of XML content that to some degree constitute a document. Maybe we want to ”lock it” down so we know which version of all of it has been sent externally, or just to know what we knew at a specific time. Very important in the military business. That means it must be integrated into our Enterprise Content Management infrastructure, where it can in fact move along the continuum into being more unstructured, since it could even be stored as a single binary document file. So we need to be able to keep the traceability, so you know which versions of specific chunks were used and who connected them into the ”document”. Again, just choosing something like Textflow or Etherpad will not provide that integration. MS Office will of course be integrated with Sharepoint, but I am afraid that implementation will not support all the capabilities in terms of traceability and visualisation that I think you need to make the solution complete. Also, XML content likes to live in XML databases such as Mark Logic Server and Documentum XML Store, so that integration is very much needed more or less out of the box in order to make it possible to craft a solution.
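One way to get the traceability I am after is to commit a manifest alongside each snapshot. A minimal sketch, assuming hypothetical part records with `chunk_id`, `version`, `text` and `connected_by` fields: it records a SHA-256 hash of each chunk's content plus who connected it, then hashes the whole manifest so later edits or tampering can be detected. This is my own illustration, not a Documentum API; in practice the repository's versioning and audit trail would carry much of this.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _manifest_digest(manifest: dict) -> str:
    """Hash a canonical JSON form of the manifest (excluding its own digest)."""
    body = {k: v for k, v in manifest.items() if k != "manifest_sha256"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def commit_snapshot(document: str, parts: list, committed_by: str,
                    when: Optional[datetime] = None) -> dict:
    """Lock down which version of each chunk went into a document, and who connected it."""
    when = when or datetime.now(timezone.utc)
    manifest = {
        "document": document,
        "committed_by": committed_by,
        "committed_at": when.isoformat(),
        "parts": [
            {"chunk_id": p["chunk_id"], "version": p["version"],
             "sha256": content_hash(p["text"]), "connected_by": p["connected_by"]}
            for p in parts
        ],
    }
    manifest["manifest_sha256"] = _manifest_digest(manifest)
    return manifest


def verify_snapshot(manifest: dict, parts: list) -> bool:
    """True if the manifest is untampered and the chunks match what was committed."""
    if manifest.get("manifest_sha256") != _manifest_digest(manifest):
        return False
    if len(parts) != len(manifest["parts"]):
        return False
    return all(entry["chunk_id"] == p["chunk_id"] and entry["sha256"] == content_hash(p["text"])
               for entry, p in zip(manifest["parts"], parts))


parts = [
    {"chunk_id": "intro", "version": "1.2", "text": "Background text.", "connected_by": "anna"},
    {"chunk_id": "threat-a", "version": "3.0", "text": "Assessment text.", "connected_by": "erik"},
]
snapshot = commit_snapshot("Situation report", parts, "lisa")
print(json.dumps(snapshot, indent=2))
```

Storing the manifest in the ECM repository next to the rendered binary would give both views: the single document file that was sent, and the chunk-level record of how it was put together.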

We will definitely look more deeply into Documentum XML Technologies to see if we can design an integrated solution on top of them. It looks promising, especially since an XProc pipeline for DITA is around the corner.
