
Reflections from Momentum 2011 Berlin

EMC (IIG) the company

  • A real tech company
  • Responsive employees
  • Easy to get access inside the company
  • Willing to share information
  • Sometimes hard to figure out “who is who” in EMC Information Intelligence Group (IIG)

As a customer it is important how the company feels. My experience is that EMC is a company where you can find tech-savvy people who really like what they are doing. And they are good at it. The general experience is that employees are interested in listening to us and very responsive to our needs. It is easy to quickly get access both to key business people and to people in engineering. On the other hand, that is often required because the product is quite complicated. On the negative side, the company is big, which means that things are not always coordinated, and it can sometimes be difficult to figure out who is who among all the different product managers, general managers, solutions directors and architects.

EMC IIG seems open and transparent to me. Sure there are disclaimers but they are talking openly about most things and there is no NDA at the conference.

 

Strategy

I feel a big difference this year – maybe because I have been away for over a year due to my year at the National Defense College. The big difference is that EMC Information Intelligence Group finally seems to get it. For real. Away from the idea that Case Management is something different than Enterprise Content Management. A realization that nice-looking, usable user interfaces are a key thing. Understanding that the cloud is a key component of EMC IIG's future. Communicating that configuration instead of coding is the real power of xCP, and not just for the interfaces but for the whole application. Finally working to get decent analytics to make use of the contextual information that already exists around objects in the repository. Somehow it feels like there is a new executive team in place that wants to be a little bit bolder and wants to move IIG in a certain direction.

EMC has made numerous acquisitions since it bought Documentum, but now it feels like they are finding out that they have lots of different pieces of technology within the company that together can form a bigger whole.

Working with EMC-owned VMware not only to provide certification for all Documentum components but also to leverage the power of their virtualization infrastructure, both to ease deployment and to enable efficient use of infrastructure.

Working with VMware-owned Socialcast to include activity streams in Documentum user interfaces.

Working with RSA to enhance the security features of the platform.

Working with Greenplum to power analytics but also to provide a new perspective on handling big data with smarts on top of it – big information.

 

Towards a unified client

  • Client situation is a mess today
  • C6 acquisition was a good move
  • A unified client is coming along
  • Wonderful to see the focus on iOS apps

The user interface of Documentum is frankly a mess nowadays. A result of too many teams working in their own bubble, creating user interfaces based on different customer groups. WDK-based Webtop with its DAM cousin. Taskspace, which is also WDK but gains some power from Forms Builder and xCP technologies. ExtJS-based Centerstage, which looks great but is a bit late and light in features. Feature-rich Media Workspace, which is based on Flex in a world where Adobe Flash is obviously losing traction and HTML5 is taking off. Steve Jobs really made a difference here, it turns out. On top of that, desktop applications for OS X and Windows as well as an Outlook client. It is not that I think there is no need for different clients. There is. Especially from a training perspective, where some companies require almost zero training whereas others can accept more extensive training.

The inclusion of C6 Technologies into Documentum is a welcome move and I heard lots of positive reactions to that. However, the key thing is that EMC IIG is now firmly committed to unifying all clients on one technology stack, which of course will focus a lot on configurability. So in the end it could very well mean that the number of clients will be much bigger, but they will just be different configurations based on very specific user needs. The unified client will most likely be based on C6 and ExtJS technologies, which means that Flex is going away quickly. So are WDK and Taskspace, but over a longer time frame. So think of D2 as a Webtop replacement and X3 as the new Centerstage with lots of widgets, including ones for rich media management. We will probably also see the C6 iPad client replace the existing Documentum client. Expect an iPhone client soon as well.

Speaking about iOS. To me it is almost like a new world compared to my first EMC World in 2007. Everybody at EMC was using Blackberries and Macs were hardly seen. Now the iPad app is out, Peggy talks about “everybody loves their iPads”, Macs are in booths and on stage, there are several Documentum apps and almost all contest prizes consist of iPads. Macgirlsweden is both happy and astonished at this development :)

 

Policy-based deployment with monitoring

Ok, so Documentum is not easy to deploy. It takes a while but as Jeroen put it: “You guys want to do complicated stuff!”. I think he is right and it might sometimes be a good thing since you have to stop and think (not like Sharepoint, which is way too easy to install in that sense). You choose Documentum because you have a complicated process to support, large amounts of content and an ECM vision. Still, agility really needs to be improved and that will also simplify deployment. So improvement is important for several reasons.

The first part of that is the xCelerated Management System, which in essence lets you describe and model your applications and your deployment needs. Tools then translate these policies onto your VMware-powered infrastructure and deploy the whole Documentum platform based on your needs, taking into account the number of users, the type of content, the type of processes and what kind of high availability demands you have. Finally, all of this is monitored using a combination of open-source Hyperic and the Integrien engine they got through an acquisition. Integrien now seems to have become VMware vCenter Operations. That architecture will in my opinion set EMC Documentum way ahead of its competitors, especially if it can provide some additional agility when the Next-Generation Information Server (NGIS) comes.
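
To make the idea of policy-based deployment a bit more concrete, here is a minimal Python sketch. It is not xMS and it does not use any EMC or VMware API. The `DeploymentPolicy` fields, the role names, the sizing thresholds and the `plan_deployment` function are all assumptions made up for illustration. It only shows the principle of turning a description of needs into a deployment plan.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class DeploymentPolicy:
    """Hypothetical description of what the application needs."""
    users: int
    rich_media: bool          # type of content
    process_heavy: bool       # type of processes
    high_availability: bool   # HA demand


def plan_deployment(policy: DeploymentPolicy) -> dict:
    """Translate a policy into a number of virtual machines per role.

    The thresholds (for example 1000 users per content server) are invented.
    """
    plan = {
        "content_server": max(1, ceil(policy.users / 1000)),
        "search": 1,
        "process_engine": 2 if policy.process_heavy else 1,
        "transformation": 1 if policy.rich_media else 0,
    }
    if policy.high_availability:
        # Assumed HA rule: at least two instances of every role that is in use.
        plan = {role: max(2, n) if n > 0 else 0 for role, n in plan.items()}
    return plan


policy = DeploymentPolicy(users=2500, rich_media=False,
                          process_heavy=True, high_availability=True)
print(plan_deployment(policy))
```

The point of the sketch is that the policy stays declarative: changing the number of users or the HA demand changes the plan without anyone editing installation steps by hand.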

 

Analytics and Search

  • xPlore is looking good
  • Thesaurus-support is a good thing
  • QBS is great
  • Custom-pipeline support based on UIMA is great

A dear subject of mine where EMC IIG finally seems to get their act together. They have their own search server called xPlore, which is based on open-source Lucene and their own powerful XML database xDB. A really smart move now that FAST, Autonomy and Endeca have been bought by the other IT giants.

xPlore 1.2 provides some really cool features, both in terms of baseline search capabilities like thesaurus support and in terms of more text-analytics-oriented features. The content processing pipeline now supports extensions based on UIMA, which opens up for connecting other entity extraction engines to xPlore. Another really cool feature is Query-Based Subscriptions, which really leverages the Documentum repository. Create a search query based on a combination of free text and metadata. Save it and set it up to run at different intervals and notify you of any new content that has been ingested. You can even use it to fire off a workflow in order to have somebody take action. Hopefully we will see some xCP integration in the xPlore 1.3 release, where the search experience and indexing are linked to the characteristics of the xCP Application Model.
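
The query-based subscription idea described above can be sketched in a few lines of Python. This is only an illustration of the principle and not how xPlore implements it: the `Document` and `Subscription` classes, the substring matching and the in-memory repository are all assumptions. A real system would run the saved query on a schedule and send a notification or start a workflow for each new hit.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict


@dataclass
class Subscription:
    """A saved query: free text plus metadata filters (hypothetical model)."""
    free_text: str
    metadata_filter: dict
    seen: set = field(default_factory=set)

    def matches(self, doc: Document) -> bool:
        text_ok = self.free_text.lower() in doc.text.lower()
        meta_ok = all(doc.metadata.get(key) == value
                      for key, value in self.metadata_filter.items())
        return text_ok and meta_ok

    def run(self, repository: list) -> list:
        """Return matching documents that no earlier run has reported."""
        new_hits = [doc for doc in repository
                    if doc.doc_id not in self.seen and self.matches(doc)]
        self.seen.update(doc.doc_id for doc in new_hits)
        return new_hits


repo = [Document("1", "Contract for radar maintenance", {"type": "contract"})]
sub = Subscription("radar", {"type": "contract"})
first = sub.run(repo)    # first scheduled run
repo.append(Document("2", "Radar upgrade contract", {"type": "contract"}))
repo.append(Document("3", "Radar memo", {"type": "memo"}))
second = sub.run(repo)   # later run only reports newly ingested matches
print([d.doc_id for d in first], [d.doc_id for d in second])
```

Keeping track of what has already been reported is what turns a saved search into a subscription: each run only surfaces content that is new since the last one.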

In his Innovation Speech the Chief Architect Jeroen van Rotterdam also showcased a modified Centerstage which used a recommendation engine based on a Hidden Markov model to suggest similar content to users based on similarity in context and similarity in content. A really powerful feature that makes EMC IIG live up to its name: Information Intelligence Group. Jeroen also mentioned that they are working on video and audio analytics including speech-to-text, which is then indexed into xPlore. That will most likely arrive in the iPad client first.

Another cool thing that is coming for the Content Intelligence Services (CIS) component is automated metadata extraction based on rules and taxonomy cold-start, which means that you could start generating a taxonomy based on your existing content.

Next-Generation Information Server (NGIS)

It seems that there has been a big investment in the xDB technology and therefore it is a key component in NGIS. No surprise there, since Jeroen is one of the founders of the company that EMC bought. That could also mean that future installations of Documentum will not require a traditional SQL RDBMS, which would not be such a bad thing. One less license and one less skill set to manage.

NGIS is being designed with both the cloud and “big information” in mind. The idea is to be able to use different datastores such as Atmos, Greenplum, Isilon etc. together with NGIS. I really like the term “big information”, which is a way to take what we now know as “big data” to the next level where it also covers unstructured data and documents. Since there is a wave of information coming over us now, it seems smart to design this for huge datasets from the beginning. After all, we need to manage it whether we like it or not. As Peter Hinssen put it at the final keynote: “It is not information overload – it is a filter failure”. We CAN handle vast amounts of data if we design the architecture right. Another interesting concept is to bring processing to the data (nodes) instead of what we do today, where we have a central processing node which we pipe all data through.

Everybody realises that the first releases of NGIS will not be feature-complete in comparison with Documentum Content Server, but I also wonder what the cloud focus really means for NGIS. I hope it means cloud as a technical concept and not only public cloud, meaning that NGIS would only be available for OnDemand at first. On the other hand, an early access program is now opening up and that will most likely be run on premise. NGIS will be important for Documentum to retain its position as the leader in ECM technology. In the light of the other innovation going on, it can be a bright future.
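
The concept mentioned above of bringing processing to the data nodes can be illustrated with a toy Python comparison. Nothing here is NGIS: the three nodes, their documents and the "count documents by first word" job are invented for the example. It simply contrasts shipping every document to a central node with running the job on each node and shipping only the partial results.

```python
from collections import Counter

# Three hypothetical data nodes, each holding its own documents.
nodes = {
    "node-a": ["invoice 2011", "contract draft", "invoice 2010"],
    "node-b": ["contract signed", "memo"],
    "node-c": ["invoice 2009", "memo", "memo"],
}


def first_word_counts(docs):
    """The 'processing': count documents by their first word."""
    return Counter(doc.split()[0] for doc in docs)


def central(nodes):
    """Pipe every document through one central processing node."""
    shipped = [doc for docs in nodes.values() for doc in docs]
    return first_word_counts(shipped), len(shipped)


def at_the_data(nodes):
    """Run the processing on each node and ship only the partial counts."""
    partials = [first_word_counts(docs) for docs in nodes.values()]
    total = sum(partials, Counter())
    shipped = sum(len(partial) for partial in partials)
    return total, shipped


central_result, central_shipped = central(nodes)
local_result, local_shipped = at_the_data(nodes)
print(central_result == local_result, central_shipped, local_shipped)
```

Both approaches give the same answer, but the second one moves small summaries instead of documents. With a handful of short strings the saving is tiny; with millions of large files it is the whole point.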

Cloud and EMC OnDemand

So now you can run a complete Documentum stack in the cloud. A great thing which I think will broaden the market a bit. It is much easier to get up and running, and you can focus on core ECM capabilities instead of installing a server OS and a DBMS and managing storage. A good thing is the ability to have extra power available if needed. Provisioning of a full platform is said to take 6-8 hours depending on configuration. Deployment will be in a vCube where all Documentum servers will be managed as images. Each customer gets its own vCube. It will be possible to run a vCube on premise, but that means that EMC still manages the configuration over the internet even though it is running on your hardware. There will be some limitations on the level of customization that you can do in order to have EMC take responsibility for the vCube. Remember that all server OS and DBMS licenses are included in the vCube. Altogether the cloud initiative is driving huge configuration and deployment improvements which all aspects of Documentum will gain from.

 

Venue and atmosphere

  • Keep working on the IIG and Documentum community feeling

Another Momentum conference has ended and it is time to reflect on our experiences from this event. This was my second European conference but I have attended four EMC World conferences. I keep hearing that they are different, and also stories from the old Momentum conferences before EMC acquired Documentum. During my first EMC World events I really felt that the Documentum community was lost among a wave of storage people roaming around. However, the Momentum brand has been strengthened and I believe the difference between the US and the European conference is much smaller now. I think the main difference is the crowd and the atmosphere. The locations in Europe are a bit smaller in scale and the event sites also physically look different. All in all, EMC IIG did a very good job organizing this event, with no visible friction from my point of view.

 

Practical things

  • More power outlets
  • Dedicated wifi in the keynote area (to allow use of Social media)
  • Set up a blogger’s lounge based on the EMC World concept

In general EMC created a very well organized event but there is some room for improvement anyway. One thing is the meals area. For some reason the Americans prefer round tables “en masse”, whereas this event was located in the ordinary breakfast restaurant in the hotel. Tables were straight ones with 2-8 seats each. To me that did not invite as many spontaneous lunch encounters as I experience at EMC World. People tend to stay in their small groups and eat in those as well.

Another recurring issue is of course the shortage of power outlets, which I found really strange at an IT conference, especially given EMC’s strong push for social media interactions. Even though iPads are much more common now (even at EMC events), I think the conference experience would be more productive with a decent number of outlets and a capable Wifi network. My best experience so far is still a Microsoft conference around FAST Search in Vegas where all 1200 participants had tables with outlets.

There was a social media center but I felt it was way too small compared to the spacious EMC World blogger’s lounge. There are still quite few people who are using social media during the conference and a good lounge would encourage interaction IRL between us. Consider creating badges where your Twitter name and blog address are printed.

 

Social events

  • Make them about networking
  • Make it possible to talk – have areas without very loud music
  • Make sure those with allergies can eat and eat safely.

First of all, I don’t drink alcohol at all. So in that sense I may not be representative of the group at large. Still, since this is a professional conference I do have some opinions about what utility these social events could have. Of course, it should be a more relaxed time and a possibility to have some fun. However, I do like to see these events as a very good opportunity for networking between all of us at the conference. Locating these events in nightclubs with very loud music is therefore not an ideal setting for networking. I think the EMC World social events in the US are better in that way. Spending the night in Universal Studios, for instance, was a very different experience from Ewerke in Berlin. Not just because there are terrific and fun rides there but also because there were lots of places to sit down, eat good food and talk a lot. I had a great evening there last time, talking a lot about the future of content analytics with EMC staff and customers. So at least provide areas where people can talk to each other. Make the events more of a continuation of the conference day. Make sure that it is in theme: any entertainment should have some connection to ECM. Maybe a stand-up around our community or a show with music with dedicated lyrics about us. Also, it would be great to have more non-alcoholic alternatives than orange juice, coke and Fanta. Also, I am allergic to nuts and I had a small incident where I accidentally ate something with nuts in it. Provide good information and possibly alternatives for those of us with allergies.

DISCLAIMER: All opinions here are my own and do not represent any official view of my employer. Information is based on notes and conversations and may contain errors.


EMC World 2010: Next-generation Search: Documentum Search Services

Presented by Aamir Farooq

Verity: Largest index 1 M Docs

FAST: Largest Index 200 M Docs

Today's challenging requirements all require tradeoffs. Instead of trying to plug in third-party search engines, EMC chose to build an integrated search engine for content and case management.

Flexible Scalability being promoted.

Tens to Hundreds of Millions of objects per host

Routing of indexing streams to different collections can be made.

Two instances can be up and running in less than 20 min!

Online backup and restore is possible with DSS, instead of offline only as with FAST.

FAST only supported Active/Active HA. DSS offers more options:

Active/Passive

Native security. Replicates ACL and Groups to DSS

All fulltext queries leverage native security

Efficient deep facet computation within DSS with security enforcement. Security in facets is vital.

Enables effective searches on large result sets (underprivileged users are not allowed to see most hits in the result set)

Without DSS, facets are computed over only the first 150 results pulled into client apps

100x more with DSS
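
Why it matters where facets are computed can be shown with a small Python sketch. This is not DSS code: the hit list, the group names and the `secure_facets` function are assumptions made up for illustration. The `limit` argument mimics a client that only pulls the first 150 results before counting, as described in the notes above.

```python
from collections import Counter

# Hypothetical result set: (document type, groups allowed to read it).
hits = (
    [("contract", {"legal"})] * 400
    + [("memo", {"legal", "staff"})] * 200
    + [("invoice", {"finance", "staff"})] * 50
)


def secure_facets(hits, user_groups, limit=None):
    """Count document types over the hits this user is allowed to read.

    limit=None means the engine counts over the whole result set;
    a number means only the first N visible results are counted.
    """
    visible = [doc_type for doc_type, acl in hits if acl & user_groups]
    if limit is not None:
        visible = visible[:limit]
    return Counter(visible)


staff = {"staff"}
print(secure_facets(hits, staff))             # facets over everything visible
print(secure_facets(hits, staff, limit=150))  # facets over first 150 only
```

In this made-up data the user never sees a count for contracts, because security is applied before counting, and the limited variant misses the invoice facet completely because all invoices sit beyond the first 150 visible hits.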

All metrics for all queries are saved and can be used in analytics. Run reports in the admin UI.

DSS Feature Comparison

DSS supports 150 formats (500 versions)

The only thing lacking now is Thesaurus (coming in v 1.2)

Native 64-bit support for Linux and Windows (core DSS is 64-bit)

Virtualisation support on VMware

Fulltext Roadmap

DSS 1.0 GA is compatible with D6.5 SP2 or later (integration with CS 1.1 for facets, native security and XQuery).

Documentum FAST is in maintenance mode.

D6.5 SP3, 6.6 and 6.7 will be the last releases that support FAST

From 2011 DSS will be the search solution for Documentum.

Index Agent Improvements

Guides you through reindexing or simply processing new indexing events.

Failure thresholds. Configure how many error messages you allow.

One Box Search: As you add more terms it is doing OR instead of AND between the terms

Wildcards are not allowed OOTB. It can be changed.
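
The difference between OR and AND in the one-box search noted above is easy to see in a tiny Python sketch. The three documents and the `search` function are invented for illustration and have nothing to do with how DSS parses queries internally.

```python
docs = {
    1: "documentum search services",
    2: "fast search",
    3: "documentum content server",
}


def search(query, docs, mode="OR"):
    """Return ids of documents matching any (OR) or all (AND) query terms."""
    terms = query.lower().split()
    combine = any if mode == "OR" else all
    return sorted(doc_id for doc_id, text in docs.items()
                  if combine(term in text.split() for term in terms))


print(search("documentum", docs))                # [1, 3]
print(search("documentum search", docs, "OR"))   # [1, 2, 3]
print(search("documentum search", docs, "AND"))  # [1]
```

With OR, adding a term widens the result set, so ranking has to do the work of putting the documents that match all terms at the top. With AND, adding a term narrows it.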

Recommendations for upgrade/migration

  • Commit to Migrate
  • No additional license costs – included in Content Server
  • Identify and Mitigate Risks
  • 6.5 SP2 or later supported
  • No change to DQL – XQuery available.
  • Points out that both xDB and Lucene are very mature projects
  • Plan and analyze your HA and DR requirements

Straight migration. Build indices while FAST is running. Switch from FAST to DSS when indexing is done. Does not require multiple Content Servers.

Formal Benchmarks

  • Over 30 M documents spread over 6 nodes
  • Single node with 17 million documents (over 300 GB index size)
  • Performance: 6 M Documents in FAST took two weeks. 30 M with DSS also took 2 weeks but with a lot of stops.
  • Around 42% faster for ingest for a single node compared to FAST
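
To put the two-week figures above in perspective, here is the back-of-the-envelope arithmetic in Python. It assumes that "two weeks" means 14 full days of wall-clock time, and it ignores the stops mentioned for the DSS run, so the DSS number is a lower bound on the real ingest rate. The session did not say what hardware each run used, so this is an overall comparison and not a per-node one.

```python
SECONDS_PER_DAY = 24 * 60 * 60
days = 14  # assumption: "two weeks" means 14 days of wall-clock time


def docs_per_second(documents, days):
    """Average ingest rate over the whole period."""
    return documents / (days * SECONDS_PER_DAY)


fast = docs_per_second(6_000_000, days)
dss = docs_per_second(30_000_000, days)
print(f"FAST: {fast:.1f} docs/s, DSS: {dss:.1f} docs/s, ratio: {dss / fast:.1f}x")
```

That works out to roughly 5 documents per second for FAST and roughly 25 for DSS, so about five times the volume in the same calendar time.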

The idea is to use XProc to do extra processing of the content as it comes into DSS.

Conclusion

This is a very welcome improvement for one of the few weak points in the Documentum platform. We were selected to be part of the beta program so I would now have loved to tell you how great an improvement it really is. However, we were forced to focus on other things in our SOA project first. Hopefully I will come back in a few weeks or so and tell you how great the beta is. We have an external Enterprise Search solution powered by Apache Solr and I often get the question whether DSS will make that unnecessary. For the near future I think it will not, and that is because the search experience is also about the GUI. We believe in multiple interfaces targeted at different business needs and roles, and our own Solr GUI has been configured to meet our needs from a browse and search perspective. From a Documentum perspective the only client today that will leverage the faceted navigation is Centerstage, and that is focused on asynchronous collaboration and is a key component in our thinking as well, but for different purposes. Also, even though DSS is based on two mature products (as I experienced at Lucene Eurocon this week), I think the capabilities to tweak and monitor the search experience will, at least initially, be much better in our external Solr than in the new DSS Admin Tool, although it seems like a great improvement from what the FAST solution offers today.

Another interesting development will be how the xDB inside DSS will relate to the “internal” XML Store in terms of integration. Initially they will be two servers but maybe in the future you can start doing things with them together. Especially if next-gen Documentum will replace the RDBMS, as Victor Spivak mentioned as a way forward.

In the end, having a fast search experience in Documentum from now on is so important!

Further reading

Be sure to also read the good summaries from Technology Services Group and Blue Fish Development Group about their take on DSS.
