EMC World 2010: Next-generation Search: Documentum Search Services

Presented by Aamir Farooq

Verity: Largest index 1 M docs

FAST: Largest index 200 M docs

Today’s requirements are challenging and all of them require tradeoffs. Instead of trying to plug in third-party search engines, EMC chose to build an integrated search engine for content and case management.

Flexible scalability is being promoted:

Tens to Hundreds of Millions of objects per host

Indexing streams can be routed to different collections.
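To illustrate what routing indexing streams means, here is a minimal Python sketch. It assumes routing is driven by a per-object attribute such as the object type. The `IndexEvent` class, the routing table and the collection names are all hypothetical and are not the DSS configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexEvent:
    object_id: str
    object_type: str


# Hypothetical routing table (object type -> collection); not DSS configuration syntax.
ROUTES: Dict[str, str] = {"contract": "legal", "invoice": "finance"}
DEFAULT_COLLECTION = "default"


def route(events: List[IndexEvent]) -> Dict[str, List[str]]:
    """Split one stream of indexing events into per-collection streams."""
    streams: Dict[str, List[str]] = {}
    for event in events:
        collection = ROUTES.get(event.object_type, DEFAULT_COLLECTION)
        streams.setdefault(collection, []).append(event.object_id)
    return streams


events = [IndexEvent("0901", "contract"), IndexEvent("0902", "invoice"), IndexEvent("0903", "memo")]
print(route(events))  # {'legal': ['0901'], 'finance': ['0902'], 'default': ['0903']}
```

Anything without an explicit rule falls through to a default collection, so nothing is silently dropped.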

Two instances can be up and running in less than 20 min!

DSS supports online backup and restore, whereas FAST only supported it offline.

FAST only supported Active/Active HA. DSS offers more options:

Active/Passive

Native security: ACLs and groups are replicated to DSS.

All fulltext queries leverage native security

Efficient deep facet computation within DSS with security enforcement. Security in facets is vital.

This enables effective searches on large result sets (where underprivileged users are not allowed to see most hits in the result set).

Without DSS, facets are computed over only the first 150 results pulled into client applications.

With DSS, facets can be computed over 100x more results.
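To make the difference concrete, here is a minimal Python sketch. It assumes each hit carries the set of groups allowed to read it. The data layout and function names are hypothetical. The sketch only contrasts two approaches: counting facets over a fixed window of raw hits that is filtered afterwards, versus counting them over the full, security-filtered result set.

```python
from collections import Counter


def client_side_facets(hits, user_groups, field, window=150):
    """Only the first `window` raw hits reach the client; security is applied afterwards."""
    visible = [h for h in hits[:window] if h["acl"] & user_groups]
    return Counter(h[field] for h in visible)


def server_side_facets(hits, user_groups, field):
    """Security is enforced on the whole result set before counting."""
    visible = [h for h in hits if h["acl"] & user_groups]
    return Counter(h[field] for h in visible)


# 900 hits readable only by "legal", followed by 100 readable by "public".
hits = [{"acl": {"legal"}, "format": "pdf"}] * 900 + [{"acl": {"public"}, "format": "docx"}] * 100
print(client_side_facets(hits, {"public"}, "format"))  # Counter()
print(server_side_facets(hits, {"public"}, "format"))  # Counter({'docx': 100})
```

Consider a user who only belongs to "public". The 150-hit window contains nothing that user may see, so the client-side facets come out empty even though 100 matching documents exist.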

Metrics for all queries are saved and can be used for analytics. Reports can be run in the admin UI.

DSS Feature Comparison

DSS supports 150 formats (500 versions)

The only thing lacking now is thesaurus support (coming in v1.2).

Native 64-bit support for Linux and Windows (core DSS is 64-bit)

Virtualisation support on VMware

Fulltext Roadmap

DSS 1.0 GA is compatible with D6.5 SP2 or later (integration with CS 1.1 for facets, native security and XQuery).

Documentum FAST is in maintenance mode.

D6.5 SP3, 6.6 and 6.7 will be the last releases that support FAST.

From 2011 DSS will be the search solution for Documentum.

Index Agent Improvements

Guides you through reindexing or simply processing new indexing events.

Failure thresholds: configure how many error messages you allow.
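A failure threshold can be pictured with the minimal Python sketch below. It assumes the agent counts failed items and aborts once the count exceeds a configured limit. `process_queue`, `ThresholdExceeded`, `max_failures` and `flaky_index` are hypothetical names, not actual Index Agent settings.

```python
class ThresholdExceeded(Exception):
    """Raised when more items fail than the configured threshold allows."""


def process_queue(items, index_fn, max_failures):
    """Index items one by one, tolerating up to `max_failures` errors."""
    indexed, failures = 0, []
    for item in items:
        try:
            index_fn(item)
            indexed += 1
        except Exception as exc:  # record the error and keep going until the threshold is hit
            failures.append((item, str(exc)))
            if len(failures) > max_failures:
                raise ThresholdExceeded(
                    f"{len(failures)} failures, threshold is {max_failures}"
                ) from exc
    return indexed, failures


def flaky_index(item):
    # Hypothetical indexer that fails on every third item.
    if item % 3 == 0:
        raise ValueError(f"cannot index {item}")


print(process_queue(range(10), flaky_index, max_failures=5)[0])  # 6
```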

One Box Search: as you add more terms, it combines them with OR instead of AND.

Wildcards are not allowed out of the box (OOTB), but this can be changed.
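The effect of OR versus AND, and of the wildcard setting, can be sketched in a few lines of Python. This is a toy in-memory matcher that assumes lowercase whitespace tokenisation and uses the standard library’s `fnmatch` for wildcard expansion. `one_box_search` and its parameters are hypothetical and say nothing about how DSS actually parses queries.

```python
from fnmatch import fnmatchcase


def _term_matches(term, words, allow_wildcards):
    """Exact word match, or shell-style wildcard match when wildcards are enabled."""
    if allow_wildcards:
        return any(fnmatchcase(word, term) for word in words)
    return term in words


def one_box_search(docs, query, operator="OR", allow_wildcards=False):
    """Return sorted ids of docs matching the query terms, combined with OR or AND."""
    terms = query.lower().split()
    if not allow_wildcards and any(ch in term for term in terms for ch in "*?"):
        raise ValueError("wildcards are disabled in this configuration")
    combine = any if operator == "OR" else all
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if combine(_term_matches(t, text.lower().split(), allow_wildcards) for t in terms)
    )


docs = {"a": "search services overview", "b": "content server install", "c": "search content index"}
print(one_box_search(docs, "search"))                           # ['a', 'c']
print(one_box_search(docs, "search content"))                   # ['a', 'b', 'c']
print(one_box_search(docs, "search content", operator="AND"))   # ['c']
print(one_box_search(docs, "serv*", allow_wildcards=True))      # ['a', 'b']
```

With OR, adding a term widens the result set instead of narrowing it, which is the behaviour described above.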

Recommendations for upgrade/migration

  • Commit to Migrate
  • No additional license costs – included in Content Server
  • Identify and Mitigate Risks
  • 6.5 SP2 or later supported
  • No change to DQL – XQuery available.
  • Both xDB and Lucene are very mature projects
  • Plan and analyze your HA and DR requirements

Straight migration: build indices while FAST is running, then switch from FAST to DSS when indexing is done. This does not require multiple Content Servers.

Formal Benchmarks

  • Over 30 M documents spread over 6 nodes
  • Single node with 17 million documents (over 300 GB index size)
  • Performance: 6 M documents in FAST took two weeks. 30 M documents with DSS also took two weeks, but with a lot of stops.
  • Around 42% faster for ingest for a single node compared to FAST

The idea is to use XProc to do extra processing of the content as it comes into DSS.

Conclusion

This is a very welcome improvement to one of the few weak points in the Documentum platform. We were selected to be part of the beta program, so I would have loved to tell you now how great an improvement it really is. However, we were forced to focus on other things in our SOA project first. Hopefully I will come back in a few weeks or so and tell you how great the beta is.

We have an external Enterprise Search solution powered by Apache Solr, and I often get asked whether DSS will make it unnecessary. For the near future I think it will not, because the search experience is also about the GUI. We believe in multiple interfaces targeted at different business needs and roles, and our own Solr GUI has been configured to meet our needs from a browse and search perspective. From a Documentum perspective, the only client today that will leverage the faceted navigation is CenterStage. It is focused on asynchronous collaboration and is a key component in our thinking as well, but for different purposes.

Also, even though DSS is based on two mature products (as I experienced at Lucene Eurocon this week), I think the capabilities to tweak and monitor the search experience will, at least initially, be much better in our external Solr than in the new DSS Admin Tool, although the latter seems like a great improvement over what the FAST solution offers today.

Another interesting development will be how the xDB inside DSS will relate to the “internal” XML Store in terms of integration. Initially they will be two separate servers, but maybe in the future you can start doing things with them together, especially if next-generation Documentum replaces the RDBMS, as Victor Spivak mentioned as a way forward.

In the end, having a fast search experience in Documentum from now on is so important!

Further reading

Be sure to also read the good summaries from Technology Services Group and Blue Fish Development Group of their take on DSS.


Where the FAST Enterprise Search Platform (ESP) is going now…

I have spent the last week in Las Vegas attending the FAST Forward 09 conference. About a year ago the Norwegian company FAST Search & Transfer was acquired by Microsoft, and customers all over the world, including me, have wondered what would happen. Some thought it was great to have a huge company with its R&D resources take the platform forward, while others like me feared a technology transition that would include cancelling support for other operating systems and integration with nothing but Microsoft technology.

It was very clear that the Microsoft marketing department had a lot to say about the conference and what messages were to be conveyed. Somewhere behind all that you could still see some of the old FAST mentality, but it was really toned down. To me the conference was about convincing existing customers that MS is committed to Enterprise Search and giving Sharepoint customers some idea of what Enterprise Search is all about.

It is clear that the product line is diversifying, following a common Microsoft strategy:

Solutions for Internet Business

  • FAST Search for Internet Business
  • FAST Search for Sharepoint Internet sites
  • FAST AdMomentum

Solutions for Business Productivity

  • FAST Search for Sharepoint
  • FAST Search for Internal Application

FAST Search for Sharepoint won’t be available until Office Wave 14 (including Sharepoint) is released, so in the meantime there will be a product called FAST ESP for Sharepoint that can be used today and will have a license migration path towards FAST Search for Sharepoint. That product will have a license cost of around 25 000 USD, with additional Client Access Licenses (CALs) following in the standard MS manner.

So what does all of this mean for those of us who would like to see FAST ESP continue as an enterprise component in a heterogeneous environment? Well, MS has committed to 10 years of support for current customers, I guess as a gesture towards those who are worried. Over and over again I heard representatives talking about how important those high-end installations on other operating systems are. The same message appeared when it came to connectors and integration with Enterprise Content Management systems like EMC Documentum. Still, most if not all demos were connected to Sharepoint and/or other MS-specific technologies.

The technical roadmap means that the past year has been devoted to rewriting their next-generation search platform from Java to .Net. The first product to be released is the Content Integration Studio (CIS), which consists of a Visual Studio component (I guess earlier in Eclipse) and a server-side execution engine. It will only be available on Windows since it is deeply tied to the .Net environment. It looks like a promising product, with support for flows instead of a linear pipeline to handle the processing of information before it is handed off to the index engine. CIS therefore sits in front of FAST ESP, and a combination of actions in flows and in old pipelines can be executed. Information from CIS is written to ESP, which then creates the index and also processes queries against it.
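The difference between a linear pipeline and a flow can be sketched as follows. This is a minimal Python illustration under my own assumptions, not the CIS programming model. In a pipeline every document passes through every stage, whereas in a flow a router chooses the next step for each document. All names (`run_pipeline`, `run_flow`, `detect`, `extract_pdf`, `tag`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Doc = Dict[str, object]
Step = Callable[[Doc], Doc]
Router = Callable[[Doc], Optional[str]]


def run_pipeline(doc: Doc, stages: List[Step]) -> Doc:
    """Linear pipeline: every document goes through every stage, in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


def run_flow(doc: Doc, nodes: Dict[str, Tuple[Step, Router]], start: str) -> Doc:
    """Flow: after each step a router picks the next node, or None to stop."""
    current: Optional[str] = start
    while current is not None:
        step, router = nodes[current]
        doc = step(doc)
        current = router(doc)
    return doc


# Hypothetical steps: detect the format, extract text only for PDFs, then tag.
def detect(doc: Doc) -> Doc:
    return {**doc, "kind": "pdf" if str(doc["name"]).endswith(".pdf") else "other"}


def extract_pdf(doc: Doc) -> Doc:
    return {**doc, "text": f"text of {doc['name']}"}


def tag(doc: Doc) -> Doc:
    return {**doc, "tagged": True}


nodes = {
    "detect": (detect, lambda d: "extract_pdf" if d["kind"] == "pdf" else "tag"),
    "extract_pdf": (extract_pdf, lambda d: "tag"),
    "tag": (tag, lambda d: None),
}

print(run_flow({"name": "a.pdf"}, nodes, "detect"))
print(run_flow({"name": "b.txt"}, nodes, "detect"))
print(run_pipeline({"name": "b.txt"}, [detect, extract_pdf, tag]))
```

In the flow, the .txt document skips PDF extraction. The linear pipeline runs every stage regardless, so branching has to be handled inside the stages themselves.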

What I think we can expect is that new innovation will focus on creating a modular architecture, with CIS as the first module. Features in ESP will then be gradually re-engineered in a .Net environment, thus creating a common search platform some years into the future. It will likely mean that we will still see one or two upgrades to the core ESP as we know it today, to enable it to work together with the new components. Content Fusion will most likely be the next module that extends ESP, but on a .Net architecture.

When it comes to the presentation logic, where we today have the FAST Search Front-End (SFE), we will see it either as Web Parts for Sharepoint or as AJAX Aerogel from MS. These are currently developed using JavaScript but will include Silverlight later on.

These will initially be offered in both an IIS and a Tomcat flavour, and possibly others if there is demand. They will initially be integrated with ESP and Unity, thus opening up a new approach to developing a search experience on top of them.

In general I don’t like the Microsoft approach of insisting on owning the whole technology stack and refusing to invest in other standards-based projects. Instead of developing their own AJAX libraries they could have used ExtJS or even Google Web Toolkit. While it is not open source, MS argues that it is released under a very permissive licence from MS that has many of the same qualities. A good thing is that MS was committed to making sure that this framework works on all major browsers, including Firefox, Safari and Chrome. It is also interoperable with jQuery.

In summary I think it is a mixed experience. The new features being developed are truly needed for FAST to remain one of the most advanced search engines available. I think many of the features look really promising and I can’t wait to get my hands on them. On the other hand, it is clear that things are going proprietary (FAST ESP had a lot of open source in it) and being aligned with a Microsoft stack, thus gradually narrowing the options. That includes how new technologies are implemented (MS ones instead of open source), which operating systems it will run on, and what the support for developing presentation logic looks like. It means I will need people who know both Java and .Net, both Flash and Silverlight (possibly JavaFX), and both ExtJS/GWT and MS AJAX/Aerogel.

We are deeply invested in the EMC Documentum platform and would of course like to continue to use ESP as a way to add advanced capabilities and performance to our architecture. However, I think I will over time get sick and tired of Microsoft sales people trying to convince me to use Sharepoint instead of Documentum. For anybody who knows how both platforms work it is almost a joke, but I will most likely have to keep explaining and explaining. I just hope that we can get a decent connector developed for Documentum.

To read more, go to the FAST Forward Blog, which has many interviews, look at the videos in the Microsoft Press Room and check out the chatter in ffc09-tagged tweets on Twitter. And finally, here is what CMS Watch has to say about it.
