Category: Search Technologies

Other EMC World 2008 references

I love air conditioning systems, but they once again seem to have given me a cold, so I am sneezing and coughing all the time and take paracetamol (Alvedon in Swedish) to get along. It is a bit sad since I just love being here at the conference.

Another reflection is that conferences like this involve a lot of walking. During these three days I have walked at least 5 km a day. That is actually more than I walk on a normal workday 🙂

I met Laurence from Word of Pie the other day. He has been writing excellent notes from other sessions that I recommend reading:

EMC World 2008: ECM Shared Services in the real world
Random thoughts and Keynote
EMC World 2008: Documentum Performance, Scalability, and Sizing – Part 2
EMC World 2008: Introduction to EMC’s Next-Generation Knowledge Worker Client
EMC World 2008: Web 2.0 and Interactive Content Management
EMC World 2008: Documentum Foundation Services (DFS) – Best Practices and Real World Examples
EMC World 2008: Social Computing Meets R&D
Thoughts on EMC World 2008 and the ECM Professional
EMC World 2008: Best Practices for Designing and Deploying an Enterprise Document Capture Solution
EMC World 2008: Documentum Architecture Deep Dive

EMC World 2008: D6 Webtop – Focus on Knowledge Workers

Presented by Peggy Ringhausen, Principal Product Manager, during EMC World 2008

Different areas:
– Simple to use (any content type)
– Searchable (flexible, federated, consolidated results)
– Collaborative (Team-oriented, extended enterprise, secure)
– Agile (available anywhere, contextual, integrated)

Presets in D6 – configuration – even more in D7
Better preferences
Improved saved searches and search templates
WebTop 6.5 in late July

She talked about features already available.
Subscribe other people to content
Preferences are persistent and are no longer stored in cookies on the client.

Presets allow you to pick a target and set rules.
For example, select a folder and set it up to allow only certain object types to be created in that folder.
Allow only certain actions in a certain folder.

Extended Search is an optional add-on to WebTop that creates clusters.
Clusters can be created based on certain attributes.
Search templates are also part of Extended Search. They allow some of the values to be optional and some to be fixed.

Documentum Collaboration Environment (DCE) is now bundled with WebTop. A license key is still required, though.
Data Tables are also available through DCE, and 6.5 also allows attachments to data tables.
Calendar events can be imported from iCal exports.

D6 SP1 – OLE link support is optional.
The feature checks whether there are linked objects, imports those documents as well and creates a virtual document out of all these items. The same thing happens during exports.
A checkbox on the import screen lets you also act on linked documents.

WebTop D6.5

Email conversion to EMF format, converting everything to the same parent object.
Conversion tools convert all emails to the subtype dm_message_archive, using a lightweight HTML format called EMC Email Format. An HTML-based viewer shows emails and attachments without having to export them.

Page Refresh Reduction – page refreshes have been reduced as much as possible. A really good thing, I think.

Modal dialogs (which can be turned off) bring up small new browser windows so you can still see the context. Properties, for example, get their own window instead of the usual Documentum screens that fill the whole browser window.

Multi-select Drag and Drop is supported.

HTTP or UCF Choice enhancement.

Deep Export. This is great and I can’t really understand why it took them so long.

Content Transfer Improvement (multithreaded streaming)
New import screen: a small window with a green progress bar instead of the old white screen, which is a great user interface improvement. That screen had a tendency to scare people a little bit.

Security testing has been extended. They promised no level 1 or level 2 issues…

They take the WDK components, pull them into a new container and give users a new UI.
The new WebTop UI is optional.
It looks a little like Outlook, with collapsible bars on the left and no huge tree structure any more.
The tabs I saw were:
– Search center
– Subscriptions
– Home Cabinet

User-configurable home page, with a new column on the right for properties, versions and comments. A great improvement, since it almost provides a portal page within WebTop.

Contextual right click menus.

Offline client – My Documentum Offline (an OEM product), available at the end of July or the beginning of August.
Part of the D6 SP1 release and free of charge for any WebTop user.
– The My Documentum Folder (provides access to the latest versions of documents when not connected)
– Synchronize (Choose specific documents, folders, and subscriptions)
– Personalize (Tailor to suit individual needs)
– Resolve issues (Mechanism to resolve conflicts that arise during synchronization)
The offline client uses a small Jet database on the client to hold the metadata.

There is still a bit of an overlap between File Sharing Services (FSS) and the offline client, though. They will most likely be merged in some way in the D7 timeframe.

I learned later on that all these new features will be available in DAM 6.5 as well, since DAM is just an extension of WebTop.

The vision for the Modern Knowledge Worker

Presented during EMC World 2008 by: John McCormick, GM Knowledge Worker Business Unit

Today’s Problem: An Information Explosion
The nature of content management has changed: IM, email, videos etc.
Spend so much time finding things…

Who is the Knowledge Worker?
– Work in different or remote locations
– Create and work with a variety of different content
– Engage in dynamic work processes that change frequently
– Work in teams to get their job done
– Need access to managed content in their everyday applications

A wide variety of interfaces for people to work from: spaces, wikis, PowerPoint, Outlook etc.

KW Challenges
– Proliferation of information silos
– Work in many dispersed teams
– Finding the right information
– Seeing the relationships between types of information
– Organizing and sharing information
– Ensuring information is always accurate
– Adhering to IT-requirements for compliance and governance

IT Challenges
– Volume, both in amount and in types of information
– Users are always connected – how do you do maintenance and upgrades?
– Empowerment – users expect on-the-fly customization capabilities
– Control – Corporate regulatory concerns

Traditional KW Solutions
– Create silos of information
– Same info stored in multiple places
– Multiple search engines & queries
– Users reinvent the wheel out of frustration
– Create solutions
– Complex interfaces are more than what users require
– No immediate access from remote locations
– Rely on yesterday’s technology (email, shared drives)

Four pillars of KW
– The Platform for Web 2.0
– Web 2.0 client
– Intelligence from information
– Access anywhere

The Platform for Web 2.0

Wikis, blogs, RSS managed as objects
Everything exposed as web services
Can be leveraged in any UI (purpose-built, partner, portal etc.)

Enterprise Scale
Built on a repository that can scale to billions
Wide array of platform services
Can interoperate with other CMA solutions like TCM
Any object can be retained, made a record, archived, published

Services-enabled
Available to .NET and open environments
From a very chatty API to a less chatty SOA-based interface
Support a wide array of dispersed networks through BOCS
Extensible services for added functionality

Vision for Enterprise CM with Web 2.0
– Author & Publish (Blogger, YouTube, Wikipedia, Flickr) – Ratings on content, IRM security, team wikis, collaboration
– Organize & Manage (Digg, del.icio.us) – Guided navigation, tagging of items, classification, personalized views
– Network & Access (LinkedIn, Facebook, MySpace, iPhones) – Enterprise ready, secured off-network, mobile access, scalable infrastructure, retention & governance

Pre-configured:
– Object models
– Taxonomies
– Business Processes
– User Experiences
– Retention Policies

BOCS makes sense for the KW platform since it eases the user experience
BPM
Retention Policy Services
De-duplication
Archive

Web 2.0 Client

Personalized
– Simple to configure
– Include information to you
– Easy to use interface

Team thru Enterprise (Scale)
– Customizable team workspaces and templates improve efficiency
– User Management of Communities
– Ability to locate experts within an organization

Extendable
– Ability to mashup external information sources
– Components can be extended & created by partners and customers

Magellan Essentials
– No cost client
– Team workspaces
– Access control
– Library Services
– Guided navigation
– Content Templates
– Lifecycles

Full client
– Low cost client
– Wikis, Blogs & RSS
– Extranet Support
– Personal Spaces (Team Members)
– Tagging (Tag clouds)
– Federated Search
– Visualization
– Workflow

Multiple Patterns of Collaboration Supported

– Org/LOB/Departmental
– Team & Project Oriented
– Individual (Ideation)

Information Intelligence

– Expansive Search
Either through the EMC UI or your own UI for ECIS
– Analyze & Classify (spot key concepts, detect relationships across information assets)
– Visualize (timeframe etc.)

Smart Searching
– Indexing (real-time search results)
– Security (even for outside sources)
– Scalability

Tagging
– User created
– Folksonomies
– Change over time
Rule-based classification
– Metadata qualifiers
– Confidence weights
Semantics-based classification
– Better linguistics support
– Derives the “gist” of the document
– Monitors designated information sources and provides updates
– Extracts key insights from text based on linguistics and indexing technologies

Visualization (Recommended, Indexed, Personalized, Aggregated, Guided)
– Expertise Location
– Mash-ups (google maps)
– Personalized Navigations
– Tag clouds

Work Wherever
– Offline support (synched to My Documents folder)
Use familiar tools
– MS Office, Adobe Creative Suite and beyond

Bring ECM to the desktop
– get control of email and files on the desktop
– Enforce corporate policies

He used iPod/iTunes metaphors to explain how the mobile client works 🙂

EMC World 2008 Day 1 Part One

The conference has started and I was up early to go to the first seminar, which was called Effective Classification – From data to information. It was mainly focused on EMC’s storage products and on low-level classification and metadata management. There was more or less nothing about Documentum during the first 20 minutes, so I got bored and went over to Introduction to Transactional Content Management instead. That was much better: an overview of EMC’s offerings around BPM and content management. It was of course focused on the traditional examples like claims management and integration with document capture products like Captiva. However, I am very much interested in TaskSpace, which provides a good streamlined interface for workflows with inline preview of associated objects. I also had no idea that there was a solution for direct-attached scanners using a web client.

After that I had a meeting with David LeStrat and Gideon Ansell. David is PM for the new Magellan client and Gideon works with usability issues in the Documentum product line. We met last year, when we provided some info about our network visualization technologies. We presented our project and what we are trying to achieve, and talked about how influences from the Web 2.0 movement could be used in the Documentum platform. Personalization is important, and so is the personal page, which actually makes the user a node in the system. The current WebTop/DAM clients offer a good interface for interacting with content objects, but it is still mainly built around a folder structure. There is no place to enter my personal details and my skills. Magellan will offer that and in a sense provide the first step towards a community. I also mentioned the need to integrate Magellan with an instant messaging solution. Finally we talked a little about our aspiration to provide a GIS-oriented interface for consuming objects with geocoordinates. Currently this is just a mashup based on Google Maps, but we of course need a solution that works without internet access. See upcoming posts for more information about Magellan and the other new interfaces.

Multiple perspectives on stored content, please!

I just read a post on Chuck’s blog about the need for taxonomies, and I felt a need to discuss how I view taxonomies and why they still seem relevant to me. In my experience there are no silver bullets, and the most important thing is to respect the need for multiplicity and multiple perspectives on things.

In this case I actually see no contradiction between folksonomies and corporate taxonomies. We want to create context for our content, right? And we most likely need users to do some part of that, since they know the subject best. So we need to motivate them to provide context on top of what the technical platform can provide automatically.

That also means there is no need to choose between the two approaches. They both just provide a metadata layer on top of the content, right? And as someone said, you can never really have too much metadata. Sometimes we will have more and sometimes less, depending on a lot of factors.

Folksonomies can be analysed and used to fuel taxonomy development. And taxonomies will most likely either inspire or deter people from certain ways of tagging.

The key is that it does not matter HOW we provide context. The payback comes when we consume it. Then all these different context layers give us the means to provide many different views of the same information. Tags will provide one, taxonomies in metadata a second and relationships between objects a third.
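To make the idea of one metadata layer with many views more concrete, here is a minimal Python sketch. It assumes a hypothetical `ContentObject` that carries user tags (the folksonomy), a taxonomy category and relationships to other objects. It is not a Documentum API, just an illustration of deriving three different views from the same stored context.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """Hypothetical repository object carrying three independent context layers."""
    object_id: str
    title: str
    tags: set[str] = field(default_factory=set)      # folksonomy: free user tags
    category: str = "Uncategorized"                   # corporate taxonomy node
    related: set[str] = field(default_factory=set)   # ids of related objects


def view_by_tag(objects: list[ContentObject]) -> dict[str, list[str]]:
    """Tag-cloud style view: tag -> object ids, in repository order."""
    view: dict[str, list[str]] = defaultdict(list)
    for obj in objects:
        for tag in sorted(obj.tags):
            view[tag].append(obj.object_id)
    return dict(view)


def view_by_category(objects: list[ContentObject]) -> dict[str, list[str]]:
    """Taxonomy view: category -> object ids, in repository order."""
    view: dict[str, list[str]] = defaultdict(list)
    for obj in objects:
        view[obj.category].append(obj.object_id)
    return dict(view)


def view_by_relationship(objects: list[ContentObject], start_id: str) -> set[str]:
    """Relationship view: every id reachable from start_id (breadth-first)."""
    by_id = {obj.object_id: obj for obj in objects}
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = by_id.get(queue.popleft())
        if current is None:  # related id points outside this object set
            continue
        for related_id in current.related:
            if related_id not in seen:
                seen.add(related_id)
                queue.append(related_id)
    return seen - {start_id}
```

The three functions never change the stored objects. Each one only reads a different context layer, which is the point: store the context once and build as many consumer views on top of it as needed.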

The more I read about this subject, the more I think a good way to solve this is to build your infrastructure around Documentum repositories and then provide a multitude of different interfaces on top of that, maybe in the form of wikis, blogs, search, GIS, timelines and others. The key is to store stuff the right way. Only then can we create cool and usable interfaces for consuming it.

Data-deduplication is so cool!

I just read a blog post by Chuck Hollis at EMC about data deduplication and remembered how important that is to an EMC platform. However, to me it is not just a way to improve or optimize storage; it is a critical part of any information architecture.

Data deduplication should be a core feature in platforms such as EMC Documentum, which would bring the use of this feature up to the business side of things. Everything stored in Documentum is an object, which may or may not have an attachment in the form of a document, and that object can be exposed to users in one or many folders. The key thing is that these linkages open up interesting ways of using data deduplication.

Imagine a corporate environment with thousands of users. A lot of important documents in the company will be used many times by many people in different contexts. Many of them, such as corporate strategy and marketing documents, are likely used as references in different projects, so they are essentially read-only. However, since a lot of users need these exact documents, they will be imported into the repository many times. That does not only take up unnecessary space but also creates problems when these documents are updated. People will ask “hey, are all of these documents I found the same version?”

So my solution would be a job running on import that tells the user that this particular document is already available and asks whether they want to use the existing one instead. If they do, a link to that document is created in their folder or project space.
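Below is a minimal sketch of that import-time check, assuming a toy in-memory repository. Content is fingerprinted with SHA-256. If the fingerprint is already known, the user is asked (through a callback) whether to link the existing object into their folder instead of storing another copy. `Repository`, `import_document` and the `use_existing` callback are hypothetical names for illustration, not Documentum calls.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def content_hash(data: bytes) -> str:
    """Fingerprint of the raw content; identical bytes give identical hashes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Repository:
    """Toy in-memory repository used only to illustrate the idea."""
    objects: dict[str, bytes] = field(default_factory=dict)             # object id -> content
    by_hash: dict[str, str] = field(default_factory=dict)               # hash -> first object id
    folder_links: dict[str, list[str]] = field(default_factory=dict)    # folder -> object ids
    _next_id: int = 0

    def import_document(self, folder: str, data: bytes,
                        use_existing: Callable[[str], bool]) -> str:
        """Import data into folder, offering to reuse an identical existing object."""
        digest = content_hash(data)
        existing = self.by_hash.get(digest)
        if existing is not None and use_existing(existing):
            # Link the existing object into the folder instead of storing a copy
            self.folder_links.setdefault(folder, []).append(existing)
            return existing
        # New content, or the user declined: store a separate object
        self._next_id += 1
        object_id = f"obj{self._next_id:04d}"
        self.objects[object_id] = data
        self.by_hash.setdefault(digest, object_id)
        self.folder_links.setdefault(folder, []).append(object_id)
        return object_id
```

In a real client the callback would be the dialog that asks the user. Here any function that takes the existing object id and returns True or False will do, for example `lambda existing_id: True` to always reuse.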

We can also continuously run a job that reports on the current status of the repository, to see how many duplicates we have and what kind of content is duplicated the most. Documentum Reporting Services could be used to do that, for instance. If we have a proactive Knowledge Management function, it can either consolidate duplicates directly or create tasks asking users whether they agree to deduplicate some of their content. However, we need to push hard for someone to create a really cool and usable interface for managing these “content conflicts”.
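Such a reporting job could start from something as simple as grouping objects by content hash. The sketch below is a hypothetical stand-alone function, not Documentum Reporting Services. It lists groups of byte-identical objects with the most-duplicated first, along with the number of bytes that consolidating each group would free.

```python
import hashlib
from collections import defaultdict


def duplicate_report(objects: dict[str, bytes]) -> list[tuple[str, list[str], int]]:
    """Group byte-identical objects, most-duplicated first.

    Returns one (sha256 hex digest, sorted object ids, redundant bytes) tuple
    for every group that has two or more members.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    sizes: dict[str, int] = {}
    for object_id, data in objects.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(object_id)
        sizes[digest] = len(data)
    report = [
        # Keeping one copy per group frees (copies - 1) * size bytes
        (digest, sorted(ids), sizes[digest] * (len(ids) - 1))
        for digest, ids in groups.items()
        if len(ids) > 1
    ]
    # Largest groups first; ties broken by first object id for a stable order
    report.sort(key=lambda row: (-len(row[1]), row[1][0]))
    return report
```

The output could feed the consolidation tasks mentioned above. Each group is one candidate “content conflict” for a user or the KM function to resolve.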

This will help companies manage vital documents and reduce the confusion about which document is the correct and updated one.

From a technology standpoint, the first step would be to use a simple hash function to find exact duplicates. The next step should be to use the vector-based indexing technology found in both Autonomy and FAST ESP to also detect similarity levels, and possibly use that for further refinement of similar content. That way we can de-duplicate the same content found in different formats and have the option of removing one of them, or maybe just making one a rendition of the other.
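The sketch below illustrates the two-step idea. It first checks for exact duplicates with a hash, then scores the remaining pairs with cosine similarity over simple term-frequency vectors. The plain bag-of-words scoring is a stand-in chosen for illustration; it is not how Autonomy or FAST ESP implement their vector-based indexing. The sketch assumes text has already been extracted from each format, and the 0.9 threshold is an arbitrary assumption.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(text: str) -> Counter:
    """Lower-cased term frequencies; punctuation is ignored."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_duplicates(texts: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, str, float]]:
    """Return (id_a, id_b, kind, score) for exact and near-duplicate pairs."""
    hashes = {doc_id: hashlib.sha256(text.encode("utf-8")).hexdigest()
              for doc_id, text in texts.items()}
    vectors = {doc_id: term_vector(text) for doc_id, text in texts.items()}
    results = []
    # Pairwise comparison is O(n^2); a real search index avoids this
    for id_a, id_b in combinations(sorted(texts), 2):
        if hashes[id_a] == hashes[id_b]:
            results.append((id_a, id_b, "exact", 1.0))      # step 1: exact duplicate
            continue
        score = cosine(vectors[id_a], vectors[id_b])         # step 2: similarity level
        if score >= threshold:
            results.append((id_a, id_b, "similar", round(score, 3)))
    return results
```

Two extracted texts that differ only in punctuation fail the hash check but still score as near duplicates. Such a pair is a candidate for keeping one file and turning the other into its rendition.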

Figuring out the collaboration tools in our intranet toolbox

I see myself as a translator between people speaking military and information technology languages. These two groups come from different worlds and have very different views of the world. The military people like to speak about requirements from a very abstract standpoint and think the details should be worked out by someone else. They rarely have any knowledge of what the market can offer, so when they are invited to a technical demo of some kind they usually approve it if it looks somewhat useful. The IT people, on the other hand, have a tendency to strive for cheap, simple and safe solutions, which will get adopted if some people in uniform accept them. So what does that leave us with?

Well, it means that if a military officer sees a need to collaborate around a text and is shown a wiki, they almost immediately embrace it, since it does not really matter whether it is a good solution. It is far better than anything they have today. That is why I think we need to break down our needs into a basic set of tools and define what makes them differ. These tools need to be mutually exclusive:

  • Asynchronous messaging (leave a message that someone picks up later)
  • Real-time communication (text, audio and video chat and virtual meeting rooms)
  • Presence (know what other people are doing right now)
  • Real-time collaboration on content (Multiple people writing in the same document at the same time)
  • Asynchronous collaboration on content (Multiple people writing at different times in the same document)
  • And then you could add the spatial dimension to this. It is way different to collaborate if you are at the same place or not.

So that means that a forum is basically a thread-based structuring of asynchronous messaging, while a blog is an individually based contribution of content that can be asynchronously referenced and commented on. An application such as CoWord, on the other hand, does real-time collaboration on content. The point of all this is to have a common set of references when evaluating different products and their potential use in the organisation. There is a huge difference between the way a real-time messaging tool can be used and the way an asynchronous tool that lets you leave a message can be used. Using a wiki in a live virtual meeting is therefore not the best solution.

A wiki, on the other hand, is a good example of asynchronous collaboration on content. So are most features found in enterprise content management systems such as Documentum and Alfresco.

Real-time collaborative editors have not really been a success yet, even though there seems to be an obvious need for them in today’s connected world. Wikipedia has a nice article summarizing what is available today. Imagine four people working on an operation order where they each have their own cursor and write at the same time. Science fiction? No, try CoWord…

This toolset also gives us the chance to question the product, or rather protocol, selections we take for granted. Take email, for instance. Most people agree that we need it. The concept is familiar to many people and we know what it is used for. Or do we? Since email is basically an instance of asynchronous messaging, I argue that asynchronous messaging is exactly what we need, not email, which is just one implementation of it. That implementation has its roots in the early 1970s and is hopelessly outdated when it comes to things like security and efficiency. So maybe we should require that kind of messaging instead and open up for other technical solutions than POP/SMTP/IMAP. This is of course particularly appropriate in closed systems that have no connection to the internet, where “email interoperability” is not an issue.

So before we cheer at some new collaboration or social media application, we should analyze it against our toolbox items. Then it will be much easier to find out whether we like the product or maybe just some part of it.

There is more stuff in the toolbox, but I will come back to that in a later post.

About visibility when working among men

Today the site IT-tjej (IT girl) was launched here in Sweden, and I found a lot of interesting articles that made me think. Those of you who know me know that I am not especially fond of Microsoft, but the article featuring their female IT evangelist Maria Lundahl was really interesting for me. She said that she enjoys the visibility she gets in the male-dominated IT business and that people are often surprised when she explains what she is working with and why she loves it.

I am probably the only openly transsexual woman working as a career officer in the Swedish Armed Forces, so I know I am visible. Probably more than most people, even more than most women in male-dominated workplaces. Yesterday I wrote about my desire to blend in as “just another woman”, so what do I think about this visibility?

Well, first of all I guess it depends on whether I am visible because I am a woman, because I have a transsexual background, or possibly both. I have decided to be open about my background just because I have been in the closet too long already. I believe in honesty and openness. However, it is not the first thing I say to new people when I meet them. So that is where I want to blend in and be just another woman. And that is also where my visibility as the only woman kicks in. The hard part is that I have not yet learned to tell whether people notice me because I am a woman or because “there is something unusual about me”.

Even though I am a career military officer I am passionate about IT and the possibilities it gives us. I am rather nerdy in the sense that I like to dive into the technical details of big enterprise systems such as EMC Documentum and FAST ESP to understand what makes them tick. So I guess that is unfortunately also something that is not so common among women. I hope this new website can be a place for all women (nerdy or not) in the IT business to meet, learn, grow and get some inspiration from each other. I think that all women who work in something that is not generally seen as a typical female line of work sooner or later need to get together and compare notes from our daily experiences.

So do I like the visibility? Not necessarily, but I must confess that people seem to remember who I am, which usually is good 🙂 Whether it is because I am a nerdy female missionary for ECM systems or an unusual woman with a transsexual background, I don’t know. And in most cases I guess I have to learn not to care too much.

So, like Maria, I get many raised eyebrows when I tell people what I do for a living. I guess people expect me to be a hairdresser or a nurse or something. However, it does not always bother me too much, since I guess I am breaking new ground both for women and especially for those with a transsexual background.

An interesting meeting in Gothenburg

The possibilities of the Internet never cease to amaze me. There is a real shift in the way people interact and get together nowadays. As many of you know, I am an avid internet user and I follow around 50 blogs using the RSS feature in Safari. Mainly I read blogs in three categories: personal/LGBT blogs, Mac-related blogs and finally enterprise technology blogs. For a year or so I have been following the CTO Blog. One of the people who regularly write there is their Global CTO Andy Mulholland, and I have read his posts with great interest and commented on a few of them. One of the posts I commented on: Debunking the Myths of Long Tail and so much more!

One day I got an email from him asking whether I would like to meet him when he was visiting Sweden later on. Well, yesterday we met and had a rather intense session for two and a half hours straight. It was really interesting to listen to the experience of a CTO of a major corporation, both from their internal experiences and from some of their customers. The main focus of the discussion was on people and how we can support our very human desire to collaborate and communicate. As a self-confessed IT visionary I of course want to make the most of the possibilities out there and create the truly integrated enterprise platform. On the other hand, we are more and more challenged by the easy-to-use tools that people find on the internet in the Web 2.0 era, tools like Google Docs, Facebook, Twitter, Slideshare and many others. That means a lot of people find the functionality that enterprise IT departments offer rather cumbersome, limited and old-fashioned.

Anyway, I left the meeting completely exhausted and had a lot to think about.

Microsoft buys FAST Search & Transfer

I must confess I was not happy to read this news. Microsoft apparently decided that their own Enterprise Search products were not good enough, so once again they went on a shopping spree. This time the target was the Norwegian company FAST, which makes a very capable search platform. FAST is currently one of the big players in the search engine market, together with Autonomy (which has bought Verity) and possibly a few others like Endeca.

The question now is what happens to the product, both in terms of overall quality and whether or not it will exist as a stand-alone product. Microsoft’s main intention is of course to boost the terrible search experience in today’s version of SharePoint. Hopefully the 3500 installations of FAST ESP around the world are enough incentive to keep it as a stand-alone product. Another issue that concerns me is the technology aspect. Today, FAST is not built on Microsoft technology, and the question is what Microsoft thinks of having products based on Java, Python and even some open source components such as Tomcat. And what about FAST’s support for heavyweight ECM platforms such as EMC Documentum? One thing is for sure: there will not be a FAST ESP for Mac OS X Server 🙂