Data-deduplication is so cool!

I just read a blog post by Chuck Hollis at EMC about data deduplication and was reminded how important it is to an EMC platform. To me, however, it is not just a way to improve or optimize storage; it is a critical part of any information architecture.

Data deduplication should be a core feature in platforms such as EMC Documentum, which bring the use of this feature up to the business side of things. Everything stored in Documentum is an object, which may or may not have an attachment in the form of a document, and that object can be exposed to users in one or many folders. The key point is that these links open up interesting ways of using data deduplication.

Imagine a corporate environment with thousands of users. A lot of important documents in the company will be used many times by many people in different contexts. Since many of them, such as corporate strategy and marketing documents, are likely used as references in different projects, they are essentially read-only. However, since a lot of users need these exact documents, they will be imported into the repository many times, not only taking up unnecessary space but also creating problems when the documents are updated. People will ask, “Hey, are all of these documents I found the same version?”

So my solution would be to have a job running on import that tells the user that this particular document is already available and asks whether they want to use the existing one instead. If they do, a link to that document is created in their folder or project space.
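A minimal sketch of such an import check, in Python. Everything in it is hypothetical and only illustrates the idea: the `Repository` class is a toy in-memory stand-in, not the Documentum API, and the `reuse_existing` callback stands in for the question put to the user.

```python
import hashlib
import itertools


class Repository:
    """Toy in-memory repository (hypothetical; not the Documentum API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.objects = {}    # object id -> content bytes
        self.by_digest = {}  # SHA-256 hex digest -> id of the first object with that content
        self.folders = {}    # folder path -> set of linked object ids

    def find_existing(self, data):
        """Return the id of an object with identical content, or None."""
        return self.by_digest.get(hashlib.sha256(data).hexdigest())

    def link(self, folder, object_id):
        """Expose an object in a folder without copying its content."""
        self.folders.setdefault(folder, set()).add(object_id)

    def import_document(self, folder, data, reuse_existing):
        """Import content into a folder and return the id of the linked object.

        reuse_existing(object_id) stands in for asking the user whether
        they want to use the document that is already in the repository.
        """
        existing = self.find_existing(data)
        if existing is not None and reuse_existing(existing):
            self.link(folder, existing)  # link only, nothing new is stored
            return existing
        object_id = next(self._ids)
        self.objects[object_id] = data
        self.by_digest.setdefault(hashlib.sha256(data).hexdigest(), object_id)
        self.link(folder, object_id)
        return object_id
```

Importing the same bytes into a second folder with `reuse_existing=lambda object_id: True` returns the id of the first object and stores nothing new; answering `False` stores a separate copy, which is exactly the case a later report would pick up.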

We can also continuously run a job that reports on the current status of the repository to see how many duplicates we have and what kind of content is duplicated the most. Documentum Reporting Services could be used for that, for instance. If we want a proactive Knowledge Management function, it can either consolidate the duplicates directly or create tasks for users asking whether they agree to deduplicate some of their content. However, we need to push hard to get someone to create a really cool and usable interface for managing these “content conflicts”.
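As a rough illustration of what such a report job could compute, here is a small Python sketch. It assumes the job can read each document as a (name, content type, bytes) tuple; the function name and that input shape are my own assumptions and have nothing to do with Documentum Reporting Services.

```python
import hashlib
from collections import Counter, defaultdict


def duplicate_report(documents):
    """Group documents by identical content.

    documents: iterable of (name, content_type, data) tuples, data being bytes.
    Returns (duplicates, redundant_by_type):
      duplicates        -- digest -> list of (name, content_type), only for
                           content that occurs more than once
      redundant_by_type -- Counter of redundant copies per content type
    """
    groups = defaultdict(list)
    for name, content_type, data in documents:
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append((name, content_type))

    duplicates = {d: members for d, members in groups.items() if len(members) > 1}

    redundant_by_type = Counter()
    for members in duplicates.values():
        # Every copy beyond the first one is counted as redundant.
        for _name, content_type in members[1:]:
            redundant_by_type[content_type] += 1
    return duplicates, redundant_by_type
```

`redundant_by_type.most_common()` then gives the kinds of content that are duplicated the most, and each entry in `duplicates` is a candidate “content conflict” to hand to a user or a knowledge manager.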

This will further help companies manage vital documents and reduce the confusion about which document is the correct and current one.

From a technology standpoint, the first step would be to use a simple hash function to find exact duplicates. The next step should be to use the vector-based indexing technology found in both Autonomy and FAST ESP to detect similarity levels as well, and possibly use those to further refine the handling of similar content. That way we can de-duplicate the same content found in different formats and have the option of removing one copy or maybe just making one a rendition of the other.
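To make the similarity step a bit more concrete, here is a small Python sketch using plain term-frequency vectors and cosine similarity. This is a deliberately simple stand-in that I chose for illustration; it is not how Autonomy or FAST ESP work internally, and the function names and the 0.9 threshold are assumptions. It also assumes the text has already been extracted from each format.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Turn extracted text into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicates(texts, threshold=0.9):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold.

    texts: dict mapping a document name to its extracted text.
    """
    vectors = {name: term_vector(text) for name, text in texts.items()}
    names = sorted(vectors)
    pairs = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            score = cosine_similarity(vectors[name_a], vectors[name_b])
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    return pairs
```

A Word file and a PDF with the same text hash differently as files but end up with the same term vector, so they show up as a pair here. Comparing every pair is only workable for small sets; a real index is needed for a whole repository.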


Figuring out the collaboration tools in our intranet toolbox

I see myself as a translator between people speaking military and information technology languages. These two groups come from different worlds and have very different views of the world. Military people like to speak about requirements from a very abstract standpoint and think the details should be worked out by someone else. They rarely have any knowledge of what the market can offer, so when they are invited to a technical demo of some kind they usually approve it if it looks somewhat useful. IT people, on the other hand, have a tendency to strive for cheap, simple and safe solutions, which will get adopted if some people in uniform accept them. So where does that leave us?

Well, it means that if a military officer sees a need to collaborate around a text and is shown a wiki, they almost immediately embrace it. It does not really matter whether it is a good solution; it is far better than anything they have today. That is why I think we need to break down our needs into a basic set of tools and define what makes them differ. These tools need to be mutually exclusive:

  • Asynchronous messaging (leave a message that someone picks up later)
  • Real-time communication (text, audio and video chat and virtual meeting rooms)
  • Presence (know what other people are doing right now)
  • Real-time collaboration on content (multiple people writing in the same document at the same time)
  • Asynchronous collaboration on content (multiple people writing at different times in the same document)

You could also add a spatial dimension to this: collaborating is very different depending on whether you are in the same place or not.

So that means that a forum is basically a thread-based structuring of asynchronous messaging, while a blog is an individual contribution of content that can be asynchronously referenced and commented on. An application such as CoWord, on the other hand, does real-time collaboration on content. The point of all this is to have a common set of references when evaluating different products and their potential use in the organisation. There is a huge difference between how a real-time messaging tool can be used and how an asynchronous tool that lets you leave a message can be used. Using a wiki in a live virtual meeting is therefore not the best solution.

A wiki, on the other hand, is a good example of asynchronous collaboration on content. So are most features found in enterprise content management systems such as Documentum and Alfresco.

Real-time collaborative editors have not really been a success yet, even though there seems to be an obvious need for them in today’s connected world. Wikipedia has a nice article summarizing what is available today. Imagine four people working on an operation order, each with their own cursor and all writing at the same time. Science fiction? No, try CoWord…

This toolset also gives us the chance to question product, or rather protocol, selections that we take for granted. Take email, for instance. Most people agree that we need it. The concept is familiar to many people and we know what it is used for. Or do we? Email is basically the essence of asynchronous messaging, so I argue that asynchronous messaging is what we actually need, and that email is just one implementation of it. It is an implementation based on standards developed in the early 1970s, which is hopelessly outdated when it comes to things like security and efficiency. So maybe we should require that particular kind of messaging instead and open up for technical solutions other than POP/SMTP/IMAP. This is of course particularly appropriate in closed systems that have no connection to the internet, which means that “email interoperability” is not an issue.

So before we cheer at some new collaboration or social media application, we should analyze it against our toolbox items. It will then be much easier to find out whether we like the whole product or maybe just some part of it.

There is more stuff in the toolbox, but I will come back to it in a later post.


About visibility when working among men

Today the site IT-tjej (IT-girl) was launched here in Sweden, and I found a lot of interesting articles that made me think. Those of you who know me know that I am not especially fond of Microsoft, but the article featuring their female IT evangelist Maria Lundahl was really interesting to me. She said that she enjoys the visibility she gets in the male-dominated IT business and that people are often surprised when she explains what she works with and why she loves it.

I am probably the only openly transsexual woman working as a career officer in the Swedish Armed Forces, so I know I am visible. Probably more than most people, even most women in male-dominated workplaces. Yesterday I wrote about my desire to blend in as “just another woman”, so what do I think about this visibility then?

Well, first of all I guess it depends on whether I am visible because I am a woman, because I have a transsexual background, or possibly both. I have decided to be open about my background simply because I have been in a closet too long already. I believe in honesty and openness. However, it is not the first thing I say to new people when I meet them. So that is where I want to blend in and be just another woman. But that is also where my visibility as the only woman kicks in. The hard part is that I have not yet learned to tell whether people notice me because I am a woman or because “there is something unusual about me”.

Even though I am a career military officer, I am passionate about IT and the possibilities it gives us. I am rather nerdy in the sense that I like to dive into the technical details of big enterprise systems such as EMC Documentum and FAST ESP to understand what makes them tick. I guess that, unfortunately, is also something that is not so common among women. I hope this new website can be a place for all women (nerdy or not) in the IT business to meet, learn, grow and get some inspiration from each other. I think that all women who work in something that is not generally seen as a typically female line of work sooner or later need to get together and compare notes from their daily experiences.

So do I like the visibility? Not necessarily, but I must confess that people seem to remember who I am, which usually is good 🙂 Whether it is because I am a nerdy female missionary of ECM systems or an unusual woman with a transsexual background, I don’t know. And in most cases I guess I have to learn not to care too much.

So, like Maria, I get many raised eyebrows when I tell people what I do for a living. I guess they expect me to be a hairdresser or a nurse or something. However, it does not bother me too much, since I guess I am breaking new ground both for women in general and especially for those with a transsexual background.


An interesting meeting in Gothenburg

The possibilities of the Internet never cease to amaze me. There is a real shift in the way people interact and get together nowadays. As many of you know, I am an avid internet user and I follow around 50 blogs using the RSS feature in Safari. I mainly read blogs in three categories: personal/LGBT blogs, Mac-related blogs and enterprise technology blogs. For a year or so I have been following the CTO Blog. One of the people who regularly writes there is their Global CTO, Andy Mulholland, and I have read his posts with great interest and commented on a few of them. One of the posts I commented on was “Debunking the Myths of Long Tail and so much more!”

One day I got an email from him asking if I would like to meet him when he visited Sweden later on. Well, yesterday we met and had a rather intense session for two and a half hours straight. It was really interesting to hear about the experiences of a CTO of a major corporation, both from inside the company and from some of its customers. The main focus of the discussion was people and how we can support our very human desire to collaborate and communicate. As a self-confessed IT visionary, I of course want to make the most of the possibilities out there and create the truly integrated enterprise platform. On the other hand, we are more and more challenged by the easy-to-use tools that people find on the internet in the Web 2.0 era: tools like Google Docs, Facebook, Twitter, Slideshare and many others. That means that a lot of people find the functionality that enterprise IT departments offer rather cumbersome, limited and old-fashioned.

Anyway, I left the meeting completely exhausted and had a lot to think about.
