Category: Technology

Data-deduplication is so cool!

I just read a blog post by Chuck Hollis at EMC about data deduplication and was reminded of how important it is to an EMC platform. To me, however, it is not just a way to optimize storage; it is a critical part of any information architecture.

Data deduplication should be a core feature in platforms such as EMC Documentum, which can bring the feature up to the business side of things. Everything stored in Documentum is an object, which may or may not have an attachment in the form of a document, and that object can be exposed to users in one or many folders. The key thing is that these links open up interesting ways of using data deduplication.

Imagine a corporate environment with thousands of users. A lot of important documents in the company will be used many times by many people in different contexts. Since many of them are likely used as references in different projects (corporate strategy, marketing documents and so forth), they are essentially read-only. However, since a lot of users need these exact documents, they will be imported into the repository many times, not only taking up unnecessary space but also creating problems when the documents are updated. People will ask “hey, are all of these documents I found the same version?”

So my solution would be a job that runs on import, tells the user that this particular document is already available and asks whether they want to use the existing one instead. If they do, a link to that document is created in their folder or project space.
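The import check could be sketched roughly like this in Python. It is only a toy in-memory model, not Documentum code: the `Repository` class, its method names and the folder paths are all made up for illustration, and it assumes that "already available" means byte-identical content detected with a SHA-256 hash.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory repository: content is stored once, folders hold links."""
    content_by_hash: dict = field(default_factory=dict)  # digest -> bytes
    folder_links: dict = field(default_factory=dict)     # folder -> set of digests

    def find_duplicate(self, data: bytes):
        """Return the digest of an identical stored document, or None.

        A client could call this before the import to ask the user
        whether they want to reuse the existing document.
        """
        digest = hashlib.sha256(data).hexdigest()
        return digest if digest in self.content_by_hash else None

    def import_document(self, folder: str, data: bytes) -> str:
        """Store new content, or only link the folder to content that already exists."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.content_by_hash:
            status = "linked"
        else:
            self.content_by_hash[digest] = data
            status = "stored"
        self.folder_links.setdefault(folder, set()).add(digest)
        return status
```

Importing the same bytes into a second folder then returns "linked" and leaves only one stored copy, which is the behaviour the user would be asked to confirm.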

We can also continuously run a job that reports on the current status of the repository, to see how many duplicates we have and what kind of content is duplicated the most. Documentum Reporting Services could be used for that, for instance. If we want a proactive Knowledge Management function, it can either consolidate the duplicates directly or create tasks for users, asking them whether they agree to deduplicate some of their content. However, we need to push hard to have someone create a really cool and usable interface to manage these “content conflicts”.
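What such a report job would compute can be sketched in a few lines of Python. This is not how Documentum Reporting Services works; the function name `duplicate_report` and the `(name, content_type, data)` input shape are assumptions made for the example, and duplicates are again taken to mean byte-identical content.

```python
import hashlib
from collections import Counter, defaultdict


def duplicate_report(documents):
    """Summarise exact duplicates in a collection of documents.

    documents: iterable of (name, content_type, data) tuples with unique names.
    Returns (groups, per_type): groups maps a digest to the names sharing it
    (only digests used by more than one name), and per_type counts the
    redundant copies for each content type.
    """
    names_by_digest = defaultdict(list)
    type_by_name = {}
    for name, content_type, data in documents:
        digest = hashlib.sha256(data).hexdigest()
        names_by_digest[digest].append(name)
        type_by_name[name] = content_type

    groups = {d: names for d, names in names_by_digest.items() if len(names) > 1}
    per_type = Counter()
    for names in groups.values():
        # Treat the first copy as the one to keep; the rest are redundant.
        for name in names[1:]:
            per_type[type_by_name[name]] += 1
    return groups, per_type
```

Calling `per_type.most_common()` on the result gives the "what kind of content is duplicated the most" ranking, and `groups` is the raw material for the tasks sent to users.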

This will help companies manage vital documents and reduce the confusion about which document is the correct and updated one.

From a technology standpoint the first step would be to use a simple hash function to find exact duplicates. The next step should be to use the vector-based indexing technology found in both Autonomy and FAST ESP to detect similarity levels as well, and possibly use that to further refine the handling of similar content. That way we can de-duplicate the same content found in different formats and have the option of removing one of the copies, or maybe just making one a rendition of the other.
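The two steps can be illustrated with a small Python sketch. The first step is a plain SHA-256 comparison. The second step here is a simple bag-of-words cosine similarity, which is only a stand-in for the idea of comparing documents as vectors; it is not how Autonomy or FAST ESP actually work, and the function names and the 0.9 threshold are arbitrary choices for the example.

```python
import hashlib
import math
import re
from collections import Counter


def content_hash(data: bytes) -> str:
    """Step 1: identical bytes give an identical digest."""
    return hashlib.sha256(data).hexdigest()


def term_vector(text: str) -> Counter:
    """Step 2 input: represent a text as word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def compare(text_a: str, text_b: str, threshold: float = 0.9) -> str:
    """Classify a pair of texts as exact duplicate, near duplicate or different."""
    if content_hash(text_a.encode("utf-8")) == content_hash(text_b.encode("utf-8")):
        return "exact duplicate"
    if cosine_similarity(term_vector(text_a), term_vector(text_b)) >= threshold:
        return "near duplicate"
    return "different"
```

To catch the same content in different formats, the text would first have to be extracted from each format before it is compared; that extraction step is left out here.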

Figuring out the collaboration tools in our intranet toolbox

I see myself as a translator between people speaking military and information technology languages. These two groups come from different worlds and have very different views of the world. The military people like to speak about requirements from a very abstract standpoint and think the details should be worked out by someone else. They rarely have any knowledge of what the market can offer, so when they are invited to a technical demo of some kind they usually approve it if it looks somewhat useful. The IT people, on the other hand, have a tendency to strive for cheap, simple and safe solutions, which will get adopted if some people in uniform accept them. So where does that leave us?

Well, it means that if a military officer sees a need to collaborate around a text and is shown a wiki, they almost immediately embrace it. Whether it is a good solution does not really matter, since it is far better than anything they have today. That is why I think we need to break down our needs into a basic set of tools and define what makes them differ. These tools need to be mutually exclusive:

  • Asynchronous messaging (leave a message that someone picks up later)
  • Real-time communication (text, audio and video chat and virtual meeting rooms)
  • Presence (know what other people are doing right now)
  • Real-time collaboration on content (Multiple people writing in the same document at the same time)
  • Asynchronous collaboration on content (Multiple people writing at different times in the same document)
  • And then you could add the spatial dimension to this. Collaborating is very different depending on whether or not you are in the same place.

    So that means that a forum is basically a thread-based structuring of asynchronous messaging, while a blog is an individually based contribution of content that can be asynchronously referenced and commented on. On the other hand, an application such as CoWord does real-time collaboration on content. The point of all this is to have a common set of references when evaluating different products and their potential use in the organisation. There is a huge difference between the way a real-time messaging tool can be used and a tool which is asynchronous and lets you leave a message. Using a wiki in a live virtual meeting is therefore not the best solution.

    A wiki, on the other hand, is a good example of asynchronous collaboration on content. So are most features found in enterprise content management systems such as Documentum and Alfresco.

    Real-time collaborative editors have not really been a success yet, even though there seems to be an obvious need for them in today’s connected world. Wikipedia has a nice article summarizing what is available today. Imagine four people working on an operation order where they each have their own cursor and are writing at the same time. Science fiction? No, try CoWord…

    This toolset also gives us the chance to question product, or rather protocol, selections we take for granted. Take email for instance. Most people agree that we need it. The concept is familiar to many people and we know what it is used for. Or do we? Since it is basically the essence of asynchronous messaging, I argue that asynchronous messaging is exactly what we need, and not email as such, which is just one implementation of it. It is an implementation based on standards developed in the early 1970s, which is hopelessly outdated when it comes to things like security and efficiency. So maybe we should require that particular kind of messaging instead and open up for technical solutions other than POP/SMTP/IMAP. This is of course particularly appropriate in closed systems which have no connections to the internet, which means that “email-interoperability” is no issue.

    So before we cheer at some new collaboration or social media application we should analyze it against our toolbox items. It will then be much easier to find out whether we like the whole product or maybe just some part of it.
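As a small illustration, the toolbox can be written down as a lookup table that products are checked against. The Python sketch below only encodes the classifications mentioned in this post; the `Tool` enum and the helper names are invented for the example, and a real product could of course cover several items.

```python
from enum import Enum


class Tool(Enum):
    """The basic toolbox items from the list above."""
    ASYNC_MESSAGING = "asynchronous messaging"
    REALTIME_COMMUNICATION = "real-time communication"
    PRESENCE = "presence"
    REALTIME_CONTENT = "real-time collaboration on content"
    ASYNC_CONTENT = "asynchronous collaboration on content"


# Classifications taken from the text of this post.
PRODUCTS = {
    "forum": {Tool.ASYNC_MESSAGING},
    "email": {Tool.ASYNC_MESSAGING},
    "wiki": {Tool.ASYNC_CONTENT},
    "Documentum": {Tool.ASYNC_CONTENT},
    "Alfresco": {Tool.ASYNC_CONTENT},
    "CoWord": {Tool.REALTIME_CONTENT},
}


def fits(product: str, need: Tool) -> bool:
    """Does the product cover the toolbox item we actually need?"""
    return need in PRODUCTS.get(product, set())


def candidates(need: Tool) -> list:
    """All known products that cover a given toolbox item, sorted by name."""
    return sorted(name for name, tools in PRODUCTS.items() if need in tools)
```

Asking `fits("wiki", Tool.REALTIME_CONTENT)` returns False, which is the wiki-in-a-live-meeting mismatch described above.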

    There is more stuff in the toolbox but I will come back to it in a later post.

    About visibility when working among men

    Today the site IT-tjej (IT-girl) was launched here in Sweden and I found a lot of interesting articles that made me think. Those of you who know me know that I am not especially fond of Microsoft, but the article featuring their female IT evangelist Maria Lundahl was really interesting to me. She said that she enjoys the visibility she gets in the male-dominated IT business and that people are often surprised when she explains what she works with and why she loves it.

    I am probably the only openly transsexual woman working as a career officer in the Swedish Armed Forces, so I know I am visible. Probably more than most people, even most women in male-dominated workplaces. Yesterday I wrote about my desire to blend in as “just another woman”, so what do I think about this visibility then?

    Well, first of all I guess it depends on whether I am visible because I am a woman, because I have a transsexual background, or possibly both. I have decided to be open about my background simply because I have been in a closet too long already. I believe in honesty and openness. However, it is not the first thing I say to new people when I meet them. So that is where I want to blend in and be just another woman. However, that is where my visibility as the only woman kicks in. The hard part is that I have not yet learned to figure out whether people notice me because I am a woman or because “there is something unusual with me”.

    Even though I am a career military officer I am passionate about IT and the possibilities it gives us. I am rather nerdy in the sense that I like to dive into the technical details of big enterprise systems such as EMC Documentum and FAST ESP to understand what makes them tick. So I guess that is unfortunately also something that is not so common among women. I hope this new website can be a place for all women (nerdy or not) in the IT-business to meet, learn, grow and get some inspiration from each other. I think that all women that work in something that is not generally seen as a typical female line of work sooner or later need to get together and compare notes from our daily experiences.

    So do I like the visibility… Not necessarily, but I must confess that people seem to remember who I am, which usually is good 🙂 Whether it is because I am a nerdy female missionary of ECM systems or an unusual woman with a transsexual background I don’t know. And in most cases I guess I have to learn not to care too much.

    So like Maria I get many arched eyebrows when I tell people what I do for a living. I guess people expect me to be a hairdresser or a nurse or something. However, it does not always bother me too much, since I guess I am breaking new ground both for women and especially for those with a transsexual background.

    An interesting meeting in Gothenburg

    The possibilities of the Internet never cease to amaze me. There is a real shift in the way people interact and get together nowadays. As many of you know I am an avid internet user and I follow around 50 blogs or so using the RSS feature in Safari. Mainly I read blogs in three categories: personal/LGBT blogs, Mac-related blogs and finally enterprise technology blogs. For a year or so I have been following the CTO Blog. One of the people who regularly writes there is their Global CTO Andy Mulholland, and I have read his posts with great interest and commented on a few of them. One of the posts I commented on was: Debunking the Myths of Long Tail and so much more!

    One day I got an email from him where he asked if I would like to meet him when he was visiting Sweden later on. Well, yesterday we met and had a rather intense session for two and a half hours straight. It was really interesting to listen to the experience of a CTO of a major corporation, both from their internal work and from some of their customers. The main focus of the discussion was people and how we can support our very human desire to collaborate and communicate. As a self-confessed IT visionary I of course want to make the most of the possibilities out there and create the truly integrated enterprise platform. On the other hand we are more and more challenged by easy-to-use tools that people find on the internet in the Web 2.0 era. Tools like Google Docs, Facebook, Twitter, Slideshare and many others. That means that a lot of people find the functionality that enterprise IT departments offer rather cumbersome, limited and old-fashioned.

    Anyway, I left the meeting completely exhausted and had a lot to think about.

    IT-tjej / IT-girl website

    IDG is about to launch a new website aimed at women/girls working in the IT business. I think that is a really cool initiative, since it sometimes feels a bit lonely to be a girl with a burning interest in IT issues. Apparently they will have a release party on Thursday 6 March and I will see if I can go.

    IT-tjej

    Sci-Fi weapons soon a reality?

    To me one of the things that makes Sci-Fi shows so interesting is the feeling of getting a sneak peek into a future where much more advanced technology is available to us. In Stargate SG-1 the story begins when the team is confronted with a seemingly unbeatable enemy, the Goa’uld. Later in the show they meet the even more powerful Asgard, who possess really advanced space ships and weapons. One of these impressive weapons from the Asgard is the rail gun that one of the ships from Stargate Command gets outfitted with. So, I was really amazed when I read this article about, yes that is right, a rail gun being delivered to the US Navy. Now I am just waiting for space ships that are bigger than aircraft carriers and can travel at many times the speed of light 🙂

    Microsoft buys FAST Search & Transfer

    I must confess I was not happy to read this news. Microsoft apparently decided that their own Enterprise Search products were not good enough, so once again they decided to go on a shopping spree. The Norwegian company FAST, which makes a very capable search platform, was the target this time. FAST is currently one of the big players in the search engine market together with Autonomy (which has bought Verity) and possibly a few others like Endeca.

    The question now is what happens to the product, both in terms of overall quality and whether or not it will continue to exist as a stand-alone product. The main intention for Microsoft is of course to boost the terrible search experience in today’s version of Sharepoint. Hopefully the 3500 installations of FAST ESP around the world are enough incentive to keep it as a stand-alone product. Another issue that concerns me is the technology aspect. Today, FAST is not built on Microsoft technology, and the question is what Microsoft thinks of having products based on Java, Python and even some open source components such as Tomcat. And what about FAST support for heavyweight ECM platforms such as EMC Documentum? One thing is for sure, there will not be a FAST ESP for Mac OS X Server 🙂

    Custom content model in Alfresco 2.9C

    Hmm… I guess I am supposed to be working on the movie project. However, during Christmas I had the usual technical discussions with my brother and I got all excited about Enterprise Content Management again. I decided to show him some new features in Alfresco 2.9 Community edition (still in development though). Since I don’t work full time configuring *nix systems I tend to forget some things, so I got some help from him. After that I was all inspired to try to set up a demo system for the NGO RFSL that I am working for. I believe that almost every organisation needs some improvements in information management and RFSL is of course no different.

    We rely heavily on email and there is a file server that some people use. Wouldn’t it be great if we could have a central repository where we can store all our content in a smarter way? That includes support for metadata, versioning, permissions, language versions and workflow for our most common tasks.

    So I decided to dive into Alfresco once more and not just try to master configuration on Mac OS X Server but also do real customizations of the content model and the user interface.

    I first started reading the excellent guide provided by the people at ECM Architect, which gave me a good start. However, rather soon I discovered that some of the things I had previously done in Documentum Application Builder (DAB) were not covered. So I had to resort to the Alfresco wiki and forums to find the answers.

    Unlike in DAB, development for Alfresco is done by editing XML files that reside in /tomcat/shared/classes/alfresco/extension. That is where the content model files, the configuration of the web client and a text file with mappings between attributes and the text that should be visible in the labels are all kept.

    When all these things have been edited and saved, the next thing is to restart Alfresco’s Tomcat server and keep a close eye on the alfresco.log file where all potential errors are shown. An error usually ends up as a Java stack trace, but often with references to line numbers, so it is a bit easier to find the problem.

    What I was trying to achieve was the same thing I did in Documentum a month ago: creating a content model with one base type from which other types can inherit attributes. However, finding the proper structure for that is not easy. My first idea was to have a “rfsl:doc” type as the base and then let “rfsl:video” and “rfsl:audio” inherit from it. However, since aspects provide a really good and flexible way of handling attributes, it is not easy to decide which attributes should be in the type and which should be in an aspect. It is also important to bear in mind that having content types for things like presentations (which could be in PowerPoint, PDF, Impress or Keynote format) is one way of creating easy filtering of content. That means having a “rfsl:pres” type, but if that will have more or less exactly the same attributes as “rfsl:doc” it boils down to just a filtering issue and the question is whether it is worth it. Will users understand how to select which content type to use?

    Another important thing for me was to have drop-down menus with values instead of empty fields. In Documentum that is called Value Assistance and is merely a text string which has nothing to do with the actual content model. Instead the value assistance is to be seen as a template for creating entries in the repository. That means that it is possible to remove a value from value assistance while the value is still in the repository.

    In Alfresco this is handled through constraints in the actual content model. The values listed in the constraint form a drop-down menu in the web client interface. However, the values are constraints, which means that validation is going on. So if you have removed a value or changed its spelling in the constraint, you will get an error when you try to display a content object that still has the previous value in the repository.
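The difference between the two approaches can be shown with a small Python analogy. This is not Documentum or Alfresco code: the function names, the exception class and the example values are all hypothetical, and the sketch only mirrors the behaviour described above.

```python
class ConstraintViolation(ValueError):
    """Raised when a stored value is not in the list of allowed values."""


def read_with_suggestions(stored_value, suggestions):
    """Value-assistance style: the list only fills the drop-down.

    A stored value that is no longer suggested is still returned as-is.
    """
    return stored_value


def read_with_constraint(stored_value, allowed_values):
    """Constraint style: the same list is also a validation rule."""
    if stored_value not in allowed_values:
        raise ConstraintViolation(
            f"{stored_value!r} is not one of {sorted(allowed_values)}"
        )
    return stored_value
```

With a stored value of "Protocol" and a list that has since been changed to ["Minutes", "Policy"], the first function returns "Protocol" unchanged while the second raises an error. That is why values in a constraint list should be renamed or removed with care once content exists.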

    Constraints should come first in the content model file and look like this:

    Header Separator Generator

    This particular constraint is used in an attribute called “rfsl:documentkind”. Remember that the reference in the constraints section must be exactly the same as the name of the attribute.

    Another big headache was organizing the attributes in the web client interface. Again, in DAB there is something called “display configurations”, which refer to the different Documentum clients being used in the installation. In them you just reorder the attributes as you like and insert line dividers as you see fit. For Alfresco this is done in the file called “web-client-config-custom.xml”.

    The order of the attributes in the file determines the order in which they are shown in the web client, and if you want a separator it can be created using:

    Header Separator Generator only

    And in context it could look like this if we wanted a separator in the beginning:

    Header Separator Generator

    Segway declared a moped

    Today there was an article in a big newspaper called Dagens Nyheter (DN) about a disabled person who wanted to use a Segway personal transporter to increase his mobility. Unfortunately the Swedish Road Administration (Vägverket) has declared the Segway to be a moped. However, they also decided that it does not have all the equipment needed to be used as one, so it is not allowed on public roads. And if you are using one in a closed area you are required to wear a moped helmet since it is almost a moped. So this disabled person needs a special permit to be able to legally use his beloved Segway. Apparently that means a lot of paperwork with a doctor’s statement and all.

    This is so typically Swedish. The Segway is OK to use in the US and a lot of European countries, but not here of course. I have had the pleasure of using one several times and my experience is that it is far different from a moped. And you usually ride it differently and more carefully than a moped. Show me a moped that can maneuver that precisely and stop almost at once. No, that is just stupid. I had actually been thinking of getting one if I eventually get accepted at the National Defense College here in Stockholm. It would be so cool and so “me” to ride one to work for that year. Well, I guess not… I have to go to Disney World to ride one again, I guess. Swedes will certainly miss a whole lot of fun with those futuristic machines…

    Microsoft and its methods to push their own formats as a standard

    I can’t help saying “see for yourself how they really are” when I read about the latest scandal in the Swedish Standardization Organisation (SIS) regarding standards for document formats. Open Document (ODF), used by the open source package Open Office/Star Office/NeoOffice, has already been selected as a standard by ISO. However, as usual Microsoft has no intention of supporting this and is instead pushing its own XML-based format called OOXML. This format is up for a vote in the ISO in a few days and Sweden’s SIS recently had a vote to decide how Sweden will vote. Microsoft used promises of marketing assistance to urge their partners to become members of SIS and thus be able to vote. As a result the majority shifted and OOXML was accepted as a standard. Computer Sweden have written about it here. Fortunately the SIS have now declared the vote invalid on formal grounds, which Dagens Nyheter writes about here. Why is it so hard for Microsoft to participate in international standardization efforts instead of insisting on doing it alone all the time? Don’t they realize that all giants sooner or later lose their grip on the market, just like IBM did 20 years ago…

    IBM has now become a good example of how to embrace standards and still make money. An even better example is Sun Microsystems, which is showing great profits after lots of open source initiatives. Even Apple, a company that lots of people used to think of as very proprietary, nowadays not only implements every standard it finds but bases its whole operating system on open source code in the form of FreeBSD/Darwin.