Saturday, December 4, 2010

Automatic document classification with Alfresco: Part 2

In the first part of this article, I explained how you can use Lucene to search a document's content (Word, PDF, etc.) for specific keywords, which we needed in order to automatically identify the document's category based on its content.

We've chosen a simple approach to demonstrate the automatic classification extension: if a document contains the name of a category, then it belongs to that category. Of course, other approaches are possible, such as assigning multiple keywords to a category. For example, if a document contains one of the words "java", ".Net", "c#", and so on, assign it to the category "Software development". That variant is easy to implement once you have read and understood this article. How you implement it depends on your specific needs, and you might need a more advanced classification algorithm.
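
To make the keyword idea concrete, here is a minimal sketch in plain Java. It is not the Alfresco extension itself and does not use Lucene or any Alfresco API: the class name `KeywordClassifier` and its methods are made up for illustration, and it assumes the document's text has already been extracted into a string. It also uses naive substring matching, so "java" would match "javascript" too. A real implementation would probably want word-boundary matching or a Lucene query instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only: maps categories to keywords
// and reports every category whose keywords appear in a piece of text.
class KeywordClassifier {

    // Category name -> lower-cased keywords that trigger it (insertion order kept).
    private final Map<String, List<String>> keywordsByCategory = new LinkedHashMap<>();

    // Registers a category. For the simple approach, pass the category
    // name itself as its only keyword.
    public void addCategory(String category, List<String> keywords) {
        List<String> lowered = new ArrayList<>();
        for (String keyword : keywords) {
            lowered.add(keyword.toLowerCase(Locale.ROOT));
        }
        keywordsByCategory.put(category, lowered);
    }

    // Returns the categories with at least one keyword found in the text
    // (case-insensitive substring match).
    public List<String> classify(String text) {
        String haystack = text.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : keywordsByCategory.entrySet()) {
            for (String keyword : entry.getValue()) {
                if (haystack.contains(keyword)) {
                    matches.add(entry.getKey());
                    break; // one hit is enough for this category
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        KeywordClassifier classifier = new KeywordClassifier();
        classifier.addCategory("Software development", Arrays.asList("java", ".Net", "c#"));
        classifier.addCategory("Finance", Arrays.asList("Finance"));
        System.out.println(classifier.classify("This tutorial covers Java and C# basics."));
    }
}
```

Returning a list rather than a single category is a deliberate choice here: a document can mention keywords from several categories, and Alfresco lets a document carry more than one classification.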

Tuesday, November 30, 2010

Alfresco automatic document classification : Part 1

Alfresco can handle multiple classifications, or hierarchies of classifications. It's a very useful feature that can make your life a lot easier when looking for documents, especially ones with no indexed content, like pictures and scanned documents.
Classifying a document in Alfresco can be as easy as a few clicks in the browser. However, it can be a very time-consuming process if you upload many documents every day, or if you are migrating to Alfresco: imagine having to manually classify a few thousand documents!
If you are still classifying documents manually, analyzing their content and sorting them into categories, you might be interested in finding out how you can extend Alfresco to classify your documents for you automatically.

Friday, November 26, 2010

Getting started with Alfresco

During my internship at TGR, one of the project's requirements was indexing and managing documents, and that was my first experience with Alfresco. Alfresco is an open source Enterprise Content Management (ECM) system. It combines a collection of content-centric technologies, like Document Management (DM), Records Management (RM), and others, that should make your life easier if your field of work is content management.

Customizing Alfresco can be a bit tricky and hard to grasp at first, but after doing it a few times you start to get the feel of it. This post, as well as the future articles about Alfresco, is here to help you get there.