Tuesday, September 21, 2010

Set Up the TIFF iFilter for SharePoint 2010

If you're running SharePoint 2010 on Windows Server 2008 R2 or Windows 7, the TIFF iFilter is a great add-on that will OCR all your scanned TIFF files. The feature is turned off by default because of the extra load OCR puts on processing, but it's easy to enable and greatly benefits searching.

The first step to enable the iFilter is adding the feature to the server. This is done through Server Manager. Click on Features in the tree and then Add Features on the right.

image

Check the box next to Windows TIFF IFilter.

image

Click Next to confirm your selection, then install the feature.
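
If you would rather script this step than click through Server Manager, here is a minimal Python sketch that builds the command for you. The feature names are assumptions on my part: I believe the feature is called TIFF-IFilter on Windows Server 2008 R2 and TIFFIFilter in DISM on Windows 7, so check them with Get-WindowsFeature or dism /online /get-features before relying on this. The function names are hypothetical.

```python
import subprocess


def tiff_ifilter_install_command(server=True):
    """Return the argv list that would add the Windows TIFF IFilter feature.

    The feature names below are assumptions; verify them on your machine.
    """
    if server:
        # Windows Server 2008 R2: use the ServerManager PowerShell module.
        return [
            "powershell.exe", "-NoProfile", "-Command",
            "Import-Module ServerManager; Add-WindowsFeature TIFF-IFilter",
        ]
    # Windows 7: enable the optional feature through DISM.
    return ["dism.exe", "/online", "/enable-feature", "/featurename:TIFFIFilter"]


def install_tiff_ifilter(server=True):
    """Run the install command (needs an elevated prompt on Windows)."""
    subprocess.run(tiff_ifilter_install_command(server), check=True)


if __name__ == "__main__":
    # Only print the command here; call install_tiff_ifilter() to run it.
    print(" ".join(tiff_ifilter_install_command(server=True)))
```

Calling install_tiff_ifilter() from an elevated prompt should have the same effect as the wizard, assuming the feature names match your system.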

Now that the feature is installed, it needs to be enabled and configured. Click the Start button and type gpedit.msc into the search box to launch the Local Group Policy Editor.

image

Under Computer Configuration, select Administrative Templates, then select the OCR folder.

image

This is where you will configure OCR for TIFFs.

image

The first option forces OCR on every page in a TIFF. The TIFF iFilter normally tries to optimize performance by skipping blank pages and pages with non-textual content such as pictures. In my scenario almost every TIFF was an invoice, so I enabled this setting to ensure that no pages were missed during the OCR process.

image

The next setting is the OCR language to check for. By default this is the server's system language, but if you expect documents in several languages you can enable them here, as long as they all belong to the same code page. For example, you could check for English, Dutch, French, and Italian, since they all belong to the Western European code page. If you wanted to look for Japanese and English, though, you would have to use some other means of OCR, since the TIFF iFilter does not support using more than one code page. Since I only had English TIFFs, I left this setting as "Not Configured".
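
To make the code page rule concrete, here is a small Python sketch that checks whether a set of languages can be enabled together. The lookup table is a hand-picked illustration (Windows-1252 for the Western European languages, 932 for Japanese), not something read from the iFilter, and the function name is hypothetical.

```python
# Windows ANSI code pages for a few languages (illustrative, hand-picked table).
ANSI_CODE_PAGE = {
    "English": 1252,   # Western European
    "Dutch": 1252,
    "French": 1252,
    "Italian": 1252,
    "Japanese": 932,   # Shift-JIS
}


def shared_code_page(languages):
    """Return the one code page shared by all languages, or None if they differ."""
    pages = {ANSI_CODE_PAGE[lang] for lang in languages}
    return pages.pop() if len(pages) == 1 else None


if __name__ == "__main__":
    print(shared_code_page(["English", "Dutch", "French", "Italian"]))  # 1252
    print(shared_code_page(["English", "Japanese"]))                    # None
```

A result of None means the combination spans more than one code page, which is the case the TIFF iFilter cannot handle.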

image

If you installed SharePoint before following these steps, you will need to restart the associated SharePoint services. I was using a standalone development environment, so I was able to just reboot my server.

Now that you have the TIFF iFilter installed, you need to run a full crawl to OCR the documents. To run a full crawl, open Central Administration and click Manage service applications.
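
If a reboot is not an option, restarting the search service from a script is one alternative. This is only a sketch under an assumption: I am taking OSearch14 to be the SharePoint Server Search 14 service (on SharePoint Foundation it should be SPSearch4 instead), so confirm the name in services.msc first. The function names are hypothetical.

```python
import subprocess

# Assumed service name for SharePoint Server Search 14; verify in services.msc.
SEARCH_SERVICE = "OSearch14"


def restart_commands(service=SEARCH_SERVICE):
    """Return the stop and start commands for the given Windows service."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_search_service(service=SEARCH_SERVICE):
    """Stop, then start, the service (needs an elevated prompt on Windows)."""
    for cmd in restart_commands(service):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call restart_search_service() to run them.
    for cmd in restart_commands():
        print(" ".join(cmd))
```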

image

Click on Search Service Application.

image

Under Crawling, click Content Sources.

image

Start a full crawl.
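
The full crawl can also be started without Central Administration. The sketch below wraps the SharePoint 2010 PowerShell cmdlets in a Python helper. It assumes a single Search Service Application and that it runs on a farm server under a farm administrator account; the Python function names are hypothetical.

```python
import subprocess

# PowerShell that starts a full crawl on every content source of the
# Search Service Application (assumes the farm has exactly one).
FULL_CRAWL_SCRIPT = "; ".join([
    "Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue",
    "$ssa = Get-SPEnterpriseSearchServiceApplication",
    "Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa"
    " | ForEach-Object { $_.StartFullCrawl() }",
])


def full_crawl_command():
    """Return the argv list that launches the crawl script in PowerShell."""
    return ["powershell.exe", "-NoProfile", "-Command", FULL_CRAWL_SCRIPT]


def start_full_crawl():
    """Run the crawl script (only works on a SharePoint server)."""
    subprocess.run(full_crawl_command(), check=True)


if __name__ == "__main__":
    # Only print the script here; call start_full_crawl() to run it.
    print(FULL_CRAWL_SCRIPT)
```

Starting the crawl this way only kicks it off; you still need to wait for it to finish before testing search.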

image

Once the crawl has finished, go to your document library and search for some text that should be in one of your TIFFs to see if everything worked.

image

Results!

image

Getting full-text search results from the OCR data collected by the TIFF iFilter is easy and free, and it greatly improves searching in SharePoint 2010.
