Wednesday, May 19, 2004 9:45 PM
Searching in SharePoint: IFilter & Indexing PDF Documents
I always tell everyone that SharePoint is very extensible and customizable, and it really is. For example, let's take a look at the search functionality in SharePoint. By default, only Office documents (stored in a document library, for example) are indexed by the Indexing Service so that they can be found through SharePoint's search. Of course, in the real world many more document types are in use; a lot of companies have PDF documents, for instance. So I get quite a lot of questions from people asking whether PDF documents can be indexed too. The good news is that the Indexing Service can be extended by using the IFilter interface:
The IFilter interface scans documents for text and properties (also called attributes). It extracts chunks of text from these documents, filtering out embedded formatting and retaining information about the position of the text. It also extracts chunks of values, which are properties of an entire document or of well-defined parts of a document. IFilter provides the foundation for building higher-level applications such as document indexers and application-independent viewers.
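To make the idea of "chunks" a bit more concrete, here is a minimal conceptual sketch in Python. It is not the real COM interface: ToyHtmlFilter, TextChunk, ValueChunk and filter_document are names I made up for illustration, and HTML is used as the input format only because Python's standard library can parse it. The sketch strips the markup, emits the body text as numbered text chunks (the numbering stands in for position information), and emits the title and meta tags as value chunks (properties).

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Union


@dataclass
class TextChunk:
    """A piece of plain text, stripped of formatting (hypothetical type)."""
    chunk_id: int
    text: str


@dataclass
class ValueChunk:
    """A document property such as title or author (hypothetical type)."""
    chunk_id: int
    name: str
    value: str


class ToyHtmlFilter(HTMLParser):
    """Toy stand-in for a filter: turns HTML into text and value chunks."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[Union[TextChunk, ValueChunk]] = []
        self._in_title = False

    def _next_id(self) -> int:
        # Chunk ids increase in document order, so they record position.
        return len(self.chunks) + 1

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.chunks.append(
                    ValueChunk(self._next_id(), attributes["name"], attributes["content"])
                )

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            # The title is a property of the whole document, not body text.
            self.chunks.append(ValueChunk(self._next_id(), "title", text))
        else:
            self.chunks.append(TextChunk(self._next_id(), text))


def filter_document(html: str) -> List[Union[TextChunk, ValueChunk]]:
    """Run the toy filter over an HTML string and return its chunks in order."""
    toy_filter = ToyHtmlFilter()
    toy_filter.feed(html)
    toy_filter.close()
    return toy_filter.chunks
```

An indexer would then add the text chunks to its full-text index and store the value chunks as searchable properties. A real IFilter does the same kind of job for its own file format, through COM calls instead of a Python list.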
Even better news is that Adobe has a free IFilter DLL for PDF documents!
Adobe PDF IFilter is a free, downloadable Dynamic Link Library (DLL) file that provides a bridge between a Microsoft indexing client and a library of Adobe PDF files. It consists of code that understands the Adobe PDF file format as well as code that can interface with the indexing client. When an indexing client needs to index content from PDF documents, it will look in its registry for an appropriate DLL and it will find the Adobe PDF IFilter. Adobe PDF IFilter will return text to the indexing client. The indexing client will then index the results and return the appropriate results to the user.
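The registry lookup mentioned above can be sketched roughly as follows. This is a simplified model and not the indexing client's actual code: as far as I understand the commonly documented chain, the client goes from the file extension to a persistent handler, from there to the registered filter class, and finally to the DLL. The registry is simulated with a plain dictionary so the example runs anywhere. The two short GUIDs and the DLL path are placeholders I invented; only the IFilter interface ID is the real, fixed value.

```python
from typing import Dict, Optional

# Interface ID of IFilter (a fixed, well-known GUID).
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"

# Placeholder GUIDs, invented for this sketch only.
HANDLER_GUID = "{00000000-0000-0000-0000-00000000AAAA}"
FILTER_CLSID = "{00000000-0000-0000-0000-00000000BBBB}"


def key(*parts: str) -> str:
    """Join registry path segments with backslashes."""
    return "\\".join(parts)


# Simulated slice of HKEY_CLASSES_ROOT: key path -> default value.
FAKE_REGISTRY: Dict[str, str] = {
    key(".pdf", "PersistentHandler"): HANDLER_GUID,
    key("CLSID", HANDLER_GUID, "PersistentAddinsRegistered", IID_IFILTER): FILTER_CLSID,
    key("CLSID", FILTER_CLSID, "InprocServer32"): r"C:\Example\PdfFilter.dll",
}


def find_filter_dll(extension: str, registry: Dict[str, str]) -> Optional[str]:
    """Return the filter DLL registered for a file extension, or None."""
    # Step 1: file extension -> persistent handler GUID.
    handler = registry.get(key(extension.lower(), "PersistentHandler"))
    if handler is None:
        return None
    # Step 2: persistent handler -> class that implements IFilter.
    clsid = registry.get(
        key("CLSID", handler, "PersistentAddinsRegistered", IID_IFILTER)
    )
    if clsid is None:
        return None
    # Step 3: filter class -> in-process server (the DLL to load).
    return registry.get(key("CLSID", clsid, "InprocServer32"))
```

With this fake registry, find_filter_dll(".pdf", FAKE_REGISTRY) returns the placeholder DLL path, while an extension without a registered handler returns None, which corresponds to a document type whose content is not indexed. On a real machine the same keys could be read with Python's winreg module instead of a dictionary.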
For more info on how to install it, take a look at Eric Legault's post. If you search the internet you'll find plenty of other IFilter implementations, for example this one for JPEG files. There's even an IFilter Shop! Some other cool IFilter implementations: Visio 2003, XML, MP3.
Filed under: SharePoint