Searching in SharePoint: IFilter & Indexing PDF Documents

I always tell everyone that SharePoint is very extensible and customizable, and it really is. Take the search functionality, for example. By default, only Office documents (stored in a document library, for example) are indexed by the Indexing Service, so only those can be found through SharePoint search. In the real world many more document types are in use; a lot of companies have PDF documents, for instance. So I get quite a lot of questions from people asking whether PDF documents can be indexed too. The good news is that the Indexing Service can be extended by using the IFilter interface:

The IFilter interface scans documents for text and properties (also called attributes). It extracts chunks of text from these documents, filtering out embedded formatting and retaining information about the position of the text. It also extracts chunks of values, which are properties of an entire document or of well-defined parts of a document. IFilter provides the foundation for building higher-level applications such as document indexers and application-independent viewers.
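To make the idea of "chunks" a bit more concrete, here is a minimal Python sketch. It is not the real IFilter interface (that one is a COM interface with methods such as GetChunk, GetText and GetValue); the names `TextChunk`, `ValueChunk` and `toy_filter`, and the tiny tag-based document format, are all made up for illustration. It only shows the two things the description mentions: text runs with their position and without formatting, plus property values for the document as a whole.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class TextChunk:
    position: int  # character offset of the run in the source document
    text: str


@dataclass
class ValueChunk:
    name: str   # property name, e.g. "author"
    value: str


# Hypothetical document format: <meta name="..." value="..."> carries
# properties, every other <...> tag is formatting to be filtered out.
META = re.compile(r'<meta name="([^"]+)" value="([^"]*)">')
TAG = re.compile(r"<[^>]*>")


def toy_filter(document: str) -> Iterator[Union[TextChunk, ValueChunk]]:
    """Yield property values first, then the text runs between tags."""
    for match in META.finditer(document):
        yield ValueChunk(match.group(1), match.group(2))

    pos = 0
    for tag in TAG.finditer(document):
        run = document[pos:tag.start()]
        if run.strip():
            yield TextChunk(pos, run)  # keep the original offset
        pos = tag.end()

    tail = document[pos:]
    if tail.strip():
        yield TextChunk(pos, tail)


if __name__ == "__main__":
    sample = '<meta name="author" value="Jan"><b>Hello</b> world'
    for chunk in toy_filter(sample):
        print(chunk)
```

An indexer built on top of something like this never needs to know the file format; it just consumes the chunks.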

Even better news is that Adobe has a free IFilter DLL for PDF documents!

Adobe PDF IFilter is a free, downloadable Dynamic Link Library (DLL) file that provides a bridge between a Microsoft indexing client and a library of Adobe PDF files. It consists of code that understands the Adobe PDF file format as well as code that can interface with the indexing client. When an indexing client needs to index content from PDF documents, it will look in its registry for an appropriate DLL and it will find the Adobe PDF IFilter. Adobe PDF IFilter will return text to the indexing client. The indexing client will then index the results and return the appropriate results to the user.
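The flow described above (look up a filter for the file type, get text back, index it, answer queries) can be sketched in a few lines of Python. This is only a toy model under loud assumptions: a plain dictionary stands in for the registry, and `toy_pdf_filter` is a hypothetical stand-in that just collects text between parentheses, whereas a real PDF filter has to understand the whole file format. None of these names come from Adobe or Microsoft.

```python
import os
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set

TextFilter = Callable[[bytes], str]


def plain_text_filter(data: bytes) -> str:
    return data.decode("utf-8", errors="ignore")


def toy_pdf_filter(data: bytes) -> str:
    # Toy stand-in: collect the text found inside parentheses.
    runs = re.findall(rb"\(([^)]*)\)", data)
    return " ".join(run.decode("latin-1") for run in runs)


def build_index(files: Dict[str, bytes],
                registry: Dict[str, TextFilter]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of file names containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, data in files.items():
        extension = os.path.splitext(name)[1].lower()
        text_filter = registry.get(extension)
        if text_filter is None:
            continue  # no filter registered: the content is not indexed
        for word in re.findall(r"\w+", text_filter(data).lower()):
            index[word].add(name)
    return index


def search(index: Dict[str, Set[str]], term: str) -> List[str]:
    return sorted(index.get(term.lower(), ()))


if __name__ == "__main__":
    files = {
        "notes.txt": b"quarterly report",
        "specs.pdf": b"BT (Quarterly budget) Tj ET",
    }
    registry: Dict[str, TextFilter] = {".txt": plain_text_filter}
    print(search(build_index(files, registry), "budget"))

    registry[".pdf"] = toy_pdf_filter  # "installing" the filter
    print(search(build_index(files, registry), "budget"))
```

Note that in this toy model the `search` function is identical before and after the PDF filter is registered; only the registry changes, and the content has to be indexed again before the new file type shows up in results.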

For more info on how to install the Adobe PDF IFilter, take a look at Eric Legault's post. If you search the internet you'll find plenty of other IFilter implementations, for example this one for JPEG files. There's even an IFilter Shop! Some other cool IFilter implementations: Visio 2003, XML, MP3.

5 Comments

  • And don't forget that 'forgotten' pdf icon you have to add to the icons.xml file :)

  • So I was wondering why you were bothering to tell us about the pdf IFilter (as it ought by now to be old news) and nearly about to skip the rest, when I saw the last sentence. Now *that* was new to me! Thanks, Jan.

    (People looking for a pdf icon and where the icons.xml file is can search the WSS FAQ - www.wssfaq.com - for "pdf" or "icons" which will give them the item.)

  • It's not clear from any of the documentation I have found so far whether IFilters need to be installed on the SQL Server (where the content database is) or on the IIS Server (where Windows SharePoint Services is running).

    Anyone know?

  • Hi Jan,

    I downloaded the IFilter from Adobe's site and installed it. Will my query now work the same way as before, except that it also searches PDF files? Do I need to add some lines of code to search for text in PDF files, or does that happen automatically once the IFilter is installed?

    I use MS Indexing Service, IIS 6 and Win XP. My search page is written in ASP and uses the ixsso.query object.

    Can you please tell me whether I need to make some changes, or whether it should run the same way but give me results for PDF files as well?

    Cheers,
    sr301

  • UPDATE: By trial and error it has become clear that if you have SQL on a different server then you need to install the IFilter on the SQL Server, not on the IIS server.
