Getting Microsoft Office SharePoint Server 2007 to search inside PDF files

Posted by Adrian O'Connor Thu, 06 Dec 2007 16:40:00 GMT

It is possible to get Microsoft Office SharePoint Server (MOSS) 2007 to search inside your PDF documents. We have not tested this solution on WSS 3.0, so we’d be interested to hear from anybody who tries it.

Acrobat Reader

Previously, searching PDF documents with Index Server required an IFilter plugin from Adobe. The most recent version of the IFilter is 6.0, and it is now several years old.

Adobe have now taken a different approach and bundle the IFilter with their PDF reader application, so you need to download and install Adobe Acrobat Reader on your SharePoint server.

http://www.adobe.com/products/reader/

SharePoint configuration

Inside SharePoint itself, the only configuration needed is to tell the search service to index files with the .pdf extension.

You do this in SharePoint 3.0 Central Administration (see the Microsoft Office Server folder in the Start Menu on the server).

Open the search settings page in the Shared Services section. Click File Types, then click Add New in the File Types list, enter pdf and click OK.

Registry Changes

You need to change two registry keys. Together they register the Adobe PDF IFilter with the Office Search service.

The keys are:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Set the value of both keys to:

{E8978DA6-047F-4E3D-9C78-CDBE46041603}
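If you'd rather not edit the keys by hand, both changes can be captured in a .reg file and imported in one go. The Python sketch below only builds that file's text. The helper names (`build_reg_file`, `write_reg_file`) are hypothetical, and it assumes that the value to change is each key's (Default) value, written as a plain string (REG_SZ). Check the output against your own server and back up the registry before importing it.

```python
# Hypothetical helper: builds a .reg file that points both .pdf extension keys
# at the Adobe PDF IFilter. Assumption: the value to set is each key's
# (Default) value, stored as a plain string (REG_SZ).

ADOBE_PDF_IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# The two keys from the text, as single-line registry paths.
PDF_EXTENSION_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"
    r"\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(clsid: str = ADOBE_PDF_IFILTER_CLSID) -> str:
    """Return the text of a .reg file that sets both keys' default value."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in PDF_EXTENSION_KEYS:
        lines.append(f"[{key}]")
        lines.append(f'@="{clsid}"')  # '@' means the key's (Default) value
        lines.append("")
    return "\r\n".join(lines)  # .reg files use Windows line endings


def write_reg_file(path: str) -> None:
    """Save the .reg text as UTF-16 (with BOM), the encoding regedit exports."""
    # newline="" stops Python from translating the explicit \r\n a second time.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(build_reg_file())
```

Calling `write_reg_file("pdf-ifilter.reg")` produces a file you can import by double-clicking it or by running `reg import pdf-ifilter.reg`; importing also creates either key if it does not exist yet.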

Add the Adobe Program Directory to the System Path

So that the search service can find the DLL that provides the IFilter, you must add the Adobe Acrobat Reader program directory to the system path.

Right-click My Computer and choose Properties, then click the Advanced tab and click Environment Variables.

In the lower part of the window (System variables), scroll down to the Path variable and double-click it. At the end of the Variable value, add:

;C:\Program Files\Adobe\Reader 8.0\Reader

Note the leading semicolon: it separates this directory from the entry before it. If you installed Reader to a non-default location, change this value accordingly.
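The semicolon rule is easy to get wrong when editing by hand. As a minimal sketch, the hypothetical Python helper below expresses the same edit as pure string handling: it appends a directory to an existing Path value, adds the `;` separator only when one is needed, and leaves the value alone if the directory is already listed (comparing case-insensitively, as Windows does). It does not read or write the real system Path; the function names are illustrative only.

```python
# Hypothetical helper: appends a directory to a Windows Path value the same way
# as the manual edit above. Pure string handling; it does not touch the registry.

SEPARATOR = ";"  # Windows separates Path entries with semicolons


def _normalise(entry: str) -> str:
    # Windows paths are case-insensitive; ignore a trailing backslash too.
    return entry.strip().rstrip("\\").lower()


def append_to_path(path_value: str, directory: str) -> str:
    """Return path_value with directory appended, unless it is already listed."""
    entries = [e for e in path_value.split(SEPARATOR) if e.strip()]
    if any(_normalise(e) == _normalise(directory) for e in entries):
        return path_value  # already present: leave Path unchanged
    if path_value and not path_value.endswith(SEPARATOR):
        return path_value + SEPARATOR + directory  # the leading ';' from the text
    return path_value + directory
```

For example, `append_to_path(r"C:\WINDOWS\system32;C:\WINDOWS", r"C:\Program Files\Adobe\Reader 8.0\Reader")` returns the original value followed by `;C:\Program Files\Adobe\Reader 8.0\Reader`, and calling it again on that result returns it unchanged.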

Restart the Office Search service

For your changes to take effect, restart the Office Search service. Open a command prompt (Start > Run, type cmd and press Enter) and run the following commands:

sc stop osearch
sc start osearch
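The two `sc` commands usually work, but `sc stop` only sends the stop request and returns before the service has actually stopped, so an immediate `sc start` can fail while the service is still shutting down. The Python sketch below is one way to script the restart: it polls `sc query osearch` until the service reports STOPPED before starting it again. The function names are hypothetical, it assumes English-language `sc` output and Python 3.7 or later, and it must be run on the server from an account that is allowed to control services.

```python
import subprocess
import time

SERVICE = "osearch"  # the service name used in the commands above


def parse_state(sc_query_output: str) -> str:
    """Return the state name (e.g. 'RUNNING', 'STOPPED') from `sc query` output."""
    for line in sc_query_output.splitlines():
        fields = line.split()
        # The state line looks like: "STATE              : 4  RUNNING"
        if fields and fields[0] == "STATE":
            return fields[-1]
    raise ValueError("no STATE line in sc query output")


def query_state(service: str = SERVICE) -> str:
    """Run `sc query <service>` and return the reported state name."""
    result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
    return parse_state(result.stdout)


def restart_service(service: str = SERVICE, timeout: float = 120.0) -> None:
    """Stop the service, wait until it reports STOPPED, then start it again."""
    # check=False: sc stop returns an error code if the service is already stopped.
    subprocess.run(["sc", "stop", service], check=False)
    deadline = time.monotonic() + timeout
    while query_state(service) != "STOPPED":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{service} did not stop within {timeout} seconds")
        time.sleep(2)
    subprocess.run(["sc", "start", service], check=True)
```

Calling `restart_service()` performs the same stop and start as the commands above, with the wait in between.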

Finally

If you already have PDF documents in SharePoint that you want included in search results, use Reset all crawled content in Search Settings and then, under Content sources and crawl settings, start a new Full Crawl. These pages are in the Shared Services section of SharePoint 3.0 Central Administration, which we used earlier to add the pdf file type.
