OnDemand User Group

Support Forums => Report Indexing => Topic started by: teeraw on March 02, 2016, 02:49:13 AM

Title: Steps of PDF Indexing with PPD
Post by: teeraw on March 02, 2016, 02:49:13 AM
Hello,

I'm new to PDF indexing with PPD (Page Piece Dictionary). I have looked through the documentation and this forum thread: http://www.odusergroup.org/forums/index.php?topic=1724.0

However, the step-by-step procedure for this new PDF indexing method is still not clear to me.
Can anyone confirm my understanding below? (If anything is wrong, please suggest a correction.)

1) We need OD Server 9.5 with the PDF Indexing module installed.
2) We need a PDF containing a PPD, generated by a supported tool.
3) Inspect the indexes built into the PDF file by right-clicking the file and opening it with a text editor.
4) Find the /PieceInfo tag and note the index names listed there.
5) Create an application group with the same index names as in item #4, plus any special system indexes as needed (load_date, app_id).
6) Create an application
      - View Information: Data Type: PDF, or Data Type: User defined with a pdf extension
      - Indexer Information:
INDEXSTARTBY=1
RESTYPE=ALL
INDEXMODE=INTERNAL
      - Load Information: revised to match the index names listed in item #4

7) Create a folder
8) Test with arsload
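
For steps 3 and 4, a small script can stand in for the text editor. The following is only a rough sketch in Python, not an IBM tool: the function name list_ppd_indexes is made up for illustration, and it assumes the /IBM-ODIndexes entry is stored uncompressed in the file (the same condition under which a text editor would show it) and that values are simple (...) strings without nested parentheses.

```python
import re

# Assumption: the index entry looks like
#   /IBM-ODIndexes <</Private<</Name(value)/Name(value)...>> ...>>
# and is present as plain, uncompressed bytes in the PDF.
PRIVATE_RE = re.compile(
    rb"/IBM-ODIndexes\s*<<\s*/Private\s*<<(.*?)>>", re.DOTALL
)
# One /Name(value) pair; values with nested parentheses are not handled.
PAIR_RE = re.compile(rb"/([A-Za-z0-9_\-]+)\s*\(([^)]*)\)")


def list_ppd_indexes(pdf_bytes):
    """Return one {index_name: value} dict per /IBM-ODIndexes entry found."""
    entries = []
    for match in PRIVATE_RE.finditer(pdf_bytes):
        pairs = PAIR_RE.findall(match.group(1))
        entries.append(
            {name.decode("latin-1"): value.decode("latin-1") for name, value in pairs}
        )
    return entries
```

To use it, read the PDF in binary mode (open(path, "rb").read()) and pass the bytes in. The keys of the first dict returned are the index names to reuse in the application group in step 5. If nothing is found, the entry may be inside a compressed stream, which this sketch does not decode.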


Note: One more question: can we load a password-protected PDF file?

Many Thanks.
Teera W.





Title: Re: Steps of PDF Indexing with PPD
Post by: Lars Bencze on March 07, 2016, 05:09:07 AM
Hi Teera,

Happy to give you some feedback.

1: You can also use CMOD 9.0, with PDF Indexer, and THEN install patch 9.0.0.3. But 9.5 is perfect.
2, 3, 4 & 5: Well, the normal way to go would probably be to tell the person who creates the PDF with PPD in it which index fields to include... but your method is not wrong, it will work.
6: DEFINITELY "PDF" and not User defined...
Indexer Information: don't forget to set Indexer = "PDF"...  ;) Your parameters are correct.
Load Information: remember to set "Data Compression"; you have a lot of storage to gain by using compression now.
Our tests show that compression went from 15-40% without the PDF Indexer to 70-80% with it. We use "OD77".

7: Yes.

Your note: Bud Paton can tell you for certain, but I am about 90% sure that I read somewhere that this can NOT be done with password-protected PDF files.
If your files are password protected, I would suggest you go with another indexing method.
Title: Re: Steps of PDF Indexing with PPD
Post by: teeraw on March 11, 2016, 03:45:59 AM
Thank you, Lars Bencze.
Title: Re: Steps of PDF Indexing with PPD
Post by: Santosh Panuganti on April 29, 2016, 04:36:53 AM
Hi Everyone,

I am trying to load Page Piece Dictionary PDF files into CMOD 9.5 with the index information below:

%%/PieceInfo <</IBM-ODIndexes <</Private<</DocId(15)/AcctNumber(123)/TaxYear(2015)/StmtDate(04/28/16)>>/LastModified(D:20160428210404Z)>>>>

However, I am not able to load the PDF into CMOD, because indexing fails with the messages below. Please help: how can I get this file loaded into CMOD?

ARS4302I Indexing started, 3125631 bytes to process
ARS4901I INDEXSTARTBY=1
ARS4901I RESTYPE=ALL
ARS4901I INDEXMODE=INTERNAL
ARS4901I PARMDD=/tcsan/TTest/PRD.1099.WOD.C.PDF.20160428230224.parm
ARS4901I INPUTDD=/tcsan/TTest/PRD.1099.WOD.C.PDF.20160428230224
ARS4901I OUTPUTDD=/tcsan/TTest/PRD.1099.WOD.C.PDF.20160428230224.out
ARS4901I INDEXDD=/tcsan/TTest/PRD.1099.WOD.C.PDF.20160428230224.ind
ARS4901I RESOBJDD=/tcsan/TTest/PRD.1099.WOD.C.PDF.20160428230224.res
ARS4902I Number of input pages = 1174
ARS4940E Index not found by page 1
ARS4922I ARSPDOCI 9.5.0.4 completed code 1
ARS4309E Indexing failed
ARS4318E Processing failed for file >PRD.1099.WOD.C.PDF.20160428230224<
ARS4327E Processing has stopped.  The remaining files will NOT be processed.
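
When reading a load log like the one above, it can help to pull out only the error messages first. Here the earliest one is ARS4940E, reported by the indexer before arsload gives up. The following is only a rough Python sketch, not part of CMOD: the function name error_messages is made up for illustration, and it assumes every message line starts with an ID of the form ARS, digits, then a severity letter, with E marking errors.

```python
import re

# Assumption: message IDs look like ARS<digits><severity>, where the
# trailing letter is I (informational), W (warning) or E (error).
MSG_RE = re.compile(r"^(ARS\d+)([IWE])\s+(.*)$")


def error_messages(log_text):
    """Return (message_id, text) for every E-severity line, in log order."""
    errors = []
    for line in log_text.splitlines():
        match = MSG_RE.match(line.strip())
        if match and match.group(2) == "E":
            errors.append((match.group(1) + match.group(2), match.group(3)))
    return errors
```

Run against the log above, the first tuple returned would be the ARS4940E line; the later E messages are arsload reporting that the indexing step failed.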