Author Topic: PDF Indexer issue (Read 3216 times)

DDP021 · « **on:** August 08, 2013, 04:16:23 AM »

Hello all. I wanted to post an issue we found when loading a PDF into OnDemand. We run the PDF through the "Report Wizard" to get the indexing parameters. What we found is that certain PDFs, depending on what created them, produce errors at load time. We use Adobe Acrobat, but we had a user who was using another PDF producer (iText 5.4.3). The Report Wizard seems to work fine: we highlight the "Trigger" and "Fields" we want to use to set up the indexing. But when we attempt to load the PDF, we receive errors indicating that either the "Trigger" or a "Field" isn't found. When we take the same PDF, save it as a Word document, create a new PDF from that using Adobe Acrobat, and run it through the Report Wizard, it loads successfully.

What is really strange is that this user had modified their original PDFs with the iText PDF producer to add an indexing page. The original PDFs were created with a different PDF producer called AMYUNI, and those loaded successfully. So we're not sure whether the problem is the PDF producer itself or HOW he is modifying the PDFs to add the new page. Has anyone else run into this kind of issue? I've included a screenshot of one of the errors we received, indicating the field wasn't found.

TRIGGER1=UL(0.15,10.53),LR(0.42,11.49),*,'TOPAS-BEL-MET'
FIELD1=UL(0.29,9.34),LR(0.53,9.72),0,(TRIGGER=1,BASE=0)
FIELD2=UL(0.43,9.31),LR(0.67,9.68),0,(TRIGGER=1,BASE=0)
FIELD3=UL(0.86,9.15),LR(1.15,9.68),0,(TRIGGER=1,BASE=0)
FIELD4=UL(1.17,8.92),LR(1.42,9.67),0,(TRIGGER=1,BASE=0)
INDEX1='MANDANT',FIELD1,(TYPE=GROUP)
INDEX2='SMANDANT',FIELD2,(TYPE=GROUP)
INDEX3='FONDSID',FIELD3,(TYPE=GROUP)
INDEX4='POSTING_DATE',FIELD4,(TYPE=GROUP)
INDEXSTARTBY=1
RESTYPE=ALL
parmdd=/prod/ode/cmod/arstmp4/XTSPDF01.XTSPDF01.pdf.parm
inputdd=/prod/ode/cmod/arsload4/XTSPDF01.XTSPDF01.pdf
outputdd=/prod/ode/cmod/arstmp4/XTSPDF01.XTSPDF01.pdf.out
indexdd=/prod/ode/cmod/arstmp4/XTSPDF01.XTSPDF01.pdf.ind
resobjdd=/prod/ode/cmod/arstmp4/XTSPDF01.XTSPDF01.pdf.res
Number of input pages = 9
Field 1 not found on page 1
ARSPDOCI completed code 1

pankaj.puranik · « **Reply #1 on:** August 09, 2013, 09:16:18 AM »

Here is what I would suggest.
From what I understand you have PDFs from two sources.
For one source, they load successfully but for the other they fail to load.

1. Take one sample PDF from each source.
2. Run arspdump on both the files. Ref. - http://publib.boulder.ibm.com/infocenter/cmod/v8r3m0/index.jsp?topic=%2Fcom.ibm.ondemand.mp.doc%2Fars1d171399.htm
3. Then open the output of the arspdump and search for the Trigger, Field, etc that you defined using the graphical indexer.
4. Compare the UL, LR that you get with the values below.

I expect that for the files that load successfully these values will match, and for the files that fail they will not.

TRIGGER1=UL(0.15,10.53),LR(0.42,11.49),*,'TOPAS-BEL-MET'
FIELD1=UL(0.29,9.34),LR(0.53,9.72),0,(TRIGGER=1,BASE=0)
FIELD2=UL(0.43,9.31),LR(0.67,9.68),0,(TRIGGER=1,BASE=0)
FIELD3=UL(0.86,9.15),LR(1.15,9.68),0,(TRIGGER=1,BASE=0)
FIELD4=UL(1.17,8.92),LR(1.42,9.67),0,(TRIGGER=1,BASE=0)
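
If you have many files to check, the comparison in step 4 can be scripted. Below is a minimal Python sketch, not a definitive tool. It only parses the TRIGGER/FIELD lines shown above; it does not read arspdump output, because I am not assuming anything about that format here. The `measured` values are hypothetical numbers standing in for what you would copy by hand from arspdump, and `parse_boxes`, `encloses` and the `tol` parameter are names made up for this example. It also assumes that a string is found only when its box lies inside the defined UL/LR box, with coordinates measured from the top-left of the page.

```python
import re

# Matches lines such as: FIELD1=UL(0.29,9.34),LR(0.53,9.72),0,(TRIGGER=1,BASE=0)
BOX_RE = re.compile(
    r"^(TRIGGER|FIELD)(\d+)=UL\(([\d.]+),([\d.]+)\),LR\(([\d.]+),([\d.]+)\)"
)

def parse_boxes(parm_text):
    """Return {name: (ul_x, ul_y, lr_x, lr_y)} for each TRIGGERn/FIELDn line."""
    boxes = {}
    for line in parm_text.splitlines():
        m = BOX_RE.match(line.strip())
        if m:
            name = m.group(1) + m.group(2)
            boxes[name] = tuple(float(v) for v in m.groups()[2:])
    return boxes

def encloses(defined, measured, tol=0.0):
    """True if the measured box lies inside the defined box (assumed rule).

    tol is a hypothetical slack, in the same units as the coordinates.
    """
    dx1, dy1, dx2, dy2 = defined
    mx1, my1, mx2, my2 = measured
    return (mx1 >= dx1 - tol and my1 >= dy1 - tol
            and mx2 <= dx2 + tol and my2 <= dy2 + tol)

PARM = """\
TRIGGER1=UL(0.15,10.53),LR(0.42,11.49),*,'TOPAS-BEL-MET'
FIELD1=UL(0.29,9.34),LR(0.53,9.72),0,(TRIGGER=1,BASE=0)
FIELD2=UL(0.43,9.31),LR(0.67,9.68),0,(TRIGGER=1,BASE=0)
FIELD3=UL(0.86,9.15),LR(1.15,9.68),0,(TRIGGER=1,BASE=0)
FIELD4=UL(1.17,8.92),LR(1.42,9.67),0,(TRIGGER=1,BASE=0)
"""

boxes = parse_boxes(PARM)

# Hypothetical coordinates, typed in by hand from the arspdump output
# of the PDF that fails to load.
measured = {"FIELD1": (0.31, 9.40, 0.55, 9.78)}

for name, box in measured.items():
    status = "inside" if encloses(boxes[name], box) else "outside"
    print(f"{name}: measured box is {status} the defined box")
```

Any field reported as outside its defined box would be a candidate for the "not found" error, and its coordinates are the ones to adjust in a separate Application.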

It may also be that the modification made to the PDFs shifts the TRIGGER and FIELD coordinates away from their original positions.
It could also be that there is an additional string inside the box that holds the TRIGGER or one of the fields.
I have been through a similar issue where we were using iText to modify the PDFs to insert hidden triggers, etc.

So you may end up creating one CMOD Application for the set of PDFs that loads and a separate one, with different coordinates of course, for the set that fails to load.

Hope this helps.

Cheers
Pankaj.

LWagner · « **Reply #2 on:** October 11, 2013, 07:42:35 AM »

We tried using a 1 point font of white on white text to embed fixed position text as index values in PDFs.

Contractors use an unknown PDF generator and the scanlines, as we call them, read inconsistently, so I have to go with standard report text data. But even those have issues with arsload.

Simply using BI Publisher as programmers do in-house, we get 100% reliability from the one point scanline.

I have opened three PMRs on arsload to deal with the problems I am having. Two are related to embedded characters not being removed as specified in the load information panel. One for a date field, and one for a decimal field.
