Author Topic: Creating "Stacked" PDF out of Exstream

Corinne

  • Newbie
  • *
  • Posts: 2
    • View Profile
Creating "Stacked" PDF out of Exstream
« on: October 04, 2010, 01:56:19 PM »
Does anyone have experience creating PDF output from Exstream to load into OnDemand? 

We are using Exstream to create the PDF output and the index file, but we cannot figure out how to produce the PDF in the correct format ("stacked") to work with OnDemand.  Can anyone help?

Thank you.
Corinne

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2138
  • CMOD Guru for hire...
    • View Profile
    • Tenacious Consulting
Re: Creating "Stacked" PDF out of Exstream
« Reply #1 on: October 04, 2010, 02:26:55 PM »
I'm not familiar with 'Exstream', but I think you're talking about loading PDF files using the Generic Indexer.

You don't necessarily have to 'stack' these PDFs into a single file -- the Generic Indexer will let you specify an individual file name for each PDF, and concatenate them into 'objects' at load time.

Can you give us more information about what is wrong with the way Exstream is producing files now?
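
For readers who can get one PDF per document out of their composition tool, here is a minimal sketch of the per-file approach described above. It is an illustration, not something posted in this thread. The helper name build_generic_index, the field names acctnum and stmtdate, the file paths, and code page 819 are all hypothetical choices. Setting GROUP_OFFSET and GROUP_LENGTH to 0 is meant to load each file in full; please confirm that against the Generic Indexer documentation for your CMOD level.

```python
def build_generic_index(documents, codepage=819):
    """Return Generic Indexer index-file text, one group per PDF file.

    documents: iterable of (pdf_path, {field_name: value}) pairs.
    The field names must match your own application; the ones in the
    demo below are made up.
    """
    lines = [f"CODEPAGE:{codepage}"]
    for pdf_path, fields in documents:
        for name, value in fields.items():
            lines.append(f"GROUP_FIELD_NAME:{name}")
            lines.append(f"GROUP_FIELD_VALUE:{value}")
        # Assumption: offset 0 with length 0 means "the entire file".
        lines.append("GROUP_OFFSET:0")
        lines.append("GROUP_LENGTH:0")
        lines.append(f"GROUP_FILENAME:{pdf_path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical documents; replace with your own paths and values.
    docs = [
        ("/tmp/load/doc0001.pdf", {"acctnum": "1001", "stmtdate": "10/04/10"}),
        ("/tmp/load/doc0002.pdf", {"acctnum": "1002", "stmtdate": "10/04/10"}),
    ]
    print(build_generic_index(docs), end="")
```

Writing the returned text to a file next to the PDFs would give the load one index entry per document, which should scale to thousands of documents because nothing has to be stacked by hand.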
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

Corinne

Re: Creating "Stacked" PDF out of Exstream
« Reply #2 on: October 05, 2010, 05:17:09 AM »
Yes, we are using the Generic Indexer.

Exstream can produce the index file in the format that the Generic Indexer expects, but the PDF output is simply one PDF.  Therefore, when we load the files and try to view one individual indexed document in OnDemand, we get an error telling us that the PDF is not properly formatted - which makes sense.

How would I go about specifying an individual file name for each PDF, and concatenate them into 'objects' at load time?  We are looking at potentially 6000 indexed documents...

Thanks!

Justin Derrick

Re: Creating "Stacked" PDF out of Exstream
« Reply #3 on: October 06, 2010, 04:37:14 AM »
Got it.  It concatenates all the docs together into one PDF.  I'm fairly certain that if you can't break this file up into individual PDFs, you'll have to use the PDF Indexer.  It's been years since I've played with it, so I'll let someone else fill in the blanks about it.

Alessandro Perucchi

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1001
    • View Profile
Re: Creating "Stacked" PDF out of Exstream
« Reply #4 on: October 08, 2010, 04:11:57 PM »
Hello Corinne,

I am not sure I understand exactly what Exstream is doing, but maybe you can clarify it for me, and then perhaps I can help you!

My understanding so far is the following:

Understanding 1)
You have, let's say, 10 PDFs, and Exstream simply concatenates the PDF files one after another. Really a plain file concatenation.
It then creates a Generic Indexer index file to archive this big "stacked" PDF into OnDemand.
This means that if you cut the file using the values of GROUP_OFFSET and GROUP_LENGTH, every piece is a valid PDF.

OR

Understanding 2)
Exstream produces a single valid PDF that contains all the documents you want to archive, and then creates a Generic Indexer index file with some OFFSET and LENGTH values...
If you then use GROUP_OFFSET and GROUP_LENGTH to cut the PDF into smaller files, the pieces are not valid PDFs, because you are cutting inside the internal structure of the big PDF container.




From your explanation, I suppose that "Understanding 1)" is the correct one... but are you sure that GROUP_OFFSET and GROUP_LENGTH are calculated correctly? One small error, and the pointer in OnDemand will retrieve the wrong part of the file, and the output will be a corrupted PDF.

If it is indeed "Understanding 2)", then you can forget the Generic Indexer, because it will not work at all. As Justin suggested, you must use the PDF Indexer for it, but the PDF Indexer has many limitations...


To find out whether it's 1) or 2), can you cut the "stacked" PDF you receive from Exstream into pieces using the values in the index file (look at GROUP_OFFSET and GROUP_LENGTH)?
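
One possible way to run that cutting test without doing it by hand is sketched below. This is only a rough check under stated assumptions: it reads GROUP_OFFSET and GROUP_LENGTH pairs from the index text, slices the stacked file, and looks for a "%PDF-" header at the start of each piece and a "%%EOF" marker near its end. The function names are hypothetical, and passing this check does not prove a piece is a fully valid PDF. A length of 0 is treated here as "to the end of the file", which is an assumption.

```python
def read_offsets(index_text):
    """Collect (offset, length) pairs from Generic Indexer index text."""
    pairs, offset = [], None
    for line in index_text.splitlines():
        key, _, value = line.partition(":")
        if key == "GROUP_OFFSET":
            offset = int(value)
        elif key == "GROUP_LENGTH" and offset is not None:
            pairs.append((offset, int(value)))
            offset = None
    return pairs


def looks_like_pdf(chunk):
    """Rough test: PDF header at the start, end-of-file marker near the end."""
    return chunk.startswith(b"%PDF-") and b"%%EOF" in chunk[-1024:]


def check_stacked(data, pairs):
    """Return one True/False per index entry for the stacked file bytes."""
    results = []
    for offset, length in pairs:
        end = offset + length if length else len(data)
        results.append(looks_like_pdf(data[offset:end]))
    return results


if __name__ == "__main__":
    # Two tiny stand-ins for real PDFs, concatenated as in "Understanding 1)".
    first = b"%PDF-1.4\nfirst document\n%%EOF\n"
    second = b"%PDF-1.4\nsecond document\n%%EOF\n"
    index_text = (
        f"GROUP_OFFSET:0\nGROUP_LENGTH:{len(first)}\n"
        f"GROUP_OFFSET:{len(first)}\nGROUP_LENGTH:{len(second)}\n"
    )
    print(check_stacked(first + second, read_offsets(index_text)))
```

With a real load you would read the index file as text and the stacked file as bytes. If every entry comes back True, the file is probably a plain concatenation as in "Understanding 1)". If only the first entry, or none, comes back True, you are probably looking at "Understanding 2)" or at wrong offsets.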

Cheers,
Alessandro
Alessandro Perucchi

#Install #Migrations #Conversion #Educate #Repair #Upgrade #Migrate #Enhance #Optimize #AIX #Linux #Multiplatforms #DB2 #Windows #Oracle #TSM #Tivoli #Performance #Audits #Customizing #Availability #HA #DR #JavaApi #ContentNavigator #ICN #WEBi #ODWEK #Services #PDF #AFP #XML

Justin Derrick

Re: Creating "Stacked" PDF out of Exstream
« Reply #5 on: October 09, 2010, 07:04:31 AM »
Hi Alessandro...

I think the situation that Corinne is trying to describe is:
   File.pdf (  Pg1  Pg2  Pg3 ... )
   ... where the individual pages are documents that should be able to be retrieved separately.

And *not* a series of complete PDF files, simply concatenated into a single file:
   AllPDFs.out ( [Pg1.pdf] [Pg2.pdf] [Pg3.pdf] [...] )

 ...which is why the Generic Indexer won't work.

Corinne -- can you clarify this for us?

-JD.


Alessandro Perucchi

Re: Creating "Stacked" PDF out of Exstream
« Reply #6 on: October 11, 2010, 04:07:08 AM »
Hi Justin,
Well, apparently I need some training from you in writing clear messages :-D Because that is exactly what I wanted to say, but... with 100X more words than you!!!  ::)

I need holidays!!!

Cheers,
Alessandro

LWagner

  • Guest
Re: Creating "Stacked" PDF out of Exstream
« Reply #7 on: November 01, 2010, 07:48:12 AM »
And what was the final result? How did you do this? I will need to do the same thing with 40,000 - 70,000 documents daily.

If using the Generic Indexer with an external index file, then the name of each PDF file must be listed with each document's indexes.  My IBM OnDemand Sales Support described four scenarios based on our taking AFP to PDF.

To use the PDF Indexer with PDF files requires knowing the coordinate locations of the text to be extracted; the alternative is to have Tagged Logical Elements (TLEs) in the AFP document that identify the field values to be used.

jo19021

  • Jr. Member
  • **
  • Posts: 12
    • View Profile
Re: Creating "Stacked" PDF out of Exstream
« Reply #8 on: January 11, 2011, 04:34:39 PM »
Not sure if this thread is still open or not.  We load AFP into OD, so I will give you the procedure for that, and it may work.  I know this is an OD blog, so please forgive me as I speak "Exstream".
First, you will have to create variables in your Exstream application, one for each index value of your OnDemand folder (e.g. acctnum, stmtdate, etc.).  Next, add the same number of search keys to your Exstream environment section, pointing each one at its variable.  Now add the new search keys to your application and be sure to set the placement to "before each customer".  This should add your index information to your PDF file.  You will then have to change the "Indexer Information" tab of your OnDemand application to look for your variable names at the locations where they appear in the output file.  Hope that helps.

LWagner

  • Guest
Re: Creating "Stacked" PDF out of Exstream
« Reply #9 on: January 11, 2011, 04:54:35 PM »
My Exstream support is too busy with other issues, so meanwhile I did get a solution working with IBM.

The Exstream team gave me a test PDF many months ago, that I was unable to use for some time, mainly since I thought I needed something else to go with it.  :(

To not lose too much time, I also posed the problem to IBM.  IBM finally told me I could nevertheless make the indexer type = PDF, and identify upper-left and lower-right coordinates for each field.

It took some work, but I was able to get the values, which turned out to be decimal values in hundredths of an inch, and to load and index the PDF sample file.  I can't give more details, since I did this almost a month ago. I just needed rectangular coordinates that were VERY tight to the print fields on a normal size page.  I'm sure I worked on the triggers first, or one of them, and when ARSLOAD indicated I had a good value, I worked each successive field very closely, too, until they were all done.
----
COORDINATES=IN
INDEXSTARTBY=1
TRIGGER1=ul(7.63,0.24),lr(8.03,0.45),*,'Page 1'
TRIGGER2=ul(6.40,8.95),lr(6.86,9.13),0,'AMOUNT'
FIELD1=ul(4.51,0.67),lr(6.28,0.88),0,(TRIGGER=1,BASE=0)
FIELD2=ul(1.02,9.36),lr(3.50,9.57),0,(TRIGGER=1,BASE=0)
FIELD3=ul(1.02,9.51),lr(3.50,9.72),0,(TRIGGER=1,BASE=0)
FIELD4=ul(0.66,1.35),lr(1.34,1.55),0,(TRIGGER=1,BASE=0)
FIELD5=ul(7.10,8.93),lr(8.15,9.14),0,(TRIGGER=2,BASE=0)
INDEX1='ACCOUNT NUMBER',FIELD1,(TYPE=GROUP)/* ACCOUNT NUMBER */
INDEX2='NAME',FIELD2,(TYPE=GROUP)/* NAME */
INDEX3='ADDRESS',FIELD3,(TYPE=GROUP)/* ADDRESS */
INDEX4='CAN NO',FIELD4,(TYPE=GROUP)/* CAN NO */
INDEX5='AMOUNT DUE',FIELD5,(TYPE=GROUP)/* AMOUNT DUE */

pankaj.puranik

  • Sr. Member
  • ****
  • Posts: 374
    • View Profile
Re: Creating "Stacked" PDF out of Exstream
« Reply #10 on: January 11, 2011, 10:09:29 PM »
To define these co-ordinates (UL and LR), did you use the graphical indexer?
I think that's the easiest way to define the co-ordinates.
Just draw rectangular boxes around the fields and CMOD generates the co-ordinates for you.

LWagner

  • Guest
Re: Creating "Stacked" PDF out of Exstream
« Reply #11 on: January 12, 2011, 09:48:26 AM »
I tried setting the Indexer to "PDF" and opening a sample PDF file using Parameter Source "Sample Data", and got an error: "Adobe Acrobat (AcroExch.App rc=-2147221005) could not be loaded."


LWagner

  • Guest
Re: Creating "Stacked" PDF out of Exstream
« Reply #12 on: January 12, 2011, 10:04:50 AM »
 :(

I get the same error by trying to bring up the Report Wizard.

So I solved the loading of the PDF in spite of the missing pieces.

LWagner

  • Guest
Re: Creating "Stacked" PDF out of Exstream
« Reply #13 on: January 12, 2011, 03:25:21 PM »
 ::)  I found my coordinates identifier.  I used ARSPDUMP, which identified the coordinates of every text string in the PDF file.  Hundreds of them.  I then needed to test and identify the right ones to use for triggers and fields for the Indexer Parameters.  I also tried adjusting the values slightly, which never worked.  :P

My sample ARSPDUMP code is below:
-----------------------------------
//PDFDUMP  EXEC PGM=ARSPDUMP,REGION=0M,                 
//        PARM='/-f //DD:INDD -o //DD:OUT'             
//STEPLIB  DD DISP=SHR,DSN=SYS2.CMOD.SARSLOAD           
//ADOBERES DD DISP=SHR,DSN=SYS2.CMOD.USERPARM(ADOBERES)
//ADOBEFNT DD DISP=SHR,DSN=FFF.OD840.ADOBEFNT.WORK     
//TEMPATTR DD DISP=SHR,DSN=SYS2.CMOD.ADOBEPDF.TEMPATTR 
//INDD     DD DISP=SHR,DSN=SMPE.OD840.BILL.FINAL.PDF2   
//OUT      DD SYSOUT=*                                 
//SYSTMP01 DD UNIT=SYSDA,DSN=&&SYSTM1,DISP=(NEW,PASS), 
//            SPACE=(CYL,(6,6))                         
//SYSTERM  DD SYSOUT=*                                 
//SYSPRINT DD SYSOUT=*                                 
-------------------------
sample output:
============
Place                                           
ul.h = 5.37 ul.v = 8.58 lr.h = 5.66 lr.v = 8.79
your                                           
ul.h = 5.67 ul.v = 8.58 lr.h = 5.91 lr.v = 8.79
payment                                         
ul.h = 5.92 ul.v = 8.58 lr.h = 6.37 lr.v = 8.79
stub                                           
ul.h = 6.38 ul.v = 8.58 lr.h = 6.61 lr.v = 8.79
in                                             
ul.h = 6.62 ul.v = 8.58 lr.h = 6.73 lr.v = 8.79
the                                             
ul.h = 6.74 ul.v = 8.58 lr.h = 6.92 lr.v = 8.79
provided                                       
ul.h = 6.93 ul.v = 8.58 lr.h = 7.38 lr.v = 8.79
envelope                                       
ul.h = 5.37 ul.v = 8.73 lr.h = 5.83 lr.v = 8.94

Ed_Arnold

  • Hero Member
  • *****
  • Posts: 1144
    • View Profile
Re: Creating "Stacked" PDF out of Exstream
« Reply #14 on: January 13, 2011, 11:48:09 AM »
"Adobe Acrobat (AcroExch.App rc=-2147221005) could not be loaded."
"Unable to initialize document."

http://www-01.ibm.com/support/docview.wss?uid=swg21211278

When I use the administration client to define my index parameters through the graphical interface, I receive message:
ACROEXCH.APP -2147221005

http://www-01.ibm.com/support/docview.wss?uid=swg21141770
#zOS #ODF