Author Topic: Generic Indexer Help  (Read 6177 times)

johnnoel

  • Guest
Generic Indexer Help
« on: May 03, 2011, 10:07:36 AM »
I am getting the following error when trying to automate ARSLOAD in OnDemand Multiplatform (Windows):

arsload: Processing file >M:\arsload\HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard<
 
arsload: 05/03/11 10:17:48 -- Indexing started, 2073 bytes to process
arsload: Generic Indexer requires the data to have already been indexed.  Index the data or verify your input file and resubmit the job
arsload: Output/Indexer file was not created
arsload: 05/03/11 10:17:48 Indexing failed
arsload: Processing failed for file >M:\arsload\HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard<
arsload: Unloading of data was NOT performed

I am not sure whether the .ard file is supposed to be the actual index file; that is how I am using it now. What should the contents of the .ard file be? I know what the .ind file should contain, and with the Generic Indexer the .out file is really a PDF file.

I have the generic index file and the PDF file. What should the .ard file be?

How did you solve this problem?

Thanks,
John

Alessandro Perucchi

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1002
Re: Generic Indexer Help
« Reply #1 on: May 03, 2011, 10:30:12 AM »
Hello John,

Normally, when you use the Generic Indexer, you should have 2 files:

data file : sample.pdf
index file: index.ind    <- .ind is important

and then when you launch the arsload you use the index name without the extension:

arsload -h archive -u admin -p password -a application -g applicationgroup index


Now if you want to use ARSLOAD as a service / daemon, then you need these two files (data + index) AND a trigger, and the trigger is this .ard file.
So the files would have been named like:

sample.pdf
index.ard.ind
index.ard    <- this one is empty, just a trigger

I've never used ARSLOAD as a service/daemon, so maybe somebody else can give more detailed information on that topic.

Cheers,
Alessandro
Alessandro Perucchi

#Install #Migrations #Conversion #Educate #Repair #Upgrade #Migrate #Enhance #Optimize #AIX #Linux #Multiplatforms #DB2 #Windows #Oracle #TSM #Tivoli #Performance #Audits #Customizing #Availability #HA #DR #JavaApi #ContentNavigator #ICN #WEBi #ODWEK #Services #PDF #AFP #XML

johnnoel

  • Guest
Re: Generic Indexer Help
« Reply #2 on: May 03, 2011, 01:02:43 PM »
Thanks, Alessandro!

I had some help from IBM. The ARD file is empty and is just a trigger.

I had to have 3 files with the following naming convention:

HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard
HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard.ind
HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard.out

where PDF37A is the Application Group and DECSAUTO-V08 is the Application.
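To illustrate the naming convention, here is a minimal Python sketch that splits a trigger file name into its dot-separated parts. It assumes the six-part layout of the example above, with the application group in the third position and the application in the fourth; the field names `hlq`, `jobname`, `suffix_1` and `suffix_2` are hypothetical labels, and the actual mapping depends on how ARSLOAD is configured on your system.

```python
from typing import NamedTuple


class TriggerName(NamedTuple):
    hlq: str                # hypothetical label for part 1
    jobname: str            # hypothetical label for part 2
    application_group: str  # part 3, per the example in this thread
    application: str        # part 4, per the example in this thread
    suffix_1: str           # part 5, meaning not explained in this thread
    suffix_2: str           # part 6, meaning not explained in this thread


def parse_trigger_name(filename: str) -> TriggerName:
    """Split an .ard trigger file name into its six dot-separated parts."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() != "ard":
        raise ValueError(f"not an .ard trigger name: {filename!r}")
    parts = stem.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated parts, got {len(parts)}")
    return TriggerName(*parts)


if __name__ == "__main__":
    name = parse_trigger_name(
        "HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard"
    )
    print(name.application_group, name.application)
```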

In the generic index file I had to specify the full path of the .out file. I can't use .PDF as the extension for the data file because automatic loading tries to process both ARD and PDF files, so I had to rename the .PDF file to .OUT.
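In case it helps someone else, the steps above could be scripted roughly like this in Python. This is only a sketch under assumptions: the function name `stage_for_autoload` and its parameters are made up for illustration, it assumes the .ind file already exists and only its GROUP_FILENAME: lines need rewriting, and creating the trigger last (so the other two files are complete before it appears) is my own precaution, not something IBM told me.

```python
import shutil
from pathlib import Path


def stage_for_autoload(pdf: Path, ind: Path, load_dir: Path, base: str) -> Path:
    """Copy a PDF and its generic index file into load_dir under one base name.

    base is the trigger name without the .ard extension.
    Returns the path of the empty .ard trigger file.
    """
    load_dir.mkdir(parents=True, exist_ok=True)

    # The data file is stored as .ard.out rather than .pdf.
    out_path = (load_dir / f"{base}.ard.out").resolve()
    shutil.copyfile(pdf, out_path)

    # Point every GROUP_FILENAME: line at the full path of the renamed data file.
    lines = []
    for line in ind.read_text().splitlines():
        if line.startswith("GROUP_FILENAME:"):
            line = f"GROUP_FILENAME:{out_path}"
        lines.append(line)
    (load_dir / f"{base}.ard.ind").write_text("\n".join(lines) + "\n")

    # Assumption: create the empty trigger last, once the other files are in place.
    trigger = load_dir / f"{base}.ard"
    trigger.touch()
    return trigger
```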

Thanks!

Alessandro Perucchi

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1002
    • View Profile
Re: Generic Indexer Help
« Reply #3 on: May 03, 2011, 01:38:53 PM »
Hello John,

The documentation (http://publib.boulder.ibm.com/infocenter/cmod/v8r4m1/index.jsp?topic=/com.ibm.ondemand.indexingmp.doc/ars1d171309.htm) says the following:

Quote
The GROUP_FILENAME: parameter in the .IND file specifies the full path name of the actual input file to be processed.

This means that the trigger must follow the naming convention given, and so must the index file.
BUT the data (out) file can have ANY name and ANY extension you want, as long as the name in the GROUP_FILENAME: parameter matches your data file.

So it is legal to have the following (although GROUP_FILENAME: should contain the full path, as the doc says):

HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard
HLQ.JOBNAME.PDF37A.DECSAUTO-V08.2011001.1035000.ard.ind    <- here I have GROUP_FILENAME:my_beautiful_file.pdf
my_beautiful_file.pdf
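
Since a wrong GROUP_FILENAME: is an easy mistake to make, a small check before dropping the trigger can help. Here is a minimal Python sketch; the function name `missing_data_files` is made up for this example, and resolving relative names against the directory of the .ind file is only an assumption on my side (the doc asks for the full path).

```python
from pathlib import Path

PREFIX = "GROUP_FILENAME:"


def missing_data_files(ind_path: Path) -> list[str]:
    """Return the GROUP_FILENAME: values in an .ind file that do not exist on disk."""
    missing = []
    for line in ind_path.read_text().splitlines():
        if not line.startswith(PREFIX):
            continue
        name = line[len(PREFIX):].strip()
        candidate = Path(name)
        # Assumption: a relative name is looked up next to the .ind file.
        if not candidate.is_absolute():
            candidate = ind_path.parent / candidate
        if not candidate.is_file():
            missing.append(name)
    return missing
```

An empty list means every data file named in the index was found.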

;D

Cheers,
Alessandro

LWagner

  • Guest
Re: Generic Indexer Help
« Reply #4 on: May 14, 2011, 05:58:32 PM »
If you are in a z/OS environment, use ARSPDUMP to dump the PDF text and find out the locations on the pages of the text you need.

But loading PDFs into CMOD on z/OS is time-intensive, and works best from Windows index servers.

We now process 100+ PDFs in our bill nightly, from Windows up to z/OS. It takes 5 hours on 8 servers, with about 18 minutes dedicated to indexing and a 90-second upload. In CMOD 8.4.0.3, we get a 16-fold increase in file size, from 30 MB to 500 MB, and get about 18% compression from OD77.

If you'd like to know more about our experiences with PDFs in CMOD, drop me a line.

Larry Wagner
Everything CMOD in Administration and Tech Support
LA DWP