Topic: Ingestion a .xls file

DDP021

  • Sr. Member
  • Posts: 343
Ingestion a .xls file
« on: January 27, 2015, 08:51:51 AM »
Hi, we are currently ingesting .xls files into CMOD. We defined the Application to use the GENERIC indexer. The sending application puts three files in an ARSLOAD directory on the CMOD server: the trigger file, the index file and the data file. The trigger file is named ApplicationName.ApplicationGroupName.ARD, the index file is named ApplicationName.ApplicationGroupName.ard.ind, and the data file (.xls) can be called anything, but it always ends in .xls. Inside the .ind file, the path to the .xls file in the ARSLOAD directory is specified. The issue is that after the file is ingested, the trigger and index files are removed, BUT the .xls file remains in the ARSLOAD directory. It has to be something in how the ARSLOAD directory is configured. We have another application that also sends .xls files to CMOD, but it names its .xls files ApplicationName.ApplicationGroupName.ard.out, and those do get removed. It would appear that ARSLOAD4 is configured to remove only files with ard as the third qualifier. Does anyone know how to configure the ARSLOAD directory to automatically remove an .xls file after ingestion?

jeffs42885

  • Guest
Re: Ingestion a .xls file
« Reply #1 on: January 27, 2015, 10:02:29 AM »
Can you show the arsload daemon you are running?

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #2 on: January 27, 2015, 10:15:23 AM »
Unfortunately I can't. I'm just an admin (setting up Applications, Application Groups and Folders) and don't have access to the ARSLOAD daemons. I know we are running numerous ARSLOADs. I did some testing using a test setup called DAVE, and here's what I found:

First test

Called the datafile DAVE.DAVE.ARD.OUT   < File ingested and was automatically removed from ARSLOAD4

Second test

Called the datafile DAVE.DAVE.ARD.OUT.XLS < File ingested but datafile remained on ARSLOAD4

Third and final test

Called the datafile DAVE.DAVE.XLS.OUT < File ingested but datafile remained on ARSLOAD4


So apparently ARSLOAD4 is configured to remove only data files that have four qualifiers (the first two being the Application and Application Group names) and end in .ARD.OUT.

I'm just trying to find out what needs to change in the ARSLOAD4 daemon configuration to allow, if possible, automatic deletion of any data file (in this case, files ending in .xls) no matter what it's called. That way the application doesn't have to rename its data files to ApplicationName.ApplicationGroupName.ARD.OUT just to get them removed automatically after ingestion.

jeffs42885

  • Guest
Re: Ingestion a .xls file
« Reply #3 on: January 27, 2015, 11:06:17 AM »
Here's the daemon I am currently using:

arsload -vf -I instance -B AG.IGN.IGN.IGN -d /ondemand -c /arstmp -t 60

-v verbose
-f unload the data if the load fails
-I instance
-B format of incoming file
-d/-c processing directories
-t run every 60 seconds

Perhaps the daemon you are using has the -n flag (-n: do not remove files). When I test a new report, I move all of its files to a separate directory and issue the following command, which doesn't remove the files in case we need to do something with the indexing. This one isn't run as a daemon, though.

arsload -nvf -I instance -g AG FILENAME

Hope this helps some!

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #4 on: January 27, 2015, 11:24:50 AM »
Thanks Jeff! I will definitely pass this on to our systems support area. Hopefully they can do something with it. I appreciate all the info and help!

take care

Dave

jeffs42885

  • Guest
Re: Ingestion a .xls file
« Reply #5 on: January 27, 2015, 12:18:28 PM »
No problem. Make sure you let us know your findings!

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #6 on: January 28, 2015, 04:02:11 AM »
Here's what I got back from our systems group on how the ARSLOAD daemon is configured:

/usr/lpp/ars/bin/arsload -v -f -c /prod/ode/cmod/arstmp4 -B AG.APP.EXT -d /prod/ode/cmod/arsload4 -A DATASET -h xsa00e70 -t 60

There is no -n flag, yet other files are being removed after ingestion.
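Because -n can be bundled with other switches (as in Jeff's -nvf), eyeballing a command line for it is easy to get wrong. A minimal Python sketch, assuming the option letters seen in this thread are the only ones in use (the option string ARSLOAD_OPTSTRING and the function arsload_flags are hypothetical, not part of CMOD), that splits an arsload command line with the standard getopt module and reports which switches are set:

```python
import getopt
import shlex

# ASSUMPTION: option letters collected from the arsload command lines quoted
# in this thread. Letters followed by ":" take a value (e.g. -I instance).
# This is not the full arsload option list; check the IBM documentation.
ARSLOAD_OPTSTRING = "vfnNI:B:d:c:t:A:h:g:"


def arsload_flags(command_line: str) -> dict:
    """Split an arsload command line and return {option: value}.

    getopt expands bundled switches such as -nvf into -n, -v and -f,
    so -n is found even when it is not written on its own.
    """
    tokens = shlex.split(command_line)
    # tokens[0] is the program name (arsload or /usr/lpp/ars/bin/arsload).
    opts, _operands = getopt.getopt(tokens[1:], ARSLOAD_OPTSTRING)
    return dict(opts)


if __name__ == "__main__":
    daemon = ("/usr/lpp/ars/bin/arsload -v -f -c /prod/ode/cmod/arstmp4 "
              "-B AG.APP.EXT -d /prod/ode/cmod/arsload4 -A DATASET "
              "-h xsa00e70 -t 60")
    test_load = "arsload -nvf -I instance -g AG FILENAME"
    for cmd in (daemon, test_load):
        print("-n" in arsload_flags(cmd), cmd)
```

Run against the two command lines quoted in this thread, it prints False for the production daemon and True for the -nvf test load.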

jeffs42885

  • Guest
Re: Ingestion a .xls file
« Reply #7 on: January 28, 2015, 06:27:58 AM »
So the file that you are sending is fubar.whatever.xls.OUT?

What about sending it without the xls, as just fubar.whatever.OUT?

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #8 on: January 28, 2015, 06:54:55 AM »
The initial data file sent by the application ended in .xls. It would ingest but not get removed from the ARSLOAD directory afterwards, while the trigger and index files were removed automatically. I tried numerous other file name scenarios. The only one that automatically removed the data file was when it was named AppGrpName.AppName.ARD.OUT. Coincidentally, the other two files were named similarly: AppGrpName.AppName.ARD (trigger) and AppGrpName.AppName.ARD.IND (index). In other words, the only case where the data file got automatically removed after ingestion was when it was called AppGrpName.AppName.ARD.OUT. I tried removing the ARD, and the file did not get deleted after ingestion. It needed BOTH ARD and OUT.

Another bit of info: in this same ARSLOAD, we are loading PDF files. They are named AppGrpName.AppName.PDF, and they also get deleted automatically after ingestion. But in the PDF case, it's only one data file; no trigger or index file is sent.

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #9 on: January 28, 2015, 10:29:38 AM »
Somewhere in the ARSLOAD process, it requires the data file to end in .OUT. In our case, the application is loading .xls files and wants to keep its data file names as they currently are (ending in .xls). The issue isn't LOADING the files because they end in .XLS; it's DELETING them after they are ingested. The ONLY scenario we found where the data file gets automatically deleted by ARSLOAD is when it is called APPGRP.APP.ARD.OUT. The application wants to send multiple .xls files at one time using the GENERIC indexer, so obviously they can't send all the .xls files under the same name, because they'll get the "File Name Already exists" error.

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #10 on: January 29, 2015, 05:39:51 AM »
Did some digging online and found this on IBM's website:

After successfully loading the data, the system deletes the input file that is specified on the GROUP_FILENAME: parameter if the file name extension is .OUT, and for daemon mode processing, the rest of the input file name is the same as the .ARD file name. The system also deletes the .IND file (the Generic indexer parameter file) and the .ARD file (the dummy file that is used to initiate a load process when the ARSLOAD program is running in daemon mode). See Loading data for more information.
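The documented rule can be written as a small predicate. This is a minimal sketch, not CMOD code: the function name is hypothetical, and it assumes names are compared case-insensitively (the thread mixes .ARD and .ard) and that only the base file name matters, not the directory on GROUP_FILENAME:. Checked against DAVE's three test names, it agrees with what was observed.

```python
from pathlib import PurePosixPath


def data_file_deleted_after_load(data_file: str, trigger_file: str) -> bool:
    """Approximate the daemon-mode deletion rule quoted from the IBM docs.

    ASSUMPTIONS: case-insensitive comparison; directories are ignored.
    """
    data = PurePosixPath(data_file).name.upper()
    trigger = PurePosixPath(trigger_file).name.upper()
    # Rule 1: the data file's extension must be .OUT.
    if not data.endswith(".OUT"):
        return False
    # Rule 2 (daemon mode): the rest of the name must equal the .ARD name.
    return data[: -len(".OUT")] == trigger


if __name__ == "__main__":
    for name in ("DAVE.DAVE.ARD.OUT", "DAVE.DAVE.ARD.OUT.XLS", "DAVE.DAVE.XLS.OUT"):
        print(name, data_file_deleted_after_load(name, "DAVE.DAVE.ARD"))
    # DAVE.DAVE.ARD.OUT True
    # DAVE.DAVE.ARD.OUT.XLS False
    # DAVE.DAVE.XLS.OUT False
```

Because rule 2 ties the data file name to the single trigger name, at most one data file per trigger can satisfy it, which is why uniquely named spreadsheets are never removed.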

So based on this, and the results we've seen, the data file MUST end in .ARD.OUT (with the rest of its name matching the trigger file) for ARSLOAD to delete it automatically. Our issue is that the application wants to send multiple spreadsheets at one time, so they can't all be named the same; otherwise they get a 'File already exists' message when sending them to the server. I don't know if anyone else has had this kind of issue. Having the data file end in .ARD.OUT isn't a problem if they only send one file at a time. But the whole point of using the generic indexer and sending a .ind file was that they could send more than one data file at a time, and if the data file(s) always have to be named the same, that isn't possible. Anyone have any input or workaround?

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #11 on: February 03, 2015, 07:13:22 AM »
Just to give some additional information on this issue: once a month, they need to send 5000 XLS files to a specific CMOD definition. Each of these spreadsheets has a specific CID number associated with it, and each has a unique name ending in .xls. Using the GENERIC indexer, they need to FTP these 5000 individual spreadsheets to our ARSLOAD folder on the CMOD server, then send one .IND file containing the indexing information for all of these spreadsheets, and then one trigger. Below is a sample of their .ind file. The trigger and .ind files are always called AppGrpName.AppName.ARD and AppGrpName.AppName.ARD.IND.
With the current ARSLOAD configuration, for the data files (.xls) to get deleted from the server after ingestion, they have to be called AppGrpName.AppName.ARD.OUT. Obviously, with 5000 unique spreadsheets being sent at one time, it's not possible to give them all the same name. The 5000 files ingest with no problem; it's the automatic deletion after ingestion that is the issue. Does anyone know how to accomplish this?

CODEPAGE:819
GROUP_FIELD_NAME:POSTING_DATE
GROUP_FIELD_VALUE:30-SEP-2014
GROUP_FIELD_NAME:CID
GROUP_FIELD_VALUE:9970808717
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/arsload1/ProcessingChargesByCIDDetailed9970808717Sep2014.xls
GROUP_FIELD_NAME:POSTING_DATE
GROUP_FIELD_VALUE:30-SEP-2014
GROUP_FIELD_NAME:CID
GROUP_FIELD_VALUE:9970809303
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/arsload1/ProcessingChargesByCIDDetailed9970809303Sep2014.xls
GROUP_FIELD_NAME:POSTING_DATE
GROUP_FIELD_VALUE:30-SEP-2014
GROUP_FIELD_NAME:CID
GROUP_FIELD_VALUE:9977330014
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/arsload1/ProcessingChargesByCIDDetailed9977330014Sep2014.xls
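With 5000 spreadsheets a month, the .ind file is easier to generate than to write by hand, since every spreadsheet gets the same seven-line block. A minimal Python sketch, assuming the field names POSTING_DATE and CID and the GROUP_OFFSET:0 / GROUP_LENGTH:0 pattern from the sample above (the Spreadsheet type and build_ind function are hypothetical):

```python
from typing import Iterable, NamedTuple


class Spreadsheet(NamedTuple):
    posting_date: str  # e.g. "30-SEP-2014", as in the sample
    cid: str           # e.g. "9970808717"
    path: str          # full path the load server will read


def build_ind(sheets: Iterable[Spreadsheet], codepage: int = 819) -> str:
    """Return Generic indexer parameter text with one group per spreadsheet."""
    lines = [f"CODEPAGE:{codepage}"]
    for s in sheets:
        lines += [
            "GROUP_FIELD_NAME:POSTING_DATE",
            f"GROUP_FIELD_VALUE:{s.posting_date}",
            "GROUP_FIELD_NAME:CID",
            f"GROUP_FIELD_VALUE:{s.cid}",
            # ASSUMPTION: offset 0 / length 0 is copied from the sample,
            # where each spreadsheet is loaded as a whole.
            "GROUP_OFFSET:0",
            "GROUP_LENGTH:0",
            f"GROUP_FILENAME:{s.path}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cids = ["9970808717", "9970809303", "9977330014"]
    sheets = [
        Spreadsheet("30-SEP-2014", cid,
                    f"/arsload1/ProcessingChargesByCIDDetailed{cid}Sep2014.xls")
        for cid in cids
    ]
    print(build_ind(sheets), end="")
```

With the three CIDs above it reproduces the sample. Writing the text to AppGrpName.AppName.ARD.IND and only then dropping the trigger keeps the order described in the post: data files, then the .IND, then the trigger.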

jw

  • Guest
Re: Ingestion a .xls file
« Reply #12 on: February 03, 2015, 08:30:43 AM »
Hi DDP021,

If you do not want to write a script to clean them up, the easiest way is to set up a crontab entry that removes the files based on their age (say, 5 days old), for example:

0 8 * * * find /arsload1 -type f -mtime +5 -name "*.xls" -exec rm {} \;


Not sure whether this is acceptable in your case.

Good luck.

DDP021

  • Sr. Member
  • Posts: 343
Re: Ingestion a .xls file
« Reply #13 on: February 03, 2015, 08:51:46 AM »
Thanks JW! Our systems guys were looking for a way to clean up the XLS files. I was just curious whether the ARSLOAD process could be "tweaked" to delete more than just .OUT files after ingestion. But it also appears the data file must follow the same naming convention as the trigger and index files for it to get deleted. The best route would be to set up a script or something to delete the .xls files separately. Thanks for the help!
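A minimal sketch of such a script in Python. It assumes the sending side keeps a copy of the .ind file (ARSLOAD deletes the original .IND after a successful load, per the IBM text quoted above) and that the script is run only once the load is known to have succeeded; the function names and the command-line usage are hypothetical. Instead of deleting everything older than N days, it deletes exactly the files listed on GROUP_FILENAME:.

```python
import sys
from pathlib import Path
from typing import List

PREFIX = "GROUP_FILENAME:"


def group_filenames(ind_text: str) -> List[str]:
    """Return every GROUP_FILENAME: value in Generic indexer parameter text."""
    return [line[len(PREFIX):].strip()
            for line in ind_text.splitlines()
            if line.startswith(PREFIX)]


def remove_loaded_files(ind_copy: Path, dry_run: bool = True) -> List[str]:
    """Delete (or, with dry_run, only list) the data files named in a saved .ind copy.

    ASSUMPTION: the caller has already confirmed that the load succeeded;
    this function does not check the load itself.
    """
    hits = []
    for name in group_filenames(ind_copy.read_text()):
        path = Path(name)
        if path.is_file():  # skip files that are already gone
            if not dry_run:
                path.unlink()
            hits.append(name)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python cleanup_xls.py /path/to/saved.ind [--delete]
    delete = "--delete" in sys.argv[2:]
    for name in remove_loaded_files(Path(sys.argv[1]), dry_run=not delete):
        print(("removed " if delete else "would remove ") + name)
```

Compared with the age-based crontab, this only touches files the index actually referenced, so a spreadsheet that arrived after the .ind was built is left alone. dry_run defaults to True so the list can be reviewed before anything is deleted.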

LWagner

  • Guest
Re: Ingestion a .xls file
« Reply #14 on: April 21, 2015, 02:01:10 PM »
-n with arsload will delete source files on load.
-N causes the files to be retained. That seems to be your default.