Author Topic: Ingesting a large CSV file..Need HELP!...  (Read 2380 times)

DDP021

  • Sr. Member
  • ****
  • Posts: 343
Ingesting a large CSV file..Need HELP!...
« on: June 22, 2017, 04:50:32 AM »
We are currently ingesting a CSV file using the Generic Indexer.  The application sends us a data file (.OUT), then an index file (.IND) and a trigger file.  Because this is a CSV file, no indexing is being done.  Here is the .ind file:
CODEPAGE:819
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload4/MM0IA711.MM0IA711.ARD.OUT

What we are seeing is that the load fails with this type of error:  Unable to read from offset 0 for length 3372275922 from the file >/prod/ode/cmod/arsload4/MM0IA711.MM0IA711.ARD.OUT<
Guessing this is due to the size?

It will occasionally load, but when we look at the 87 message we see this:  Warning:  A document of size >386292596< bytes was processed.  It is not recommended to store documents in OnDemand greater than 50MB in size.  Although data may store successfully, it is possible that the data may not be able to be retrieved.

Is there a setting we can change on the Application?  Normally we would check LARGE OBJECT, but I believe this is only for the ACIF indexer.

Appreciate any help or suggestions

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2229
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: Ingesting a large CSV file..Need HELP!...
« Reply #1 on: June 22, 2017, 06:48:35 AM »
You might be better off storing this as a .zip file -- because as a .csv file, it's not something CMOD natively understands how to parse and break up (and consequently, deliver back to the user) the way it does for line data files.  Zipping it (and setting the compression in the Application to 'Disable') would likely be the best way to do this.

And as much as I'm loath to say this...  It might be a better fit if this file was stored in CM8 or FileNet, as I believe they don't have the file size restrictions/preferences that IBM CMOD does.

Also, if there aren't any fields in your index file, how are you querying for this file?  Just loading with a bunch of defaults?

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

DDP021

Re: Ingesting a large CSV file..Need HELP!...
« Reply #2 on: June 22, 2017, 07:04:43 AM »
Thanks Justin for all the info.....

We spoke to the application area and they indicated they are doing reruns and combining them, which is causing such a big file to be sent.  We've been loading this file for a few years without any issues, so we assumed they were doing something different.

All they are doing is sending us the 3 files (trigger, index and data file).  They send them to an ARSLOAD directory on our CMOD server, first the data and index and then the trigger.  There is no form of indexing being done.  We are just loading the .csv file in bulk.  Never knew of any way to "break up" a .csv file.
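
On the "break up" question, one possible approach is to cut the file on line boundaries so that no piece goes over a chosen size. The Python sketch below only computes (offset, length) pairs of the kind that GROUP_OFFSET and GROUP_LENGTH take in the .ind file above. The function name and the size limit are made up for illustration, and whether loading each range as its own document suits this application is an assumption that would need testing. Note that only the first range would contain the CSV header row.

```python
from typing import List, Tuple

def line_aligned_ranges(path: str, max_bytes: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering the file, each ending on a line boundary.

    A range only exceeds max_bytes when a single line is longer than max_bytes.
    """
    ranges = []
    start = 0    # byte offset where the current range begins
    length = 0   # bytes collected in the current range so far
    with open(path, "rb") as f:
        for line in f:
            # Close the current range if adding this line would overflow it.
            if length and length + len(line) > max_bytes:
                ranges.append((start, length))
                start += length
                length = 0
            length += len(line)
    if length:
        ranges.append((start, length))
    return ranges
```

The file is read line by line, so memory use stays small even for a multi-gigabyte input.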

Knowing this, how would they send us the .out file as a zip?  Right now, on the application definition, we set the VIEW INFORMATION as .csv.

Right now, on the application we have Data Compression as OD77 and Compressed Object size as 100k


Justin Derrick

Re: Ingesting a large CSV file..Need HELP!...
« Reply #3 on: June 22, 2017, 10:16:34 AM »
Ah.  I just took a closer look at your file size -- I thought it was 337MB -- but it's 3.37GB.  And I'm pretty sure CMOD can't load single documents over 2GB in size.
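
To make the arithmetic concrete, here is a minimal Python sketch that checks the two sizes quoted in this thread against the two thresholds mentioned. Both thresholds are assumptions taken from the thread rather than from official documentation: the 50MB figure comes from the load warning (read here as binary megabytes), and the 2GB figure is only the limit suspected above. The function name is hypothetical.

```python
# Thresholds taken from this thread, not from official documentation.
RECOMMENDED_MAX = 50 * 1024 * 1024   # "50MB" from the load warning (binary MB assumed)
ASSUMED_HARD_MAX = 2 * 1024 ** 3     # the ~2GB single-document limit suspected above

def classify_size(size_bytes: int) -> str:
    """Return 'ok', 'warn', or 'too_large' for a document size in bytes."""
    if size_bytes > ASSUMED_HARD_MAX:
        return "too_large"
    if size_bytes > RECOMMENDED_MAX:
        return "warn"
    return "ok"

# The two sizes reported earlier in the thread.
for size in (386292596, 3372275922):
    print(f"{size} bytes = {size / 1e9:.2f} GB -> {classify_size(size)}")
```

A check like this could run on the sending side, using os.path.getsize() on the .OUT file before the trigger is sent.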

As for loading .zips, you'd need to create a new Application with a user defined data type.  Also, if you're going to go down this road, make sure you use the -9 option (maximum compression) in the zip command to make sure the files are as small as the Zip tool can make them, since you'll presumably be storing it for a loooong time.  :)
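
If the sending application cannot call the zip command directly, roughly the same result can be had from Python's standard zipfile module. This is only a sketch: the function name and paths are placeholders, and DEFLATE level 9 is used as the counterpart of zip -9.

```python
import zipfile
from pathlib import Path

def zip_max_compression(src: str, dest: str) -> int:
    """Write src into a new zip at dest with DEFLATE level 9; return the zip size in bytes."""
    src_path = Path(src)
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        # arcname keeps only the file name, so the archive has no directory prefix.
        zf.write(src_path, arcname=src_path.name)
    return Path(dest).stat().st_size
```

ZipFile enables ZIP64 by default, so inputs over 4GB are handled. The resulting zip would still need to fit under whatever document size limit applies on the CMOD side.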

-JD.