Topic: PDF from ARSDOC GET can see only one file

teera_aoo

  • Jr. Member
  • **
  • Posts: 18
PDF from ARSDOC GET can see only one file
« on: February 29, 2024, 01:51:13 AM »
I have a requirement to export all data from CMOD as the original PDF files, together with their indexes, for use on another system.

So I tried the 'arsdoc get' command, but the PDF output is merged into a single file in a format that I cannot use on the other system.
When I open it in a normal PDF viewer, only the first exported document is shown, although I exported about 10 PDF files from CMOD.
The size of the merged file matches all 10 PDFs combined (e.g. each file is about 100K and the merged file is about 1000K).
The index file shows an offset and length for each PDF. I think they are managed as layers of one PDF.


arsdoc get -hlocalhost -uadmin -ppassword -f "PDF-TIV" -g  -N -c -i "where doc_no like '%'" -o PDF-TIV.pdf
Note:
 - When I load the output back into CMOD with arsload, it is split into the 10 PDF files correctly.
 - The options -g, -N and -c need to be used together.

Screenshot: https://u.pcloud.link/publink/show?code=XZCVdJ0ZotyT0txuaOYVyljCldfCzSOyuzd7

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2231
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: PDF from ARSDOC GET can see only one file
« Reply #1 on: February 29, 2024, 06:26:09 AM »
It's not a strange format -- it's the CMOD Generic Index format (v2) which has been around for nearly 20 years, and is well documented:
https://cmod.wiki/dox/CMODv10.5/IndexingReference.pdf

If you're not loading the data to another CMOD server, you need to write a utility to do the splitting and convert the metadata, or work with someone who has already done that work.  ;)

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

teera_aoo

Re: PDF from ARSDOC GET can see only one file
« Reply #2 on: February 29, 2024, 07:53:47 AM »
Hi Justin,

Thank you for your reply.
Yes, I am quite familiar with the generic index file because I often use it to load PDFs and other file types into CMOD.

The generic index files I have used so far have the format below (offset = 0, length = 0):
Code:
...
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:File1.pdf
               
...
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:File2
...


But I was referring to the merged PDF file produced by arsdoc get. It does not append file 2, file 3, ... as page 2, page 3, ...
Instead, it seems to combine them as something like layers, or as a binary concatenation.

The index file refers to the same file in every entry; only the offset and length change.

Code:
...
GROUP_OFFSET:0
GROUP_LENGTH:102187
GROUP_FILENAME:PDF-TIV2.pdf.0.PDF-TIV2.PDF-TIV2.out
               
...
GROUP_OFFSET:102187
GROUP_LENGTH:681891
GROUP_FILENAME:PDF-TIV2.pdf.0.PDF-TIV2.PDF-TIV2.out
...

Do you know of, or can you share, a document that explains the specification of this merged PDF, or a utility to split it back into the original files?




rjrussel

  • Full Member
  • ***
  • Posts: 141
Re: PDF from ARSDOC GET can see only one file
« Reply #3 on: February 29, 2024, 10:14:28 AM »
Document 1 starts at offset 0 and is 102187 bytes long. Document 2 starts at offset 102187 and is 681891 bytes long. You need to write something that extracts the individual documents using that logic. That is all there is to it.
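
For illustration only, that offset/length logic could look roughly like the Python sketch below. It is not an IBM tool and not anything shipped with CMOD; the function names `extract_document` and `split_documents` and the `doc_0001.pdf` output naming are made up for this example, and it assumes each segment is a complete, standalone PDF.

```python
from pathlib import Path


def extract_document(container: Path, offset: int, length: int) -> bytes:
    """Return `length` bytes of `container`, starting at byte `offset`."""
    with container.open("rb") as fh:
        fh.seek(offset)
        data = fh.read(length)
    # A short read means the index and the data file do not match.
    if len(data) != length:
        raise ValueError(
            f"expected {length} bytes at offset {offset}, got {len(data)}"
        )
    return data


def split_documents(container: Path, segments, out_dir: Path) -> list:
    """Write each (offset, length) pair to out_dir as doc_0001.pdf, ...

    The output naming is hypothetical; use whatever the target system needs.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, (offset, length) in enumerate(segments, start=1):
        target = out_dir / f"doc_{number:04d}.pdf"
        target.write_bytes(extract_document(container, offset, length))
        written.append(target)
    return written
```

With the values quoted in this thread, the call would be something like `split_documents(Path("PDF-TIV2.pdf.0.PDF-TIV2.PDF-TIV2.out"), [(0, 102187), (102187, 681891)], Path("split"))`.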

Justin Derrick

Re: PDF from ARSDOC GET can see only one file
« Reply #4 on: February 29, 2024, 10:59:25 AM »
Page 255 of the document I linked.

-JD.

teera_aoo

Re: PDF from ARSDOC GET can see only one file
« Reply #5 on: March 01, 2024, 03:29:58 AM »
Hi Justin,

Thank you.
It seems arspdoci is the PDF indexer used when loading into CMOD (like the PDF indexer in the Admin Client that is used to set up index positions from a PDF document).
But what I am trying to find out is how to cut the PDF by the offset/length specified in the .ind file from arsdoc get.

Thank you
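
One possible way to do that cutting, sketched in Python under several assumptions: the .ind file is plain text with one `KEY:value` pair per line; `GROUP_FILENAME` is the last line of each group (as in the snippets above); index values appear as `GROUP_FIELD_NAME` / `GROUP_FIELD_VALUE` pairs (these two keywords are not shown in this thread and are assumed from the Generic indexer format); and a length of 0 means "read to the end of the file". The names `IndexEntry`, `parse_generic_index` and `export_entries` are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class IndexEntry:
    fields: dict = field(default_factory=dict)  # index name -> value
    offset: int = 0
    length: int = 0
    filename: str = ""


def parse_generic_index(text: str) -> list:
    """Collect one IndexEntry per group found in the index text."""
    entries = []
    current = IndexEntry()
    pending_name = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank or malformed line
        if key == "GROUP_FIELD_NAME":
            pending_name = value
        elif key == "GROUP_FIELD_VALUE" and pending_name is not None:
            current.fields[pending_name] = value
            pending_name = None
        elif key == "GROUP_OFFSET":
            current.offset = int(value)
        elif key == "GROUP_LENGTH":
            current.length = int(value)
        elif key == "GROUP_FILENAME":
            # Assumption: GROUP_FILENAME closes the current group.
            current.filename = value
            entries.append(current)
            current = IndexEntry()
    return entries


def export_entries(entries, data_dir: Path, out_dir: Path) -> None:
    """Cut every entry out of its data file and save it as its own PDF."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for number, entry in enumerate(entries, start=1):
        source = data_dir / Path(entry.filename).name
        with source.open("rb") as fh:
            fh.seek(entry.offset)
            # Assumption: length 0 means "the rest of the file".
            data = fh.read(entry.length) if entry.length else fh.read()
        (out_dir / f"doc_{number:04d}.pdf").write_bytes(data)
```

Each `IndexEntry.fields` dictionary then holds the index values for the matching output file, so they can be written out in whatever metadata format the other system expects.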