Author Topic: Manually retrieving a document?  (Read 2161 times)

Mattbianco

  • Jr. Member
  • **
  • Posts: 17
Manually retrieving a document?
« on: July 31, 2019, 05:19:54 AM »
Hello!

We have been running CMOD for Multiplatform in a cache-only mode for a very long time, and are currently on v9.5.
While running a large number of exports with arsdoc get, I came across some documents that appear to have been corrupted somehow and could not be extracted. They also cannot be opened with the thick client or with the Content Navigator web GUI.

Using arsdoc get with option -n makes it fail on the second document in my selection, and when doing it without the -n flag, it fails on the eleventh. That is probably just an ordering issue, though. I don't think that's worth investigating. I've done single-document queries so I know exactly which documents are damaged.

The messages from arsdoc while retrieving are:

2019-07-24 12:08:26.270740: (2): ARS6073I Retrieving document for userid 'ADMIN' ...
2019-07-24 12:08:26.294203: ARS6096E Retrieve unsuccessful
2019-07-24 12:08:26.294313: ARS2094E The server failed while retrieving a document

There is nothing in the System Log when this happens. Opening the corrupted documents with the native Windows client or the web client (Content Navigator) also logs nothing to the System Log.

Using arsdoc query with the -D flag adds this information to the output: ,801FAAB,148456,36678,976552,243727,U,O,0,1,0.
Am I correct in thinking that the compressed document should be somewhere in /arscache/cache1/retr/HDB/DOC/801FAAB ?

Example:

lrwxrwxrwx    1 archive  db2adm           38 Apr 29 2016  /arscache/cache1/retr/HDB/DOC/801FAAB -> /arscache/cache8/19130/HDB/DOC/801FAAB
-r--------    1 archive  db2adm      7798612 Apr 01 2012  /arscache/cache8/19130/HDB/DOC/801FAAB

ARSMAINT has no complaints about the links. The permissions and ownership on the retr link and on the file it points to are correct, and the link target can be read with tools like wc or strings, so there is no underlying I/O error.
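
The link and readability check described above could also be scripted. This is only a rough sketch in Python, not a CMOD tool: check_object_file is a made-up helper name, and it does nothing more than resolve the symlink and read the target to the end, which is roughly what running wc on the file proves.

```python
import os


def check_object_file(link_path, chunk_size=1 << 20):
    """Resolve link_path and read the target file to the end.

    Returns (resolved_path, bytes_read). An OSError is raised if the link
    is dangling, the file is not readable, or a read fails.
    """
    resolved = os.path.realpath(link_path)  # follows the retr symlink
    total = 0
    with open(resolved, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return resolved, total


if __name__ == "__main__":
    # Path taken from the listing above; adjust for your own cache layout.
    path = "/arscache/cache1/retr/HDB/DOC/801FAAB"
    if os.path.lexists(path):
        print(check_object_file(path))
```

If the byte count matches the size shown by ls, the file is at least readable end to end. That says nothing about whether its contents are intact.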

So, I guess what I'm wondering now is how to find the offset and length in the DOC file. Could it be the second (148456) and third (36678) parameters in the "Document handle" that the -D flag provides to the arsdoc query output?
If so, I guess the only thing I need to figure out is how to do an OD77 decompress "by hand"  ;D - Is there a command line tool for that?

Obviously the document retrieval in the clients should work if the above works, but I would still like to give it a final try, just to be absolutely certain that these documents are lost forever.

Any further ideas? Does anyone know what _all_ the parts of the "document handle" are?

Thanks!
Matt

Lars Bencze

  • Full Member
  • ***
  • Posts: 116
  • CMOD Expert at Skandia
Re: Manually retrieving a document?
« Reply #1 on: September 11, 2019, 05:21:43 AM »
Hi Matt,

You probably want to use the commands "arsadmin retrieve" and "arsadmin decompress". Here's the onscreen help for the latter:

arsadmin decompress
ARS1013I Usage: arsadmin decompress [options]
   Version:  9.5.0.9
   decompress Decompress a file
      -b <off> Offset to begin at.  (Default 0)
      -c <type> Document Compress Type
         'D' Disable Compression
         'F' OD77Lite Compression
         'H' OD77HW Compression
         'L' LZW12 Compression
         'N' No Compression
         'O' OD77 Compression (Default)
         'X' OD77LiteHW Compression
         'Z' LZW16 Compression
      -l <len> Length to end at.  (Default file size)
      -o <out_file> Output File
      -s <src_file> Input File
      -1 <trace_file> Trace file
      -2 <trace_level> Trace level

Mattbianco

Re: Manually retrieving a document?
« Reply #2 on: October 02, 2019, 05:16:37 AM »
Thank you so much, Lars!

In the -D information from arsdoc query, in my case: ,801FAAB,148456,36678,976552,243727,U,O,0,1,0

It looks like the 4th value (976552) is indeed a valid document offset in the 801FAAB file. When giving it as the "begin" parameter to arsadmin decompress, I do get a valid PDF document output (if I adjust it to another nearby value I get an error message). If I pass the 5th value (243727) as the length I get the same results. If I supply a smaller length, it appears to just hang, so I figure it must be the compressed length or something like that.
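
For reference, the invocation described above can be assembled like this. It is only a sketch in Python: the flags -b, -l, -s and -o come from the arsadmin decompress help text quoted in Reply #1, the source path is the retr link from the first post, and the output file name block.out is a made-up placeholder. Passing the 5th value as -l follows what this post reports, not documented behaviour.

```python
import shlex


def build_decompress_command(src_file, out_file, offset, length):
    """Return the argv list for one `arsadmin decompress` call.

    Only flags shown in the quoted help text are used. No -c flag is passed,
    so the default listed there ('O', OD77) applies.
    """
    return [
        "arsadmin", "decompress",
        "-b", str(offset),   # offset to begin at
        "-l", str(length),   # length, as used in this post
        "-s", src_file,      # input file
        "-o", out_file,      # output file
    ]


if __name__ == "__main__":
    cmd = build_decompress_command(
        "/arscache/cache1/retr/HDB/DOC/801FAAB",
        "block.out",  # hypothetical output name
        976552,
        243727,
    )
    # Print the command instead of running it, so this works off-server too.
    print(shlex.join(cmd))
```

On a CMOD server the list can be handed to subprocess.run. Since a too-small length was seen to hang, a timeout there is probably a good idea.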

However, in my case, the resulting PDF document doesn't look like it belongs to the metadata of the document I queried, so I'm still confused about what the h*** is going on here. Unfortunately.  :(

Still curious about the 2nd and 3rd parameters: 148456,36678.

But, arsadmin decompress was very valuable to learn about, nonetheless.
Thanks again!

Mattbianco

Re: Manually retrieving a document?
« Reply #3 on: October 02, 2019, 05:27:07 AM »
Hi again :-)

I did some more digging, and the PDF I got matched the metadata of another document.

The document I was hoping to find had these attributes:
801FAAB,148456,36678,976552,243727,U,O,0,1,0,2423-1-0-801FAA-15416-15416-2424

And the document I found had these attributes:
801FAAB,222153,37134,976552,243727,U,O,0,1,0,2423-1-0-801FAA-15428-15428-2424
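
A field-by-field comparison makes the difference between the two handles easier to see. This is just a small Python sketch; differing_fields is a made-up helper, and positions are counted from 1 to match the "2nd" and "3rd" wording used in this thread.

```python
def differing_fields(handle_a, handle_b):
    """Return (position, value_a, value_b) for every field that differs.

    Both handles are split on commas; positions are 1-based.
    """
    a = handle_a.split(",")
    b = handle_b.split(",")
    return [
        (pos, x, y)
        for pos, (x, y) in enumerate(zip(a, b), start=1)
        if x != y
    ]


wanted = "801FAAB,148456,36678,976552,243727,U,O,0,1,0,2423-1-0-801FAA-15416-15416-2424"
found = "801FAAB,222153,37134,976552,243727,U,O,0,1,0,2423-1-0-801FAA-15428-15428-2424"

for pos, x, y in differing_fields(wanted, found):
    print(pos, x, y)
```

Only positions 2, 3 and 11 differ. The 4th and 5th values (976552,243727) are identical, so both handles point at the same region of the 801FAAB file.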

So I guess I don't fully understand how the data in the cache filesystem is stored.
Any ideas?

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2228
  • CMOD Guru for hire...
Re: Manually retrieving a document?
« Reply #4 on: October 02, 2019, 06:16:54 AM »
Alessandro gave a really good presentation on CMOD Cache Internals during the 2019 Technical Webinar Series:  http://www.odusergroup.org/forums/index.php?topic=2640.0

Start there and pop back with any questions you might have.

-JD.

Mattbianco

Re: Manually retrieving a document?
« Reply #5 on: March 15, 2024, 04:21:19 AM »
Quoting my earlier post: "Still curious about the 2nd and 3rd parameters: 148456,36678."

They are the offset and length of the document within the decompressed block, that is, within the output you get after decompressing the compressed document block.
The 4th and 5th parameters (976552,243727) are the offset and length of that compressed block within the 801FAAB object (DOC) file.

So, first decompress the 243727 bytes starting at byte 976552 of 801FAAB, and then take the 36678 bytes starting at 148456 of that decompressed output, for the actual document.
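
The second step can be sketched in Python as below. This is an illustration, not a CMOD utility: the field names in DocHandle are my own labels for the first five values of the handle, and the decompression itself is assumed to have been done already (for example with arsadmin decompress) into a file whose name you pass in.

```python
from typing import NamedTuple


class DocHandle(NamedTuple):
    # Field names are illustrative labels, not official CMOD terms.
    object_name: str    # 1st value, e.g. 801FAAB
    doc_offset: int     # 2nd value: offset inside the decompressed block
    doc_length: int     # 3rd value: length inside the decompressed block
    block_offset: int   # 4th value: offset of the compressed block in the DOC file
    block_length: int   # 5th value: length of the compressed block


def parse_handle(text):
    """Parse the first five comma-separated values of a -D handle."""
    fields = text.strip().lstrip(",").split(",")
    return DocHandle(fields[0], *(int(v) for v in fields[1:5]))


def read_range(path, offset, length):
    """Read exactly `length` bytes starting at `offset`, or raise ValueError."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read(length)
    if len(data) != length:
        raise ValueError(f"expected {length} bytes at {offset}, got {len(data)}")
    return data


def extract_document(decompressed_path, handle):
    """Cut one document out of an already decompressed block."""
    return read_range(decompressed_path, handle.doc_offset, handle.doc_length)
```

With the handle from this thread, parse_handle gives doc_offset 148456 and doc_length 36678, and extract_document returns those 36678 bytes from the decompressed output.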