OnDemand User Group

Support Forums => MP Server => Topic started by: JeanineJ on January 23, 2024, 09:11:46 AM

Title: How Can I Optimize ARSDOC GET
Post by: JeanineJ on January 23, 2024, 09:11:46 AM
I've got to extract a little over 170K PDF documents (or more) from my CMOD instance to give to the external entity that bought one of our subsidiaries. I had intended to use ARSDOC GET with the load ID (-X) and the -c -g -N parameters to get the documents, but my tests last week on a large load (4182 PDF documents) took almost 4 hours to run. I don't know if I'll have to use -c, since I haven't yet heard from the external entity on how they need the PDF documents (they don't use CM or CMOD). TSM seems to be the sticking point, since all of the documents have migrated out of cache and are now in TSM. IBM indicates that TSM is the problem. I'm open to suggestions for ways to speed this up without affecting my internal and external users. I'm on CMOD 10.5.0.5 with DB2 and TSM on a RHEL7 server. TIA
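For a rough sense of scale, the test numbers above can be extrapolated to the full job. This is only a back-of-the-envelope sketch: it assumes retrieval time grows linearly with document count, which may not hold for TSM, and the function name `estimate_hours` is made up for illustration.

```shell
#!/usr/bin/env bash
# Extrapolate total run time from one timed test load.
# Assumption: time scales linearly with the number of documents.
estimate_hours() {
    local test_docs=$1 test_hours=$2 total_docs=$3
    awk -v d="$test_docs" -v h="$test_hours" -v t="$total_docs" \
        'BEGIN { printf "%.1f\n", t * h / d }'
}

# 4182 documents took about 4 hours; estimate for 170000 documents.
estimate_hours 4182 4 170000   # prints 162.6 (hours, a little under 7 days)
```

At that rate a single sequential run would take close to a week, which is why the per-document retrieval cost matters so much here.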
Title: Re: How Can I Optimize ARSDOC GET
Post by: jsquizz on January 23, 2024, 11:21:01 AM
Hm, strange.

I did a full system extraction using arsdoc get with -X, no issues. Perhaps your DB isn't tuned right, or there's a setting off on the RHEL server (based on your reply to my post).

Maybe give it "more juice"

But as far as arsdoc goes, just as a force of habit I usually do:

arsdoc get -u <user> -p <stash> -agcNv -G <ag> -X loadID -o <AgName>

or

Query the data tables and grab the DOC_NAME values.

Let's say there are 100 tables; maybe split the doc_names into three lists:

List1:
FAA1
FAA2
FAA3
FAA4

List2:
FAA5.. etc

I think when I did the extract in 2021 I did something like:

Code:
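#!/usr/bin/bash
# script1.bash
# Define USER, PASS, HOST and AGNAME here, or source a config file.
# Read one DOC_NAME per line from list1.list and retrieve the matching documents.
while read -r DOC_NAME; do
   arsdoc get -u "${USER}" -p "${PASS}" -h "${HOST}" -agcNv -i "where doc_name like '%${DOC_NAME}%'" -o "${AGNAME}"
done < list1.list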

nohup ./script1.bash > round1.out 2>&1 &


There are a few ways to do this. I know you can use arsadmin as well, but I have never taken that leap.
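The list-splitting idea above could be sketched roughly like this. It is a minimal sketch, not what I actually ran: the function names `split_round_robin` and `run_workers` are made up, and it assumes the worker script takes the list file as its first argument (unlike script1.bash, which hard-codes list1.list).

```shell
#!/usr/bin/env bash
# Deal the DOC_NAME values from one file into N lists, round-robin,
# producing <prefix>1.list, <prefix>2.list, ... <prefix>N.list.
split_round_robin() {
    local input=$1 parts=$2 prefix=$3
    awk -v n="$parts" -v p="$prefix" \
        '{ print > (p ((NR - 1) % n + 1) ".list") }' "$input"
}

# Start one background worker per list and wait for all of them.
# "worker" is any command that accepts a list file as its argument
# (hypothetical: a variant of script1.bash that reads "$1").
run_workers() {
    local worker=$1
    shift
    local list
    for list in "$@"; do
        "$worker" "$list" > "${list%.list}.out" 2>&1 &
    done
    wait
}

# Example (not executed here):
#   split_round_robin all_doc_names.list 3 list
#   run_workers ./script1.bash list1.list list2.list list3.list
```

Whether running several lists at once helps depends on the storage side. If the documents are on tape in TSM, extra parallel streams may just compete for the same drives, so it is probably worth starting with two or three and watching the impact on regular users.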
Title: Re: How Can I Optimize ARSDOC GET
Post by: JeanineJ on January 23, 2024, 11:37:17 AM
I had a ticket open with IBM and they've pretty much said it's TSM causing the lag. I'd like to bypass looking in cache first, only because I know for a fact that all the documents are in TSM only. I'm waiting on my TSM guy to be available so I can get him to open another ticket for TSM to see if it can be tuned to work faster. The external entity may need the PDFs individually because they definitely don't use CMOD, so I don't see the value of generating a generic index file. But what do I know.
Title: Re: How Can I Optimize ARSDOC GET
Post by: jsquizz on January 24, 2024, 06:18:03 AM
That is a very valid point. Unfortunately, I've seen that also. Perhaps check out ARSADMIN. I'm not very familiar with it, but you can essentially retrieve documents as a "bunch", for lack of a better term.
Title: Re: How Can I Optimize ARSDOC GET
Post by: Justin Derrick on January 24, 2024, 12:07:27 PM
No matter what you do, if your documents are stored inside TSM, and on tape, it's going to take a while to get them back.

You could re-cache them (https://cmod.cloud/professionalservices/ibm-cmod-cache-optimization/) but the effort may not be worth it for only 170k documents.

-JD.