Author Topic: Migration question, each load is 1 doc + arsmaint segment date Question  (Read 1271 times)

jsquizz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 576
Got two questions, hopefully quick ones.

I'm working on a migration (100 million docs) and the client loads documents one by one. Each 87 record contains a single document.

After combing through their logs, it looks like this is the case for all of their documents. The script I plan to write will run arsdoc and retrieve all the files based on doc_name, but since it's going to be running rapidly, generating millions of files at a time, I'm wondering how bad this could get, how taxing on resources. This is my only option at the moment.
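Roughly what I have in mind, for context (a sketch only; the hostname, credentials, folder, and WHERE clause below are made-up placeholders, not the client's real values):
Code: [Select]
#!/bin/ksh
# Sketch only -- host, credentials, folder, and query are placeholders.
# Pull one slice of documents per run into its own working directory.
OUT=/migr/work/slice01
mkdir -p "$OUT" && cd "$OUT" || exit 1
arsdoc get -h odserver -u admin -p passwd \
    -f "Client Folder" \
    -i "WHERE report_date BETWEEN '2015-01-01' AND '2015-01-31'" \
    -o batch -v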

I did this approach about a year ago and it worked perfectly, but that was a mixture of documents like this and some single AFP files with 30,000 docs each; no issues there.


The next part goes back to part 1: I'm going to be extracting all of this and reloading it into a 10.1 system. They are currently using "date loaded" as their segment date, and that's what arsmaint uses to expire on.

Any suggestions on how to handle this? Since I am re-ingesting, I'm basically resetting the clock. They hinted that the last time they planned to try this they ran into the same issue, so... they just stopped.

Thanks all!
#CMOD #DB2 #AFP2PDF #TSM #AIX #RHEL #AWS #AZURE #GCP #EVERYTHING

Norbert Novotny

  • Jr. Member
  • **
  • Posts: 46
Hi,

I would go with export/import, similar to what I just replied on your other, older post/question.
However, that covers archive indices only, i.e. a case where you need to migrate just CMOD and the storage (e.g. TSM) stays the same.
If you also need to export the underlying data, I have used dsmc for the export (as it is a bit faster):
Code: [Select]
dsmc retrieve -server=TSMServerName -virtualnode=TSMNode -password=NODEPasswd -replace=no -ifnewer -filesonly -filelist=InputFileList MyLocalDir/
The "InputFileList " has all of the doc names but no more then ~5000 per one execution (this is just my experience) one name per line.

Important: don't forget that each doc_name such as 1234FAAA has a hidden 1234FAA1 file, not visible in CMOD but present on the storage.
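If it helps, this is roughly how I drive it (a sketch; the server/node values are the same placeholders as above, and the file names are examples):
Code: [Select]
#!/bin/ksh
# Split the master doc_name list (one name per line, including the hidden
# *1 names mentioned above) into <=5000-line chunks, then run dsmc per chunk.
# Server, node, password, and paths are placeholders.
split -l 5000 all_docnames.list flist.
for f in flist.*; do
    dsmc retrieve -server=TSMServerName -virtualnode=TSMNode \
        -password=NODEPasswd -replace=no -ifnewer -filesonly \
        -filelist="$f" MyLocalDir/ \
    || echo "chunk $f failed" >> retrieve.err
done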

However, arsadmin retrieve also works just fine. One comment, though: if there is an error while retrieving data from storage (corruption, not found, or similar), arsadmin will stop at the partially retrieved file and will not proceed with the rest, in case you have specified multiple files on one command line!!
Doing arsadmin retrieve one by one would be very slow, because for each of your 100M files it has to instantiate a process, authenticate, write a system log record, and clean up afterwards. That's a lot of fluff for nothing. However, if you still want to go this route, I would suggest at least disabling logging for the respective AG(s).
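If you do go that route anyway, at least batch a modest number of names per invocation; a failure then costs you one batch, not the whole run. A rough sketch (the AG name, credentials, paths, and batch size are assumptions to tune):
Code: [Select]
#!/bin/ksh
# Sketch: 200 doc_names per arsadmin invocation instead of one process per file.
# AG name, credentials, paths, and batch size are placeholders.
# xargs moves on to the next batch even if one invocation fails.
cd /migr/work || exit 1
xargs -n 200 arsadmin retrieve -u admin -p passwd -g "ClientAG" \
    < /migr/all_docnames.list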

Once exported, you can load into the target system with arsadmin store. Also, if you have configured new node names on your target and have copied your segment tables, don't forget to update the NIDs (mostly pri_nid), as well as arsres.pri_nid.
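For that NID update, something along these lines on the target database (DB2 here; the database alias, the segment table name XYZ1, and the NID values are examples only, so check yours first):
Code: [Select]
#!/bin/ksh
# Example only: re-point pri_nid 1 -> 2 after copying segment tables to target.
# Database alias, segment table name, and NID values are placeholders.
db2 connect to ARCHIVE
db2 "UPDATE XYZ1 SET PRI_NID = 2 WHERE PRI_NID = 1"
db2 "UPDATE ARSRES SET PRI_NID = 2 WHERE PRI_NID = 1"
db2 connect reset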

Hope this helps,
 N.

Norbert Novotny
Legal archiving - Swisscom AG

Mobile:  +41-On-request

Dev: #SQL, #Perl, #Java, #C

Interests: #CMOD, #Multiplatforms, #DB2, #Oracle, #TSM, #ERM, #Performance