OnDemand User Group
Support Forums => MP Server => Topic started by: jsquizz on August 14, 2020, 09:07:27 AM
-
This isn't rocket science or reinventing the wheel, but I have a request to verify that a series of documents are loaded, or are not loaded into CMOD.
I approached this as basic as possible, I have a list like this that I loop through ARSDOC query, when the response is less than 1, I then echo the "missing" metadata to a text file for the business to resend. That part works fine.
12341234,313433
12341333,312343
12343333,331111
My problem is that they have 200,000,000 rows they need to validate. Using arsdoc query + bash, this is extremely slow as I expected. It would take 1month to complete going 24/7
We could go through the database tables directly, which is probably what our DBA will want us to do. Just wondering if anyone else has any good ideas.
-
In shell, comm command could be a life saver if you have your files sorted and in same format.
You can identify the delta (record in in file 1 but not in file 2; or, in file 2 but not in file 1) between 2 files.
I compared smaller files and it was very fast for what I needed to do.
I am not sure how it could perform in a file with 200,000,000 records. Maybe you can try a proof of concept with smaller file size and find out for yourself if this is something you could leverage partially into your general solution.
Best of luck,
Mehmet
-
bump. So I am left with a list of account_num, plan_num.. After we did some reconciliation with our sending teams and our dba, this list should be a listing of documents that need to be resent to CMOD. I wrote a small bash script that loops through the list with an arsdoc query, if it exists, do nothing. If it doesnt exist, it echos it to a text file or running list.
Doing this line by line with arsdoc query is taking a long time. I have a small subset of data in my test environment thats around 130k rows. It takes a few hours to run. The real prod data, is something like 1.3million rows if not more.
Does anyone have advice on this? I mean I could always search the database tables too. I figure for the first run through, I would use arsdoc query.
Thanks!