OnDemand User Group


Title: Reconciliation via arsdoc query
Post by: jsquizz on August 14, 2020, 09:07:27 AM
This isn't rocket science or reinventing the wheel, but I have a request to verify whether a series of documents has been loaded into CMOD.

I approached this as simply as possible: I have a list like the one below that I loop through with arsdoc query, and when the hit count is less than 1, I echo the "missing" metadata to a text file for the business to resend (a sketch of the loop follows the sample rows). That part works fine.

12341234,313433
12341333,312343
12343333,331111
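
The loop is roughly this (host, user, folder, and field names are placeholders for my environment, so treat it as a sketch rather than something to copy verbatim):

    # Sketch of the per-row check. Host, user, folder, and field names
    # are placeholders; adjust the hit-count test if your arsdoc build
    # writes a header line to the output file.
    while IFS=, read -r acct plan; do
        : > /tmp/hits.out    # truncate so a failed query counts as zero hits
        arsdoc query -h odhost -u oduser -p "$ODPASS" -f "Statements" \
               -i "WHERE acct_num = '$acct' AND plan_num = '$plan'" \
               -o /tmp/hits.out 2>/dev/null
        if [ "$(wc -l < /tmp/hits.out)" -lt 1 ]; then
            echo "$acct,$plan" >> missing.txt    # no hit -- flag for resend
        fi
    done < input_list.csv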

My problem is that they have 200,000,000 rows to validate. Using arsdoc query + bash is extremely slow, as I expected; it would take about a month to complete running 24/7.

We could go through the database tables directly, which is probably what our DBA will want us to do. Just wondering if anyone else has any good ideas.
Title: Re: Reconciliation via arsdoc query
Post by: Mehmet S Yersel on August 14, 2020, 11:53:34 AM
In the shell, the comm command could be a lifesaver if you have your files sorted and in the same format.
You can identify the delta between two files (records in file 1 but not in file 2, or in file 2 but not in file 1).

I compared smaller files and it was very fast for what I needed to do.
I am not sure how it would perform on a file with 200,000,000 records. Maybe you can try a proof of concept with a smaller file and find out for yourself whether this is something you could leverage as part of your general solution.
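
For example, assuming both files hold one account,plan pair per line (the file names here are just placeholders):

    # comm needs sorted input: sort both lists, then keep the lines that
    # appear in the expected list but not in the loaded list.
    sort -u expected.csv    -o expected.sorted
    sort -u cmod_loaded.csv -o cmod_loaded.sorted
    comm -23 expected.sorted cmod_loaded.sorted > missing.csv

The -23 suppresses the lines unique to the second file and the lines common to both, so only the lines unique to the first file remain.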

Best of luck,
Mehmet
Title: Re: Reconciliation via arsdoc query
Post by: jsquizz on September 02, 2020, 12:28:59 PM
Bump. So I am left with a list of account_num, plan_num pairs. After some reconciliation with our sending teams and our DBA, this should be a list of documents that need to be resent to CMOD. I wrote a small bash script that loops through the list with an arsdoc query: if the document exists, do nothing; if it doesn't exist, the script echoes it to a running text file.

Doing this line by line with arsdoc query is taking a long time. I have a small subset of data in my test environment that's around 130k rows, and it takes a few hours to run. The real prod data is something like 1.3 million rows, if not more.

Does anyone have advice on this? I mean, I could always search the database tables too. I figured for the first run-through I would use arsdoc query.
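
If we do end up going through the tables, the rough shape would be something like this (assuming DB2; the database, schema, table, and column names are placeholders our DBA would need to confirm):

    # Export the loaded keys straight from the application group's data
    # table, then diff against the expected list. Names below are
    # placeholders, and an application group can span multiple segment
    # tables, so repeat the SELECT per table if needed.
    db2 connect to ARCHIVE
    db2 -x "SELECT acct_num || ',' || plan_num FROM OD.AG_DATA_TABLE" > loaded.csv
    db2 connect reset
    sort -u loaded.csv   -o loaded.sorted
    sort -u expected.csv -o expected.sorted
    comm -23 expected.sorted loaded.sorted > resend_list.csv

That would turn 1.3 million arsdoc invocations into one table scan and a couple of sorts.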

Thanks!