Author Topic: ARSLOAD Issue - PDF Indexer with PPD loading very slow  (Read 840 times)

jsquizz

« on: September 08, 2021, 08:07:13 AM »
Hi All,

We are loading from a remote CMOD Server (RHEL/DB2 V11/Cache):

Code: [Select]
Load Version <9.5.0.10>  Operating System <Linux> <#1 SMP Mon Feb 22 18:03:13 EST 2021.3.10.0-1160.21.1.el7.x86_64>  OS Userid <CMODADM>  Install Location </opt/workload/ibm/ondemand/V9.5/> Data(unlimited KB) Stack(8192 KB) Core(0 512-blocks) Cpu(unlimited seconds) File(unlimited 512-blocks) Nofiles(16384) Threads(0) Processes(4096)
...into a fresh CMOD V10.5 install. We are seeing slow load times for a series of application groups whose reports are indexed using PPDs. The client is complaining that these batches of files take too long to index and load. The problem is that we have roughly 50 million docs to load and are only getting through about 100k a day. Unfortunately, I can't test this on my dev box, because there is some kind of issue getting me a sample file. Something to do with the way they are generated, whatever that means.
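
For scale, a quick back-of-the-envelope sketch in Python. It takes the two rough figures above (about 50 million docs, about 100k/day) at face value; the 90-day target is purely hypothetical, picked only to show what rate a deadline would imply.

```python
import math

TOTAL_DOCS = 50_000_000   # approximate backlog quoted above
DOCS_PER_DAY = 100_000    # approximate current load rate quoted above


def days_to_finish(total_docs: int, docs_per_day: int) -> int:
    """Whole days needed to load total_docs at a steady docs_per_day."""
    return math.ceil(total_docs / docs_per_day)


def required_rate(total_docs: int, target_days: int) -> int:
    """Docs per day needed to finish the backlog within target_days."""
    return math.ceil(total_docs / target_days)


if __name__ == "__main__":
    print("days at current rate:", days_to_finish(TOTAL_DOCS, DOCS_PER_DAY))
    # 90 days is a hypothetical target, not something the client has asked for.
    print("docs/day for a 90-day target:", required_rate(TOTAL_DOCS, 90))
```

At the current rate that works out to about 500 days of loading, which is why the per-file index time matters so much here.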

Some numbers from the system load (from the existing production system giving me the "issue") are below. I noticed that index time is taking a lot longer than expected, and also longer than what I've seen when testing with a sample PPD file from IBM: a much larger file, with many more documents, took less time.

Code: [Select]
       

 Elapsed   IDX Time   Docs   Pages   Comp         In        Out
149.6544   122.1313   1000    6945   OD77   39392339   18658650
 84.2975    60.9946   1000    5975   OD77   38167441   18119565
 77.5499    50.2290    463    2348   OD77   15994287    4661084
106.8029    85.7542   1000    6281   OD77   38336344   18174458

Sample PPD Lab from IBM:
Code: [Select]
Application Group Load: Name(PPD) LoadId(5284-3-0-12FAA-20120123000000-20120123000000-5285) File(file_with_ppd.pdf) InputSize(111236081) OutputSize(50530342) Rows(6444) Time(56.6456) Appl(PPD) InputFileSize(163878229)
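
To put the two sets of numbers side by side, here is a rough Python sketch that works out, per load, how much of the elapsed time went to indexing and how many docs per second came out the other end. The figures are copied from the output above. Two things are assumptions rather than confirmed facts: that Rows(6444) in the IBM sample message is comparable to the Docs column, and that Time(56.6456) covers the whole load including indexing.

```python
# (elapsed_s, index_s, docs, pages, bytes_in, bytes_out) from the production loads above
LOADS = [
    (149.6544, 122.1313, 1000, 6945, 39392339, 18658650),
    (84.2975, 60.9946, 1000, 5975, 38167441, 18119565),
    (77.5499, 50.2290, 463, 2348, 15994287, 4661084),
    (106.8029, 85.7542, 1000, 6281, 38336344, 18174458),
]

# IBM sample load message: Time(56.6456) Rows(6444) InputSize(111236081)
SAMPLE_TIME_S, SAMPLE_ROWS, SAMPLE_BYTES_IN = 56.6456, 6444, 111236081


def summarize(elapsed, index, docs, pages, bytes_in, bytes_out):
    """Per-load ratios: share of time spent indexing and throughput."""
    return {
        "index_share": index / elapsed,           # fraction of elapsed time in the indexer
        "docs_per_s": docs / elapsed,             # documents loaded per second
        "mb_in_per_s": bytes_in / elapsed / 1e6,  # input megabytes per second
    }


if __name__ == "__main__":
    for row in LOADS:
        s = summarize(*row)
        print(f"index {s['index_share']:.0%}  "
              f"{s['docs_per_s']:.1f} docs/s  {s['mb_in_per_s']:.2f} MB/s")
    # Assumes Rows is comparable to Docs (not confirmed).
    print(f"sample: {SAMPLE_ROWS / SAMPLE_TIME_S:.1f} rows/s  "
          f"{SAMPLE_BYTES_IN / SAMPLE_TIME_S / 1e6:.2f} MB/s")
```

On these four loads indexing accounts for well over half of the elapsed time in every case, so the indexer step looks like the place to start rather than the database insert or the storage write.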
What should I be looking at on this remote server, where the PDF Indexer is being used? One of the first things I noticed was the ulimits. I compared them to another CMOD system that uses ACIF and noticed a difference. Unfortunately we are in the middle of a freeze right now, so I can't touch them, and I can't lay down the 10.5 binaries either.

System presenting the slow loading:
Code: [Select]
Data(unlimited KB) Stack(8192 KB) Core(0 512-blocks) Cpu(unlimited seconds) File(unlimited 512-blocks) Nofiles(16384) Threads(0) Processes(4096)
Existing CMOD system using ACIF, zero issues ever loading:
Code: [Select]
Data(unlimited KB) Stack(unlimited KB) Core(0 512-blocks) Cpu(unlimited seconds) File(unlimited 512-blocks) Nofiles(1000000) Threads(0) Processes(unlimited)
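
The differences are easy to miss by eye, so here is a small Python sketch that parses the two limit lines above and prints only the settings that differ. The `Name(value unit)` pattern is inferred from these two lines alone, and the function names are made up for illustration.

```python
import re

SLOW = ("Data(unlimited KB) Stack(8192 KB) Core(0 512-blocks) Cpu(unlimited seconds) "
        "File(unlimited 512-blocks) Nofiles(16384) Threads(0) Processes(4096)")
ACIF = ("Data(unlimited KB) Stack(unlimited KB) Core(0 512-blocks) Cpu(unlimited seconds) "
        "File(unlimited 512-blocks) Nofiles(1000000) Threads(0) Processes(unlimited)")


def parse_limits(line):
    """Turn 'Name(value unit) ...' into {'Name': 'value unit'}."""
    return dict(re.findall(r"(\w+)\(([^)]*)\)", line))


def diff_limits(a, b):
    """Return {name: (value_in_a, value_in_b)} for every limit that differs."""
    left, right = parse_limits(a), parse_limits(b)
    return {
        name: (value, right.get(name, "<missing>"))
        for name, value in left.items()
        if value != right.get(name)
    }


if __name__ == "__main__":
    for name, (slow, acif) in diff_limits(SLOW, ACIF).items():
        print(f"{name}: slow system = {slow}, ACIF system = {acif}")
```

Only Stack, Nofiles, and Processes differ between the two systems. Whether any of those actually affects the PDF Indexer is an open question; this only narrows down what to ask about.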
What kind of resources should this remote server be specced with? I'm wondering whether throwing more memory/CPU at it will speed things up. I also noticed that the load time itself seems long for such a small file, and the CMOD boxes and PDF Indexer boxes are in different data centers. I mentioned "possible" network latency, but the client didn't like that answer.