Topic: PDF Indexer with large PDF files

Steve Bechtolt

PDF Indexer with large PDF files
« on: August 14, 2018, 11:25:21 AM »
I was just wondering what the largest PDF file, in terms of number of pages, is that anyone has processed, and how long the indexing took for files of that size.
Steve Bechtolt
IBM Certified Solutions Expert - IBM Content Management - OnDemand Multiplatform
ERM as a Service - DXC Technology

Lars Bencze

Re: PDF Indexer with large PDF files
« Reply #1 on: August 23, 2018, 08:54:44 AM »
I see that you have not yet received an answer to this, so I'll give you a partial one.

When I look at one of the OD systems I manage, we use "PagePiece Info" PDF indexing a lot for the bigger loads. We also try to divide big loads into batches of 20,000 or 50,000 documents each.
I'm sorry, but we do not store a page count for our batches. A reasonable estimate would be somewhere around 1.5 to 2.5 pages per document on average.
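
As a rough illustration of the batching and page estimate described above, here is a minimal Python sketch. The function names `plan_batches` and `estimate_pages` are made up for this example and are not part of OnDemand or its PDF indexer. The batch sizes and the 1.5 to 2.5 pages-per-document range are simply the figures quoted in this post.

```python
def plan_batches(total_docs: int, batch_size: int = 20_000) -> list[int]:
    """Split a load of total_docs documents into batch sizes.

    Hypothetical helper: batch_size defaults to 20,000, one of the
    two batch sizes mentioned in the post (the other being 50,000).
    """
    if total_docs < 0 or batch_size <= 0:
        raise ValueError("total_docs must be >= 0 and batch_size > 0")
    full_batches, remainder = divmod(total_docs, batch_size)
    # Full batches first, then one smaller batch for any leftover documents.
    return [batch_size] * full_batches + ([remainder] if remainder else [])


def estimate_pages(docs: int, low: float = 1.5, high: float = 2.5) -> tuple[float, float]:
    """Return a (low, high) page-count estimate for docs documents.

    The default range is the poster's rough average, not a measured value.
    """
    return docs * low, docs * high


if __name__ == "__main__":
    print(plan_batches(130_000, 50_000))
    print(estimate_pages(20_000))
```

Under those assumptions, a 20,000-document batch would hold somewhere between 30,000 and 50,000 pages.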

The time needed to index these files depends on a lot of things. Here are some:
what type of PDF indexing you use :), how many fields you have defined, how complex the structure of the PDF file is (including compression, which should be avoided), the number of CPUs and their speed, the amount of RAM available, how fast your disks are (to read the large files), and so on.

With the fairly small OD servers we have, and around a dozen fields defined, the indexer seems to handle about 100 documents per second. So a 20,000-document batch takes about 200 seconds to index.
I am pretty sure that you can achieve much faster indexing rates with better hardware.
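
The arithmetic behind that estimate can be sketched in a few lines of Python. The function name `estimate_index_seconds` is hypothetical, and the default of 100 documents per second is only the rate observed on the small servers described in this post. Your own rate will differ with hardware, field count, and PDF structure.

```python
def estimate_index_seconds(docs: int, docs_per_second: float = 100.0) -> float:
    """Estimate indexing time for a batch, assuming a constant throughput.

    docs_per_second defaults to the roughly 100 docs/s quoted in the post.
    Treat it as a placeholder and replace it with a rate you have measured.
    """
    if docs < 0 or docs_per_second <= 0:
        raise ValueError("docs must be >= 0 and docs_per_second > 0")
    return docs / docs_per_second


if __name__ == "__main__":
    for batch in (20_000, 50_000):
        seconds = estimate_index_seconds(batch)
        print(f"{batch} documents: about {seconds:.0f} s ({seconds / 60:.1f} min)")
```

This assumes throughput stays flat as the batch grows, which may not hold for very large or heavily compressed files.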

I hope this gives you a hint, and that someone else can give you a better answer.
OnDemand for MP expert. #Multiplatforms #Admin #Scripts #Performance #Support #Architecture #PDFIndexing #TSM/SP #DB2 #CustomSolutions #Integration #UserExits #Migrations #Workflow #ECM #Cloud #ODApi