Topic: PDF Indexing - Page Piece Dictionary

Steve Bechtolt

« on: February 11, 2020, 12:18:47 PM »
Just curious whether anyone has compared load timings between normal PDF indexing (triggers and fields) and CMOD indexes stored in the PDF Page Piece Dictionary?

We have some very large PDF files (500,000 to 1,000,000+ pages) that currently take about 3-4 hours to index and load.

Justin Derrick

« Reply #1 on: February 11, 2020, 12:51:08 PM »
I haven't tested it myself, but I've been told it's extremely fast, similar to loading AFP data that is pre-indexed with TLEs.  I'll see if I can get Bud to chime in with some real-world numbers.  :)

-JD.

jsquizz

« Reply #2 on: February 12, 2020, 08:13:34 AM »
I've seen Bud demo PPDs several times at workshops. He ran it on a VM on his laptop and it was very, very quick.

On a RHEL box with plenty of CPU and memory, I'd expect it to FLY. Unfortunately, I haven't had the chance to play with it myself.

fnb4321

« Reply #3 on: February 12, 2020, 09:06:49 AM »
I have done some timings.  We used to have some medium-sized credit card PDF files.

They took about 40 minutes to load using the PDF Indexer with X/Y coordinates.

We had the vendor convert them to PPD files, and files of the same size now take less than 5 minutes to load.

We noticed a HUGE difference loading PPD files as opposed to using X/Y coordinates.
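
For a rough sense of scale, the timings quoted in this thread can be plugged into a quick calculation. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes (without any evidence from this thread) that the speedup seen on the medium-sized credit card files would carry over unchanged to the 500,000 to 1,000,000+ page files. Real results will depend on the data, the hardware, and the load configuration.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the 'after' load is than the 'before' load."""
    if after_minutes <= 0:
        raise ValueError("after_minutes must be positive")
    return before_minutes / after_minutes


def projected_minutes(current_hours: float, factor: float) -> float:
    """Scale a current load time (in hours) by a speedup factor; result in minutes."""
    return current_hours * 60 / factor


# Timings reported above: about 40 minutes with X/Y coordinates,
# less than 5 minutes with PPD files, so 8x is a lower bound.
factor = speedup(40, 5)

# ASSUMPTION: the same ratio applies to the 3-4 hour loads from the first post.
low = projected_minutes(3, factor)
high = projected_minutes(4, factor)

print(f"speedup: at least {factor:.0f}x")
print(f"projected load time: roughly {low:.1f} to {high:.1f} minutes")
```

Because the PPD load was reported as "less than 5 minutes", the 8x figure is a floor rather than a measurement, and the projection is only as good as the assumption that the ratio scales.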