Topic: PDF Indexing - Page Piece Dictionary

sisusteve

  • Guest
PDF Indexing - Page Piece Dictionary
« on: February 11, 2020, 12:18:47 PM »
Just curious: has anyone compared load times between normal PDF indexing (triggers and fields) and CMOD indexes stored in the PDF Page Piece Dictionary?

We have some very large PDF files (500,000 to 1,000,000+ pages) that currently take about 3-4 hours to index and load.

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • Posts: 2231
  • CMOD Guru for hire...
Re: PDF Indexing - Page Piece Dictionary
« Reply #1 on: February 11, 2020, 12:51:08 PM »
I haven't tested it myself, but I've been told it's extremely fast, similar to loading AFP data that are pre-indexed with TLEs. I'll see if I can get Bud to chime in with some real-world numbers.  :)

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

jsquizz

  • Global Moderator
  • Hero Member
  • Posts: 577
Re: PDF Indexing - Page Piece Dictionary
« Reply #2 on: February 12, 2020, 08:13:34 AM »
I've seen Bud demo PPDs several times at workshops. He ran it on a VM on his laptop, and it was very, very quick.

Assuming it's running on a RHEL box with plenty of CPU and memory, it will FLY. Unfortunately, I haven't had the chance to play with it myself.

#CMOD #DB2 #AFP2PDF #TSM #AIX #RHEL #AWS #AZURE #GCP #EVERYTHING

fnb4321

  • Guest
Re: PDF Indexing - Page Piece Dictionary
« Reply #3 on: February 12, 2020, 09:06:49 AM »
I have done some timings. We used to have some medium-sized credit card PDF files.

They took about 40 minutes to load using the PDF Indexer with X/Y coordinates.

We had the vendor convert them to PPD files, and files of the same size take less than 5 minutes to load.

We noticed a HUGE difference loading PPD files as opposed to indexing with X/Y coordinates.