OnDemand User Group

Support Forums => Report Indexing => Topic started by: Steve Bechtolt on February 11, 2020, 12:18:47 PM

Title: PDF Indexing - Page Piece Dictionary
Post by: Steve Bechtolt on February 11, 2020, 12:18:47 PM
Just curious whether anyone has compared timings between normal PDF indexing (triggers and fields) and CMOD indexes in the PDF Page Piece Dictionary?

We have some very large PDF files (500,000 to 1,000,000+ pages) that currently take about 3 to 4 hours to index and load.
Title: Re: PDF Indexing - Page Piece Dictionary
Post by: Justin Derrick on February 11, 2020, 12:51:08 PM
I haven't tested it myself, but I've been told it's extremely fast -- similar to loading AFP data that are pre-indexed with TLEs. I'll see if I can get Bud to chime in with some real-world numbers. :)

-JD.
Title: Re: PDF Indexing - Page Piece Dictionary
Post by: jsquizz on February 12, 2020, 08:13:34 AM
I've seen Bud demo PPDs several times at workshops. He ran it on a VM on his laptop and it was very, very quick.

Assuming a RHEL box with plenty of CPU and memory, it will FLY. Unfortunately, I haven't had the chance to play with it myself.
Title: Re: PDF Indexing - Page Piece Dictionary
Post by: fnb4321 on February 12, 2020, 09:06:49 AM
I have done some timings. We used to have some medium-sized credit card PDF files.

They took about 40 minutes to load using the PDF Indexer with X/Y coordinates.

We had the vendor convert them to PPD files, and files of the same size now take less than 5 minutes to load.

We noticed a HUGE difference loading PPD files as opposed to using X/Y coordinates.