OnDemand User Group

Support Forums => Report Indexing => Topic started by: Douglas on August 08, 2018, 02:02:40 PM

Title: PDF Page Piece info - help
Post by: Douglas on August 08, 2018, 02:02:40 PM: Hello

I am getting 1000 PDFs and an index file. I can successfully store the 1000 PDFs using the Generic Indexer; it is quick and dirty. (1000 PDFs, totaling 182 MB, store in 26 seconds and compress to 116 MB.)

The PDFs also have the Page Piece Info. So when I run a test using the PDF indexer on a batch of 100 documents compressed into 1 PDF file (totaling 14 MB), it stores in 73 seconds and compresses to 18 MB.

For a batch of 250 documents compressed into 1 PDF file (input PDF is 35 MB), it stores in 512 seconds and compresses to 99 MB.

Why are my PDFs drastically increasing their size after they are stored when I am using Page Piece Info?
And why are they taking so long to store?

                          Before    After     Time (s)
1000 PDFs                 182 MB    116 MB    26
1 PDF (100 documents)     14 MB     18 MB     73
1 PDF (250 documents)     35 MB     99 MB     512
Title: Re: PDF Page Piece info - help
Post by: Nolan on August 08, 2018, 05:28:26 PM: If I remember what Bud said, you should not have compression enabled. PDF is already compressed so try turning it off and run your tests again.
Title: Re: PDF Page Piece info - help
Post by: Justin Derrick on August 09, 2018, 03:52:55 AM: I think the compression that Bud was talking about is the internal PDF compression, not the data & resource compression settings in the Application Definition.

I'd be opening a PMR with IBM if my files tripled in size during a load. :)

-JD.
Title: Re: PDF Page Piece info - help
Post by: Douglas on August 09, 2018, 05:26:04 AM: The PDFs are not compressed; they are just merged together.

I merge them to:

1) treat the load as 1 load instead of multiple loads
2) hopefully maximize resource stripping and compression when CMOD stores the PDF
Title: Re: PDF Page Piece info - help
Post by: Nolan on August 09, 2018, 06:06:51 AM: I still think you need to try running the load with Compression turned off in the Load tab of the Application id. We also had that issue of increasing storage size but that was back on Z/OS.
Title: Re: PDF Page Piece info - help
Post by: Douglas on August 09, 2018, 07:33:44 AM: I did run load tests with DOC compression set to None, Disable, and others.

I am running on ZOS.... 9.5.0.1

Comp type   Input        Output       Rows
OD77        104964644    98569326     250
OD77        19441637     18070032     100
None        602067       602675       10
Disable     602067       602676       10
Disable     19441645     19445485     100
LZW12       19441649     25353227     100
LZW16       19441634     24814181     100
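
To make the comparison easier to read, the relative size change for each run can be worked out from the numbers above. This is only a small sketch: the figures are copied from the table, and the helper name size_change_percent is made up for illustration (it is not part of CMOD).

```python
# (comp type, input size, output size, rows), copied from the table above.
results = [
    ("OD77", 104964644, 98569326, 250),
    ("OD77", 19441637, 18070032, 100),
    ("None", 602067, 602675, 10),
    ("Disable", 602067, 602676, 10),
    ("Disable", 19441645, 19445485, 100),
    ("LZW12", 19441649, 25353227, 100),
    ("LZW16", 19441634, 24814181, 100),
]


def size_change_percent(input_size: int, output_size: int) -> float:
    """Hypothetical helper: positive means the stored data grew."""
    return (output_size - input_size) / input_size * 100.0


for comp_type, input_size, output_size, rows in results:
    change = size_change_percent(input_size, output_size)
    print(f"{comp_type:<8} {rows:>4} rows  {change:+7.2f}%")
```

A negative percentage means the load shrank the data; a positive one means it grew.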
Title: Re: PDF Page Piece info - help
Post by: Ed_Arnold on August 09, 2018, 09:09:39 AM: Quote from: Douglas on August 09, 2018, 07:33:44 AM

Comp type   Input          Output        Rows
OD77        104,964,644    98,569,326    250
OD77        19,441,637     18,070,032    100
None        602,067        602,675       10
Disable     602,067        602,676       10
Disable     19,441,645     19,445,485    100
LZW12       19,441,649     25,353,227    100
LZW16       19,441,634     24,814,181    100

Those are the types of numbers I'd expect to see if the input was already compressed.

Ed
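
The general effect is easy to reproduce outside CMOD. The sketch below uses Python's zlib purely as a stand-in (PDF streams are commonly Flate-compressed, but this is not the OD77 or LZW code CMOD uses, and the sample data is made up): compressing text once helps a lot, while compressing the already-compressed bytes again gains little or nothing.

```python
import random
import zlib

# Build some compressible sample text from a small, made-up vocabulary.
rng = random.Random(0)  # fixed seed so the run is repeatable
vocabulary = [
    "invoice", "account", "balance", "customer", "statement", "page",
    "total", "amount", "date", "number", "address", "payment",
    "summary", "detail", "credit", "debit",
]
text = " ".join(rng.choice(vocabulary) for _ in range(50000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass: plain text compresses well
twice = zlib.compress(once, 9)  # second pass: input is already compressed

print("original bytes:        ", len(text))
print("compressed once:       ", len(once))
print("compressed a 2nd time: ", len(twice))
```

The second pass has almost no redundancy left to remove, so its output stays close to (or slightly above) the size of its input.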
Title: Re: PDF Page Piece info - help
Post by: Nolan on August 09, 2018, 09:30:34 AM: Two comments. You should patch up to 9.5.0.10 and I am certain that PDF indexing on Z/OS is not supported at V9.5. You should be running the PDF indexing on a Windows or Linux box and pushing it in to the Z/OS server.
Title: Re: PDF Page Piece info - help
Post by: Douglas on August 09, 2018, 11:50:55 AM: I will take a look at v9.5.0.10

I am loading from Windows -> to ZOS
Title: Re: PDF Page Piece info - help
Post by: Nolan on August 13, 2018, 12:18:01 PM: Rereading your original post makes me wonder how you are 'merging' your PDFs into one file. It sounds like the tool you are using is not really merging but just concatenating the PDFs. I suspect that is what is throwing the indexer off. I haven't used such tools so I can't be sure, but I believe the recommendation from Bud is that the source application should create 1 large PDF document with the PPD details to allow for the maximum compression.
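
One rough way to test that suspicion is to count the "%PDF-" headers in the merged file. A properly merged PDF normally has a single header at the start, while files that were simply appended to each other each keep their own. This is a heuristic sketch only: the function names are made up, and a PDF with another PDF embedded uncompressed inside it could also trip the check.

```python
import re

# Matches a PDF version header such as %PDF-1.4 or %PDF-1.7.
PDF_HEADER = re.compile(rb"%PDF-\d\.\d")


def count_pdf_headers(data: bytes) -> int:
    """Count PDF version headers found anywhere in the raw bytes."""
    return len(PDF_HEADER.findall(data))


def looks_concatenated(data: bytes) -> bool:
    """Heuristic: more than one header suggests files were appended."""
    return count_pdf_headers(data) > 1


# Tiny synthetic stand-ins, not real PDF documents.
single = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n%%EOF\n"
appended = single + single

print(looks_concatenated(single))
print(looks_concatenated(appended))
```

To check a real file, read it in binary mode, for example open(path, "rb").read(), and pass the bytes to looks_concatenated.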