Author Topic: Issues opening PDF files above 800 MB (Read 3555 times)

bruce.mchendry · « **on:** December 03, 2015, 10:37:08 AM »

Hi,
We are still on MP ver 9.0.0.3 and, due to varied input streams, we use the "generic" PDF indexer. We have an issue opening files that are over 800 MB in size. I do have a PMR running on this, but I'm thinking about exploring a couple of different angles. Using the desktop client, the large files give an error that simply says
"Connection Can Not Be Established with xxxx Server". If we try in ICN, it generates the error " File Does Not Begin with "%PDF-\" Local\EWH-10048-0". Interestingly, this all pertains to PDF files, and if I grab the PDF manually it opens fine in Adobe Reader (and we are set up for it to do this in ICN).
Yesterday a user contacted me because, while opening a large PDF in the desktop thick client, he got an OnDemand error " \plugins\ARSPDF.API could not be loaded". What did get my interest is that this user, and others, have both Adobe Reader and Acrobat Pro on their machines, and for security reasons Pro just got updated to ver 11 (XI), which includes Distiller XI.
I'm wondering if anyone else has seen these errors or has had any experience with issues with users who have both Adobe Reader and Acrobat Pro installed? It's a long shot, but I know we've seen issues in the past with PDFs and users who have both. I'm working with the desktop team to see whether this is consistent across all the users who access these large PDFs or whether it's a coincidence. I'd appreciate any guidance or suggestions.
Cheers,
Bruce

Alessandro Perucchi · « **Reply #1 on:** December 03, 2015, 08:52:09 PM »

Hello,

I don't know if the problem is more on the server side, or the client side.
You said that you could open the PDF document after getting it manually. How did you get this PDF? From CMOD? If yes, how did you extract it: via the CMOD Windows Client, with "arsdoc get", or via another method?
If not from CMOD, it means that you still have the original file.

If the "arsdoc get" command was used to retrieve the document and it opens fine, then the document is "valid" in CMOD, and the problem is more likely on the client side.
If the document is not valid when retrieved with "arsdoc get", then the problem is probably on the server side.
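
A quick way to sanity-check a retrieved file, independent of any viewer, is to look at its first and last bytes. Here is a minimal sketch in Python, assuming the file has already been extracted to disk by whatever method you used. The function name and the 1024-byte tail window are my own choices for illustration and are not part of CMOD. It tests the same thing the ICN error message complains about (the file must begin with "%PDF-") and also looks for the "%%EOF" marker near the end, which would catch a truncated download. It only reads a few bytes, so it is fine for 800 MB files.

```python
import sys
from pathlib import Path

PDF_HEADER = b"%PDF-"   # a PDF file starts with this marker
PDF_TRAILER = b"%%EOF"  # and normally ends with this one


def check_pdf_markers(path, tail_bytes=1024):
    """Return (has_header, has_trailer) for the file at `path`.

    has_header  - the very first bytes are "%PDF-"
    has_trailer - "%%EOF" appears in the last `tail_bytes` bytes
    """
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as fh:
        head = fh.read(len(PDF_HEADER))
        # Jump close to the end instead of reading the whole file.
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()
    return head == PDF_HEADER, PDF_TRAILER in tail


if __name__ == "__main__" and len(sys.argv) > 1:
    header_ok, trailer_ok = check_pdf_markers(sys.argv[1])
    print("begins with %PDF-:", header_ok)
    print("ends with %%EOF  :", trailer_ok)
```

If both checks pass on the extracted file, the stored document is probably intact and the client side becomes the more likely suspect. If the header check fails, whatever was retrieved is not the PDF itself (for example, an error page or a partial object).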

I've checked the corrections done in the client / server / ODWEK since V9.0.0.3, and I have found that:

Server side corrections: (source http://www-01.ibm.com/support/docview.wss?uid=swg27046038&aid=1)
PI07517 - PDF INDEXER AND ARSPDOCI ABEND
PI10297 - Arsload can crash when indexing PDF files larger than 150MB
PI38617 - Error Code 536936462 when running PDF Indexer

ODWEK (for ICN) (source http://www-01.ibm.com/support/docview.wss?uid=swg27046038&aid=3)
I have found nothing relevant here...

Windows Client side (source for V9.0.0.4 and V9.5.0.0 http://www-01.ibm.com/support/docview.wss?uid=swg27044156&aid=5)
I have found nothing relevant here....

Windows Client side (source for V9.5.0.3 http://www-01.ibm.com/support/docview.wss?uid=swg27046127&aid=3)
I have found nothing relevant here....

From what I can see, the only relevant corrections are on the server side, for the PDF Indexer.
Whether this is related to your problem, I have no idea; maybe it is not.
And maybe the problem is more on the client side.

In that case, if your suspicion is correct, would it be possible to test viewing a document on a workstation that has ONLY Acrobat Reader, and then on one that has ONLY Acrobat Pro?

In my past experience, it was always a bad idea to have both Acrobat Reader and Acrobat Pro installed on the same system because of conflicts between the 2 products. CMOD was always using the wrong one; in particular, if Acrobat Reader was open, the Windows Client would not use Acrobat Pro... or things like that.
Now, that was more than 5 years ago... since then I have never heard anything on that subject. So maybe today this is not a problem anymore, or maybe it still is...
But from memory, the error messages were different from what you get. So I am not sure that the problem I had and the one you have are related.

Hope that my rambling gives you some hints...

bruce.mchendry · « **Reply #2 on:** December 06, 2015, 03:19:28 PM »

Thanks for the reply and information, I have some more reading to do.
Cheers

bruce.mchendry · « **Reply #3 on:** January 07, 2016, 06:26:34 AM »

As this is still an ongoing issue, I have had a PMR running with IBM for a bit. I got this in an email last night and will be working on it shortly. Just thought I'd share for reference. I'm also preparing to hit the dev environment by the end of the month to start migrating to ver 9.5, and I have been reading that there are improvements to the handling of PDF files in ver 9.5.

Here's an excerpt from the email :
REMOVERES (page 297)
Indicates whether or not to remove unused resources before the indexer collects resources and creates the indexes.
The input file is examined and a new copy is saved in the Content Manager OnDemand temporary directory. This new copy is then used for processing, and the original input file is not changed. You can change the location of the temporary directory by specifying the PDF parameter TEMPDIR. Ensure that the temporary directory has enough space to hold the file. If a file contains many unused resources, you can greatly reduce the size of the resource file and speed up the indexing process by using this parameter. If a file does not contain any unused resources, then do not specify this parameter. You can use this parameter without resource collection.
Tip: Because this parameter rewrites the input file, it can be used to repair minor syntax errors in the PDF.
Required? No
Default Value: NO
Syntax: REMOVERES=value
Options and values
The value can be one of the following:
YES - The unused resources are removed before the indexer collects resources (if requested) and creates the indexes.
NO - The unused resources are not removed before the indexer collects resources (if requested) and creates the indexes.
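
The excerpt asks you to make sure the temporary directory can hold the rewritten copy before turning on REMOVERES. With input files of 800 MB and more, that is worth checking up front. Below is a minimal Python sketch of such a check. The function name and the `headroom` factor are hypothetical, and it assumes the copy is no larger than the input file times that factor; the real size of the copy depends on the indexer.

```python
import os
import shutil


def temp_dir_has_room(input_file, temp_dir, headroom=1.0):
    """Return True if `temp_dir` has enough free space for a copy of `input_file`.

    `headroom` is a safety multiplier (an assumption, not a CMOD setting):
    1.0 means "at least the size of the input file", 1.5 adds 50% margin.
    """
    needed = int(os.path.getsize(input_file) * headroom)
    free = shutil.disk_usage(temp_dir).free  # free bytes on that file system
    return free >= needed
```

You would call it with the load file and the directory you plan to point TEMPDIR at, for example `temp_dir_has_room(pdf_path, tempdir_path, headroom=1.5)`, and skip or postpone the load when it returns False.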

Justin Derrick · « **Reply #4 on:** January 07, 2016, 08:23:22 AM »

I don't think I've ever seen an 800MB PDF. I have a morbid curiosity about what's inside that PDF that makes it so large.

If you open up the PDF File with Adobe Acrobat (the full version), you should have the option to 'optimize' it, and one of the buttons will give you a breakdown as to where the storage is allocated (images, fonts, data, etc.).

Would you mind sharing the breakdown of that file with us?

bruce.mchendry · « **Reply #5 on:** January 07, 2016, 09:19:54 AM »

Hi,
I will try this out and see what happens. A lot of these docs are considered personal-confidential, so I'll have to see whether I can share the result or not. Thanks for the suggestion, let's see what I get. BTW, the doc / PDF has almost 13,000 pages.
Cheers

Alessandro Perucchi · « **Reply #6 on:** January 08, 2016, 05:36:04 AM »

Quote from: JBNC on January 07, 2016, 08:23:22 AM

I don't think I've ever seen an 800MB PDF. I have a morbid curiosity about what's inside that PDF that makes it so large.

In some industries a PDF can be bigger than 2 GB, and that's normal... so 800 MB is not surprising.
If you have, like Bruce said, almost 13,000 pages, and some of the pages contain images/graphics, etc., then the file will quickly grow to huge proportions.

That's why, personally, I don't care about the breakdown at all.
What could be interesting is the topology of the PDF: a map of which resource is used where, which image is used where, etc. That would give you a good idea of how to optimize it, or where the problems might be.
But a table that only says "images 50%, data 20%, fonts 30%"... well, except for satisfying the "morbid curiosity", that doesn't help to solve any problems!

Cheers, and happy new year!!!!!!!!!!!

OnDemand User Group
