« on: October 08, 2015, 07:27:37 AM »
Hi,
We have now implemented an Application that actually uses the "newest" way to index PDF files, namely using PagePiece info.
Environment is Windows 2008, DB2 9.7 and CMOD Server 9.0.0.6. TSM is on a separate W2K8 server.
We have successfully delivered several batches/files of 20000 documents to OnDemand, and so far no problems.
(Well, to be fair, it was nothing BUT trouble on CMOD 9.0.0.3, we had to install the 9.0.0.6 fix pack to remedy that.)
As this is our first document type that uses PagePiece indexing, I would say we are still beginners, but it works like a charm!
We are currently looking into some performance issues.
1. Indexing takes a surprisingly long time. I suspect this is because the PDF is still slightly compressed, although the creator has tried to switch compression off. Also, the PDF files are "linearized" after they are created. (The performance is not bad at all, but it takes nearly 4 minutes to index where I was expecting around 0.5 to 1 minute.)
2. One of the main benefits of using PDF indexing is the resource collection. So far, however, we have been unable to make OnDemand recognize that these are the same resources in each batch - it says "New" on every load.
At first, we had subsets of (customized) character sets/fonts made for every batch, but when we turned that off and sent the full set(s) with every batch, it still did not catch it - every file is deemed to have its own unique set of PDF resources.
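Both suspicions can be checked roughly outside OnDemand with a small script along the lines of the sketch below. It is not part of CMOD and does not reproduce how the PDF indexer compares resources; it is only a crude byte-level scan, and it assumes that identical resources would show up as byte-identical streams in the files. The function names are made up for illustration.

```python
import hashlib
import re

# Crude match for stream bodies; does not parse the PDF object structure
# and ignores the /Length entry, so treat the results as a hint only.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def pdf_summary(data: bytes) -> dict:
    """Report whether the file looks linearized and how much is Flate-compressed."""
    return {
        # The linearization dictionary sits within the first 1024 bytes.
        "linearized": b"/Linearized" in data[:1024],
        # Number of times the Flate filter name occurs in the file.
        "flate_filters": data.count(b"/FlateDecode"),
        "streams": len(STREAM_RE.findall(data)),
    }


def stream_digests(data: bytes) -> set:
    """SHA-256 digest of every stream body found in the file."""
    return {hashlib.sha256(body).hexdigest() for body in STREAM_RE.findall(data)}


def shared_streams(first: bytes, second: bytes) -> int:
    """Count stream bodies that are byte-identical in both files."""
    return len(stream_digests(first) & stream_digests(second))
```

To use it, read two batch files in binary mode, for example `open(path, "rb").read()`, and pass the bytes to `pdf_summary` and `shared_streams`. If `flate_filters` is well above zero, compression is still active somewhere. If `shared_streams` is zero for two batches that should carry the same fonts, the embedded resources are not byte-identical, which might explain why each load is reported as "New".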
Has anyone already solved this type of tuning problem?
I would be happy to hear your info on how to tune this solution to perfection. In return, I will be happy to share our experiences of this!
I will try to attach an image to this post. The lines highlighted in blue have full font sets and "no PDF compression". The lines below that use compression and have subsetted font sets.