OnDemand User Group
Support Forums => Report Indexing => Topic started by: tjspencer2 on April 09, 2015, 07:24:46 AM
-
We are outsourcing the creation of statements and we're using PDF files and PDF Indexer.
My vendor provided me a file of PDF statements that they said had 3039 statements.
When I load the statements into CMOD via PDF Indexer, I only get 3038??
We are using "Page 1 of" as our trigger to uniquely identify the first page of a statement.
When I open their file in Adobe X and do an advanced search, I find 3039 instances of the phrase "Page 1 of".
But PDF Indexer only loads 3038 statements??
I've looked at this phrase across all 3039 statements but can only get 3038 to load.
Has anybody ever encountered this? How could I troubleshoot? I'm baffled!! :(
-
I've seen this before and it was a headache to troubleshoot.
You mentioned that you've looked at this phrase across all 3039 statements, but just out of curiosity: tucked away in this document, could there be a statement with two pages? In the case I saw, the to: address had extra lines.
Examples:
This would load as one page/statement, working as expected:
Jeff S
1234 Main St
Anytown CA, 90210
Something like either of these would cause the statement to spill over onto the next page:
Jeff S
CMOD Person
1234 Main St
Anytown CA, 92010
Jeff S
1234 Main St
Suite 234
Anytown CA, 92010
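One way to hunt for a statement that spilled onto an extra page is to compare the page count each statement declares ("Page 1 of N") with the number of pages that actually follow before the next "Page 1 of". The sketch below is only an illustration: it assumes you have already extracted the text of each page into a list of strings (for example, by splitting `pdftotext` output on form feeds), and the names `find_suspect_statements` and `page_texts` are made up for this example. It looks at the extracted text, not at what PDF Indexer itself sees, so treat the results as a list of places to look rather than a diagnosis.

```python
import re

# Matches "Page <n> of <m>"; assumes the statements use this exact wording.
PAGE_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")

def find_suspect_statements(page_texts):
    """Return (start_page, declared_pages, actual_pages) for every statement
    whose declared page count differs from the pages that follow it.

    page_texts: list of strings, one per PDF page (hypothetical input).
    start_page is 1-based so it can be looked up in a PDF viewer.
    """
    # Collect the pages that start a statement and the total each one declares.
    starts = []
    for i, text in enumerate(page_texts):
        m = PAGE_RE.search(text)
        if m and int(m.group(1)) == 1:
            starts.append((i, int(m.group(2))))

    suspects = []
    for n, (start, declared) in enumerate(starts):
        # A statement runs until the next "Page 1 of" page (or the end of file).
        end = starts[n + 1][0] if n + 1 < len(starts) else len(page_texts)
        actual = end - start
        if actual != declared:
            suspects.append((start + 1, declared, actual))
    return suspects
```

Any statement it reports is worth opening in the viewer: a one-page statement followed by an unnumbered overflow page would show up as declared 1, actual 2.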
-
I think there's something to the first statement rolling over onto the first page of the duplicate statement, and the indexer not interpreting the first page of the second statement as a new statement.
Just so I'm straight on uniqueness: there's nothing ensuring uniqueness of statements, right?
The statement that isn't loading is completely identical to the one that precedes it.
-
Depending on your version of CMOD, uniqueness may be enforced automatically.
-
So uniqueness isn't our issue, as there are some accounts for which we create two statements in CMOD.
What's happening is that there are a couple of accounts for which we generate the same statement multiple times and create each copy in CMOD.
For some instances of these, the "Page 1 of" trigger isn't being interpreted as the beginning of a new statement but instead as the continuation of an existing statement.
In our 167,000 statements this happens 5 times, and the result is that these 5 statements are combined with the statement ahead of them :(
Is there a way to analyze the PDF file to see control characters that may not be visible when viewing it? Is it even possible for a control character to be in a PDF file and not be visible?
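For anyone wanting to check for invisible characters around the trigger, here is a rough sketch of one approach. It assumes the page text has already been extracted from the PDF with some other tool, and `reveal_hidden` is just a name chosen for this example. It makes control characters, zero-width/format characters, and unusual spaces (such as a no-break space inside "Page 1 of") visible. The extracted text is not necessarily what PDF Indexer sees, so this can only hint at a difference between the good and bad pages.

```python
import unicodedata

def reveal_hidden(text):
    """Return text with invisible characters replaced by <U+XXXX NAME> tags.

    Flags Unicode control characters (Cc), format characters such as
    zero-width spaces (Cf), and any space separator other than a plain
    ASCII space (Zs), e.g. a no-break space.
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat in ("Cc", "Cf") or (cat == "Zs" and ch != " "):
            # Control characters have no Unicode name, so fall back to a label.
            name = unicodedata.name(ch, "UNNAMED")
            out.append(f"<U+{ord(ch):04X} {name}>")
        else:
            out.append(ch)
    return "".join(out)
```

Running it on the "Page 1 of" line from a statement that loads correctly and from one that gets merged, then comparing the two outputs, would show whether the trigger text really is identical in both.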
-
Is it possible for you to share the indexing script/parameter information that you used?