Topic: PDF indexer (with PPD) - multi-document PDF, how to set unique title for docs

adamosullivan

  • Newbie
  • Posts: 4
Hello,

We are using the PDF indexer 10.5 on a Windows Server, loading multi-document PDF files, using the Page-Piece Dictionary.

How can we set the PDF Title value to something unique for each individual document (originally in the multi-document PDF) when loaded into CMOD?

(To see the Title value I am referring to: in Adobe, right-click the document, select "Document Properties", and look at the "Title:" field on the first tab, "Description".)

This Title value appears to apply to the entire PDF (in our case, an entire multi-document PDF).  However, we would like each of these individual documents (originally from the multi-document PDF) to have unique title values when loaded into CMOD.

For example, suppose we have a multi-document PDF file (multi.pdf) containing individual documents doc-a, doc-b, and doc-c:

multi.pdf
  doc-a
  doc-b
  doc-c

We index / load multi.pdf, and then retrieve doc-a.  We would want the PDF title value for doc-a to be "Some Doc A title"; if we retrieve doc-b, its title should be "Different title for Doc B", and so on.

I've been reading through the documentation for the page-piece dictionary and indexing parameters, but have not come across much related to this scenario.

Thanks in advance!

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • Posts: 2229
  • CMOD Guru for hire...
  • Tenacious Consulting
Since the CMOD PDF Indexer doesn't modify these fields, whichever tool produces your PDFs would have to make this change.

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

adamosullivan

Thanks for the quick reply!  (and my apologies in advance if I am overlooking something -- I am not an expert on the PDF format)

In that case, if I have 100% control over the creation of the input multi-document PDF, how would this work -- what changes would I make?  As I understand it, there is only 1 title field for any PDF file (including the example multi-doc PDF file).

If that is correct, how would I store the unique titles for each individual document (e.g. doc-a & doc-b) contained within the multi-doc PDF file?  (I am not aware of support for this in the PDF standards, but I could have missed it).
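
For reference, the single-title behaviour follows from the file structure: the Title shown in Document Properties is an entry in the Info dictionary, and the file trailer points to that dictionary once via /Info. The sketch below is not part of CMOD or of any PDF library. It hand-builds a tiny one-page PDF with the Python standard library only, so the names build_minimal_pdf and escape_pdf_string and the object layout are illustrative assumptions. It only shows that giving doc-a and doc-b different titles means producing them as separate PDF files, each with its own Info dictionary.

```python
def escape_pdf_string(text: str) -> bytes:
    """Escape backslashes and parentheses for a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Assumption: plain ASCII titles, for which Latin-1 bytes are sufficient.
    return escaped.encode("latin-1")


def build_minimal_pdf(title: str) -> bytes:
    """Return a blank one-page PDF whose Info dictionary carries `title`."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",          # object 1: catalog
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",  # object 2: page tree
        b"<< /Type /Page /Parent 2 0 R /Resources << >> "
        b"/MediaBox [0 0 612 792] >>",                 # object 3: the page
        b"<< /Title (" + escape_pdf_string(title) + b") >>",  # object 4: Info
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, used by xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"          # free entry for object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    # The trailer references one Info dictionary for the whole file.
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 4 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)


if __name__ == "__main__":
    # One file per document, each with its own title.
    for name, title in [("doc-a", "Some Doc A title"),
                        ("doc-b", "Different title for Doc B")]:
        pdf = build_minimal_pdf(title)
        print(name, len(pdf), "bytes, /Info references:", pdf.count(b"/Info"))
```

Since a file has only the one /Info slot, a multi-document PDF cannot carry a separate Title per contained document this way.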

Ed_Arnold

  • Hero Member
  • Posts: 1200
Perhaps index by the PDF metadata?

If each document has a unique title it should be easy to set up.

Google it, and if you need more detail, ask here.

Ed
#zOS #ODF

adamosullivan

Ed -- thanks.  I have been looking at the index by PDF metadata option as well, but it appears we run into the same issue:  All PDF files (including the multi-doc PDF files we are processing) have only a single set of this metadata (from the PDF's Info Dictionary) for the entire file.

The IBM documentation for metadata indexing seems to confirm this:  "Because the metadata keywords apply to the entire document, you can index the document only as one group."   (https://www.ibm.com/support/knowledgecenter/SSQHWE_10.1.0/com.ibm.ondemand.ir.doc/dodix005.htm)


In my testing, I have found that after indexing/loading a multi-doc PDF file, when the individual documents are later retrieved, they each have an identical copy of the input multi-doc PDF file's metadata fields (e.g. Title, Author, Keywords, etc.).

However, our input multi-doc PDF files often contain multiple different types of documents (which, of course, have different titles).

Put another way, I can easily access the title string for each individual document using the PDF indexer, but I don't know how to put that string into the Title metadata field for the individual documents (or if that is even possible with the current PDF Indexer functionality).

Based on what Justin posted earlier, it sounds like the PDF indexer does not modify these metadata fields.  Perhaps setting these metadata fields (such as Title) on the individual documents is not possible with multi-document input PDF files?

Ed_Arnold

Adam -

Understood.

Not my strong area, but perhaps the generic indexer is going to have to be the way to go.

Yes, you'll have to figure out the beginnings and offsets of each individual doc.
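
A minimal sketch of what the generic indexer route could look like, assuming each document has already been split into its own PDF file by whatever tool produces the PDFs (so every group starts at offset 0, and a GROUP_LENGTH of 0 takes the rest of the file). The keyword layout follows the generic indexer parameter file format as described in the IBM documentation, so check it against your CMOD version. The field name doc_title, the code page 819, the file paths, and the helper names Document and build_generic_index are hypothetical and would have to match your application group.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Document:
    """One document (group) to load; names here are illustrative only."""
    path: str                 # file holding this document's data
    fields: Dict[str, str]    # application group field name -> index value
    offset: int = 0           # byte offset of the document within the file
    length: int = 0           # 0 = entire file / remainder of the file


def build_generic_index(docs: Iterable[Document], codepage: int = 819) -> str:
    """Return the text of a generic indexer parameter file for `docs`."""
    lines = [
        "COMMENT: generic indexer parameter file (sketch)",
        f"CODEPAGE:{codepage}",
    ]
    for doc in docs:
        for name, value in doc.fields.items():
            if "\n" in name or "\n" in value:
                raise ValueError("field names and values must be single-line")
            lines.append(f"GROUP_FIELD_NAME:{name}")
            lines.append(f"GROUP_FIELD_VALUE:{value}")
        lines.append(f"GROUP_OFFSET:{doc.offset}")
        lines.append(f"GROUP_LENGTH:{doc.length}")
        # GROUP_FILENAME closes the group.
        lines.append(f"GROUP_FILENAME:{doc.path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        Document("/tmp/doc-a.pdf", {"doc_title": "Some Doc A title"}),
        Document("/tmp/doc-b.pdf", {"doc_title": "Different title for Doc B"}),
    ]
    print(build_generic_index(docs), end="")
```

Note that this only stores the title as a CMOD index field. The Title inside each split PDF would still have to be set by the tool that produces the files.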

Ed
#zOS #ODF

Justin Derrick

Sorry to go all the way back to the original post...  But if you want the window (that a retrieved document is displayed in) to have a specific name related to the metadata, you can enable this checkbox:
 Folder Window -> Field Information Tab -> Select the field name -> Defaults Pane -> View Title.

Check the online help in the Admin Client for more info.

-JD.