Topic: Generic indexer

pankaj.puranik

  • Guest
Generic indexer
« on: April 13, 2011, 09:05:22 AM »
Hi

If I have multiple documents in the input file, then I have to specify GROUP_OFFSET and GROUP_LENGTH.
Suppose I have an input file containing multiple Word documents: how can I find the values of GROUP_OFFSET and GROUP_LENGTH for each one?

Thanks
Pankaj.

Alessandro Perucchi

  • Global Moderator
  • Hero Member
  • Posts: 1002
Re: Generic indexer
« Reply #1 on: April 13, 2011, 01:04:02 PM »
Hello Pankaj,

If you have one index file and several separate Word documents, your index file might look like this:

Code:
CODEPAGE:923

COMMENT:DOCUMENT 1
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word1.doc

COMMENT:DOCUMENT 2
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word2.doc

COMMENT:DOCUMENT 3
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word3.doc

COMMENT:DOCUMENT 4
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word4.doc
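With many separate files, writing these entries by hand gets tedious. Below is a minimal Python sketch that generates entries in the same layout as the sample above from a list of file names. The field names and values are placeholders, as in the sample, and `index_entry` and `build_index` are hypothetical helpers, not part of CMOD. Following the sample, it assumes that GROUP_OFFSET:0 with GROUP_LENGTH:0 tells the indexer to load the whole file; check that against the Generic indexer documentation for your release.

```python
"""Build Generic indexer entries for documents stored as separate files."""


def index_entry(comment: str, fields: list[tuple[str, str]], filename: str,
                offset: int = 0, length: int = 0) -> str:
    """Return one index group in the layout of the sample.

    The defaults offset=0 and length=0 mirror the sample's entries for
    separate files (assumed to mean "the whole file").
    """
    lines = [f"COMMENT:{comment}"]
    for name, value in fields:
        lines.append(f"GROUP_FIELD_NAME:{name}")
        lines.append(f"GROUP_FIELD_VALUE:{value}")
    lines.append(f"GROUP_OFFSET:{offset}")
    lines.append(f"GROUP_LENGTH:{length}")
    lines.append(f"GROUP_FILENAME:{filename}")
    return "\n".join(lines)


def build_index(files: list[str], fields: list[tuple[str, str]],
                codepage: int = 923) -> str:
    """Return a whole index file: a CODEPAGE line, then one group per file."""
    groups = [index_entry(f"DOCUMENT {i}", fields, name)
              for i, name in enumerate(files, start=1)]
    return "\n\n".join([f"CODEPAGE:{codepage}"] + groups) + "\n"


if __name__ == "__main__":
    # Placeholder field names and values, as in the sample.
    fields = [("field1", "value1"), ("field4", "value4")]
    print(build_index(["word1.doc", "word2.doc"], fields), end="")
```

Running it prints a CODEPAGE:923 line followed by one DOCUMENT group per file, each with GROUP_OFFSET:0 and GROUP_LENGTH:0.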

If, on the other hand, you have all the Word files concatenated together, then you must know the offset and length of each file inside the concatenated file.

Code:
CODEPAGE:923

COMMENT:DOCUMENT 1
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:1000
GROUP_FILENAME:wordsingle.concat

COMMENT:DOCUMENT 2
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:1001
GROUP_LENGTH:1203
GROUP_FILENAME:wordsingle.concat

COMMENT:DOCUMENT 3
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:2205
GROUP_LENGTH:800
GROUP_FILENAME:wordsingle.concat

COMMENT:DOCUMENT 4
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:3006
GROUP_LENGTH:997
GROUP_FILENAME:wordsingle.concat
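If you build the concatenated file yourself, the easiest way to get these values is to record them while concatenating. The Python sketch below is an illustration under stated assumptions, not a CMOD tool: `concatenate` is a hypothetical helper that copies each input file into one output file and records the byte position where each file starts (as reported by `tell()`, which counts from 0) and the file's size in bytes. With that counting, each start position is simply the sum of the preceding lengths; whether GROUP_OFFSET should be exactly that value, or 1 higher per preceding document as suggested in Reply #2, is an assumption to verify against the Generic indexer documentation.

```python
"""Concatenate files and record where each one starts and how long it is."""
import shutil
import tempfile
from pathlib import Path


def concatenate(inputs: list[Path], output: Path) -> list[tuple[str, int, int]]:
    """Copy every input into output, in order.

    Returns (file name, start, length) per input, where start is the
    zero-based byte position in output at which that file's bytes begin.
    """
    layout = []
    with output.open("wb") as out:
        for path in inputs:
            start = out.tell()
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)
            layout.append((path.name, start, out.tell() - start))
    return layout


if __name__ == "__main__":
    # Demo: three throwaway files of 1000, 1203 and 800 bytes
    # (hypothetical sizes borrowed from the sample).
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        inputs = []
        for i, size in enumerate([1000, 1203, 800], start=1):
            path = base / f"word{i}.doc"
            path.write_bytes(b"x" * size)
            inputs.append(path)
        for name, start, length in concatenate(inputs, base / "wordsingle.concat"):
            print(f"{name}: start={start} length={length}")
```

For the demo sizes it prints start positions 0, 1000 and 2203.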

But if you don't have the offsets and lengths, then you must ask the people who provided you with this file, or you need to know exactly how a Word file is structured and locate each document with suitable tools.
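For the "find it with some tools" route, one heuristic applies only to legacy binary .doc files (Word 97-2003), which are OLE compound files beginning with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The Python sketch below is a heuristic, not a Word parser: the hypothetical `candidate_groups` function scans a concatenated file for that signature, treats each hit as a candidate document start, and takes each length as the distance to the next hit (or to the end of the file). It does not work for .docx files, which are ZIP archives; bytes before the first hit are ignored; the whole file is read into memory; and the signature can also occur inside a document's own data, so every candidate must be checked, for example by extracting it and opening it in Word. Positions are zero-based.

```python
"""Heuristic: locate candidate .doc (OLE compound file) starts in a blob."""
import sys
from pathlib import Path

# Signature at the start of every OLE compound file (legacy .doc, .xls, ...).
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def candidate_groups(data: bytes) -> list[tuple[int, int]]:
    """Return (start, length) for each signature hit, zero-based.

    Each length runs up to the next hit, or to the end of data.
    """
    starts = []
    pos = data.find(OLE_SIGNATURE)
    while pos != -1:
        starts.append(pos)
        pos = data.find(OLE_SIGNATURE, pos + 1)
    ends = starts[1:] + [len(data)]
    return [(s, e - s) for s, e in zip(starts, ends)]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_docs.py wordsingle.concat
    for start, length in candidate_groups(Path(sys.argv[1]).read_bytes()):
        print(f"candidate start={start} length={length}")
```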

Cheers,
Alessandro
« Last Edit: April 16, 2011, 02:29:53 AM by AlessandroPerucchi »
Alessandro Perucchi

#Install #Migrations #Conversion #Educate #Repair #Upgrade #Migrate #Enhance #Optimize #AIX #Linux #Multiplatforms #DB2 #Windows #Oracle #TSM #Tivoli #Performance #Audits #Customizing #Availability #HA #DR #JavaApi #ContentNavigator #ICN #WEBi #ODWEK #Services #PDF #AFP #XML

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • Posts: 2231
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: Generic indexer
« Reply #2 on: April 15, 2011, 06:23:53 AM »
Minor correction, Alessandro:

In your second sample, the first GROUP_LENGTH is 1000, so the next GROUP_OFFSET needs to be incremented by 1, i.e. 1001. You've got this mistake throughout your example.

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

Alessandro Perucchi

  • Global Moderator
  • Hero Member
  • Posts: 1002
Re: Generic indexer
« Reply #3 on: April 16, 2011, 02:30:32 AM »
Hello Justin,

Thanks, I've corrected the example!

Cheers,
Alessandro