Author Topic: Generic Indexer For MS-Word  (Read 4148 times)

SunnyManeeth

  • Guest
Generic Indexer For MS-Word
« on: June 09, 2014, 05:06:43 AM »
Hi Team,

I am working with the Generic Indexer and want to load an MS-Word document.

The Word document has around 100 pages, and I want to find the page length and page offset for the document. Is there any way to find these using the ars commands?

Thanks
Sunny :)

Frederick Tybalt

  • Full Member
  • ***
  • Posts: 124
Re: Generic Indexer For MS-Word
« Reply #1 on: June 09, 2014, 07:33:56 AM »
If you need to load the entire Word document, the offset will be 0 and the length will be the byte size of the Word file. If it is a segmented document, provide the offset and length of each segment accordingly.
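As a rough illustration of that arithmetic (a minimal sketch in Python, not a CMOD tool; the function names are made up for this example): the whole-file case is offset 0 with the file's byte size as the length, and for consecutive segments each offset is the running total of the segment sizes before it. The segment sizes have to come from whatever produced the segments; this sketch does not find page boundaries inside a Word file.

```python
import os
from itertools import accumulate


def whole_file_extent(path):
    """Return (offset, length) covering the entire file: offset 0, length = byte size."""
    return 0, os.path.getsize(path)


def segment_extents(segment_sizes):
    """Return (offset, length) pairs for consecutive segments stored in one file.

    Each segment starts where the previous one ended, so its offset is
    the sum of the sizes of all segments before it.
    """
    sizes = list(segment_sizes)
    offsets = accumulate([0] + sizes[:-1])
    return list(zip(offsets, sizes))
```

For example, `segment_extents([1000, 2500, 700])` gives `[(0, 1000), (1000, 2500), (3500, 700)]`.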
rIcK
======------------------======
www.rick.co.in | www.tekbytz.com

SunnyManeeth

  • Guest
Re: Generic Indexer For MS-Word
« Reply #2 on: June 10, 2014, 02:41:22 AM »
Thanks Frederick,

I thought of writing a Java utility that reads the document page by page and reports the length and offset of each page.

Is there any tool that reports the length and offset of a document's pages? We deal with a large number of pages in a single document.

Is there a recommended approach I can follow here? Could you suggest one?

Thanks
Sunny :)

Frederick Tybalt

  • Full Member
  • ***
  • Posts: 124
Re: Generic Indexer For MS-Word
« Reply #3 on: June 10, 2014, 07:36:10 AM »
Not sure if there is any utility that would split the document and calculate the sizes. If you know the ins and outs of Word docs, using Java would be the best option.

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2231
  • CMOD Guru for hire...
Re: Generic Indexer For MS-Word
« Reply #4 on: June 10, 2014, 08:10:08 AM »
Quote from SunnyManeeth: "I thought of writing a java utility so that i can read page by page and gives the length and offset of the page."

You can't just carve out a portion of a Word file, and expect Word to know what to do with it.  Consider converting the file to AFP or PDF first.

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

SunnyManeeth

  • Guest
Re: Generic Indexer For MS-Word
« Reply #5 on: June 11, 2014, 10:10:08 PM »
Thanks Frederick and Justin,

I'll look into this and hope to find the best approach.

Thanks
Sunny :)

LWagner

  • Guest
Re: Generic Indexer For MS-Word
« Reply #6 on: June 30, 2014, 03:18:23 PM »
Sunny:

How are your results?

If you converted the Word doc to PDF, you could then use arspdump to dump the text of the PDF so you can determine the location of any text you want to use for index values.

But you still won't be able to split the PDF, unless it is a CONTAINER type PDF with many PDF document files in it.

SunnyManeeth

  • Guest
Re: Generic Indexer For MS-Word
« Reply #7 on: July 02, 2014, 04:28:38 AM »
Hi LWagner,

I have tried this in two scenarios:
     (1) Converting the DOC to PDF, so that we can use report indexing to load the document into CMOD.
     (2) Splitting the document into separate pages and then loading them into CMOD using the Generic Indexer.
Both approaches give the same result, but it is better to convert the document to PDF and then load it into the system.
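For scenario (2), the index file could be generated with a small script. The following is a minimal sketch in Python, assuming the split pages already exist as separate files. The keyword names (CODEPAGE, GROUP_FIELD_NAME, GROUP_FIELD_VALUE, GROUP_OFFSET, GROUP_LENGTH, GROUP_FILENAME) are the Generic indexer parameters as I recall them, so please check them against the CMOD indexing reference. The field name `docname`, the code page 819 and the function name are assumptions made up for this example and must match your own application group.

```python
import os


def build_generic_index(paths, field_name="docname", codepage=819):
    """Build Generic indexer text with one group per input file.

    field_name and codepage are placeholders (assumptions); use the
    database field names and code page defined for your application group.
    """
    lines = [f"CODEPAGE:{codepage}"]
    for path in paths:
        # Use the file name without its extension as the index value.
        value = os.path.splitext(os.path.basename(path))[0]
        lines += [
            f"GROUP_FIELD_NAME:{field_name}",
            f"GROUP_FIELD_VALUE:{value}",
            "GROUP_OFFSET:0",                          # each file is loaded whole
            f"GROUP_LENGTH:{os.path.getsize(path)}",   # byte size of the file
            f"GROUP_FILENAME:{path}",
        ]
    return "\n".join(lines) + "\n"
```

The returned text would be written to the index file that is passed to the load together with the page files. Each group uses offset 0 and the full byte size because every split page is its own file.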

Thanks
Sunny :)