Author Topic: Generic Indexer For MS-Word (Read 4148 times)

SunnyManeeth · « **on:** June 09, 2014, 05:06:43 AM »

Hi Team,

I am working on the Generic indexer and want to load an MS Word document.

The Word document has around 100 pages, and I want to find the page length and page offset of each page in the document. Is there any option to find these using ARS commands?

Thanks
Sunny

Frederick Tybalt · « **Reply #1 on:** June 09, 2014, 07:33:56 AM »

If you need to load the entire Word document, the offset will be 0 and the length will be the byte size of the Word file. If it is a segmented document, provide the offset and length of each segment accordingly.
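The rule above can be written as simple arithmetic. For a single document the offset is 0 and the length is the file size. If several complete documents were stored back to back in one input file, each segment's offset is the sum of the lengths of the segments before it. Below is a minimal Java sketch under those assumptions (Java 16+ for the record). The class and method names are hypothetical and not part of any OnDemand API. It does not find page boundaries inside a Word file; it only works out offsets for segments whose sizes you already know.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an OnDemand API): computes GROUP_OFFSET / GROUP_LENGTH
// pairs for Generic indexer input. Assumes each segment is a complete, standalone
// document whose byte length is already known.
public class SegmentOffsets {

    /** Where a segment starts in the input file and how many bytes it spans. */
    public record Segment(long offset, long length) {}

    /** Whole-file case: offset 0, length equal to the file size in bytes. */
    public static Segment wholeFile(Path file) throws IOException {
        return new Segment(0, Files.size(file));
    }

    /** Concatenated case: each offset is the running total of the earlier lengths. */
    public static List<Segment> fromLengths(long[] lengths) {
        List<Segment> segments = new ArrayList<>();
        long offset = 0;
        for (long length : lengths) {
            if (length < 0) {
                throw new IllegalArgumentException("negative length: " + length);
            }
            segments.add(new Segment(offset, length));
            offset += length;
        }
        return segments;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical byte sizes of three documents stored back to back in one file.
        for (Segment s : fromLengths(new long[] {40_960, 12_288, 73_728})) {
            System.out.println("GROUP_OFFSET:" + s.offset() + "  GROUP_LENGTH:" + s.length());
        }
        // Optionally pass a file path to see the whole-file values.
        if (args.length > 0) {
            Segment whole = wholeFile(Path.of(args[0]));
            System.out.println("GROUP_OFFSET:" + whole.offset() + "  GROUP_LENGTH:" + whole.length());
        }
    }
}
```

With the sample sizes, this prints offsets 0, 40960 and 53248.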

SunnyManeeth · « **Reply #2 on:** June 10, 2014, 02:41:22 AM »

Thanks Frederick,

I thought of writing a Java utility that reads the document page by page and gives the length and offset of each page.

Is there any tool that reports the length and offset of the document's pages? We deal with many pages in a single document.

Are there any best practices I could follow? Can you suggest something?

Thanks
Sunny

Frederick Tybalt · « **Reply #3 on:** June 10, 2014, 07:36:10 AM »

I'm not sure there is any utility that would split the document and calculate the sizes. If you know the ins and outs of Word docs, using Java would be the best option.

Justin Derrick · « **Reply #4 on:** June 10, 2014, 08:10:08 AM »

Quote from: Sunny on June 10, 2014, 02:41:22 AM

I thought of writing a Java utility that reads the document page by page and gives the length and offset of each page.

You can't just carve out a portion of a Word file and expect Word to know what to do with it. Consider converting the file to AFP or PDF first.

-JD.

SunnyManeeth · « **Reply #5 on:** June 11, 2014, 10:10:08 PM »

Thanks Frederick and Justin,

I'll look into this and hope to find the best approach.

Thanks
Sunny

LWagner · « **Reply #6 on:** June 30, 2014, 03:18:23 PM »

Sunny:

How are your results?

If you converted the Word doc to PDF, you could then use arspdump to dump the text of the PDF so you can determine the location of any text you want to use for index values.

But you still won't be able to split the PDF unless it is a CONTAINER type PDF with many PDF document files in it.

SunnyManeeth · « **Reply #7 on:** July 02, 2014, 04:28:38 AM »

Hi LWagner,

I have done this in two ways:
(1) Converting the document to PDF, so that we can use report indexing to load it into CMOD.
(2) Splitting the document into separate pages and then loading them into CMOD using the Generic indexer.
Both approaches work, but it's better to convert the document to PDF and then load it into the system.
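Approach (2) only works if each page has been saved as its own complete file. As Reply #4 points out, a byte range cut out of a .doc file is not a valid document. Each file then becomes one Generic indexer group with offset 0 and length equal to its size. The Java sketch below writes such a parameter file. It uses the Generic indexer parameter-file keywords CODEPAGE, GROUP_FIELD_NAME, GROUP_FIELD_VALUE, GROUP_OFFSET, GROUP_LENGTH and GROUP_FILENAME. The field names `docname` and `page`, code page 819, and the class name are assumptions. Check them against your application group and the OnDemand indexing documentation before use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds a Generic indexer parameter file in which every
// pre-split page file is loaded as its own document (offset 0, length = file size).
// Assumptions: field names "docname" and "page" exist in the application group,
// and code page 819 suits the index values.
public class GenericIndexWriter {

    /** Returns the parameter-file text for the given page files, numbering pages from 1. */
    public static String build(String docName, List<Path> pageFiles) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append("CODEPAGE:819\n");
        int page = 1;
        for (Path file : pageFiles) {
            // Index values for this group (one name/value pair per field).
            sb.append("GROUP_FIELD_NAME:docname\n");
            sb.append("GROUP_FIELD_VALUE:").append(docName).append('\n');
            sb.append("GROUP_FIELD_NAME:page\n");
            sb.append("GROUP_FIELD_VALUE:").append(page++).append('\n');
            // The whole file is one document, so it starts at byte 0.
            sb.append("GROUP_OFFSET:0\n");
            sb.append("GROUP_LENGTH:").append(Files.size(file)).append('\n');
            sb.append("GROUP_FILENAME:").append(file.toAbsolutePath()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java GenericIndexWriter <docName> <pageFile>...");
            return;
        }
        List<Path> files = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            files.add(Path.of(args[i]));
        }
        System.out.print(build(args[0], files));
    }
}
```

For example, `java GenericIndexWriter MyReport page1.doc page2.doc` prints one group per page file, and the output can be redirected into the parameter file you load.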

Thanks
Sunny

OnDemand User Group
