Author Topic: Indexing a .txt file  (Read 8792 times)

DDP021

  • Sr. Member
  • ****
  • Posts: 343
    • View Profile
Indexing a .txt file
« on: March 27, 2013, 03:16:37 AM »
Does anyone know how to index a .txt or .csv file when using the Generic indexer? We know the senders need to supply a .ind file containing the indexing information. Most of our reports currently come from the mainframe and have carriage control F1 (i.e. a 1 in column 1) at each top of page. We set up the indexer based on triggers, fields, and indexes, e.g. TRIGGER1=*,1,X'F1',(TYPE=GROUP). I'm not sure how to accomplish that when there are no set page breaks, as with a .txt, .csv, or .xls file. Currently the only thing the users are overriding in the index file is POSTING DATE. I guess I'm asking whether you can somehow define a top of page, or break, in these kinds of files.

CODEPAGE:1252
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:09/07/09
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload4/XTS0730A.XTS00001_0000005903.ARD.OUT

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2230
  • CMOD Guru for hire...
    • View Profile
    • Tenacious Consulting
Re: Indexing a .txt file
« Reply #1 on: March 27, 2013, 03:59:40 AM »
You can use ACIF for text files; however, you're right that a regular text file may be missing a lot of the features you're accustomed to finding in a mainframe report.

You may still be able to trigger on things that appear in the file...  "Page 1 ", or a 'Top-Of-Form' character (Ctrl-L), etc.

It might make more sense to write your own indexer in the scripting language of your choice.  Perl was designed for exactly this kind of text extraction and reporting.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

DDP021

Re: Indexing a .txt file
« Reply #2 on: March 27, 2013, 07:44:47 AM »
Thanks Derek, as always! Yep, we currently use ACIF for most of the new mainframe reports we set up on CMOD. I was just wondering if there was a way to translate how ACIF works (with triggers, fields, and indexes) to "sort" or "index" the data using the .ind file. I know they can hard-code specific fields (e.g. the date) in the .ind file they send to us, but that is simple since the value is always the same per version. Currently anyone sending these types of files is happy with just 'dumping' their file, opening it in CMOD with the appropriate program (e.g. Excel, Word), and then doing a FIND on what they are looking for. I'm just trying to be proactive in case we get a user who wants to send these types of files and asks if they can be indexed. I'm in no way a scripting-language person, haha. Thanks again, take care!

pankaj.puranik

  • Sr. Member
  • ****
  • Posts: 374
    • View Profile
Re: Indexing a .txt file
« Reply #3 on: March 28, 2013, 07:15:24 AM »
Pay special attention to the extra space at the end of  "Page 1 ", or you would end up breaking the report on "Page 11"  :)
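A minimal sketch of why the trailing space matters, written in Python rather than as an ACIF trigger. The function name `find_breaks` and the sample report text are made up for illustration; the code only looks for a trigger string at the byte level and is not how ACIF itself is implemented.

```python
def find_breaks(data: bytes, trigger: bytes) -> list[int]:
    """Return the byte offset of every line that contains the trigger."""
    breaks = []
    offset = 0
    # bytes.splitlines breaks only on \n, \r and \r\n, so a form feed
    # (Ctrl-L) stays inside its line and can itself be used as a trigger.
    for line in data.splitlines(keepends=True):
        if trigger in line:
            breaks.append(offset)
        offset += len(line)
    return breaks


# Hypothetical report text: two reports, the second one has a page 11.
sample = (
    b"Report A   Page 1 \nline\n"
    b"Report A   Page 2 \nline\n"
    b"Report B   Page 1 \nline\n"
    b"Report B   Page 11 \nline\n"
)

strict = find_breaks(sample, b"Page 1 ")  # with the trailing space
loose = find_breaks(sample, b"Page 1")    # without it, "Page 11" also matches

print("with trailing space:   ", strict)
print("without trailing space:", loose)
```

Passing `b"\f"` as the trigger would break on the Top-Of-Form character instead, if the file contains one.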

DDP021

Re: Indexing a .txt file
« Reply #4 on: April 30, 2013, 04:43:56 AM »
Here is an attachment showing the .ind file and data file (.txt) the user is sending to our server. They are sending multiple data (.txt) files to load at one time, along with a trigger file. Below is a screenshot of part of the index (.ind) file they send. As you can see, they are overriding the Posting Date for each file. They are asking if we can index these files on some type of criteria (not sure what that would be) so that when they bring up the loaded regions in CMOD, they can search on this field to bring up any associated version that matches. Right now we are just "dumping" each of these files into ERR, which allows them to do a "find" on anything but limits them to the version they are viewing at the time. We have these reports set up in CMOD to open in Word. Without any "page breaks" in these files, we don't think it's even possible to do any kind of indexing. Is that correct? Another monkey wrench to throw into this process is that the data is in German. ;-)

I've also added another attachment showing how the versions show up in CMOD...

CODEPAGE:1252
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:12/31/11
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload4/YAW55001_111231_0086_0000000186.TXT
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:12/16/11
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload4/YAW55001_111216_0045_0000006845.TXT
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:12/16/11
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload4/YAW55001_111216_0045_0000012145.TXT
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:12/16/11
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload4/YAW55001_111216_0045_0000004745.TXT
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:12/16/11
GROUP_OFFSET:0
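A rough sketch of how an index file in this one-group-per-file layout could be generated by a script. It assumes the six digits after the first underscore in each file name are the posting date as YYMMDD, which matches the sample above (111231 and 12/31/11) but is only an inference. The names `posting_date` and `index_entries` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: file names look like YAW55001_YYMMDD_nnnn_nnnnnnnnnn.TXT
NAME_PATTERN = re.compile(r"_(\d{2})(\d{2})(\d{2})_")


def posting_date(filename: str) -> str:
    """Turn the assumed YYMMDD part of the file name into MM/DD/YY."""
    match = NAME_PATTERN.search(PurePosixPath(filename).name)
    if match is None:
        raise ValueError(f"no date found in {filename!r}")
    yy, mm, dd = match.groups()
    return f"{mm}/{dd}/{yy}"


def index_entries(filenames: list[str], codepage: str = "1252") -> str:
    """Build Generic indexer text with one group per input file."""
    lines = [f"CODEPAGE:{codepage}"]
    for name in filenames:
        lines += [
            "GROUP_FIELD_NAME:Posting_Date",
            f"GROUP_FIELD_VALUE:{posting_date(name)}",
            # Offset 0 and length 0, as in the sample index file above.
            "GROUP_OFFSET:0",
            "GROUP_LENGTH:0",
            f"GROUP_FILENAME:{name}",
        ]
    return "\n".join(lines) + "\n"


files = [
    "/prod/ode/cmod/arsload4/YAW55001_111231_0086_0000000186.TXT",
    "/prod/ode/cmod/arsload4/YAW55001_111216_0045_0000006845.TXT",
]
index_text = index_entries(files)
print(index_text)
```

Any further index field would have to come from the file name or from the file contents in the same way, since the Generic indexer only loads the values it is given.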

DDP021

Re: Indexing a .txt file
« Reply #5 on: April 30, 2013, 05:23:49 AM »
To add: is there a way to do a "find" across multiple versions? Currently, when they do a find, they first have to open a version to view it and then do the find. Is it possible to search across multiple versions at once?

Justin Derrick

Re: Indexing a .txt file
« Reply #6 on: April 30, 2013, 06:42:07 AM »
You can add server side search to the folder if you think the users won't abuse it. 

DDP021

Re: Indexing a .txt file
« Reply #7 on: May 02, 2013, 04:24:28 AM »
Hmmm... Well, I received a personal email from someone (see below). They mention something about server side also. How would this be accomplished, and what is the risk of doing something like this? I know you mentioned users not "abusing" it. ;-)

Hi David,

I have seen the German reports. Do you need some support in Germany / Europe?

Regarding the reports you want to combine so that you can run a single search: I think the simplest way is to collect them, e.g. into one daily line-data report, and index the combined report with the ACIF indexer. The Generic indexer would work as well. This can also be done on the server side before loading the data.

regards Egon

Greg Ira

  • Full Member
  • ***
  • Posts: 240
    • View Profile
Re: Indexing a .txt file
« Reply #8 on: May 02, 2013, 07:46:20 AM »
When defining the folder you can create a field of type "Text Search".  This allows the user to search for non-indexed text in every document when entering their search criteria, and returns any document that contains the requested text.  The problem is that it basically does a full tablespace scan every time it runs, which can drive CPU use through the roof and kill response times.

Justin Derrick

Re: Indexing a .txt file
« Reply #9 on: May 02, 2013, 08:01:10 AM »
Yes, it would be possible to combine the reports you want to search into a single one, then load it, but that would duplicate the data, leading to a higher cost of ownership.

-JD.

ewirtz

  • Full Member
  • ***
  • Posts: 134
    • View Profile
Re: Indexing a .txt file
« Reply #10 on: May 02, 2013, 08:06:21 AM »
Hi David,
the idea is to concatenate the files, which is easy if they are line data records. With this technique you can index, e.g., with the Generic indexer (excerpt from the OnDemand documentation):


COMMENT: One input file contains all documents
COMMENT:
COMMENT: Specify code page of the index data
CODEPAGE:500
COMMENT: Document #1
GROUP_FIELD_NAME:rdate
GROUP_FIELD_VALUE:07/13/99
GROUP_FIELD_NAME:studentID
GROUP_FIELD_VALUE:0012345678
COMMENT: first document starts at beginning of file (byte 0)
GROUP_OFFSET:0
COMMENT: document length 8124 bytes
GROUP_LENGTH:8124
GROUP_FILENAME:ARS.ACCT.STUDENT.INF.LOAD.OUTPUT

COMMENT: Document #2
GROUP_FIELD_NAME:rdate
GROUP_FIELD_VALUE:08/13/99
GROUP_FIELD_NAME:studentID
GROUP_FIELD_VALUE:0012345678
COMMENT: second document starts at byte 8124
GROUP_OFFSET:8124
COMMENT: document length 8124 bytes
GROUP_LENGTH:8124
COMMENT: use prior GROUP_FILENAME:
GROUP_FILENAME:

COMMENT: Document #3
GROUP_FIELD_NAME:rdate
GROUP_FIELD_VALUE:09/13/99
GROUP_FIELD_NAME:studentID
GROUP_FIELD_VALUE:0012345678
COMMENT: third document starts at byte 16248
GROUP_OFFSET:16248
COMMENT: use prior GROUP_FILENAME:
GROUP_FILENAME:

This concatenation could be performed while creating the documents on the client side, or on the server side before loading the documents.

regards Egon

DDP021

Re: Indexing a .txt file
« Reply #11 on: May 07, 2013, 03:18:58 AM »
Thanks Egon. The user has over 200 Applications they need set up. The number of "loads" (i.e. files) for each varies from fewer than 10 to over 4000. The reports also vary in retention. We've defined an Application Group for each required retention period and load the appropriate file(s) under that Application Group depending on their retention need. We started with their 1-year retention reports. Most of them had over 200 "versions", so we had them send the files in groups of 50: they would send 50 data files and then their index file (see sample index file below) along with the trigger. I'm not sure how they would concatenate the files. From your example, it appears they would send only one data file combining all the files, and then index by the group offset where each file begins. For one, I'm not sure how they determine the GROUP_OFFSET number. How is it found? Secondly, I don't know if it's possible or feasible for them to combine ALL their files into one, especially when, like I said, some Applications have more than 4000 files. My guess is that, for this to work, all the combined files need to be sent at the same time for each Application. Is that correct? Thanks again for all the input!

CODEPAGE:1252
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:09/07/09
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload3/XTS0730A.XTS00001.1
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:11/08/10
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload3/XTS0730A.XTS00001.2
GROUP_FIELD_NAME:Posting_Date
GROUP_FIELD_VALUE:12/09/08
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:/prod/ode/cmod/arsload3/XTS0730A.XTS00001.3

ewirtz

Re: Indexing a .txt file
« Reply #12 on: May 16, 2013, 01:29:59 AM »
Hi David,

The concatenation is a simple technical task, which can be done on the server side.

example:
doc1 1000 Bytes ==> offset 0
doc2 2000 Bytes ==> offset 1000
(after concatenation; each offset is the total length of the documents before it)

You need the concatenation to fulfill the business needs.

Of course, only those documents that should be found with one search should be concatenated.
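The offset arithmetic can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `concatenate_and_index`, the temporary file names, and the choice of Posting_Date as the only field are hypothetical, and the parameter layout is copied from the index files quoted earlier in the thread.

```python
import tempfile
from pathlib import Path


def concatenate_and_index(parts, combined_path, field_name="Posting_Date",
                          codepage="1252"):
    """Concatenate the input files and return matching Generic indexer text.

    parts is a list of (source_path, field_value) pairs.
    """
    lines = [f"CODEPAGE:{codepage}"]
    offset = 0  # running total of bytes written so far
    with open(combined_path, "wb") as out:
        for source_path, field_value in parts:
            data = Path(source_path).read_bytes()
            out.write(data)
            lines += [
                f"GROUP_FIELD_NAME:{field_name}",
                f"GROUP_FIELD_VALUE:{field_value}",
                f"GROUP_OFFSET:{offset}",
                f"GROUP_LENGTH:{len(data)}",
                f"GROUP_FILENAME:{combined_path}",
            ]
            offset += len(data)
    return "\n".join(lines) + "\n"


# Reproduce the example above: doc1 is 1000 bytes, doc2 is 2000 bytes.
with tempfile.TemporaryDirectory() as tmp:
    doc1 = Path(tmp, "doc1.txt")
    doc2 = Path(tmp, "doc2.txt")
    doc1.write_bytes(b"a" * 1000)
    doc2.write_bytes(b"b" * 2000)
    combined = Path(tmp, "combined.txt")
    index_text = concatenate_and_index(
        [(doc1, "12/31/11"), (doc2, "12/16/11")], combined
    )
    combined_size = combined.stat().st_size

print(index_text)
```

Offsets are counted in bytes of the file as it will be loaded, so the files should be read and written in binary mode, as here, to avoid any newline or encoding conversion changing the lengths.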

regards

Egon