Author Topic: New Line Character  (Read 8731 times)

Gobi21

  • Guest
New Line Character
« on: July 19, 2013, 06:55:50 AM »
Hi All,

One of the fields in one of our application groups is defined as "string". In the input file, the source team has added a newline character in that field. When we ingested the files into CMOD, we received the error message "Invalid Generic format".

Is this caused by the newline character in the field? Does CMOD normally accept a newline character in a field value?

Your help is very much appreciated. Thanks.

pankaj.puranik

  • Guest
Re: New Line Character
« Reply #1 on: July 24, 2013, 11:11:49 AM »
If you are using the Generic indexer, you should not be having this issue.
Could you give more details about the problem?

Gobi21

  • Guest
Re: New Line Character
« Reply #2 on: July 25, 2013, 12:39:52 AM »
Hi Pankaj,

It is using the Generic indexer only.
Thanks for the response. Below is a snapshot of the split record in the IND file used by arsload. CMOD gets stuck while reading this record.

<Key><KeyName>COUNTERPRTY_NAME</KeyName>
<KeyValue><![CDATA[legal_
name0000001]]></KeyValue></Key>
<Key><KeyName>ENTITY_ID</KeyName>
<KeyValue></KeyValue></Key>


Let me know if you need any other detail.

Alessandro Perucchi

  • Global Moderator
  • Hero Member
  • Posts: 1002
Re: New Line Character
« Reply #3 on: July 26, 2013, 08:18:59 AM »
Hello Gobi21,

Hmm, what is this format that you are using?

A normal Generic index file looks like this:

Code: [Select]
CODEPAGE:1208
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:fieldN
GROUP_FIELD_VALUE:valueN
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:MyFileWithPath

If you have some other format, then it won't work at all.
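
As a rough illustration of that layout, here is a minimal Python sketch that renders an index file in the shape shown above. The `Document` class and the `render_generic_index` function are hypothetical names made up for this example; they are not part of CMOD or arsload. The sketch assumes every field value must fit on a single line, so it refuses values that contain a line break rather than guessing how to repair them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical container for one group (one document) in the index file."""
    fields: Dict[str, str]  # application group field name -> index value
    offset: int             # GROUP_OFFSET
    length: int             # GROUP_LENGTH
    filename: str           # GROUP_FILENAME


def render_generic_index(documents: List[Document], codepage: int = 1208) -> str:
    """Render the documents as index text, one statement per line."""
    lines = [f"CODEPAGE:{codepage}"]
    for number, doc in enumerate(documents, start=1):
        for name, value in doc.fields.items():
            # Assumption: a value may not span lines, so reject it up front.
            if "\n" in value or "\r" in value:
                raise ValueError(
                    f"document {number}: field {name!r} contains a line break"
                )
            lines.append(f"GROUP_FIELD_NAME:{name}")
            lines.append(f"GROUP_FIELD_VALUE:{value}")
        lines.append(f"GROUP_OFFSET:{doc.offset}")
        lines.append(f"GROUP_LENGTH:{doc.length}")
        lines.append(f"GROUP_FILENAME:{doc.filename}")
    return "\n".join(lines) + "\n"
```

Raising an error instead of silently stripping the line break leaves the decision about the correct value with whoever produces the data.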

And to use the generic index format, you need to call "arsload" with your index file WITHOUT the .ind extension, otherwise you will have problems :-D

Could you be more specific about what exactly you are doing? From the glimpse of information you gave us, something looks wrong in the way you are indexing documents.

Sincerely yours,
Alessandro
Alessandro Perucchi

#Install #Migrations #Conversion #Educate #Repair #Upgrade #Migrate #Enhance #Optimize #AIX #Linux #Multiplatforms #DB2 #Windows #Oracle #TSM #Tivoli #Performance #Audits #Customizing #Availability #HA #DR #JavaApi #ContentNavigator #ICN #WEBi #ODWEK #Services #PDF #AFP #XML

Gobi21

  • Guest
Re: New Line Character
« Reply #4 on: July 29, 2013, 02:14:51 AM »
Hi Alessandro,

Thanks for your reply. Here are the details.

I am not passing the .IND in the arsload command.

Below are the IND file details, including the split line that causes the processing issue.
IND File Format:
------------------
CODEPAGE:1208
COMMENT:------- Document 1 -------
GROUP_FIELD_NAME:COUNTERPRTY_NAME
GROUP_FIELD_VALUE:legal_name0000001
.
.
.
GROUP_OFFSET:10
GROUP_LENGTH:3870
GROUP_FILENAME:EGZATEST.TEST.DN.20130609.S001.V002.U002.Data.txt


Record in Issue:
-------------------
COMMENT:------- Document 7 -------
GROUP_FIELD_NAME:COUNTERPRTY_NAME
GROUP_FIELD_VALUE:Testi_
name6248320
GROUP_FIELD_NAME:ENTITY_ID
GROUP_FIELD_VALUE:
.
.
.

In the CMOD Admin, I see the indexing set as "Generic" for this application.

The input file we received from the application team has the lines split as shown above, so the IND file is created in the same format. I want to know whether arsload can process the IND file when it contains split lines like these (note that the GROUP_FIELD_VALUE value spans two lines instead of one). Is this causing the issue in CMOD processing?

The documents before the split one are ingested successfully, and arsload threw "Invalid generic index file format: >name6248320<" while processing the split document. I can see the Load ID created in the log file for this run.


Thanks again for your support.

Regards,
Gobi


Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • Posts: 2231
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: New Line Character
« Reply #5 on: July 29, 2013, 04:27:36 AM »
Either the source system needs to fix the index values (and remove newlines), or you need to fix the generic index after you receive it.

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

Alessandro Perucchi

Re: New Line Character
« Reply #6 on: July 29, 2013, 04:39:42 AM »
Hello Gobi21,

As Justin said, those are the only ways to handle this kind of generic index.

Newlines like these inside a value are not supported in the Generic index format.

Sincerely yours,
Alessandro

Gobi21

  • Guest
Re: New Line Character
« Reply #7 on: July 29, 2013, 04:41:48 AM »
I will work on correcting the index file after it is created.

Thanks a lot Alessandro and Justin for your inputs.

Cheers,
Gobi

Gobi21

  • Guest
Re: New Line Character
« Reply #8 on: July 29, 2013, 07:09:59 AM »
Hi Justin,

Is this documented anywhere in the CMOD documentation? The customer is requesting a reference from the CMOD documentation as evidence.

Please let me know if you have this detail.

Thanks,
Gobi


Justin Derrick

Re: New Line Character
« Reply #9 on: July 29, 2013, 07:50:41 AM »
The documentation can't anticipate all the ways in which bad data will prevent data from being loaded.  You can check the Indexing Reference for more information on the Generic Index Format.

The error message is proof that the file is broken.  Correcting the issue, then performing a successful load confirms that the newline in the middle of a field is the cause.

Good luck!

-JD.

pankaj.puranik

  • Guest
Re: New Line Character
« Reply #10 on: July 29, 2013, 01:12:33 PM »
I think you can ask the customer why they are inserting this newline character.
It doesn't make any sense. It clearly indicates a malfunction while they are producing the file.

GROUP_FIELD_VALUE:Testi_
name6248320
GROUP_FIELD_NAME:ENTITY_ID

It should have been

GROUP_FIELD_VALUE:Testi_name6248320
GROUP_FIELD_NAME:ENTITY_ID

You can manually correct the IND file, ingest it, and show them the result as proof.
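
If the file is too large to fix by hand, the same correction can be scripted. The following is only a sketch under stated assumptions: `join_split_values` is a hypothetical helper, the keyword list is limited to the statements that appear in this thread, and the continuation text is assumed never to start with one of those keywords. It joins the pieces without a separator, matching the corrected example above.

```python
# Statement keywords seen in this thread; check the Indexing Reference
# for the authoritative list before relying on this.
KEYWORDS = (
    "CODEPAGE:",
    "COMMENT:",
    "GROUP_FIELD_NAME:",
    "GROUP_FIELD_VALUE:",
    "GROUP_OFFSET:",
    "GROUP_LENGTH:",
    "GROUP_FILENAME:",
)


def join_split_values(text: str) -> str:
    """Append stray continuation lines to the GROUP_FIELD_VALUE line before them."""
    repaired = []
    for line in text.splitlines():
        if line.startswith(KEYWORDS):
            repaired.append(line)
        elif repaired and repaired[-1].startswith("GROUP_FIELD_VALUE:"):
            # Continuation of a value that was broken by an embedded newline.
            repaired[-1] += line
        else:
            # Anything else is unexpected; stop rather than guess.
            raise ValueError(f"unexpected line: {line!r}")
    return "\n".join(repaired) + "\n"


broken = (
    "GROUP_FIELD_NAME:COUNTERPRTY_NAME\n"
    "GROUP_FIELD_VALUE:Testi_\n"
    "name6248320\n"
    "GROUP_FIELD_NAME:ENTITY_ID\n"
    "GROUP_FIELD_VALUE:\n"
)
print(join_split_values(broken), end="")
```

Whether the pieces should be joined directly or with a space depends on what the source system intended, so confirm that with the team that produces the data.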

Gobi21

  • Guest
Re: New Line Character
« Reply #11 on: July 30, 2013, 12:14:02 AM »
Hi All,

Thanks for your valuable suggestions and comments.


Cheers,
Gobi

Justin Derrick

Re: New Line Character
« Reply #12 on: July 30, 2013, 09:10:21 AM »
Don't forget to update us on the progress & solution!

-JD.

Alessandro Perucchi

Re: New Line Character
« Reply #13 on: July 31, 2013, 01:43:53 AM »
Well I've been looking at the documentation, and here is what is written:

http://pic.dhe.ibm.com/infocenter/cmod/v9r0m0/topic/com.ibm.ondemand.indexingmp.doc/ars1d171333.htm?path=11_0_3_1_3_1#wq417
Code: [Select]
Options and values

The character string GROUP_FIELD_VALUE: identifies the line as containing an index value for an application group field. The string value specifies the actual index value for the field.

It says "identifies the line" and not "the lines", so this is a hint that a value occupies only one line, not several.

http://pic.dhe.ibm.com/infocenter/cmod/v9r0m0/topic/com.ibm.ondemand.indexingmp.doc/ars1d171318.htm?path=11_0_3_1#gi_params
Code: [Select]
To use the Generic indexer, you must create a parameter file that contains the indexing information for the input files. This section describes the parameter file that is used by the Generic indexer.

There are three types of statements that you can specify in a parameter file:

    Comments. You can place a comment line anywhere in the parameter file.
    Code page. You must specify a code page line at the beginning of the parameter file, before you define any groups.
    Groups. A group represents a document that you want to index. Each group contains the application group field names and their index values, the location of the document in the input file, the number of bytes (characters) that make up the document, and the name of the input file that contains the document.

Important:

    The parameter names in the parameter file are case sensitive and must appear in upper case. For example, GROUP_FIELD_NAME:account is valid, while group_field_name:account is not.
    When loading data using the Generic indexer, the locale must be set appropriately for the CODEPAGE: parameter. For example, if CODEPAGE:954 is specified, set the locale environment variable to ja_JP or some other locale that correctly identifies upper and lower case characters in code page 954.

Here it is written that the parameter file (index file) can have 3 types of statements:
  • comment
  • code page
  • groups

And by definition, a line is a statement. So if we have something like

Code: [Select]
GROUP_FIELD_VALUE:Hello
World
GROUP_FIELD_NAME:...
then the "World" line is not a valid statement.
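
That reading can be turned into a quick pre-load check. Here is a minimal sketch, assuming that every line of the index file must begin with one of the upper-case parameter names used in this thread; `find_invalid_lines` is a hypothetical helper, not an arsload feature, and the prefix list should be verified against the Indexing Reference.

```python
from typing import List, Tuple

# Parameter names as they appear in this thread (case sensitive, upper case).
STATEMENT_PREFIXES = (
    "CODEPAGE:",
    "COMMENT:",
    "GROUP_FIELD_NAME:",
    "GROUP_FIELD_VALUE:",
    "GROUP_OFFSET:",
    "GROUP_LENGTH:",
    "GROUP_FILENAME:",
)


def find_invalid_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, content) for each line that is not a statement."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.startswith(STATEMENT_PREFIXES)
    ]


sample = "GROUP_FIELD_VALUE:Hello\nWorld\nGROUP_FIELD_NAME:field2\n"
for number, line in find_invalid_lines(sample):
    print(f"line {number}: not a valid statement: {line!r}")
```

Running a check like this before arsload would point at the offending line instead of stopping partway through a load.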


Well, I'm reading the documentation with my knowledge of the product, and knowing how it behaves :-) I know what it implicitly means... that's why I'm reading between the lines. :-D

And if you look, there is NO example of an index file with a newline inside a value, not a single one. This is maybe also a sign that you cannot use multi-line values :-D

Hope that helps a little bit!! :-)

Sincerely yours,
Alessandro
« Last Edit: July 31, 2013, 01:45:37 AM by AlessandroPerucchi »

Gobi21

  • Guest
Re: New Line Character
« Reply #14 on: August 01, 2013, 05:49:30 AM »
Sure, Justin. I have asked the source team to correct the file they deliver. I have also informed them of the CMOD requirements, and hopefully they will accept the changes on their side.

Hi Alessandro,

That helps a lot  :) Thanks for your time and effort. Very much appreciated.


Cheers,
Gobi