OnDemand User Group

Support Forums => Report Indexing => Topic started by: DDP021 on March 09, 2022, 08:56:05 AM

Title: Bad Character in data file
Post by: DDP021 on March 09, 2022, 08:56:05 AM

We have a file coming in that has a special character (A with a squiggly line above it) in an index field. When this character is in the data, it causes the load to fail (see below). Once we edit the file and simply replace the A that had the line above it with a normal A, the file loads. Has anyone else had issues with special characters and loading?

Row 2776: The string "BWATER PURE AL TRD CO LTDÃ‚ HSBCUS" has a length of 38 and the field has a maximum length of 37

Title: Re: Bad Character in data file
Post by: jsquizz on March 09, 2022, 09:43:35 AM

I saw this a few months back. I think it's because there's a blank space after the special character, and it's pushing your field past the 37-character maximum. Unfortunately we were in a hurry, so I just recreated the app group and increased the length.

I'm not an ACIF indexing expert, but I am sure that if you're using it, you can do some fancy footwork to accommodate.

Title: Re: Bad Character in data file
Post by: DDP021 on March 09, 2022, 11:14:51 AM

Thanks...We had JUST recreated a new APP GRP because of another index field they wanted to increase the size of. Hate that you can't just go into the current APP GRP and increase field lengths on the fly

Title: Re: Bad Character in data file
Post by: jsquizz on March 14, 2022, 07:42:48 AM

Quote from: DDP021 on March 09, 2022, 11:14:51 AM

Thanks...We had JUST recreated a new APP GRP because of another index field they wanted to increase the size of. Hate that you can't just go into the current APP GRP and increase field lengths on the fly

Yeah, agreed. Not a DBA, but I know there are limitations/restrictions on increasing column length and things like that.

I've usually pushed back on the business but hey - they pay the bills.

Title: Re: Bad Character in data file
Post by: DDP021 on March 14, 2022, 07:49:05 AM

Title: Re: Bad Character in data file
Post by: Ed_Arnold on March 14, 2022, 09:17:10 AM

Echoing what others have already said ---

We have seen this usually with a German "sharp S" ---- let me see if I can cut and paste one in here:

What is sharp S in German?
In German orthography, the letter ß, called Eszett (IPA: [ɛsˈtsɛt] ess-TSET) or scharfes S (IPA: [ˌʃaʁfəs ˈʔɛs], lit. "sharp S"),

When loading it'll often get converted to "ss" or one extra character. If the field is defined as length 10, and the sharp s makes it go to 11, then boom!
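The failure mode described above can be caught before a load with a quick scan of the index values. This is only a minimal sketch, assuming the database stores strings as UTF-8 and counts the field limit in bytes; `find_overlong` and the sample values are hypothetical and are not part of any CMOD tooling.

```python
def find_overlong(values, max_bytes):
    """Return (row_number, value, byte_length) for values too long for the field."""
    problems = []
    for row, value in enumerate(values, start=1):
        # Assumption: the limit applies to the UTF-8 encoded size, not the character count.
        size = len(value.encode("utf-8"))
        if size > max_bytes:
            problems.append((row, value, size))
    return problems


# "Wei\u00dfstra\u00dfe" is 10 characters, but each sharp S takes two bytes in UTF-8.
rows = ["ACME CO", "Wei\u00dfstra\u00dfe"]
print(find_overlong(rows, max_bytes=10))
```

With a 10-byte limit, only the second value is reported, at 12 bytes, even though it is exactly 10 characters long.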

Ed

Title: Re: Bad Character in data file
Post by: Darrell Bryant on March 15, 2022, 07:45:56 AM

For our IBM i customers, we tell them the following about indexing and UTF-8 databases, and this probably applies to all platforms:

Some characters will use more than one byte when stored in a UTF-8 field
Latin lowercase and uppercase characters [a-z] [A-Z] and Arabic numerals [0-9] use only one byte
Accented characters might use two bytes
For languages such as Greek, Russian, & Arabic, we recommend that you create application group string fields with double the length you would use if the instance did not support UTF-8
For other languages, if your index values contain accented characters, you will need to make the fields longer
For example, ß requires two bytes
DBCS characters might use two or three bytes
When setting up new reports you might need to increase the field length determined by the graphical indexer to accommodate accented characters
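To see the byte counts in the list above for yourself, a short Python check of character count versus UTF-8 size may help. It is a rough illustration only; `utf8_len` is a made-up helper name, and the sample characters are chosen just to cover the cases mentioned.

```python
def utf8_len(value: str) -> int:
    """Number of bytes the value occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


samples = {
    "plain A": "A",
    "A with tilde": "\u00c3",
    "sharp s": "\u00df",
    "Greek omega": "\u03a9",
    "Cyrillic zhe": "\u0416",
    "CJK ideograph": "\u65e5",
}

for name, ch in samples.items():
    # Each sample is a single character; only the encoded size differs.
    print(f"{name}: {len(ch)} character, {utf8_len(ch)} byte(s)")
```

The plain letter stays at one byte, the accented, Greek, and Cyrillic characters take two, and the CJK character takes three, which is why a field sized by character count can overflow.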

Title: Re: Bad Character in data file
Post by: Justin Derrick on March 15, 2022, 11:02:33 PM

Just confirming what Ed and Darrell are saying here - in situations where I've moved customers from CMOD databases defined in the local codepage to Unicode, I had to increase the length of fields across the board to make room for double-byte characters. I'm looking at YOU, Norway... with all your extra vowels. :D

-JD.

Title: Re: Bad Character in data file
Post by: DDP021 on March 16, 2022, 02:51:02 AM

Thanks to everyone for their input. As I'm sure most of you have run into, a lot of the time the requestor isn't totally aware of their data! What we ended up doing was renaming the existing AppGrp/App setups to different names. We had to keep them because of the data previously loaded to them. We then created new definitions under the original names, increasing the length of the field causing the issue.