Author Topic: Bad Character in data file  (Read 1596 times)

DDP021

  • Sr. Member
  • ****
  • Posts: 343
Bad Character in data file
« on: March 09, 2022, 08:56:05 AM »
We have a file coming in that has a special character (an A with a squiggly line, i.e. a tilde, above it) in an index field.  When this character is in the data, it causes the load to fail (see below).  Once we edit the file and simply replace the A that had the line above it with a normal A, the file loads.  Has anyone else had issues with special characters and loading?


  Row 2776:  The string "BWATER PURE AL TRD CO LTD  HSBCUS" has a length of 38 and the field has a maximum length of 37
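
For anyone wanting to find these rows before the load fails, here is a minimal sketch of a pre-load scan. It assumes one record per line and flags any row containing a non-ASCII character; the function name and the sample rows are hypothetical and are not taken from the actual file.

```python
def find_non_ascii_rows(lines):
    """Return (row_number, offending_characters) for rows with non-ASCII characters."""
    hits = []
    for row_number, line in enumerate(lines, start=1):
        # Anything above code point 127 is outside plain ASCII.
        offenders = sorted({ch for ch in line if ord(ch) > 127})
        if offenders:
            hits.append((row_number, offenders))
    return hits


# Hypothetical sample rows; "\u00c3" is an A with a tilde.
sample = [
    "ACME TRADING CO LTD",
    "\u00c3CME TRADING CO LTD",
    "PLAIN ROW",
]

for row_number, offenders in find_non_ascii_rows(sample):
    print(f"Row {row_number}: non-ASCII characters {offenders}")
```

With a real file you would pass an open file handle (opened with the file's actual encoding) instead of the sample list.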

jsquizz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 576
    • View Profile
Re: Bad Character in data file
« Reply #1 on: March 09, 2022, 09:43:35 AM »
I saw this a few months back. I think it's because there's a blank space after the special character, and it's pushing your field past the 37-character limit. Unfortunately we were in a hurry, and I just recreated the app group and increased the length.

I'm not an ACIF indexing expert, but I am sure that if you're using it, you can do some fancy footwork to accommodate.
#CMOD #DB2 #AFP2PDF #TSM #AIX #RHEL #AWS #AZURE #GCP #EVERYTHING

DDP021

  • Sr. Member
  • ****
  • Posts: 343
Re: Bad Character in data file
« Reply #2 on: March 09, 2022, 11:14:51 AM »
Thanks... We had JUST recreated a new APP GRP because of another index field they wanted to increase the size of.  Hate that you can't just go into the current APP GRP and increase field lengths on the fly.

jsquizz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 576
Re: Bad Character in data file
« Reply #3 on: March 14, 2022, 07:42:48 AM »
Yeah, agreed. Not a DBA, but I know there are limitations/restrictions on increasing column length and things like that.

I've usually pushed back on the business but hey - they pay the bills.

DDP021

  • Sr. Member
  • ****
  • Posts: 343
Re: Bad Character in data file
« Reply #4 on: March 14, 2022, 07:49:05 AM »
 ;D

Ed_Arnold

  • Hero Member
  • *****
  • Posts: 1199
Re: Bad Character in data file
« Reply #5 on: March 14, 2022, 09:17:10 AM »
Echoing what others have already said ---

We have seen this usually with a German "sharp S" ---- let me see if I can cut and paste one in here: 

What is sharp S in German?
In German orthography, the letter ß, called Eszett (IPA: [ɛsˈtsɛt] ess-TSET) or scharfes S (IPA: [ˌʃaʁfəs ˈʔɛs], lit. "sharp S"),


When loading, it'll often get converted to "ss" or otherwise take up one extra character.  If the field is defined as length 10, and the sharp s makes it go to 11, then boom!
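
To make the length-10-versus-11 arithmetic concrete, here is a small sketch. It assumes the field limit is checked against the UTF-8 encoded length; the value and the helper names are made up for illustration.

```python
def utf8_len(value: str) -> int:
    """Number of bytes the value occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def fits(value: str, max_bytes: int) -> bool:
    """True if the encoded value fits in a field of max_bytes."""
    return utf8_len(value) <= max_bytes


# Hypothetical index value: 10 characters, one of them a sharp s (U+00DF).
value = "STRA\u00dfE 123"

print(len(value))                           # characters
print(utf8_len(value))                      # bytes in UTF-8
print(fits(value, 10))                      # does it fit a length-10 field?
print(fits(value.replace("\u00df", "s"), 10))  # same check with a plain "s"
```

The value is 10 characters but 11 bytes, so it overflows a length-10 field. Expanding the sharp s to "ss" would also give 11, which is the other way the same value can end up one too long.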

Ed
#zOS #ODF

Darrell Bryant

  • Full Member
  • ***
  • Posts: 104
  • Sed fugit interea fugit inreparabile tempus-Virgil
Re: Bad Character in data file
« Reply #6 on: March 15, 2022, 07:45:56 AM »
For our IBM i customers, we tell them the following about indexing and UTF-8 databases, and this probably applies to all platforms:

  • Some characters will use more than one byte when stored in a UTF-8 field
  • Latin lowercase and uppercase characters [a-z] [A-Z] and Arabic numerals [0-9] use only one byte
  • Accented characters might use two bytes
  • For languages such as Greek, Russian, & Arabic, we recommend that you create application group string fields with double the length you would use if the instance did not support UTF-8
  • For other languages, if your index values contain accented characters, you will need to make the fields longer
    For example, ß requires two bytes
  • DBCS characters might use two or three bytes
  • When setting up new reports you might need to increase the field length determined by the graphical indexer to accommodate accented characters
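
The byte counts in the list above can be checked with a few lines of Python. This is only a sketch: the sample characters are my own picks for each category, and it says nothing about how any particular instance actually enforces field lengths.

```python
def utf8_bytes(text: str) -> int:
    """Bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


# One illustrative character per category (choices are assumptions).
samples = [
    ("A", "Latin letter"),
    ("7", "Arabic numeral"),
    ("\u00c3", "accented Latin letter (A with tilde)"),
    ("\u00df", "German sharp s"),
    ("\u03a9", "Greek letter (omega)"),
    ("\u0416", "Cyrillic letter (zhe)"),
    ("\u0628", "Arabic letter (beh)"),
    ("\u6f22", "CJK ideograph"),
]

for ch, label in samples:
    print(f"{ch!r:>6}  {label:<40} {utf8_bytes(ch)} byte(s)")

# A five-letter Greek word in capitals needs twice as many bytes as characters,
# which lines up with the "double the length" recommendation.
greek = "\u0391\u0398\u0397\u039d\u0391"
print(len(greek), utf8_bytes(greek))
```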
#IBMi #iSeries #PDF #XML #400 Indexer #ASM

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2229
  • CMOD Guru for hire...
    • Tenacious Consulting
Re: Bad Character in data file
« Reply #7 on: March 15, 2022, 11:02:33 PM »
Just confirming what Ed and Darrell are saying here - in situations where I've moved customers from CMOD databases defined in the local codepage to Unicode, I had to increase the length of fields across the board to make room for double-byte characters.  I'm looking at YOU, Norway... with all your extra vowels.  :D
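
As a rough sketch of how to size fields for that kind of move, the snippet below compares the longest value in a single-byte codepage with the longest value in UTF-8. Latin-1 stands in for "the local codepage" purely as an assumption, and the names are invented sample data.

```python
def required_length(values, encoding="utf-8"):
    """Longest encoded length, in bytes, across all values."""
    return max((len(v.encode(encoding)) for v in values), default=0)


# Hypothetical Norwegian index values using o-slash, A-ring and AE.
names = [
    "Bj\u00f8rn \u00c5sen",
    "\u00c6r\u00f8 Brygge",
    "Oslo",
]

old_len = required_length(names, "latin-1")  # single-byte codepage stand-in
new_len = required_length(names, "utf-8")    # Unicode database

print(f"single-byte codepage: {old_len} bytes")
print(f"UTF-8:                {new_len} bytes")
print(f"extra room needed:    {new_len - old_len} bytes")
```

Running this over a representative sample of real index values, rather than three made-up names, would give a better starting point for the new field lengths.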

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

DDP021

  • Sr. Member
  • ****
  • Posts: 343
    • View Profile
Re: Bad Character in data file
« Reply #8 on: March 16, 2022, 02:51:02 AM »
Thanks to everyone for their input.  As I'm sure most of you have run into, a lot of the time the requestor isn't totally aware of their data!   What we ended up doing is renaming the existing AppGrp/App setups to different names.  We had to keep them because of the data previously loaded to them.  We then created new definitions under the original names, increasing the length of the field causing the issue.