Author Topic: Translating names and values for configuration objects with SQL Server  (Read 263 times)

daodug

  • Newbie
  • *
  • Posts: 4
    • View Profile
Hi,

I'm trying to set up some translations of folder names, field names and field values on a 10.5 system with Remote SQL Server and I'm having issues with ?? values being stored in the database and displayed in the admin UI after entering valid  names in non-latin languages (eg Arabic, Japanese, Chinese, Korean).

The documentation is verbose yet doesn't explain what the system is doing.  As it's a remote SQL server, the documentation is sparse in describing the installation and any dependencies.  The database server is an existing one (SQL Server 2016) that works fine with other products (eg IBM FileNet) in correctly storing and retrieving double byte characters.  The database system cannot be changed easily, so before I do anything dramatic I'd like to understand what the system is doing, what else I may have missed, and if anyone else is already using these translation features in the administration tool.

I've tried different database collations.  One document I found showed SQL_Latin1_General_CP1_CI_AS being used.  I can't find any documentation that describes what the collation for SQL Server should be.  The other databases on the same server that work with DBCS are using Latin1_General_CI_AS.  That same collation also doesn't work for OnDemand.  The same behaviour occurs, the data can be entered and displays correctly, when it's saved to the DB it's shown as ??, when it's retrieved in the UI, it's a series of ??.  My system is in Sydney, so the base locale info is en-au.  All of the latin languages (eg spanish, french, german) work.

I've discussed with a colleague the same actions on an OnDemand with DB2.  They work fine.  It seems to be some additional or different settings needed for SQL Server.

With all that background:
1. Has anyone got an English based OnDemand system using SQL Server that has any double byte character translations for Cabinet, Folder, Hold, Field or Field Value entries that persists the data correctly?  I'm using 10.5 but any indication on any version will give me more of an idea what the issue might be.
2. There are lots of pieces of documentation around code pages but no sensible definition of where and how they're used, exactly.  There's also lots of references to configuration files that don't exist anymore in the documentation.  Without some solid documentation I'm not ready to hack too much more.  Is this where I'm going wrong?

I think it's code page related, but the doco is truly awful.  There's no complete discussion that covers my corner case and I'm not ready to waste more time guessing.

Any guidance is greatly appreciated.

David

Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2029
  • CMOD Guru for hire...
    • View Profile
    • Tenacious Consulting
Sounds like your CMOD database isn't in unicode.  Can you confirm the codepage that the Content Manager OnDemand database was created with?

-JD.
IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

daodug

  • Newbie
  • *
  • Posts: 4
    • View Profile
Hi Justin,

I created the database following https://www.ibm.com/support/pages/creating-remote-microsoft-sql-server-database-ibm%C2%AE-content-manager-ondemand.

There's no mention of the collation, so as the install worked for double byte with FileNet, I left it alone.  The original collation was Latin1_General_CI_AS.  That didn't work.
I created a new DB with SQL_Latin1_General_CP1_CI_AS.  I found a reference (see PDF attachment https://www.ibm.com/support/pages/node/616131) that showed that as a selection on install.  It doesn't work for me.

The documentation https://www.ibm.com/support/knowledgecenter/SSEPCD_10.5.0/com.ibm.ondemand.installmp.doc/dodnl005.htm doesn't provide absolute facts.  It describes vague information and then more vague examples.  It describes that if you had this value then some other thing would be required, but doesn't provide either guidance on where the value is stored, how it is changed or what the other thing that might be required actually is, or what the possible values are.  Very obtuse, poorly constructed, incomplete and confusing documentation.

With SQL Server there is no 'code page' thing.  There's a collation.  There's no documentation I can find that specifies what the collation should be for any given OnDemand configuration.  My SQL Server install doesn't support UTF-8, without me upgrading, which I really don't want to do, yet, but I don't know if it's using UTF-8 either. Latin characters are fine, double byte ones are not.  How do you tell  it to do something different?  The documentation references ARS_ORIGINAL_CODEPAGE but wanders around from one release to another with possible indeterminate values.  It looks like for 10.5 a value of 0 is required.  There's no documentation that describes what value 0 means, or what other values are possible or what they do.

I've been down several rabbit holes already, rebuilding the DB, hacking columns to NVARCHAR, and without some absolute definition I'm just wasting my time.

There's no definition of what OnDemand is doing when it inserts data from the admin tool into the DB. How do you configure OnDemand to use a specific codepage?  Do I need to? How does that map to any concrete value in SQLServer?  It should be a trivial thing and it should be the same regardless of it being a remote SQL, so I'm hoping that someone that uses DBCS has some guidance on what the configuration values that work are.

Thanks,
David


Justin Derrick

  • IBM Content Manager OnDemand Consultant
  • Administrator
  • Hero Member
  • *****
  • Posts: 2029
  • CMOD Guru for hire...
    • View Profile
    • Tenacious Consulting
Wow...  This is way deep down in the weeds...  While I haven't worked with a CMOD + MSSQL server before, take this with a grain of salt...

The big-picture solution is to make the move to UTF-8.  It looks like you'll need to upgrade to begin resolving the issue.

The ARS_ORIGINAL_CODEPAGE parameter was introduced because historically, the database was built and configured to reflect the local language - so in Norway, your database would be created by default with a Norwegian codepage, and your data would be loaded with the Norwegian codepage, so everything worked...  Setting ARS_ORIGINAL_CODEPAGE to something other than zero allows some backwards compatibility for CMOD servers that were built before Unicode/UTF-* were a feature.

I've done database conversions before (Norwegian specifically!) and they were tricky and required a lot of testing / tweaking in order to work properly and find all of the edge cases -- my favourite edge case was a customer name fields that filled the entire field, and contained two double-byte characters, breaking one out of every several hundred loads, because the data wouldn't fit in the database field, even though it passed whatever testing arsload was doing. 

The best advice I have is to test your migration to UTF-8 throughly, and possibly consider doing a data migration to a whole new system if time and budgets permit.  You'll get the benefit of getting access to all the new features in CMOD v10.x as a bonus.  :)

IBM CMOD Professional Services: http://TenaciousConsulting.com
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Education & Webinars:  https://CMOD.Training/

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR