Uploaded image for project: 'Solution Center'
  1. Solution Center
  2. SOL-34

Correct settings for ODBC driver

    XMLWordPrintable

    Details

    • Type: Workaround
    • Status: Published
    • Affects Version/s: EXASolution 4.2.0, EXASOL 6.0.0, Exasol 6.1.0, EXASolution 5.0.0
    • Fix Version/s: None
    • Component/s: 3rd Party Tools, ODBC
    • Labels:
      None
    • Environment:
      Informatica PowerCenter. Windows 2008R2
    • Symptoms:
      Hide

      Reading special characters with Informatica, beyond the German or English ANSI codepage, results in character strings like "Ã, ¨".

      Show
      Reading special characters with Informatica, beyond the German or English ANSI codepage, results in character strings like "Ã, ¨".
    • Explanation:
      Hide

      Informatica allows to configure whether the ANSI interfaces or unicode interfaces of the ODBC driver are to be used.
      On the technical level of the ODBC-API, this means whether to call function SQLPrepareA (ANSI) or SQLPrepareW (Unicode).
      Other examples would be SQLConnectA/SQLExecDirectA versus SQLConnectW/SQLExecDirectW.

      The W-functions are using wchar_t data types. In Microsoft Windows, these are encoded by UTF-16 little-endian (previously, for Windows NT, it was UCS-2). Thus, there should not occur any problems under Windows as long as the unicode interfaces are used, because each application under Windows should respect this "convention".

      Show
      Informatica allows to configure whether the ANSI interfaces or unicode interfaces of the ODBC driver are to be used. On the technical level of the ODBC-API, this means whether to call function SQLPrepareA (ANSI) or SQLPrepareW (Unicode). Other examples would be SQLConnectA/SQLExecDirectA versus SQLConnectW/SQLExecDirectW. The W-functions are using wchar_t data types. In Microsoft Windows, these are encoded by UTF-16 little-endian (previously, for Windows NT, it was UCS-2). Thus, there should not occur any problems under Windows as long as the unicode interfaces are used, because each application under Windows should respect this "convention".
    • Solution:
      Hide

      Informatica has to be configured to use unicode interfaces via UTF-8 encoding. Additionally, it is necessary to enforce the ODBC driver to apply non-standard UTF-8 encoding for its W-interfaces and to avoid any further issues for its the A-interfaces as well.

      This could be done via ODBC connection string:

      AnsiArgEncoding=CP_UTF8;AnsiDataEncoding=CP_UTF8;UnicodeArgEncoding=CP_UTF8;UnicodeDataEncoding=CP_UTF8
      

      Alternatively, the Windows registry can be used to apply this setting on system level for the Informatica Windows server.
      For an exemplary ODBC DSN "myexaodbcdsn", the following REG file would do the job:

      Windows Registry Editor Version 5.00
      
      [HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\myexaodbcdsn]
      "AnsiArgEncoding"="CP_UTF8"
      "AnsiDataEncoding"="CP_UTF8"
      "UnicodeArgEncoding"="CP_UTF8"
      "UnicodeDataEncoding"="CP_UTF8"
      

      For testing the correct behaviour a vanilla table can be used.
      The table was created via JDBC, with EXAplus/JDBC/Java there had not been any unicode problems.

      create schema s;
      open schema s;
      create table t(v varchar(100));
      insert into t values 'öäü';
      insert into t values 'è';     -- http://www.i18nqa.com/debug/bug-utf-8-latin1.html 
      insert into t values '‰€œ';  -- http://www.i18nqa.com/debug/bug-iso8859-1-vs-windows-1252.html
      insert into t values 'ÁÍÏÐÝ'; -- http://www.i18nqa.com/debug/bug-double-conversion.html
      select * from t;
      
      Show
      Informatica has to be configured to use unicode interfaces via UTF-8 encoding. Additionally, it is necessary to enforce the ODBC driver to apply non-standard UTF-8 encoding for its W-interfaces and to avoid any further issues for its the A-interfaces as well. This could be done via ODBC connection string: AnsiArgEncoding=CP_UTF8;AnsiDataEncoding=CP_UTF8;UnicodeArgEncoding=CP_UTF8;UnicodeDataEncoding=CP_UTF8 Alternatively, the Windows registry can be used to apply this setting on system level for the Informatica Windows server. For an exemplary ODBC DSN "myexaodbcdsn", the following REG file would do the job: Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\myexaodbcdsn] "AnsiArgEncoding"="CP_UTF8" "AnsiDataEncoding"="CP_UTF8" "UnicodeArgEncoding"="CP_UTF8" "UnicodeDataEncoding"="CP_UTF8" For testing the correct behaviour a vanilla table can be used. The table was created via JDBC, with EXAplus/JDBC/Java there had not been any unicode problems. create schema s; open schema s; create table t(v varchar(100)); insert into t values 'öäü'; insert into t values 'è'; -- http://www.i18nqa.com/debug/bug-utf-8-latin1.html insert into t values '‰€œ'; -- http://www.i18nqa.com/debug/bug-iso8859-1-vs-windows-1252.html insert into t values 'ÁÍÏÐÝ'; -- http://www.i18nqa.com/debug/bug-double-conversion.html select * from t;
    • Category 1:
      3rd Party Tools
    • Category 2:
      Clients, Interfaces & Drivers - ODBC

      Attachments

        Issue Links

          Activity

            People

            • Assignee:
              CaptainEXA Captain EXASOL
              Reporter:
              CaptainEXA Captain EXASOL
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated: