[en] Solution to Postgres JDBC driver ignoring charSet directive with SQL_ASCII database

„New” Postgres JDBC drivers check database engine version and if it’s not below 7.3 they will ignore charSet declaration. It breaks encoding of 8-bit characters if database has SQL_ASCII encoding (which basically means DONT_CARE_ANYTHING_GOES). If your database is set to SQL_ASCII and in reality it stores e.g. Windows-1250 (CP1250) characters, JDBC driver will assume incorrect encoding and diacritical characters will be broken.

Even though I understand the reasons given by Postgres developers for this (SQL_ASCII is NOT supposed to be used with anything other than 7-bit characters and it is obsolete), the solution they’re suggesting (to dump the db, and convert it to UTF-8) is not always possible. For instance, I would have a hard time telling my customer to do it, especially because their old software might break.

So I’ve patched the JDBC driver to accept charSet directive in JDBC URLs and also added a method setCharset to PGSimpleDataSource.

Here you can download it: Postgres JDBC driver – charSet patched

5 myśli nt. „[en] Solution to Postgres JDBC driver ignoring charSet directive with SQL_ASCII database

  1. I downloaded your patched PostgreSQL JDBC driver, and used it with my old SQL_ASCII database. Everything worked great!! Thank you very much!! Any chance you could post or send me the source code with your modifications? Thanks in advance.

  2. Hi Eduardo,

    I’m glad it works for you. I’ve only patched 3 files, here goes the diff:

    # This patch file was generated by NetBeans IDE
    # This patch can be applied using context Tools: Apply Diff Patch action on respective folder.
    # It uses platform neutral UTF-8 encoding.
    # Above lines and this line are ignored by the patching process.
    Index: pgjdbc/org/postgresql/core/ConnectionFactory.java
    --- pgjdbc/org/postgresql/core/ConnectionFactory.java Base (1.10)
    +++ pgjdbc/org/postgresql/core/ConnectionFactory.java Locally Modified (Based On 1.10)
    @@ -30,7 +30,7 @@
          * connection is returned.
          */
         private static final Object[][] versions = {
    -                { "3", new org.postgresql.core.v3.ConnectionFactoryImpl() },
    +//                { "3", new org.postgresql.core.v3.ConnectionFactoryImpl() },
                     { "2", new org.postgresql.core.v2.ConnectionFactoryImpl() },
                 };
     
    Index: pgjdbc/org/postgresql/core/v2/ConnectionFactoryImpl.java
    --- pgjdbc/org/postgresql/core/v2/ConnectionFactoryImpl.java Base (1.24)
    +++ pgjdbc/org/postgresql/core/v2/ConnectionFactoryImpl.java Locally Modified (Based On 1.24)
    @@ -395,29 +395,29 @@
     
             protoConnection.setServerVersion(dbVersion);
     
    -        if (dbVersion.compareTo("7.3") >= 0)
    -        {
    -            // set encoding to be unicode; set datestyle; ensure autocommit is on
    -            // (no-op on 7.4, but might be needed under 7.3)
    -            // The begin/commit is to avoid leaving a transaction open if we're talking to a
    -            // 7.3 server that defaults to autocommit = off.
    -
    -            if (logger.logDebug())
    -                logger.debug("Switching to UTF8 client_encoding");
    -
    -            String sql = "begin; set autocommit = on; set client_encoding = 'UTF8'; ";
    -            if (dbVersion.compareTo("9.0") >= 0) {
    -                sql += "SET extra_float_digits=3; ";
    -            } else if (dbVersion.compareTo("7.4") >= 0) {
    -                sql += "SET extra_float_digits=2; ";
    -            }
    -            sql += "commit";
    -
    -            SetupQueryRunner.run(protoConnection, sql, false);
    -            protoConnection.setEncoding(Encoding.getDatabaseEncoding("UTF8"));
    -        }
    -        else
    -        {
    +//        if (dbVersion.compareTo("7.3") >= 0)
    +//        {
    +//            // set encoding to be unicode; set datestyle; ensure autocommit is on
    +//            // (no-op on 7.4, but might be needed under 7.3)
    +//            // The begin/commit is to avoid leaving a transaction open if we're talking to a
    +//            // 7.3 server that defaults to autocommit = off.
    +//
    +//            if (logger.logDebug())
    +//                logger.debug("Switching to UTF8 client_encoding");
    +//
    +//            String sql = "begin; set autocommit = on; set client_encoding = 'UTF8'; ";
    +//            if (dbVersion.compareTo("9.0") >= 0) {
    +//                sql += "SET extra_float_digits=3; ";
    +//            } else if (dbVersion.compareTo("7.4") >= 0) {
    +//                sql += "SET extra_float_digits=2; ";
    +//            }
    +//            sql += "commit";
    +//
    +//            SetupQueryRunner.run(protoConnection, sql, false);
    +//            protoConnection.setEncoding(Encoding.getDatabaseEncoding("UTF8"));
    +//        }
    +//        else
    +//        {
                 String charSet = info.getProperty("charSet");
                 String dbEncoding = (results[1] == null ? null : protoConnection.getEncoding().decode(results[1]));
                 if (logger.logDebug())
    @@ -442,7 +442,7 @@
                     // XXX is this ever reached?
                     protoConnection.setEncoding(Encoding.defaultEncoding());
                 }
    -        }
    +//        }
     
             if (logger.logDebug())
                 logger.debug("Connection encoding (using JVM's nomenclature): " + protoConnection.getEncoding());
    Index: pgjdbc/org/postgresql/ds/common/BaseDataSource.java
    --- pgjdbc/org/postgresql/ds/common/BaseDataSource.java Base (1.25)
    +++ pgjdbc/org/postgresql/ds/common/BaseDataSource.java Locally Modified (Based On 1.25)
    @@ -48,6 +48,7 @@
         private String databaseName;
         private String user;
         private String password;
    +    private String charset;
         private int portNumber = 0;
         private int prepareThreshold = 5;
         private int unknownLength = Integer.MAX_VALUE;
    @@ -484,6 +485,9 @@
             if (compatible != null) {
                 sb.append("&compatible="+compatible);
             }
    +        if (charset != null) {
    +            sb.append("&charSet="+charset);
    +        }
             if (applicationName != null) {
                 sb.append("&ApplicationName=");
                 sb.append(applicationName);
    @@ -537,6 +541,10 @@
             {
                 ref.add(new StringRefAddr("compatible", compatible));
             }
    +        if (charset != null)
    +        {
    +            ref.add(new StringRefAddr("charset", charset));
    +        }
     
             ref.add(new StringRefAddr("logLevel", Integer.toString(logLevel)));
             ref.add(new StringRefAddr("protocolVersion", Integer.toString(protocolVersion)));
    @@ -566,6 +574,7 @@
             out.writeBoolean(binaryTransfer);
             out.writeObject(binaryTransferEnable);
             out.writeObject(binaryTransferDisable);
    +        out.writeObject(charset);
         }
     
         protected void readBaseObject(ObjectInputStream in) throws IOException, ClassNotFoundException
    @@ -589,6 +598,7 @@
             binaryTransfer = in.readBoolean();
             binaryTransferEnable = (String)in.readObject();
             binaryTransferDisable = (String)in.readObject();
    +        charset = (String)in.readObject();
         }
     
         public void initializeFrom(BaseDataSource source) throws IOException, ClassNotFoundException {
    @@ -601,4 +611,12 @@
             readBaseObject(ois);
         }
     
    +    public String getCharset() {
    +        return charset;
     }
    +
    +    public void setCharset(String charset) {
    +        this.charset = charset;
    +    }
    +
    +}
    
  3. This has been tremendously helpful, as recently Postgres (> 9.4) decided to throw an error for non-utf-8 characters. With our legacy db, this would not work with JDBC insisting on UTF-8.

    Thanks again for the patch.

  4. Hi,

    Looks like when entering UTF-8 strings through this driver using preparedStatement.setString method, it is messing up the strings. Putting T?kia instead of Tëkia, e.g.,

    Any thoughts?

    Thanks.

  5. Well, it’s been 6 years since I worked on it, so I don’t remember too much 🙁 I only did SELECT * queries with patched driver (my use case was to mirror the DB to some place and work from there), so I didn’t even try prepared statements. I’d guess there’s a reason Postgres devs abandoned the option I restored with this patch. But I’m afraid there won’t be an easy solution for You unless You take JDBC driver’s source code, patch it (diff is in one on my previous comments) and debug it Yourself. It’s not that difficult once You start with it 🙂

Dodaj komentarz

Twój adres e-mail nie zostanie opublikowany. Wymagane pola są oznaczone *