Encoding problems with ogr2ogr and Postgis/PostgreSQL database

Chau picture Chau · Sep 4, 2009 · Viewed 17k times · Source

In our organization, we handle GIS content in different file formats. I need to put these files into a PostGIS database, and that is done using ogr2ogr. The problem is, that the database is UTF8 encoded, and the files might have a different encoding.

I found descriptions of how I can specify the encoding by adding an options parameter to org2ogr, but appearantly it doesn't have an effect.

ogr2ogr -f PostgreSQL PG:"host=localhost user=username dbname=dbname \
password=password options='-c client_encoding=latin1'" sourcefile;

The error I recieve is:

ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "målsætning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe56c73
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "påvirkning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe57669
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

ERROR 1: INSERT command for new feature failed.
ERROR: invalid byte sequence for encoding "UTF8": 0xf8
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

Currently, my source file is a Shape file and I'm pretty sure, that it is Latin1 encoded.

What am I doing wrong here and can you help me?

Kind regards, Casper

Answer

Chau picture Chau · Sep 8, 2009

Magnus is right and I will discuss the solution here.

I have seen the option to inform PostgreSQL about character encoding, options=’-c client_encoding=xxx’, used many places, but it does not seem to have any effect. If someone knows how this part is working, feel free to elaborate.

Magnus suggested to set the environment variable PGCLIENTENCODING to LATIN1. This can, according to a mailing list I queried, be done by modifying the call to ogr2ogr:

ogr2ogr -–config PGCLIENTENCODING LATIN1 –f PostgreSQL 
PG:”host=hostname user=username dbname=databasename password=password” inputfile

This didn’t do anything for me. What worked for me was to, before the call to ogr2ogr, to:

SET PGCLIENTENCODING=LATIN1

It would be great to hear more details from experienced users and I hope it can help others :)