Cannot Insert Unicode Using cx-Oracle

snarkyname77 picture snarkyname77 · Jan 9, 2013 · Viewed 7.7k times · Source

I am having an issue inserting unicode into an Oracle schema, I think the database is an Oracle 11g instance but am not certain at this point. I'm using python 2.6.1 on OS X 10.6.8 (this is the system verison of python) and am using the cx-Oracle driver module version 5.1 downloaded from sourceforge.net, built and installed to a virtualenv 1.6.1 instance with site packages visible. My script is as follows

  import cx_Oracle

  connection = cx_Oracle.connect(
      "<name>/<password>@<host>/<service-name>"
      )
  cursor = connection.cursor()
  result = cursor.execute(u"create table UNICODE_TEST (id NUMBER(6), text NCLOB not NULL)")

  raw_text = open("test.txt",'r').read()
  if isinstance(raw_text,str):
      raw_text = raw_text.decode("utf_8")

  statement = u"insert into UNICODE_TEST (id, text) values (1,'%s')" % raw_text
  result = cursor.execute(statement)

I create a connection, create the cursor, execute a statment to create a test table with an id and text field of types NUMBER and NCLOB. I open a file containing what I know to be text encoded in UTF-8, decode the string to unicode. Create an insert statment in a unicode string and execute that statement, and the result is this error.

  Traceback (most recent call last):
    File "unicode-test.py", line 19, in <module>
      result = cursor.execute(statement)
  UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 170: ordinal not in range(128)

Something is trying to encode my statement as ASCII before inserting it into the Oracle schema. So I started hunting around looking to better understand how cx-Oracle handles unicode and found this in the HISTORY.txt of the cx-Oracle source I downloaded from sourceforge.net

Changes from 5.0.4 to 5.1
1) Remove support for UNICODE mode and permit Unicode to be passed through in everywhere a string may be passed in. This means that strings will be passed through to Oracle using the value of the NLS_LANG environment variable in Python 3.x as well. Doing this eliminated a bunch of problems that were discovered by using UNICODE mode and also removed an unnecessary restriction in Python 2.x that Unicode could not be used in connect strings or SQL statements, for example. ...

My assumption is that the NLS_LANG environment variable is set to 'ascii' or some equivalent, so I try setting NLS_LANG to 'AL32UTF8' which I believe is the correct value for unicode, and set the new value before creating my connection.

  os.environ["NLS_LANG"] = "AL32UTF8"
  connection = cx_Oracle.connect(
      "<user>/<password>@<host>/<service-name>"
      )
  cursor = connection.cursor()
  ...

But I get this error.

  Traceback (most recent call last):
    File "unicode-test.py", line 11, in <module>
      "<user>/<password>@<host>/<service-name>"
  cx_Oracle.DatabaseError: ORA-12705: Cannot access NLS data files or invalid environment specified

So it looks like I cannot tamper with the NLS_LANG value.

Here are my questions as of now. Am I missing something simple like an incorrect column type? Is the problem with the cx-Oracle driver? Do I need to set the "WITH_UNICODE" environment variable when building the cx-Oracle module and how would I do that? Is the issue with the Oracle instance? I have little experience with Oracle and have never worked with Oracle and python together. I've spend two days working on this issue and would like a better understanding of what the issue is before I go to the DBA group with.

Thanks,

Answer

DecimusXIV picture DecimusXIV · Apr 4, 2013

Setting environment variable is the right way, but "AL32UTF8" is not the right value for NLS_LANG. To get the right value of the NLS_LANG used in your instance of Oracle, execute

SELECT USERENV ('language') FROM DUAL