Sqoop import Null string

Bagavathi · Oct 27, 2016

NULL values are displayed as '\N' when the Hive external table is queried.

Below is the Sqoop import script:

    sqoop import \
        -libjars /usr/lib/sqoop/lib/tdgssconfig.jar,/usr/lib/sqoop/lib/terajdbc4.jar \
        -Dmapred.job.queue.name=xxxxxx \
        --connect jdbc:teradata://xxx.xx.xxx.xx/DATABASE=$db,LOGMECH=LDAP \
        --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
        --username $user --password $pwd \
        --query "select col1,col2,col3 from $db.xxx where \$CONDITIONS" \
        --null-string '\N' --null-non-string '\N' \
        --fields-terminated-by '\t' --num-mappers 6 --split-by job_number \
        --delete-target-dir --target-dir $hdfs_loc

Please advise what change should be made to the script so that NULLs are displayed as NULL when the external Hive table is queried.
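
For reference, the Sqoop user guide doubles the backslash in these flags (--null-string '\\N') because the value is substituted into Sqoop's generated Java code; whether that escaping is also needed with the Teradata connection manager is not clear from the documentation, and the answer below reports the single-backslash form producing the expected result. A minimal sketch of the documented form, with the connection details reduced to placeholders:

    # Sketch of the null-handling flags as documented in the Sqoop user guide.
    # The doubled backslash survives Sqoop's code generation as a literal \N,
    # which Hive's text SerDe reads back as NULL. Placeholders throughout.
    sqoop import \
        --connect jdbc:teradata://xxx.xx.xxx.xx/DATABASE=$db,LOGMECH=LDAP \
        --username $user --password $pwd \
        --query "select col1,col2,col3 from $db.xxx where \$CONDITIONS" \
        --null-string '\\N' --null-non-string '\\N' \
        --fields-terminated-by '\t' \
        --split-by job_number \
        --target-dir $hdfs_loc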

Answer

Bagavathi · Oct 28, 2016

Sathiyan, below are my findings after many trials:

  1. If the --null-string property is not included during the Sqoop import, NULLs are stored as blanks (for both integer and string columns) in HDFS.
  2. If the Hive table on top of that HDFS data is queried, we see NULL for integer columns and blank for string columns.
  3. If the --null-string '\N' property is included during the Sqoop import, NULLs are stored as '\N' for both integer and string columns.
  4. If the Hive table on top of that HDFS data is queried, we see NULL for both integer and string columns, not '\N' (see the sketch below).
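
These findings are consistent with Hive's default text SerDe, which reads the literal string \N back as NULL; the behaviour is controlled by the table property serialization.null.format, whose default value is '\N'. Below is a minimal sketch of an external table over the imported files, assuming tab-delimited text; the table name, column types, and location are placeholders, not taken from the original script:

    # Hypothetical external table over the Sqoop target directory; the table
    # name, column types, and location are placeholders.
    hive -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS xxx_ext (
        col1 INT,
        col2 STRING,
        col3 STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      STORED AS TEXTFILE
      LOCATION '/user/xxxx/target_dir';
    "

    # serialization.null.format defaults to '\N', which is why the '\N' written
    # by the import is displayed as NULL. If the import wrote empty fields
    # instead (finding 1 above), setting it to the empty string makes Hive read
    # those blanks back as NULL for string columns as well:
    hive -e "ALTER TABLE xxx_ext SET TBLPROPERTIES ('serialization.null.format'='');"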