Sqoop export using update key

Y.Prithvi picture Y.Prithvi · Sep 17, 2014 · Viewed 12.1k times · Source

I have to export HDFS file into MySql.
Let's say my HDFS file is:

1,abcd,23
2,efgh,24
3,ijkl,25
4,mnop,26
5,qrst,27

and say my Mysql database schema is:

+-----+-----+-------------+
| ID  | AGE |    NAME     |
+-----+-----+-------------+
|     |     |             |
+-----+-----+-------------+

When I'm inserting using following Sqoop command:

sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age"

It's working fine and inserting into database.

But, when I need to update already existing records I have to use --update-key and --columns.

Now, when I'm trying to update table using following command:

sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age" \
--update-key id

I'm facing issue like data is not updating into columns as specified in --columns

Am I doing anything wrong?

Can't we update database this way? HDFS file should be in Mysql schema only to update?

Is there any other way to achieve this?

Answer

Thiru picture Thiru · Jul 22, 2016

4b.Update data from HDFS into a table in a relational database

Create emp table tbl in mysql test db

create table emp
(
id int not null primary key,
name varchar(50)
);

vi emp --> create file with below contents

1,Thiru
2,Vikram
3,Brij
4,Sugesh

Move the file to hdfs

hadoop fs -put emp <dir>

Execute the below sqoop job to export the data to the mysql

sqoop export --connect <jdbc connection> \
--username sqoop \
--password sqoop \
--table emp \
--export-dir <dir> \
--input-fields-terminated-by ',';

Verify the data in the mysql table

mysql> select * from emp;

+----+--------+
| id | name   |
+----+--------+
|  1 | Thiru  |
|  2 | Vikram |
|  3 | Brij   |
|  4 | Sugesh |
+----+--------+

update the emp file & move the updated file into hdfs. contents of the updated file

1,Thiru
2,Vikram
3,Sugesh
4,Brij
5,Sagar

Sqoop export for upsert - Update if the key matches else insert.

sqoop export --connect <jdbc connection> \
--username sqoop \
--password sqoop \
--table emp \
--update-mode allowinsert \
--update-key id \
--export-dir <dir> \
--input-fields-terminated-by ',';

Note: --update-mode <mode> - we can pass two arguments "updateonly" - to update the records. this will update the records if the update key matches.
if you want to do upsert (If exists UPDATE else INSERT) then use "allowinsert" mode.
example: 
--update-mode updateonly \ --> for updates
--update-mode allowinsert \ --> for upsert

verify the results:

mysql> select * from emp;
+----+--------+
| id | name   |
+----+--------+
|  1 | Thiru  |
|  2 | Vikram |
|  3 | Sugesh |--> Previous value "Brij"
|  4 | Brij   |--> Previous value "Sugesh"
|  5 | Sagar  |--> new value inserted
+----+--------+