I have to export HDFS file into MySql.
Let's say my HDFS file is:
1,abcd,23
2,efgh,24
3,ijkl,25
4,mnop,26
5,qrst,27
and say my Mysql database schema is:
+-----+-----+-------------+
| ID | AGE | NAME |
+-----+-----+-------------+
| | | |
+-----+-----+-------------+
When I'm inserting using following Sqoop command:
sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age"
It's working fine and inserting into database.
But, when I need to update already existing records I have to use --update-key
and --columns
.
Now, when I'm trying to update table using following command:
sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age" \
--update-key id
I'm facing issue like data is not updating into columns as specified in --columns
Am I doing anything wrong?
Can't we update database this way? HDFS file should be in Mysql schema only to update?
Is there any other way to achieve this?
4b.Update data from HDFS into a table in a relational database
Create emp table tbl in mysql test db
create table emp
(
id int not null primary key,
name varchar(50)
);
vi emp --> create file with below contents
1,Thiru
2,Vikram
3,Brij
4,Sugesh
Move the file to hdfs
hadoop fs -put emp <dir>
Execute the below sqoop job to export the data to the mysql
sqoop export --connect <jdbc connection> \
--username sqoop \
--password sqoop \
--table emp \
--export-dir <dir> \
--input-fields-terminated-by ',';
Verify the data in the mysql table
mysql> select * from emp;
+----+--------+
| id | name |
+----+--------+
| 1 | Thiru |
| 2 | Vikram |
| 3 | Brij |
| 4 | Sugesh |
+----+--------+
update the emp file & move the updated file into hdfs. contents of the updated file
1,Thiru
2,Vikram
3,Sugesh
4,Brij
5,Sagar
Sqoop export for upsert - Update if the key matches else insert.
sqoop export --connect <jdbc connection> \
--username sqoop \
--password sqoop \
--table emp \
--update-mode allowinsert \
--update-key id \
--export-dir <dir> \
--input-fields-terminated-by ',';
Note: --update-mode <mode> - we can pass two arguments "updateonly" - to update the records. this will update the records if the update key matches.
if you want to do upsert (If exists UPDATE else INSERT) then use "allowinsert" mode.
example:
--update-mode updateonly \ --> for updates
--update-mode allowinsert \ --> for upsert
verify the results:
mysql> select * from emp;
+----+--------+
| id | name |
+----+--------+
| 1 | Thiru |
| 2 | Vikram |
| 3 | Sugesh |--> Previous value "Brij"
| 4 | Brij |--> Previous value "Sugesh"
| 5 | Sagar |--> new value inserted
+----+--------+