How can we automate incremental import in SQOOP?

Devender Prakash picture Devender Prakash · Jan 13, 2016 · Viewed 8.6k times · Source

How can we automate the incremental import in SQoop ?

In incremental import, we need to give the --last-value to start the import from the last value onwards, but my job is to frequently import from RDBMS, I don't want to give last value manually, is there any way we can automate this process?

Answer

K S Nidhin picture K S Nidhin · Jan 13, 2016

An alternate approach to @Durga Viswanath Gadiraju answer.

In case you are importing the data to a hive table , you could query the last updated value from the hive table and pass the value to the sqoop import query. You could use shell script or oozie actions for achieving this.

Shell script :

lastupdatedvalue=`hive -e 'select last_value from table` #tweak the selection query based on the logic.

sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${lastupdatedvalue}

Oozie approach :

  1. Hive action for the select query based on the logic to retrieve the last updated value .
  2. Sqoop action for incremental load from the captured output of previous hive action.

PFB a sudo workflow :

<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
<start to="hiveact"/>
<action name="hiveact">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <script>script.sql</script>
<capture-output/>
    </hive>    
    <ok to="sqoopact"/>
    <error to="kill"/>

<action name="sqoopact">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${wf:actionData('hiveact')}</command>
     </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>
<kill name="kill">
    <message>Action failed</message>
</kill>
<end name="end"/>

Hope this helps.