sqoop import multiple tables

Danny Westfall · Jun 19, 2013 · Viewed 29.2k times

We are using Cloudera CDH 4 and we are able to import tables from our Oracle databases into our HDFS warehouse as expected. The problem is that we have tens of thousands of tables inside our databases, and sqoop only supports importing one table at a time.

What options are available for importing multiple tables into HDFS or Hive? For example, what would be the best way to import 200 tables from Oracle into HDFS or Hive at a time?

The only solution I have seen so far is to create a sqoop job for each table import and then run them all individually. Since Hadoop is designed to work with large datasets, it seems like there should be a better way, though.
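
For what it's worth, that per-table approach can at least be scripted rather than run by hand. A rough sketch, assuming a plain-text file of table names (the file name, connection string, credentials and paths below are placeholders):

# Run one sqoop import per table listed in tables.txt (one name per line).
for TABLE in $(cat tables.txt); do
  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username SCOTT --password secret \
    --table "$TABLE" \
    --target-dir "/warehouse/$TABLE" \
    -m 4
done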

Answer

Kumar Reddy Basapuram · May 21, 2014

You can use the import-all-tables tool to load all tables into HDFS at one time.

sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --warehouse-dir '/Sqoop21/AllTables'
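
Since the question is about Oracle rather than MySQL, the same tool also works with an Oracle JDBC connection string, along these lines (host, service name and credentials are placeholders):

sqoop import-all-tables --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username SCOTT --password secret --warehouse-dir '/Sqoop21/AllTables'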

If you want to exclude some tables from the import, you can use the --exclude-tables option.

Ex:

sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --warehouse-dir '/Sqoop21/AllTables' --exclude-tables <table1>,<table2>

If you want to store the output under a specific parent directory, use the --warehouse-dir option; each table is written to its own subdirectory beneath it (note that import-all-tables does not support --target-dir).

Ex:

sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --warehouse-dir '/Sqoop'
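
Each table should end up in its own subdirectory under the warehouse directory (e.g. /Sqoop/<tablename>), which you can check with:

hdfs dfs -ls /Sqoop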