What is --direct mode in sqoop?

Raj · Aug 25, 2016

As I understand it, Sqoop is used to import tables/data from a database into HDFS, Hive or HBase, or to export them back to the database.

We can import either a single table or a list of tables. Internally, a MapReduce program runs (I think with map tasks only).

My question is: what is Sqoop's direct mode, and when should I use the --direct option?

Answer

Samson Scharfrichter · Aug 25, 2016

Just read the Sqoop documentation!

  • General principles are described in the Sqoop User Guide, in the import section and in the export section

Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...)


Some databases provides a direct mode for exports as well (...)

Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25
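To show where the flag goes, here is a minimal Python sketch that only assembles `sqoop import` / `sqoop export` argument lists, with and without `--direct`. The JDBC URLs, user, table names and HDFS paths are hypothetical; the options used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--export-dir`, `--direct`) are standard Sqoop 1 arguments. Whether `--direct` actually changes anything depends on the connector, as the quoted documentation says.

```python
import shlex
from typing import List


def sqoop_import_cmd(jdbc_url: str, user: str, table: str,
                     target_dir: str, direct: bool = False) -> List[str]:
    """Build a `sqoop import` argument list (HDFS side: --target-dir)."""
    cmd = ["sqoop", "import",
           "--connect", jdbc_url,
           "--username", user,
           "-P",  # prompt for the password instead of putting it on the command line
           "--table", table,
           "--target-dir", target_dir]
    if direct:
        # Ask Sqoop to use the database-specific fast path, if the connector has one.
        cmd.append("--direct")
    return cmd


def sqoop_export_cmd(jdbc_url: str, user: str, table: str,
                     export_dir: str, direct: bool = False) -> List[str]:
    """Build a `sqoop export` argument list (HDFS side: --export-dir)."""
    cmd = ["sqoop", "export",
           "--connect", jdbc_url,
           "--username", user,
           "-P",
           "--table", table,
           "--export-dir", export_dir]
    if direct:
        cmd.append("--direct")
    return cmd


if __name__ == "__main__":
    # Hypothetical MySQL database and HDFS path, for illustration only.
    cmd = sqoop_import_cmd("jdbc:mysql://db.example.com/shop", "etl",
                           "orders", "/data/orders", direct=True)
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the job. Keeping `direct` as a boolean makes it easy to run the same import with and without direct mode and compare.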

Bottom line: "direct mode" means different things for different databases.
For MySQL or PostgreSQL, it refers to the native bulk loader/unloader utilities (i.e. completely bypassing JDBC), while for Oracle it refers to "direct path INSERT", i.e. still going through JDBC but in a non-transactional mode (so you'd better load into a temp table first, or you might end up with duplicate primary key values and a corrupt table).
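The "temp table" advice for Oracle can be sketched as three steps: empty a staging table, run the non-transactional `--direct` export into that staging table, then copy the rows into the real table with an ordinary `INSERT ... SELECT`. That last statement is conventional (no `APPEND` hint), so the primary key is enforced and a violation rolls back the whole statement instead of leaving the target half-loaded. This is a minimal sketch under assumptions: the table names, JDBC URL and HDFS path are hypothetical, the staging table is assumed to have the same columns as the target, and the SQL strings are meant to be run through whatever SQL client you normally use. Sqoop does have its own `--staging-table` export option, but the Sqoop User Guide notes staging is not always available for `--direct` exports, hence the manual version.

```python
import shlex
from typing import List


def oracle_direct_export_plan(jdbc_url: str, user: str, export_dir: str,
                              target_table: str, staging_table: str) -> List[str]:
    """Return the three steps of a staging-table export as printable strings.

    Assumption: staging_table has the same column layout as target_table
    and is used only for this load.
    """
    # 1. Start from an empty staging table (run with your SQL client).
    truncate_sql = f"TRUNCATE TABLE {staging_table}"

    # 2. Bulk-load the HDFS files into the staging table via the direct path.
    export_cmd = shlex.join([
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "-P",
        "--table", staging_table,
        "--export-dir", export_dir,
        "--direct",
    ])

    # 3. Conventional INSERT ... SELECT into the real table: constraints are
    #    checked, and a failure rolls back this statement (then COMMIT on success).
    copy_sql = f"INSERT INTO {target_table} SELECT * FROM {staging_table}"

    return [truncate_sql, export_cmd, copy_sql]


if __name__ == "__main__":
    for step in oracle_direct_export_plan("jdbc:oracle:thin:@//db.example.com:1521/ORCL",
                                          "etl", "/data/sales", "SALES", "SALES_STG"):
        print(step)
```

If step 3 fails on a duplicate key, `SALES` is left as it was and the bad rows can be inspected in `SALES_STG`.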