How to choose between Azure Data Lake Analytics and Azure Databricks

Pragmatic · May 22, 2018 · Viewed 7.4k times

Azure Data Lake Analytics and Azure Databricks can both be used for batch processing. Could anyone please help me understand when to choose one over the other?

Answer

wBob · May 22, 2018

In my humble opinion, a lot of it comes down to existing skillsets. If you have a team experienced in Spark, Java, Python, R or Scala, then Databricks is a natural fit. If, on the other hand, you have a team with existing SQL and C# skills, then the learning curve for them with U-SQL will be less steep.

That aside, there are other questions that can draw out the differences:

  • Do you require real-time interaction (Databricks) or batch mode analytics (both)? There is a feedback item requesting real-time interactivity for U-SQL; please vote for it.
  • Do you want a pay-as-you-go model (U-SQL) or clusters with auto-terminate after a certain period (Databricks)?
  • Do you prefer working in a notebook (Databricks) or with Visual Studio / VS Code / PowerShell / the .NET SDK (U-SQL)?
  • Do you want to use Spark libraries like GraphX (Databricks)?
  • Do you want the ability to run and scale any runtime (U-SQL)? See here for more details.
  • Do you want a local development emulator (U-SQL)? The U-SQL emulator in Visual Studio is seamless, i.e. you develop your code against your local drives in the same structure as your lake (for free), then simply click the drop-down in Visual Studio to run in the cloud. Although I think you can have a local Spark environment, I'm not sure what the local (and disconnected) development experience is for Databricks.
  • Are you using ADLS Gen 2 (only Databricks)? See here.
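As a rough illustration only, the checklist above could be encoded as a simple tally of which service each requirement points to. This is a minimal sketch: the requirement names and the `tally` helper are hypothetical, the mapping just restates the bullets as they stood in 2018, and it is not an official feature matrix. Skillsets, which matter most, are deliberately left out.

```python
from collections import Counter

# Hypothetical requirement names mapped to the service the checklist
# associates with a "yes" answer (as of 2018, per the bullets above).
CHECKLIST = {
    "realtime_interaction": "Databricks",
    "pay_as_you_go": "U-SQL",
    "notebooks": "Databricks",
    "visual_studio_tooling": "U-SQL",
    "spark_libraries": "Databricks",
    "run_any_runtime": "U-SQL",
    "local_emulator": "U-SQL",
    "adls_gen2": "Databricks",
}


def tally(requirements):
    """Count how many of the given requirements point to each service."""
    votes = Counter({"Databricks": 0, "U-SQL": 0})
    for name in requirements:
        # Unknown requirement names raise KeyError rather than being ignored.
        votes[CHECKLIST[name]] += 1
    return votes
```

For example, `tally(["notebooks", "spark_libraries", "local_emulator"])` counts two requirements toward Databricks and one toward U-SQL. A plain count treats every question as equally important, so in practice you would weigh hard constraints (such as ADLS Gen 2) more heavily than preferences.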

UPDATE October 2018: As far as I am aware, U-SQL does not currently support ADLS Gen 2, which would count against it (happy to be corrected). I will update the post if and when that support is added.

UPDATE January 2019: U-SQL has not had any meaningful updates since Spring 2018.

HTH