I have been trying to write an R script to query Impala database. Here is the query to the database:
select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA
When I run this query manually (read: outside the Rscript via impala-shell), I am able to get the table contents. However, when the same is tried via the R script, I get the following error:
[1] "HY000 140 [Cloudera][ImpalaODBC] (140) Unsupported query."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA'
closing unused RODBC handle 1
Why does the query fail when tried via R? and how do I fix this? Thanks in advance :)
Edit 1:
The connection script looks as below:
library("RODBC");
connection <- odbcConnect("Impala");
query <- "select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA";
data <- sqlQuery(connection,query);
You need to install the relevant drivers, please look at the following link
I had the same issue, all i had to do was update the ODBC drivers.
Also if you can update your odbcConnect with the username and password
connection <- odbcConnect("Impala");
to
connection <- odbcConnect("Impala", uid="root", pwd="password")