I'm trying to run RHadoop on Cloudera's Hadoop distro (I can't remember if it's CDH3 or CDH4), and am running into an issue: RStudio Server doesn't seem to recognize my environment variables.
In my /etc/profile.d/r.sh file, I have:
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF=/usr/hadoop/conf
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/
When I run R from the terminal, I get:
> Sys.getenv("HADOOP_CMD")
[1] "usr/bin/hadoop"
But when I run the same thing in RStudio Server:
> Sys.getenv("HADOOP_CMD")
[1] ""
And as a result, when I try to run rhdfs:
> library("rJava", lib.loc="/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15")
> library("rhdfs", lib.loc="/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15")
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: package/namespace load failed for 'rhdfs'
Does anyone know where I should be putting my environment variables, if not in that specific r.sh file?
Thanks!
You should set your environment variables in .Renviron or Renviron.site. RStudio Server launches R as a daemon rather than from a login shell, so exports in /etc/profile.d are not picked up; R reads its own startup files instead. The site-wide file lives at R_HOME/etc/Renviron.site, and a per-user .Renviron can go in your home directory. You can get more information by typing:
> ?Startup
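For example, a minimal Renviron.site sketch using the paths from your question (note that Renviron files use plain NAME=value lines, with no `export` keyword):

```
HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF=/usr/hadoop/conf
HADOOP_CMD=/usr/bin/hadoop
HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/
```

After restarting RStudio Server (e.g. `sudo rstudio-server restart`, assuming a typical service setup), Sys.getenv("HADOOP_CMD") in the RStudio console should return the value, and rhdfs should load.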
Someone had a similar issue here, and this is how they solved it.