Call R from Java to get Chi-squared statistic and p-value

user1830307 · Apr 15, 2013

I have two 4x4 matrices in Java, where one matrix holds observed counts and the other holds expected counts.

I need an automated way to calculate the p-value of the chi-square statistic between these two matrices; however, Java has no such function as far as I am aware.

I can calculate the chi-square statistic and its p-value by reading the two matrices into R from .csv files and then using the chisq.test function as follows:

obs<-read.csv("obs.csv")
exp<-read.csv("exp.csv")
chisq.test(obs,exp)

where the format of the .csv files is as follows:

A, C, G, T
A, 197.136, 124.32, 63.492, 59.052
C, 124.32, 78.4, 40.04, 37.24
G, 63.492, 40.04, 20.449, 19.019
T, 59.052, 37.24, 19.019, 17.689

Given these commands, R will give an output of the format:

X-squared = 20.6236, df = 9, p-value = 0.01443

which includes the p-value I was looking for.

Does anyone know of an efficient way to automate the process of:

1) Outputting my matrices from Java into .csv files
2) Loading the .csv files into R
3) Calling chisq.test on the loaded data in R
4) Returning the resulting p-value back to Java?

Thanks for any help....

Answer

Ciarán Tobin · Apr 15, 2013

There are (at least) two ways of going about this.


Command Line & Scripts

You can run R scripts from the command line with Rscript (Rscript.exe on Windows). For example, your script would contain:

# Parse arguments.
# ...
# ...

chisq.test(obs, exp)

Rather than creating CSVs in Java and having R read them, you should be able to pass the values straight to R as command-line arguments. I don't see the need to create CSVs and pass data that way unless your matrices are quite big: there are limits on the total size of the command-line arguments you can pass, and I think the limit varies by operating system.
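If the matrices do turn out to be too big for the command line, writing the CSV files from Java could look roughly like the sketch below. MatrixCsv, toCsv and write are made-up names for illustration, not part of any library. It assumes a square matrix whose rows and columns share the same labels, and it follows the layout from the question: a header with only the column labels, then one labelled row per matrix row. Because that header has one field fewer than the data rows, read.csv should treat the first column as row names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.StringJoiner;

// Hypothetical helper (illustration only): renders a labelled square matrix as CSV.
final class MatrixCsv {

    // Builds the CSV text: a header line with the labels, then one line per
    // matrix row, each starting with that row's label.
    // Assumes labels.length equals the number of rows and columns.
    static String toCsv(String[] labels, double[][] matrix) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", labels)).append('\n');
        for (int i = 0; i < matrix.length; i++) {
            StringJoiner row = new StringJoiner(",");
            row.add(labels[i]);
            for (double value : matrix[i]) {
                row.add(Double.toString(value));
            }
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    // Writes the CSV text to the given file as UTF-8, replacing any existing file.
    static void write(Path file, String[] labels, double[][] matrix) throws IOException {
        Files.write(file, toCsv(labels, matrix).getBytes(StandardCharsets.UTF_8));
    }
}
```

A call such as MatrixCsv.write(Paths.get("obs.csv"), new String[] {"A", "C", "G", "T"}, obs) would then produce a file that the read.csv("obs.csv") line from the question can load.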

You can pass arguments into R scripts and parse them using the commandArgs() function or with various packages (e.g. optparse or getopt). See this thread for more information.
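On the Java side, passing the matrices as arguments means turning them into a flat list of strings. Here is a minimal sketch of one way to do that. RscriptArgs and its methods are hypothetical names, and the argument layout (row count first, then the observed values, then the expected values, both row by row) is just a convention I picked for the example, so the R script would have to follow the same one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (illustration only): builds the Rscript command line
// that carries two matrices as plain command-line arguments.
final class RscriptArgs {

    // Flattens a matrix row by row into a list of decimal strings.
    static List<String> flatten(double[][] matrix) {
        List<String> values = new ArrayList<>();
        for (double[] row : matrix) {
            for (double value : row) {
                values.add(Double.toString(value));
            }
        }
        return values;
    }

    // Assumed layout: Rscript <script> <rows> <obs values...> <exp values...>
    static List<String> command(String script, double[][] obs, double[][] exp) {
        List<String> cmd = new ArrayList<>();
        cmd.add("Rscript");
        cmd.add(script);
        cmd.add(Integer.toString(obs.length)); // row count, so R can rebuild the matrices
        cmd.addAll(flatten(obs));
        cmd.addAll(flatten(exp));
        return cmd;
    }
}
```

In the script, commandArgs(trailingOnly = TRUE) should return everything after the script name as strings, which can be converted with as.numeric() and reshaped with matrix(..., byrow = TRUE).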

There are several ways to run a command and read its output from Java. I don't know enough about them to recommend one, but a quick search will turn up examples. A script is run from the command line like this:

Rscript my_script.R
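From Java, one common way to run that command and read what it prints is ProcessBuilder from the standard library. The sketch below is only that, a sketch: RscriptRunner, run and parsePValue are made-up names, it assumes Rscript is on the PATH, and the parser assumes the output contains a "p-value = ..." (or "p-value < ...") fragment like the line shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (illustration only): runs a command and extracts a p-value.
final class RscriptRunner {

    // Matches e.g. "p-value = 0.01443" or "p-value < 2.2e-16".
    private static final Pattern P_VALUE = Pattern.compile(
            "p-value\\s*[=<]\\s*([0-9]*\\.?[0-9]+(?:[eE][+-]?[0-9]+)?)");

    // Runs the command, merging stderr into stdout, and returns everything it printed.
    // Throws IOException if the process exits with a non-zero code.
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + output);
        }
        return output.toString();
    }

    // Returns the first p-value found in the text, or an empty result if there is none.
    static OptionalDouble parsePValue(String output) {
        Matcher matcher = P_VALUE.matcher(output);
        if (matcher.find()) {
            return OptionalDouble.of(Double.parseDouble(matcher.group(1)));
        }
        return OptionalDouble.empty();
    }
}
```

Usage would be along the lines of RscriptRunner.parsePValue(RscriptRunner.run(Arrays.asList("Rscript", "my_script.R"))). Note that a "p-value < ..." line only gives an upper bound; having the script print the number by itself would probably be simpler to parse than the full chisq.test printout.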

JRI

JRI lets you talk to R straight from Java. Here's an example of how you would pass a double array to R and have R sum it (this is Java now):

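// JRI classes (the JRI jar must be on the classpath).
import org.rosuda.JRI.REXP;
import org.rosuda.JRI.Rengine;

// The statements below belong inside a method.

// Start R session.
Rengine re = new Rengine(new String[] {"--vanilla"}, false, null);

// Check if the session is working.
if (!re.waitForR()) {
    return;
}

// Copy the Java array into the R variable x.
re.assign("x", new double[] {1.5, 2.5, 3.5});
// Evaluate an R expression and read the result back as a Java double.
REXP result = re.eval("(sum(x))");
System.out.println(result.asDouble());
// Shut down the R session.
re.end();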

The function assign() here is the same as doing this in R:

x <- c(1.5, 2.5, 3.5)

You should be able to work out how to extend this to work with a matrix.
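One possible way to do that, as a rough sketch: since assign() in the example above takes a flat double array, flatten the Java matrix and let R reshape it with matrix(). The helper below only prepares the flat array and the R expression. JriMatrix and its methods are hypothetical names, and the JRI calls are shown as comments because they need the JRI library and a running R session.

```java
// Hypothetical helper (illustration only): prepares a Java matrix for transfer
// to R as a flat array plus an R expression that restores its shape.
final class JriMatrix {

    // Copies the matrix into a single array, one row after another.
    // Assumes a non-empty rectangular matrix.
    static double[] flattenRowMajor(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            System.arraycopy(matrix[i], 0, flat, i * cols, cols);
        }
        return flat;
    }

    // Builds the R code that turns the flat vector back into a matrix.
    // byrow = TRUE matches the row-by-row order used by flattenRowMajor.
    static String reshapeExpression(String target, String source, int rows, int cols) {
        return target + " <- matrix(" + source + ", nrow = " + rows
                + ", ncol = " + cols + ", byrow = TRUE)";
    }

    // Intended use with the Rengine from the example above (not run here):
    //   re.assign("obs_flat", JriMatrix.flattenRowMajor(obs));
    //   re.eval(JriMatrix.reshapeExpression("obs", "obs_flat", 4, 4));
    // After doing the same for the expected counts, the chisq.test call from the
    // question can be evaluated with re.eval(...), and its p.value element read
    // back with asDouble().
}
```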


I think JRI is quite difficult at the beginning, so if you want to get this done quickly the command line option is probably best. I would say the JRI approach is less messy once you get it set up, though, and if you have a lot of back and forth between R and Java it is definitely better than calling multiple scripts.

  1. Link to JRI.
  2. Recommended Eclipse plugin to set up JRI.