I need an implementation of PCA in Java. I am interested in finding something that's well documented, practical and easy to use. Any recommendations?
There are now a number of Principal Component Analysis implementations for Java.
Apache Spark: https://spark.apache.org/docs/2.1.0/mllib-dimensionality-reduction.html#principal-component-analysis-pca
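import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.feature.PCA;
import org.apache.spark.mllib.feature.PCAModel;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.rdd.RDD;

SparkConf conf = new SparkConf().setAppName("PCAExample").setMaster("local");
try (JavaSparkContext sc = new JavaSparkContext(conf)) {
    // Create points as Spark MLlib Vectors
    List<Vector> vectors = Arrays.asList(
            Vectors.dense(-1.0, -1.0),
            Vectors.dense(-1.0, 1.0),
            Vectors.dense(1.0, 1.0));
    // Create a Spark MLlib RDD
    JavaRDD<Vector> distData = sc.parallelize(vectors);
    RDD<Vector> vectorRDD = distData.rdd();
    // Fit PCA, keeping the top 2 principal components
    PCA pca = new PCA(2);
    PCAModel pcaModel = pca.fit(vectorRDD);
    // Principal components matrix: one component per column
    Matrix matrix = pcaModel.pc();
}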
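According to the Spark MLlib documentation, `pcaModel.pc()` holds one principal component per column, and `PCAModel` also offers a `transform` method for projecting points. Projecting a point onto the components is simply the transpose of that matrix times the point. As a dependency-free illustration, here is a minimal plain-Java sketch; `PcProjection.project` is a hypothetical helper (not a Spark API), and it projects the point as given, without subtracting any mean.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Spark): projects one point onto principal components
// stored as the COLUMNS of pc (d features x k components), i.e. computes pc^T * point.
// The point is used as given; no mean is subtracted.
final class PcProjection {

    static double[] project(double[] point, double[][] pc) {
        int d = pc.length;    // number of features
        int k = pc[0].length; // number of components
        if (point.length != d) {
            throw new IllegalArgumentException(
                    "point has " + point.length + " features, expected " + d);
        }
        double[] projected = new double[k];
        for (int j = 0; j < k; j++) {
            // Dot product of the point with component j (column j of pc)
            for (int i = 0; i < d; i++) {
                projected[j] += pc[i][j] * point[i];
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        // One unit-length component along the diagonal direction (1, 1)
        double s = 1.0 / Math.sqrt(2.0);
        double[][] pc = { { s }, { s } };
        System.out.println(Arrays.toString(project(new double[] { 1.0, 1.0 }, pc)));
    }
}
```

Keeping components as columns mirrors the layout Spark documents for `pc()`, so the same loop reads directly off that matrix once its values are copied into a `double[][]`.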
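ND4J:
import java.util.Arrays;
import java.util.List;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.cpu.nativecpu.NDArray;
import org.nd4j.linalg.dimensionalityreduction.PCA;

// Create points as NDArray instances (NDArray here is the CPU-backend class)
List<INDArray> ndArrays = Arrays.asList(
        new NDArray(new float[] { -1.0F, -1.0F }),
        new NDArray(new float[] { -1.0F, 1.0F }),
        new NDArray(new float[] { 1.0F, 1.0F }));
// Create matrix of points (rows are observations; columns are features)
INDArray matrix = new NDArray(ndArrays, new int[] { 3, 2 });
// Execute PCA, again reducing to 2 dimensions
INDArray factors = PCA.pca_factor(matrix, 2, false);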
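PCA is defined on mean-centered data, and libraries differ in whether they center for you. The boolean passed to `pca_factor` above is ND4J's normalization switch; check its Javadoc for exactly what it does in your version. The centering step itself is simple. Here is a minimal plain-Java sketch, independent of ND4J, using a hypothetical helper `MeanCentering.center`. For the three example points the column means are -1/3 and 1/3, and after centering each column sums to zero.

```java
import java.util.Arrays;

// Hypothetical helper (not part of ND4J): returns a copy of x with each column's mean
// subtracted. Rows are observations and columns are features.
final class MeanCentering {

    static double[][] center(double[][] x) {
        int n = x.length;
        int d = x[0].length;
        // Column means
        double[] mean = new double[d];
        for (double[] row : x) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mean[j] /= n;
        }
        // Write into a new array so the input is left unchanged
        double[][] centered = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) {
                centered[i][j] = x[i][j] - mean[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] points = { { -1.0, -1.0 }, { -1.0, 1.0 }, { 1.0, 1.0 } };
        for (double[] row : center(points)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```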
Apache Commons Math (single-threaded; no framework):
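import org.apache.commons.math3.linear.EigenDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.stat.correlation.Covariance;

// Create points in a double array (rows are observations; columns are features)
double[][] pointsArray = new double[][] {
        new double[] { -1.0, -1.0 },
        new double[] { -1.0, 1.0 },
        new double[] { 1.0, 1.0 } };
// Create a real matrix
RealMatrix realMatrix = MatrixUtils.createRealMatrix(pointsArray);
// Create the covariance matrix of the points, then find its eigenvectors
// See https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
Covariance covariance = new Covariance(realMatrix);
RealMatrix covarianceMatrix = covariance.getCovarianceMatrix();
EigenDecomposition ed = new EigenDecomposition(covarianceMatrix);
// The eigenvectors (ed.getEigenvector(i), or the columns of ed.getV()) are the principal
// components; ed.getRealEigenvalue(i) is the variance along component i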
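The covariance-then-eigenvectors route used with Commons Math can be sketched without any library, which makes each step visible. This is a minimal illustration, not the Commons Math algorithm: `covariance` uses the n - 1 denominator (which should match the bias-corrected default of Commons Math's `Covariance`), and `leadingEigenvector` swaps in plain power iteration, so it finds only the first principal component. The class and method names are hypothetical. For the three example points the covariance matrix works out to [[4/3, 2/3], [2/3, 4/3]], whose largest eigenvalue is 2 with eigenvector (1, 1)/sqrt(2).

```java
import java.util.Arrays;

// Hypothetical helpers (not part of Commons Math): sample covariance plus power iteration,
// which recovers only the first principal component.
final class CovariancePca {

    // Sample covariance matrix (denominator n - 1); rows of x are observations, columns are features.
    static double[][] covariance(double[][] x) {
        int n = x.length;
        int d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] cov = new double[d][d];
        for (double[] row : x) {
            for (int a = 0; a < d; a++) {
                for (int b = 0; b < d; b++) {
                    cov[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]) / (n - 1);
                }
            }
        }
        return cov;
    }

    // Power iteration: repeatedly multiply by m and renormalize. For a covariance matrix
    // (symmetric, positive semi-definite) this converges to the unit eigenvector of the
    // largest eigenvalue, provided that eigenvalue is strictly largest and the start
    // vector is not orthogonal to its eigenvector.
    static double[] leadingEigenvector(double[][] m, int iterations) {
        int d = m.length;
        double[] v = new double[d];
        double startNorm = 0.0;
        for (int i = 0; i < d; i++) {
            v[i] = 1.0 + i; // arbitrary non-zero start vector
            startNorm += v[i] * v[i];
        }
        startNorm = Math.sqrt(startNorm);
        for (int i = 0; i < d; i++) {
            v[i] /= startNorm;
        }
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[d];
            double norm = 0.0;
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    w[i] += m[i][j] * v[j];
                }
                norm += w[i] * w[i];
            }
            norm = Math.sqrt(norm);
            if (norm == 0.0) {
                break; // m maps v to zero; keep the current unit vector
            }
            for (int i = 0; i < d; i++) {
                v[i] = w[i] / norm;
            }
        }
        return v;
    }

    // Rayleigh quotient v^T m v: the eigenvalue (variance along v) for a unit eigenvector v.
    static double eigenvalue(double[][] m, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < v.length; j++) {
                sum += v[i] * m[i][j] * v[j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = { { -1.0, -1.0 }, { -1.0, 1.0 }, { 1.0, 1.0 } };
        double[][] cov = covariance(points);
        double[] pc1 = leadingEigenvector(cov, 100);
        System.out.println(Arrays.toString(pc1) + " variance " + eigenvalue(cov, pc1));
    }
}
```

Power iteration was chosen only because it fits in a few lines; for all components at once, a full symmetric eigendecomposition such as Commons Math's `EigenDecomposition` is the practical choice.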
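Note that Singular Value Decomposition (SVD), which can also be used to find principal components, is available as well, for example as `computeSVD` on Spark MLlib's `RowMatrix` and as `SingularValueDecomposition` in Commons Math.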
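Why SVD gives the same answer can be shown in a short derivation. Take a mean-centered data matrix X_c with n rows (observations) and its thin SVD, where U^T U = I and Σ is diagonal (so Σ^T = Σ) with singular values σ_i. The right singular vectors (columns of V) are then the eigenvectors of the sample covariance C, i.e. the principal components, and each eigenvalue λ_i follows from the matching singular value:

```latex
X_c = U \Sigma V^{\top}
\quad\Longrightarrow\quad
C = \frac{1}{n-1} X_c^{\top} X_c
  = \frac{1}{n-1} V \Sigma U^{\top} U \Sigma V^{\top}
  = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top},
\qquad
\lambda_i = \frac{\sigma_i^{2}}{n-1}
```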