Hadoop connectivity with SAS

sudheer · Aug 21, 2013 · Viewed 7.7k times

I want to use the SAS/ACCESS 9.3M2 Interface to connect SAS with my Hive installation. My question is: does SAS import Hive cubes into the SAS environment and query them there, or does it hit Hive again for every report, running MapReduce jobs and pushing my reporting time beyond the 2-4 seconds I am aiming for?

If it imports Hive tables into its own environment, how would its performance compare with conventional SQL cubes?

I am totally new to SAS. I want my reports generated within 2-4 seconds; my aggregated data sits in Hive tables, and I have created cube dimensions over it.

Thanks...

Answer

vasja · Aug 21, 2013

What SAS/ACCESS serves for is to:

- provide the ability to read data from and write data to a data source, taking care of data type conversions
- provide metadata about a data store (list of tables, fields, data types)
- provide a means to translate SAS code, at least partially, into data-source-specific code, usually an SQL variant (implicit pass-through)
- provide a means for you to write data-source-specific code yourself and send it from SAS for execution in the data source (explicit pass-through), as in the sketch below
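
For illustration, here is a minimal sketch of the explicit pass-through route with SAS/ACCESS to Hadoop; the server name, port, schema and table names are placeholders I made up, not something from your setup:

    /* Explicit pass-through: the HiveQL inside the inner parentheses is sent
       to Hive unchanged, and only the result set comes back to SAS.
       Connection options and table names below are assumptions. */
    proc sql;
       connect to hadoop (server="hive-server.example.com" port=10000 schema=default);

       create table work.daily_totals as
       select * from connection to hadoop
          ( select sale_date, sum(amount) as total_amount
            from sales_agg
            group by sale_date );

       disconnect from hadoop;
    quit;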

I'm totally new to Hadoop :-) so I'll just guess that SAS/ACCESS to Hadoop (via the LIBNAME statement) reads relational data from Hadoop; the documentation mentions JDBC, so I guess that is what is used for data access. I doubt SAS/ACCESS is able to query cubes in Hadoop (is that your question? - "I have created cube dimensions over that" - meaning in Hadoop?).
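
A minimal sketch of that LIBNAME route, again with made-up connection details and table names:

    /* SAS/ACCESS to Hadoop LIBNAME engine: Hive tables under the libref can
       then be referenced like SAS data sets. SERVER=, PORT=, SCHEMA= and the
       table name are assumptions. */
    libname myhive hadoop server="hive-server.example.com" port=10000 schema=default;

    proc contents data=myhive._all_ nods;   /* list the Hive tables SAS sees */
    run;

    proc print data=myhive.sales_agg (obs=10);   /* read a few rows from Hive */
    run;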

Generally, SAS/ACCESS tries to minimize data transfer from the data source and to push processing down to it.
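
If you want to check what actually gets pushed down, something like this should show the generated HiveQL in the SAS log (SASTRACE is a generic SAS/ACCESS tracing option; the libref and table are the placeholders from the sketch above):

    /* Turn on SAS/ACCESS tracing so the SQL sent to the data source is logged */
    options sastrace=',,,d' sastraceloc=saslog nostsuffix;

    proc sql;
       /* The WHERE clause and the aggregation are candidates for implicit
          pass-through, i.e. translation to HiveQL and execution in Hadoop */
       select region, sum(amount) as total_amount
       from myhive.sales_agg
       where region = 'EMEA'
       group by region;
    quit;

    options sastrace=off;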

From http://blog.cloudera.com/blog/2013/05/how-the-sas-and-cloudera-platforms-work-together:

SAS/ACCESS to Hadoop

SAS/ACCESS provides the ability to access data sets stored in Hadoop in SAS natively. With SAS/Access to Hadoop:

LIBNAME statements can be used to make Hive tables look like SAS data sets on top of which SAS Procedures and SAS DATA steps can interact.
PROC SQL commands provide the ability to execute direct Hive SQL commands on Hadoop.
PROC HADOOP provides the ability to directly submit MapReduce, Apache Pig, and HDFS commands from the SAS execution environment to your CDH cluster.

The SAS/ACCESS interface is available from the SAS 9.3M2 release and supports CDH 3U2 as well as CDH 4.01 and higher.

The PROC HADOOP documentation might also be helpful: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1esotuxnkbuepn1w443ueufw8in.htm
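
As a rough idea of what PROC HADOOP looks like (the config fileref, credentials and paths below are placeholders; the config file is a Hadoop client configuration XML pointing at your cluster):

    /* Hedged sketch of PROC HADOOP submitting HDFS commands from SAS */
    filename cfg 'C:\hadoop\hadoop-config.xml';

    proc hadoop options=cfg username="sasdemo" password="XXXXXXXX" verbose;
       hdfs mkdir='/user/sasdemo/reports';              /* create an HDFS directory */
       hdfs copyfromlocal='C:\data\agg.csv'
            out='/user/sasdemo/reports/agg.csv';        /* copy a local file to HDFS */
    run;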