Query OLAP Mondrian (MDX, XMLA) with a Python interface?

reyman64 picture reyman64 · Sep 25, 2010 · Viewed 10.7k times · Source

Actually I'm using R + Python with RPY2 to manipulate data and ggplot to create beautiful graphics.. I have some data in a PostgreSQL database, and I'm using psycopg2 to query data.

I'm starting a thesis, and in the future I need an OLAP cube to store my (very big) simulation data: multiple dimension, aggregation query, etc.

Is there any best or standard practice for interfacing between Python (and I want Python + R, no jpivot or some other dashboard in Java) and an OLAP engine like Mondrian? I searched on Google for any solution, and didn't I find anything.

I've briefly evaluated SQLAlchemy, and Django-ORM, but they have no MDX or XML/A interface to query an OLAP server (Mondrian or other) ...

Is it possible to write a query in MDX and, with psycopg + ODBC, query my OLAP server, and the OLAP server giving me an answer from my simulation data (no mapping on Python object, but it's OK for me)?

Update 1 :

Why do I need to search around OLAP + Mondrian technology ?

Because University of Laval (GeoSoa departements + Thierry Badard) wrote a spatial extension to OLAP: SOLAP, and implemented this in Mondrian as GeoMondrian. That interest me because I'm working on spatial multi agent based simulation ( ~= geosimulation).

The GeoSoa departement created an Ajax based component to communicate and visualize spatial data with GeoMondrian: SOLAPLAYERS, which can query a Mondrian server by its Xlma servlet.

Problem : probably slow in big data manipulation, need Internet or Apache 2. Briefly, it's only to visualize data or map ... In my case, I need raw data to make my own data manipulation + graphics with R: spatial analysis, regression analysis, rank-tail, etc. Here, SOLAP help me to prepare data for this later complex R analysis.

Why Python?

1 - Web access to spatial data -

I'm trying to use a "cool" Python framework, like GeoDjango or MapFish: big community in GIS, open-source, use GeoAlchemy to manipulate spatial query/data, include visualisation with JavaScript extensions and OpenLayers, etc.

2 - Local access to spatial data in GIS -

I want to create a plugin in QGIS (open source GIS) to access and visualize data, and QGIS plugin and API = Python.

3 - Automatic analysis of data -

A user or scientist runs a simulation with grid computing and choose automatic analysis (R + ggplot2 + MDX query) they want to run on this data. My goal here is to create a synthetic report of the simulation (graphic, tabular data, etc.).

So, after simulation, data go to OLAP/SOLAP cube, and many Python scripts (created by the user) get data with MDX, manipulate data with R + RPY2, and write and produce cool output for the scientist on doku-wiki or another community-platform.

Problem?

1 - Olap4j, the API core of Mondrian to communicate with an external component, is Java-made :/

2 - SOLAPLAYERS uses Ajax to access data, too slow for me.

3 - SQLAlchemy and GeoAlchemy have no driver connection to a multidimensional database (OLAP).

* Solution? *

1 - Py4j to access Java object or Java collection in olap4j with Python? Write my own function to access the Java mapped collection? => dangerous and not very easy?...

2 - XLMA with Ajax Mondrian server? It is too slow.

3 - Write my own py-connector to OLAP Mondrian ? => Ouch. It's an hard way, I think.

What should I do?

Answer

Julian Hyde picture Julian Hyde · Sep 30, 2010

I don't know python, but I am author of mondrian/olap4j.

If you can use py4j to access olap4j, great. If not, definitely consider XMLA. It may not be as slow as you think (unless python's XML parsing is slow). The biggest problem is the complexity of constructing SOAP requests and understanding the responses.

Julian