I need to develop application that can be getting data from multiple data sources ( Oracle, Excel, Microsoft Sql Server
, and so on) using one SQL query
. For example:
SELECT o.employeeId, count(o.orderId)
FROM employees@excel e. customers@microsoftsql c, orders@oracle o
WHERE o.employeeId = e.employeeId and o.customerId = c.customerId
GROUP BY o.employeeId;
This sql and data sources must be changes dynamically by java program. My customers want to write and run sql-like query
from different database and storage in same time with group by, having, count, sum
and so on in web interface of my aplication. Other requirements is perfomance and light-weight.
I find this way to do it (and what drawbacks I see, please, fix me if I wrong):
Apache Spark (drawbacks: heavy solution, more better for BigData, slow if you need getting up-to-date informations without cached it in Spark),
Distributed queries in SQL server (Database link of Oracle, Linked server of Microsoft SQL Server, Power Query of Excel) - drawbacks: problem with change data sources dynamically by java program and problem with working with Excel,
Prestodb (drawbacks: heavy solution, more better for BigData),
Apache Drill (drawbacks: quite young solution, some problem with not latest odbc drivers and some bugs when working),
Apache Calcite (ligth framework that be used by Apache Drill, drawbacks: quite young solution yet),
Do join from data sources manually (drawbacks: a lot of work to develop correct join, "group by" in result set, find best execution plan and so on)
May be, do you know any other way (using free open-source solutions) or give me any advice from your experience about ways in above? Any help would be greatly appreciated.
UnityJDBC is a commercial JDBC Driver that wraps multiple datasoruces and allows you to treat them as if they were all part of the same database. It works as follows:
You define a "schema file" to describe each of your databases. The schema file resembles something like:
...
<TABLE>
<semanticTableName>Database1.MY_TABLE</semanticTableName>
<tableName>MY_TABLE</tableName>
<numTuples>2000</numTuples>
<FIELD>
<semanticFieldName>MY_TABLE.MY_ID</semanticFieldName>
<fieldName>MY_ID</fieldName>
<dataType>3</dataType>
<dataTypeName>DECIMAL</dataTypeName>
...
You also have a central "sources file" that references all of your schema files and gives connection information, and it looks like this:
<SOURCES>
<DATABASE>
<URL>jdbc:oracle:thin:@localhost:1521:xe</URL>
<USER>scott</USER>
<PASSWORD>tiger</PASSWORD>
<DRIVER>oracle.jdbc.driver.OracleDriver</DRIVER>
<SCHEMA>MyOracleSchema.xml</SCHEMA>
</DATABASE>
<DATABASE>
<URL>jdbc:sqlserver://localhost:1433</URL>
<USER>sa</USER>
<PASSWORD>Password123</PASSWORD>
<DRIVER>com.microsoft.sqlserver.jdbc.SQLServerDriver</DRIVER>
<SCHEMA>MySQLServerSchema.xml</SCHEMA>
</DATABASE>
</SOURCES>
You can then use unity.jdbc.UnityDriver
to allow your Java code to run SQL that joins across databases, like so:
String sql = "SELECT *\n" +
"FROM MyOracleDB.Whatever, MySQLServerDB.Something\n" +
"WHERE MyOracleDB.Whatever.whatever_id = MySQLServerDB.Something.whatever_id";
stmt.execute(sql);
So it looks like UnityJDBC provides the functionality that you need, however, I have to say that any solution that allows users to execute arbitrary SQL that joins tables across different databases sounds like a recipe to bring your databases to their knees. The solution that I would actually recommend to your type of requirements is to do ETL processes from all of your data sources into a single data warehouse and allow your users to query that; how to define those processes and your data warehouse is definitely too broad for a stackoverflow question.