I have a few XML files containing data for a research project which I need to run some statistics on. The amount of data is close to 100GB.
The structure is not so complex (could be mapped to perhaps 10 tables in a relational model), and given the nature of the problem, this data will never be updated again, I only need it available in a place where it's easy to run queries on.
I've read about XML databases, and the possibility of running XPATH-style queries on it, but I never used them and I'm not so comfortable with it. Having the data in a relational database would be my preferred choice.
So, I'm looking for a way to covert the data stored in XML into a relational database (think of a big .sql file similar to the one generated by mysqldump
, but anything else would do).
The ultimate goal is to be able to run SQL queries for crunching the data.
After some research I'm almost convinced I have to write it on my own. But I feel this is a common problem, and therefore there should be a tool which already does that.
So, do you know of any tool that would transform XML data into a relational database?
PS1:
My idea would be something like (it can work differently, but just to make sure you get my point):
PS2:
I've seen some posts here in SO but still I couldn't find a solution. Microsoft's "Xml Bulk Load" tool seems to do something in that direction, but I don't have a MS SQL Server.
Databases are not the only way to search data. I can highly recommend Apache Solr
Keep your raw data as XML and search it using the Solr index