What is the fastest way to dump a large (> 1GB) XML file into a MySQL database?
The data in question is the StackOverflow Creative Commons Data Dump.
This will be used in an offline StackOverflow viewer I am building, since I am looking to do some studying/coding in places where I will not have access to the internet.
I would like to release this to the rest of the StackOverflow membership for their own use when the project is finished.
Originally, I was reading from the XML and writing to the DB one record at a time. That took about 10 hours to run on my machine. The hacktastic code I'm using now collects 500 records into an array, then builds a single insertion query to load all 500 at once (e.g. "INSERT INTO posts VALUES (...), (...), (...) ... ;"). While this is faster, it still takes hours to run. Clearly this is not the best way to go about it, so I'm hoping the big brains on this site will know of a better way.
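For illustration, the multi-row INSERT technique described above can be sketched as follows. This is a minimal Python sketch (the question's own stack is .NET/MySql.Data, but the idea is identical there); the table and column names are hypothetical placeholders, and the generated statement is meant for a parameterized `execute()` call rather than string-spliced values.

```python
def build_batch_insert(table, columns, rows):
    """Build one multi-row INSERT statement with %s placeholders,
    suitable for a single parameterized execute() call."""
    placeholders = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([placeholders] * len(rows)))
    # Flatten the row tuples into one flat parameter list.
    params = [value for row in rows for value in row]
    return sql, params

# Hypothetical usage with a batch of three rows:
sql, params = build_batch_insert("posts", ["Id", "Title"],
                                 [(1, "a"), (2, "b"), (3, "c")])
# sql    -> INSERT INTO posts (Id, Title) VALUES (%s, %s), (%s, %s), (%s, %s)
# params -> [1, 'a', 2, 'b', 3, 'c']
```

Batching this way amortizes the per-statement round trip, which is why it beats one-row-at-a-time inserts, but every row still travels through the SQL parser, which is why it remains slower than a bulk-load path.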
Solutions such as "LOAD XML INFILE filename.xml" are not usable in this project, as this feature is only available in MySQL 5.4 and above. This constraint is largely due to my hope that the project would be useful to people other than myself, and I'd rather not force people to use Beta versions of MySQL. Solutions using the MySql.Data namespace are acceptable. Thanks for any pointers you can provide!
Ideas so far

- A stored procedure that loads the entire XML file into a column, then parses it using XPath
There are 2 parts to this:
For reading the XML file, this link http://csharptutorial.blogspot.com/2006/10/reading-xml-fast.html shows that 1 MB can be read in about 2.4 seconds using a stream reader. At that rate, a 1 GB file would take roughly 2,400 seconds, or about 40 minutes (if my maths is working this late).
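As a concrete sketch of stream-reading, here is how the dump could be walked without ever holding the whole file in memory. This uses Python's standard-library `iterparse` rather than the C# stream reader from the linked post; the `row` tag name follows the Stack Overflow dump's convention of one attribute-bearing `<row>` element per record, but adjust it if your file differs.

```python
import xml.etree.ElementTree as ET

def iter_rows(source):
    """Stream over <row .../> elements one at a time.

    Memory stays flat even for multi-gigabyte files because each
    element is cleared as soon as it has been yielded."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # free the element's children and text

# Usage with an in-memory sample; pass a filename for the real dump.
import io
sample = io.StringIO('<posts><row Id="1"/><row Id="2"/></posts>')
ids = [r["Id"] for r in iter_rows(sample)]
# ids -> ['1', '2']
```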
From what I have read, the fastest way to get data into MySQL is to use LOAD DATA.
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Therefore, if you can read the XML data and write it out to files that LOAD DATA can consume, then run LOAD DATA, the total time may be well under the hours you are currently experiencing.
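The two steps above (stream the XML, write an intermediate file for LOAD DATA) can be sketched like this. It is a minimal Python sketch under stated assumptions: the dump uses `<row>` elements with attributes, the column list is a hypothetical placeholder, and the escaping matches MySQL's default LOAD DATA settings (backslash-escaped tab, newline, and backslash, tab-separated fields).

```python
import xml.etree.ElementTree as ET

def xml_to_tsv(xml_source, tsv_path, columns):
    """Stream <row> elements out of the dump and write a tab-delimited
    file that a subsequent LOAD DATA INFILE can bulk-load."""
    def esc(value):
        # Escape the characters MySQL's default LOAD DATA settings
        # treat specially: backslash, tab, and newline.
        return (value.replace("\\", "\\\\")
                     .replace("\t", "\\t")
                     .replace("\n", "\\n"))

    with open(tsv_path, "w", encoding="utf-8") as out:
        for event, elem in ET.iterparse(xml_source, events=("end",)):
            if elem.tag == "row":
                fields = (esc(elem.attrib.get(c, "")) for c in columns)
                out.write("\t".join(fields) + "\n")
                elem.clear()

# The matching (hypothetical) server-side statement would then be roughly:
# LOAD DATA LOCAL INFILE 'posts.tsv' INTO TABLE posts (Id, Title, Body);
```

Because LOAD DATA bypasses per-statement SQL parsing entirely, the conversion pass plus the bulk load is typically much faster than even large batched INSERTs.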