Data loading is slow while using "Insert/Update" step in pentaho 4.4.0
I am using pentaho 4.4.0. While using the "Insert/Update" step in kettle the speed of the data load is too slow compared to mysql. This step will scan through the entire records in table before inserting. If the record exist it will do a update. So what shall be done to optimize the performance while doing "Insert/Update" . and the process speed is 4 r/s, so totally my records will be above 1 lakh... The process takes 2 and half hours to complete the entire process.
Based on your comments it sounds like you want the Merge rows (diff)
step followed by a Synchronize after merge
. Check the Pentaho wiki to see how these steps work.
Another thing that makes a big difference is how many of the rows result in an upsert vs how many total rows. If the number of rows resulting in writes is more than roughly 40%, @carexcer's last comment may be a better approach. If it's less, definitely try the Merge rows (diff)
step.
4 - 25 rows per second sounds way slow. Be sure the fields you marked as keys are indexed, whichever step you choose.
If most of the rows result in an upsert, you may be better off with a full refresh. If that's the case, check out the MySQL bulk loaders. Pentaho has both a batch and streaming bulk loader, though I don't know how good they are.