In my application I need to massively improve insert performance. Example: a file with about 21K records takes over 100 min to insert. There are reasons it can take some time, say 20 min or so, but over 100 min is just too long.
Data is inserted into 3 tables (many-to-many). Ids are generated from a sequence, but I have already googled this and set hibernate.id.new_generator_mappings = true, with both allocationSize and the sequence increment set to 1000.
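For reference, a minimal sketch of what the id mapping looks like with these settings (the ChildRecord entity, field, and sequence names are hypothetical; the point is that allocationSize matches the sequence increment of 1000):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class ChildRecord {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "child_seq")
    @SequenceGenerator(name = "child_seq", sequenceName = "child_record_seq",
                       allocationSize = 1000) // must match INCREMENT BY 1000 on the database sequence
    private Long id;

    // unique-constrained column, checked manually before insert (see below)
    private String name;

    protected ChildRecord() {
    }

    public ChildRecord(String name) {
        this.name = name;
    }
}
```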
Also, the amount of data is not extraordinary at all; the file is 90 MB.
I have verified with VisualVM that most of the time is spent in the JDBC driver (PostgreSQL) and Hibernate. I think the issue is related to a unique constraint in the child table. The service layer makes a manual check (= a SELECT) before inserting. If the record already exists, it reuses it instead of waiting for a constraint exception.
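The check-then-reuse logic looks roughly like the following sketch (the JPQL query and the name column are illustrative stand-ins for the real unique-constrained columns):

```java
import javax.persistence.EntityManager;
import javax.persistence.NoResultException;

public class ChildRecordService {

    private final EntityManager em;

    public ChildRecordService(EntityManager em) {
        this.em = em;
    }

    // One SELECT per incoming record; INSERT only if nothing was found.
    public ChildRecord findOrCreate(String name) {
        try {
            return em.createQuery(
                    "select c from ChildRecord c where c.name = :name", ChildRecord.class)
                     .setParameter("name", name)
                     .getSingleResult();   // reuse the existing row
        } catch (NoResultException e) {
            ChildRecord fresh = new ChildRecord(name);
            em.persist(fresh);             // otherwise insert a new one
            return fresh;
        }
    }
}
```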
So to sum it up: for this specific file there will be 1 insert per table per record (it could be different, but not for this file, which is the ideal (fastest) case). That means roughly 60k inserts + 20k selects in total. Still, over 100 min seems very long (yes, hardware counts, and it is a simple PC with a 7200 rpm drive, no SSD or RAID). However, this is an improved version of a previous application (plain JDBC) in which the same insert on this hardware took about 15 min. Considering that in both cases about 4-5 min is spent on "pre-processing", the increase is massive.
Any tips on how this could be improved? Is there any batch-loading functionality?
See: spring-data JPA: manual commit transaction and restart new one

Add entityManager.flush() and entityManager.clear() after every n-th call to the save() method. If you use Hibernate, also add hibernate.jdbc.batch_size=100, which seems like a reasonable choice.
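A minimal sketch of this batching pattern, assuming an injected EntityManager and a hypothetical ChildRecord entity standing in for the real child table (keep the batch size in sync with hibernate.jdbc.batch_size):

```java
import java.util.List;
import javax.persistence.EntityManager;

public class BatchInserter {

    // Keep this in sync with hibernate.jdbc.batch_size
    private static final int BATCH_SIZE = 100;

    private final EntityManager entityManager;

    public BatchInserter(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    public void insertAll(List<ChildRecord> records) {
        int count = 0;
        for (ChildRecord record : records) {
            entityManager.persist(record);
            if (++count % BATCH_SIZE == 0) {
                entityManager.flush();  // send the accumulated INSERTs to the driver
                entityManager.clear();  // detach entities so the persistence context stays small
            }
        }
        entityManager.flush();          // flush the final partial batch
        entityManager.clear();
    }
}
```

Flushing pushes the pending statements to the JDBC driver in batches, and clearing prevents the persistence context from growing with tens of thousands of managed entities.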
Performance increase was > 10x, probably close to 100x.