I'm trying to bulk insert a very large dataset into a MySQL database and would love to use Django's bulk_create
while ignoring duplicate-key errors.
Sample model:
class MyModel(models.Model):
    my_id = models.IntegerField(primary_key=True)
    start_time = models.DateTimeField()
    duration = models.IntegerField()
    ......
    description = models.CharField(max_length=250)
So far I have the following code (generic for all my models; I pass in a model instance and a list of unsaved objects for bulk_create):
def insert_many(model, my_objects):
    # set of primary keys already present in the table (pk is unique)
    in_db_ids = set(
        model.__class__.objects.values_list(model.__class__._meta.pk.name, flat=True)
    )
    if not in_db_ids:
        # nothing exists yet: save time and bulk_create everything
        model.__class__.objects.bulk_create(my_objects)
    else:
        # keep only objects whose pk is not already in the database
        to_insert = [elem for elem in my_objects if elem.pk not in in_db_ids]
        if to_insert:
            model.__class__.objects.bulk_create(to_insert)
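The filtering step in the code above is just a set-difference on primary keys. A plain-Python sketch of the same idea, with hypothetical stand-in data and no ORM involved:

```python
# Sketch of the dedup step: keep only incoming objects whose pk is not
# already in the database. Stand-in (pk, payload) tuples, not model instances.
in_db_ids = {1, 2, 3}                            # ids already in the table
incoming = [(1, "dup"), (4, "new"), (5, "new")]  # candidate rows to insert

# Set membership makes each lookup O(1) instead of scanning a list.
to_insert = [obj for obj in incoming if obj[0] not in in_db_ids]
print(to_insert)  # -> [(4, 'new'), (5, 'new')]
```

Note that this approach still races: another writer can insert one of those ids between the SELECT and the bulk_create, so it only narrows the window rather than closing it.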
Is there a way in Django to avoid duplicates, mimicking MySQL's INSERT IGNORE? If I simply use bulk_create
(very fast), I get an error on a duplicate primary key and the insertion stops.
The ignore_conflicts parameter was added to bulk_create() in Django 2.2,
and you can also find it in https://github.com/django/django/search?q=ignore_conflicts&unscoped_q=ignore_conflicts
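With that flag the whole helper collapses to one call, `MyModel.objects.bulk_create(my_objects, ignore_conflicts=True)`, which on MySQL is executed as INSERT IGNORE. A standalone sketch of the same database-level behaviour using Python's stdlib sqlite3, where INSERT OR IGNORE is the analogue (no Django required, so assumptions here are only the illustrative table and rows):

```python
import sqlite3

# Demonstrate what ignore_conflicts=True does at the SQL level:
# rows that would violate the primary key are silently skipped,
# and the remaining rows are still inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (my_id INTEGER PRIMARY KEY, description TEXT)")

rows = [(1, "first"), (2, "second"), (1, "duplicate pk")]  # third row conflicts
conn.executemany(
    "INSERT OR IGNORE INTO mymodel (my_id, description) VALUES (?, ?)", rows
)
conn.commit()

print(conn.execute("SELECT my_id, description FROM mymodel ORDER BY my_id").fetchall())
# -> [(1, 'first'), (2, 'second')]
```

One caveat: with ignore_conflicts=True the database does not report which rows were skipped, and on most backends the returned objects do not get their primary keys set, so don't rely on the result list if you need the ids back.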