#35399: Reduce the "Case-When" sequence for a bulk_update when the values for a certain field are the same. -------------------------------------+------------------------------------- Reporter: Willem | Owner: nobody Van Onsem | Type: New | Status: new feature | Component: Database | Version: 5.0 layer (models, ORM) | Keywords: db, bulk_update, Severity: Normal | case, when Triage Stage: | Has patch: 1 Unreviewed | Needs documentation: 0 | Needs tests: 0 Patch needs improvement: 0 | Easy pickings: 0 UI/UX: 0 | -------------------------------------+------------------------------------- Django's `.bulk_update(..)` seems to work with sequences of `Case-When` *per* field and *per* record to update. In other words, if we have three items we want to update with `pk` being 1, 4 and 5, and we have a field `yn` and the records for pk `1` and `5` have value `y`, whereas the one for `pk=4` it is `n`, then this will make a query that looks like:
{{{ UPDATE my_table SET yn=CASE WHEN id=1 THEN 'y' WHEN id=4 THEN 'n' WHEN id=5 THEN 'y' END WHERE id IN (1,4,5) }}} It thus builds a long sequence of `Case`-`When`s, which is not efficient. There are for most databases solutions with a temporary table that is then upserted into the main table. The Django ORM could probably use existent tools for this where we first make a temporary model, create it with the migration model, insert in bulk, then upsert with a query, and finally remove the temporary table. But a problem with this is, we might not have privileges to create an extra table in that database. But a low-hanging optimization is to first *group* the values together that we can group together. This can be done with a `defaultdict` that maps the value to a list of primary keys for that value. In case the value is not hashable, we can fallback on the original case-when logic. But for values like strings, this would reduce the query to: {{{ UPDATE my_table SET yn=CASE WHEN id IN (1, 5) THEN 'y' WHEN id=4 THEN 'n' END WHERE id IN (1,4,5) }}} That being said, I think bulk_updates should still be done in a more efficient manner, perhaps moving it to the engines, since a lot of databases allow the creation of temporary tables to make upserts a lot more efficient. -- Ticket URL: <https://code.djangoproject.com/ticket/35399> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/0107018f0b9ceb1d-0d14d9d3-bb8e-4f4b-adb3-9eafa6ac02af-000000%40eu-central-1.amazonses.com.