#35399: Reduce the "Case-When" sequence for a bulk_update when the values for a
certain field are the same.
-------------------------------------+-------------------------------------
               Reporter:  Willem     |          Owner:  nobody
  Van Onsem                          |
                   Type:  New        |         Status:  new
  feature                            |
              Component:  Database   |        Version:  5.0
  layer (models, ORM)                |       Keywords:  db, bulk_update,
               Severity:  Normal     |  case, when
           Triage Stage:             |      Has patch:  1
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 Django's `.bulk_update(..)` seems to work with sequences of `Case-When`
 *per* field and *per* record to update. In other words, if we have three
 items we want to update with `pk` being 1, 4 and 5, and we have a field
 `yn` and the records for pk `1` and `5` have value `y`, whereas the one
 for `pk=4` it is `n`, then this will make a query that looks like:

 {{{
 UPDATE my_table
 SET yn=CASE WHEN id=1 THEN 'y' WHEN id=4 THEN 'n' WHEN id=5 THEN 'y' END
 WHERE id IN (1,4,5)
 }}}

 It thus builds a long sequence of `Case`-`When`s, which is not efficient.

 There are for most databases solutions with a temporary table that is then
 upserted into the main table. The Django ORM could probably use existent
 tools for this where we first make a temporary model, create it with the
 migration model, insert in bulk, then upsert with a query, and finally
 remove the temporary table. But a problem with this is, we might not have
 privileges to create an extra table in that database.

 But a low-hanging optimization is to first *group* the values together
 that we can group together. This can be done with a `defaultdict` that
 maps the value to a list of primary keys for that value. In case the value
 is not hashable, we can fallback on the original case-when logic. But for
 values like strings, this would reduce the query to:

 {{{
 UPDATE my_table
 SET yn=CASE WHEN id IN (1, 5) THEN 'y' WHEN id=4 THEN 'n' END
 WHERE id IN (1,4,5)
 }}}

 That being said, I think bulk_updates should still be done in a more
 efficient manner, perhaps moving it to the engines, since a lot of
 databases allow the creation of temporary tables to make upserts a lot
 more efficient.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/35399>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018f0b9ceb1d-0d14d9d3-bb8e-4f4b-adb3-9eafa6ac02af-000000%40eu-central-1.amazonses.com.

Reply via email to