#35399: Reduce the "Case-When" sequence for a bulk_update when the values for a
certain field are the same.
-------------------------------------+-------------------------------------
     Reporter:  Willem Van Onsem     |                    Owner:  nobody
         Type:                       |                   Status:  closed
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  5.0
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:  duplicate
     Keywords:  db, bulk_update,     |             Triage Stage:
  case, when                         |  Unreviewed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Willem Van Onsem):

 Well, typically hashing runs linear in the size of the data, so `Value(1)`
 should indeed take more time to hash than `1`, but not dramatically more.
 (By the way, it already hashes `Value(1)`: it first checks `if not
 hasattr(attr, "resolve_expression")` and wraps the plain value into a
 `Value`, so that is where the current benchmarks originate. If we had used
 `=Value(random.randint(0, 10))` for this benchmark, it would make no
 difference.) From the moment it encounters a hash error, it sets the
 dictionary to `None` and no longer hashes, saving those cycles; it thus
 stops looking for hashes as soon as one of the items cannot be hashed.
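 The "stop hashing on first failure" behavior described above can be
 sketched as follows (a minimal, hypothetical helper for illustration, not
 Django's actual implementation):

```python
# Hypothetical sketch: group rows by (hashable) value, bailing out with
# None on the first unhashable value so no further hashing cost is paid.
def dedup_by_value(values):
    """Map each distinct value to the row ids that share it.

    `values` is an iterable of (row_id, value) pairs. Returns None as
    soon as one value is unhashable (e.g. a list), mirroring the
    "set the dictionary to None" fallback described above.
    """
    groups = {}
    for row_id, value in values:
        try:
            groups.setdefault(value, []).append(row_id)
        except TypeError:  # unhashable value: give up on deduplication
            return None
    return groups

print(dedup_by_value([(1, "a"), (2, "a"), (3, "b")]))
# {'a': [1, 2], 'b': [3]}
print(dedup_by_value([(1, ["unhashable"])]))
# None
```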

 But probably the main reason why it would be strange for the hashing to
 increase the time significantly is that generating the SQL counterpart of
 an expression (like `F('a') + Value(1)`) *also* takes linear time. So
 *building* the SQL query with *all* the `Case` / `When` items always takes
 approximately the same effort as hashing all those items, since both run
 linear in the "size" of the SQL expressions (or at least, that is a
 reasonable assumption), so we spend that effort anyway. In the worst case,
 with (close to) no duplicate expressions, the work is done twice: once to
 generate the hashes and once to generate the query, so it will at most
 roughly double the effort of generating the query, and in my experience
 building the query itself is rarely the bottleneck. If there *are*
 duplicate expressions, it also saves on generating parts of the SQL query,
 which again will probably not have much impact either way, since that is
 almost never the bottleneck.
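 The saving the ticket proposes can be illustrated with plain strings:
 instead of one `WHEN` per row, emit one `WHEN` per distinct value with an
 `IN` clause over the primary keys that share it (a hypothetical sketch,
 not Django's SQL compiler):

```python
# Hypothetical illustration of collapsing one WHEN-per-row into one
# WHEN-per-distinct-value; plain string building, for illustration only.
def case_when_sql(rows):
    """rows: iterable of (pk, value) pairs; returns a CASE expression."""
    by_value = {}
    for pk, value in rows:
        by_value.setdefault(value, []).append(pk)
    whens = [
        f"WHEN id IN ({', '.join(map(str, pks))}) THEN {value!r}"
        for value, pks in by_value.items()
    ]
    return "CASE " + " ".join(whens) + " END"

print(case_when_sql([(1, 10), (2, 10), (3, 20)]))
# CASE WHEN id IN (1, 2) THEN 10 WHEN id IN (3) THEN 20 END
```

 With many rows sharing few distinct values, the generated SQL shrinks
 accordingly; with all-distinct values it degenerates back to one `WHEN`
 per row, which is the "at most ~double the effort" case discussed above.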
-- 
Ticket URL: <https://code.djangoproject.com/ticket/35399#comment:8>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
