#35399: Reduce the "Case-When" sequence for a bulk_update when the values for a
certain field are the same.
-------------------------------------+-------------------------------------
     Reporter:  Willem Van Onsem     |                    Owner:  nobody
         Type:                       |                   Status:  closed
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  5.0
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:  duplicate
     Keywords:  db, bulk_update,     |             Triage Stage:
  case, when                         |  Unreviewed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Willem Van Onsem):
Well, typically hashing runs linear in the size of the data, so hashing `Value(1)` should indeed take more time than hashing `1`, but not dramatically more. (By the way, the current code already hashes `Value(1)`: it first checks `if not hasattr(attr, "resolve_expression")` and wraps plain values in a `Value`, so that is where the current benchmarks originate from. Using `=Value(random.randint(0, 10))` in this benchmark would therefore make no difference.) From the moment it encounters a hash error, it sets the dictionary to `None` and no longer hashes, saving those cycles: it thus stops looking for duplicates as soon as one of the items cannot be hashed.

But probably the main reason it would be very strange for the hashing to increase the time significantly is this: generating the SQL counterpart of some expression (like `F('a') + Value(1)`) *also* takes linear time. So essentially *building* the SQL query with *all* the `Case`/`When` items will always take approximately the same effort as hashing all of those items, since both run linear in the "size" of the SQL expressions (or at least, that is a reasonable assumption), so we spend that effort anyway. It will thus at most roughly *double* the effort to generate the query when there are (close to) no duplicate expressions: once to generate the hashes, and once to generate the query; and in my experience, building the query itself is often not really the bottleneck. If there *are* duplicate expressions, it also saves on generating parts of the SQL query, which again will probably not have much impact in either direction, since that is almost never the bottleneck.

--
Ticket URL: <https://code.djangoproject.com/ticket/35399#comment:8>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
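The grouping-with-bailout behaviour described above can be sketched in a few lines. This is a minimal illustration of the idea, not Django's actual `bulk_update` code; the helper name `group_rows_by_value` and the `(pk, value)` input shape are assumptions made for the example. Identical values collapse into one group (hence one `When` clause per group), and the first unhashable value aborts the whole optimisation:

```python
from collections import defaultdict


def group_rows_by_value(rows):
    """Group primary keys by the value assigned to a field.

    ``rows`` is an iterable of ``(pk, value)`` pairs. Returns a dict
    mapping each distinct value to the list of pks that share it, or
    ``None`` as soon as any value is unhashable, signalling the caller
    to fall back to one ``When`` clause per row.

    Hypothetical helper for illustration; not Django's implementation.
    """
    groups = defaultdict(list)
    for pk, value in rows:
        try:
            groups[value].append(pk)  # hashing happens on the dict lookup
        except TypeError:
            # Unhashable value: give up on deduplication entirely,
            # saving the remaining hashing cycles.
            return None
    return dict(groups)
```

With three rows, two of which share the value `"a"`, `group_rows_by_value([(1, "a"), (2, "a"), (3, "b")])` yields two groups instead of three `When` clauses, while an unhashable value such as a list makes it return `None` immediately.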