What is the best way to do a left join on a foreign entity value?

Hi, what is the best way to do a left join on entity values on the right? for example, I’d like to join :txt/str with :var/name via :var/txt-id and return a null :var/name on the right when there is no match:

(with-open [node (xt/start-node {})]

  (xt/submit-tx node
                [[::xt/put {:xt/id -1
                            :var/name "abc"
                            :var/value 5
                            :var/txt-id -10}]

                 [::xt/put {:xt/id -10 :txt/str "$abc"}]
                 [::xt/put {:xt/id -11 :txt/str "xyz"}]])

  (xt/sync node)

  (xt/q (xt/db node)
        '{:find [?str ?var-name]
          :where [[?txt-id :txt/str ?str]
                  [?var-id :var/txt-id ?txt-id]
                  [?var-id :var/name ?var-name]]})
  ;; => #{["$abc" "abc"]}
  )

i.e. I would also like to get a nil value for the entry that does not match the right join (“xyz”), i.e. #{["xyz" nil] ["$abc" "abc"]}

The only way I could find of doing so is the following, but it negatively affects performance on large datasets:

(xt/q (xt/db node)
      '{:find [?str ?var-name]
        :where [[?txt-id :txt/str ?str]
                (or [?var-id :var/txt-id ?txt-id]
                    (and (not-join [?txt-id]
                                   [_ :var/txt-id ?txt-id])
                         [(identity nil) ?var-id]))
                [(get-attr ?var-id :var/name nil) [?var-name ...]]]})
;; => #{["xyz" nil] ["$abc" "abc"]}

Hey @ikappaki thanks for your patience and also for sharing a neat minimal example (always very helpful!) - I believe/hope this use of explicit rules with binding hints would be much faster although I’ve not attempted to validate that:

(xt/q (xt/db node)
        '{:find [?str ?var-name]
          :rules [[(left-join [a] b)
                   [b :var/txt-id a]]
                  [(left-join [a] b)
                   (not-join [a]
                             [_ :var/txt-id a])
                   [(identity nil) b]]]
          :where [[?txt-id :txt/str ?str]
                  (left-join ?txt-id ?var-id)
                  [(get-attr ?var-id :var/name nil) [?var-name]]]})

You can read more about how this works here: Datalog Queries · XTDB Docs

Essentially or (also or-join, not etc.) and named rules compile similarly under the hood into non-lazy subqueries, but named rules expose this [ ] hinting mechanism to control the binding scope. In contrast, the or in your second example will be executing and unifying across all the variables with multiple passes (to resolve everything simultaneously both ‘inside’ and ‘outside’ the or), by creating large cross-product intermediate relations that then get deduplicated.

EDIT:
I think this :limit 1 + subquery would be faster still if there were a lot of entries being iterated over inside the not-join

(xt/q (xt/db node)
        '{:find [?str ?var-name]
          :rules [[(left-join [a] b)
                   [b :var/txt-id a]]
                  [(left-join [a] b)
                   [(q {:find [a]
                        :in [a]
                        :where [[_ :var/txt-id a]]
                        :limit 1} a) res]
                   [(empty? res)]
                   [(identity nil) b]]]
          :where [[?txt-id :txt/str ?str]
                  (left-join ?txt-id ?var-id)
                  [(get-attr ?var-id :var/name nil) [?var-name]]]})

Due to the planning constraints requiring all attributes to be statically defined, there’s no way to make this left-join more generic as it stands, so you would have to generate different versions (with different names) as needed.

Thanks @refset, I’m able to follow and understand your explanation and suggestions.

A “left join” seems to me a fundamental operation (as far as SQL is concerned at least), should this be supported perhaps at the basic API level? Going to such lengths as above just to do a left join as the second suggestion seems a bit excessive. It appears get-attr is halfway there but only supports going from entity ides to values, could it be extended perhaps to support the path from values to entity ids as well?

Likewise the feedback is appreciated - thank you!

I believe it’s fair to say that the lineage of (edn) Datalog which XT has broadly followed intentionally doesn’t offer native relational operators beyond inner-join - e.g. see here: Datomic update examples against a social news database · GitHub

However the team here does agree that the expressiveness afforded by the wider range of SQL-like relational operators is worthwhile compared to having to write and maintain additional code that breaks the declarative abstraction (by mixing code and queries more than strictly necessary, particularly when writing Clojure is not an option).

As a consequence, in the upcoming work on the 2.x branch we have already added left-join along with other capabilities that bring Datalog up to parity with SQL: Add `left-join` to Datalog by jarohen · Pull Request #757 · xtdb/core2 · GitHub

It is certainly possible for us to add something similar to get-attr into 1.x for a future release, as you describe, but without a more comprehensive piece of design work it is hard to say whether it is the ‘right’ solution to explore.

Thanks @refset!

Just to confirm, considering that xtdb 2.x is currently in alpha and there is no definite timetable for a stable release in the near term, are you considering moving forward with supporting it in 1.x or do you think that the workaround discussed here is sufficient for the majority of the cases leading up to 2.x? and is there perhaps anything I can do to help, such as opening a feature request, in case of the first?

Good question, I think let’s at least open an ‘enhancement’ issue we can track on the project board and then we can move the conversation to the issue comments - I can raise it with the dev team here properly at that point. Would you like to do the honour of opening the “Left-join with reverse references is cumbersome” issue? :slight_smile: I’ll find some time in the next couple of days to write it up in any case, if you don’t beat me to it.

1 Like