Deduplicating open-q queries

Hey, in reading through the docs I came across the following regarding the use of open-q and was confused:

Note that results are returned as bags, not sets, so you may wish to deduplicate consecutive identical result tuples (e.g. using clojure.core/dedupe or similar).

  • Under what circumstances will results from open-q be duplicated?
  • What’s the recommended deduplication strategy?
  • Why is there a discrepancy between xt/q and xt/open-q beyond how the results are consumed/processed?

Hey @ian_sinn :slightly_smiling_face:

  • Under what circumstances will results from open-q be duplicated?

For instance, consider this example:

  (with-open [n (xt/start-node {})]
    (let [query '{:find [result]
                  :where [[(range 4) [x ...]]
                          [(even? x) result]]}]
      [(xt/q (xt/db n) query)
       (with-open [i (xt/open-q (xt/db n) query)]
         (into [] (iterator-seq i)))]))
;;=> [#{[true] [false]} [[true] [false] [true] [false]]]
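
One way to picture where the duplicates come from, as a plain-Clojure analogy rather than a description of XTDB internals: the query binds x to four values, but :find only projects result, so several bindings collapse onto the same tuple. The names bindings, bag and as-set below are made up for illustration.

```clojure
;; Hypothetical stand-in for the four intermediate bindings of x and result.
(def bindings
  (for [x (range 4)]
    {:x x :result (even? x)}))

;; Projecting only `result` keeps one tuple per binding, i.e. a bag.
(def bag (mapv (juxt :result) bindings))
;; bag => [[true] [false] [true] [false]]

;; Collapsing the bag into a set matches what xt/q returned above.
(def as-set (set bag))
;; as-set => #{[true] [false]}
```
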
  • What’s the recommended deduplication strategy?

You can use the usual tool belt of tricks, e.g. (into #{} ...) for simplicity, or perhaps clojure.core/dedupe if you want to work with transducers:

  (with-open [n (xt/start-node {})]
    (let [query '{:find [result]
                  :where [[(range 4) [x ...]]
                          [(even? x) result]]}]
      (with-open [i (xt/open-q (xt/db n) query)]
        (into #{} (iterator-seq i)))))
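
For the transducer route, here is a small sketch using a literal vector as a stand-in for (iterator-seq i), so it runs without a node. The names tuples, deduped and uniq are only for illustration. Note that clojure.core/dedupe removes consecutive duplicates only, so on the result order from the example above it leaves everything in place, whereas clojure.core/distinct removes repeats wherever they occur.

```clojure
;; Stand-in for (iterator-seq i): the bag of tuples from the example above.
(def tuples [[true] [false] [true] [false]])

;; dedupe only drops a tuple when it equals the one immediately before it,
;; so this particular ordering comes back unchanged.
(def deduped (into [] (dedupe) tuples))
;; deduped => [[true] [false] [true] [false]]

;; distinct remembers every tuple seen so far and keeps first-seen order.
(def uniq (into [] (distinct) tuples))
;; uniq => [[true] [false]]
```

dedupe keeps no history beyond the previous tuple, which makes it cheap when identical tuples are known to arrive next to each other; distinct holds every unique tuple seen so far in memory.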
  • Why is there a discrepancy between xt/q and xt/open-q

The behaviour of xt/q is “correct” in the sense that Datalog always operates (and returns results) in terms of sets. xt/open-q exposes some of the implementation detail of how Datalog queries are executed, but it also enables lazy consumption of results and efficient implementation of complex algorithms that go beyond the context of a single query.
