:in collection literal used in not clause

When I use a collection input variable in a not clause, it seems to check only one value in the collection. The query below still returns entities whose :v/tag8 attribute has the value tag8-9. If I change the arg to contain only tag8-9, the results exclude those. I'd appreciate any suggestions! Example query:

  (xt/q (xt/db node)
        '{:find  [(pull v? [*]) tag8?]
          :in    [[arg-tag8s-0 ...]]
          :where [[v? :x/type :v]
                  [(get-attr v? :v/tag8 "") [tag8? ...]]
                  (not [v? :v/tag8 arg-tag8s-0])]
          :limit 10}
        ["tag8-8" "tag8-9"])

Note: the attribute values are strings.

Reply from another user via Slack:

I played around with this for a minute and I think there are two issues. The first is that the query returns a document if it has tags that do not overlap with the input vector (instead of checking against all of the input values). The second is that :limit 10 seems to change the semantics of the query somehow.
I changed the input to pass a set and check membership directly with a predicate. This query returns all documents that contain none of the supplied tags:

  (xt/submit-tx xtdb-node
                (mapv (fn [doc] [::xt/put doc])
                      [{:xt/id 1 :x/type :v :v/tag8 ["tag8-8" "tag8-9"]}
                       {:xt/id 2 :x/type :v :v/tag8 ["not-tag8-8" "another"]}
                       {:xt/id 3 :x/type :v :v/tag8 ["not-tag8-8" "second-tag"]}]))

  (xt/q (xt/db xtdb-node) '{:find  [v?]
                            :in    [arg-tag8s-0]
                            :where [[v? :x/type :v]
                                    [v? :v/tag8 ?tags]
                                    (not [(contains? arg-tag8s-0 ?tags)])]
                            ;:limit 10
                            }
    #{"tag8-8" "tag8-9"})
;; => #{[3] [2]}
;; if you include the :limit 10 however I get:
;; => [[2] [2] [3] [3]]
;; which is still not 1 at least.

My own assessment is broadly the same: you want to use contains? here. Otherwise the not clause is evaluated against each arg-tag8s-0 input value separately, so a document only needs to lack one of the input values to be returned.
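To make the difference concrete, here is a rough model in plain Clojure, with no XTDB involved. It assumes that a collection binding behaves like running the not clause once per input value and taking the union of the results. The helper names (has-tag?, per-value-not, none-of) and the document with id 4 are made up for illustration; they are not part of XTDB or of the data above.

```clojure
;; Documents from the reply above, plus one hypothetical document (id 4)
;; that carries only one of the input tags.
(def docs
  [{:xt/id 1 :v/tag8 ["tag8-8" "tag8-9"]}
   {:xt/id 2 :v/tag8 ["not-tag8-8" "another"]}
   {:xt/id 3 :v/tag8 ["not-tag8-8" "second-tag"]}
   {:xt/id 4 :v/tag8 ["tag8-9"]}])

(defn has-tag?
  "True when doc carries tag in its :v/tag8 values."
  [doc tag]
  (boolean (some #{tag} (:v/tag8 doc))))

(defn per-value-not
  "Assumed model of :in [[arg ...]] with (not [v? :v/tag8 arg]):
  one pass per input value, results unioned."
  [docs tags]
  (set (for [tag tags
             doc docs
             :when (not (has-tag? doc tag))]
         (:xt/id doc))))

(defn none-of
  "Model of the intended result: drop any doc carrying one of the tags."
  [docs tags]
  (set (for [doc docs
             :when (not-any? #(has-tag? doc %) tags)]
         (:xt/id doc))))

(per-value-not docs ["tag8-8" "tag8-9"]) ;; => #{2 3 4}
(none-of docs ["tag8-8" "tag8-9"])       ;; => #{2 3}
```

In this model document 4 slips through the per-value version because it lacks tag8-8, even though it carries tag8-9, which matches the behaviour described in the question.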

The effects of :limit are briefly mentioned in the docs here:

":limit may be used in isolation, without :order-by, and will also return a bag of results that can contain duplicates."
[via https://docs.xtdb.com/language-reference/datalog-queries/#ordering-and-pagination]
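If the duplicates are unwanted, one option is to deduplicate the rows on the client side. The snippet below is only a sketch in plain Clojure: limited-results is the result vector quoted in the reply above, and dedupe-rows is a hypothetical helper, not an XTDB function.

```clojure
;; Result rows as reported above when :limit 10 is included.
(def limited-results [[2] [2] [3] [3]])

(defn dedupe-rows
  "Removes duplicate rows while keeping first-seen order."
  [rows]
  (into [] (distinct) rows))

(dedupe-rows limited-results) ;; => [[2] [3]]
```

Note that deduplicating after the fact can leave fewer rows than the requested :limit, since the limit was applied to the bag that still contained the duplicates.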

Is it correct that a triple clause makes better use of the indexes than a contains? predicate clause?

jarohen replied in Slack: correct :)
