Help with xtdb queries including pull and order-by

I have a problem with datalog and need some help. Can I post the query and the error here so I can get some help?

If so, I want to order my query by a specific field.
The original query works fine without the “order-by” clause.

When I add it, I get the error below.

Query:

{:query {
			:find [(pull a [:xt/id :session/user :session/user-id :session/user-name
				   			:session/type :session/connection :session/verb :session/event-size
						    :session/start-date :session/end-date :session/dlp-count])]
			:in [org-id arg-user arg-type arg-conn arg-start-date arg-end-date]
			:where [[a :session/org-id org-id]
					[a :session/user-id usr-id]
					[a :session/type typ]
					[a :session/connection conn]
					[a :session/verb verb]
					[a :session/start-date start-date]
					[a :session/end-date end-date]
					(or [(= arg-user nil)]
						[(= usr-id arg-user)])
					(or [(= arg-type nil)]
						[(= typ arg-type)])
					(or [(= arg-conn nil)]
						[(= conn arg-conn)])
					(or [(= arg-start-date nil)]
						[(> start-date arg-start-date)])
					(or [(= arg-end-date nil)]
						[(< start-date arg-end-date)])]
			:order-by [[:session/start-date :desc]]
			:limit %v
			:offset %v}
		:in-args [%s]}

Error:

#:xtdb.error{:error-key :order-by-requires-find-element,
 :error-type :illegal-argument,
 :message
               "Order by requires an element from :find. unreturned element: [:form [:keyword :session/start-date]]"}

Looks like the order by does not work with the pull syntax…
Thanks

Hey @Daniel_Bonilha !

Always :smiley:

The issue here is that :order-by can only work with top-level entries from the :find vector (not nested keys/values), so the only valid ‘form’ you can use for :order-by here is the entire literal pull expression (which you need to repeat exactly), i.e.

:order-by [[(pull a [:xt/id :session/user :session/user-id :session/user-name
                     :session/type :session/connection :session/verb :session/event-size
                     :session/start-date :session/end-date :session/dlp-count]) :asc]]

…but this won’t produce a particularly meaningful ordering however and might not even sort at all due to the result maps being ~incomparable types (I’m not 100% sure without running it).

A possible workaround would be to wrap the current query in an outer query than can use a get-in expression on the result map to extract the nested value and then do :order-by on that. Something like:

{:find [m v]
 :where [[(q {:find ...
              :where ...}) m]
         [(get-in m [:a :b]) v]]
 :order-by [[v :asc]]}

Thanks for the quick reply.

The first approach, where I replicate all the pull expression, didn’t work. It returned the following error:

#:xtdb.error{:error-key :order-by-requires-find-element,
 :error-type :illegal-argument,
 :message
               "Order by requires an element from :find. unreturned element: [:form [:list ([:symbol pull] [:symbol a] [:vector [[:keyword :xt/id] [:keyword :session/user] [:keyword :session/user-id] [:keyword :session/user-name] [:keyword :session/type] [:keyword :session/connection] [:keyword :session/verb] [:keyword :session/event-size] [:keyword :session/start-date] [:keyword :session/end-date] [:keyword :session/dlp-count]]])]]"}

For the second approach, where I wrap the whole query in an outer query I get a malformed return:

{:error "Malformed \"application/edn\" request."}

Maybe I need to drop the pull syntax?

Hey again, if you can share the full queries each time that would definitely help to avoid confusion. As mentioned though I don’t really consider the “first approach” valid anyway - if you need to order-by based on the nested contents of the existing (working) pull then just do it on the application side as it’s possibly simpler all around.

However, the subquery really should work though, so if you can share the full query you used we may be able to spot the cause of the error.

If you’re using Clojure/Java and embedding XT then you should see a more useful error there instead, but I guess you’re using HTTP for good reasons - out of interest, what is your application language?

(post deleted by author)

Hey, sorry about not posting the full query. Probably I made a mistake somewhere:

{:query {
			 :find [m v]
			 :where [[(q {
				
				:find [(pull a [:xt/id :session/user :session/user-id :session/user-name
				   			:session/type :session/connection :session/verb :session/event-size
						    :session/start-date :session/end-date :session/dlp-count])]
				:in [org-id arg-user arg-type arg-conn arg-start-date arg-end-date]
				:where [[a :session/org-id org-id]
						[a :session/user-id usr-id]
						[a :session/type typ]
						[a :session/connection conn]
						[a :session/verb verb]
						[a :session/start-date start-date]
						[a :session/end-date end-date]
						(or [(= arg-user nil)]
							[(= usr-id arg-user)])
						(or [(= arg-type nil)]
							[(= typ arg-type)])
						(or [(= arg-conn nil)]
							[(= conn arg-conn)])
						(or [(= arg-start-date nil)]
							[(> start-date arg-start-date)])
						(or [(= arg-end-date nil)]
							[(< start-date arg-end-date)])]
				:limit %v
				:offset %v}

			 }) m]
					 [(get-in m [:a :b]) v]]
			 :order-by [[v :asc]]
		:in-args [%s]}

Anyhow, I think the wrapper query approach wont work because I need the ordering to be done in the inner query, otherwise the :limit and :offset wont work properly (for pagination).

Even sorting in the application I ended up getting wrong data. We use golang by the way, that’s why using http =)

Well, I ended up with the pull syntax trying to solve another problem. Maybe we can fix the original problem instead.

I had this working query, both sorting and pagination working very fine:

{:query {
			:find [id usr usr-id usr-name typ conn verb event-size start-date end-date]
			:keys [xt/id session/user session/user-id session/user-name
				   session/type session/connection session/verb session/event-size
				   session/start-date session/end-date]
			:in [org-id arg-user arg-type arg-conn arg-start-date arg-end-date]
			:where [[a :session/org-id org-id]
					[a :xt/id id]
					[a :session/user usr]
					[a :session/user-id usr-id]
					[a :session/user-name usr-name]
					[a :session/type typ]
					[a :session/connection conn]
					[a :session/verb verb]
					[a :session/event-size event-size]
					[a :session/start-date start-date]
					[a :session/end-date end-date]
					(or [(= arg-user nil)]
						[(= usr-id arg-user)])
					(or [(= arg-type nil)]
						[(= typ arg-type)])
					(or [(= arg-conn nil)]
						[(= conn arg-conn)])
					(or [(= arg-start-date nil)]
						[(> start-date arg-start-date)])
					(or [(= arg-end-date nil)]
						[(< start-date arg-end-date)])]
			:order-by [[start-date :desc]]
			:limit %v
			:offset %v}
		:in-args [%s]}

One day we introduced a new field: :session/dlp-count.

This broke the query. Now, only items with the new field were being fetched. Old items withou the field were not anymore. To fix it, I changed to pull syntax.

Here is the modified query that introduced the issue:

{:query {
			:find [id usr usr-id usr-name typ conn verb event-size start-date end-date dlp-count]
			:keys [xt/id session/user session/user-id session/user-name
				   session/type session/connection session/verb session/event-size
				   session/start-date session/end-date session/dlp-count]
			:in [org-id arg-user arg-type arg-conn arg-start-date arg-end-date]
			:where [[a :session/org-id org-id]
					[a :xt/id id]
					[a :session/user usr]
					[a :session/user-id usr-id]
					[a :session/user-name usr-name]
					[a :session/type typ]
					[a :session/connection conn]
					[a :session/verb verb]
					[a :session/event-size event-size]
					[a :session/start-date start-date]
					[a :session/end-date end-date]
					[a :session/dlp-count dlp-count]
					(or [(= arg-user nil)]
						[(= usr-id arg-user)])
					(or [(= arg-type nil)]
						[(= typ arg-type)])
					(or [(= arg-conn nil)]
						[(= conn arg-conn)])
					(or [(= arg-start-date nil)]
						[(> start-date arg-start-date)])
					(or [(= arg-end-date nil)]
						[(< start-date arg-end-date)])]
			:order-by [[start-date :desc]]
			:limit %v
			:offset %v}
		:in-args [%s]}

Any idea on the behaviour that caused the items without the field to stop coming?

I think the wrapper query approach wont work because I need the ordering to be done in the inner query, otherwise the :limit and :offset wont work properly (for pagination).

Ah right, yes that would be a problem :sweat_smile: but I think you could simply invert where the pull happens in that case? i.e. pull in the outer and order-by in the inner

On the topic of pagination more generally though, you may want to read this old discussion: Is walking raw indices foreseen? ¡ xtdb ¡ Discussion #1514 ¡ GitHub - depending on your performance requirements it may be worth some particularly careful thinking ahead here.

This is the expected behaviour for triple clauses - they necessarily filter for the presence of attributes on entities. To workaround this you can use the ‘get-attr’ lookup function like this instead of the triple clause: [(get-attr a :session/dlp-count) [dlp-count]] (note that get-attr also accepts an optional default value parameter after the end if there’s no stored value, otherwise will return nil)

Interesting - useful to know! Are you using GitHub - go-edn/edn: Go implementation of EDN (Extensible Data Notation) also?

Hey, the [(get-attr a :session/dlp-count) [dlp-count]] solution worked nicely. Thanks for clarifying.

We are using edn library, yes, instead of doing all over again what someone already did =)

Thanks again for the working solution!
Best regards!

1 Like

(post deleted by author)

(post deleted by author)