Question: what is (or will be) the best/most performant way to query collections?
Use case: we have a “people” table. Each person can have one or more phone numbers with metadata (e.g. label) attached. We will always query by phone number, never metadata.
Here’s our first, preferred (and naive?) attempt:
(xt/q
my-node
'(->
(rel
[{:name "John Smith"
:phone-numbers [{:label "home" :phone "+15551111111"}
{:label "work" :phone "+15552222222"}
{:label "mobile" :phone "+15553333333"}]}
{:name "Jane Smith"
:phone-numbers [{:label "work" :phone "+15554444444"}
{:label "mobile" :phone "+15555555555"}]}]
[name phone-numbers])
(unnest {:phone-number phone-numbers})
(with {:phone (. phone-number phone)})
(where (= phone "+15554444444"))))
Unless I’m missing something, my current understanding is that every record in the table would need to be examined.
Other options we can see:
- storing phone numbers and metadata in separate fields (e.g.
{:phone-numbers ["+15551111111"] :metadata {"+15551111111" {:label "home"}}
) (this obviates thewith
step, but doesn’t seem terribly useful) - expanding phone numbers and metadata (e.g.
{:phone-0/phone "+15551111111" :phone-0/label "home"}
) (keeps queries… interesting!) - storing phone numbers in a separate table (e.g.
{:xt/id #uuid … :phone "+15551111111"}
(phone numbers aren’t unique, so we can’t use them for IDs)
I believe the last two options would provide for the best performance given the current state of XTDB; my understanding is that the planner can limit the number of “pages” scanned for top-level attributes.
However, our initial data modeling approach strikes me as ergonomically best, and would (could?) work well once secondary indexes make their debut.
Finally, this is a bit hypothetical; we’re not storing production data yet, but we want to be ready to hit the ground running!
Thanks!