hi team, I encountered some slow query and would like get some help.
I am running latest XTDB (1.0+) and populated around 1 million records into it. The record is like:
{'id':'0001_BBG', 'ticker':'0001', 'tag':'BBG', ... }
Among the 1 million records, there are only 3 possible values for ‘tag’.
Then when I query by ‘ticker’, it’s very fast, like 20ms. But when I query ‘ticker’ and ‘tag’ together, it takes 20 seconds.
I suspect it might be related with ‘tag’'s value, but I am not sure whether it’s a known issue. If it is, is there any work around?
p.s. I’m more than glad to provide any further information and repro if needed.
Thanks,
-BS
Hi @blshao84 good to hear from you!
It sounds like either the cardinality estimation is wrong/unexpected or the optimizer has simply picked the plan incorrectly. You can often use get-attr
as a workaround in these situations, e.g.
{:find [?e]
:where [[?e :ticker '0001']
[(get-attr ?e :tag) [?t]]
[(== ?t #{'BBG'}]]}
This works because e
must be resolved before the get-attr
clause can fire (i.e. the order is controlled and can’t be re-ordered by the planner).
{:find [(pull EqPrice [*])],
:where [
[EqPrice :com.map.data.mkt.EqPrice.ccy ccy]
[EqPrice :com.map.data.mkt.EqPrice.close1 close1]
[EqPrice :com.map.data.mkt.EqPrice.dataDate dataDate]
[EqPrice :com.map.data.mkt.EqPrice.tag tag]
[EqPrice :com.map.data.mkt.EqPrice.underlying underlying]
[EqPrice :xt/id xt/id]
[(== underlying "300827.SZ")]
[(== tag "WIND")]
], :in []}
Here’s my actual EDN query. I tried to change it to something like:
{:find [(pull EqPrice [*])],
:where [
[EqPrice :com.map.data.mkt.EqPrice.ccy ccy]
[EqPrice :com.map.data.mkt.EqPrice.close1 close1]
[EqPrice :com.map.data.mkt.EqPrice.dataDate dataDate]
[(get-attr EqPrice :com.map.data.mkt.EqPrice.tag) [tag]]
[EqPrice :com.map.data.mkt.EqPrice.underlying underlying]
[EqPrice :xt/id xt/id]
[(== underlying "300827.SZ")]
[(== tag "WIND")]
], :in []}
It seems working.
Thanks Jeremy~
1 Like
@refset Jeremy, by the way is there any significant performance penalty by adding get-attr? I am asking because in our use case we write a ORM layer ourselves and the EDN query is machine generated. Being said that, in order to improve the performance in above case, I’m considering adding an annotation for user’s to provide hint for such properties so that we could add ‘get-attr’ for each query like below:
data class EqPrice(val underlying:String, @enum val tag:String, ...)
Any thought?
Thanks,
-BS
get-attr
still uses the exact same indexes that the triple clause would use, so the I/O performance should be near enough identical. I believe there are some minor differences to how the intermediate result gets handled which might generate some kind of impact for extremely high cardinalities, but I suspect it will be imperceptible in the general case. Happy to help dig into that if it looks like an issue in practice for you though, of course
Adding an annotation so that get-attr
has an explicit mapping in your ORM makes sense
Thanks for the detailed explanation.
-BS
Jeremy Taylor via Discuss XTDB <notifications@xtdb.discoursemail.com>于2024年7月10日 周三21:14写道:
1 Like