Could Someone Give Me Advice on Optimization and Indexing in XTDB for Large-Scale Data?

Hello there,

I have been using XTDB for a while now and have been impressed with its flexibility and capabilities, especially in handling complex queries and rich data models. However, as my dataset grows, I have noticed some performance issues, particularly with query execution times.

I am dealing with a large volume of data, and some queries that used to run quickly now take considerably longer.

I understand that XTDB’s indexing system plays a crucial role in query performance. However, I am not entirely sure how best to optimize my queries and indexes to handle the increasing data load.

What are some best practices for indexing in XTDB? Are there specific types of indexes or configurations that work better for large datasets, particularly when dealing with frequent writes and complex queries?

How can I optimize my queries to be more efficient? Are there common pitfalls or techniques in query construction that I should be aware of?

What tools or methods do you recommend for monitoring and analyzing the performance of XTDB? How can I identify bottlenecks in my current setup?

Also, I have gone through this post, which definitely helped me a lot: https://discuss.xtdb.com/t/scaling-out-and-going-to-production-with-xtdb-golang/

How does resource allocation affect XTDB performance, and what are the best practices for configuring these resources?

Thank you in advance for your help. :innocent:

Hey @DomSalv, thanks for your feedback and questions! In the general case, you may want to consider reshaping your data and moving any aggregation logic to run more asynchronously, but for some kinds of queries and low-latency requirements even that strategy may not be sufficient to meet your business goals.
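To make the "aggregate asynchronously" idea concrete, here is a minimal, language-agnostic sketch in Python. It is not XTDB code: the `OrderTotals` class and its method names are hypothetical. Instead of scanning every order document at query time to compute a per-customer total, writes are only queued, and a separate consumer step folds them into a precomputed aggregate, so a read becomes a single lookup.

```python
from collections import defaultdict, deque


class OrderTotals:
    """Hypothetical materialized view of per-customer order totals.

    Writes are only queued; a separate consumer step folds them into the
    aggregate later, so reads never scan the raw order documents.
    """

    def __init__(self):
        self._pending = deque()            # writes not yet folded in
        self._totals = defaultdict(float)  # customer id -> running total

    def record_order(self, customer_id, amount):
        # Cheap on the write path: just enqueue the event.
        self._pending.append((customer_id, amount))

    def process_pending(self):
        # In a real system this would run in a background worker that
        # consumes the transaction log; here it is called explicitly.
        processed = 0
        while self._pending:
            customer_id, amount = self._pending.popleft()
            self._totals[customer_id] += amount
            processed += 1
        return processed

    def total_for(self, customer_id):
        # O(1) read; may lag behind writes that are still pending.
        return self._totals.get(customer_id, 0.0)


view = OrderTotals()
view.record_order("alice", 30.0)
view.record_order("alice", 12.5)
view.record_order("bob", 7.0)
print(view.total_for("alice"))  # 0.0, nothing processed yet
view.process_pending()
print(view.total_for("alice"))  # 42.5
```

The trade-off is staleness: a read issued before the consumer catches up sees the previous total, which is acceptable for many dashboards and reports but not for every business rule.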

For context, XTDB 1.x’s built-in indexes are mostly intended to support “graph” queries (i.e. large-scale pattern matching) and simple range queries (e.g. retrieving the largest known value for some attribute). If you need something more exotic, you would need to look at creating a custom secondary index, as we have done with the Lucene module. @jacobobryant very helpfully documented his process for doing something like this in a recent blog post: “Indexes pre-release”.
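As a conceptual illustration of what a custom secondary index does, here is a small inverted index for full-text lookups, the kind of query the Lucene module exists for. This is a sketch in Python under stated assumptions, not how XTDB or its Lucene module is implemented, and not XTDB’s extension API: `TextIndex`, `index_document`, and `search` are hypothetical names. The key idea is that the index is updated on every write (including replacing a document’s previous version), so that reads avoid scanning all documents.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase word tokens; a real index would add stemming, stop words, etc.
    return set(re.findall(r"\w+", text.lower()))


class TextIndex:
    """Hypothetical secondary index: token -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._doc_tokens = {}  # doc id -> tokens currently indexed for it

    def index_document(self, doc_id, text):
        # Called for every write so the index stays in step with the data.
        # Drop postings from any previous version of this document first.
        for token in self._doc_tokens.pop(doc_id, set()):
            self._postings[token].discard(doc_id)
        tokens = tokenize(text)
        for token in tokens:
            self._postings[token].add(doc_id)
        self._doc_tokens[doc_id] = tokens

    def search(self, query):
        # Return ids of documents containing every query token.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = None
        for token in tokens:
            ids = self._postings.get(token, set())
            result = set(ids) if result is None else result & ids
        return result


idx = TextIndex()
idx.index_document("doc-1", "XTDB supports graph queries")
idx.index_document("doc-2", "Lucene provides full-text search")
idx.index_document("doc-1", "Bitemporal graph database")  # replaces doc-1
print(sorted(idx.search("graph")))             # ['doc-1']
print(sorted(idx.search("xtdb")))              # []
print(sorted(idx.search("full-text search")))  # ['doc-2']
```

A production index also has to handle durability, concurrent writers, and (in XTDB’s case) time-travel queries, which is where most of the real engineering effort goes.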

What tools or methods do you recommend for monitoring and analyzing the performance of XTDB? How can I identify bottlenecks in my current setup?

Typically, when working with 1.x we would recommend getting comfortable with a JVM profiler (e.g. YourKit, or clj-async-profiler, the embedded high-precision Clojure profiler from clojure-goes-fast on GitHub). A profiler can quickly reveal where time is spent without requiring expertise in JVM internals or performance engineering.
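Before profiling, it can also help to pin down which queries actually regressed by timing them repeatedly as the data grows. The following is a generic sketch in Python, not an XTDB tool: `measure_latency` is a hypothetical helper, and `run_query` stands in for whatever zero-argument function issues one of your queries from your client. The timings tell you *which* query is slow; the profiler then tells you *why*.

```python
import statistics
import time


def measure_latency(run_query, repetitions=20, warmup=3):
    """Return (median, max) wall-clock seconds for a zero-argument query function."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    for _ in range(warmup):
        # Discarded runs: caches (and, on the JVM, the JIT) settle here.
        run_query()
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), max(samples)


# Stand-in workload; replace with a call that runs your real query.
median_s, worst_s = measure_latency(lambda: sum(range(100_000)))
print(f"median {median_s * 1000:.2f} ms, worst {worst_s * 1000:.2f} ms")
```

Reporting the median alongside the worst case helps separate a consistently slow query from one that is only occasionally slow (for example, due to garbage-collection pauses or cold caches).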

Please feel free to share the specific queries you would like to make faster, along with some example data, and we will do our best to help.

I would also be happy to look at your specific queries and options on a call next week if you can spare the time. Feel free to send me an email if that’s of interest: jdt@juxt.pro
