Indexing and database internals

refset · 10 February 2025 09:56

(I wrote this comment in response to a post that turned out to be spam, so I am re-posting it here in case anyone’s interested.)

The pre-GA (General Availability) v2 primary index is still evolving, as is the metadata that allows for fast columnar-style pruning of scanned data. We’re currently working with a number of Design Partners to ensure that it can handle their workloads and scale confidently. For instance, we’re about to release some changes that will make processing very large volumes of time series data (i.e. events, rather than long-lived states) much more efficient.

You can read a summary of the design space and current objectives of the primary index here: xtdb/dev/doc/evaluating-index-strategies.adoc at main · xtdb/xtdb · GitHub

Over the longer term we will definitely explore more indexing strategies, potentially introducing secondary indexes (which are desirable for low-latency lookups and faster transactions wherever constraints are needed) as well as other columnar techniques (e.g. zone maps). Right now, though, our focus is on stability and getting v2 fully launched (by making sure our Design Partners are happy).
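For anyone unfamiliar with zone maps, here is a minimal Python sketch of the general idea: keep a min/max summary per block of a column and skip any block whose range cannot overlap the query. This is a generic illustration only, not how XTDB implements or plans to implement anything; the names `Block`, `build_blocks` and `range_scan` are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of column values plus its zone-map entry (min and max)."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_scan(blocks: List[Block], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return values in [lo, hi] and the number of blocks actually read."""
    hits: List[int] = []
    scanned = 0
    for block in blocks:
        # Prune: the block's [min, max] range does not overlap [lo, hi].
        if block.max_value < lo or block.min_value > hi:
            continue
        scanned += 1
        hits.extend(v for v in block.values if lo <= v <= hi)
    return hits, scanned


# Hypothetical data: 1000 monotonically increasing event timestamps.
timestamps = list(range(1000))
blocks = build_blocks(timestamps, block_size=100)
hits, scanned = range_scan(blocks, 250, 320)
```

Pruning like this works best when values are clustered within blocks, as tends to be the case for append-ordered event timestamps; with randomly distributed values most block ranges overlap the query and little is skipped.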

Databases generally only evolve in response to well-defined workloads, and to that end it’s important for us to have a clear picture of two things: what kinds of data volumes are you working with in your domain currently, and what kinds of queries do you need to run?

Depending on the amount of resources you are willing to expend on making temporal (analytical) queries ‘real-time’, you may want to consider bringing in a separate dataflow query engine to augment XTDB’s interactive query paradigm (Materialize, Epsio, RisingWave etc.).
