Indexing and database internals

(I wrote this comment in response to a post that turned out to be spam, so I'm re-posting it here in case anyone's interested :sweat_smile:)

The pre-GA (General Availability) v2 primary index is still evolving, as is the metadata that allows for fast columnar-style pruning of scanned data. We’re currently working with a number of Design Partners to ensure that it can handle their workloads and scale confidently. For instance, we’re about to release some changes that will make processing very large volumes of time series data (i.e. events, rather than long-lived states) much more efficient.

You can read a summary of the design space and current objectives of the primary index here: xtdb/dev/doc/evaluating-index-strategies.adoc at main · xtdb/xtdb · GitHub

Over the longer term we will definitely explore more indexing strategies, potentially introducing secondary indexes (which are desirable for low-latency lookups and faster transactions wherever constraints are needed) as well as other columnar techniques (e.g. zone maps). Right now, though, our focus is on stability and getting v2 fully launched (by making sure our Design Partners are happy :slightly_smiling_face:).
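For anyone unfamiliar with zone maps, here is a minimal Python sketch of the general idea: keep a min/max summary per block of values, and skip any block whose range cannot overlap the query. This is only an illustration of the technique, not how XTDB implements its metadata or indexes; the names `Block`, `build_blocks` and `scan_range` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of column values plus its zone map (min/max summary)."""
    rows: list
    lo: int
    hi: int


def build_blocks(values, block_size):
    """Split values into fixed-size blocks and record each block's min and max."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(rows=chunk, lo=min(chunk), hi=max(chunk)))
    return blocks


def scan_range(blocks, lo, hi):
    """Return values in [lo, hi] and the number of blocks actually scanned."""
    matches = []
    scanned = 0
    for block in blocks:
        # Prune: the block's [lo, hi] range does not overlap the query range.
        if block.hi < lo or block.lo > hi:
            continue
        scanned += 1
        matches.extend(v for v in block.rows if lo <= v <= hi)
    return matches, scanned


# Hypothetical data: 100 event timestamps, stored in blocks of 10.
timestamps = list(range(100))
blocks = build_blocks(timestamps, 10)
matches, scanned = scan_range(blocks, 42, 47)
```

Pruning like this works best when values are roughly sorted or clustered within blocks (as event timestamps often are); with randomly scattered values most block ranges overlap the query and little gets skipped.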

Databases generally only evolve in response to well-defined workloads, and to that end it’s important for us to have a clear picture of two things: what kinds of data volumes are you working with in your domain currently, and what kinds of queries do you need to run?

Depending on the amount of resources you are willing to expend on making temporal (analytical) queries ‘real-time’, you may want to consider bringing in a separate dataflow query engine to augment XTDB’s interactive query paradigm (Materialize, Epsio, RisingWave etc.).
