Time Series vs. Bitemporal Explainer/Example

refset · 8 December 2024 18:34

I’m working on a new demo to explain the conceptual differences between time-series databases and XTDB (not discussing internals/performance here at all). Thought I’d share my notes…

Let’s initially imagine there are no built-in time features, and pretend that XTDB is just another update-in-place SQL database - this is what a typical time-series schema might look like:
INSERT INTO instruments (_id, symbol, ts, val) VALUES (1, ‘USD’, NOW, 100); - note that by default we see all the versions, so let’s limit to seeing the latest value
Now let’s use dates for the examples to make reading the ts column values a little easier, and also convey the idea of importing externally produced timestamps INSERT INTO instruments (_id, symbol, ts, val) VALUES (1, ‘USD’, DATE ‘2024-01-01’, 101);
For the rest of the demo, we’ll query for all versions in the time series at once
Add a reported_at column to convey the idea that the results may change each time we query for the time series, and we may need to track the ‘version’ of the results produced by the database
Instead of INSERTing the 3rd value at the end, what if it is actually meant to sit between the first two entries based on an out-of-order ts
What if we then discover some sort of mistake or error in that retroactively INSERTed entry, e.g. the value was wrong - we have to delete the old entry and insert a new one (with a new unique ID)
At this point, a typical SQL / time series system would be losing any sense of auditability with such changes going on (retroactivity + corrections), and every time you run such a query the results may be wildly different - luckily XTDB retains old versions, so even with this userspace time series model we can always see old versions and access a full audit history
But now let’s see what the best way to model this data in XTDB actually is, given time is first-class. Instead of using separate rows and IDs for the same time series, we can use the built-in versioning against the _id column, along with the ability to control _valid_from and _valid_to
Adding FOR ALL SYSTEM_TIME shows us the number of versions required to correctly decompose the full history of the data
AS OF syntax allows us to query the relevant versions - this can similarly be applied across many tables or even the entire query
We can use SYSTEM_TIME AS OF to avoid the need for worrying about versioning the results (e.g. via reported_at) and the repeatability/consistency across queries more generally

As a concluding remark, it’s worth appreciating that time series databases have evolved out from a mindset of ingestion performance, whereas the bitemporal model is all about correctness. To achieve the same level of correctness, consistency and overall flexibility in a time series database would likely require it to run a lot more slowly.

Topic		Replies	Views
Time Series Data Users	1	548	12 January 2022
New blog: "But bitemporality always introduces complexity!" Users	0	307	8 August 2023
Accidental bi-temporal correction Users	2	532	1 July 2022
Opinion: Temporal Queries? Dev	16	992	12 March 2023
Is Xtdb similar to Dolt database? Users	6	439	24 February 2024

Time Series vs. Bitemporal Explainer/Example

Related topics