Opinion: Temporal Queries?

We have done a lot of meta-research on possible alternatives to our current z-curve temporal index. The research in the field is thorough (and long-standing), making the problem one of “knowing when to stop” more than deciding on the implementation itself.

In your own apps and services, how fancy do you anticipate your temporal queries becoming? For many XT users the answer is “not at all” – their queries are always performed with a default valid-time (application time) of now. In general, those users are happy just to have immutable records. However, many folks are look to XT precisely to solve their temporal query problems:

  • Will you need to join over time intervals to answer questions like “which insurance policies were within one month of expiration at some point during Q4 2021?”

  • Will you need to answer specifically bitemporal as-of questions like “how would we have executed this trade on 2021-08-23 given what I knew about the market on 2021-07-15?”

  • Will you need to perform a temporal join with an interval result, like “which known IP addresses were potentially exposed to corrupt data from machines affected by the log4j vulnerability (CVE-2021-44228) between 2021-10-01 and 2021-12-14, and for how long?”

  • What other funky questions about time can you come up with that your business might actually ask itself?

1 Like

We definitely have some situations at work where we essentially have some self-healing data streams and, currently, we have no way to determine “what does X value look like across Y items on date Z (valid) but given data that we had available at date Z+N (transaction)?” – so that would be a useful query for us (and could encourage us to actually use XTDB for that data at least).

Most of our queries are point-in-time or across a range (pair) of date/time values, which we can mostly already do – but would likely be easier with XTDB.

2 Likes

To enable pushing updates one question we need to answer is “given a previous db and the current db what is the diff between the results of applying a query to each”. We’ve implemented this somewhat clunkily for one specific query, so it would be nice to have a good generic API for that.

4 Likes

+1 to a good generic api for that!

1 Like

“given a previous db and the current db what is the diff between the results of applying a query to each”

@ema @xlfe Out of curiosity, do either of you have an example from the business domain you’re working in? It’s easy to visualize the generic case but it’s still illustrative to have an example or two to talk about. :slight_smile:

In our DB modelling we have users which belong to organizations which have contracts (for a quantity of a product) which have orders (for a part of that quantity). In our app we have an overview page showing how much of each contract was already ordered. Generating the initial content of that page is a relatively simple query but to enable live updates we need to basically duplicate the logic of that query to see if a new DB document or new version of a DB document changes the result of that query.

2 Likes

one example that comes to mind is comments or notifications and being able to stream them to a client. when a client reconnects after a few days, being able to answer the question, what new notifications do we have to show them since a particular (tx-id) from a few days ago.

The un-optimised logic would be run the query at tx-id-old and xt-id-now, diff the two sets of xt/ids that represent the notification/comments,etc and pull-many on the difference.

Although, if the result of the query is large (which over time it would be with lots of previous notifications) then that is an expensive diff.

So perhaps a more efficient method would be to use the tx-log from tx-id-old and look for documents that are relevant (eg have :type :comment, etc)

4 Likes

I actually found xtdb because I went looking for a graph-database with time-queries for a specific usecase we have:

We want to model construction yards and (some of) their electric equipment. Think power-supplyes, wirering, excavators, etc. But everything is temporary: power-supplyes gets moved between construction yards. Machines enter, machines leave. The wirering between machines change. We want to model what equipment is at what yard when, and how it is connected. The latter requires graph queries of arbitrary depth, since machines are connected in a graph with many “hops”. So the kind of queries we want to perform are:

  • What machines got power from power-supply A at time T (so a fixed-time graph query).
  • What machines got power from power-supply A in January 2021 (so a timespan graph query, maybe it should return a list of timespan → result mappings?)
  • (slightly less important) How long has machine A been connected to power-supply B ?
3 Likes

This is very cool, @epa095 ! It’s not only interesting to hear what kinds of temporal domains folks are working in, it really helps inform our planning and prioritization to know what real-world uses cases are out there. We can dream up all sorts of scenarios, but knowing someone’s real business problem involves (say) a timespan graph query is very helpful. Thanks!

1 Like