Opinion: Temporal Queries?

We have done a lot of meta-research on possible alternatives to our current z-curve temporal index. The research in the field is thorough (and long-standing), making the problem one of “knowing when to stop” more than deciding on the implementation itself.

In your own apps and services, how fancy do you anticipate your temporal queries becoming? For many XT users the answer is “not at all” – their queries are always performed with a default valid-time (application time) of now. In general, those users are happy just to have immutable records. However, many folks are look to XT precisely to solve their temporal query problems:

  • Will you need to join over time intervals to answer questions like “which insurance policies were within one month of expiration at some point during Q4 2021?”

  • Will you need to answer specifically bitemporal as-of questions like “how would we have executed this trade on 2021-08-23 given what I knew about the market on 2021-07-15?”

  • Will you need to perform a temporal join with an interval result, like “which known IP addresses were potentially exposed to corrupt data from machines affected by the log4j vulnerability (CVE-2021-44228) between 2021-10-01 and 2021-12-14, and for how long?”

  • What other funky questions about time can you come up with that your business might actually ask itself?

3 Likes

We definitely have some situations at work where we essentially have some self-healing data streams and, currently, we have no way to determine “what does X value look like across Y items on date Z (valid) but given data that we had available at date Z+N (transaction)?” – so that would be a useful query for us (and could encourage us to actually use XTDB for that data at least).

Most of our queries are point-in-time or across a range (pair) of date/time values, which we can mostly already do – but would likely be easier with XTDB.

2 Likes

To enable pushing updates one question we need to answer is “given a previous db and the current db what is the diff between the results of applying a query to each”. We’ve implemented this somewhat clunkily for one specific query, so it would be nice to have a good generic API for that.

4 Likes

+1 to a good generic api for that!

1 Like

“given a previous db and the current db what is the diff between the results of applying a query to each”

@ema @xlfe Out of curiosity, do either of you have an example from the business domain you’re working in? It’s easy to visualize the generic case but it’s still illustrative to have an example or two to talk about. :slight_smile:

In our DB modelling we have users which belong to organizations which have contracts (for a quantity of a product) which have orders (for a part of that quantity). In our app we have an overview page showing how much of each contract was already ordered. Generating the initial content of that page is a relatively simple query but to enable live updates we need to basically duplicate the logic of that query to see if a new DB document or new version of a DB document changes the result of that query.

2 Likes

one example that comes to mind is comments or notifications and being able to stream them to a client. when a client reconnects after a few days, being able to answer the question, what new notifications do we have to show them since a particular (tx-id) from a few days ago.

The un-optimised logic would be run the query at tx-id-old and xt-id-now, diff the two sets of xt/ids that represent the notification/comments,etc and pull-many on the difference.

Although, if the result of the query is large (which over time it would be with lots of previous notifications) then that is an expensive diff.

So perhaps a more efficient method would be to use the tx-log from tx-id-old and look for documents that are relevant (eg have :type :comment, etc)

4 Likes

I actually found xtdb because I went looking for a graph-database with time-queries for a specific usecase we have:

We want to model construction yards and (some of) their electric equipment. Think power-supplyes, wirering, excavators, etc. But everything is temporary: power-supplyes gets moved between construction yards. Machines enter, machines leave. The wirering between machines change. We want to model what equipment is at what yard when, and how it is connected. The latter requires graph queries of arbitrary depth, since machines are connected in a graph with many “hops”. So the kind of queries we want to perform are:

  • What machines got power from power-supply A at time T (so a fixed-time graph query).
  • What machines got power from power-supply A in January 2021 (so a timespan graph query, maybe it should return a list of timespan → result mappings?)
  • (slightly less important) How long has machine A been connected to power-supply B ?
3 Likes

This is very cool, @epa095 ! It’s not only interesting to hear what kinds of temporal domains folks are working in, it really helps inform our planning and prioritization to know what real-world uses cases are out there. We can dream up all sorts of scenarios, but knowing someone’s real business problem involves (say) a timespan graph query is very helpful. Thanks!

1 Like

Hi all -

Our use case will also involve some pretty complex time based queries. We track timelines of unit deployments over time, and also need to know what our understanding was at certain points in time.

“when in our understanding did this date fall within a deployment range”

“using only information added within this range, what did the world look like?”

Thanks for chiming in @nikonikoniko - you may be interested to hear that I have recently these been demonstrating solving these kinds of questions with our experimental Core2 engine and the SQL:2011 temporal query features:

1 Like

Hi Jeremy,
Thanks for this. I’ve looked a little into the core2 things and it seems
like it is sql syntax on top of the xtdb engine? I quite like the
datalog edn syntax.

I will definitely look closer, because form the example above the
“overlaps” clause looks fascinating. Also it looks like there is a (>
inst inst) comparison there which I could not figure out how to do in
datalog.

By the way - It seems like I will be diving into using xtdb extensively
at the university nonprofit I am working for. It seems like a very good
use for our specific problem - tracking claims military units locations
across time.

Is that forum the correct place to post questions I couldn’t find
answers to in the docs?

After looking more deeply into this I understand the ideas behind core2
a lot better now, and, that would be exactly what we would need. It
seems it is possible to replicate overlaps type queries in xtdb with a
(< unix-timestamp unix-timestamp) kind of approach, but reading your
queries from the playground shows how much easier it could be. We also
love the schemaless continuation in sql.

So it seems like sql:2011 was never truly implemented in any of those
databases? I looked in to the postgres plugins for it and they seemed
to not implement most of the ease of those queries.

Thanks again for your responses, and keep up the amazing work!

Cheers
– Niko

1 Like

So it seems like sql:2011 was never truly implemented in any of those
databases?

Indeed, we probably would never have embarked on building XTDB (or ‘Crux’ as it was named originally) if an existing OSS database did bitemporality well :slight_smile:

1 Like

What Cohesic does to currently work around our history api. GitHub - cohesic/cohesic-xtdb: Helper functions for the XTDB bitemporal database.

1 Like

Thanks for linking our lib. Our domain/functionality requires basically to UNDO changes. What the above library does is to get the entity state at a specific time and add it via new transaction (either put or delete) to XTDB’s current state.

1 Like

Sneak peak of the in-flight Datalog syntax: translate snodgrass-99 tests to Datalog by refset · Pull Request #649 · xtdb/core2 · GitHub

1 Like