Opinion: Temporal Queries?

deobald · 14 December 2021 15:55

We have done a lot of meta-research on possible alternatives to our current z-curve temporal index. The research in the field is thorough (and long-standing), making the problem one of “knowing when to stop” more than deciding on the implementation itself.

In your own apps and services, how fancy do you anticipate your temporal queries becoming? For many XT users the answer is “not at all” – their queries are always performed with a default valid-time (application time) of now. In general, those users are happy just to have immutable records. However, many folks are look to XT precisely to solve their temporal query problems:

Will you need to join over time intervals to answer questions like “which insurance policies were within one month of expiration at some point during Q4 2021?”
Will you need to answer specifically bitemporal as-of questions like “how would we have executed this trade on 2021-08-23 given what I knew about the market on 2021-07-15?”
Will you need to perform a temporal join with an interval result, like “which known IP addresses were potentially exposed to corrupt data from machines affected by the log4j vulnerability (CVE-2021-44228) between 2021-10-01 and 2021-12-14, and for how long?”
What other funky questions about time can you come up with that your business might actually ask itself?

seancorfield · 14 December 2021 21:06

We definitely have some situations at work where we essentially have some self-healing data streams and, currently, we have no way to determine “what does X value look like across Y items on date Z (valid) but given data that we had available at date Z+N (transaction)?” – so that would be a useful query for us (and could encourage us to actually use XTDB for that data at least).

Most of our queries are point-in-time or across a range (pair) of date/time values, which we can mostly already do – but would likely be easier with XTDB.

ema · 30 December 2021 10:43

To enable pushing updates one question we need to answer is “given a previous db and the current db what is the diff between the results of applying a query to each”. We’ve implemented this somewhat clunkily for one specific query, so it would be nice to have a good generic API for that.

xlfe · 3 January 2022 00:54

+1 to a good generic api for that!

deobald · 3 January 2022 18:01

“given a previous db and the current db what is the diff between the results of applying a query to each”

@ema @xlfe Out of curiosity, do either of you have an example from the business domain you’re working in? It’s easy to visualize the generic case but it’s still illustrative to have an example or two to talk about.

ema · 4 January 2022 10:40

In our DB modelling we have users which belong to organizations which have contracts (for a quantity of a product) which have orders (for a part of that quantity). In our app we have an overview page showing how much of each contract was already ordered. Generating the initial content of that page is a relatively simple query but to enable live updates we need to basically duplicate the logic of that query to see if a new DB document or new version of a DB document changes the result of that query.

xlfe · 5 January 2022 05:15

one example that comes to mind is comments or notifications and being able to stream them to a client. when a client reconnects after a few days, being able to answer the question, what new notifications do we have to show them since a particular (tx-id) from a few days ago.

The un-optimised logic would be run the query at tx-id-old and xt-id-now, diff the two sets of xt/ids that represent the notification/comments,etc and pull-many on the difference.

Although, if the result of the query is large (which over time it would be with lots of previous notifications) then that is an expensive diff.

So perhaps a more efficient method would be to use the tx-log from tx-id-old and look for documents that are relevant (eg have :type :comment, etc)

epa095 · 14 June 2022 11:03

I actually found xtdb because I went looking for a graph-database with time-queries for a specific usecase we have:

We want to model construction yards and (some of) their electric equipment. Think power-supplyes, wirering, excavators, etc. But everything is temporary: power-supplyes gets moved between construction yards. Machines enter, machines leave. The wirering between machines change. We want to model what equipment is at what yard when, and how it is connected. The latter requires graph queries of arbitrary depth, since machines are connected in a graph with many “hops”. So the kind of queries we want to perform are:

What machines got power from power-supply A at time T (so a fixed-time graph query).
What machines got power from power-supply A in January 2021 (so a timespan graph query, maybe it should return a list of timespan → result mappings?)
(slightly less important) How long has machine A been connected to power-supply B ?

deobald · 17 June 2022 00:32

This is very cool, @epa095 ! It’s not only interesting to hear what kinds of temporal domains folks are working in, it really helps inform our planning and prioritization to know what real-world uses cases are out there. We can dream up all sorts of scenarios, but knowing someone’s real business problem involves (say) a timespan graph query is very helpful. Thanks!

nikonikoniko · 25 October 2022 01:14

Hi all -

Our use case will also involve some pretty complex time based queries. We track timelines of unit deployments over time, and also need to know what our understanding was at certain points in time.

“when in our understanding did this date fall within a deployment range”

“using only information added within this range, what did the world look like?”

refset · 26 October 2022 10:26

Thanks for chiming in @nikonikoniko - you may be interested to hear that I have recently these been demonstrating solving these kinds of questions with our experimental Core2 engine and the SQL:2011 temporal query features:

github.com

xtdb/core2-playground/blob/cb8568cc671a9b583ecb051cc2a96a25dca2b5e2/bitemporal/snodgrass-99.sql#L631-L645


      
          -- Case 8: Application-time sequenced and system-time nonsequenced
          -- When was it recorded that a property is or was owned by the customer who
          -- owned at the same time property 7797?
          
          
SELECT P2.property_number,
                 GREATEST(P1.application_time_start, P2.application_time_start) AS VT_Begin,
                 LEAST(P1.application_time_end, P2.application_time_end) AS VT_End,
                 P2.system_time_start AS Recorded_Start
          FROM Prop_Owner FOR ALL SYSTEM_TIME FOR ALL APPLICATION_TIME AS P1,
               Prop_Owner FOR ALL SYSTEM_TIME FOR ALL APPLICATION_TIME AS P2
          WHERE P1.property_number = 7797
            AND P2.property_number <> P1.property_number
            AND P1.customer_number = P2.customer_number
            AND P1.APPLICATION_TIME OVERLAPS P2.APPLICATION_TIME
            AND P1.SYSTEM_TIME CONTAINS PERIOD(P2.system_time_start, P2.system_time_start);

nikonikoniko · 26 October 2022 12:23

Hi Jeremy,
Thanks for this. I’ve looked a little into the core2 things and it seems
like it is sql syntax on top of the xtdb engine? I quite like the
datalog edn syntax.

I will definitely look closer, because form the example above the
“overlaps” clause looks fascinating. Also it looks like there is a (>
inst inst) comparison there which I could not figure out how to do in
datalog.

By the way - It seems like I will be diving into using xtdb extensively
at the university nonprofit I am working for. It seems like a very good
use for our specific problem - tracking claims military units locations
across time.

Is that forum the correct place to post questions I couldn’t find
answers to in the docs?

nikonikoniko · 26 October 2022 14:38

After looking more deeply into this I understand the ideas behind core2
a lot better now, and, that would be exactly what we would need. It
seems it is possible to replicate overlaps type queries in xtdb with a
(< unix-timestamp unix-timestamp) kind of approach, but reading your
queries from the playground shows how much easier it could be. We also
love the schemaless continuation in sql.

So it seems like sql:2011 was never truly implemented in any of those
databases? I looked in to the postgres plugins for it and they seemed
to not implement most of the ease of those queries.

Thanks again for your responses, and keep up the amazing work!

Cheers
– Niko

refset · 26 October 2022 16:15

So it seems like sql:2011 was never truly implemented in any of those
databases?

Indeed, we probably would never have embarked on building XTDB (or ‘Crux’ as it was named originally) if an existing OSS database did bitemporality well

FiV0 · 14 February 2023 17:00

What Cohesic does to currently work around our history api. GitHub - cohesic/cohesic-xtdb: Helper functions for the XTDB bitemporal database.

arichiardi · 14 February 2023 18:24

Thanks for linking our lib. Our domain/functionality requires basically to UNDO changes. What the above library does is to get the entity state at a specific time and add it via new transaction (either put or delete) to XTDB’s current state.

refset · 12 March 2023 23:03

Sneak peak of the in-flight Datalog syntax: translate snodgrass-99 tests to Datalog by refset · Pull Request #649 · xtdb/core2 · GitHub

Topic		Replies	Views
"full bitemporality" Users	1	213	2 April 2024
[ANN] 1.22.1 is out now! Users release	0	331	27 October 2022
Time Series vs. Bitemporal Explainer/Example Users	0	123	8 December 2024
Is Xtdb similar to Dolt database? Users	6	428	24 February 2024
Creation and modification time Users	3	392	4 May 2023

Opinion: Temporal Queries?

Related topics