How to handle frequently updated "aggregate" tables?

seancorfield · 1 January 2024 22:39

Since XTDB is document-based and the recommendation I’ve seen a few times is to group together data that changes at the same rate, I’m wondering how to handle aggregated data?

Example from work (where we use MySQL today):

We have a few xxx_by_day tables in our database that we update with live aggregated data, i.e., when some event “A” happens, we increment xxx_by_day.a_event for today’s date: UPDATE xxx_by_day SET a_event = a_event + 1 WHERE aggregate_date = DATE(NOW())

This is an optimization so that we can get accurate “how many A’s per day” information very quickly, but it means these tables may get many thousands of updates per day even though each conceptual document only needs to have one entry per day.

All the raw data exists separately: every event that we aggregate also has a non-aggregated version, in tables with (in general) one row per event. There are a large number of events per day, though, so a live aggregate query across those tables is pretty slow.
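To make the current setup concrete, here is a minimal sketch in Python (not our actual stack, just a neutral illustration) of the two stores described above. The names `raw_events`, `a_event_by_day`, `record_a_event` and `slow_count` are made up for this example, and in-memory structures stand in for the real tables.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical stand-ins for the two kinds of table described above.
raw_events = []              # one entry per event (the non-aggregated tables)
a_event_by_day = Counter()   # one counter per day (the xxx_by_day table)


def record_a_event(occurred_at: datetime) -> None:
    """Store the raw event and bump that day's live counter."""
    raw_events.append(("A", occurred_at))
    a_event_by_day[occurred_at.date()] += 1


def slow_count(day: date) -> int:
    """The 'live aggregate query': scans every raw event."""
    return sum(1 for kind, ts in raw_events if kind == "A" and ts.date() == day)


record_a_event(datetime(2024, 1, 1, 9, 0))
record_a_event(datetime(2024, 1, 1, 17, 30))
record_a_event(datetime(2024, 1, 2, 8, 15))

# The counter answers in one lookup what the scan answers by reading everything.
print(a_event_by_day[date(2024, 1, 1)], slow_count(date(2024, 1, 1)))
```

The point is that every event causes a second write to the per-day record, which is exactly the "many updates to one document" pattern I'm asking about.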

What would be the recommended “XTDB way” to deal with this sort of thing?

We could probably switch to a daily cron job that aggregates the previous day’s data, plus a live aggregate query for any event data not yet aggregated, if that’s the “best” way with immutable documents.
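Roughly what I have in mind, again as a Python sketch under assumptions: `roll_up` plays the part of the cron job, `count_for_day` is the query path, and both names (and the list of timestamps standing in for the event table) are hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def roll_up(events, up_to: date) -> Counter:
    """Cron-style job: count events per day for every day strictly before up_to."""
    return Counter(ts.date() for ts in events if ts.date() < up_to)


def count_for_day(day: date, rollup: Counter, rolled_up_to: date, events) -> int:
    """Use the stored aggregate for closed days, a live scan for the rest."""
    if day < rolled_up_to:
        return rollup[day]
    return sum(1 for ts in events if ts.date() == day)


events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 17, 30),
    datetime(2024, 1, 2, 8, 15),
]

# Pretend the job ran just after midnight on 2 January.
rollup = roll_up(events, date(2024, 1, 2))

print(count_for_day(date(2024, 1, 1), rollup, date(2024, 1, 2), events))  # from the rollup
print(count_for_day(date(2024, 1, 2), rollup, date(2024, 1, 2), events))  # live scan
```

With this shape each daily aggregate is written once and never updated; the cost moves to the live scan over the current day's events.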

And I guess I should also ask: is the answer different for XTDB v1 and v2?

jarohen · 4 January 2024 08:32

Yep, I’d say these are all reasonable approaches - my personal choice would depend on how many events people have, how often they make the aggregate queries, and their performance/consistency expectations of those queries.

I might also be tempted to use a timestamp-based UUID for the xt/id in this case (or a squuid from Datomic) so that the related documents are co-located in the primary index.
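For anyone unfamiliar with the idea, here is a minimal Python sketch of a squuid-style id. It assumes the usual construction, as I understand it, where the most significant 32 bits of a random UUID are replaced with seconds since the epoch; the `squuid` function below is written for illustration and is not an XTDB or Datomic API.

```python
import time
import uuid


def squuid(now_seconds=None) -> uuid.UUID:
    """Random UUID whose top 32 bits are the current time in whole seconds."""
    seconds = int(time.time()) if now_seconds is None else int(now_seconds)
    # Keep the low 96 bits of a v4 UUID (this includes its version and variant bits).
    low_96 = uuid.uuid4().int & ((1 << 96) - 1)
    return uuid.UUID(int=((seconds & 0xFFFFFFFF) << 96) | low_96)


earlier = squuid(1_704_000_000)
later = squuid(1_704_000_060)

# Ids created in later seconds sort after ids created in earlier seconds.
print(earlier < later)
```

Ids generated within the same second are still randomly ordered relative to each other; the prefix only groups them by time.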

Eventually I’d like XT to have better support for dealing with this kind of derived data so that it can be queried more efficiently, but I suspect that’s a fair way off yet.

“And I guess I should also ask: is the answer different for XTDB v1 and v2?”

Probably not, in this case!

Cheers,

James

seancorfield · 5 January 2024 06:18

That’s a very useful insight, thank you!
