How to handle frequently updated "aggregate" tables?

Since XTDB is document-based and the recommendation I’ve seen a few times is to group together data that changes at the same rate, I’m wondering how to handle aggregated data?

Example from work (where we use MySQL today):

We have a few `xxx_by_day` tables in our database that we update with live aggregated data, i.e., when some event “A” happens, we increment `xxx_by_day.a_event` for today’s date: `UPDATE xxx_by_day SET a_event = a_event + 1 WHERE aggregate_date = DATE(NOW())`

This is an optimization so that we can get accurate “how many A’s per day” information very quickly, but it means these tables may get many thousands of updates per day even though each conceptual document only needs one entry per day.

All the raw data exists separately: every event that we aggregate also has a non-aggregated version, stored in tables with (in general) one row per event. But there are a large number of events per day, so a live aggregate query across those tables is pretty slow.

What would be the recommended “XTDB way” to deal with this sort of thing?

We could probably switch to a daily cron job that aggregates the previous day’s data into a new document, plus a live aggregate query over the event data not yet aggregated, if that’s the “best” way to do this with immutable documents.
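To make the split concrete, here’s a minimal sketch of that hybrid scheme in plain Python. Everything here is hypothetical: the in-memory event list, the `rollup`/`live_count` names, and the document shape (including the `xt/id` naming) are made up for illustration, and the actual XTDB put/query calls are omitted.

```python
from collections import Counter
from datetime import date

# Hypothetical stand-in for the raw event log: one record per event,
# mirroring the one-row-per-event tables described above.
events = [
    {"event": "A", "occurred": date(2024, 1, 1)},
    {"event": "A", "occurred": date(2024, 1, 1)},
    {"event": "A", "occurred": date(2024, 1, 2)},
]

def rollup(events, today):
    """Nightly cron: emit one immutable aggregate document per *completed* day."""
    counts = Counter(e["occurred"] for e in events
                     if e["event"] == "A" and e["occurred"] < today)
    return [{"xt/id": f"a-by-day-{d.isoformat()}", "date": d, "a_event": n}
            for d, n in sorted(counts.items())]

def live_count(events, today):
    """Live query over only the small tail of events not yet rolled up."""
    return sum(1 for e in events
               if e["event"] == "A" and e["occurred"] == today)
```

The point of the split is that each day’s aggregate document is written exactly once (rather than being updated thousands of times), and the potentially slow live aggregation only ever scans today’s events.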

And I guess I should also ask: is the answer different for XTDB v1 and v2?

Yep, I’d say these are all reasonable approaches - my personal choice would depend on how many events you have, how often you make the aggregate queries, and your performance/consistency expectations of those queries.

I might also be tempted to use a timestamp-based UUID for the `xt/id` in this case (or a squuid, as in Datomic), so that related documents are co-located in the primary index.
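For reference, a squuid works by overwriting the high 32 bits of an otherwise random UUID with the current Unix time in seconds, so ids sort roughly by creation time. A sketch in Python (this is an illustration of the technique, not XTDB or Datomic API):

```python
import time
import uuid

def squuid() -> uuid.UUID:
    # Replace the top 32 bits of a random UUID with the current Unix
    # time in seconds. IDs created around the same time then sort near
    # each other, so related documents land close together in an index.
    secs = int(time.time()) & 0xFFFFFFFF
    rand = uuid.uuid4().int & ((1 << 96) - 1)  # keep the low 96 random bits
    return uuid.UUID(int=(secs << 96) | rand)
```

Because only the high bits are timestamped, the remaining 96 random bits still make collisions vanishingly unlikely.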

Eventually I’d like XT to have better support for dealing with this kind of derived data so that it can be queried more efficiently, but I suspect that’s a fair way off yet.

> And I guess I should also ask: is the answer different for XTDB v1 and v2?

Probably not, in this case!

That’s a very useful insight, thank you!