Since XTDB is document-based and the recommendation I’ve seen a few times is to group together data that changes at the same rate, I’m wondering how to handle aggregated data?
Example from work (where we use MySQL today):
We have a few
xxx_by_day tables in our database that we update with live aggregated data, i.e., when some event “A” happens, we increment
xxx_by_day.a_event for today’s date:
UPDATE xxx_by_day SET a_event = a_event + 1 WHERE aggregate_date = DATE(NOW())
This is an optimization so that we can get accurate “how many A’s per day” information very quickly – but this means that these tables may get many thousands of updates per day even tho’ each conceptual document only needs to have one entry per day.
All the raw data exists separately: every event that we aggregate also has a non-aggregated version: tables with one row per event (in general) but there are a large number of events per day so a live aggregate query across those tables is pretty slow.
What would be the recommended “XTDB way” to deal with this sort of thing?
We could probably switch to a daily cron job to aggregate the previous day’s data and perhaps a live aggregate query for the event data not already aggregated if that’s the “best” way with immutable documents.
And I guess I should also ask: is the answer different for XTDB v1 and v2?