I have an application where I need to remove older data from the database. I had planned to use `::xt/evict` for that. However, after a few experiments I noticed that `::xt/evict` actually grows the database size. Is that expected? How can I permanently remove data from the DB and reduce its physical size?
Hi @Robert_Martin - thanks for the question!
Eviction is not intended as a space reclamation feature, although in some limited circumstances it may be usable in that regard.
An eviction operation overwrites the relevant document in the doc-store with a ‘tombstone’ document, but this tombstone still takes up some space (~78 bytes), and the eviction operation on the tx-log also takes space (~109 bytes). So your original document would have to be larger than ~187 bytes for eviction to start making any sort of sense (I simply used `nippy/freeze` in my REPL to get those numbers). This isn't factoring in the index-store at all, where there would be further savings, but again some KV entries must be retained indefinitely. All that said, if you have MB-sized documents then it could be worthwhile.
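A back-of-the-envelope sketch of that break-even point, using only the rough figures quoted above (~78 bytes per tombstone, ~109 bytes per evict op). Like the estimate itself, it models one document per eviction and ignores the index-store entirely; the constant and function names are illustrative, not part of XTDB.

```python
# Rough per-eviction overheads quoted above (measured with nippy/freeze).
# Illustrative model only: one document per evict op, index-store ignored.
TOMBSTONE_BYTES = 78   # tombstone document written into the doc-store
EVICT_OP_BYTES = 109   # evict operation appended to the tx-log


def net_saving(doc_bytes: int) -> int:
    """Estimated bytes reclaimed (doc-store + tx-log only) by evicting one
    document of `doc_bytes` bytes. A negative result means eviction makes
    the combined doc-store + tx-log larger."""
    return doc_bytes - (TOMBSTONE_BYTES + EVICT_OP_BYTES)


if __name__ == "__main__":
    for size in (100, 187, 1_000, 1_000_000):
        print(f"{size:>9} B document -> net saving {net_saving(size):>8} B")
```

Small documents come out negative, which matches the growth observed in the question; only well above ~187 bytes (e.g. MB-sized documents) does eviction pay for itself.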
The ‘better’ option is to decant the database into a new instance, which can be done periodically by reading the tx-log with `open-tx-log`, manually filtering out data that's no longer relevant, and then calling `submit-tx` against the fresh instance (i.e. an empty tx-log & doc-store pair).
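The decanting loop can be sketched in a language-neutral way. In XTDB itself this would be Clojure code iterating the cursor returned by `open-tx-log` (with ops included) and calling `submit-tx` on the new node; here the tx-log entries, the op tuples and the `submit_tx` callable are hypothetical stand-ins, not XTDB's real data format or API. The sketch only shows the shape: replay in order, drop unwanted ops, skip transactions that end up empty.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# Hypothetical, simplified op: a tuple whose last element is its valid time, e.g.
#   ("put", {"xt/id": "a", ...}, valid_time)
#   ("delete", "a", valid_time)
Op = Tuple


@dataclass
class TxEntry:
    """Hypothetical stand-in for one tx-log entry (NOT XTDB's format)."""
    tx_id: int
    tx_ops: List[Op] = field(default_factory=list)


def decant(entries: Iterable[TxEntry],
           keep_op: Callable[[Op], bool],
           submit_tx: Callable[[List[Op]], None]) -> int:
    """Replay `entries` in tx order into a fresh store via `submit_tx`,
    dropping every op for which `keep_op` returns False. Transactions left
    with no ops are skipped. Returns the number of transactions submitted."""
    submitted = 0
    for entry in entries:
        kept = [op for op in entry.tx_ops if keep_op(op)]
        if kept:
            submit_tx(kept)
            submitted += 1
    return submitted


if __name__ == "__main__":
    old_log = [
        TxEntry(1, [("put", {"xt/id": "a", "v": 1}, 2015)]),
        TxEntry(2, [("put", {"xt/id": "b", "v": 2}, 2023)]),
    ]
    new_log: List[List[Op]] = []
    # Hypothetical policy: keep only ops whose valid time is 2020 or later.
    print(decant(old_log, lambda op: op[-1] >= 2020, new_log.append), new_log)
```

Each op keeps its original valid time, since the fresh instance will record its own new transactions.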
Automated retention policies are not completely out of the question in the future, however. What sort of retention policies would be useful for your use case? For example, “all corrected versions older than 1 year” or “all entities older than 7 years”.
I have two use cases.
- Some old documents should simply be disposed of.
- Older versions of corrected documents should also be disposed of.
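Both use cases can be expressed as a filter applied while decanting. The following is a minimal sketch assuming a simplified model in which a "correction" is a later put for the same entity at the exact same valid time (real bitemporal corrections can also span valid-time ranges, which this ignores). The `Put` tuple and `apply_retention` are hypothetical names for illustration only.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical, simplified put: (tx_id, entity_id, valid_time, doc).
# An illustration of the two retention policies, not XTDB's data model.
Put = Tuple[int, Hashable, int, dict]


def apply_retention(puts: List[Put], disposed_ids: Set[Hashable]) -> List[Put]:
    """Return the puts worth replaying into a fresh instance.

    Policy 1: drop every put for an entity listed in `disposed_ids`.
    Policy 2: where several puts share the same (entity, valid_time), i.e.
    later transactions corrected an earlier one, keep only the latest.
    """
    latest: Dict[Tuple[Hashable, int], Put] = {}
    for put in sorted(puts, key=lambda p: p[0]):  # process in tx order
        _tx_id, eid, valid_time, _doc = put
        if eid in disposed_ids:
            continue  # policy 1: old document disposed of entirely
        latest[(eid, valid_time)] = put  # policy 2: later correction wins
    # Return the survivors in original transaction order for replay.
    return sorted(latest.values(), key=lambda p: p[0])
```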
The decanting technique is probably going to work best for me in the short term.
That’s good to know, thanks for the feedback! We could arguably offer extra high-level capabilities in the future to help out with the decanting strategy too. You may benefit from taking a look at the (undocumented) ‘replicator’ module, which can export the database to edn files: xtdb/replicator.clj on the master branch of the xtdb/xtdb GitHub repository.