Will large XML blobs get compressed?

The complete question from @arichiardi on the Clojurians Slack:

Is there any recommendation for storing a large (XML) payload in XTDB? Do I need extra compression, or will Nippy take care of it for me?
I don’t need to query it; it is basically just a blob to hydrate and store as-is.

First I’ll answer the hard bit:

Is there any recommendation for storing a large (XML) payload in XTDB?

If you find it convenient and the storage volumes aren’t too extreme (i.e. both the size of an individual document and the database in aggregate are sufficiently small), then XT can definitely be used to store arbitrary blobs usefully and I wouldn’t immediately rule it out. It can be particularly handy when you are prototyping or otherwise trying to minimise the number of systems in your architecture.
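
At a small scale the direct approach really is just a normal document put. Here is a minimal sketch using the XTDB 1.x Clojure API (the in-memory node and the document shape are purely illustrative):

```clojure
(require '[xtdb.api :as xt])

;; An in-memory node for illustration only; a real deployment would
;; configure its document store and index store explicitly.
(def node (xt/start-node {}))

;; The XML is just an ordinary attribute value; Nippy serializes the
;; string transparently on the way in and out.
(xt/submit-tx node [[::xt/put {:xt/id :invoice-123
                               :invoice/xml "<invoice>...</invoice>"}]])
(xt/sync node) ; wait for the transaction to be indexed

(:invoice/xml (xt/entity (xt/db node) :invoice-123))
;; => "<invoice>...</invoice>"
```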

However, it is important to understand that at non-trivial scales this kind of usage may not be cost-effective compared to using raw blob storage (or other KV storage) and merely holding a pointer (e.g. a URL) to the blob in XT. This is partly because XT currently does not employ any structural sharing (i.e. cross-document compression) in the document store, so small changes to the same large blob value over time will cause a lot of nearly duplicate data to be written into the document store. This duplication could be relatively $expensive if the document store is, for example, backed by Postgres rather than S3.
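
To make the pointer pattern concrete, here is a minimal sketch; `put-blob!` is a hypothetical stand-in for whichever object-store client you use (S3, GCS, etc.), and the bucket name and URL scheme are made up:

```clojure
;; Hypothetical helper: uploads the bytes to object storage and
;; returns a stable URL for them.
(defn put-blob! [bucket k ^bytes payload]
  ;; ... real upload call goes here ...
  (str "https://" bucket ".example.com/" k))

(defn store-invoice! [node id xml-string]
  (let [url (put-blob! "invoices" (str (name id) ".xml")
                       (.getBytes ^String xml-string "UTF-8"))]
    ;; Only the small pointer document (plus any metadata you do want
    ;; to query) is written to XT.
    (xt/submit-tx node [[::xt/put {:xt/id id
                                   :invoice/xml-url url}]])))
```

This keeps XT’s document store and index store small while the blobs live somewhere purpose-built for them.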

Perhaps more significantly though, XT’s index store (e.g. backed by RocksDB as the KV store) is monolithic (all data is local), and storing blob data will bloat its size more quickly. When storing blobs you will therefore want to feel confident that the KV store’s rate of growth is low enough that you won’t face ongoing re-provisioning work, i.e. upgrading storage capacity in a year’s time. The disk used for this KV store may also be $expensive compared to generic object storage, but then it likely also has lower latency and avoids transfer costs…so there are plenty of tradeoffs to consider!

a large (XML) payload […] Do I need extra compression, or will Nippy take care of it for me?

Whether you store the XML as a string or as some opaque binary blob, all that Nippy really does within XT is handle the most essential level of encoding (i.e. headers for known types + serialization). XT is not configured to use any of Nippy’s compression options. Instead, XT relies entirely on whatever native compression facilities are available in the underlying KV store (e.g. RocksDB compression) and document store implementations (e.g. Postgres TOAST). Note also that XT does not control the configuration of these external systems, so activating their compression facilities may require extra configuration on your side.
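
So if you do want compression regardless of what the underlying stores provide, one option is to compress at the application level before the put. A minimal sketch using only java.util.zip from the JDK (and assuming a running `node` as above); Nippy serializes byte arrays natively, so the compressed payload round-trips like any other value:

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream)
        '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn gzip ^bytes [^String s]
  (let [baos (ByteArrayOutputStream.)]
    ;; Closing the GZIP stream writes the compressed trailer.
    (with-open [gz (GZIPOutputStream. baos)]
      (.write gz (.getBytes s "UTF-8")))
    (.toByteArray baos)))

(defn gunzip ^String [^bytes bs]
  (with-open [in (GZIPInputStream. (ByteArrayInputStream. bs))]
    (slurp in :encoding "UTF-8")))

(xt/submit-tx node [[::xt/put {:xt/id :invoice-123
                               :invoice/xml-gz (gzip "<invoice>...</invoice>")}]])
```

The value was opaque to queries anyway, so this trades a little CPU for smaller documents and indexes.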

I hope that helps!

Thank you Jeremy, what you are saying totally makes sense, and I am definitely considering using Postgres’ facilities for this blob.

I have also listened to what you mentioned about storing metadata vs. actual data in the first video Meetup; that helped as well.
