Opinions: Separate Storage and Compute?

Another heavily researched topic over the past year has been the concept of “separating storage and compute.” We are experimenting with an architecture that would move XTDB away from shared-nothing nodes, each of which requires a full copy of the entire database, towards a shared object store model in which each node acquires from the object store only as many chunks of data as it needs to serve queries.
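In rough terms, the node-side behaviour being described looks like the sketch below. It is illustrative only, not XTDB's actual design or API; `fetch-chunk` and `chunk-cache` are hypothetical names.

```clojure
;; Illustrative sketch only, not XTDB's actual design or API.
(defn fetch-chunk
  "Hypothetical object-store GET (e.g. S3/GCS) for one chunk of data."
  [chunk-id]
  {:chunk-id chunk-id :rows []})

(defonce chunk-cache (atom {})) ; node-local cache, keyed by chunk id

(defn get-chunk
  "Return a chunk, hitting the shared object store only on a cache miss."
  [chunk-id]
  (or (get @chunk-cache chunk-id)
      (let [chunk (fetch-chunk chunk-id)]
        (swap! chunk-cache assoc chunk-id chunk)
        chunk)))
```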

Our interest here is in serving customers who have requested this feature over the past few years. Large datasets of immutable records inherently consume a lot of disk. Separating storage and compute allows the object store to live on cheap disks while the compute nodes live on faster machines with less disk space.

Do you anticipate storing terabytes of data in your XTDB instances? Will this feature be useful to you? Or do you feel that the separation of storage and compute is a distraction? What would you rather see us focusing on?

I think this is a difficult question to answer, because (at least in my experience) systems are architected to use the tools available. So XT currently requires all your indexes to fit on a node? Not a problem - don't try to put trillions of records into it (use something designed for that!). XT currently provides something that few - if any - other databases do: a sweet spot combining a document database, bitemporality, and graph traversal.

When I first moved from Datomic, I thought the answer to this question was yes. Now, after a couple of very solid months using XT, I think the answer might be no, or rather it depends (not least on the use case).

Because XT (currently) keeps a full node-local index (LMDB/RocksDB/etc.), you get near-constant traversal across those indexes. If you move to a model where nodes cache only the chunks of the index they need (for the graph traversal use case), then you likely degrade performance until you have most of the graph cached locally.
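To make that concrete, here is a back-of-the-envelope sketch; the latency figures are assumptions for illustration, not measurements of XTDB.

```clojure
;; Illustrative only: compare a traversal over a warm local index with one
;; that first has to fault `misses` chunks in from a remote object store.
;; The latency figures are assumptions, not measurements.
(defn traversal-cost-ms [hops {:keys [local-ms remote-ms misses]}]
  (+ (* hops local-ms)       ; every hop reads the (now local) index
     (* misses remote-ms)))  ; each cold chunk adds a remote round trip

(traversal-cost-ms 1000 {:local-ms 0.01 :remote-ms 50 :misses 0})  ;; => 10.0   (warm cache)
(traversal-cost-ms 1000 {:local-ms 0.01 :remote-ms 50 :misses 20}) ;; => 1010.0 (cold cache)
```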

Are you considering making the new backend a pluggable option?

For my use case, I'm planning on using a two-tier storage system, whereby the reference data for the system (many millions of records) is stored in a local RocksDB/LMDB instance for the tx log, document store (and of course indexes), and any writes (transactions or documents) are stored in a GCP Datastore. All reads first attempt the local tx log/document store and then fall back to the remote GCP Datastore, as sketched below. This reference data is common across many deployments (one per customer), but it would be expensive to have to duplicate it all into the Datastore as well.
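A minimal sketch of that local-first read path, assuming two opaque lookup functions; `fetch-local` and `fetch-remote` are hypothetical names standing in for the local RocksDB/LMDB store and the GCP Datastore respectively, not XTDB or GCP APIs.

```clojure
(defn fetch-local
  "Hypothetical read against the node-local RocksDB/LMDB document store.
  Returns a map of id -> document for the ids it can find."
  [ids]
  {})

(defn fetch-remote
  "Hypothetical read against the remote GCP Datastore for the given ids."
  [ids]
  {})

(defn fetch-docs
  "Read documents local-first, falling back to the remote store for any misses."
  [ids]
  (let [local   (fetch-local ids)
        missing (remove (set (keys local)) ids)]
    (merge local (when (seq missing) (fetch-remote missing)))))
```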

As to your original question - terabytes of data? Perhaps not. My current use case specifically targets XTDB's sweet spot (at the moment), which means I'm pretty happy with 1.0. For scale, we'll just get bigger nodes or more storage (which is cheap!). If you can deliver what you describe without diminishing another aspect that makes XTDB great, that would be welcome. But it's already great :slight_smile: so if you can't, it won't make it any less useful than it already is!


Not really. The research and experimentation thus far have taken the shape of “usable spikes” which assume the separated storage-and-compute (SSaC) model is global. SSaC, if implemented, would be quite a departure from XTDB's existing storage approach and would land in a 2.x series.

Regarding XTDB’s “sweet spot” I’ve forked the thread over here: XTDB's Sweet Spot