Had some questions prompted by this thread:
We’re hoping that Kafka and Kafka-compatible services (e.g. things like Redpanda and Warpstream) are now ubiquitous enough that we don’t need to support further pluggability in the tx log implementation. Anything S3-compatible should be sufficient for object storage.
The Postgres TX log option in V1 is nice since it gives XTDB full “coverage” for any size of deployment. For V2, given your current roadmap, what would be your recommendations for a hypothetical developer who’s considering building a side business on, say, DigitalOcean but wants to avoid starting out with their managed Kafka offering (minimum $147/month)?
Just use the filesystem
Maybe just stick with the filesystem TX log until the app is large enough to warrant Kafka? That's basically the line of reasoning, and it makes sense to me. The object store would still be managed, at least.
The tx log is a Write-Ahead Log which means that there will ~always be some novelty stored there which doesn’t yet exist in the object store (and the delay may be minutes or even hours).
For this hypothetical, the potential for hours of data loss is quite possibly fine. Even so, maybe that could be mitigated further:
- can the filesystem tx log be backed up while the system is running?
- could the filesystem tx log be streamed to s3/something for backup, similar to sqlite + litestream?
- could there be a setting to force the tx log to be flushed to object storage more often, say every 5 minutes?
- or is a delay of hours something that would only occur for systems under high load? Maybe for this hypothetical, the delay would likely be small anyway and there’s no need to worry about any additional backup other than having the managed object store.
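To make the first two bullets concrete, here is a rough sketch of the kind of backup loop I have in mind. It is not based on anything in XTDB: it assumes the filesystem tx log is a directory of append-only files (I don't know if that's true), and `sync_log_dir` and the paths are names I made up. It only copies to another directory; pushing that directory to S3 would be a separate step.

```python
import os
import shutil
import tempfile
from pathlib import Path


def sync_log_dir(src: Path, dst: Path) -> list[str]:
    """Copy every file under src whose backup is missing or has a different size.

    ASSUMPTION: log files are append-only, so a size change is the only
    signal we need. Returns the relative paths that were copied.
    """
    copied = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists() and target.stat().st_size == path.stat().st_size:
            continue  # backup already has the same number of bytes
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copy to a temp file in the same directory, then rename over the
        # target, so an interrupted run never leaves a half-written backup.
        fd, tmp_name = tempfile.mkstemp(dir=target.parent)
        os.close(fd)
        shutil.copyfile(path, tmp_name)
        os.replace(tmp_name, target)
        copied.append(rel.as_posix())
    return copied
```

Running something like this on a timer (say every 5 minutes) would bound the loss window, but it can still copy a file mid-write and capture a partial record at the tail. Whether the tx log tolerates that on restore is exactly what I'm asking about.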
self-host redpanda
My only reservation about “Just use the filesystem” would be not having an escape hatch (other than paying a lot for managed Kafka) in case there is some indie-developer scenario where it really would be best to have multiple servers. I learned about Redpanda for the first time about an hour ago; it seems like that could be a good option here? E.g. set up a single-node cluster (apparently 2GB of RAM is the minimum, which is cheaper than a DO managed Postgres instance anyway) and ideally have it stream to S3 or something for backup, same as mentioned above. Could probably just use their Docker image.
Maybe for both this + the filesystem scenario, backing up the TX log could just be done in application space. I.e. if XT V2 has a listen API, the application could use that to send TX log items somewhere.
give up
Maybe XTDB just isn’t a good fit for the solo developer use case. I don’t hold that opinion since the two options above both seem practical, but if the XTDB team ever comes to that opinion, that’s totally fine and I’d love to know sooner rather than later.
So yeah, I'd be interested in whatever thoughts you have on all that, anything I haven’t thought of, etc.