Adding XTDB 2 to an existing clustered app deployment

I know folks have asked about XTDB v1 going to production, but I want to ask for your thoughts on XTDB v2 going to production, and specifically about adding XTDB v2 to an existing setup.

About our setup:

We have multiple backend processes – a primary API, OAuth identity/login servers, a billing application, a real-time chat application, and a few others. In QA and production, these are clustered (and stateless so requests can hit any server via the load balancer). We have a pair of Redis servers and a pair of MySQL servers in production, and the app does a huge amount of SQL via JDBC drivers (to ProxySQL running locally on each node, which in turn connects to the MySQL primary/secondary servers).

All those processes can run with pretty small heaps at the moment (mostly 512M, a few at 1.5G) and DB access is all over the wire but within a specific data center.

The way I’ve used XTDB so far, just to experiment with, is by adding the XTDB dependencies to my local process, starting a node, and just interacting with it. I realized I have no idea what a production setup of XTDB might look like and how best to interact with it.

What sort of general architectural guidance would you give for how to add XTDB v2 to a setup like this? I’d expect all processes to be interacting with XTDB and I’d like to be able to use both SQL (primarily, I suspect) and XTQL from all those processes. We have AWS/S3 in the mix too if that helps.


Hey @seancorfield, we will publish more documentation and guidance in due course, but in general we expect the best choice for most v2 users will be to deploy database clusters on servers independently of application servers. In this sense XTDB would act much more like MySQL, and not as an embeddable indexing and query subsystem (which is how numerous users have embraced XTDB v1 previously).

There are several factors driving this recommendation, but crucially it provides the most stable operational experience without the risk of any non-functional concerns of XTDB leaking internal implementation details into your application design and deployment setup.

For instance, as with most other database systems, v2 becomes a lot more efficient given ample quantities of memory (relative to the size of data stored on disk), and this runs counter to current trends for deploying many lightweight application servers where each is using as little memory as possible. There are also many implications for deployment patterns based on how long it takes for a (necessarily) stateful database to update vs. a stateless application instance.

Coupling these concerns together may be the best choice under certain circumstances, but the tradeoffs become complex to reason about. Also, given the increased power of the new query language & engine, more application logic is able to be ‘pushed down’ (or ‘sent to’) the database than would have been reasonably possible in v1.

Regardless, you should have no problem interacting with XTDB v2 remotely, via our Clojure client, across N database nodes.
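To make "across N database nodes" a little more concrete, here is a minimal sketch of client-side node selection: cycle through a fixed list of endpoints and skip any that have been marked unhealthy. It is written in Python purely for illustration and makes no XTDB calls at all; the `NodePool` class and the endpoint addresses are hypothetical, and a load balancer in front of the database nodes may well make this unnecessary.

```python
from itertools import cycle


class NodePool:
    """Round-robin over a fixed list of database node endpoints."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._unhealthy = set()
        self._cycle = cycle(self._endpoints)

    def mark_unhealthy(self, endpoint):
        """Exclude an endpoint from selection, e.g. after a failed request."""
        self._unhealthy.add(endpoint)

    def mark_healthy(self, endpoint):
        """Return a previously excluded endpoint to the rotation."""
        self._unhealthy.discard(endpoint)

    def next_endpoint(self):
        """Return the next healthy endpoint in rotation."""
        # Look at each endpoint at most once per call so we never loop forever.
        for _ in range(len(self._endpoints)):
            candidate = next(self._cycle)
            if candidate not in self._unhealthy:
                return candidate
        raise RuntimeError("no healthy endpoints available")


# Hypothetical node addresses, for illustration only.
pool = NodePool([
    "http://xtdb-1.internal:3000",
    "http://xtdb-2.internal:3000",
    "http://xtdb-3.internal:3000",
])
```

Each stateless application process would hold its own pool and call `pool.next_endpoint()` before issuing a request, marking a node unhealthy when a request to it fails.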

I’d expect all processes to be interacting with XTDB and I’d like to be able to use both SQL (primarily, I suspect) and XTQL from all those processes. We have AWS/S3 in the mix too if that helps.

I.e. this scenario should be perfectly viable by running some dedicated XTDB servers alongside your application cluster.

We won’t prevent embedded/in-process usage or refuse to offer support, but currently we expect that it will usually be the wrong choice for the average user.


Sounds like running XTDB on AWS in some configuration would be the right path for us then. I look forward to trialing that in 2024.


running XTDB on AWS in some configuration would be the right path

It’s just a starting point for the time being (and we may yet offer something more formally via the marketplace), but this got merged today: Getting started with XTDB on AWS | XTDB [edit: fixed the link, thanks!] :slightly_smiling_face:

Thanks. And that’s the latest XTDB v2 snapshot stuff?

I may try that out over the holidays…

I have the green light to stand up a test cluster at AWS next week – over the holidays – to look at what it might cost and how it works in practice.

I’ll note that the URL has moved to Getting started with XTDB on AWS | XTDB

What are y’all recommending for security? I am used to databases having username + password but I don’t see that sort of thing mentioned in the docs. Does that mean that XTDB is essentially available to “anyone” via HTTPS once this is set up?

(I’m new to AWS stuff in general)

That’s great – we’ll be keen to help you get up and running! If you’re looking for something (much) cheaper than MSK to test with then in addition to Confluent I think both Upstash and WarpStream look pretty interesting.

Does that mean that XTDB is essentially available to “anyone” via HTTPS once this is set up?

Yes, that’s what the default xtdb-vpc template achieves: a public HTTP endpoint. There’s no authentication layer in the stack so far, so you wouldn’t want to deploy that particular template for anything more than very basic end-to-end (minus security) testing.

Instead you probably want to either re-use an existing VPC (e.g. the one already used for your app cluster) or set up VPC peering to a dedicated private VPC used exclusively by the XTDB cluster. Unfortunately I’m not a CloudFormation expert so can’t illustrate beyond that for the moment. Maybe someone else will be around to chime in this side of January.
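One practical detail if you go the peering route: AWS will not peer two VPCs whose CIDR blocks overlap, so it is worth checking the address ranges before creating the dedicated VPC. A minimal sketch of that check, using only Python's standard `ipaddress` module (the function names and the CIDR values in the usage note are made up for illustration):

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR blocks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def peering_conflicts(app_cidrs, db_cidrs):
    """List every (app, db) CIDR pair that overlaps and would block peering."""
    return [
        (a, b)
        for a in app_cidrs
        for b in db_cidrs
        if cidrs_overlap(a, b)
    ]
```

For example, `peering_conflicts(["10.0.0.0/16"], ["10.1.0.0/16"])` returns an empty list, so those two example ranges could be peered, whereas pairing `10.0.0.0/16` with `10.0.5.0/24` would be reported as a conflict.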

At this point I have no idea what MSK is used for by XTDB so…

Our app servers are currently in a data center, not AWS, so it sounds like we’d need to bridge from the DMZ there into a specific VPC at AWS?

This is purely a short-term test so public access isn’t a problem for what I’m doing next week, and part of the exercise is to learn about AWS stuff too.

The range of options for this is, unsurprisingly, quite large - this looks like a comprehensive list: Connect to Amazon VPC | AWS re:Post (note you can of course swap out the AWS VPN tech for similar alternatives).

I’m tempted to suggest that AWS Transit Gateway might be the best bet…but I’ve not used it before :slight_smile:

FWIW, I almost have an AWS cluster set up…

I’ve built it twice now and both times have gone fine until the last step: setting up the ECS piece mostly works but the ECSService never seems to complete (the other 10 resources get created just fine).

MSK took about 30 minutes, as expected in the notes.

How long should I wait for ECSService to complete? So far it’s taken well over an hour each time before I’ve given up and deleted the stack and tried again.

This SO post lists a whole bunch of ways this step can fail but I don’t know enough about AWS and templates to be able to figure out what to look for: Cloudformation template for creating ECS service stuck in CREATE_IN_PROGRESS - Stack Overflow
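In case it helps anyone else who hits this: as far as I understand it, CloudFormation leaves an ECS service in CREATE_IN_PROGRESS until the number of running tasks reaches the desired count, so the stack events alone often don't say what is wrong. The ECS service's own event messages are usually more informative. Here is a rough sketch that summarizes the JSON printed by `aws ecs describe-services`; the `summarize_ecs_service` helper is hypothetical, and the cluster and service names are whatever your stack created.

```python
import json


def summarize_ecs_service(describe_services_json, max_events=5):
    """Summarize task counts and the first few event messages per service.

    Expects the JSON text printed by `aws ecs describe-services`.
    """
    data = json.loads(describe_services_json)
    summaries = []
    for svc in data.get("services", []):
        # Keep only the first few events, in the order the CLI returned them.
        events = svc.get("events", [])[:max_events]
        summaries.append({
            "service": svc.get("serviceName"),
            "status": svc.get("status"),
            "desired": svc.get("desiredCount"),
            "running": svc.get("runningCount"),
            "pending": svc.get("pendingCount"),
            "recent_events": [e.get("message") for e in events],
        })
    return summaries
```

To use it, save the output of `aws ecs describe-services --cluster <cluster> --services <service>` to a file and pass the file's contents to the function. A running count stuck below the desired count, together with the event messages, should point at which of the failure modes in that SO post applies.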