A question about RocksDB data files

So /var/lib/xtdb is the default location for data when using the Docker image?

That’s right, yep, as defined here: https://github.com/xtdb/xtdb/blob/f7102e542c735e7db00867a3992abf913e930e34/build/docker/Dockerfile#L7

I assume I can parallelize this? Though if a batch input of 30K records is fast enough, I may not have to. Preserving sequential transaction time is kind of important, but for my PoC it’s more of a nice-to-have if parallelizing would populate the store faster.

As per my reply on the other thread (Parallelizing data loading, processing large query results), XT doesn’t have an explicit mechanism for parallel import, and sequential transaction time can’t be avoided using the public APIs. That said, if you really wanted to explore advanced custom options, there are some interesting possibilities in theory, e.g. see https://rockset.com/blog/optimizing-bulk-load-in-rocksdb/ - but hopefully the default serial performance is sufficient for now.
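For the serial path, the usual shape is to submit the 30K records in fixed-size batches, one transaction per batch, so transaction order is preserved while amortizing per-transaction overhead. A minimal language-agnostic sketch in Python, where `submit_batch` is a hypothetical placeholder for whatever client call you actually use (e.g. a POST to XTDB's HTTP submit-tx endpoint):

```python
# Sketch only: sequential bulk load in fixed-size batches.
# `submit_batch` is a placeholder for your real client call.

def chunked(records, size):
    """Yield successive batches of `size` records, in order."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def load_all(records, submit_batch, batch_size=1000):
    """Submit every record in order, one batch per transaction."""
    submitted = 0
    for batch in chunked(records, batch_size):
        submit_batch(batch)  # one transaction per batch keeps a single serial tx order
        submitted += len(batch)
    return submitted
```

Batch sizes in the hundreds-to-thousands range are a reasonable starting point to tune against your hardware; the key property is that batches are submitted strictly in sequence.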

How do I handle a query result with possibly millions of records? Some kind of laziness and partitioning seems required here. What I’m likely to do is ask only for the IDs filtered by two criteria, and then spit out SQL update files.

See again the open-q link I shared on the other thread :slight_smile:
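The laziness/partitioning idea above is exactly what a streaming query handle gives you: consume the results in fixed-size chunks so millions of rows are never realized in memory at once. A hedged sketch in Python, where `results` stands in for any lazy iterator/cursor returned by your client (analogous to the lazy sequence XT's `open-q` yields):

```python
import itertools

# Sketch: consume a lazy result iterator in fixed-size chunks so the
# full result set never has to be realized in memory. `results` can be
# any iterator, e.g. a streaming cursor from a database client.

def in_chunks(results, size):
    """Lazily yield lists of up to `size` items from any iterator."""
    it = iter(results)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk
```

Usage would look like `for chunk in in_chunks(cursor, 10_000): process(chunk)` - each chunk can be processed and discarded before the next one is pulled.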

Big picture: I’m loading XTDB with all history, then loading a few SQL instances with the current state of relevant, filtered records (each one having its own filtered view of current state).
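The "spit out SQL update files" step could be as simple as streaming the filtered IDs straight to a script file, one statement per record, so nothing large is held in memory. A sketch under assumed, hypothetical table/column names (`records`, `active`):

```python
# Sketch: stream filtered IDs into a SQL update script. The table and
# column names here are hypothetical placeholders, not from the thread.

def write_update_script(ids, path, table="records", column="active"):
    """Write one UPDATE per id; return the number of statements written."""
    count = 0
    with open(path, "w") as f:
        for record_id in ids:
            # Naive quoting for illustration only; in practice prefer
            # parameterized statements or your target DB's bulk format.
            f.write(f"UPDATE {table} SET {column} = TRUE "
                    f"WHERE id = '{record_id}';\n")
            count += 1
    return count
```

Because `ids` is just an iterator, this composes directly with chunked/lazy query consumption: pull IDs from the query, write them out, and never buffer the full result set.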

Any additional advice deeply appreciated!

Good to know. I think you’re on the right track, but I’ll have a think about whether there are any useful existing examples to consider. Let me know if I can help accelerate or unblock your evaluation somehow; I’d be very happy to get on a call sometime soon if that’s of interest.