Do you have published data on performance metrics?

I’m basically pondering these questions:

  • What’s the time to write to the document store + transaction log?
  • Can the indexer keep up with transaction writes?
  • What’s the indexer performance?

Or, broken down:

  • writes: What’s the max docs/sec (per doc size)?
  • indexer: What’s the indexer latency at the max write docs/sec?
  • reads: What’s the read performance per graph shape (triangle, square, star, etc.)?

I can see the performance benchmark suites here, but are the results published anywhere?
https://docs.xtdb.com/resources/performance/

Hey @twashing, thanks for your questions!

I would be happy to share some ballpark numbers, but a lot of variables have to be factored in beyond the raw bytes/sec of the various components or even the underlying hardware/network (application-dependent knowledge of document shapes, complexity of constraints, match write contention, etc.). As ever with these things, it’s safest to measure and extrapolate for the given use case.
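
For example, a rough sketch along these lines (purely illustrative, assuming `node` is an already-started XTDB node, and using placeholder documents and attribute names) can give you an end-to-end docs/sec figure for your own document shapes:

```clojure
(require '[xtdb.api :as xt])

(defn bench-writes
  "Submits n synthetic documents in batches, waits for the indexer to
  catch up, and reports end-to-end docs/sec (submission + indexing)."
  [node n batch-size]
  (let [docs    (for [i (range n)]
                  ;; placeholder document shape - substitute your own
                  {:xt/id (keyword (str "doc-" i)) :payload (str "value-" i)})
        start   (System/nanoTime)
        last-tx (reduce (fn [_ batch]
                          (xt/submit-tx node (mapv (fn [d] [::xt/put d]) batch)))
                        nil
                        (partition-all batch-size docs))]
    ;; block until the local indexer has caught up with the final transaction
    (xt/await-tx node last-tx)
    (let [secs (/ (- (System/nanoTime) start) 1e9)]
      {:docs n :seconds secs :docs-per-sec (/ n secs)})))

;; e.g. (bench-writes node 100000 1000)
```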

Using our typical benchmark setup (m5.xlarge with default RocksDB configs), during TPC-H bulk loading we consistently observe >50K AVs (Attribute-Value pairs) per second, which equates to double-digit MB/s. Note that this is really a measure of indexing throughput against a local disk, because that is ordinarily the bottleneck (not the writes to the tx-log). Write latency will normally be much more network-sensitive and will depend on the levels of HA/durability/distribution you need, but it shouldn’t really change with the throughput workload unless you always need transactional (synchronous, logical) confirmation of those writes using await-tx / tx-committed?. In that case there would be some degradation whilst RocksDB’s LSM tree does its thing, and you may want to tune RocksDB appropriately.
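
To make that distinction concrete, here’s a minimal sketch (again assuming an already-started `node` and a throwaway document): submit-tx returns once the transaction is on the tx-log, while await-tx and tx-committed? are the synchronous confirmation steps mentioned above:

```clojure
(require '[xtdb.api :as xt])

;; asynchronous by default: returns as soon as the tx is written to the tx-log
(def tx (xt/submit-tx node [[::xt/put {:xt/id :user-1 :name "Ada"}]]))

;; synchronous confirmation: block until the local node has indexed the tx
;; (this is where indexing throughput shows up as latency under load)
(xt/await-tx node tx)

;; logical confirmation: did the tx actually commit?
;; (would be false if, e.g., a ::xt/match precondition had failed)
(xt/tx-committed? node tx)
```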

Read performance is also highly workload-dependent, but LMDB will almost always offer the best read performance (>3x RocksDB). XTDB is not (currently) engineered to be particularly efficient at TPC-H-shaped analytical workloads, but for WatDiv-style cyclic queries with uniformly randomised distributions the performance is typically better than what most OLTP (transactional) engines can offer. Nothing is absolute, though, and XT won’t always pick the optimal join order (which is usually what dominates real-world performance).
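
For reference, here’s a hedged sketch of both points: swapping the index store’s kv-store over to LMDB (the module names below are the standard xtdb-lmdb / xtdb-rocksdb ones, but do check the docs for your version), plus a triangle-shaped Datalog query of the kind you asked about (`:follows` is a made-up attribute):

```clojure
(require '[xtdb.api :as xt]
         '[clojure.java.io :as io])

;; LMDB for the query indexes (read-heavy), RocksDB for docs and tx-log
(def node
  (xt/start-node
   {:xtdb/index-store    {:kv-store {:xtdb/module 'xtdb.lmdb/->kv-store
                                     :db-dir (io/file "/tmp/xtdb/indexes")}}
    :xtdb/document-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/xtdb/docs")}}
    :xtdb/tx-log         {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                     :db-dir (io/file "/tmp/xtdb/tx-log")}}}))

;; triangle-shaped query: a follows b, b follows c, c follows a
(xt/q (xt/db node)
      '{:find  [a b c]
        :where [[a :follows b]
                [b :follows c]
                [c :follows a]]})
```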

“are results published anywhere?”

Not currently, but I’d be happy to help you run the benchmarks yourself, or I can share some recent samples with you if you’d like to find out more: jdt@juxt.pro