Role/support of Kafka in XTDB 2.0?

My understanding (naive and without-benefit-of-actually-using-it) of XTDB 1.x is that “the data” could be in a Kafka topic, and then XTDB would index that data and support queries.

Assuming that is a semi/partially accurate statement, does XTDB 2.0 with Arrow change that?

Hello @dcj :slight_smile: In 2.x the role of Kafka is reduced to a ‘Write-Ahead Log’ (i.e. it is eventually safe to truncate), whereas in 1.x the tx-log/doc-log topics need ‘infinite retention’ to support replaying.
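
To make the retention difference concrete, here is a rough sketch using Kafka's standard AdminClient (the topic name `xtdb-log` and the 7-day window are placeholders of mine, not XTDB defaults). A 1.x-style golden-store topic would instead need `retention.ms=-1`, i.e. unlimited retention.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class LogRetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Placeholder topic name, not an XTDB default.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "xtdb-log");

            // For a 2.x-style WAL a finite window is safe once data has been flushed to
            // object storage; a 1.x-style golden-store topic would need retention.ms = -1.
            ConfigEntry retention = new ConfigEntry("retention.ms",
                    String.valueOf(7L * 24 * 60 * 60 * 1000)); // e.g. 7 days

            admin.incrementalAlterConfigs(Map.of(
                    topic, List.of(new AlterConfigOp(retention, AlterConfigOp.OpType.SET))
            )).all().get();
        }
    }
}
```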

My understanding is that in 1.x DB ingestion was accomplished by publishing to a Kafka topic, correct? And yes, those topics are required to support infinite retention.

So, are you saying that in 2.0 ingestion is still via writing to a Kafka topic, but then there is a way to truncate those messages later? If so, that would be way cool!


Yep, that’s about right! 2.x can lift the infinite-retention requirement on the tx-log because it has a much stronger canonical type system and an object-storage layer that is by itself sufficient as the “golden store”, thus avoiding any need to replay from a log-shaped golden store.
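
A minimal illustration of that pattern (purely a sketch under my own naming, not XTDB's actual code or API): on startup a node restores the latest flushed state from object storage and then replays only the log tail written after that flush point, which is exactly why everything before the flush point is safe to truncate.

```java
import java.util.List;

// Illustrative types only: a real system would use an object-store client and a Kafka consumer.
record LogRecord(long offset, String payload) {}
record Snapshot(long lastFlushedOffset, String state) {}

interface ObjectStore {
    Snapshot latestSnapshot();              // e.g. columnar files in S3
}

interface WriteAheadLog {
    List<LogRecord> readFrom(long offset);  // e.g. a Kafka topic with finite retention
}

public class GoldenStoreSketch {
    // Restore from the golden store, then replay only the un-flushed log tail.
    // Everything before lastFlushedOffset is no longer needed, so it can be truncated.
    static String recoverState(ObjectStore store, WriteAheadLog wal) {
        Snapshot snapshot = store.latestSnapshot();
        String state = snapshot.state();
        for (LogRecord rec : wal.readFrom(snapshot.lastFlushedOffset() + 1)) {
            state = apply(state, rec);      // deterministic application of each record
        }
        return state;
    }

    static String apply(String state, LogRecord rec) {
        return state + "\n" + rec.payload(); // stand-in for real indexing logic
    }
}
```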


So, in 2.0 the data is ingested via Kafka topics, but then stored in the new “columnar” store?

This sounds exciting!


Exactly, yes :slight_smile: I should add that there’s still a hard requirement for deterministic processing throughout in order to offer ~trivial High Availability (otherwise XT would have to solve leader election / quorum challenges as well).
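
As a toy illustration of why determinism buys that (not XTDB code): if every node applies the same ordered log through the same deterministic function, they all converge on identical state, so any node can serve queries and none of them needs to be elected leader.

```java
import java.util.List;

public class DeterministicReplaySketch {
    // A deterministic transition function: same log in, same state out, on every node.
    static long applyAll(List<Long> orderedLog) {
        long state = 0;
        for (long tx : orderedLog) {
            state = 31 * state + tx; // stand-in for deterministic transaction processing
        }
        return state;
    }

    public static void main(String[] args) {
        // e.g. the same records read, in the same order, from a shared Kafka topic
        List<Long> sharedLog = List.of(101L, 102L, 103L);

        long nodeA = applyAll(sharedLog);
        long nodeB = applyAll(sharedLog);

        // Both nodes agree without any leader election or quorum protocol.
        System.out.println("node A == node B: " + (nodeA == nodeB));
    }
}
```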
