My understanding (naive and without-benefit-of-actually-using-it) of XTDB 1.x is that “the data” could be in a Kafka topic, and then XTDB would index that data and support queries.
Assuming that is a semi/partially accurate statement, does XTDB 2.0 with Arrow change that?
Hello @dcj
In 2.x the role of Kafka is reduced to ‘Write-Ahead Log’ (i.e. it is safe to truncate eventually), whereas currently, in 1.x, the tx-log/doc-log topics need to have ‘infinite retention’ to support replaying.
My understanding is that in 1.x DB ingestion was accomplished by publishing to a Kafka topic, correct? And yes, those topics are required to support infinite retention.
So, are you saying that in 2.0 ingestion is still via writing to a Kafka topic, but then there is a way to truncate that message later? If so, that would be way cool!
Yep, that’s about right! 2.x can lift the infinite-retention requirement on the tx-log because it has a much stronger canonical type system and an object-storage layer that is by itself sufficient as the “golden store”, thus avoiding any need to replay from a log-shaped golden store.
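To make the "log as Write-Ahead Log" idea concrete, here is a toy sketch in Python. It is not XTDB code and does not use any real XTDB or Kafka API: `ToyLog`, `ToyObjectStore`, `checkpoint` and `recover` are all hypothetical names invented for illustration. The only point it tries to show is that once a snapshot in the object store covers a log offset, everything before that offset can be dropped, and a fresh node can still rebuild state from the snapshot plus the remaining log tail.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical stand-in for a log topic: append-only, truncatable at the front."""
    base_offset: int = 0                      # offset of entries[0]
    entries: list = field(default_factory=list)

    def append(self, tx):
        self.entries.append(tx)
        return self.base_offset + len(self.entries) - 1

    def read_from(self, offset):
        if offset < self.base_offset:
            raise ValueError("offset has already been truncated")
        return self.entries[offset - self.base_offset:]

    def truncate_before(self, offset):
        # Drop every entry whose offset is lower than `offset`.
        drop = min(max(0, offset - self.base_offset), len(self.entries))
        self.entries = self.entries[drop:]
        self.base_offset += drop


@dataclass
class ToyObjectStore:
    """Hypothetical "golden store": a snapshot plus the first offset it does NOT cover."""
    snapshot: dict = field(default_factory=dict)
    next_offset: int = 0


def apply_tx(state, tx):
    key, value = tx
    state[key] = value


def checkpoint(log, store):
    """Fold the log tail into the snapshot, then truncate the covered part of the log."""
    state = dict(store.snapshot)
    for tx in log.read_from(store.next_offset):
        apply_tx(state, tx)
    store.snapshot = state
    store.next_offset = log.base_offset + len(log.entries)
    log.truncate_before(store.next_offset)


def recover(log, store):
    """What a brand-new node would do: load the snapshot, replay only the log tail."""
    state = dict(store.snapshot)
    for tx in log.read_from(store.next_offset):
        apply_tx(state, tx)
    return state


# Usage: ingest, checkpoint (which truncates the log), ingest more, then recover.
log, store = ToyLog(), ToyObjectStore()
for tx in [("a", 1), ("b", 2), ("a", 3)]:
    log.append(tx)
checkpoint(log, store)
log.append(("c", 4))
state = recover(log, store)
```

In a 1.x-style setup the equivalent of `truncate_before` would never be safe, because the log itself is the only complete copy of the data.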
So, in 2.0 the data is ingested via Kafka topics, but then stored in the new “columnar” store?
This sounds exciting!
Exactly, yes
I should add that there’s still a hard requirement for deterministic processing throughout in order to offer ~trivial High Availability (otherwise XT has to solve leadership election / quorum challenges as well).
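A rough illustration of why determinism matters here, again as a toy Python sketch rather than anything taken from XTDB itself (`apply_tx`, `replay`, `fingerprint` and the `log_time` field are made-up names): if every node derives its state purely from the contents of the shared log, and never from local inputs such as its own wall clock or random numbers, then any two nodes that have consumed the same log prefix hold identical state, so no node needs to be elected as the one that decides.

```python
import hashlib
import json


def apply_tx(state, tx):
    """Pure function of (state, tx): uses the time recorded in the log entry,
    never the local clock, so every node computes the same result."""
    new_state = dict(state)
    new_state[tx["key"]] = {"value": tx["value"], "valid_from": tx["log_time"]}
    return new_state


def replay(log):
    state = {}
    for tx in log:
        state = apply_tx(state, tx)
    return state


def fingerprint(state):
    """Stable hash of the state, for comparing nodes."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


# Two independent "nodes" consuming the same log end up with the same state.
tx_log = [
    {"key": "doc-1", "value": "v1", "log_time": 1000},
    {"key": "doc-2", "value": "x", "log_time": 1001},
    {"key": "doc-1", "value": "v2", "log_time": 1002},
]
node_a = replay(tx_log)
node_b = replay(tx_log)
assert fingerprint(node_a) == fingerprint(node_b)
```

If `apply_tx` instead stamped entries with the node's own current time, the two fingerprints would diverge and the nodes would need some coordination protocol to agree on which state is the real one.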