The xtdb-kafka module currently assumes an infinite-retention, single-partition topic, so you can't truncate or migrate messages away from that topic without some redesign of the module. I'm aware that mainline tiered storage for Kafka is on the way (and that Confluent already has something similar in-house), which would achieve the same outcome without requiring changes in XT — see https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage
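For illustration, a topic matching those two assumptions could be created like this — the topic name, bootstrap server, and replication factor below are placeholders, not XTDB defaults:

```shell
# Single-partition topic with time- and size-based retention disabled
# (retention.ms=-1 and retention.bytes=-1 mean "keep messages forever").
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic xtdb-tx-log \
  --partitions 1 \
  --replication-factor 3 \
  --config retention.ms=-1 \
  --config retention.bytes=-1
```

A single partition is what gives the tx-log its total ordering; infinite retention is what lets a new node replay the log from offset 0.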
Can we generate the “Transaction Log” and “Index”, from the “Document Store”?
No — the tx-log is the primary source of truth: it contains the hashed content IDs that reference the documents in the doc-store (as well as the lists of transaction operations, timestamps, etc.).
Can we query against solely the “Document Store”?
XT doesn’t officially support querying the doc-store directly; you always query via a node with local indexes (using q/entity/pull etc.). There is an internal Clojure protocol/API available for advanced usage, and in the extreme case you can enumerate and inspect the contents of the underlying doc-store storage (e.g. a Postgres table or Kafka topic) directly if needed.
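A minimal sketch of the supported path — querying through a node's local indexes — using XTDB 1.x's Clojure API; the in-memory node and the `:demo-1` entity ID are purely illustrative:

```clojure
(require '[xtdb.api :as xt])

;; Start an in-memory node for the demo; a production node would be
;; configured with a Kafka tx-log and a doc-store instead.
(with-open [node (xt/start-node {})]
  (xt/submit-tx node [[::xt/put {:xt/id :demo-1 :name "example"}]])
  (xt/sync node)                       ; wait for the indexes to catch up
  (let [db (xt/db node)]
    ;; Datalog query answered from the local indexes
    (xt/q db '{:find [?e] :where [[?e :xt/id]]})
    ;; point lookup of a single document
    (xt/entity db :demo-1)
    ;; graph-style projection
    (xt/pull db '[*] :demo-1)))
```

The doc-store is only consulted indirectly here, to hydrate documents once the indexes have resolved which entities match.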
“The xtdb-kafka module currently assumes an infinite-retention, single-partition topic, so you can't truncate or migrate messages away from that topic without some redesign of the module.”
Ah ok. So that means, for a Kafka-backed XTDB, the Kafka topic must have infinite retention.
And a topic retention of, say, 1 year would mean the XTDB data model doesn’t work.