Transactor stuck with "missing docs" error

We are using XTDB 1.24.1 with RocksDB as the store for the indexes, tx-log and document store. Data ingestion is stuck with a “missing docs” error and we are unable to add any further data.

Is there a way to recover from this?

Error:

07:23:23.473 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @33622ms to org.eclipse.jetty.util.log.Slf4jLog
07:23:37.898 [main] INFO  xtdb.tx - XT tx ingester starting... Latest completed tx '11822220', latest submitted to tx-log '13536542'
07:23:37.902 [main] INFO  xtdb.tx - Secondary indices up to date, continuing...
07:23:37.985 [main] INFO  o.a.c.a.metrics.MetricsSystemLoader - No metrics implementation available on classpath. Using No-op implementation

07:23:38.536 [xtdb-tx-docs-fetch-1] ERROR xtdb.tx - Ingester error occurred
xtdb.IllegalStateException: missing docs: #{#xtdb/id "89b1b6fd07a2e5539ea8c1aa1ea5d471dac868cc" #xtdb/id "d08978b894a9c2fc17c6ca94c48dd0a3d940ae4d" #xtdb/id "0042268b1f472689f3856bf9d94a79857cf86e47" #xtdb/id "fb7e289d731d304f63e69c51e13d9028ef95061a" #xtdb/id "e2ceaac2d78ca15dee86be80a7e8f757ac2db028" #xtdb/id "5b585cc55d9ca4ec0ce721e1b6901de52cb1e783" #xtdb/id "d6152961b8e5cc4f8e5d1a72a0383d35b36de881" #xtdb/id "2cfc4e0e6208760a8e76b79dc3c9b217fcc193bf" #xtdb/id "669d13bf15bcc90ba886158367d8443853197c6d" #xtdb/id "c38e100c939f768f3fae8c356560ebfcdb45bc1a" #xtdb/id "2ee45dfa1522a40f2024a6c3ca0bb16285ebc196" #xtdb/id "26c67d13bc7090b46b6d4b63d66ccf20276a5279" #xtdb/id "2ad5235b0b521754c91dacc80ae61dd592ee3c31" #xtdb/id "4f7d18b090f88b1f9717ba07df4e855a4ad34d8d" #xtdb/id "044917de0c80fec7c950c75fce8fc345dba77abf" #xtdb/id "e881966366baa6840a5161691d5d39e050c93369" #xtdb/id "53ce86f5c03cd9b03c77673bc04b148be79150b6"}
	at xtdb.tx$fetch_docs_for_tx.invokeStatic(tx.clj:63)
	at xtdb.tx$fetch_docs_for_tx.invoke(tx.clj:54)
	at xtdb.tx$__GT_tx_ingester$fn__11440$txs_doc_fetch_fn__11466.invoke(tx.clj:682)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$apply.invoke(core.clj:662)
	at xtdb.tx$__GT_tx_ingester$apply_if_not_done__11430.invoke(tx.clj:614)
	at xtdb.tx$__GT_tx_ingester$fn__11440$submit_job_BANG___11441$fn__11442.invoke(tx.clj:649)
	at clojure.lang.AFn.call(AFn.java:18)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Hi @uvsravi, there are some blunt ways to recover from this “missing docs” halted-ingestion state: essentially, either manually truncate the log or insert empty/recovered documents, then restart the node. However, the implication here is that some kind of data loss has occurred. It is possible that the data is still present or can be recovered, so definitely take a backup if you can, and make sure any existing backups are safe.

Did any kind of system/hardware failure happen beforehand (e.g. non-safe JVM shutdown)? Has the node been restarted multiple times already? Does this error occur consistently with the exact same #xtdb/id entries listed each time?
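
To check whether the same ids come up each time, it may help to compare the error output from two restarts mechanically rather than by eye. The following is a minimal sketch in Python, assuming the log output of each restart has been saved as text and that the ids appear in the `#xtdb/id "..."` form shown in the error above. The function names `missing_doc_ids` and `compare_runs` are hypothetical helpers, not part of XTDB.

```python
import re

# Matches the 40-hex-character ids printed as #xtdb/id "..." in the error line.
XTDB_ID = re.compile(r'#xtdb/id "([0-9a-f]{40})"')


def missing_doc_ids(log_text):
    """Return the set of ids listed on any 'missing docs' line in log_text."""
    ids = set()
    for line in log_text.splitlines():
        if "missing docs:" in line:
            ids.update(XTDB_ID.findall(line))
    return ids


def compare_runs(first_log, second_log):
    """Compare the missing ids reported by two restarts (hypothetical helper)."""
    first = missing_doc_ids(first_log)
    second = missing_doc_ids(second_log)
    return {
        "same": first == second,
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }
```

For example, `compare_runs(open("restart-1.log").read(), open("restart-2.log").read())` (the file names are placeholders) reports whether both restarts list the same ids. The order of ids within the error line does not matter, because they are compared as sets.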

Perhaps most importantly, is this issue affecting a production environment or a development/testing environment? Note that the RocksDB-backed tx-log and document store are not intended for HA or strongly durable production usage.

Please also feel free to reach out directly with any other context you would like to share: jdt@juxt.pro