:where clause not working

I have a document that includes an :event/click-type attribute, yet when I include [event :event/click-type] in a query, the document is not returned. All other attributes work fine.

(let [doc-id #uuid "3c708e71-c338-4289-837f-d77b09b8d109"
      db (xt/db node)]
  [;; the entity clearly has the attribute...
   (select-keys (xt/entity db doc-id) [:xt/id
                                       :event/timestamp
                                       :event/click-type])
   ;; ...and a triple clause on :event/timestamp finds it...
   (xt/q db
         '{:find [event]
           :in [event]
           :where [[event :event/timestamp]]}
         doc-id)
   ;; ...but the same clause on :event/click-type returns nothing
   (xt/q db
         '{:find [event]
           :in [event]
           :where [[event :event/click-type]]}
         doc-id)])
; => [{:xt/id #uuid "3c708e71-c338-4289-837f-d77b09b8d109",
;      :event/timestamp #inst "2021-12-10T16:45:28.897-00:00",
;      :event/click-type :one-click-sub}
;     #{[#uuid "3c708e71-c338-4289-837f-d77b09b8d109"]}
;     #{}]

A weird bug in XT, perhaps? I don't see any bugs in my code. I've run this code on two different machines (both indexing from the same tx log in production) and gotten the same result. I'm also going to ingest a dump of the transaction log on my local machine and see if I get the same results; I'll update here once that's done. It looks like I have at least 7 documents exhibiting this behavior.

Here’s the entity history for the document above. The first transaction doesn’t include :event/click-type. The second transaction (2 seconds after the first) adds that key. So I’m guessing this is related somehow, since :event/click-type is the only key that doesn’t work in the query.

[#:xtdb.api{:tx-time #inst "2021-12-10T16:45:28.907-00:00",
            :tx-id 4266555,
            :valid-time #inst "2021-12-10T16:45:28.907-00:00",
            :content-hash
            #xtdb/id "61af549d7a41097ff8c03befe57efa83bdc8b091",
            :doc
            {:event/type :click,
             :event/timestamp #inst "2021-12-10T16:45:28.897-00:00",
             :event/email-event #uuid "cc3e4f2b-ddf7-4106-bf2b-2d4b5e7e2042",
             :event/click-url "...",
             :event/ip-addr "...",
             :xt/id #uuid "3c708e71-c338-4289-837f-d77b09b8d109"}}
 #:xtdb.api{:tx-time #inst "2021-12-10T16:45:30.992-00:00",
            :tx-id 4266566,
            :valid-time #inst "2021-12-10T16:45:30.992-00:00",
            :content-hash
            #xtdb/id "9f27b318acea8b854bd563878b0728b6901f2512",
            :doc
            {:event/type :click,
             :event/timestamp #inst "2021-12-10T16:45:28.897-00:00",
             :event/email-event #uuid "cc3e4f2b-ddf7-4106-bf2b-2d4b5e7e2042",
             :event/click-url "...",
             :event/ip-addr "...",
             :event/click-type :one-click-sub,
             :xt/id #uuid "3c708e71-c338-4289-837f-d77b09b8d109"}}]

Update: when testing this locally, it works. I didn't actually ingest the tx log exactly as returned by open-tx-log, though (I'm unable to iterate through the tx log without getting an OOM error); instead I called entity-history on all docs in the DB, dumped those as a list of operations, and ingested that locally. If that's worth exploring more, I could try to get the actual tx log and see if replaying that causes the error to manifest locally.
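
The dump logic looks roughly like this (a simplified sketch; the helper name is illustrative, and the real thing also has to collect all the entity IDs first):

(require '[xtdb.api :as xt])

;; Sketch: replay one entity's history as ::xt/put operations with
;; explicit valid times, so it can be re-submitted to a fresh node.
(defn entity-history->ops [db eid]
  (with-open [h (xt/open-entity-history db eid :asc {:with-docs? true})]
    (->> (iterator-seq h)
         (mapv (fn [{::xt/keys [doc valid-time]}]
                 [::xt/put doc valid-time])))))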

This isn’t super urgent; I can work around this particular query easily, though obviously it’s a little disconcerting and it would be nice to get to the bottom of this :slight_smile:

Well, that looks pretty weird. I haven't been able to reproduce anything obvious either :confused:

This seems like the most sensible course of action to me - and I’d be quite glad to pair on the debugging once you have something you can run locally (just DM me to coordinate).

Ah, this sounds like a bug in its own right - I will do some testing of my own (and potentially open another thread/issue…).

In the meantime, if you could confirm the XT version and storage config used, that might be helpful to us or others :pray:

Regarding the tx log iteration OOM, does running the following cause an OOM also? (or is this roughly what you were doing anyway?)

(let [fname "tx-log-export-1.ednl"]
  (with-open [wr-file (clojure.java.io/writer fname)
              l (xt/open-tx-log (xtdb-node) 0 true)]
    ;; stream each tx-log entry to disk as a line of EDN
    (doseq [e (iterator-seq l)]
      (.write wr-file (str e \newline)))))

> This seems like the most sensible course of action to me - and I'd be quite glad to pair on the debugging once you have something you can run locally (just DM me to coordinate).

That’d be great. I’ll keep you posted. Next thing I’ll try, hopefully soon (or not, I guess with the holidays!), is to download the index to my local machine and see if I can get around the OOM error there. Relevant discussion we had on Slack last month: https://clojurians.slack.com/archives/CG3AM2F7V/p1636609241148400

I mentioned in that thread that the first part of the tx log consists of a bunch of 1000-document transactions, which you said should probably be fine. Another detail I’ve remembered since then is that a lot of those documents are relatively large, since they contain email contents from newsletters (now I store the contents in object storage with a foreign key in XT). Maybe the combination of large TXes + large docs is pushing XT over the edge?

Anyway, from the Slack convo, I was able to finish indexing after enough restarts, so maybe if I set that up locally I’ll be able to get through it eventually even if I keep getting OOMs.

> Regarding the tx log iteration OOM, does running the following cause an OOM also? (or is this roughly what you were doing anyway?)

I got the OOM even from simply doing this:

(with-open [l (xt/open-tx-log (xtdb-node) nil true)]
  ;; just consume the iterator, retaining nothing
  (while (.hasNext l)
    (.next l)))

I’m on version 1.19.0, using RocksDB for the index and Postgres for the TX log and doc store:

{:xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                               :db-dir (io/file "storage/xtdb" "index")}}
 :xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                             :pool-opts {:maximumPoolSize 5}
                             :db-spec {...}}
 :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
               :connection-pool :xtdb.jdbc/connection-pool}
 :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                       :connection-pool :xtdb.jdbc/connection-pool}}

On the OOM front, I think it may be because we didn't have fetch-size configured (meaning results would be cached by the driver) - please try using open-tx-log again with the latest dev-SNAPSHOT release? The commit with the fix hasn't been pushed yet, but essentially we just add a couple of parameters to the prepare call here: xtdb/jdbc.clj at e2f51ed99fc2716faa8ad254c0b18166c937b134 · xtdb/xtdb · GitHub
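
For context, the general shape of that change is something like the sketch below (not the actual commit; db-url and the table/column names are assumptions for illustration). The Postgres driver only streams results when auto-commit is off, the result set is forward-only, and a fetch size is set:

(import '[java.sql DriverManager ResultSet])

;; Sketch only: stream rows from the tx log instead of buffering them all.
(with-open [conn (doto (DriverManager/getConnection db-url)
                   (.setAutoCommit false))           ; required for streaming on PG
            stmt (doto (.prepareStatement conn
                                          "SELECT * FROM tx_events ORDER BY event_offset"
                                          ResultSet/TYPE_FORWARD_ONLY
                                          ResultSet/CONCUR_READ_ONLY)
                   (.setFetchSize 1024))             ; rows fetched per round-trip
            rs (.executeQuery stmt)]
  (loop [n 0]
    (if (.next rs)
      (recur (inc n))
      n)))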

What are the full coordinates for the dev-SNAPSHOT release? com.xtdb/xtdb-core {:mvn/version "dev-SNAPSHOT"} didn’t work.

Ah, sorry, and thanks for pointing this out! It seems depending on SNAPSHOTs worked fine before we moved everything off of Clojars, but Maven Central prefers to keep releases & snapshots in separate repositories :sweat_smile:

To resolve, try adding this to your deps.edn:
:mvn/repos {"sonatype snapshots" {:url "https://s01.oss.sonatype.org/content/repositories/snapshots"}}

Or to your project.clj:
:repositories [["sonatype snapshots" {:url "https://s01.oss.sonatype.org/content/repositories/snapshots"}]]
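
Putting that together with the coordinates mentioned above, a full deps.edn entry might look something like this (sketch):

{:deps {com.xtdb/xtdb-core {:mvn/version "dev-SNAPSHOT"}}
 :mvn/repos {"sonatype snapshots" {:url "https://s01.oss.sonatype.org/content/repositories/snapshots"}}}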

An extreme workaround is to download the snapshot jar directly: https://s01.oss.sonatype.org/content/repositories/snapshots/com/xtdb/xtdb-jdbc/dev-SNAPSHOT/xtdb-jdbc-dev-20211221.134401-3.jar (note: you can browse for full coordinates on this site)
and use com.xtdb/xtdb-jdbc {:local/root "/path/to/jar/xtdb-jdbc-dev-20211221.134401-3.jar"} - you probably want to leave the other xtdb-* deps at 1.20.0 or whatever, mainly because xtdb-core brings in all the other modules, which would then all need to be manually downloaded and added too (though I'm probably missing some crucial POM scoping concept to avoid this)
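
For example, in deps.edn that pinning might look like this (sketch, assuming the jar was saved locally):

{:deps {com.xtdb/xtdb-core {:mvn/version "1.20.0"}
        com.xtdb/xtdb-jdbc {:local/root "/path/to/jar/xtdb-jdbc-dev-20211221.134401-3.jar"}}}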

I edited the above post to replace "sonartype snapshots" with "sonatype snapshots" to avoid future confusion.

Just ran this, and still got an OOM error a couple of minutes in.

Thanks for trying again.

I’ve created this for the OOM: Postgres open-tx-log should not OOM · Issue #1682 · xtdb/xtdb · GitHub

And this for the original :where issue: Document not being returned by certain triple clause · Issue #1683 · xtdb/xtdb · GitHub

Thanks, I’ll try to follow up on those soon!

We strongly suspect the root cause of this issue has now been resolved, and the fix will be included in the next release. It boils down to a failed match operation very subtly preventing related documents first seen in that transaction from ever being indexed properly thereafter. The issue has been closed on that basis (https://github.com/xtdb/xtdb/issues/1683), but feel free to re-open it as required :slightly_smiling_face:
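
To illustrate, the problematic transaction shape is roughly the following (IDs and docs made up for the sketch):

;; A ::xt/match that fails aborts the whole transaction; per the suspected
;; bug, a document first seen in that failed tx could then be left in a
;; state where later puts never index it properly.
(xt/submit-tx node
              [[::xt/match :other-id {:xt/id :other-id, :v 1}]    ; fails if the stored doc differs
               [::xt/put {:xt/id :new-id, :event/click-type :x}]]) ; doc first seen in this tx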

I don’t have any updates on the open-tx-log issue yet.
