[2.X] Hash map with string keys not supported?

Hi all! Long time developer, first time Clojurist/XTDB-ian.

I’ve been playing with XTDB 2.x (with an eye towards production) and ran into an issue. It appears that storing a record with a value that’s a hash-map with string keys doesn’t work. As an example:

(xt/submit-tx my-node [[:put :event {:xt/id "msg_8675309"
                                     :name "sent_message"
                                     :data {"to" "+15551234567"
                                            "segments" 1}}]])

The XTDB server (running through Docker) reports:

java.lang.IllegalArgumentException: No matching clause: :map
	at xtdb.types$eval4486$fn__4487.invoke(types.clj:143)
	at clojure.lang.MultiFn.invoke(MultiFn.java:239)
	at xtdb.types$col_type__GT_field.invokeStatic(types.clj:173)
	at xtdb.types$col_type__GT_field.invoke(types.clj:171)
	at xtdb.vector.writer$eval6088$fn$reify$reify__6117.apply(writer.clj:769)
	at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220)
	at xtdb.vector.writer$eval6088$fn$reify__6110.writerForType(writer.clj:754)
	at xtdb.vector.writer$eval6052$fn__6055.invoke(writer.clj:589)
	at xtdb.vector.writer$eval5599$fn__5611$G__5590__5618.invoke(writer.clj:34)
	at xtdb.tx_producer$__GT_put_writer$write_put_BANG___24838.invoke(tx_producer.clj:283)
	at xtdb.tx_producer$write_tx_ops_BANG_.invokeStatic(tx_producer.clj:360)
	at xtdb.tx_producer$write_tx_ops_BANG_.invoke(tx_producer.clj:347)
	at xtdb.tx_producer$serialize_tx_ops.invokeStatic(tx_producer.clj:381)
	at xtdb.tx_producer$serialize_tx_ops.invoke(tx_producer.clj:367)
	at xtdb.tx_producer.TxProducer.submitTx(tx_producer.clj:403)
	at xtdb.node.Node.submit_tx_AMPERSAND_(node.clj:103)
	at xtdb.server$eval35039$fn__35040$fn__35042.invoke(server.clj:65)
	at sieppari.interceptor$eval34217$fn__34218$fn__34219.invoke(interceptor.cljc:33)
	at sieppari.core$_try.invokeStatic(core.cljc:20)
	at sieppari.core$_try.invoke(core.cljc:17)
	at sieppari.core$enter.invokeStatic(core.cljc:62)
	at sieppari.core$enter.invoke(core.cljc:49)
	at sieppari.core$execute$fn__34423.invoke(core.cljc:125)
	at sieppari.core$execute.invokeStatic(core.cljc:123)
	at sieppari.core$execute.invoke(core.cljc:117)
	at reitit.interceptor.sieppari$reify__34432.execute(sieppari.clj:18)
	at reitit.http$ring_handler$fn__33494.invoke(http.cljc:165)
	at clojure.lang.AFn.applyToHelper(AFn.java:160)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.lang.AFunction$1.doInvoke(AFunction.java:31)
	at clojure.lang.RestFn.invoke(RestFn.java:436)
	at ring.adapter.jetty9$proxy_async_handler$fn__34962.invoke(jetty9.clj:70)
	at ring.adapter.jetty9.proxy$org.eclipse.jetty.server.handler.AbstractHandler$ff19274a.handle(Unknown Source)
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:51)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
	at org.eclipse.jetty.server.Server.handle(Server.java:562)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:406)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:663)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:398)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
	at org.eclipse.jetty.io.SocketChannelEndPoint$1.run(SocketChannelEndPoint.java:101)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:412)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:381)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:378)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
	at java.base/java.lang.Thread.run(Thread.java:833)
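
A note on the message itself: "No matching clause: ..." is the text Clojure's `case` form produces when it receives a value that matches none of its branches and has no default. The sketch below reproduces the same message with a made-up function. The name `type-tag->label` and its branches are hypothetical and are not XTDB's code; the snippet only illustrates why an unhandled `:map` tag could surface this way.

```clojure
(defn type-tag->label
  "Hypothetical dispatcher with no default clause (not XTDB code)."
  [tag]
  (case tag
    :struct "struct"
    :list   "list"))

;; Calling it with an unhandled tag throws IllegalArgumentException;
;; here the exception message is captured instead of propagating.
(def error-message
  (try
    (type-tag->label :map)
    (catch IllegalArgumentException e
      (.getMessage e))))
```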

If I switch the keys to keywords, inserting a record works:

(xt/submit-tx my-node [[:put :event {:xt/id "msg_8675309"
                                     :name "sent_message"
                                     :data {:to "+15551234567"
                                            :segements 1}}]])

The Docker invocation for the XTDB server is:

docker run -p 5432:5432 -p 3001:3000 ghcr.io/xtdb/xtdb-ea@sha256:32ad41db657406269177a3e02562a3157083cd4bdef15ca9ee5c82ae77931356

Some background color: I’m looking to store (parsed) JSON objects and want to keep the keys as their original strings.

Are string keys in a hash-map not supported, or was I lucky enough to find a bug?

Thanks!

Hey @tenpin, and welcome :wave:

Yes, string keys are currently unsupported. We’re looking to add support for them, but unfortunately it’s not as straightforward as it might seem. The tl;dr is that we need a way to tell whether the user intends a ‘struct’ (where all the keys are known ahead of time, like a Java class) or a ‘map’ (where they’re not), because Arrow requires us to store the two differently. We’re looking into ways to do this that are foolproof in Clojure and JSON, and not overly onerous on the user.
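
To make the struct/map distinction concrete: an Arrow struct has a fixed set of named fields, whereas an Arrow map is laid out as a list of key/value entries. If the original string keys must survive verbatim, one possible workaround is to store such entries explicitly. This is only a sketch, not something recommended in this thread: the helper names and the `:k`/`:v` field names are hypothetical, it handles a single level of nesting, and it assumes a vector of keyword-keyed maps is storable.

```clojure
(defn map->entries
  "Turns a string-keyed map into a vector of {:k key, :v value} entries."
  [m]
  (mapv (fn [[k v]] {:k k :v v}) m))

(defn entries->map
  "Inverse of map->entries: rebuilds the original map from its entries."
  [entries]
  (into {} (map (juxt :k :v)) entries))

;; Stand-in for a parsed JSON object with string keys.
(def payload {"to" "+15551234567" "segments" 1})

;; Every entry is a keyword-keyed map with the same two fields,
;; so the original keys are carried as ordinary string values.
(def stored (map->entries payload))
```

The trade-off is that entries stored this way have to be matched on `:k` rather than addressed as named fields.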

Also, in the underlying Arrow format, all of the keys throughout the nested structure have to be stored as strings, so to a certain extent the information about whether keys are keywords or strings will be lost in a roundtrip. At the moment we solve this by keywordising everything on the way out, but I’d like this to at least be parameterised so that users have the choice.

For now, though, I’d recommend the approach you’ve taken.
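
A minimal sketch of that approach, using `clojure.walk` from the Clojure standard library. The `->event-doc` helper and the nested `"carrier"` field are made up for illustration. Converting back with `stringify-keys` assumes the keys come back from the database unchanged, which this thread does not confirm.

```clojure
(require '[clojure.walk :as walk])

;; Stand-in for a JSON object after parsing: string keys, possibly nested.
(def parsed-json
  {"to"       "+15551234567"
   "segments" 1
   "carrier"  {"name" "example"}})

(defn ->event-doc
  "Hypothetical helper: builds an :event document whose :data has
  keyword keys at every level of nesting."
  [id event-name data]
  {:xt/id id
   :name  event-name
   :data  (walk/keywordize-keys data)})

(def doc (->event-doc "msg_8675309" "sent_message" parsed-json))

;; The document can then be submitted as earlier in the thread:
;; (xt/submit-tx my-node [[:put :event doc]])

;; Turning keyword keys back into strings after reading the data out.
(def restored (walk/stringify-keys (:data doc)))
```

Note that `stringify-keys` uses only the name part of each keyword, so namespaced keys would not round-trip exactly.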

HTH!

James


Come for the answer, stay for the mini deep-dive into the workings of XTDB!

@jarohen thank you so much for not just answering my question, but explaining the “why” behind it. Makes sense, and knowing what I know now, I feel more comfortable “keywordising” (love that term) JSON data before storage.

Orthogonally, checking my knowledge, it looks like XTDB does support keywords as values?

E.g. two separate records are created here, and I’m able to retrieve them by their :xt/id

(xt/submit-tx my-node [[:put :currency {:xt/id :USD
                                        :name "US Dollar"
                                        :symbol "$"
                                        :precision 2}]])

(xt/submit-tx my-node [[:put :currency {:xt/id "USD"
                                        :name "US Dollar"
                                        :symbol "$"
                                        :precision 2}]])

(Are there any performance penalties and/or foot guns I’m missing when it comes to using keywords as a record ID?)

FWIW, I’m wildly bullish on XTDB. Having been tasked (more than once!) with hacking bitemporality features on to various SQL databases at the application level over the years, XTDB is a breath of (very) fresh air. :100:


My pleasure :slight_smile:

> Orthogonally, checking my knowledge, it looks like XTDB does support keywords as values?

It does, yep

> Are there any performance penalties and/or foot guns I’m missing when it comes to using keywords as a record ID?

Normally we’d recommend randomised UUIDs in XT2 where possible, but in this case you’ve got a natural primary key, so this’ll be fine - all that’ll happen is that we’ll take a hash of the keyword when deciding where it goes in the primary index.
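
As a rough illustration of the two kinds of ID discussed here: a natural key where the data already has one, and a randomised UUID otherwise. `new-event-doc` is a hypothetical helper name; `java.util.UUID/randomUUID` is the standard JDK call for a random (version 4) UUID.

```clojure
(import '(java.util UUID))

(defn new-event-doc
  "Hypothetical helper: attaches a random (version 4) UUID as :xt/id."
  [fields]
  (assoc fields :xt/id (UUID/randomUUID)))

;; Natural key: the currency code itself identifies the record.
(def usd
  {:xt/id :USD
   :name "US Dollar"
   :symbol "$"
   :precision 2})

;; Surrogate key: no natural identifier, so a random UUID is generated.
(def event (new-event-doc {:name "sent_message"}))
```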

> FWIW, I’m wildly bullish on XTDB. Having been tasked (more than once!) with hacking bitemporality features on to various SQL databases at the application level over the years, XTDB is a breath of (very) fresh air. :100:

This is great to hear, thanks! Please do let us know if you have any other questions, and more than happy to talk XT internals.

Related: XT2 is still in early access, so we’re looking to work with keen early adopters, understand their use cases in more detail and elicit feedback - would this be something you’d be interested in? If so, give us a shout :slight_smile: (cc @refset)

Cheers,

James

Any particular reason for random UUIDs, vs, say, v5 UUIDs as surrogate keys? I’m curious, as I have used v5 UUIDs extensively on XTDB 1.x

@jarohen @refset I’d love to provide feedback and ask (a lot of!) questions around the best way to structure and query data with XTDB v2 as it evolves!

Please let me know what that process looks like.
(and thanks again for your help!)

Sorry, ‘random’ is a little stronger than we actually need, yes - ‘well distributed’ is the property we’re looking for, as we use these for sharding the primary index. The SHA1 hashing in v5 UUIDs (even if it’s only the local half that has a range of values) should be fine :slightly_smiling_face:
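
For reference, a version 5 UUID can be derived with nothing but the JDK: SHA-1 over the namespace UUID's 16 bytes followed by the name, keeping the first 16 bytes of the digest and then setting the version and variant bits. This is a sketch of the standard algorithm rather than anything XTDB-specific, and the function name `uuid-v5` is made up here. The namespace constant is the standard DNS namespace UUID.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.charset StandardCharsets)
        '(java.security MessageDigest)
        '(java.util UUID))

;; The standard DNS namespace UUID.
(def dns-namespace
  (UUID/fromString "6ba7b810-9dad-11d1-80b4-00c04fd430c8"))

(defn uuid-v5
  "Derives a deterministic version 5 (SHA-1, name-based) UUID."
  [^UUID ns-uuid ^String name-str]
  (let [ns-bytes (-> (ByteBuffer/allocate 16)
                     (.putLong (.getMostSignificantBits ns-uuid))
                     (.putLong (.getLeastSignificantBits ns-uuid))
                     (.array))
        digest   (doto (MessageDigest/getInstance "SHA-1")
                   (.update ^bytes ns-bytes)
                   (.update (.getBytes name-str StandardCharsets/UTF_8)))
        ;; SHA-1 yields 20 bytes; only the first 16 are used.
        buf      (ByteBuffer/wrap (.digest digest))
        ;; Clear the 4 version bits, then set them to 5.
        msb      (-> (.getLong buf)
                     (bit-and-not 0xF000)
                     (bit-or 0x5000))
        ;; Clear the top 2 variant bits, then set them to binary 10.
        lsb      (-> (.getLong buf)
                     (bit-and-not (bit-shift-left 3 62))
                     (bit-or Long/MIN_VALUE))]
    (UUID. msb lsb)))

;; The same namespace and name always produce the same UUID.
(def example-id (uuid-v5 dns-namespace "python.org"))
```

Because the output is a hash of its inputs, the IDs are deterministic but still spread across the key space, which is the ‘well distributed’ property described above.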

James

@tenpin, it might be worth checking this thread: “Upcoming query API discussion session - 2.x Datalog” (post #11, by Martin_Varela)
