XTDB v2 direct access to Arrow tables

Hi all,

I’ve been playing with the pgwire/SQL interface from Python, which returns results as sequences of rows. My input and output data are Arrow tables, and I am using XTDB to retain bitemporality across various datasets. I have the impression that I might be doing unnecessary transformations (Arrow vs. sequences of rows) when interfacing with XTDB v2.
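To make the transformation concrete, here is a minimal sketch of the row-to-column pivot that a client ends up doing on the query side. It uses only the standard library; the function name `rows_to_columns`, the column names and the sample rows are all hypothetical, and the rows are assumed to arrive as tuples from some row-oriented driver.

```python
from typing import Any, Iterable, Sequence


def rows_to_columns(
    column_names: Sequence[str], rows: Iterable[Sequence[Any]]
) -> dict[str, list[Any]]:
    """Pivot row tuples into a dict mapping column name -> list of values."""
    columns: dict[str, list[Any]] = {name: [] for name in column_names}
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} values per row, got {len(row)}"
            )
        # Every value is touched once in Python: this is the marshalling cost.
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


# Hypothetical rows, shaped like what a row-oriented cursor would return.
rows = [(1, "a", 10.5), (2, "b", 11.0)]
columns = rows_to_columns(["_id", "symbol", "price"], rows)
```

The resulting dict can be handed to `pyarrow.Table.from_pydict(columns)` or `polars.DataFrame(columns)`, but each value has already been copied once in Python by then, which is the overhead a columnar transport would avoid.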

Are there any plans (or is there existing functionality, whether SQL, XTQL or other) to send/receive Arrow-format tables to/from XTDB?

Thanks!

Hey @jr200, we already have a FlightSQL module which in theory might help with this, but it hasn’t been a priority for us to test, improve or document recently. The other possibility is adding ADBC support: see “ADBC driver for production in-process usage” (Issue #3395, xtdb/xtdb on GitHub).

Currently, though, we don’t support direct access to Arrow: everything must go through one of the (row-oriented) SQL APIs (over HTTP / pgwire). Another possibility we’ve discussed previously is adding Arrow as a supported format to the HTTP endpoint, but investing effort into a more standardised approach (FlightSQL/ADBC) is probably the better direction. Happy to hear your thoughts & ideas here though!

I don’t have a preference between ADBC and FlightSQL; my use case (loading data into polars) appears to be agnostic. The primary motivation is just to avoid any unnecessary marshalling/unmarshalling, especially as my dataset grows.

Can I add a vote for a non-Clojure-based solution? I’d be happy to have a test/play if such an interface appears…

Understood, thanks. And am I correct to assume that, for your usage, querying in a columnar format is more important than inserting in a columnar format? Just to make sure we prioritise the right thing :slightly_smiling_face:

At this stage of my project, querying is my priority. However!..

At some point, I will have a big backfill of data to do. Using a traditional temporal database (where I had to apply hacks such as manually correcting the _valid_to of records programmatically), this took around 7 days of constant backfill. It’s a one-off job, so I don’t mind if it’s a bit painful.
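For context, the kind of _valid_to correction I mean looks roughly like the sketch below. It is illustrative only and not XTDB code: the record layout (`id`, `valid_from`, `valid_to` keys) and the function name `close_valid_to` are hypothetical, and it assumes each version of an entity should end exactly where the next one starts.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def close_valid_to(records):
    """Return copies of the records with valid_to set to the next version's
    valid_from for the same id; the latest version keeps valid_to = None."""
    ordered = sorted(records, key=itemgetter("id", "valid_from"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("id")):
        versions = list(group)
        # Pair each version with its successor (None for the last one).
        for current, nxt in zip(versions, versions[1:] + [None]):
            valid_to = nxt["valid_from"] if nxt is not None else None
            result.append({**current, "valid_to": valid_to})
    return result


# Hypothetical backfill input, deliberately out of order.
records = [
    {"id": "a", "valid_from": datetime(2024, 3, 1), "price": 12},
    {"id": "a", "valid_from": datetime(2024, 1, 1), "price": 10},
    {"id": "b", "valid_from": datetime(2024, 2, 1), "price": 7},
]
fixed = close_valid_to(records)
```

Doing this per entity across a large history, and then writing the corrected rows back, is what made the backfill so slow.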

I suspect others would find bulk-write functionality useful. I haven’t yet looked at what options XTDB v2 currently has…
