I’ve been playing with the pgwire/SQL interface from Python, which returns results as sequences of rows. My input and output data are Arrow tables, and I’m using XTDB to retain bitemporality across various datasets. I have the impression I may be doing unnecessary transformations (Arrow vs. sequences of rows) when interfacing with XTDB v2.
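Roughly what I’m doing today looks like this (just a sketch - connection details and table/column names are made up; I read over pgwire with psycopg and then rebuild an Arrow table by hand):

```python
import psycopg          # pgwire client
import pyarrow as pa
import polars as pl

# Connection details and the table/column names below are invented for illustration -
# adjust to point at your node's pgwire endpoint.
with psycopg.connect("host=localhost port=5432 dbname=xtdb") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT _id, price, _valid_from FROM trades")
        cols = [d.name for d in cur.description]
        rows = cur.fetchall()            # results arrive as Python tuples, row by row

# row-oriented -> columnar: this is the round trip I'd like to avoid
table = pa.Table.from_pylist([dict(zip(cols, r)) for r in rows])
df = pl.from_arrow(table)
```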
Are there any plans (or is there existing functionality - SQL/XTQL/other) to send/receive arrow-format tables to/from XTDB?
Currently, though, we don’t support direct access to Arrow - everything has to go through one of the (row-oriented) SQL APIs (over HTTP / pgwire). Another possibility we’ve discussed previously is adding Arrow as a supported format to the HTTP endpoint, but investing effort into a more standardised approach (FlightSQL/ADBC) is probably the better direction. Happy to hear your thoughts & ideas here though!
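For a sense of the client-side experience we’d be aiming for - purely hypothetical, since we don’t expose a Flight SQL endpoint today, and the URI/port and table names below are invented - the ADBC Flight SQL driver hands results back as Arrow tables directly:

```python
# Hypothetical sketch: XTDB doesn't currently expose a Flight SQL endpoint;
# the grpc URI below is made up for illustration.
import polars as pl
from adbc_driver_flightsql import dbapi as flightsql

with flightsql.connect("grpc://localhost:9832") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT _id, price, _valid_from FROM trades")
        table = cur.fetch_arrow_table()   # Arrow end to end - no per-row marshalling

df = pl.from_arrow(table)
```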
I don’t have a preference on ADBC versus FlightSQL - my use case (loading data into Polars) appears to be agnostic. The primary motivation is just to avoid unnecessary marshalling/unmarshalling, especially as my dataset grows.
Can I add a vote for a non-Clojure-based solution? I’d be happy to test/play if such an interface appears…
Understood, thanks. And am I correct to assume that, for your usage, querying in a columnar format is more important than inserting in a columnar format? Just to make sure we prioritise the right thing
At this stage of my project, querying is my priority. However!..
At some point, I will have a big backfill of data to do. Using a traditional temporal database (where I had to apply hacks such as manually correcting the _valid_to of records programmatically), this took around 7 days of constant backfilling. It’s a one-off job, so I don’t mind if it’s a bit painful.
I suspect others would find bulk-write functionality useful - I haven’t yet looked at what options XTDB v2 currently has…
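In the meantime I’d presumably just batch row-oriented INSERTs over pgwire - something like this untested sketch (table/column names invented; I’m assuming each row needs an _id):

```python
import psycopg

# Untested sketch: batching row-oriented INSERTs over pgwire.
# Table/column names are invented; adjust connection details to your node.
rows = [(i, 100.0 + i) for i in range(10_000)]   # (_id, price) pairs

with psycopg.connect("host=localhost port=5432 dbname=xtdb", autocommit=True) as conn:
    with conn.cursor() as cur:
        for start in range(0, len(rows), 1_000):
            batch = rows[start:start + 1_000]
            cur.executemany(
                "INSERT INTO trades (_id, price) VALUES (%s, %s)",
                batch,
            )
```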