Opinions: Datalog, SQL, both, or neither?

We’re hearing from more and more people that they would make heavy use of a proper SQL engine in XT. We do have a few users of the existing xtdb-sql extension, but because it is built on Apache Calcite, it is restricted to a schema-on-write setup, which is unorthodox for a document-graph database, as can be seen in the docs. We’ve been thinking about, researching, and experimenting with a ground-up SQL engine, but we’d love to hear your thoughts on the matter.

If the XT SQL engine were powerful enough, we’d like to know how many folks would still make use of Datalog queries. Would you do OLTP-shaped queries in Datalog and OLAP-shaped queries in SQL? Would you switch completely to SQL? Or would you avoid SQL like the plague?

We know there are likely to be a few folks out there who don’t like either SQL or Datalog. If that’s you, what query/db language do you wish you had? Why?

Implementing an ANSI SQL-compatible engine sounds like a massive amount of work – are you really thinking of that scope, or would it more likely be a “useful” subset of ANSI SQL?

And would this avoid the need for any schema to be invented/overlaid on the data?

And (final question) would this potentially lead to an XTDB JDBC driver?

Unless the answers to the last two are “yes,” I think I would prefer to stick with Datalog – but I think a viable JDBC driver for XTDB would open up a lot of new opportunities :slight_smile:

The current target for this research is a “useful” subset of SQL:2011. Even that subset will still be a massive amount of work. :wink:

Assuming that mass of work is accomplished, the answers to the last two questions are “yes” and “probably.” The initial goal would be for a new XT SQL engine to avoid schema-on-write entirely (though there’s been talk of supporting CREATE VIEW for BI tools). We do hope a new SQL engine could lean on the mountains of existing SQL tooling, but it seems unlikely we could reuse existing drivers verbatim. XT is naturally “schema-on-demand”, including nested data. Add to that our upcoming temporal index, and it’s quite unlikely that existing drivers (JDBC or otherwise) would be enough.