Hey folks. I’m looking to use XTDB as the interface for a data lake, taking advantage of the bitemporal features to easily rebuild reporting tables at any point in time. I was going through the tutorial using psql and wanted to see what would happen to the information schema if I changed the type of some structs. What I noticed is that the information_schema appears to capture the schema of structs, but does not keep track of it as of a point in time. So if I once had a record with {"val": "1"} and then changed it to {"val": 1} the information schema would look something like
:union #{
:utf8 :i64
}
which does indeed tell us that the change took place, but it doesn’t have the same bitemporal functionality the rest of the db has. I want to be able to see the information schema as of before and after the change.
I imagine this extends to all other columns, and the schema in general is not tracked over time.
Is that schema history otherwise possible, or maybe I have this wrong entirely?
Hey @jgoad apologies for the delayed response - almost the entire team is away at conferences this week (either OSFF NYC / GOTO Copenhagen)!
You’re right that the schema info is currently ‘atemporal’ and therefore does not capture the schema change information in way that is accessible to users via SQL. Internally it should be possible for us to reconstruct the schema as-of a point in system-time, due to the way the files are stored immutably, however it’s not something we’ve prioritised so far. Efficiently tracking schema evolution over valid-time will likely require much more thought and engineering.
Can you say more about what you would like to achieve? Are there similar capabilities in other tools you’ve used?
You should at least be able to query and store this schema change information yourself in userspace for now. I will try to cook up an example of that soon.
Hey Jeremy! No worries at all. Seemed like a plenty fast response to me.
Can you say more about what you would like to achieve?
I’m not sure how I would implement it, but I feel like this would be a requirement if I wanted to gracefully handle schema evolution.
I guess an extremely trivial example could be to throw an alert and pause a pipeline when the schema of a payload changes in some breaking way (e.g. the type of field changes from string to an int).
Are there similar capabilities in other tools you’ve used?
No I don’t think so. I have very little experience with bitemporal databases and there are very few on offer!
That’s helpful, thanks. I feel like we could help with that scenario relatively easily. My initial thought is to fail the transaction when the schema is not what you expect, e.g. something like ASSERT EXISTS (SELECT 1 FROM information_schema.columns WHERE column_name = 'foo' AND data_type = ':utf8'); INSERT INTO docs (_id, foo) VALUES (2, 123) / Play
You could then detect the change before the data arrives in the database. Or alternatively you can have another process which takes data from some explicitly unsanitised table/column and re-transacts it elsewhere into a (more) sanitised form.