Just wanted to copy this here from the JUXT Zulip for public reference.
Just wondering, why does the Lucene module store the fields? I can see that the resolve-search-results-a-v{-wildcard} functions require the attribute and value (hence they have to be stored by Lucene), but why is this preferred over storing the document temporal hash? Is it to avoid having to reindex multiple copies of attributes on the same document that haven’t changed?
Answer from @refset:
Exactly this, yep
Undoubtedly, in some scenarios it’s going to be less than ideal to have massive strings stored many times (i.e. both in the KV indexes and in multiple ways in Lucene), but hopefully this is the better trade-off overall (…although we don’t have any hard empirical data to support that claim!)
For others’ context, this is really what we are discussing: https://github.com/xtdb/xtdb/blob/ff0895b0c3cc956940a4784afe1b477094995a88/modules/lucene/src/xtdb/lucene.clj#L240-L243
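To make the trade-off above a bit more concrete, here is a rough toy model in Python. It is not the XTDB Lucene module's code: the function names, the SHA-1 stand-in for a content hash, and the sample documents are all assumptions made up for illustration. It only counts how many index entries each approach would produce when a document gets a new version in which the large string attribute is unchanged.

```python
import hashlib


def content_hash(doc):
    """Hypothetical stand-in for a per-version document hash."""
    return hashlib.sha1(repr(sorted(doc.items())).encode()).hexdigest()


def index_by_attr_value(versions):
    """One entry per distinct (attribute, value) pair, shared across versions."""
    entries = set()
    for doc in versions:
        for attr, value in doc.items():
            if isinstance(value, str):  # only string values are indexed here
                entries.add((attr, value))
    return entries


def index_by_doc_hash(versions):
    """One entry per (document hash, attribute, value), so each version is reindexed."""
    entries = set()
    for doc in versions:
        doc_hash = content_hash(doc)
        for attr, value in doc.items():
            if isinstance(value, str):
                entries.add((doc_hash, attr, value))
    return entries


# Two versions of the same entity: only the non-string attribute changes.
versions = [
    {"id": "ivan", "bio": "a long biography string", "age": 30},
    {"id": "ivan", "bio": "a long biography string", "age": 31},
]

av_entries = index_by_attr_value(versions)
hash_entries = index_by_doc_hash(versions)
print(len(av_entries), len(hash_entries))
```

In this toy model the attribute-value approach keeps a single entry for the unchanged `bio` string across both versions, while the per-hash approach indexes it again for every new version. The cost, as noted in the answer, is that the attribute and value have to be stored so that search hits can be resolved back to entities.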