Lucene indexes and storage of fields

Just wanted to copy this here from the JUXT Zulip for public reference.

Just wondering, why does the Lucene module store the fields? I can see that the resolve-search-results-a-v{-wildcard} function requires the attribute and value (hence they have to be stored by Lucene), but why is this preferred over storing the document temporal hash? Is it to avoid having to reindex multiple copies of attributes that haven’t changed between versions of the same document?

Answer from @refset_xt

Exactly this, yep :smile: Undoubtedly, in some scenarios it’s going to be less than ideal to have massive strings stored many times (i.e. both in the KV indexes and in multiple ways in Lucene), but hopefully this is the better trade-off overall (…although we don’t have any hard empirical data to support that claim!)
For others’ context, this is really what we are discussing: xtdb/lucene.clj at ff0895b0c3cc956940a4784afe1b477094995a88 · xtdb/xtdb · GitHub
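
To make the trade-off concrete, here is a toy Python model (not XTDB's actual code, and not using Lucene at all). It assumes two hypothetical indexing strategies: `index_by_attr_value` keeps one entry per distinct attribute/value pair, while `index_by_doc_hash` keeps one entry per document-version hash and attribute. All names and the sample documents are made up for illustration.

```python
import hashlib


def content_hash(doc):
    """Stand-in for a real content hash: hashes the sorted attribute/value pairs."""
    return hashlib.sha1(repr(sorted(doc.items())).encode()).hexdigest()


def index_by_attr_value(versions):
    """One entry per distinct (attribute, value) pair, with both stored."""
    entries, writes = set(), 0
    for doc in versions:
        for a, v in doc.items():
            if (a, v) not in entries:  # unchanged pairs are already indexed
                entries.add((a, v))
                writes += 1
    return entries, writes


def index_by_doc_hash(versions):
    """One entry per (document hash, attribute): each new version is indexed in full."""
    entries, writes = set(), 0
    for doc in versions:
        h = content_hash(doc)
        for a in doc:
            if (h, a) not in entries:
                entries.add((h, a))
                writes += 1
    return entries, writes


# Three successive versions of the same (hypothetical) document.
versions = [
    {"name": "Ivan", "bio": "long biography text"},
    {"name": "Ivan", "bio": "long biography text", "city": "London"},
    {"name": "Ivanov", "bio": "long biography text", "city": "London"},
]

av_entries, av_writes = index_by_attr_value(versions)
hash_entries, hash_writes = index_by_doc_hash(versions)
print(av_writes, hash_writes)
```

In this model the unchanged `bio` string is indexed once under the attribute/value strategy but once per version under the hash strategy. The cost, as noted above, is that the value itself has to be stored in the index so that search hits can be resolved back to entities.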
