Handling potentially missing fields

Summarising what has been said elsewhere:

  • Storing explicit placeholder values (such as :missing or nil) for potentially missing fields makes queries faster when you are interested in the absence of data.
  • Empty vectors are treated as missing in queries, because they are not indexed.
  • Examples of ‘absence queries’ when fields are missing: 1, 2.
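Why explicit placeholders help can be sketched with a toy attribute-value index. This is a minimal model written in Python, assuming an index shaped as attribute → value → entity ids; it is not XTDB's implementation. `build_av_index`, `absent_by_scan`, `absent_by_placeholder` and the `MISSING` placeholder are hypothetical names. Without a placeholder, finding entities that lack an attribute means checking every entity against the index. With a placeholder, absence becomes an ordinary value lookup. The model also shows why an empty vector looks missing: it contributes no index entries.

```python
from collections import defaultdict

# Hypothetical placeholder value, mirroring the post's :missing.
MISSING = ":missing"

def build_av_index(docs):
    """Toy index (assumption): attribute -> value -> set of entity ids."""
    index = defaultdict(lambda: defaultdict(set))
    for eid, doc in docs.items():
        for attr, value in doc.items():
            # Vectors are indexed element by element, so an empty vector
            # adds no entries and the attribute looks absent to queries.
            for v in (value if isinstance(value, list) else [value]):
                index[attr][v].add(eid)
    return index

def absent_by_scan(docs, index, attr):
    # No placeholder: collect every entity that has the attribute, then
    # scan all entities for the rest. The work grows with the whole dataset.
    present = set().union(*index[attr].values())
    return {eid for eid in docs if eid not in present}

def absent_by_placeholder(index, attr):
    # Explicit placeholder: absence is just another value lookup.
    return set(index[attr].get(MISSING, set()))

implicit = {
    1: {"name": "a", "email": "a@example.com"},
    2: {"name": "b"},               # attribute omitted
    3: {"name": "c", "email": []},  # empty vector: not indexed
}
explicit = {
    1: {"name": "a", "email": "a@example.com"},
    2: {"name": "b", "email": MISSING},
    3: {"name": "c", "email": MISSING},
}

print(sorted(absent_by_scan(implicit, build_av_index(implicit), "email")))  # [2, 3]
print(sorted(absent_by_placeholder(build_av_index(explicit), "email")))     # [2, 3]
```

Both strategies find the same entities. The difference is that the placeholder version touches only the index entries for `MISSING` rather than every entity.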

Jeremy suggested using get-attr. I benchmarked various query strategies, and the results suggest that, for that particular data example, the get-attr approach performs well.
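The idea behind the get-attr approach can be modelled as a per-entity lookup with a default, rather than a negation over the whole dataset. The following is a Python sketch under an assumption about the semantics: get-attr returns an entity's values for an attribute, or a supplied default when there are none. `get_attr` and `absent_via_get_attr` are hypothetical stand-ins, not XTDB's API. It makes no claim about the benchmark numbers.

```python
# Hypothetical model of a get-attr-style lookup (assumed semantics, not
# XTDB's implementation): return the entity's values for an attribute,
# or [default] when it has none.
def get_attr(doc, attr, default=None):
    value = doc.get(attr, [])
    values = value if isinstance(value, list) else [value]
    # An absent attribute and an empty vector both yield the default.
    return values if values else [default]

def absent_via_get_attr(docs, attr):
    # A unique sentinel keeps a stored None distinct from "no value".
    missing = object()
    return {eid for eid, doc in docs.items()
            if get_attr(doc, attr, missing) == [missing]}

docs = {
    1: {"name": "a", "email": "a@example.com"},
    2: {"name": "b"},               # attribute omitted
    3: {"name": "c", "email": []},  # empty vector
}

print(sorted(absent_via_get_attr(docs, "email")))  # [2, 3]
```

Using a fresh sentinel instead of `None` as the default is a design choice here: it keeps an explicitly stored nil-like value from being mistaken for absence.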

Jeremy added more nuance:

IIRC the real performance challenges arise when joining in the presence of multiple potential absences (at least)

(Deleted the previous post in favour of this, hopefully clearer, one.)