Field Indexes

Field indexes index orderable values. They do not check for orderability themselves: all of the values added to an index must be mutually orderable, and it is up to applications to provide only such values.
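
Because the index performs no such check, an application may want to validate values before indexing them. The helper below is a hypothetical sketch (mutually_orderable is not part of zope.index). It relies only on the fact that Python 3 raises TypeError when unorderable types are compared, and since sorting does not compare every pair of values it is a heuristic rather than a proof.

```python
def mutually_orderable(values):
    """Return True if the values can be sorted together (hypothetical helper)."""
    try:
        sorted(values)
    except TypeError:
        # Python 3 raises TypeError when comparing, e.g., int with str.
        return False
    return True


print(mutually_orderable([6, 26, 94]))    # True: ints compare with each other
print(mutually_orderable([6, 'hi', 94]))  # False: int and str cannot be compared
```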

>>> from zope.index.field import FieldIndex
>>> index = FieldIndex()
>>> index.index_doc(0, 6)
>>> index.index_doc(1, 26)
>>> index.index_doc(2, 94)
>>> index.index_doc(3, 68)
>>> index.index_doc(4, 30)
>>> index.index_doc(5, 68)
>>> index.index_doc(6, 82)
>>> index.index_doc(7, 30)
>>> index.index_doc(8, 43)
>>> index.index_doc(9, 15)

Field indexes are searched with apply, which returns an instance of IFSet. Let’s write a function to display those sets portably (across CPython and PyPy).

The argument to apply is a tuple with a minimum and maximum value.

>>> def show_ifset(ifset):
...     print('IFSet(%s)' % list(ifset))
...
>>> show_ifset(index.apply((30, 70)))
IFSet([3, 4, 5, 7, 8])

A common mistake is to pass a single value. If anything other than a two-tuple is passed, a TypeError is raised:

>>> index.apply('hi')
Traceback (most recent call last):
...
TypeError: ('two-length tuple expected', 'hi')

Open-ended ranges can be provided by providing None as an end point:

>>> show_ifset(index.apply((30, None)))
IFSet([2, 3, 4, 5, 6, 7, 8])
>>> show_ifset(index.apply((None, 70)))
IFSet([0, 1, 3, 4, 5, 7, 8, 9])
>>> show_ifset(index.apply((None, None)))
IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

To do an exact value search, supply equal minimum and maximum values:

>>> show_ifset(index.apply((30, 30)))
IFSet([4, 7])
>>> show_ifset(index.apply((70, 70)))
IFSet([])
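
The query semantics shown so far (inclusive bounds, None for an open end, equal bounds for an exact match) can be sketched in plain Python. This is an illustration only: range_search and the dict it takes are assumptions made for the example, not how zope.index stores or searches its data.

```python
from bisect import bisect_left, bisect_right


def range_search(values_by_doc, query):
    """Return sorted doc ids whose value lies in the inclusive (min, max) range."""
    if not isinstance(query, tuple) or len(query) != 2:
        raise TypeError('two-length tuple expected', query)
    minimum, maximum = query
    # Sort (value, doc_id) pairs so the range can be located by bisection.
    pairs = sorted((value, doc_id) for doc_id, value in values_by_doc.items())
    keys = [value for value, _ in pairs]
    # None means "no bound on this side".
    start = 0 if minimum is None else bisect_left(keys, minimum)
    stop = len(keys) if maximum is None else bisect_right(keys, maximum)
    return sorted(doc_id for _, doc_id in pairs[start:stop])


docs = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30, 5: 68, 6: 82, 7: 30, 8: 43, 9: 15}
print(range_search(docs, (30, 70)))    # [3, 4, 5, 7, 8]
print(range_search(docs, (30, None)))  # [2, 3, 4, 5, 6, 7, 8]
print(range_search(docs, (70, 70)))    # []
```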

Field indexes support basic statistics:

>>> index.documentCount()
10
>>> index.wordCount()
8
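
The two numbers above are consistent with documentCount counting indexed documents and wordCount counting distinct values (30 and 68 each occur twice). The following sketch recomputes both from a plain dict; the variable names are chosen for the example and are not part of the index API.

```python
values_by_doc = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30,
                 5: 68, 6: 82, 7: 30, 8: 43, 9: 15}

# One entry per document id.
document_count = len(values_by_doc)
# Duplicate values collapse in a set, leaving only distinct ones.
distinct_values = len(set(values_by_doc.values()))

print(document_count, distinct_values)  # 10 8
```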

Documents can be reindexed:

>>> show_ifset(index.apply((15, 15)))
IFSet([9])
>>> index.index_doc(9, 14)
>>> show_ifset(index.apply((15, 15)))
IFSet([])
>>> show_ifset(index.apply((14, 14)))
IFSet([9])

Documents can be unindexed:

>>> index.unindex_doc(7)
>>> index.documentCount()
9
>>> index.wordCount()
8
>>> index.unindex_doc(8)
>>> index.documentCount()
8
>>> index.wordCount()
7
>>> show_ifset(index.apply((30, 70)))
IFSet([3, 4, 5])

Unindexing a document id that isn’t present is ignored:

>>> index.unindex_doc(8)
>>> index.unindex_doc(80)
>>> index.documentCount()
8
>>> index.wordCount()
7

We can also clear the index entirely:

>>> index.clear()
>>> index.documentCount()
0
>>> index.wordCount()
0
>>> show_ifset(index.apply((30, 70)))
IFSet([])

Sorting

Field indexes also implement the IIndexSort interface, which provides a method for sorting document ids by their indexed values.

>>> index.index_doc(1, 9)
>>> index.index_doc(2, 8)
>>> index.index_doc(3, 7)
>>> index.index_doc(4, 6)
>>> index.index_doc(5, 5)
>>> index.index_doc(6, 4)
>>> index.index_doc(7, 3)
>>> index.index_doc(8, 2)
>>> index.index_doc(9, 1)
>>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
[9, 7, 5, 4, 3, 2, 1]

We can also specify the reverse argument to reverse results:

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
[1, 2, 3, 4, 5, 7, 9]

And as per IIndexSort, we can limit results by specifying the limit argument:

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
[9, 7, 5]

If we pass an id that is not indexed by this index, it won’t be included in the result.

>>> list(index.sort([2, 10]))
[2]
>>> index.clear()
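
The sorting behaviour demonstrated above (ascending by value, optional reverse and limit, unknown ids skipped) can be approximated in plain Python as follows. sort_ids is a hypothetical helper written for this illustration and makes no claim about the strategy the real sort method uses.

```python
from heapq import nlargest, nsmallest


def sort_ids(values_by_doc, doc_ids, reverse=False, limit=None):
    """Order doc_ids by indexed value, skipping ids that are not indexed."""
    known = [doc_id for doc_id in doc_ids if doc_id in values_by_doc]
    key = values_by_doc.__getitem__
    if limit is None:
        return sorted(known, key=key, reverse=reverse)
    # With a limit, a heap avoids sorting the whole candidate list.
    pick = nlargest if reverse else nsmallest
    return pick(limit, known, key=key)


# Same data as above: doc 1 has value 9, doc 2 has value 8, ..., doc 9 has value 1.
values_by_doc = {doc_id: 10 - doc_id for doc_id in range(1, 10)}
print(sort_ids(values_by_doc, [4, 2, 9, 7, 3, 1, 5]))           # [9, 7, 5, 4, 3, 2, 1]
print(sort_ids(values_by_doc, [4, 2, 9, 7, 3, 1, 5], limit=3))  # [9, 7, 5]
print(sort_ids(values_by_doc, [2, 10]))                         # [2]
```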

Bugfix testing

It happened at least once that a value dropped out of the forward index while the index still contained the document, and unindexing then broke.

>>> index.index_doc(0, 6)
>>> index.index_doc(1, 26)
>>> index.index_doc(2, 94)
>>> index.index_doc(3, 68)
>>> index.index_doc(4, 30)
>>> index.index_doc(5, 68)
>>> index.index_doc(6, 82)
>>> index.index_doc(7, 30)
>>> index.index_doc(8, 43)
>>> index.index_doc(9, 15)
>>> show_ifset(index.apply((None, None)))
IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here is the damage:

>>> del index._fwd_index[68]

Unindex should succeed:

>>> index.unindex_doc(5)
>>> index.unindex_doc(3)
>>> show_ifset(index.apply((None, None)))
IFSet([0, 1, 2, 4, 6, 7, 8, 9])
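
A tolerant removal of this kind might look like the sketch below. The two-mapping layout (forward: value to doc ids, reverse: doc id to value) and the unindex function are assumptions made for illustration; they are not copied from the zope.index source.

```python
def unindex(forward, reverse, doc_id):
    """Remove doc_id from both mappings, tolerating a missing forward entry."""
    if doc_id not in reverse:
        return  # unknown ids are ignored
    value = reverse.pop(doc_id)
    doc_ids = forward.get(value)
    if doc_ids is None:
        return  # forward entry already gone (the damaged case)
    doc_ids.discard(doc_id)
    if not doc_ids:
        del forward[value]  # drop values that no document uses any more


forward = {30: {4, 7}, 68: {3, 5}}
reverse = {3: 68, 4: 30, 5: 68, 7: 30}
del forward[68]               # simulate the damage
unindex(forward, reverse, 5)  # does not raise despite the missing entry
unindex(forward, reverse, 3)
print(sorted(reverse))        # [4, 7]
```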

Optimizations

There is an optimization that ensures nothing is changed in the internal data structures if the value of the document has not changed.

To test this optimization, we patch the index instance so that any call to unindex_doc raises KeyError.

>>> def unindex_doc(doc_id):
...     raise KeyError
>>> index.unindex_doc = unindex_doc

Now we get a KeyError if we try to change the value.

>>> index.index_doc(9, 14)
Traceback (most recent call last):
...
KeyError

Leaving the value unchanged doesn’t call unindex_doc.

>>> index.index_doc(9, 15)
>>> show_ifset(index.apply((15, 15)))
IFSet([9])
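
The idea behind the optimization can be sketched with a toy class that returns early when the value is unchanged. TinyIndex and its unindex_calls counter are hypothetical and exist only for this example; the real index keeps more state than a single dict.

```python
class TinyIndex:
    """Hypothetical stand-in that only tracks doc id -> value."""

    def __init__(self):
        self.values = {}
        self.unindex_calls = 0  # counts how often the old entry was removed

    def unindex_doc(self, doc_id):
        self.unindex_calls += 1
        self.values.pop(doc_id, None)

    def index_doc(self, doc_id, value):
        if doc_id in self.values:
            if self.values[doc_id] == value:
                return  # unchanged value: leave everything untouched
            self.unindex_doc(doc_id)
        self.values[doc_id] = value


tiny = TinyIndex()
tiny.index_doc(9, 15)
tiny.index_doc(9, 15)      # same value, so unindex_doc is not called
print(tiny.unindex_calls)  # 0
tiny.index_doc(9, 14)      # new value, so the old entry is removed first
print(tiny.unindex_calls)  # 1
```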

Reference

Field index

A sorting mixin class for FieldIndex-like indexes.