Querying DecentCMS, part 1: building an index

DecentCMS’s search module provides the infrastructure to build and query search indexes, as well as a file-based implementation that is suitable for small sites. Querying in DecentCMS relies on a simple JavaScript API loosely modeled on the map/reduce pattern. The basic idea is that you first build an index, and then run queries against that index. The architecture ensures that querying can scale to very large content stores. It also enables querying to work in a unified way across heterogeneous storage mechanisms. Effectively, storage and querying are entirely separated.

It is possible to create and query a search index from code, as documentation-toc-part does, but there is also a ready-to-use content part that makes simple querying really easy: search-part.

In order to perform a search, an index must first be created. An index is defined by three things:

  • An optional id filter

    This is your first and most efficient way to filter out content items. It's a regular expression that will be tested against the id of the content items. If it tests negatively, the content item won't even be fetched from the store.

  • A mapping function

    This is a JavaScript function that maps a content item onto one or several index entries. It's the rough equivalent of a SELECT in SQL: it defines what properties will be available on the index entries, the same way that a SQL SELECT specifies what columns will be available on the rows of the result set. A big difference, however, is that the mapping function may decline to return an entry for some items, and can return more than one if necessary. In that sense, it's also a little bit of a WHERE.

  • An order function

    This is a JavaScript function that maps an index entry onto one or more values by which the index should be sorted. It is the equivalent of SQL's ORDER BY. The return type of this function can be a simple JavaScript comparable value (string, number, Date), or it can be an array of such values. If an array is returned, two entries are compared element by element, in order: the first comparison that gives a non-equal result decides the order, and the rest of the array is ignored.
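As a rough illustration of the three ingredients above, here is a minimal sketch. The item shape (`id`, `title`, `tags`) and the `docs:` id prefix are made up for the example, and `compareOrderValues` is a hypothetical helper showing how array-valued order keys could be compared; none of this is the search module's actual code.

```javascript
// Id filter: items whose id does not match are never fetched.
// The "docs:" prefix is an assumption made for this example.
const idFilter = /^docs:/;

// Mapping function (hypothetical item shape).
// Returns null to skip an item, or one entry per tag.
function map(item) {
  if (!item.title) return null; // acts a little like a WHERE
  const tags = item.tags && item.tags.length ? item.tags : [''];
  return tags.map(tag => ({ itemId: item.id, title: item.title, tag }));
}

// Order function: sort by tag first, then by title.
function orderBy(entry) {
  return [entry.tag, entry.title];
}

// Hypothetical helper: compares two order values, scalar or array.
// Arrays are compared element by element; the first difference wins.
function compareOrderValues(a, b) {
  const left = Array.isArray(a) ? a : [a];
  const right = Array.isArray(b) ? b : [b];
  const n = Math.min(left.length, right.length);
  for (let i = 0; i < n; i++) {
    if (left[i] < right[i]) return -1;
    if (left[i] > right[i]) return 1;
  }
  // Assumption: when one array is a prefix of the other, the shorter sorts first.
  return left.length - right.length;
}
```

For instance, `compareOrderValues(['api', 'Beta'], ['api', 'Gamma'])` is negative: the tags are equal, so the titles decide.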

From those three things, an index can be built by the search module by scanning each content item in the system, no matter where it is stored.
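The scan described above might look roughly like the following. This is a simplified, synchronous, in-memory sketch and not the search module's actual code: `buildIndex`, `memoryStore`, and the `getIds`/`fetch` methods are hypothetical names standing in for whatever storage mechanisms a site uses.

```javascript
// Hypothetical sketch: build a sorted index from several content stores.
// Handles scalar order values only, to keep the example short.
function buildIndex(stores, definition) {
  const { idFilter, map, orderBy } = definition;
  const entries = [];
  for (const store of stores) {
    for (const id of store.getIds()) {
      // Filtered-out items are skipped without being fetched.
      if (idFilter && !idFilter.test(id)) continue;
      const mapped = map(store.fetch(id));
      if (mapped == null) continue; // the mapping function declined
      // The mapping function may return one entry or an array of entries.
      entries.push(...[].concat(mapped));
    }
  }
  return entries.sort((a, b) => {
    const ka = orderBy(a), kb = orderBy(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Hypothetical in-memory store, used only to exercise the sketch.
function memoryStore(items) {
  return {
    getIds: () => Object.keys(items),
    fetch: id => items[id]
  };
}

const stores = [
  memoryStore({
    'docs:b': { id: 'docs:b', title: 'Beta' },
    'blog:x': { id: 'blog:x', title: 'Ignored' }
  }),
  memoryStore({
    'docs:a': { id: 'docs:a', title: 'Alpha' },
    'docs:draft': { id: 'docs:draft' }
  })
];

const index = buildIndex(stores, {
  idFilter: /^docs:/,
  map: item => item.title ? { itemId: item.id, title: item.title } : null,
  orderBy: entry => entry.title
});
// index holds the entries for docs:a and docs:b, in title order.
```

The calling code never needs to know which store an item came from, which is the point of separating storage from querying.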

The index will be pre-sorted, and pre-filtered by both the id filter and the logic in the mapping function. The index doesn't need to be rebuilt as long as the content items don't change. When a content item changes, it is possible to update the index by running the id filter and the mapping function on just this item, and by using the order function to figure out where the index entries need to change.
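To make the incremental update more concrete, here is one way it could work on an in-memory, sorted array of entries. This is a sketch under assumptions, not the module's implementation: `updateIndex` is a hypothetical name, entries are assumed to carry an `itemId` property pointing back to their source item, and order values are assumed to be scalars.

```javascript
// Hypothetical sketch: update a sorted index in place after one item changed.
function updateIndex(index, definition, item) {
  const { idFilter, map, orderBy } = definition;
  // 1. Drop the entries previously produced by this item.
  for (let i = index.length - 1; i >= 0; i--) {
    if (index[i].itemId === item.id) index.splice(i, 1);
  }
  // 2. Re-run the id filter and the mapping function on just this item.
  if (idFilter && !idFilter.test(item.id)) return index;
  const mapped = map(item);
  if (mapped == null) return index;
  // 3. Use the order function to find where each new entry belongs.
  for (const entry of [].concat(mapped)) {
    const key = orderBy(entry);
    let lo = 0, hi = index.length;
    while (lo < hi) { // binary search for the insertion point
      const mid = (lo + hi) >>> 1;
      if (orderBy(index[mid]) <= key) lo = mid + 1; else hi = mid;
    }
    index.splice(lo, 0, entry);
  }
  return index;
}

const definition = {
  idFilter: /^docs:/,
  map: item => item.title ? { itemId: item.id, title: item.title } : null,
  orderBy: entry => entry.title
};

const index = [
  { itemId: 'docs:a', title: 'Alpha' },
  { itemId: 'docs:b', title: 'Beta' },
  { itemId: 'docs:c', title: 'Gamma' }
];

// The item docs:a was renamed from "Alpha" to "Zeta".
updateIndex(index, definition, { id: 'docs:a', title: 'Zeta' });
// The index order is now: Beta, Gamma, Zeta.
```

Only the changed item goes through the filter and the mapping function; the rest of the index is left alone.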

This makes the system fast on querying, and a little slower on write operations, as all indexes in the system potentially need to be updated in such a case. The index updates can be performed asynchronously in the background, however, which mitigates the issue.

An index is built by requiring an index service from the scope, and then calling getIndex on it. An optional name can be given to the index, and it is recommended that you do so. Otherwise, a name is generated from the filter, map, and order function source code, which is harder to maintain.
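In code, this could look something like the snippet below. Treat it as a sketch: the service name `'index'` and the option names `name`, `idFilter`, `map`, and `orderBy` are assumptions on my part here, so check them against the search module's documentation. The `scope` object is a stand-in stub so that the snippet runs on its own; in a real module, the scope is provided by DecentCMS.

```javascript
// Stand-in for the DecentCMS scope, only here to make the snippet runnable.
// The stub simply echoes back the name of the requested index.
const scope = {
  require: serviceName => ({
    getIndex: options => ({ name: options.name, service: serviceName })
  })
};

// Assumed service name and option names; not confirmed by this post.
const indexService = scope.require('index');
const index = indexService.getIndex({
  // Naming the index avoids a name generated from the function sources.
  name: 'docs-by-title',
  idFilter: /^docs:/,
  map: item => item.title ? { itemId: item.id, title: item.title } : null,
  orderBy: entry => entry.title
});
```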

In the next post, I’ll show how to query the index we just built.
