What is faceted search?
Search is an important part of any business database function, whether it runs over internal databases, internal document stores, or the content of a website. The more documents are indexed, the more important it becomes to refine the search query and filter the results. This can be done by adding extra keywords to the query, by field search, which restricts the search to certain specified fields, or by faceted search.
Faceted search vs. field search
Faceted search is much more than just a field search. Field search allows you to restrict the search to specified fields, but requires you to know both all available fields in the indexed documents and all unique values per field. This often leads to trial and error, leaving the user with no results or unsatisfactory ones.
Faceted search is not just a filter: a complete set of filters is automatically derived from the indexed documents. Faceted search automatically clusters the indexed documents or search results into categories. The user can then filter and refine the search results by selecting specific values or numeric ranges for each facet field. For each facet field, the number of search results matching each distinct facet value or numeric range is shown.
As faceted search allows counting and filtering of documents matching specific facet field values, it is often used in product search to narrow down and count the search results according to specific product features like brand, manufacturer, rating, or price.
Index facets
After indexing, the index facets list all available facet fields, their unique values, and count how often a specific value occurs within all indexed documents. This provides a complete overview of all filtering options for the entire index, together with quantitative information about the potential result number for each option.
Query facets
When searching, the query facets list all available facet fields, their unique values, and count how often a specific value occurs within all indexed documents that match a given query. This provides a complete overview of all filtering options for the current result set, together with quantitative information about the potential result number for each option.
Facet filtering, counting & sorting
The facet filter specified in the query restricts the returned results to documents that match the query AND that contain, for every specified facet filter field (boolean AND), at least one of the specified values (boolean OR).
If the query or the facet filter is changed, then both the search results and the facet counts change.
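The AND-across-fields, OR-across-values rule can be illustrated with a small sketch. This is not SeekStorm's implementation: the function name matches_facet_filter and the sample documents are hypothetical, and exact, case-sensitive string comparison is an assumption made only for this example.

```python
def matches_facet_filter(doc, facet_filter):
    """True if doc has, for EVERY filter field, AT LEAST ONE of the wanted values."""
    for field, wanted in facet_filter.items():
        # A filter entry may be a single value or a list of alternatives (OR).
        wanted_values = wanted if isinstance(wanted, list) else [wanted]
        doc_value = doc.get(field)
        # A document field may hold a single value or several values.
        doc_values = doc_value if isinstance(doc_value, list) else [doc_value]
        if not any(v in wanted_values for v in doc_values):
            return False  # one failing field is enough (AND across fields)
    return True


# Hypothetical documents, used only for illustration.
docs = [
    {"language": "German", "brand": "Apple"},
    {"language": "German", "brand": "Sony"},
    {"language": "French", "brand": "Google"},
]
facet_filter = {"language": "German", "brand": ["Apple", "Google"]}
hits = [d for d in docs if matches_facet_filter(d, facet_filter)]
```

Only the first document passes: the second fails the brand condition and the third fails the language condition.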
String Facets
For example, the Language field of a document may have different values such as English, French, German, Russian, or Chinese.
Value facet counting: Each distinct value is counted across all indexed documents (index facet) or all result documents (query facet). The string facets return both which languages exist and how often they occur, within all indexed documents (index facet) or within all documents that match the query (query facet).
Facet sorting: The values within each facet field are sorted by the number of their occurrences in the documents in descending order. The number of returned values per facet field can be limited by the parameter facetvalueslength.
Search result filtering: Besides counting, we may also filter the results to return only those documents where the Language field contains specific language values, e.g. English or French.
Search result sorting: The returned search results can be sorted by the (first) value of the specified facet field of a document in ascending or descending order. If no sort field is specified, then the results are sorted by rank in descending order by default.
SeekStorm supports the stringFacet field type for Value Facet counting, filtering & sorting.
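Value facet counting, sorting, and capping can be sketched in a few lines. This is an illustration under assumptions, not the engine's code: count_value_facet and the sample documents are hypothetical, and the output shape simply mirrors the value/count pairs shown in the response examples of this page.

```python
from collections import Counter


def count_value_facet(docs, field, facet_values_length=10):
    """Count each distinct value of `field`, most frequent first, capped in length."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        # A facet field may carry one value or a list of values.
        counts.update(value if isinstance(value, list) else [value])
    return [
        {"value": v, "count": c}
        for v, c in counts.most_common(facet_values_length)
    ]


# Hypothetical documents, used only for illustration.
docs = [
    {"language": "English"},
    {"language": "German"},
    {"language": "English"},
    {"language": "French"},
    {"language": "German"},
    {"language": "English"},
]
top_languages = count_value_facet(docs, "language", facet_values_length=2)
```

Passing all indexed documents gives an index facet; passing only the documents that match a query gives a query facet.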
Numerical Range Facets
In contrast to string facets, which are defined by the existing distinct values, for range facets we have to explicitly define the ranges we want to distinguish and count. For the Price field, for example, we may define the price ranges 0..10 USD, 10..20 USD, 20..50 USD, 50..100 USD, and 100..1000 USD. The ranges may be defined differently for each query.
Range facet counting: Across all result documents (query facet), it is counted how often a price value falls within each of the defined price ranges.
Search result filtering: Besides counting, we may also filter the results to return only those documents where the Price field value is within a given range, e.g. 15..25 USD. The filter range is independent of the range facets defined for counting.
Search result sorting: The returned search results can be sorted by the value of the specified integerFacet, floatFacet, or dateFacet field of a document in ascending or descending order. If no sort field is specified, then the results are sorted by rank in descending order by default.
SeekStorm supports integerFacet, floatFacet, and dateFacet (Unix Time) field types for Range Facet counting, filtering & sorting.
Performance
Faceted search is known to have a heavy impact on search performance. For SeekStorm, performance and scaling are always paramount, and faceted search is no exception. We went the extra mile to optimize our index architecture so that faceting has almost no impact on search performance, no matter how big your index is or how many facet fields and unique facet values per field your indexed documents and your query facet filter contain.
Relevant API endpoints
Create index
Facet fields allow counting, filtering, and sorting of indexed documents matching specific facet field values or numeric ranges.
Create index request object: Facet fields are defined by setting the field type to one of the following types: stringFacet, integerFacet, floatFacet, dateFacet:
"fields": { "language":{"store":true,"type":"stringFacet"}, "category":{"store":true,"type":"stringFacet"}, "author":{"store":true,"type":"stringFacet"}, "price":{"store":true,"type":"floatFacet"} }
The number of facet fields (type stringFacet, integerFacet, floatFacet, dateFacet) is limited to 10 per index. The number of index fields (type Title, Url, Text, Index) together is limited to an additional 16. Of these, the number of fields of type Title, Url, and Text is limited to 1 each. The number of fields of NoIndex type is unlimited. NoIndex fields can be of any JSON type (number, string, boolean). They are not indexed and not searchable, but with the property store=true they are stored and can be retrieved as a payload in the search results.
The string length of any field name is limited to 100 characters.
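Before sending a create index request, a client could check a field definition against the facet limits stated above. The following is a client-side sketch only: check_facet_schema and the constant names are hypothetical, and how the server itself reacts to a definition that exceeds the limits is not described here.

```python
FACET_TYPES = {"stringFacet", "integerFacet", "floatFacet", "dateFacet"}
MAX_FACET_FIELDS = 10        # limit stated in the text above
MAX_FIELD_NAME_LENGTH = 100  # limit stated in the text above


def check_facet_schema(fields):
    """Return a list of problems; an empty list means the limits are respected."""
    problems = []
    facet_fields = [
        name for name, spec in fields.items() if spec.get("type") in FACET_TYPES
    ]
    if len(facet_fields) > MAX_FACET_FIELDS:
        problems.append(
            f"{len(facet_fields)} facet fields defined, limit is {MAX_FACET_FIELDS}"
        )
    for name in fields:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            problems.append(f"field name longer than {MAX_FIELD_NAME_LENGTH} characters")
    return problems


# The field definition from the example above.
fields = {
    "language": {"store": True, "type": "stringFacet"},
    "category": {"store": True, "type": "stringFacet"},
    "author": {"store": True, "type": "stringFacet"},
    "price": {"store": True, "type": "floatFacet"},
}
problems = check_facet_schema(fields)
```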
Index document(s)
While the type of the facet fields (stringFacet, integerFacet, floatFacet, dateFacet) is defined in create index, the value of those fields is defined and indexed with index document.
Index document request object: Each facet field can be assigned a single or multiple string values or a single numerical value per document:
{ "language":"English", "category":"Politics", "author":["Bob Woodward","Carl Bernstein"], "price":9.90 }
In addition to the facet functionality (counting, filtering, sorting), the facet value is also indexed for full-text search.
The number of facet values per document, across all string facet fields, is limited to 10; excess values are ignored.
The number of distinct values per stringFacet field, across the whole index, is limited to 65,535.
The string length of a stringFacet value is limited to 100 characters.
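The per-document limits can likewise be checked on the client before indexing. The sketch below only reports what it finds. It does not drop anything, because which of the excess values are ignored is not specified here. The name string_facet_value_stats and the shape of its result are hypothetical.

```python
MAX_VALUES_PER_DOC = 10  # across all string facet fields, as stated above
MAX_VALUE_LENGTH = 100   # per stringFacet value, as stated above


def string_facet_value_stats(doc, string_facet_fields):
    """Summarize the string facet values of one document against the limits."""
    values = []
    for field in string_facet_fields:
        value = doc.get(field)
        if value is None:
            continue
        # A string facet field may hold a single value or a list of values.
        values.extend(value if isinstance(value, list) else [value])
    return {
        "total_values": len(values),
        "exceeds_value_limit": len(values) > MAX_VALUES_PER_DOC,
        "too_long": [v for v in values if len(v) > MAX_VALUE_LENGTH],
    }


# The document from the example above; price is numeric, so it is not listed.
doc = {
    "language": "English",
    "category": "Politics",
    "author": ["Bob Woodward", "Carl Bernstein"],
    "price": 9.90,
}
stats = string_facet_value_stats(doc, ["language", "category", "author"])
```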
Get index
Facet values length: Maximum number of returned values per facet field, if there are too many distinct values.
The facetvalueslength parameter allows limiting the number of returned distinct values per facet field (default=10). With facetvalueslength=0 no facets are returned at all. The values are sorted by how often they occur within all indexed documents, in descending order.
&facetvalueslength=5
Facet values filter: If there are many distinct values per facet field, we might want to filter the returned values to those matching a given prefix.
&facetvaluesfilter={"language":"ger","brand":"a"}
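The effect of the facet values filter can be pictured as a prefix match applied to each facet field's value list. This sketch assumes case-insensitive matching, which the example prefix "ger" suggests but which is not stated explicitly. filter_facet_values is a hypothetical helper, not an API function.

```python
def filter_facet_values(facets, facet_values_filter):
    """Keep only the facet values that start with the prefix given for their field."""
    result = {}
    for field, entries in facets.items():
        # Fields without a prefix in the filter keep all of their values.
        prefix = facet_values_filter.get(field, "").lower()
        result[field] = [
            e for e in entries if e["value"].lower().startswith(prefix)
        ]
    return result


# Hypothetical facet data, used only for illustration.
facets = {
    "language": [{"value": "English", "count": 4}, {"value": "German", "count": 3}],
    "category": [{"value": "Economy", "count": 4}],
}
filtered = filter_facet_values(facets, {"language": "ger"})
```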
Index facets response object: Index facets are returned in the response body. For all facet fields, it is counted how often each distinct value occurs within all indexed documents. Within each facet field, the values are sorted by their occurrence count within all indexed documents in descending order.
With get index only string facets are returned. Range facets for numerical values are returned with query documents, where the ranges can be defined dynamically with different boundaries for each query. This allows defining facet ranges like last hour, last day, last week, last month.
"facets":{ "language":[{"value":"English","count":4},{"value":"German","count":3}], "category":[{"value":"Economy","count":4},{"value":"Politics","count":2}] }
Query documents
Query: The query defines which of the indexed documents are returned as search results.
query=test
Facet values length: Maximum number of returned values per facet field, if there are too many distinct values.
The facetvalueslength parameter allows limiting the number of returned distinct values per facet field (default=10). With facetvalueslength=0 no query facets are returned, but facet filtering is still active. With facetvalueslength=0 and no facet filter defined the query faceting is completely disabled, resulting in slightly better query performance. The values are sorted by the frequency of the appearance of the value within the indexed documents matching the query in descending order.
&facetvalueslength=5
Facet values filter: If there are many distinct values per facet field, we might want to filter the returned values to those matching a given prefix.
&facetvaluesfilter={"language":"ger","brand":"a"}
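Parameter values such as the facet values filter contain JSON, so they need percent-encoding when placed in a URL. The sketch below uses only the Python standard library and builds the query string alone. The helper name build_query_string is hypothetical, and the endpoint URL is deliberately left out because it is not given here.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_query_string(query, facet_values_length, facet_values_filter):
    """Serialize the JSON-valued parameter and percent-encode everything."""
    params = {
        "query": query,
        "facetvalueslength": facet_values_length,
        # Compact JSON without spaces, as in the examples above.
        "facetvaluesfilter": json.dumps(facet_values_filter, separators=(",", ":")),
    }
    return urlencode(params)


query_string = build_query_string("test", 5, {"language": "ger", "brand": "a"})

# Round trip: decoding the query string yields the original values again.
decoded = parse_qs(query_string)
```

The same pattern would apply to any other JSON-valued parameter.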
Facet filter: Search results are filtered to documents matching specific values in the facet fields.
The filter parameter restricts the returned results to documents that match the query AND that contain, for every stated facet filter field (boolean AND), at least one of the stated values (boolean OR).
If the query or the facet filter is changed, then both the search results and the facet counts change.
&filter={"language":"german","brand":["apple","google"]}
&filter={"price":[3000,5000]} //range filter: 3000..5000, both boundaries belong to range
&filter={"price":[0,3000]} //range filter: 0..3000
&filter={"price":3000} //range filter: >= 3000
&filter={"price":[3000,null]} //range filter: >= 3000
&filter={"price":[null,3000]} //range filter: <= 3000
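The range filter forms above can be read as follows: a two-element list gives an inclusive lower and upper boundary, null leaves one side open, and a single number is a lower boundary only. A minimal sketch of that reading, with in_range_filter as a hypothetical name and JSON null represented by Python's None:

```python
def in_range_filter(value, range_filter):
    """True if value lies inside the range; both boundaries belong to the range."""
    if isinstance(range_filter, list):
        lower, upper = range_filter
    else:
        # A single number is read as a lower boundary only (>= value).
        lower, upper = range_filter, None
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


# Hypothetical prices, used only for illustration.
prices = [2999, 3000, 4200, 5000, 5001]
between = [p for p in prices if in_range_filter(p, [3000, 5000])]
at_most = [p for p in prices if in_range_filter(p, [None, 3000])]
at_least = [p for p in prices if in_range_filter(p, 3000)]
```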
Sort field and order: Search results are sorted by the specified facet field, either in ascending or descending order.
If no sort field is specified, then the search results are sorted by rank in descending order by default.
If multiple sort fields are specified, then the results are ordered by the first field, and then by the second field.
&sort={"price":"asc","language":"asc"}
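Ordering by a first field and then by a second one can be sketched with repeated stable sorts. This illustrates the ordering rule only and is not the engine's implementation: sort_results is a hypothetical name, the sample results are invented, and each field is assumed to hold a single value.

```python
def sort_results(results, sort_spec):
    """Order by the first field in sort_spec, breaking ties with the later fields."""
    ordered = list(results)
    # Python's sort is stable, so sorting by the LAST field first and the
    # FIRST field last leaves the first field as the dominant sort key.
    for field, order in reversed(list(sort_spec.items())):
        ordered.sort(key=lambda doc: doc[field], reverse=(order == "desc"))
    return ordered


# Hypothetical search results, used only for illustration.
results = [
    {"price": 20, "language": "German"},
    {"price": 10, "language": "French"},
    {"price": 10, "language": "English"},
]
by_price_then_language = sort_results(results, {"price": "asc", "language": "asc"})
```

Here the two results priced at 10 come first, ordered English before French, followed by the result priced at 20.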
Numeric range facet definition: For range facets, we have to explicitly define the ranges we want to distinguish and count.
The ranges can be defined arbitrarily. Different bucket widths are allowed within the same range facet field. The lower boundary is defined explicitly (including the number itself). The upper boundary is defined implicitly by the following boundary (excluding the number itself) or the maximum value allowed for the numeric type.
The ranges may be defined differently for each query. This allows defining facet ranges like last hour=current date - 1 hour, last day=current date - 24 hours, last week=current date - 168 hours, last month=current date - 720 hours (30 days). Because these boundaries are computed relative to the current date at query time, the absolute dates covered by a range such as last hour change from query to query.
&ranges={"price":[{"0 - 10":0.0},{"10 - 20":10.0},{"20 - 50":20.0},{"50 - 100":50.0}]}
The number of distinct ranges per range facet field is limited to 10; excess ranges are ignored.
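The boundary rule (lower boundary inclusive, upper boundary given by the next lower boundary, exclusive) maps naturally onto a binary search over the lower boundaries. The sketch below assumes the ranges are listed in ascending order. count_range_facet and the returned label-to-count dictionary are hypothetical and do not represent the API's response format.

```python
from bisect import bisect_right


def count_range_facet(values, ranges):
    """Count how many values fall into each bucket defined by its lower boundary."""
    labels = [next(iter(r)) for r in ranges]          # e.g. "0 - 10"
    lowers = [next(iter(r.values())) for r in ranges] # e.g. 0.0
    counts = [0] * len(ranges)
    for v in values:
        # Index of the last lower boundary that is <= v; -1 if v is below all.
        i = bisect_right(lowers, v) - 1
        if i >= 0:
            counts[i] += 1
    return dict(zip(labels, counts))


# The ranges from the example above, with hypothetical prices.
ranges = [{"0 - 10": 0.0}, {"10 - 20": 10.0}, {"20 - 50": 20.0}, {"50 - 100": 50.0}]
prices = [5.0, 9.99, 10.0, 19.5, 20.0, 75.0, 250.0, -1.0]
range_counts = count_range_facet(prices, ranges)
```

Note that under the stated rule the last bucket has no following boundary, so the value 250.0 is counted in the bucket labeled "50 - 100"; adding a further range with a lower boundary of 100.0 would separate such values.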
Query facets response object: Query facets with counts for all distinct values are returned in the response body.
For all facet fields, it is counted how often each distinct value occurs within all indexed documents matching the query. If the query is not empty, then sorted/capped query facets are returned. If the query is empty, then sorted/capped index facets are returned (like with get index).
Within each facet field, the values are sorted by their occurrence count within all indexed documents matching the query in descending order. Range & value facet counting depends both on the query and on the applied range & string facet filtering.
"facets":{ "language":[{"value":"German","count":4},{"value":"Russian","count":3}], "category":[{"value":"economy","count":4},{"value":"science","count":2}], "author":[{"value":"Jules Verne","count":4},{"value":"Stanislaw Lem","count":3}], "brand":[{"value":"Coca Cola","count":3},{"value":"Pepsi","count":3}]}