Wolf Garbe, CEO and co-founder of SeekStorm

Document iterator and search with empty query string


SeekStorm introduces a new Document Iterator API and support for searching with empty query strings.

Document iterator

The Document iterator API lets you iterate over all document IDs (and optionally the documents themselves) in the entire index, forward or backward. It enables efficient sequential access to every document, even in very large indexes, without running a search.

The iterator guarantees that only valid document IDs are returned, even though document IDs are not strictly continuous. You can also fetch IDs in batches, reducing round trips and significantly improving performance, especially when using the REST API.

In SeekStorm, document IDs become continuous over time (“eventually continuous”). In a multi-sharded index, each shard maintains its own document ID space. Because documents are distributed across shards in a non-deterministic, load-dependent way, shard-local document IDs advance at different rates. When these are mapped to global document IDs, temporary gaps can appear.

As a result, simply iterating from 0 to the total document count may encounter invalid IDs near the end and miss valid IDs beyond it. The new Document Iterator API abstracts this complexity and reliably returns only valid document IDs.
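
To make the gap problem concrete, here is a minimal Python sketch. It is a toy model, not SeekStorm code: the interleaved mapping from shard-local to global IDs (`local_id * shard_count + shard`) and the function names `global_ids` and `iterate_valid_ids` are assumptions chosen only to reproduce the effect described above.

```python
from typing import Iterator, List


def global_ids(shard_doc_counts: List[int]) -> List[int]:
    """Map shard-local IDs to global IDs by interleaving.

    ASSUMPTION: this mapping is hypothetical and only illustrates how
    shards that fill at different rates leave temporary gaps.
    """
    shard_count = len(shard_doc_counts)
    ids = [
        local_id * shard_count + shard
        for shard, count in enumerate(shard_doc_counts)
        for local_id in range(count)
    ]
    return sorted(ids)


def iterate_valid_ids(valid: List[int], reverse: bool = False) -> Iterator[int]:
    """Yield only valid global IDs, forward or backward (hypothetical helper)."""
    yield from (reversed(valid) if reverse else valid)


# Three shards that have received 4, 2, and 3 documents so far.
valid = global_ids([4, 2, 3])
total = len(valid)

# Naive approach: assume IDs 0..total-1 are all valid.
valid_set = set(valid)
invalid_in_naive = [i for i in range(total) if i not in valid_set]
missed_by_naive = [i for i in valid if i >= total]

print(valid)             # valid global IDs, with a gap near the end
print(invalid_in_naive)  # IDs the naive loop would request although they do not exist
print(missed_by_naive)   # valid IDs the naive loop would never reach
```

In this model the naive loop both requests an ID that does not exist and never reaches a valid one, which is the situation an iterator that knows about the gaps avoids.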

Iterator parameters:

  • include_deleted: also returns the IDs of deleted documents.
  • include_document: returns the documents along with their document IDs.
  • fields: specifies which document fields to return.
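
The following Python sketch shows how these parameters could interact during a batched export. It runs against a toy in-memory index and does not reflect the actual SeekStorm signature: `INDEX`, `iterate_batch`, `export_all`, `after_id`, and `batch_size` are hypothetical names, and only `include_deleted`, `include_document`, and `fields` come from the list above.

```python
from typing import Any, Dict, List, Optional, Tuple

# Toy index: global document ID -> (deleted flag, document).
# ID 3 is missing on purpose to model a gap in the ID space.
INDEX: Dict[int, Tuple[bool, Dict[str, Any]]] = {
    0: (False, {"title": "a", "body": "alpha"}),
    1: (True, {"title": "b", "body": "beta"}),
    2: (False, {"title": "c", "body": "gamma"}),
    4: (False, {"title": "d", "body": "delta"}),
}


def iterate_batch(
    after_id: Optional[int],
    batch_size: int,
    include_deleted: bool = False,
    include_document: bool = False,
    fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Return up to batch_size entries with IDs greater than after_id (hypothetical)."""
    results: List[Dict[str, Any]] = []
    for doc_id in sorted(INDEX):
        if after_id is not None and doc_id <= after_id:
            continue
        deleted, doc = INDEX[doc_id]
        if deleted and not include_deleted:
            continue
        entry: Dict[str, Any] = {"doc_id": doc_id}
        if include_document:
            # Keep all fields when no field list is given.
            entry["doc"] = {
                key: value
                for key, value in doc.items()
                if fields is None or key in fields
            }
        results.append(entry)
        if len(results) == batch_size:
            break
    return results


def export_all(batch_size: int = 2, **options: Any) -> List[Dict[str, Any]]:
    """Fetch batch after batch until the iterator returns nothing."""
    exported: List[Dict[str, Any]] = []
    after_id: Optional[int] = None
    while True:
        batch = iterate_batch(after_id, batch_size, **options)
        if not batch:
            return exported
        exported.extend(batch)
        after_id = batch[-1]["doc_id"]


ids_only = [entry["doc_id"] for entry in export_all()]
with_deleted = [entry["doc_id"] for entry in export_all(include_deleted=True)]
titles = export_all(include_document=True, fields=["title"])
```

Each call to `iterate_batch` stands in for one round trip, so a larger `batch_size` means fewer calls for the same export.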

Typical use cases include index export, conversion, analytics, audits, and inspection.

Search with empty query

Added search features:

  • enable_empty_query: iterates across all indexed documents
    • query parameters supported for empty queries: offset, length, query_facets, facet_filter, result_sort
  • Two special fields: `_score` and `_id`
    • sort ascending or descending (i.e. by relevance or modification date)
    • available for searches with and without a query
    • special fields are automatically injected into result documents
    • If result_sort is used with any field other than the special field _id, or if query_facets or facet_filter are used, a min-heap consumes RAM proportional to offset + length.
      By default, or if the sort field _id is specified, the consumed RAM is proportional to length.
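
The difference in memory use can be illustrated with a small Python sketch. It shows the general bounded min-heap technique for paging through sorted results, not SeekStorm's internal implementation. The function names and the sample data are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def page_sorted_by_field(
    docs: Iterable[Tuple[int, int]], offset: int, length: int
) -> Tuple[List[int], int]:
    """Return one page of doc IDs sorted descending by value, plus peak heap size.

    The heap must hold offset + length entries, because the skipped
    results are only known once the best offset + length are ranked.
    """
    k = offset + length
    heap: List[Tuple[int, int]] = []  # min-heap: weakest kept entry at heap[0]
    peak = 0
    for doc_id, value in docs:
        if len(heap) < k:
            heapq.heappush(heap, (value, doc_id))
        elif (value, doc_id) > heap[0]:
            heapq.heapreplace(heap, (value, doc_id))
        peak = max(peak, len(heap))
    ranked = sorted(heap, reverse=True)
    return [doc_id for _, doc_id in ranked[offset:offset + length]], peak


def page_sorted_by_id(
    doc_ids: Iterable[int], offset: int, length: int
) -> Tuple[List[int], int]:
    """IDs already arrive in order: skip offset entries, buffer only length."""
    page: List[int] = []
    for position, doc_id in enumerate(doc_ids):
        if position < offset:
            continue
        page.append(doc_id)
        if len(page) == length:
            break
    return page, len(page)


# Ten documents with distinct sort values (hypothetical data).
docs = [(doc_id, (doc_id * 7) % 10) for doc_id in range(10)]

field_page, field_buffer = page_sorted_by_field(docs, offset=4, length=2)
id_page, id_buffer = page_sorted_by_id(range(10), offset=4, length=2)

print(field_page, field_buffer)  # page sorted by value; buffer grows to offset + length
print(id_page, id_buffer)        # page in ID order; buffer never exceeds length
```

With a deep offset, the first variant keeps every candidate up to the requested page in memory, while the ID-ordered variant only buffers the page itself.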

Typical use cases include facet filtering and sorting of all documents in the index.
