Wolf Garbe, CEO and co-founder of SeekStorm

🔥 SeekStorm 3.0 adds vector search & hybrid search

SeekStorm uses two separate, first-class, native index architectures, under one roof.

  • Lexical search: sharded and leveled inverted index.
  • Vector search: sharded and leveled IVF index for ANN or exhaustive search.
  • Shared document store, shared document ID space.
  • Both first-class engines are integrated at the query planner level.
  • Query planner with QueryModes (Lexical, Vector, Hybrid…) and FusionTypes (RRF, …).

Vector Features

  • Multi-Vector indexing: both from multiple fields and from multiple chunks per field.
  • Integrated inference from any text document field, or import of external embeddings.
  • Variable dimensions and precisions: f32, i8.
  • Scalar Quantization (SQ).
  • Multiple similarity measures: Cosine similarity, Dot product, Euclidean distance.
  • Chunking that respects sentence boundaries and Unicode segmentation for multilingual text.
  • K-medoid clustering with actual data points as centers.
  • Field filters are active during vector search, not just as a post-search filtering step.
  • True real-time search.
  • Disk-based, billion-scale vector search.
  • Sub-millisecond search for 1 million vectors on a laptop, CPU only (SIFT1M):
    • recall@10=95%, 0.2 ms
    • recall@10=99%, 0.3 ms
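
To make the f32 and i8 precisions listed above concrete, here is a minimal sketch of symmetric scalar quantization. It is an illustration under assumptions, not SeekStorm's actual scheme: the function names `quantize_i8`, `dequantize_i8` and `approx_dot`, and the per-vector max-abs scale, are hypothetical choices made for this example.

```python
from typing import List, Tuple


def quantize_i8(vec: List[float]) -> Tuple[List[int], float]:
    """Map each float component to an integer code in [-127, 127].

    Assumption: one scale per vector, derived from the largest absolute value.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_i8(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate float values from the integer codes."""
    return [c * scale for c in codes]


def approx_dot(codes_a: List[int], scale_a: float,
               codes_b: List[int], scale_b: float) -> float:
    """Dot product computed on integer codes, rescaled once at the end."""
    return sum(x * y for x, y in zip(codes_a, codes_b)) * scale_a * scale_b


if __name__ == "__main__":
    v = [0.5, -1.0, 0.25]
    codes, scale = quantize_i8(v)
    print(codes, scale)
    print(dequantize_i8(codes, scale))
```

Each stored component shrinks from four bytes to one, at the cost of a reconstruction error of at most half a quantization step per component.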

Server info & vector search results

  • Internally, SeekStorm uses two separate, first-class, native index architectures for vector search and keyword search: two native cores, not a retrofitted add-on layer.
  • SeekStorm doesn't try to make one index do everything. It runs two native search engines and lets the query planner decide how to combine them.
  • Two native index architectures under one roof:
    • Lexical search: an inverted index optimized for lexical relevance,
    • Vector search: an ANN index optimized for vector similarity.
  • Both are first-class engines, integrated at the query planner level.
    • Query planner with dedicated QueryModes and FusionTypes
    • The query planner mode can be selected automatically or manually.
    • The active QueryMode is returned with the results for explainability, relatability and credibility.
  • Separate internal indexes, storage layouts, indexing, search, scoring and top-k candidates, combined by a unified query planner and result fusion (Reciprocal Rank Fusion, RRF).
  • The user is fully shielded from this complexity, as if there were only a single index.
  • Enables pure lexical, pure vector or hybrid search (exhaustive, not only re-ranking of preliminary candidates).


                        ┌────────────────────┐
                        │     User / API     │
                        │   (hybrid query)   │
                        └─────────┬──────────┘
                                  │
                                  ▼
                        ┌────────────────────┐
                        │    Query Planner   │
                        │ (intent + strategy)│
                        └───────┬───────┬────┘
                                │       │
                 ┌──────────────┘       └──────────────┐
                 ▼                                     ▼
        ┌────────────────────┐            ┌────────────────────┐
        │ Lexical Engine     │            │ Vector Engine      │
        │ Inverted Index     │            │ Native ANN Index   │
        │ (BM25 / Boolean)   │            │ (Leveled-IVF)      │
        └─────────┬──────────┘            └─────────┬──────────┘
                  │                                 │
                  ▼                                 ▼
          Ranked Results L                    Ranked Results V
                  │                                 │
                  └───────┬───────────────┬─────────┘
                          ▼               ▼
                    ┌────────────────────────────┐
                    │       Result Fusion        │
                    │ (RRF / rerank strategies)  │
                    │                            │
                    └────────────┬───────────────┘
                                 ▼
                        Final Ranked Results
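
The result fusion step can be illustrated with a small Reciprocal Rank Fusion sketch. This is a generic textbook version, not SeekStorm's code: the function name `rrf_fuse` is hypothetical, and the constant `k=60` is the value commonly used in the RRF literature, assumed here because the post does not state which constant SeekStorm uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[int]],
             k: int = 60,
             top_k: int = 10) -> List[Tuple[int, float]]:
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Only ranks are used, so BM25 scores and vector
    similarities never need to be normalized against each other.
    """
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


if __name__ == "__main__":
    lexical_results = [1, 2, 3]  # "Ranked Results L" as document IDs
    vector_results = [3, 1, 4]   # "Ranked Results V" as document IDs
    for doc_id, score in rrf_fuse([lexical_results, vector_results]):
        print(doc_id, round(score, 5))
```

Documents that rank well in both lists rise to the top, while documents found by only one engine still appear in the fused list. The shared document ID space mentioned earlier is what makes this merge straightforward.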

Leveled IVF index (vector)

  • Disk-based, Leveled IVF index for unlimited index size.
  • Sharded index for lock-free utilization of all processor cores.
  • Capable of true real-time indexing and search.
  • Approximate Nearest Neighbor Search (ANNS) and exhaustive k-nearest neighbor search (kNN).
  • K-Medoid clustering: PAM (Partitioning Around Medoids) with actual data points as centers.
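
The interplay of medoid clustering, `nprobe` and exhaustive search can be sketched in a few lines. This is a toy, single-level, in-memory illustration and not SeekStorm's implementation: all function names are hypothetical, and the clustering uses a simplified alternating k-medoids update rather than the full PAM swap procedure.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data: List[Vector], medoids: List[int]) -> Dict[int, List[int]]:
    """Put every vector ID into the cluster of its nearest medoid."""
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for i, v in enumerate(data):
        nearest = min(medoids, key=lambda m: sq_dist(v, data[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(data: List[Vector], k: int, iters: int = 10,
              seed: int = 0) -> Dict[int, List[int]]:
    """Simplified k-medoids: centers are always actual data points."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(data)), k)
    for _ in range(iters):
        new_medoids = []
        for m, members in assign(data, medoids).items():
            if not members:  # keep the old center of an empty cluster
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda c: sum(sq_dist(data[c], data[j]) for j in members)))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return assign(data, medoids)


def search(data: List[Vector], clusters: Dict[int, List[int]], query: Vector,
           k: int = 10, nprobe: int = 1) -> List[int]:
    """Scan only the nprobe clusters whose medoids are closest to the query."""
    probe = sorted(clusters, key=lambda m: sq_dist(query, data[m]))[:nprobe]
    candidates = [i for m in probe for i in clusters[m]]
    candidates.sort(key=lambda i: sq_dist(query, data[i]))
    return candidates[:k]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    index = k_medoids(points, k=2)
    print(search(points, index, (0.1, 0.2), k=3, nprobe=1))
    print(search(points, index, (0.1, 0.2), k=3, nprobe=len(index)))
```

With `nprobe` equal to the number of clusters, every vector is scanned and the search becomes exhaustive kNN. Smaller values trade recall for speed, which is the trade-off the benchmark figures below quantify.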

Benchmark

  • 1 million vectors, 128 dimensions, f32 precision
  • nprobe=16 -> recall@10=95%, average latency=188 microseconds
  • nprobe=33 -> recall@10=99%, average latency=302 microseconds
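
For readers unfamiliar with the metric, recall@10 is the fraction of the true 10 nearest neighbors that the approximate search returns, averaged over all queries. A minimal sketch of that computation follows; the function names are hypothetical and this is not the benchmark code referenced below.

```python
from typing import List, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int],
                k: int = 10) -> float:
    """Share of the exact top-k neighbors that appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall_at_k(approx_runs: List[Sequence[int]],
                     exact_runs: List[Sequence[int]],
                     k: int = 10) -> float:
    """Average recall@k over a set of queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_runs, exact_runs)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Three of the four true neighbors were found.
    print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))
```

The exact neighbor lists come from a brute-force scan of the dataset. Datasets such as SIFT1M ship with precomputed ground truth for this purpose.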

SIFT1M dataset

Benchmark code

Repository

SeekStorm: vector & lexical search - in-process library & multi-tenancy server, in Rust.
SeekStorm is open-source, under the Apache-2.0 license, available at our GitHub repository.
