🔥 SeekStorm 3.0 adds vector search & hybrid search

SeekStorm uses two separate, first-class, native index architectures, under one roof.
- Lexical search: sharded and leveled inverted index.
- Vector search: sharded and leveled IVF index for ANN or exhaustive search.
- Shared document store, shared document ID space.
- Both first-class engines are integrated at the query planner level.
- Query planner with QueryModes (Lexical, Vector, Hybrid…) and FusionTypes (RRF, …).
Vector Features
- Multi-Vector indexing: both from multiple fields and from multiple chunks per field.
- Integrated inference from any text document field, or import of external embeddings.
- Variable dimensions and precisions: f32, i8.
- Scalar Quantization (SQ).
- Multiple similarity measures: Cosine similarity, Dot product, Euclidean distance.
- Chunking that respects sentence boundaries and Unicode segmentation for multilingual text.
- K-medoid clustering with actual data points as centers.
- Field filters are active during vector search, not just as a post-search filtering step.
- True real-time search.
- Disk-based, billion-scale vector search.
- Sub-millisecond search for 1 million vectors on a laptop, CPU only:
  SIFT1M: recall@10=95% at 0.2 ms, recall@10=99% at 0.3 ms.
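
To make the similarity measures and scalar quantization listed above concrete, here is a minimal sketch in Rust. It is not SeekStorm's implementation: the function names are hypothetical, and the symmetric max-abs scaling used for the f32 to i8 conversion is only one common SQ scheme, assumed here for illustration.

```rust
/// Dot product of two equally sized vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm = (dot(a, a) * dot(b, b)).sqrt();
    if norm == 0.0 { 0.0 } else { dot(a, b) / norm }
}

/// Euclidean (L2) distance.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Symmetric scalar quantization f32 -> i8 (assumed scheme, for illustration):
/// the largest absolute component maps to 127, everything else scales linearly.
/// Returns the quantized vector and the scale needed to dequantize it.
fn quantize(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = v.iter().map(|x| (x / scale).round() as i8).collect();
    (q, scale)
}

/// Approximate reconstruction of the original f32 vector.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}
```

With this scheme an i8 vector takes a quarter of the memory of its f32 original, at the cost of a per-component reconstruction error of at most about half the scale.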

Dual Engine Architecture for Hybrid Search
- Internally, SeekStorm uses two separate, first-class, native index architectures for vector search and keyword search: two native cores, not a retrofitted add-on layer.
- SeekStorm doesn't try to make one index do everything. It runs two native search engines and lets the query planner decide how to combine them.
- Two native index architectures under one roof:
  - Lexical search: an inverted index optimized for lexical relevance.
  - Vector search: an ANN index optimized for vector similarity.
- Both are first-class engines, integrated at the query planner level.
- Query planner with dedicated QueryModes and FusionTypes.
- The query planner mode can be selected automatically or manually.
- The active QueryMode is returned for explainability, relatability, and credibility.
- Separate internal indexes, storage layouts, indexing, search, scoring, and top-k candidates, combined by a unified query planner and result fusion (Reciprocal Rank Fusion, RRF).
- The user is fully shielded from this complexity, as if there were only a single index.
- Enables pure lexical, pure vector or hybrid search (exhaustive, not only re-ranking of preliminary candidates).
                 ┌────────────────────┐
                 │     User / API     │
                 │   (hybrid query)   │
                 └─────────┬──────────┘
                           │
                           ▼
                 ┌────────────────────┐
                 │   Query Planner    │
                 │ (intent + strategy)│
                 └───────┬───────┬────┘
                         │       │
          ┌──────────────┘       └──────────┐
          ▼                                 ▼
┌────────────────────┐            ┌────────────────────┐
│   Lexical Engine   │            │   Vector Engine    │
│   Inverted Index   │            │  Native ANN Index  │
│  (BM25 / Boolean)  │            │   (Leveled-IVF)    │
└─────────┬──────────┘            └─────────┬──────────┘
          │                                 │
          ▼                                 ▼
  Ranked Results L                  Ranked Results V
          │                                 │
          └────────┬───────────────┬────────┘
                   ▼               ▼
            ┌────────────────────────────┐
            │       Result Fusion        │
            │ (RRF / rerank strategies)  │
            └──────────────┬─────────────┘
                           ▼
                  Final Ranked Results
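
The result fusion step can be illustrated with a short Reciprocal Rank Fusion sketch. This is a generic RRF implementation, not SeekStorm's code: the function name `rrf_fuse`, the `u64` document IDs, and the constant `k` (commonly set to 60) are assumptions made for the example. Each document's fused score is the sum of 1 / (k + rank) over all ranked lists in which it appears, with ranks starting at 1.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of ranked lists of document IDs.
/// score(d) = sum over lists of 1 / (k + rank of d in that list), rank is 1-based.
/// Documents missing from a list simply contribute nothing for that list.
fn rrf_fuse(lists: &[Vec<u64>], k: f64, top_k: usize) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, &doc_id) in list.iter().enumerate() {
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by ascending document ID
    // so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

Because RRF uses only ranks, the lexical scores (for example BM25) and the vector similarities never need to be normalized onto a common scale. It does rely on both engines reporting results in the same document ID space.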
Leveled IVF index (vector)
- Disk-based, leveled IVF index for unlimited index size.
- Sharded index for lock-free utilization of all processor cores.
- Capable of true real-time indexing and search.
- Approximate nearest neighbor search (ANNS) and exhaustive k-nearest neighbor search (kNN).
- K-medoid clustering: PAM (Partitioning Around Medoids) with actual data points as centers.
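
The following in-memory sketch shows the general IVF idea behind the list above: vectors are assigned to their nearest medoid (an actual data point), and a query scans only the `nprobe` closest clusters. It is a simplified illustration under stated assumptions, not SeekStorm's implementation: the `IvfIndex` type and its methods are hypothetical, the medoids are supplied by the caller instead of being computed with PAM, and sharding, leveling, and disk storage are left out.

```rust
/// Squared Euclidean distance (sufficient for ranking neighbors).
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical minimal IVF index.
struct IvfIndex {
    vectors: Vec<Vec<f32>>,
    medoids: Vec<usize>,    // IDs of the vectors that act as cluster centers
    lists: Vec<Vec<usize>>, // lists[c] holds the vector IDs assigned to cluster c
}

impl IvfIndex {
    /// Assigns every vector to the cluster of its nearest medoid.
    fn build(vectors: Vec<Vec<f32>>, medoids: Vec<usize>) -> Self {
        let mut lists = vec![Vec::new(); medoids.len()];
        for (id, v) in vectors.iter().enumerate() {
            let nearest = (0..medoids.len())
                .min_by(|&a, &b| {
                    dist2(v, &vectors[medoids[a]]).total_cmp(&dist2(v, &vectors[medoids[b]]))
                })
                .expect("at least one medoid is required");
            lists[nearest].push(id);
        }
        IvfIndex { vectors, medoids, lists }
    }

    /// Returns up to k (id, squared distance) pairs, scanning only the nprobe
    /// clusters whose medoids are closest to the query. Setting nprobe to the
    /// number of clusters makes the search exhaustive.
    fn search(&self, query: &[f32], k: usize, nprobe: usize) -> Vec<(usize, f32)> {
        let center_dist = |c: usize| dist2(query, &self.vectors[self.medoids[c]]);
        let mut clusters: Vec<usize> = (0..self.medoids.len()).collect();
        clusters.sort_by(|&a, &b| center_dist(a).total_cmp(&center_dist(b)));

        let mut hits: Vec<(usize, f32)> = clusters
            .iter()
            .take(nprobe)
            .flat_map(|&c| self.lists[c].iter())
            .map(|&id| (id, dist2(query, &self.vectors[id])))
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
        hits.truncate(k);
        hits
    }
}
```

A small `nprobe` scans fewer vectors and is faster, but it can miss a true neighbor that lies just across a cluster boundary. That is the recall versus latency trade-off the `nprobe` setting controls.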
Benchmark vector search

- 1 million vectors, 128 dimensions, f32 precision
- nprobe=16 -> recall@10=95%, average latency=188 microseconds
- nprobe=33 -> recall@10=99%, average latency=302 microseconds
- See CHANGELOG.md for details.
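
For readers unfamiliar with the metric, recall@10 measures how many of the true 10 nearest neighbors (from an exhaustive search) also appear in the approximate top 10. A minimal sketch for a single query follows; the function name and the `u32` ID type are illustrative assumptions, and a benchmark figure would normally be this value averaged over all test queries.

```rust
use std::collections::HashSet;

/// recall@k for one query: the fraction of the exact top-k neighbor IDs
/// that also appear in the approximate top-k result.
fn recall_at_k(approx: &[u32], exact: &[u32], k: usize) -> f64 {
    let truth: HashSet<u32> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let found = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    found as f64 / truth.len() as f64
}
```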
Repository
SeekStorm: vector & lexical search - in-process library & multi-tenancy server, in Rust.
SeekStorm is open-source, under the Apache-2.0 license, available at our GitHub repository.
