Search as a service architecture

SeekStorm architecture — what’s different?

Highly integrated approach

To achieve the best scalability and performance while keeping operational costs (and prices) to a minimum, we had to build everything from scratch: database, search server, and crawler. Only by having full control over every byte processed, transferred, and stored were we able to optimize strictly for search, performance, and efficiency, removing the boundaries between separate packages and jettisoning unnecessary ballast.

Because SeekStorm utilizes the infrastructure more efficiently, we can pass on the lower operational cost and give you more performance at a lower price.

Novel Log-structured merge-tree database

To achieve low latency under high concurrent load, for billions of documents, in a multi-tenant architecture, we had to rethink search-specific database architecture and take the Log-structured merge-tree (LSM-tree) to the next level.
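As background, a log-structured merge-tree buffers writes in memory and flushes them as immutable, sorted runs that are merged later, so writes never update data in place. This minimal sketch illustrates only that general LSM-tree idea. It is not SeekStorm's database, and the class name `TinyLSM` and its `memtable_limit` parameter are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left


class TinyLSM:
    """Toy LSM-tree (hypothetical): an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: dict[str, str] = {}
        self.runs: list[list[tuple[str, str]]] = []  # oldest run first, newest run last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory; flushed runs are never modified in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into an immutable, sorted run (a stand-in for an on-disk segment).
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> str | None:
        if key in self.memtable:
            return self.memtable[key]
        # Check the newest runs first so that later writes shadow older ones.
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))  # (key,) sorts just before (key, value)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one; newer values win because they are applied last.
        merged: dict[str, str] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Compaction is what keeps lookups cheap: without it, a read may have to probe every run.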

We built a resilient, reliable, and highly parallelized architecture featuring uniform term-partitioned sharding, n-fold double buffering, zero-copy data handling, lock-free synchronization, and a thread-per-core design. It is optimized for high throughput and short average and tail latencies in high-concurrency, web-scale scenarios.
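In term-partitioned sharding, each term is assigned to exactly one shard, which holds that term's complete posting list, so a single-term lookup touches only one shard. This minimal sketch assumes CRC32 hashing for a uniform term-to-shard mapping and an in-memory posting list per shard. All names are hypothetical, and the sketch leaves out double buffering, zero-copy, and lock-free synchronization.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


def shard_for_term(term: str, num_shards: int) -> int:
    # CRC32 is stable across processes (unlike Python's salted hash()) and spreads terms evenly.
    return zlib.crc32(term.encode("utf-8")) % num_shards


class TermPartitionedIndex:
    """Toy inverted index (hypothetical): each shard owns the full posting lists of its terms."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # One term -> posting-list map per shard; a thread-per-core design could pin each to a core.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def index(self, doc_id: int, text: str) -> None:
        # Route every distinct term of the document to the shard that owns it.
        for term in set(text.lower().split()):
            self.shards[shard_for_term(term, self.num_shards)][term].append(doc_id)

    def search(self, query: str) -> list[int]:
        # AND query: each term's postings are fetched from exactly one shard, then intersected.
        postings = [
            set(self.shards[shard_for_term(t, self.num_shards)].get(t, []))
            for t in query.lower().split()
        ]
        return sorted(set.intersection(*postings)) if postings else []
```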

Private information retrieval (PIR)

SeekStorm is committed to addressing the biggest challenge in search-as-a-service: privacy and security.

From a security standpoint, a cloud service provider (CSP) must be regarded as untrusted and as a potential adversary, even if it does everything to protect customer data. But cloud services also offer many benefits: turn-key deployment, short time-to-market, low development and operational costs, no in-house expertise required, and low project risk.

The only way to reconcile risk and benefit is private information retrieval (PIR), either via Searchable encryption (SE) or via Homomorphic encryption (HE).

The core idea is to encrypt everything at the client and transfer it encrypted to the server, where it stays encrypted at all times. The search is performed with encrypted queries on encrypted data, so there is no need to trust the cloud service provider: it simply does not matter whether the provider is compromised or acting as an adversary. The results are returned encrypted to the client, where they are decrypted for the first time since the documents left the client.
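The client-side half of this idea, encrypting before upload and decrypting only after download, can be sketched with the Python standard library alone. The sketch assumes a toy stream cipher built from HMAC-SHA256 in counter mode. It is not SeekStorm's scheme and provides no integrity protection; a real client would use a vetted authenticated cipher such as AES-GCM from an audited library. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 as a PRF in counter mode: block_i = HMAC(key, nonce || i).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def client_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random nonce per document keeps identical plaintexts from yielding identical ciphertexts.
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def client_decrypt(key: bytes, blob: bytes) -> bytes:
    # Runs only on the client; the server never holds the key.
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    ks = _keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))
```

In this model the server only ever stores the opaque `nonce || ciphertext` blob returned by `client_encrypt`.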

Permanent end-to-end encryption not only protects the privacy of the information; when the encryption is authenticated, it also protects the information against manipulation.
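Encryption by itself only hides content. Detecting manipulation additionally requires an authentication tag that the server cannot forge without the client's key. This minimal sketch assumes an encrypt-then-MAC construction with HMAC-SHA256 over an already encrypted document. `seal` and `open_sealed` are hypothetical names; in practice an authenticated cipher such as AES-GCM provides confidentiality and integrity in one step.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: the tag covers exactly the ciphertext the server will store.
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, sealed: bytes) -> bytes:
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was modified")
    return ciphertext
```

Any bit flipped by the server or an attacker in transit changes the expected tag, so the client rejects the document instead of decrypting manipulated data.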

There are different approaches to achieve Private information retrieval (PIR), differing in practicability, information leakage, and performance and storage overhead:

While fully-homomorphic encryption and oblivious RAM provide perfect protection against information leakage, they are highly inefficient performance-wise.

For SeekStorm, building an integrated architecture from scratch, with full control of every byte processed, transferred, and stored, was the precondition for implementing Private information retrieval (PIR) based on Searchable encryption (SE), which we are currently working on.

Searchable encryption (SE) allows indexing and searching of encrypted data without decrypting it first. The data are never decrypted on the server and are never disclosed to the server as plaintext. Searchable encryption (SE) enables privacy-preserving storage, indexing, and searching of documents while still utilizing the benefits that cloud service providers (CSPs) offer.
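A textbook way to index and search without decryption is to replace every keyword with a keyed token (a "trapdoor") that only the client can compute. The server then matches tokens against tokens without learning the words. This minimal sketch assumes deterministic HMAC-SHA256 trapdoors and opaque document references. It is not SeekStorm's SE design, and all names are hypothetical. Encrypting the document bodies themselves is omitted here.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from collections import defaultdict


def trapdoor(key: bytes, term: str) -> bytes:
    # Deterministic keyword token: the same term always maps to the same opaque value.
    return hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).digest()


class EncryptedIndexServer:
    """Server stand-in: it stores and matches opaque tokens, never plaintext terms."""

    def __init__(self) -> None:
        self.index: dict[bytes, set[str]] = defaultdict(set)

    def add(self, doc_ref: str, tokens: list[bytes]) -> None:
        for tok in tokens:
            self.index[tok].add(doc_ref)

    def search(self, token: bytes) -> set[str]:
        # Return a copy so callers cannot mutate the stored index.
        return set(self.index.get(token, set()))


def client_index(key: bytes, server: EncryptedIndexServer, doc_ref: str, text: str) -> None:
    # Tokens are computed on the client, so plaintext keywords never leave it.
    server.add(doc_ref, [trapdoor(key, t) for t in set(text.split())])
```

Deterministic trapdoors are simple but not leakage-free: the server can still see which queries repeat (search pattern) and which documents match (access pattern). That trade-off is the information leakage that distinguishes practical SE schemes from costlier approaches such as oblivious RAM.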

Searchable encryption (SE) eliminates attack vectors and risks posed by staff, intrusion, hacking, misconfiguration, or surveillance. This allows data to be outsourced to third-party cloud environments and processed there while remaining encrypted. In highly regulated industries such as health care, finance, and defense, searchable encryption removes privacy barriers that previously inhibited data sharing and outsourcing.

A search as a service with Searchable encryption (SE) offers a higher level of security than a standard unencrypted in-house/on-premises search and the same level of security as an encrypted in-house/on-premises search (which as of today is not available as a standard solution).

The encryption client will be open-source and open to security audits.

End-to-end encryption

End-to-end encryption prevents any intermediary from seeing the unencrypted content. In search as a service, the search server is such an intermediary: it should never see the unencrypted documents, queries, or search results. Still, the search server has to index the documents, which the user encrypts and which arrive at the server already encrypted. When the user sends an encrypted query, the server has to find the documents matching that query without ever learning the query in plaintext. It then returns the matching results, still encrypted. The user decrypts them, for the first time since sending the encrypted documents to the server for indexing. End-to-end encryption will be achieved by Searchable encryption (SE), which enables search on encrypted data while preventing the intermediary search-as-a-service server from accessing the plaintext information at any time.

High-level architecture diagram

SeekStorm Search as a Service architecture with search API and crawl API