Search API features
Web-scale full-text search
Index billions of documents and return results for thousands of concurrent users within milliseconds.
Indexing is easy and search is easy, for a few documents and a single user. Such a search engine prototype can be hacked together within a day.
But combining fast indexing and fast searching with low RAM consumption, low disk consumption, high concurrency, multi-tenancy, faceting, field search, unlimited fields and unlimited field sizes, in real time, with unlimited scaling and millisecond latency, is hard to achieve.
SeekStorm's novel database and index architecture is extremely fast (single-digit milliseconds), allowing real-time indexing and instant search for a virtually unlimited index size (up to 100 TB). More on SeekStorm's index architecture.
True real-time search
With real-time search and indexing, a document becomes searchable in the same millisecond it is indexed. In contrast to conventional near real-time search (NRT), SeekStorm offers true real-time search with zero soft-commit delays and without sacrificing indexing performance.
The SeekStorm search API can be used to implement instant search, where results are searched and displayed while you are still typing the search query. SeekStorm uses extremely fast spelling correction and query completion to accomplish this complex task with sub-millisecond latency.
SeekStorm provides an automatic spelling correction for queries. We use SymSpell, the Symmetric Delete spelling correction algorithm developed by us to achieve 1 million times faster spelling correction & fuzzy search compared to other algorithms.
SeekStorm provides auto-completion for queries. For incomplete queries (e.g. while typing), SeekStorm provides a list of suggestions with the most likely matching queries.
The dynamic, self-learning auto-completion dictionary is automatically compiled during indexing from all indexed fields. The dictionary is updated in real time with every newly indexed document. To ensure the best possible relevance and recency, the dictionary continuously identifies the top-k most frequent terms from the possibly unlimited stream of index data while using only limited RAM.
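The text does not say which algorithm SeekStorm uses to track the top-k terms. As one possible illustration, the sketch below uses the well-known Space-Saving algorithm, which keeps at most a fixed number of counters regardless of stream length. The class name, the capacity value, and the sample stream are assumptions for this example.

```python
class SpaceSaving:
    """Approximate top-k term counter with a fixed number of counters (Space-Saving)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: dict[str, int] = {}

    def add(self, term: str) -> None:
        if term in self.counts:
            self.counts[term] += 1
        elif len(self.counts) < self.capacity:
            self.counts[term] = 1
        else:
            # Evict the term with the smallest count; the newcomer inherits
            # that count plus one, so counts may overestimate but never underestimate.
            victim = min(self.counts, key=self.counts.get)
            self.counts[term] = self.counts.pop(victim) + 1

    def top(self, k: int) -> list[tuple[str, int]]:
        """Return the k terms with the highest (approximate) counts."""
        return sorted(self.counts.items(), key=lambda item: (-item[1], item[0]))[:k]


# Illustrative term stream: memory use is bounded by capacity, not by stream length.
counter = SpaceSaving(capacity=3)
for term in ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]:
    counter.add(term)
print(counter.top(2))  # prints [('a', 5), ('b', 3)]
```

The linear scan for the minimum keeps the sketch short; a production version would typically use a structure that finds the smallest counter in constant time.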
A separate dictionary is created for each index, ensuring language independence, support for user- and domain-specific vocabulary, privacy, and coverage of newly occurring terms.
SeekStorm supports automatic query rewriting. If the query is corrected or completed and the instant parameter is set to true, then the query is automatically rewritten and replaced with the best matching suggestion, and results are immediately returned for the new corrected query.
When enabled, key text extraction removes ads and navigation elements from the fetched web page before storing and indexing. This increases the relevance of the search results, the indexing speed and the query speed, and reduces the index size. Key text extraction is enabled with keytextOnly=true in Create crawljob via the REST API.
Sometimes the title of a web page contains the same repetitive string on each page, like in "Crawler - Wikipedia". SeekStorm is able to detect and remove those strings. In this example "- Wikipedia" would be removed, while "Crawler" would remain and be rewritten as the title of this page in the index. Title rewriting improves both ranking and perceived relevance. Title rewriting is enabled with keytextOnly=true in Create crawljob via the REST API.
Field search allows you to restrict the search to specified fields of the indexed documents. SeekStorm lets you combine full-text search and field search in different fields within a single query, to refine and narrow down your search results and increase their relevance.
Besides full-text search, SeekStorm can restrict the search to specific fields within the indexed JSON documents, e.g. to title, URL or domain, author, or product category. Faceted search goes beyond a single filter: a complete set of filters is automatically derived from the indexed documents.
Faceted search allows counting, filtering and sorting of documents matching specific facet field values that occur in the indexed documents. Faceted search is often used in product search to narrow down, count and sort the search results according to specific product features like brand, manufacturer, rating, or price. Learn more about faceted search.
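To make the counting and filtering side of faceted search concrete, here is a small in-memory sketch. It does not use the SeekStorm API; the document fields (brand, rating) and the function names are made up for illustration.

```python
from collections import Counter


def facet_counts(documents: list[dict], facet_fields: list[str]) -> dict[str, Counter]:
    """Count how many documents carry each distinct value of each facet field."""
    counts = {field: Counter() for field in facet_fields}
    for doc in documents:
        for field in facet_fields:
            if field in doc:
                counts[field][doc[field]] += 1
    return counts


def filter_by_facet(documents: list[dict], field: str, value) -> list[dict]:
    """Keep only the documents whose facet field equals the selected value."""
    return [doc for doc in documents if doc.get(field) == value]


# Illustrative product documents (hypothetical data).
products = [
    {"title": "Red shirt", "brand": "Acme", "rating": 4},
    {"title": "Blue shirt", "brand": "Acme", "rating": 5},
    {"title": "Green shirt", "brand": "Globex", "rating": 4},
]

counts = facet_counts(products, ["brand", "rating"])
print(counts["brand"]["Acme"])                          # prints 2
print(len(filter_by_facet(products, "brand", "Acme")))  # prints 2
```

The counts are what a user interface would show next to each filter value, and the filter is what gets applied when the user selects one.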
The query parameter group allows the grouping of the returned results into a nested structure, where all results with the same field value are grouped together and limited to a number specified by the length property.
Often results are dominated by results with the same value in a specific field, while the existing diversity is hidden in the long tail and not immediately visible. E.g. all search results are from the same popular domain or all shirts are from the same company or color.
The group parameter enforces variety in the results by limiting the results with the same value, or within the same range, for a given facet field to the number specified by the length property. When length is set to 1, all results are distinct for the specified field value. The field property defines the field whose values or ranges are used to group the results, and together with the order property it also defines the sort order of the groups. The length property defines how many results are returned for the same field value. Groups are sorted by their field value (which is identical for all results within the same group), with the order property determining the sort order between groups. The results within a group are sorted as defined by the query parameter sort.
Value groups: If the field is of field type stringFacet, then a group is created for each distinct value occurring in that field. E.g. group per domain, company, country, color, name.
Range groups: If the field is of field type integerFacet, floatFacet or dateFacet AND ranges are defined for that field, then a group is created for each defined range section. E.g. group per price range, per salary range, per date range.
Define sort order of results at query time by rank and facet fields.
Combine (tie-break) keyword matching rank and any number of facet fields.
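Tie-breaking between rank and facet fields can be sketched with a stable multi-key sort. The field names (rank, price) and the criteria format are assumptions for this example and do not reflect the actual query syntax.

```python
def sort_results(results: list[dict], criteria: list[tuple[str, bool]]) -> list[dict]:
    """Sort by several (field, descending) criteria, most significant first.

    Python's sort is stable, so sorting by the least significant criterion
    first and the most significant last yields the combined order.
    """
    ordered = list(results)
    for field, descending in reversed(criteria):
        ordered.sort(key=lambda doc: doc[field], reverse=descending)
    return ordered


# Illustrative results (hypothetical data): two documents share the same rank.
results = [
    {"id": 1, "rank": 2.0, "price": 30},
    {"id": 2, "rank": 2.0, "price": 10},
    {"id": 3, "rank": 5.0, "price": 20},
]

# Highest rank first; equal ranks are tie-broken by lowest price.
ordered = sort_results(results, [("rank", True), ("price", False)])
print([doc["id"] for doc in ordered])  # prints [3, 2, 1]
```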
Promote documents for all or selected keywords and pin them above all other search results.
Often users don't know exactly how certain foreign words, brands, or product names are spelled. Instead of forcing users into trial and error, SeekStorm assists them with fuzzy search and spelling correction, which instantly deliver results even for misspelled or incomplete queries.
We use SymSpell, the Symmetric Delete spelling correction algorithm developed by us to achieve 1 million times faster spelling correction & fuzzy search compared to other algorithms.
Fused search results
SeekStorm enables the aggregation of information and data from different sources and different crawl jobs. It also allows adding auxiliary fields and values (e.g. genre='comedy', product-category='grocery', language='english') to the JSON document created for every crawled document. Those fields are then returned together with the search results.
The SeekStorm search API supports field filters such as intitle, intext, inurl, site, allintitle, allinurl and allintext. In addition, an aptly named field filter is available for each field defined in the index.
SeekStorm search API supports the following boolean search operators: AND, NOT, PHRASE, Implicit PHRASE
Stemming
Singular and plural forms of nouns are folded together during indexing and search. Searching for "cars" will return documents containing "car" and vice versa. Verbs are not stemmed, because for the average use case that would introduce too much noise and degrade search result precision.
Stemming can be enabled or disabled during index creation and is currently supported for English and German.
SeekStorm is language-independent in crawling, indexing, searching, spelling correction, and query completion. SeekStorm also supports Chinese word segmentation to index Chinese content.
Keyword in Context summaries (KWIC)
In search, not only relevance and speed matter. The presentation of search results also heavily influences the perceived relevance and how quickly and effortlessly users can find the results that matter most to them within the returned search engine results pages (SERPs).
Highlighting the keywords from the query within the search results contributes significantly to the perceived relevancy and saves valuable time when evaluating search results.
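A keyword-in-context summary with highlighting can be sketched as follows. This is a simplified illustration, not SeekStorm's summarizer: the function name, the window size, and the HTML-style highlight tags are assumptions, and only the context around the first match is shown.

```python
import re


def kwic_summary(text: str, query: str, window: int = 30,
                 open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Return a snippet around the first query term match, with all matches in it highlighted."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text[: 2 * window]
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return text[: 2 * window]  # no match: fall back to the start of the text

    start = max(0, first.start() - window)
    end = min(len(text), first.end() + window)

    # Match against the full text so word boundaries are judged correctly,
    # then keep only the matches that lie completely inside the snippet.
    pieces, cursor = [], start
    for match in pattern.finditer(text):
        if match.start() < start or match.end() > end:
            continue
        pieces.append(text[cursor:match.start()])
        pieces.append(open_tag + match.group(0) + close_tag)
        cursor = match.end()
    pieces.append(text[cursor:end])

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(pieces) + suffix


print(kwic_summary("Search engines", "search"))  # prints "<b>Search</b> engines"
```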
Crawl and index thousands of documents per second
Scoped API Keys
Full and query-only API keys for different scopes of usage.
A powerful yet concise API lets you rapidly implement search in your products.
Supports cross-origin resource sharing (CORS). CORS allows the browser to request the API from a domain other than the one from which the enclosing web page was served.