Search API features
Web-scale full-text search
Index a billion documents and return results for a thousand concurrent users within milliseconds.
Indexing is easy and search is easy, for a few documents and a single user. Such a search engine prototype can be hacked together within a day.
But combining fast indexing and fast searching with low RAM consumption, low disk consumption, high concurrency, multi-tenancy, faceting, field search, unlimited fields, unlimited field sizes, real-time updates, unlimited scaling, and millisecond latency is hard to achieve.
With SeekStorm real-time search & indexing, a document becomes searchable in the same millisecond it is indexed.
The SeekStorm search API can be used to implement instant search, where results are retrieved and displayed while you are still typing the search query. To achieve this, SeekStorm uses extremely fast spelling correction and query completion, performing this complex task with sub-millisecond latency.
SeekStorm provides automatic spelling correction for queries. We use SymSpell, the Symmetric Delete spelling correction algorithm developed by us, to achieve 1 million times faster spelling correction & fuzzy search compared to other algorithms.
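The core idea of Symmetric Delete can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not SymSpell's actual implementation: it precomputes delete-only variants of every dictionary word, generates the same variants for the query term, and verifies the overlapping candidates with a plain Levenshtein distance. All class and function names are hypothetical, and the optimizations of the real library are omitted.

```python
from collections import defaultdict


def deletes(word: str, max_distance: int) -> set[str]:
    """Return the word plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, used only to verify candidates."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


class SymmetricDeleteIndex:
    """Hypothetical dictionary that stores only delete variants of each word."""

    def __init__(self, max_distance: int = 2):
        self.max_distance = max_distance
        self.frequency: dict[str, int] = defaultdict(int)
        self.by_delete: dict[str, set[str]] = defaultdict(set)

    def add(self, word: str) -> None:
        self.frequency[word] += 1
        for variant in deletes(word, self.max_distance):
            self.by_delete[variant].add(word)

    def lookup(self, term: str) -> list[str]:
        # Deletes of the query are matched against deletes of the dictionary words.
        candidates: set[str] = set()
        for variant in deletes(term, self.max_distance):
            candidates |= self.by_delete.get(variant, set())
        scored = [(edit_distance(term, c), -self.frequency[c], c) for c in candidates]
        # Closest suggestions first, more frequent words first on ties.
        return [c for d, _, c in sorted(scored) if d <= self.max_distance]
```

Because only deletions are generated, the number of variants per word stays small compared with generating all insertions, substitutions, and transpositions, which is where the speed advantage of the approach comes from.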
SeekStorm provides auto-completion for queries. For incomplete queries (e.g. while typing) SeekStorm provides a list of suggestions with the most likely matching queries. The completion dictionary is automatically compiled during indexing from all indexed fields. The dictionary is updated in real time with every newly indexed document. For each index a separate dictionary is created, ensuring language independence, support for user- and domain-specific vocabulary, and privacy.
SeekStorm supports automatic query rewriting. If the query is corrected or completed and the instant parameter is set to true, the query is automatically rewritten and replaced with the best-matching suggestion, and results are immediately returned for the corrected query.
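A per-index completion dictionary of this kind can be illustrated with a small sketch. It assumes documents are flat JSON objects with string values and ranks completions by term frequency; the class name, the ranking rule, and the linear prefix scan are simplifications chosen for illustration, not a description of SeekStorm's internals.

```python
import re
from collections import Counter


class CompletionDictionary:
    """Hypothetical per-index term dictionary, updated with every indexed document."""

    def __init__(self):
        self.term_counts = Counter()

    def index_document(self, document: dict[str, str]) -> None:
        # Terms are collected from all fields of the document.
        for value in document.values():
            self.term_counts.update(re.findall(r"\w+", value.lower()))

    def complete(self, prefix: str, limit: int = 5) -> list[str]:
        prefix = prefix.lower()
        matches = [(-count, term) for term, count in self.term_counts.items()
                   if term.startswith(prefix)]
        # Most frequent terms first, alphabetical order on ties.
        return [term for _, term in sorted(matches)[:limit]]
```

Since each index owns its own dictionary instance, suggestions only ever come from the vocabulary of that index. A production system would likely replace the linear scan with a trie or a similar prefix structure.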
When enabled, key text extraction removes ads and navigation elements from the fetched web page before storing and indexing. This increases the relevance of the search results, the indexing speed, and the query speed, and it reduces the index size. Key text extraction is enabled with keytextOnly=true in Create crawljob via the REST API.
Sometimes the title of a web page contains the same repetitive string on each page, as in "Crawler - Wikipedia". SeekStorm is able to detect and remove those strings. In this example "- Wikipedia" would be removed, while "Crawler" would remain and be rewritten as the title of this page in the index. Title rewriting improves both ranking and perceived relevance. Title rewriting is enabled with keytextOnly=true in Create crawljob via the REST API.
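One way to detect such repetitive title strings is to count trailing title segments across the pages of a crawl and strip those that recur often. The sketch below is only an illustration of that idea: the separator list, the 50% threshold, and the function names are assumptions, and the source does not describe how SeekStorm actually performs the detection.

```python
from collections import Counter

# Assumed separators between the page-specific part and the site-wide suffix.
SEPARATORS = (" - ", " | ", " : ")


def split_suffix(title: str) -> tuple[str, str]:
    """Split a title at its last separator into (head, separator + suffix)."""
    cut = max(title.rfind(sep) for sep in SEPARATORS)
    if cut <= 0:
        return title, ""
    return title[:cut], title[cut:]


def rewrite_titles(titles: list[str], min_share: float = 0.5) -> list[str]:
    """Remove suffixes shared by at least min_share of all titles."""
    suffix_counts = Counter(split_suffix(t)[1] for t in titles)
    suffix_counts.pop("", None)  # titles without any separator
    threshold = min_share * len(titles)
    repetitive = {s for s, n in suffix_counts.items() if n > 1 and n >= threshold}
    rewritten = []
    for title in titles:
        head, suffix = split_suffix(title)
        rewritten.append(head if suffix in repetitive else title)
    return rewritten
```

For example, `rewrite_titles(["Crawler - Wikipedia", "Index - Wikipedia"])` yields `["Crawler", "Index"]`, while titles whose trailing segment occurs only once are left untouched.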
Field search allows you to restrict the search to specified fields of the indexed documents. SeekStorm lets you combine full-text search and field search across different fields within a single query, to refine and narrow down your search results and increase their relevance.
Besides full-text search, SeekStorm is able to restrict the search to specific fields, e.g. title, URL, domain, author, or product category, within the indexed JSON documents. Faceted search, however, is more than a single filter: a complete set of filters is automatically derived from the indexed documents.
Faceted search allows counting and filtering of documents matching specific facet field values that occur in the indexed documents. Faceted search is often used in product search to narrow down and count the search results according to specific product features like brand, manufacturer, rating, or price. Learn more about faceted search.
Often users don't know exactly how certain foreign words, brands, or product names are spelled. Instead of forcing users into trial and error, SeekStorm assists them with fuzzy search and spelling correction that instantly deliver results, even for misspelled or incomplete queries.
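The two halves of faceted search, counting and filtering, can be shown with a short sketch over in-memory JSON-like documents. The function names and the sample product fields are hypothetical and serve only to illustrate how filters are derived from the documents themselves.

```python
from collections import Counter


def facet_counts(documents: list[dict], facet_fields: list[str]) -> dict[str, Counter]:
    """Count how often each value occurs per facet field in the given documents."""
    counts = {field: Counter() for field in facet_fields}
    for doc in documents:
        for field in facet_fields:
            if field in doc:
                counts[field][doc[field]] += 1
    return counts


def filter_by_facets(documents: list[dict], selected: dict) -> list[dict]:
    """Keep only documents whose facet fields equal all selected values."""
    return [doc for doc in documents
            if all(doc.get(field) == value for field, value in selected.items())]


products = [
    {"name": "A", "brand": "Acme", "rating": 5},
    {"name": "B", "brand": "Acme", "rating": 4},
    {"name": "C", "brand": "Bolt", "rating": 5},
]
counts = facet_counts(products, ["brand", "rating"])
acme_only = filter_by_facets(products, {"brand": "Acme"})
```

Here `counts["brand"]` reports two documents for "Acme" and one for "Bolt", and recomputing the counts on `acme_only` shows how the remaining facet values narrow after a filter is applied.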
Fused search results
SeekStorm enables the aggregation of information and data from different sources and different crawl jobs. It also allows adding auxiliary fields and values (e.g. genre='comedy', product-category='grocery', language='english') to the JSON document created for every crawled document. Those fields will then be returned together with the search results.
The SeekStorm search API supports field filters such as intitle, intext, inurl, site, allintitle, allinurl, and allintext. Additionally, for each field defined in the index there is an aptly named field filter available.
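To make the combination of field filters and full-text terms concrete, here is a small sketch of how such a query could be parsed and evaluated. It assumes a `filter:value` syntax, a hypothetical mapping from filter names to JSON field names, and simple substring matching; none of these details are confirmed by the source.

```python
# Assumed mapping from filter name to JSON field; the field names are hypothetical.
FILTER_FIELDS = {"intitle": "title", "intext": "text", "inurl": "url", "site": "domain"}


def parse_query(query: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Split a query into (field, value) filters and free full-text terms."""
    filters, terms = [], []
    for token in query.lower().split():
        name, sep, value = token.partition(":")
        if sep and value and name in FILTER_FIELDS:
            filters.append((FILTER_FIELDS[name], value))
        else:
            terms.append(token)
    return filters, terms


def matches(document: dict[str, str], query: str) -> bool:
    """True if every filter matches its field and every free term occurs anywhere."""
    filters, terms = parse_query(query)
    full_text = " ".join(document.values()).lower()
    return (all(value in document.get(field, "").lower() for field, value in filters)
            and all(term in full_text for term in terms))
```

With this sketch, a query such as `intitle:crawler pages` requires "crawler" in the title field while "pages" may occur in any field of the document.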
SeekStorm search API supports the following boolean search operators: AND, NOT, PHRASE, Implicit PHRASE
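The semantics of AND, NOT, and PHRASE can be illustrated with a tiny positional inverted index. This is a generic textbook sketch with hypothetical names, not SeekStorm's index format; implicit phrase handling is left out because the source does not define it.

```python
import re
from collections import defaultdict


class PositionalIndex:
    """Hypothetical in-memory index: term -> document id -> token positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))
        self.doc_ids: set[int] = set()

    def add(self, doc_id: int, text: str) -> None:
        self.doc_ids.add(doc_id)
        for position, term in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[term][doc_id].append(position)

    def docs(self, term: str) -> set[int]:
        return set(self.postings.get(term.lower(), {}))

    def all_of(self, *terms: str) -> set[int]:
        """AND: documents containing every term."""
        result = set(self.doc_ids)
        for term in terms:
            result &= self.docs(term)
        return result

    def without(self, hits: set[int], term: str) -> set[int]:
        """NOT: remove documents containing the term."""
        return hits - self.docs(term)

    def phrase(self, *terms: str) -> set[int]:
        """PHRASE: documents containing the terms at consecutive positions."""
        words = [t.lower() for t in terms]
        if not words:
            return set()
        hits = set()
        for doc_id in self.all_of(*words):
            starts = self.postings[words[0]][doc_id]
            if any(all(start + k in self.postings[w][doc_id]
                       for k, w in enumerate(words))
                   for start in starts):
                hits.add(doc_id)
        return hits
```

A document is visible to all three operators as soon as `add` returns, which is the in-memory analogue of the real-time searchability described earlier.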
SeekStorm is language independent, in crawling, indexing, searching, spelling correction, and query completion. SeekStorm also supports Chinese word segmentation to index Chinese content.
Keyword in Context summaries (KWIC)
In search, it is not only relevancy and speed that matter. The presentation of search results also heavily influences the perceived relevance and how quickly and effortlessly users can find the results that matter most to them within the returned search engine results pages (SERPs).
Highlighting the keywords from the query within the search results contributes significantly to the perceived relevancy and saves valuable time when evaluating search results.
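A keyword-in-context summary with highlighting can be sketched as follows. The function cuts a window around the first query term found, snaps the window to whole words, and wraps every query term in markers. The function name, the window size, and the `<b>` markers are illustrative assumptions.

```python
import re


def kwic(text: str, query: str, window: int = 30,
         marker: tuple[str, str] = ("<b>", "</b>")) -> str:
    """Return a snippet around the first query term, with all query terms highlighted."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return text[: 2 * window]
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return text[: 2 * window]  # no keyword found: fall back to the beginning
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    # Snap both ends outward to whitespace so no word is cut in half.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    snippet = pattern.sub(lambda m: marker[0] + m.group(0) + marker[1], text[start:end])
    return ("..." if start > 0 else "") + snippet + ("..." if end < len(text) else "")
```

Snapping to whole words before highlighting avoids marking a fragment of a longer word that merely happens to be cut at the window boundary.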