Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search. Toan Vinh Luu, PhD, Senior Search Engineer, local.ch AG
In this talk: requirements of an autosuggestion feature; autosuggestion architecture; evaluation.
local.ch: local search engine in Switzerland (web, mobile). Each month: > 4 million unique users; > 8 million queries on mobile (iOS, Android). Users search for: services (e.g. restaurant zurich), resident information (e.g. toan luu), phone numbers (e.g. 079574xxyy), addresses, weather, ...
Why is autosuggestion important? With autosuggestion, the user taps the phone 8 times instead of 34 times to get to the result list when searching for "Electric installation Wallisellen".
What should we suggest to the user?
Popular data suggestion
Popular queries suggestion: "mc donalds" has fewer entries than "muller" but is queried > 10x more often. "cablecom" gets > 2000 queries/month although it has only 1 entry.
Query history suggestion: 9% of mobile queries are historical queries. 38% of users search with a query they have used in the past.
Spellchecker suggestion: > 700 000 mistakes per month on mobile (9%)
Detail entry suggestion
Special information suggestion
Autosuggestion architecture [diagram]: Autosuggest API/Search API; SuggestData component, Spellchecker component, Popular query component and Query history component, each shown with an index; Local.ch database; Popular query processor; Index; Query log.
Data suggestion: pre-generate suggested queries from the data. Entry: Name: Subito; Category: Restaurant; Street: Konradstrasse; Zipcode: 8005; City: Zürich. Possible suggested queries: Restaurant; Subito; Restaurant Zürich; Restaurant Subito; Restaurant Subito Zürich; Konradstrasse, 8005 Zürich; Zürich.
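The pre-generation step can be sketched in Python as follows. This is only an illustration: the function name `suggested_queries`, the dictionary keys, and the exact set of templates (category, name, city and address combinations) are assumptions inferred from the Subito example, not the rules actually used at local.ch.

```python
def suggested_queries(entry):
    """Pre-generate candidate suggestions for one directory entry.

    The templates below are assumed from the example on the slide.
    """
    name = entry["name"]
    category = entry["category"]
    city = entry["city"]
    address = f'{entry["street"]}, {entry["zipcode"]} {city}'

    candidates = [
        category,
        name,
        f"{category} {city}",
        f"{category} {name}",
        f"{category} {name} {city}",
        address,
        city,
    ]

    # Remove exact duplicates while preserving the template order.
    seen = set()
    unique = []
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            unique.append(candidate)
    return unique


subito = {
    "name": "Subito",
    "category": "Restaurant",
    "street": "Konradstrasse",
    "zipcode": "8005",
    "city": "Zürich",
}
queries = suggested_queries(subito)
```

Each generated string would then be indexed alongside the entry so that it can be matched against what the user types.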
Compute data popularity: use faceting to get suggested queries sorted by frequency. This approach guarantees near real-time suggestion. Suggested queries are copied to 2 fields: a search field used for matching (analyzers and a tokenizer are applied), and a facet field used for display and for computing frequency. Example: q=restaurant zu* => suggest Restaurant Zürich; q=zurich restau* => suggest Restaurant Zürich.
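A minimal sketch of how such a facet request could be assembled, assuming hypothetical field names (`suggest_search`, `suggest_facet`) and a hypothetical core URL; the talk does not name them. The parameters `facet`, `facet.field`, `facet.sort`, `facet.mincount` and `facet.limit` are standard Solr faceting parameters. The sketch does not escape Solr special characters in the user input.

```python
from urllib.parse import urlencode

# Hypothetical names: the talk does not give the real field names or URL.
SEARCH_FIELD = "suggest_search"
FACET_FIELD = "suggest_facet"
SOLR_SELECT_URL = "http://localhost:8983/solr/suggest/select"


def facet_suggest_params(user_input, limit=10):
    """Build Solr parameters that match on the search field and
    return the facet field values sorted by frequency."""
    tokens = user_input.lower().split()
    if not tokens:
        raise ValueError("user_input must contain at least one token")
    # Every typed word must match; the last word is treated as a prefix.
    terms = tokens[:-1] + [tokens[-1] + "*"]
    query = " AND ".join(f"{SEARCH_FIELD}:{term}" for term in terms)
    return {
        "q": query,
        "rows": 0,               # only the facet counts are needed
        "facet": "true",
        "facet.field": FACET_FIELD,
        "facet.sort": "count",   # most frequent suggestions first
        "facet.mincount": 1,
        "facet.limit": limit,
        "wt": "json",
    }


def facet_suggest_url(user_input, limit=10):
    """Return the full request URL for the given input."""
    return SOLR_SELECT_URL + "?" + urlencode(facet_suggest_params(user_input, limit))
```

Because the match runs on the tokenized search field, word order in the input does not matter, while the untokenized facet field returns the suggestion exactly as it should be displayed.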
Improvement: faceting is expensive for short prefix-match queries, so store suggested results in a cache for all queries of 1 or 2 characters. Filter duplicated suggestions: Restaurant Subito and Restaurant Subito Zürich are 1 entity if they have the same frequency => keep only 1 suggestion. Store location and language with suggested queries to filter out suggestions that are irrelevant to the user.
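The duplicate filter can be illustrated with a short Python sketch. The slide only says to keep one suggestion when two have the same frequency; keeping the shorter one, and treating "same entity" as "one suggestion is a word-level prefix of the other", are assumptions made here for illustration. The function name is hypothetical.

```python
def filter_duplicates(suggestions):
    """Drop suggestions that extend an already kept suggestion
    with the same frequency.

    suggestions: list of (text, frequency) tuples.
    Returns the kept tuples, most frequent first.
    """
    kept = []
    # Within the same frequency, visit shorter texts first so that the
    # shorter form is the one that survives (an assumption).
    for text, freq in sorted(suggestions, key=lambda s: (-s[1], len(s[0]))):
        words = text.lower().split()
        is_duplicate = False
        for kept_text, kept_freq in kept:
            kept_words = kept_text.lower().split()
            if freq == kept_freq and words[:len(kept_words)] == kept_words:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append((text, freq))
    return kept


filtered = filter_duplicates([
    ("Restaurant Subito Zürich", 1),
    ("Restaurant Subito", 1),
    ("Restaurant Zürich", 250),
])
```

The frequencies in the usage example are made up; only the relative order and the equal counts for the two Subito suggestions matter.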
How do we process popular queries? Popular is not just high frequency! Depends on the user's language: 4 languages are used in Switzerland; fail if we suggest bäckerei to a French-speaking user. Depends on location: fail if we suggest a hospital in Zurich to a user in Geneva. Misspellings: fail if we suggest both zürich and züruch. Number of unique users: fail if we suggest toan just because I searched for my name thousands of times. Blacklist: fail if we suggest f**k, pe**is.
Popular query processor. Preprocessing the query log: text normalization, stopword removal, blacklist filtering; keep only queries that return results.
A query log item in the Elasticsearch index:
{
  "q": "restaurant",
  "language": "de",
  "lon": 8.50646,
  "lat": 47.4192,
  "datetime": "2014-06-02 11:10:07",
  "user": "eeaad0c09abc41676c1c99530693"
}
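The preprocessing steps could look roughly like the following Python sketch. The stopword and blacklist sets, the normalization rules (lowercasing, stripping punctuation) and the function name are all placeholders chosen for illustration; the talk does not describe the actual rules.

```python
import re
import unicodedata

# Placeholder word lists; the real lists are not given in the talk.
STOPWORDS = {"in", "der", "die", "das", "le", "la", "the"}
BLACKLIST = {"badword"}


def preprocess(raw_query, num_results):
    """Return the normalized query, or None if it should be dropped."""
    # Keep only queries that returned results.
    if num_results <= 0:
        return None
    # Text normalization: canonical Unicode form, lowercase, no punctuation.
    text = unicodedata.normalize("NFC", raw_query).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    # Stopword removal.
    tokens = [token for token in text.split() if token not in STOPWORDS]
    # Blacklist filtering: drop the whole query if any token is blacklisted.
    if not tokens or any(token in BLACKLIST for token in tokens):
        return None
    return " ".join(tokens)
```

Only queries that survive this step would be written to the query log index as the `q` value.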
Find candidate popular queries for each language:
{
  "query": {
    "query_string": { "query": "language:" + language }
  },
  "facets": {
    "q": {
      "terms": { "field": "q.untouched", "size": TOP_POPULAR }
    }
  }
}
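Facets were deprecated in Elasticsearch 1.x and removed in 2.0 in favor of aggregations, so on a newer cluster the same request would use a `terms` aggregation. The sketch below builds such a request body as a Python dictionary; the function name and the default `top_n` value are assumptions, while the field names come from the request above.

```python
import json


def popular_candidates_body(language, top_n=1000):
    """Build a request body that returns the most frequent queries
    for one language using a terms aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "query": {
            "query_string": {"query": f"language:{language}"}
        },
        "aggs": {
            "q": {
                "terms": {"field": "q.untouched", "size": top_n}
            }
        },
    }


body = popular_candidates_body("de", top_n=500)
payload = json.dumps(body)  # string to send as the search request body
```

The buckets under `aggregations.q.buckets` in the response would then hold each candidate query with its document count.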
Find the number of unique users for a given query:
{
  "query": {
    "query_string": { "query": "q.untouched:" + query }
  },
  "aggs": {
    "num_users": {
      "cardinality": { "field": "user" }
    }
  }
}
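To make the purpose of the unique-user count concrete, here is a small in-memory Python sketch of the same idea: count distinct users per query and keep only queries searched by enough different people. It computes exact counts on a list of log items, whereas the Elasticsearch `cardinality` aggregation returns an approximate count (it is based on HyperLogLog++). The function names are hypothetical; the default threshold of 100 mirrors the `users:[100 TO *]` filter used in the Solr request later in the talk.

```python
from collections import defaultdict


def unique_users_per_query(log_items):
    """Map each query to the number of distinct users who issued it."""
    users_by_query = defaultdict(set)
    for item in log_items:
        users_by_query[item["q"]].add(item["user"])
    return {query: len(users) for query, users in users_by_query.items()}


def popular_enough(log_items, min_users=100):
    """Return the queries issued by at least min_users distinct users."""
    counts = unique_users_per_query(log_items)
    return {query for query, count in counts.items() if count >= min_users}


# One user repeating a query many times does not make it popular.
log = [{"q": "toan", "user": "u1"}] * 1000 + [
    {"q": "restaurant", "user": "u1"},
    {"q": "restaurant", "user": "u2"},
    {"q": "restaurant", "user": "u3"},
]
result = popular_enough(log, min_users=2)
```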
Bounding box to limit popular queries given location. Popular query: Chuv (Centre Hospitalier Universitaire Vaudois).
[Figure: histogram of the query chuv by frequency, longitude (5.95 to 10.45) and latitude (45.81 to 47.77); a region labelled 90% is marked. Map labels: 46.52,6.63; 46.5243,6.6397; 46.53,6.64.]
Percentiles aggregation to find the min and max values of the querying location:
{
  "query": {
    "match": { "q": { "query": "chuv" } }
  },
  "aggs": {
    "lat_outlier": {
      "percentiles": { "field": "lat", "percents": [5, 95] }
    },
    "lon_outlier": {
      "percentiles": { "field": "lon", "percents": [5, 95] }
    }
  }
}
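The effect of the 5th and 95th percentiles can be sketched locally in Python: taking them on latitude and longitude separately yields a bounding box that ignores the outermost query locations. This sketch uses a simple nearest-rank percentile on an in-memory list, which is a simplification; the Elasticsearch `percentiles` aggregation computes approximate values on the server. The function names and the sample coordinates are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bounding_box(points, low=5, high=95):
    """Return the box spanned by the low/high percentiles of the
    latitudes and longitudes of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "min_lat": percentile(lats, low),
        "max_lat": percentile(lats, high),
        "min_lon": percentile(lons, low),
        "max_lon": percentile(lons, high),
    }


# 96 made-up query locations near Lausanne plus 4 far-away outliers.
points = [(46.50 + 0.001 * i, 6.60 + 0.001 * i) for i in range(96)]
points += [(45.9, 6.0)] * 2 + [(47.4, 8.5)] * 2
box = bounding_box(points)
```

In this example the four outliers fall outside the resulting box, so a query that is popular around one city is not suggested to users far away.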
Popular query stored in the Solr index:
{
  "q": "chuv",
  "lang": ["de", "fr", "en"],
  "users": 7435,
  "min_lat": 46.2245,
  "max_lon": 7.3332,
  "max_lat": 46.9909,
  "min_lon": 6.29637,
  "freq": 9524
}
Solr request to suggest popular queries, restricted to queries whose bounding box contains the user's location: q:ch* lang:en users:[100 TO *] min_lat:[* TO " + user_lat + "] min_lon:[* TO " + user_lon + "] max_lat:[" + user_lat + " TO *] max_lon:[" + user_lon + " TO *] &sort=freq desc
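The range clauses express a point-in-box test: a stored query matches when its `min_lat`/`min_lon` are at or below the user's position and its `max_lat`/`max_lon` are at or above it. The Python sketch below builds the request parameters and mirrors the same test locally. The function names are hypothetical, and joining the clauses with an explicit `AND` is an assumption (the request on the slide relies on the configured default operator).

```python
def popular_query_params(prefix, lang, user_lat, user_lon, min_users=100, rows=10):
    """Build Solr parameters for popular-query suggestions near the user."""
    clauses = [
        f"q:{prefix}*",
        f"lang:{lang}",
        f"users:[{min_users} TO *]",
        f"min_lat:[* TO {user_lat}]",
        f"max_lat:[{user_lat} TO *]",
        f"min_lon:[* TO {user_lon}]",
        f"max_lon:[{user_lon} TO *]",
    ]
    return {"q": " AND ".join(clauses), "sort": "freq desc", "rows": rows}


def box_contains(doc, lat, lon):
    """Local equivalent of the four range clauses."""
    return (doc["min_lat"] <= lat <= doc["max_lat"]
            and doc["min_lon"] <= lon <= doc["max_lon"])


chuv = {"q": "chuv", "min_lat": 46.2245, "max_lat": 46.9909,
        "min_lon": 6.29637, "max_lon": 7.3332}
params = popular_query_params("ch", "en", 46.52, 6.63)
```

With the chuv document from the talk, a user at roughly 46.52, 6.63 falls inside the box, while a user at roughly 47.37, 8.54 does not.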
Evaluation: several metrics are used to evaluate the autosuggestion feature. Number of typed characters to get to the result list (average length of input: 10.0 chars; average length of suggestion: 15.4 chars). Number of clicks on suggested items. Average rank of clicked item.
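These metrics are simple averages over suggestion clicks. A minimal Python sketch, assuming a hypothetical click log in which each record holds the typed input, the clicked suggestion and its rank in the list (the record layout and function name are not from the talk):

```python
def evaluate(click_log):
    """Compute average input length, suggestion length and clicked rank."""
    if not click_log:
        raise ValueError("click_log must not be empty")
    count = len(click_log)
    return {
        "clicks": count,
        "avg_input_length": sum(len(e["typed"]) for e in click_log) / count,
        "avg_suggestion_length": sum(len(e["clicked"]) for e in click_log) / count,
        "avg_clicked_rank": sum(e["rank"] for e in click_log) / count,
    }


sample_log = [
    {"typed": "rest", "clicked": "Restaurant Bern", "rank": 1},
    {"typed": "chu", "clicked": "chuv", "rank": 3},
]
metrics = evaluate(sample_log)
```

The gap between the average suggestion length and the average input length approximates how many characters a click on a suggestion saves the user.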
[Figure: number of clicks on suggested items since the query history release; the release date is marked.]
[Figure: average rank of clicked item (y-axis from 0 to 2.5); the release of query history suggestion is marked.]
Conclusion. Requirement of an autosuggestion feature: reduce the number of user interactions with your application needed to get to the search results. We can combine 2 search frameworks to bring a better search experience to the user: Solr is efficient for querying, faceting and caching; Elasticsearch is efficient for big data aggregation and query log storage.
Contact information: search team at local.ch: toan.luu@localsearch.ch, cesar.fuentes@localsearch.ch, pascal.chollet@localsearch.ch