Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search



Toan Vinh Luu, PhD
Senior Search Engineer, local.ch AG

In this talk
- Requirements of an autosuggestion feature
- Autosuggestion architecture
- Evaluation

local.ch
- Local search engine in Switzerland (web, mobile)
- Each month: more than 4 million unique users and more than 8 million queries on mobile (iOS, Android, ...)
- Users search for:
  - Services (e.g. "restaurant zurich")
  - Resident information (e.g. "toan luu")
  - Phone numbers (e.g. 079574xxyy)
  - Addresses, weather, ...

Why is autosuggestion important?
With autosuggestion, a user taps the phone 8 times instead of 34 times to get to the result list when searching for "Electric installation Wallisellen".

What should we suggest to the user?

Popular data suggestion

Popular queries suggestion
- "mc donalds" has fewer entries than "muller" but is queried more than 10x as often.
- "cablecom" receives more than 2000 queries/month although it has only 1 entry.

Query history suggestion
- 9% of mobile queries are historical queries.
- 38% of users search again with a query they used in the past.

Spellchecker suggestion
- More than 700,000 misspelled queries per month on mobile (9% of mobile queries)

Detail entry suggestion

Special information suggestion

Autosuggestion architecture (diagram)
- Autosuggest API / Search API on top of four components: SuggestData component, Spellchecker component, Popular query component, Query history component
- Each component queries its own index
- Data sources: the local.ch database, and the query log, which the popular query processor turns into an index
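The diagram shows one API fanning out to several suggestion components. A minimal Python sketch of that shape, assuming each component returns plain strings for a prefix and the API simply concatenates the answers in component order with case-insensitive de-duplication; the interface, the merge order and the `StaticComponent` stand-in for a real index are all hypothetical, not local.ch's code.

```python
from dataclasses import dataclass
from typing import Protocol


class SuggestComponent(Protocol):
    """Anything that can return suggestions for a prefix (hypothetical interface)."""

    name: str

    def suggest(self, prefix: str, limit: int) -> list[str]: ...


@dataclass
class StaticComponent:
    """Toy component backed by an in-memory list instead of a real index."""

    name: str
    entries: list[str]

    def suggest(self, prefix: str, limit: int) -> list[str]:
        p = prefix.lower()
        return [e for e in self.entries if e.lower().startswith(p)][:limit]


class AutosuggestAPI:
    """Fans a prefix out to every component and merges the answers in order."""

    def __init__(self, components: list[SuggestComponent]) -> None:
        self.components = components

    def suggest(self, prefix: str, limit: int = 10) -> list[tuple[str, str]]:
        merged: list[tuple[str, str]] = []
        seen: set[str] = set()
        for component in self.components:
            for text in component.suggest(prefix, limit):
                key = text.lower()  # case-insensitive de-duplication across components
                if key in seen:
                    continue
                seen.add(key)
                merged.append((component.name, text))
                if len(merged) >= limit:
                    return merged
        return merged


api = AutosuggestAPI([
    StaticComponent("history", ["Restaurant Subito"]),
    StaticComponent("popular", ["restaurant subito", "Restaurant Zürich"]),
])
print(api.suggest("rest"))
```

Tagging each suggestion with the component name keeps it possible for the client to render history, popular and data suggestions differently.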

Data suggestion
Pre-generating suggested queries from the data.
Entry: Name: Subito; Category: Restaurant; Street: Konradstrasse; Zipcode: 8005; City: Zürich
Possible suggested queries: "Restaurant Subito", "Restaurant Zürich", "Restaurant", "Subito", "Restaurant Subito Zürich", "Konradstrasse, 8005 Zürich", "Zürich"
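Pre-generation can be expressed as a fixed set of field templates applied to every entry. In this Python sketch the template list is hypothetical, chosen only so that it reproduces the Subito example; the real combination rules are not given in the talk.

```python
# Hypothetical templates chosen so that they reproduce the Subito example.
TEMPLATES = [
    "{category} {name}",
    "{category} {city}",
    "{category}",
    "{name}",
    "{category} {name} {city}",
    "{street}, {zipcode} {city}",
    "{city}",
]


def suggested_queries(entry: dict[str, str]) -> list[str]:
    """Expand one directory entry into its pre-generated suggested queries."""
    queries: list[str] = []
    for template in TEMPLATES:
        try:
            query = template.format(**entry)
        except KeyError:
            continue  # skip templates whose fields this entry lacks
        if query not in queries:
            queries.append(query)
    return queries


entry = {
    "name": "Subito",
    "category": "Restaurant",
    "street": "Konradstrasse",
    "zipcode": "8005",
    "city": "Zürich",
}
print(suggested_queries(entry))
```

Skipping templates with missing fields lets the same list serve entries that have, for example, no street address.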

Compute data popularity
- Use faceting to get suggested queries sorted by frequency.
- This approach guarantees near real-time suggestions.
- Suggested queries are copied to 2 fields:
  - Search field: used for matching; analyzers and a tokenizer are applied.
  - Facet field: used for display and for computing frequency.
- Examples:
  q=restaurant zu* => suggests "Restaurant Zürich"
  q=zurich restau* => suggests "Restaurant Zürich"
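The two-field idea can be imitated in memory: an analyzed view of each suggested query (lowercased, diacritics folded, so "zurich" matches "Zürich") is used for matching, while the untouched display string is counted like a facet value. A minimal Python sketch, assuming each suggested query is indexed as its own document so that its frequency is the number of entries that generated it; the function names are hypothetical and this is not Solr's implementation.

```python
import unicodedata
from collections import Counter


def analyze(text: str) -> list[str]:
    """Search-field view: lowercase, fold diacritics ("Zürich" -> "zurich"), tokenize."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.replace(",", " ").split()


def matches(query: str, suggestion: str) -> bool:
    """All query tokens must occur in any order; a trailing * makes the last one a prefix."""
    is_prefix = query.endswith("*")
    q_tokens = analyze(query.rstrip("*"))
    s_tokens = analyze(suggestion)
    if not q_tokens:
        return False
    full, last = (q_tokens[:-1], q_tokens[-1]) if is_prefix else (q_tokens, None)
    if not all(t in s_tokens for t in full):
        return False
    return last is None or any(s.startswith(last) for s in s_tokens)


def suggest(query: str, facet_values: list[str], limit: int = 5) -> list[tuple[str, int]]:
    """Facet-field view: count untouched display strings of matching documents."""
    counts = Counter(v for v in facet_values if matches(query, v))
    return counts.most_common(limit)


docs = ["Restaurant Zürich", "Restaurant Zürich", "Restaurant Zug", "Restaurant Subito", "Zürich"]
print(suggest("restaurant zu*", docs))
print(suggest("zurich restau*", docs))
```

Keeping the display string separate from the analyzed tokens is what lets the sketch match "zurich" while still showing "Restaurant Zürich".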

Improvements
- Faceting is expensive for short prefix queries, so the suggestions for all queries of 1 or 2 characters are stored in a cache.
- Filter duplicated suggestions: "Restaurant Subito" and "Restaurant Subito Zürich" refer to 1 entity if they have the same frequency, so only 1 suggestion is kept.
- Store location and language with the suggested queries to filter out suggestions that are irrelevant to the user.
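Both improvements are easy to sketch. In this Python version, two suggestions count as the same entity when they have the same frequency and one's tokens are a subset of the other's, and the shorter one is kept; which variant to keep is an assumption. The cache eagerly precomputes every 1- and 2-character prefix over a given alphabet; `ShortPrefixCache` and the alphabet parameter are hypothetical.

```python
import itertools
from typing import Callable

Suggestion = tuple[str, int]  # (display text, frequency)


def dedupe(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Drop suggestions that are token subsets/supersets of a kept one with equal frequency."""
    # Sort by frequency (desc), then by length so the shorter variant wins ties.
    ordered = sorted(suggestions, key=lambda s: (-s[1], len(s[0])))
    kept: list[Suggestion] = []
    for text, freq in ordered:
        tokens = set(text.lower().split())
        is_dup = any(
            freq == k_freq and (tokens <= set(k.lower().split()) or set(k.lower().split()) <= tokens)
            for k, k_freq in kept
        )
        if not is_dup:
            kept.append((text, freq))
    return kept


class ShortPrefixCache:
    """Precomputes suggestions for every prefix of length 1..max_len over an alphabet."""

    def __init__(self, compute: Callable[[str], list[Suggestion]], alphabet: str, max_len: int = 2) -> None:
        self._compute = compute
        self._max_len = max_len
        self._cache = {
            "".join(chars): compute("".join(chars))
            for n in range(1, max_len + 1)
            for chars in itertools.product(alphabet, repeat=n)
        }

    def get(self, prefix: str) -> list[Suggestion]:
        if len(prefix) <= self._max_len and prefix in self._cache:
            return self._cache[prefix]  # short prefix: no faceting needed
        return self._compute(prefix)  # longer prefix: run the (expensive) facet query
```

Usage: `dedupe([("Restaurant Subito Zürich", 12), ("Restaurant Subito", 12)])` keeps only `("Restaurant Subito", 12)`.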

How do we process popular queries?
Popular is not just high frequency! A popular query suggestion must take into account:
- The user's language: 4 languages are used in Switzerland. We fail if we suggest "bäckerei" to a French-speaking user.
- Location: we fail if we suggest a hospital in Zurich to a user in Geneva.
- Misspellings: we fail if we suggest both "zürich" and "züruch".
- Number of unique users: we fail if we suggest "toan" just because I searched for my name thousands of times.
- Blacklist: we fail if we suggest "f**k" or "pe**is".
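The talk does not say how misspelled variants such as "züruch" are detected. One hypothetical approach, sketched here only as an assumption, is to fold a rare variant into a more frequent query within a small Levenshtein distance before ranking:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def merge_misspellings(counts: dict[str, int], max_distance: int = 1) -> dict[str, int]:
    """Add each query's count to the first more frequent query within max_distance."""
    merged: dict[str, int] = {}
    for query, freq in sorted(counts.items(), key=lambda kv: -kv[1]):
        target = next((m for m in merged if levenshtein(query, m) <= max_distance), None)
        if target is None:
            merged[query] = freq  # a new canonical query
        else:
            merged[target] += freq  # treat as a misspelling of the frequent form
    return merged


print(merge_misspellings({"zürich": 900, "züruch": 40, "zug": 300}))
```

A distance of 1 can also merge genuinely different short queries (for example "zug" and "zur"), so a real system would need extra safeguards such as a minimum query length.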

Popular query processor
Preprocessing the query log: text normalization, stopword removal, blacklist filtering, and keeping only queries that return results.
A query log item in the Elasticsearch index:
{
  "q": "restaurant",
  "language": "de",
  "lon": 8.50646,
  "lat": 47.4192,
  "datetime": "2014-06-02 11:10:07",
  "user": "eeaad0c09abc41676c1c99530693"
}
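The preprocessing rules can be sketched as one function per log line plus a builder for the log document. The stopword and blacklist sets below are hypothetical placeholders, and the rule of dropping the whole query when any token is blacklisted is an assumption; only the field names come from the log item above.

```python
from datetime import datetime
from typing import Optional

# Hypothetical word lists: the real stopword list and blacklist are not given.
STOPWORDS = {"der", "die", "das", "le", "la", "les", "the"}
BLACKLIST = {"badword"}


def preprocess(raw_query: str, num_results: int) -> Optional[str]:
    """Normalize one logged query, or return None if it must be dropped."""
    if num_results <= 0:
        return None  # keep only queries that returned results
    tokens = raw_query.lower().split()  # lowercase and collapse whitespace
    if any(t in BLACKLIST for t in tokens):
        return None  # drop the whole query if it contains a blacklisted word
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens) or None  # None if only stopwords were left


def log_item(q: str, language: str, lon: float, lat: float, when: datetime, user: str) -> dict:
    """Build a query log document shaped like the one shown on the slide."""
    return {
        "q": q,
        "language": language,
        "lon": lon,
        "lat": lat,
        "datetime": when.strftime("%Y-%m-%d %H:%M:%S"),
        "user": user,
    }
```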

Find candidate popular queries for each language:
{
  "query" : { "query_string" : { "query" : "language:" + language } },
  "facets" : { "q" : { "terms" : { "field" : "q.untouched", "size" : TOP_POPULAR } } }
}
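Facets were removed in Elasticsearch 2.0 in favour of aggregations, so a present-day version of the request would use a `terms` aggregation instead. The Python sketch below builds such a body (assuming `language` is indexed as an exact value) and, for illustration, computes the same top-N list in memory; the function names are hypothetical.

```python
from collections import Counter


def popular_candidates_request(language: str, top_n: int) -> dict:
    """Request body using a terms aggregation (the successor of the terms facet)."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {"language": language}},
        "aggs": {"q": {"terms": {"field": "q.untouched", "size": top_n}}},
    }


def popular_candidates(log_items: list[dict], language: str, top_n: int) -> list[tuple[str, int]]:
    """In-memory equivalent: most frequent raw queries for one language."""
    counts = Counter(item["q"] for item in log_items if item["language"] == language)
    return counts.most_common(top_n)


items = [
    {"q": "restaurant", "language": "de"},
    {"q": "restaurant", "language": "de"},
    {"q": "bäckerei", "language": "de"},
    {"q": "boulangerie", "language": "fr"},
]
print(popular_candidates(items, "de", 1))
```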

Find the number of unique users for a given query:
{
  "query" : { "query_string" : { "query" : "q.untouched:\"" + query + "\"" } },
  "aggs" : { "num_users" : { "cardinality" : { "field" : "user" } } }
}
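The unique-user count is what stops a single user from making "toan" popular. Elasticsearch's `cardinality` aggregation returns an approximate count (it is based on HyperLogLog++); the Python sketch below computes the exact count in memory and applies a threshold of 100 users, taken from the `users:[100 TO *]` filter in the Solr request. Using a `term` query on `q.untouched` is an assumption that avoids query-string parsing of multi-word queries; the function names are hypothetical.

```python
MIN_USERS = 100  # threshold taken from the users:[100 TO *] filter in the Solr request


def unique_users_request(query: str) -> dict:
    """Request body: exact-match the raw query, count distinct users (approximate in ES)."""
    return {
        "size": 0,
        "query": {"term": {"q.untouched": query}},
        "aggs": {"num_users": {"cardinality": {"field": "user"}}},
    }


def unique_users(log_items: list[dict], query: str) -> int:
    """Exact in-memory count of distinct users who issued the query."""
    return len({item["user"] for item in log_items if item["q"] == query})


def is_popular_enough(log_items: list[dict], query: str, min_users: int = MIN_USERS) -> bool:
    return unique_users(log_items, query) >= min_users


items = [{"q": "toan", "user": "u1"}] * 1000 + [{"q": "chuv", "user": f"u{i}"} for i in range(150)]
print(unique_users(items, "toan"), unique_users(items, "chuv"))
```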

Bounding box to limit popular queries given location
[Figure: histogram of the popular query "chuv" (Centre Hospitalier Universitaire Vaudois) by frequency (0 to 300), longitude (5.95 to 10.45) and latitude (45.81 to 47.77). A 90% region is marked around 46.52, 6.63.]

Percentiles aggregation to find the min and max values of the querying location (the 5th and 95th percentiles, so outliers are ignored):
{
  "query" : { "match" : { "q" : { "query" : "chuv" } } },
  "aggs" : {
    "lat_outlier" : { "percentiles" : { "field" : "lat", "percents" : [5, 95] } },
    "lon_outlier" : { "percentiles" : { "field" : "lon", "percents" : [5, 95] } }
  }
}
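Taking the 5th and 95th percentiles instead of the true min and max means a handful of far-away searches cannot stretch the box across the whole country. A Python sketch of the same computation, assuming a linear-interpolation percentile (Elasticsearch estimates percentiles with TDigest, so its values may differ slightly); `percentile` and `bounding_box` are hypothetical helpers.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Linear-interpolation percentile over the sorted values."""
    if not values:
        raise ValueError("percentile of an empty list")
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def bounding_box(points: list[tuple[float, float]], low: float = 5, high: float = 95) -> dict[str, float]:
    """Box from (lat, lon) points, trimming the outer 5% on each side by default."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "min_lat": percentile(lats, low),
        "max_lat": percentile(lats, high),
        "min_lon": percentile(lons, low),
        "max_lon": percentile(lons, high),
    }


# 99 searches near Lausanne plus one outlier near Zurich.
points = [(46.52, 6.63)] * 99 + [(47.37, 8.54)]
print(bounding_box(points))
```

With these points the single Zurich search is trimmed away, and the box stays around the 46.52, 6.63 searches.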

Popular query stored in the Solr index:
{
  "q": "chuv",
  "lang": ["de", "fr", "en"],
  "users": 7435,
  "min_lat": 46.2245,
  "max_lon": 7.3332,
  "max_lat": 46.9909,
  "min_lon": 6.29637,
  "freq": 9524
}

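Solr request to suggest popular queries (here for the prefix "ch" and an English-speaking user); it returns only queries searched by at least 100 unique users whose bounding box contains the user's location, sorted by frequency:
q=q:ch* AND lang:en AND users:[100 TO *]
  AND min_lat:[* TO " + user_lat + "] AND min_lon:[* TO " + user_lon + "]
  AND max_lat:[" + user_lat + " TO *] AND max_lon:[" + user_lon + " TO *]
&sort=freq desc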
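The request can be generated from the user's prefix, language and position, and the same conditions can be checked directly against a stored document such as the "chuv" one. A Python sketch, assuming the prefix is already escaped for the Solr query parser and simplifying the prefix match to the whole stored string; the function names are hypothetical.

```python
from urllib.parse import urlencode


def popular_query_params(prefix: str, lang: str, user_lat: float, user_lon: float,
                         min_users: int = 100) -> dict[str, str]:
    """Solr parameters mirroring the request above (prefix assumed pre-escaped)."""
    clauses = [
        f"q:{prefix}*",
        f"lang:{lang}",
        f"users:[{min_users} TO *]",
        f"min_lat:[* TO {user_lat}]",   # box starts south of the user
        f"min_lon:[* TO {user_lon}]",   # box starts west of the user
        f"max_lat:[{user_lat} TO *]",   # box ends north of the user
        f"max_lon:[{user_lon} TO *]",   # box ends east of the user
    ]
    return {"q": " AND ".join(clauses), "sort": "freq desc"}


def matches_popular_query(doc: dict, prefix: str, lang: str, user_lat: float, user_lon: float,
                          min_users: int = 100) -> bool:
    """Same conditions evaluated in memory against one stored document."""
    return (
        doc["q"].startswith(prefix)
        and lang in doc["lang"]
        and doc["users"] >= min_users
        and doc["min_lat"] <= user_lat <= doc["max_lat"]
        and doc["min_lon"] <= user_lon <= doc["max_lon"]
    )


chuv = {"q": "chuv", "lang": ["de", "fr", "en"], "users": 7435, "min_lat": 46.2245,
        "max_lon": 7.3332, "max_lat": 46.9909, "min_lon": 6.29637, "freq": 9524}
print(urlencode(popular_query_params("ch", "en", 46.52, 6.63)))
```

The location and user-count clauses do not depend on the typed prefix, so in Solr they could also be sent as separate `fq` filter queries to benefit from the filter cache.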

Evaluation
Several metrics are used to evaluate the autosuggestion feature:
- Number of typed characters needed to get to the result list (average length of input: 10.0 chars; average length of suggestion: 15.4 chars)
- Number of clicks on suggested items
- Average rank of the clicked item
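All three metrics can be computed from a log of suggestion sessions; with the averages above, a clicked suggestion saves 15.4 - 10.0 = 5.4 characters on average. The session schema in this Python sketch is hypothetical, and the metrics are computed over clicked sessions only, which is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class SuggestSession:
    """One search session (hypothetical log schema)."""

    typed: str              # characters the user typed before choosing
    clicked: Optional[str]  # suggestion the user tapped, if any
    rank: Optional[int]     # 1-based position of the tapped suggestion


def evaluate(sessions: list[SuggestSession]) -> dict:
    clicked = [s for s in sessions if s.clicked is not None and s.rank is not None]
    if not clicked:
        return {"num_clicks": 0, "avg_input_len": None, "avg_suggestion_len": None,
                "avg_chars_saved": None, "avg_clicked_rank": None}
    avg_input = mean(len(s.typed) for s in clicked)
    avg_suggestion = mean(len(s.clicked) for s in clicked)
    return {
        "num_clicks": len(clicked),
        "avg_input_len": avg_input,
        "avg_suggestion_len": avg_suggestion,
        "avg_chars_saved": avg_suggestion - avg_input,
        "avg_clicked_rank": mean(s.rank for s in clicked),
    }


sessions = [
    SuggestSession("rest", "Restaurant Zürich", 1),
    SuggestSession("chu", "chuv", 3),
    SuggestSession("zzz", None, None),
]
print(evaluate(sessions))
```

A lower average clicked rank means the best suggestion tends to be shown nearer the top of the list.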

[Figure: number of clicks on suggested items over time, with the release date of query history suggestion marked.]

[Figure: average rank of the clicked item over time (y-axis 0 to 2.5), with the release of query history suggestion marked.]

Conclusion
- Requirement of an autosuggestion feature: it reduces the number of interactions a user needs with your application to get search results.
- Two search frameworks can be combined to bring a better search experience to users:
  - Solr is efficient for querying, faceting and caching.
  - Elasticsearch is efficient for big-data aggregation and query log storage.

Contact information: search team at local.ch
toan.luu@localsearch.ch
cesar.fuentes@localsearch.ch
pascal.chollet@localsearch.ch