Search Big Data with MySQL and Sphinx
Mindaugas Žukas
www.ivinco.com
Agenda
- Big Data Architecture: Factors and Technologies
- MySQL and Big Data
- Sphinx Search Server overview
- Case study: building a Big Data search engine
  - System Overview
  - Web Layer Architecture
  - Data Store Architecture
  - Distributed Search Engine Architecture
  - System monitoring overview
Big Data Architecture: Technologies
- There are many different technologies for data storage and analytics
- No one tool is the right fit for all Big Data problems
- More than 50% of Big Data projects fail
- With Big Data and high-load projects, people reinvent the wheel every day... and it is OK*
* Consult the Big Data experts in your circle
Big Data Architecture: Factors
- Data characteristics (volume, velocity, variety, quality, complexity, etc.)
- Database workloads (I/O patterns, OLTP/analytics/mixed, real-time/batch, etc.)
- Planned use cases
- Available hardware choices and capabilities (cloud, commodity hardware, state-of-the-art hardware)
- Business requirements
- System quality attributes (accuracy, efficiency, scalability, reliability, availability, maintainability, security, etc.)
- Team size and experience level
- Budget constraints
- Time constraints
- And many more
MySQL and Big Data
MySQL and Big Data
Why MySQL?
- Open source
- Actively developed and supported
- Very popular, with a large community
- Many developers are familiar with MySQL
- Easier to hire experts
MySQL and Big Data
Typical use cases:
- Hadoop (or another) data store + MySQL for aggregated data and reports
- Sharded MySQL used as a Big Data store
What is Sharding?
- Without sharding: all data lives in a single MySQL database on one state-of-the-art server
- With sharding: data is distributed over a number of MySQL databases with the same structure, on a number of commodity servers
MySQL Sharding
To shard or not to shard?
- You can keep quite large amounts of data (multi-TB) in MySQL with high performance and without sharding
- Sharding adds complexity
- But if you have serious large-scale database growth plans (Big Data), sharding may be the only option (not just with MySQL)
Sharding methods:
- By ID range, by hash, by function, via a look-up table
- Using modern tools such as Oracle MySQL Fabric, ScaleBase, Vitess, Jetpants
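The four routing methods listed above can be sketched in Python as plain functions that map a key to a shard number. This is a minimal illustration, not the case study's code: the shard count, range bounds, month-based rule and look-up entries are all hypothetical, and tools such as Vitess or MySQL Fabric implement routing in their own way.

```python
import hashlib
from bisect import bisect_right
from datetime import date

NUM_SHARDS = 16  # hypothetical shard count

# By ID range: exclusive upper bound of each shard's ID range (hypothetical values).
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def shard_by_range(doc_id: int) -> int:
    """Shard i holds IDs in [RANGE_BOUNDS[i-1], RANGE_BOUNDS[i])."""
    idx = bisect_right(RANGE_BOUNDS, doc_id)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"id {doc_id} is beyond the last configured range")
    return idx


def shard_by_hash(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash: unlike Python's built-in hash(), MD5 gives the same result in every process."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def shard_by_function(published: date, num_shards: int = NUM_SHARDS) -> int:
    """Any deterministic rule works; hypothetically, consecutive months go to consecutive shards."""
    return (published.year * 12 + published.month - 1) % num_shards


# Look-up table: explicit key -> shard mapping kept in a central database (hypothetical entries).
LOOKUP = {"forum.example.com": 3, "blog.example.org": 7}


def shard_by_lookup(site: str) -> int:
    return LOOKUP[site]
```

Range and look-up routing make rebalancing easier (move a range or edit one row), while hash routing spreads load evenly but forces data movement when the shard count changes.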
Sphinx Search Server
Sphinx Overview
- Implements advanced search features
- Enables apps to respond faster
- Scales to billions of documents
- Quick to learn, easy to use, simple to maintain
Why Sphinx?
- Speed: 10x-1000x faster than built-in full-text search (MySQL, Postgres, MS SQL, Oracle, ...)
- Real-time indexes
- Feature-rich search: relevancy, synonyms, stopwords, indexing of multiple languages, 3rd-party linguistic libraries, etc.
- Scalable: aggregates search results from thousands of boxes (largest known installation: 1200+ boxes); 300M queries per day on Craigslist.org
- Built-in high availability / load balancing
- Easy to integrate: SphinxQL
- Great documentation, easy learning curve
Why Sphinx?
- Boolean search (AND, OR, NOT): hello world, hello & world, hello | world, hello -world
- Per-field search: @title hello @body world
- Field combination: @(title, body) hello world
- Search within the first N words of a field: @body[50] hello
- Phrase search: "hello world"
- Per-field relevancy ranking weights
- Proximity search: "hello world"~10
- Word distance: hello NEAR/10 world
- Quorum matching
- GEO distance search (with syntax for mi/km/m)
- Attributes (integers, floats, strings) added to the index can be used with WHERE, ORDER BY, GROUP BY
- Many more
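A small Python helper can assemble the query forms above from user input. This is a hypothetical sketch: the function names are illustrative, and the set of escaped characters is an assumption modelled on the escape list of Sphinx's bundled client APIs, so check it against the documentation of your Sphinx version.

```python
import re

# Characters that act as operators in Sphinx's extended query syntax.
# Assumption: mirrors the escape list of Sphinx's client APIs; verify for your version.
_SPECIAL = re.compile(r'([\\()|\-!@~"&/^$=<])')


def escape(text: str) -> str:
    """Backslash-escape operator characters in user-supplied text."""
    return _SPECIAL.sub(r"\\\1", text)


def field(name: str, query: str) -> str:
    """Restrict a query to one field: @title hello."""
    return f"@{name} {query}"


def phrase(*words: str) -> str:
    """Exact phrase: "hello world"."""
    return '"' + " ".join(escape(w) for w in words) + '"'


def proximity(words: list, distance: int) -> str:
    """Proximity search: "hello world"~10 (distance in words; see the Sphinx docs for the exact span rule)."""
    return f"{phrase(*words)}~{distance}"


def quorum(words: list, min_matches: int) -> str:
    """Quorum matching: at least min_matches of the words must occur, e.g. "a b c"/2."""
    return f"{phrase(*words)}/{min_matches}"


def near(left: str, right: str, distance: int) -> str:
    """Word-distance operator: hello NEAR/10 world."""
    return f"{escape(left)} NEAR/{distance} {escape(right)}"
```

The resulting string goes inside MATCH() in a SphinxQL query; it still needs SQL-level quoting (or a parameterized query), because user text can contain single quotes and backslashes.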
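Sphinx At Work
1. The application sends a search query to the Sphinx daemon
2. Sphinx returns the search results (document IDs)
3. The application fetches the documents by ID from the database
The Sphinx indexer builds the Sphinx index from the database.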
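The search-then-fetch flow above can be sketched as a small Python function. The two callables are assumptions standing in for a SphinxQL query (returning ranked IDs) and a MySQL SELECT ... WHERE id IN (...); the point of the sketch is that the database does not return rows in Sphinx's ranking order, so that order has to be restored by ID.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def search_then_fetch(
    query: str,
    search_ids: Callable[[str], List[int]],                # hypothetical: SphinxQL MATCH() query
    fetch_rows: Callable[[Sequence[int]], List[Row]],      # hypothetical: SELECT ... WHERE id IN (...)
) -> List[Row]:
    ids = search_ids(query)  # steps 1-2: Sphinx returns matching IDs, best first
    if not ids:
        return []
    rows = fetch_rows(ids)  # step 3: fetch the documents by ID from the database
    by_id = {row["id"]: row for row in rows}
    # Restore Sphinx's ranking; skip IDs deleted from the DB since the last indexing run.
    return [by_id[doc_id] for doc_id in ids if doc_id in by_id]
```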
Sphinx Advanced Indexing
1. The application sends a search query to a Sphinx forwarder
2. The forwarder queries all indexes and aggregates the results
3. The search results (IDs) are returned to the application
- Main index: re-index the whole (big) database once in a while, then reset the delta index
- Delta index: re-index often, covering only new records and updates
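A simplified in-memory model of what the forwarder does when it queries the main and delta indexes together. Real Sphinx does this inside a distributed index and uses kill-lists so that documents re-indexed in the delta hide their stale copies in the main index; the function name, the (id, weight) tuples and the tie-break on ID are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[int, float]  # (document ID, relevance weight)


def merge_main_delta(
    main_hits: Iterable[Hit],
    delta_hits: Iterable[Hit],
    killed_ids: Iterable[int] = (),
    limit: int = 20,
) -> List[Hit]:
    delta = dict(delta_hits)
    # Like a kill-list: IDs updated in the delta, or deleted, are dropped from the older main index.
    suppressed = set(delta) | set(killed_ids)
    merged: Dict[int, float] = {doc: w for doc, w in main_hits if doc not in suppressed}
    merged.update(delta)  # the fresher delta copy wins
    # Best weight first; equal weights ordered by ID for a stable result.
    return sorted(merged.items(), key=lambda hit: (-hit[1], hit[0]))[:limit]
```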
Sphinx Indexing (Character Level)
charset_table defines which characters matter:
- Ranges: a..z, U+410..U+42F
- Character mapping: A->a, A..Z->a..z
ngram_chars indexes CJK ideographs ("hieroglyphs") as separate tokens:
- Input: 我喜歡iphone, 這是一個偉大的手機
- With ngram_chars = U+3000..U+2FA1F: 我 喜 歡 iphone 這 是 一 個 偉 大 的 手 機
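A toy Python tokenizer that imitates the two settings above: characters inside the ngram_chars range become one-character tokens, and the remaining word characters are folded to lower case. The folding rules (A..Z->a..z and, as an assumed common choice, U+410..U+42F->U+430..U+44F) are hypothetical; Sphinx's own tokenizer is far more configurable, and in a real config ngram_chars is normally paired with ngram_len = 1.

```python
from typing import List

NGRAM_LO, NGRAM_HI = 0x3000, 0x2FA1F  # ngram_chars = U+3000..U+2FA1F


def fold(ch: str) -> str:
    """Hypothetical charset_table: A..Z->a..z, U+410..U+42F->U+430..U+44F; '' if the char does not matter."""
    cp = ord(ch)
    if "A" <= ch <= "Z" or 0x410 <= cp <= 0x42F:
        return chr(cp + 0x20)
    if "a" <= ch <= "z" or "0" <= ch <= "9" or 0x430 <= cp <= 0x44F:
        return ch
    return ""


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    word: List[str] = []

    def flush() -> None:
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if NGRAM_LO <= ord(ch) <= NGRAM_HI:
            flush()            # an ideograph ends the current word...
            tokens.append(ch)  # ...and is indexed as a token of its own
        elif fold(ch):
            word.append(fold(ch))
        else:
            flush()            # any other character acts as a separator
    flush()
    return tokens
```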
Sphinx Indexing (Word Level)
Stopwords: on, a, the, my
- "search on my site" = "search on a site" = "search on the site" = "search my site"
Exceptions and wordforms:
- U.S.A. => USA, U.S. => USA, US => USA, United States => USA
- Cases where stopwords or tokenization need care: Vitamin a, The Matrix, AT&T (exception: AT&T => AT&T)
Stemming:
- does => do
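The word-level rules above can be imitated with a short Python normalizer: exceptions are matched on the raw text first (so AT&T survives as one token), then the text is split into words, wordforms are mapped, and stopwords are dropped. The rule tables come from the slide, but Sphinx's exact semantics (case sensitivity, multi-word wordforms, real stemming) are simplified: "United States" is handled as an exception only for brevity, and stemming is reduced to the single does => do wordform. The last two tests show the pitfall the slide hints at: "Vitamin a" and "The Matrix" lose a stopword.

```python
import re
from typing import List

STOPWORDS = {"on", "a", "the", "my"}
EXCEPTIONS = {"U.S.A.": "usa", "U.S.": "usa", "United States": "usa", "AT&T": "at&t"}
WORDFORMS = {"us": "usa", "does": "do"}

# Longest exception first, so "U.S.A." wins over "U.S.".
_EXCEPTION_RE = re.compile("|".join(re.escape(k) for k in sorted(EXCEPTIONS, key=len, reverse=True)))


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(text: str) -> List[str]:
    tokens: List[str] = []
    pos = 0
    for m in _EXCEPTION_RE.finditer(text):  # exceptions: raw-text match, emitted as one token
        tokens += _words(text[pos:m.start()])
        tokens.append(EXCEPTIONS[m.group(0)])
        pos = m.end()
    tokens += _words(text[pos:])
    return [WORDFORMS.get(t, t) for t in tokens if t not in STOPWORDS]
```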
Sphinx and Big Text
Sphinx can use 3rd-party tools to work with Big Text.
Chinese phrase ("I like the iphone, it is a great phone"): 我喜歡iphone, 這是一個偉大的手機
- Using ngram_chars (ngram_chars = U+3000..U+2FA1F), each ideograph is indexed as a separate token:
  我 喜 歡 iphone 這 是 一 個 偉 大 的 手 機
- Using Basis Rosette linguistics technology, 的 is also recognized as a stopword:
  我 喜 歡 iphone 這 是 一 個 偉 大 stopword(的) 手 機
Sphinx Lithuanian Stemming Example
select * from LTtest where match('lietuvai');
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+
select * from LTtest where match('žemės');
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
+------+
Case Study: Building a Big Data Search Engine
Some Stats
- MySQL stores 120TB of compressed data and growing (tens of billions of text documents)
- Incoming data: up to 5,000 new docs/s
- Data indexing latency: under 5 minutes
- The system uses 200+ servers (about half of them for HA/redundancy)
- Up to 25,000 queries per second on the main MySQL DB server
- API responses vary from small result sets with a few documents to tens of megabytes of result data
The Architecture - Factors
Prerequisites:
- High performance
- High load
- High availability
- Scalability (keep up with fast-growing data and usage)
- Near real-time (low latency)
- Feature-rich, high-quality search (multi-language, boolean, relevancy, synonyms, etc.)
- Efficiency and maintainability (unreasonably small budget, small team, commodity hardware)
- Lots of structured data (forums, blogs, comments, news, Twitter, etc.)
The Architecture - Technologies
Main technologies:
- CentOS
- pfSense
- Squid
- Apache
- Percona MySQL Server
- PHP
- Java
- Memcached
- RabbitMQ
- Kafka
- Sphinx Search Server (+ Basis Rosette Linguistics)
Building a Big Data Search Engine
The Web Layer
- Data Flow
- Technologies
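Web Application (Data Flow)
1. The user sends a search query to the web layer (firewall -> load balancers/cache -> application web servers)
2. The application web servers run the search query on the search engine (Sphinx clusters) and get the matching doc IDs
3. The application web servers fetch the documents by ID from the data store
4. The search results are returned to the user
- Data store: data collection into the MySQL clusters (data DB clusters and the Main DB)
- Indexing: indexers build the indexes for the Sphinx clusters from the data store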
Web Layer (Technologies)
- Firewall: pfSense 1 and pfSense 2 in failover
- Load balancing across Squid 1-4 (cache)
- Load balancing across the application web servers Web 1 ... Web N
- The web servers use the MySQL data store, the MySQL Main DB, Memcached, and the distributed Sphinx search index
- Logs feed analytics/monitoring
Building a Big Data Search Engine
The Data Store
- Structure
- Sharding
- High Availability and Backups
- Data Loading
MySQL Data Store
- MySQL (Percona Server)
- Sharding: Main DB cluster + distributed data storage clusters
- Currently stores more than 120TB
- Scalability (scale out / scale up)
- High availability (MySQL replication; Percona Replication Manager)
MySQL Data Store Structure
- Different data types are stored in separate data groups (e.g., Blogs, Twitter, Forums), each with its own DB clusters 1..N
- Within a data group, data is split into a number of shards (MySQL databases)
- Shards are distributed over a number of DB clusters (e.g., in Data Group N, DB Clusters 1..N each host a subset of Shard 1, Shard 2, ..., Shard 15, Shard 16, ..., Shard X, Shard Y, Shard Z)
- Sharding/routing information is stored in the Main DB or defined algorithmically (hash)
- The Main DB high-availability cluster holds system service data and sharding metadata
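The two-level placement described above (document -> shard by hash, shard -> DB cluster by metadata) can be sketched in Python. Everything here is hypothetical: the shard and cluster counts, the contiguous block assignment and the "{group}db{N}" naming (borrowed from the {DataType}dbN pattern on the next slide) stand in for metadata that the case study keeps in the Main DB.

```python
import zlib
from dataclasses import dataclass
from typing import Dict

# Hypothetical sharding metadata; in the case study it lives in the Main DB.
SHARDS = {"forums": 256, "twitter": 256}  # shards per data group
CLUSTERS = {"forums": 16, "twitter": 8}   # DB clusters per data group


@dataclass(frozen=True)
class Location:
    data_group: str
    shard: int
    cluster: str


def build_cluster_map(group: str, num_shards: int, num_clusters: int) -> Dict[int, str]:
    """Assign contiguous blocks of shards to clusters named like '{group}db{N}'."""
    per_cluster = -(-num_shards // num_clusters)  # ceiling division
    return {s: f"{group}db{s // per_cluster + 1}" for s in range(num_shards)}


CLUSTER_MAP = {g: build_cluster_map(g, SHARDS[g], CLUSTERS[g]) for g in SHARDS}


def locate(data_group: str, doc_key: str) -> Location:
    shard = zlib.crc32(doc_key.encode("utf-8")) % SHARDS[data_group]    # algorithmic step (hash)
    return Location(data_group, shard, CLUSTER_MAP[data_group][shard])  # metadata step (lookup)
```

Keeping the shard -> cluster step in metadata means moving a shard to a new cluster only changes one entry, without rehashing any documents.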
MySQL Data Store: High Availability and Backups
- Each DB cluster consists of three servers to ensure high availability: {DataType}dbN-1, {DataType}dbN-2, {DataType}dbN-3
- Each replicated set (A, B, C) has a master, a slave (via replication) and a backup, spread over the three servers
- Backups are also copied to a big backup archive
- When one server is down, failover is automatic; when two servers are down, we can manually enable the backup instance to keep the data available
MySQL Main DB High Availability
- Percona Replication Manager (PRM) agents run on all servers
- The MainDB master replicates to MainDB Slave 1 and MainDB Slave 2; each of the three servers runs a PRM agent
- The application connects to the Main DB
MySQL Main DB High Availability
- When the master goes down, the PRM agents on the slaves instantly decide which of them becomes the new master
- On failover, the maindbmaster VIP is assigned to the slave that becomes the new master (e.g., ex-Slave 2)
- The application just keeps writing to maindbmaster
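The promotion decision can be sketched as picking the live slave that has executed the most of the old master's binary log. This is a hypothetical illustration of the idea, not PRM's actual algorithm (PRM is a Pacemaker resource agent with its own rules); host names are invented and moving the maindbmaster VIP is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    host: str
    alive: bool
    # How far this slave has executed the master's binlog, e.g. from SHOW SLAVE STATUS:
    # (sequence number of Relay_Master_Log_File, Exec_Master_Log_Pos)
    executed: Tuple[int, int]


def binlog_coords(log_file: str, position: int) -> Tuple[int, int]:
    """'mysql-bin.000012', 4711 -> (12, 4711)"""
    return int(log_file.rsplit(".", 1)[1]), position


def choose_new_master(slaves: List[Slave]) -> Optional[Slave]:
    candidates = [s for s in slaves if s.alive]
    if not candidates:
        return None  # nothing to promote: a human has to step in
    # The most advanced replication position wins; max() keeps the first one on ties.
    return max(candidates, key=lambda s: s.executed)
```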
MySQL - Data Loading
- Incoming data: XML files, Kafka, RabbitMQ, other data sources
- Multi-process loaders for each data store group write into its shards (Shard 0 ... Shard 255); rejected data goes to the logs
Multi-process loaders:
- validate the data
- insert the data into the proper DB shards
- with many shards, large amounts of data can be written in parallel
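The loader steps listed above (validate, route to a shard, write many shards in parallel) can be sketched in Python. Assumptions: the document schema, the CRC32-based routing and the injected write_batch callable (which in practice would run a multi-row INSERT against the shard's MySQL database) are hypothetical, and a thread pool stands in for the slide's multiple processes because the work is mostly waiting on I/O.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

NUM_SHARDS = 256  # shards 0..255, as on the slide
REQUIRED_FIELDS = ("id", "url", "body")  # hypothetical document schema

WriteBatch = Callable[[int, List[dict]], None]


def is_valid(doc: dict) -> bool:
    return all(doc.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def shard_of(doc: dict) -> int:
    return zlib.crc32(str(doc["id"]).encode("utf-8")) % NUM_SHARDS


def load(docs: Iterable[dict], write_batch: WriteBatch, workers: int = 8) -> List[dict]:
    """Validate, group by shard, write the shard batches in parallel; return the rejected docs."""
    batches: Dict[int, List[dict]] = defaultdict(list)
    rejected: List[dict] = []
    for doc in docs:
        if is_valid(doc):
            batches[shard_of(doc)].append(doc)
        else:
            rejected.append(doc)  # the real loaders log rejected data
    # Shards are independent, so their batches can be written concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(write_batch, shard, batch) for shard, batch in batches.items()]
        for future in futures:
            future.result()  # re-raise any write error
    return rejected
```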
Building a Big Data Search Engine
The Search Engine
- SE Summary
- Distributed Index Architecture
- Dive into Indexing Configuration
- Centralized Indexer and HA
Sphinx Search Index Summary
- The Sphinx search index is distributed across Search Engine clusters
- 100% automated, centralized data indexing
- High availability
- High scalability (scale up and scale out)
Sphinx Distributed Index Architecture
- The application uses Sphinx forwarders on the web servers (Forums, Blogs, ..., X forwarders on each web server) to run search queries against the different data groups
- Each data group has its own Search Engine group (e.g., Forum SE01, SE02, ..., SE-N; Blogs SE01, SE02, ..., SE-N)
- The index in each data group is split into a number of search nodes distributed over a number of SE boxes (e.g., blogsse01, blogsse02, etc.)
- Each box has a pair with the same search nodes for high availability
Example, Search Engine Group X: xse01 hosts Nodes 1-4, xse02 hosts Nodes 5-8, xse03 hosts Nodes 9-12, ..., xse-n hosts the last nodes.
Sphinx Distributed Index Architecture (Forum SE Group / ForumSE01)
- Each search node serves the index for several DB shards, which can be spread over several DB clusters (DB Cluster 1..N) of the Forum data group
- Each server has a pair with the same search nodes to ensure high availability: ForumSE01-1 and ForumSE01-2 both host Nodes 1-4
- Sphinx can use both servers, with automatic load balancing
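How a forwarder can use such a pair can be sketched in Python: each node is served by a set of mirror boxes, requests rotate between them, and a box that refuses connections is skipped. Sphinx provides this itself in distributed indexes (agent mirrors with a configurable HA strategy), so the class below is only a hypothetical model of that behaviour, with a caller-supplied run function standing in for the actual search request.

```python
import itertools
from typing import Callable, List, TypeVar

T = TypeVar("T")


class MirrorSet:
    """Boxes that hold identical search nodes, e.g. ['forumse01-1', 'forumse01-2'] (hypothetical names)."""

    def __init__(self, hosts: List[str]) -> None:
        if not hosts:
            raise ValueError("a mirror set needs at least one host")
        self.hosts = list(hosts)
        self._next = itertools.cycle(range(len(self.hosts)))

    def query(self, run: Callable[[str], T]) -> T:
        """Round-robin load balancing; on a connection error, fail over to the next mirror."""
        errors = []
        for _ in range(len(self.hosts)):
            host = self.hosts[next(self._next)]
            try:
                return run(host)
            except ConnectionError as exc:
                errors.append(f"{host}: {exc}")
        raise RuntimeError("all mirrors failed: " + "; ".join(errors))


def query_all_nodes(mirror_sets: List[MirrorSet], run: Callable[[str], List[T]]) -> List[T]:
    """Collect results from every node (sequential for brevity; Sphinx queries agents in parallel)."""
    results: List[T] = []
    for mirrors in mirror_sets:
        results.extend(mirrors.query(run))
    return results
```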
Sphinx Single Index Node Structure (SE Group X -> SE Cluster SE01 -> Node 1)
DELTA
- Re-indexed a few times per minute; takes a few seconds
  indexing index 'inc_node1'...
  collected 6216 docs, 2.9 MB
  sorted 1.2 Mhits, 100.0% done
  total 6216 docs, 2910165 bytes
  total 1.014 sec, 2869062 bytes/sec, 6128.20 docs/sec
DELTA-week [daily]
- Re-indexed at midnight; takes a few minutes; resets the DELTA index
  indexing index 'week_node1'...
  collected 1279665 docs, 622.7 MB
  sorted 261.6 Mhits, 100.0% done
  total 1279665 docs, 622726249 bytes
  total 434.410 sec, 1433495 bytes/sec, 2945.74 docs/sec
DELTA-3month [weekly]
- Re-indexed every Sunday; takes 45 minutes; resets the other delta indexes
  indexing index '3month_node1'...
  collected 8839041 docs, 4293.2 MB
  sorted 1883.2 Mhits, 100.0% done
  total 8839041 docs, 4293164850 bytes
  total 4686.063 sec, 916155 bytes/sec, 1886.24 docs/sec
MAIN
- Re-index the whole node once a month or on demand; takes several hours; resets all indexes
  indexing index 'big_node1'...
  collected 45609788 docs, 19276.2 MB
  sorted 2985.2 Mhits, 100.0% done
  total 45609788 docs, 19276157566 bytes
  total 11271.542 sec, 1710161 bytes/sec, 4046.45 docs/sec
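The reset rules above form a hierarchy: rebuilding a larger index makes the smaller ones redundant, because the larger index now also contains their documents. A hypothetical scheduler tick in Python could decide what to rebuild as follows; the times are assumptions (every larger rebuild at midnight, MAIN on the 1st of the month), since the slide only says "once a month or on demand" and does not give a time for the Sunday rebuild.

```python
from datetime import datetime
from typing import List

# Smallest to largest; rebuilding a tier resets all smaller tiers.
TIERS = ["delta", "week", "3month", "main"]


def tier_due(now: datetime, force_main: bool = False) -> str:
    """Pick the largest index tier due at this (minute-resolution) scheduler tick."""
    at_midnight = now.hour == 0 and now.minute == 0
    if force_main or (at_midnight and now.day == 1):  # assumption: monthly MAIN on the 1st
        return "main"
    if at_midnight and now.weekday() == 6:  # Sunday
        return "3month"
    if at_midnight:  # daily
        return "week"
    return "delta"  # every other tick: only the small delta index


def indexes_reset_by(tier: str) -> List[str]:
    """Tiers emptied after `tier` is rebuilt, since it now covers their documents."""
    return TIERS[: TIERS.index(tier)]
```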
Sphinx Centralized Indexing
- A central indexer box (Indexer01) keeps job queues per SE group, box and node (e.g., Blogs SE Group -> BlogSE01: jobs for BlogSE01-node1 ... BlogSE01-node4; jobs for BlogSE02; ...; jobs for BlogSE-N)
- Indexer workers take the jobs and build the indexes (Index 1, Index 2, ..., Index N) for each node:
  1. Get data from MySQL
  2. Convert to XML (enrich, normalize, process, etc.)
  3. Feed to the Sphinx indexer
  4. Validate the index
  5. Copy the index to the destination boxes (both BlogSE01-1 and BlogSE01-2, Nodes 1-4)
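The pipeline above can be sketched in Python as one job = one index for one node, built once on the indexer box and then copied to every mirror. The xmlpipe2 renderer follows Sphinx's XML data source format; everything else (the schema, the job fields and the injected fetch/build/validate/copy callables) is a hypothetical stand-in for the case study's own tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List
from xml.sax.saxutils import escape, quoteattr

FIELDS = ["title", "body"]                            # hypothetical full-text fields
ATTRS = {"published": "timestamp", "site_id": "int"}  # hypothetical attributes


def to_xmlpipe2(docs: Iterable[Dict], fields: List[str], attrs: Dict[str, str]) -> str:
    """Render documents in Sphinx's xmlpipe2 format."""
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    out += [f"<sphinx:field name={quoteattr(name)}/>" for name in fields]
    out += [f"<sphinx:attr name={quoteattr(name)} type={quoteattr(kind)}/>" for name, kind in attrs.items()]
    out.append("</sphinx:schema>")
    for doc in docs:
        out.append(f'<sphinx:document id="{int(doc["id"])}">')
        for name in [*fields, *attrs]:
            out.append(f"<{name}>{escape(str(doc.get(name, '')))}</{name}>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)


@dataclass
class IndexJob:
    index: str          # hypothetical index name for one node
    mirrors: List[str]  # every box serving this node, e.g. ["blogse01-1", "blogse01-2"]


def run_job(
    job: IndexJob,
    fetch_docs: Callable[[IndexJob], Iterable[Dict]],  # read the node's shards from MySQL
    build_index: Callable[[IndexJob, str], str],       # run the Sphinx indexer, return the index path
    validate: Callable[[str], bool],
    copy_to: Callable[[str, str], None],               # (host, index path)
) -> None:
    xml = to_xmlpipe2(fetch_docs(job), FIELDS, ATTRS)  # enrichment/normalization would happen here
    path = build_index(job, xml)
    if not validate(path):
        # Nothing is copied, so the search boxes keep serving the previous good index.
        raise RuntimeError(f"index {job.index} failed validation")
    for host in job.mirrors:  # the same files go to both boxes of the pair
        copy_to(host, path)
```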
Building a Big Data Search Engine
System Monitoring
- Main tools
- Best practices
System Monitoring
Over 12,000 service checks
Main tools:
- Nagios: monitoring, alerts
- Zabbix: monitoring, charts
- Pingdom: availability, responsiveness
- OpsGenie (alt: PagerDuty): alert escalation, calls/notifications
- VividCortex: DB monitoring, performance analysis
- ThousandEyes: network monitoring
System Monitoring
- With a distributed system, good instrumentation is vital: log as much as you can, and link log entries by IDs so you can trace every query made for a request (web request -> Sphinx/MySQL)
- Watch P95/P99 performance
- Continuously improve monitoring, incident escalation and alerting systems
- Do post-mortem analysis of incidents
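Two of the practices above can be sketched in Python: a logging filter that stamps every record with a per-request ID held in a context variable, and a nearest-rank percentile for P95/P99. Names such as request_id and RequestIdFilter are hypothetical, and monitoring tools compute percentiles their own way (often with interpolation), so their numbers may differ slightly.

```python
import contextvars
import logging
import math
from typing import Sequence

# Request ID for the current request/task; set it once at the web entry point.
request_id: contextvars.ContextVar = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Stamp every log record with the current request ID so related log lines can be joined."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        return True


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=95 for P95; p must be in (0, 100]."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # multiply first to avoid float rounding surprises
    return ordered[rank - 1]
```

Attach the filter to the log handler (handler.addFilter(RequestIdFilter())) and add %(request_id)s to the formatter; carrying the same ID into the Sphinx and MySQL query logs is what makes a single request traceable end to end.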
Take-aways
- Carefully consider all of your Big Data project's factors before choosing the tools for your architecture
- Don't be afraid to experiment and build your own solutions
- MySQL scales well and can be a dependable tool for storing large amounts of structured data
- Sphinx Search is a powerful search engine that:
  - can scale to thousands of servers
  - can be used with any data source (directly or via an XML stream)
  - can be useful not only for search, but also for speeding up analytics tasks
- MySQL/Sphinx allowed us to build a successful, scalable system that makes large amounts of data searchable and operates under high load
Thank You for Your Attention!
Questions?
mz@ivinco.com