Solr Cloud vs Replication





Solr Cloud implementation

[Diagram: Data Sources, 4 x Solr Servers (Windows), 3 x Zookeeper Servers (Linux), Load Balancer Server (Mule, Windows), Web Server (Windows), Internet Users. Two single points of failure are marked.]

Solr Cloud working is supposed to provide a fault-tolerant, high-availability system: a stable platform for blazing-fast full-text and faceted searches.

A Solr Cloud implementation would typically consist of (x) Solr servers, (x - 1) Zookeeper servers and a load balancer.

The Zookeepers deal with the replication of configuration files and indexes across the Solr server ensemble and report the server state to the load balancer.

Looks good in theory, BUT

We had three environments: Integration (development), Staging (testing) and Live. All three environments were different, though:

The Live environment had FOUR Solr servers, THREE Zookeepers and a Mule.
The Staging environment had TWO Solr servers, ONE Zookeeper and a Mule.
The Integration environment had ONE Solr server, NO Zookeeper and a Mule.

The Live environment suffered from a maintenance issue (the C: drive on the master Solr server ran out of space). This brought the entire system down! We had to switch Live to the simplified Staging environment to provide service to our users.

The Zookeeper notion was unreliable, with indexes getting out of synchronisation daily, so the maintenance overhead was huge simply to try to provide a stable and reliable service for our users.
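The differing Zookeeper counts matter because a Zookeeper ensemble stays available only while a majority of its servers are up. A minimal sketch of that arithmetic, applied to the ensemble sizes listed above (the function name is ours, purely for illustration):

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of Zookeeper servers that can fail while a majority survives."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return (ensemble_size - 1) // 2


# Zookeeper counts taken from the environments described above.
# Integration is omitted because it had no Zookeeper at all.
environments = {"Live": 3, "Staging": 1}

for name, size in environments.items():
    failures = tolerated_failures(size)
    print(f"{name}: {size} Zookeeper(s), tolerates {failures} failure(s)")
```

This does not explain the disk-space outage itself; it only illustrates that the three environments did not offer the same fault tolerance, so behaviour seen in one could not be assumed in another.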

Time for a rethink

What do we need?

Reliability: Search results should be consistent.

Low Maintenance: Apart from routine checks, a Solr environment should be fairly maintenance-free, with any errors encountered being reported by email.

Scalability: It should be as simple as possible to grow or shrink the number of servers in an environment. One should be able to easily change which server is the master and which are the slaves.

Easily Configurable: A simple system for uploading the Solr configuration files across the different environments, allowing different behaviours for each Core within them.

[Diagram: Data Sources, 3 x Solr Servers (Windows) comprising a Master Server and Slave Servers, Web Server & Load Balancer (Windows), Internet Users. One single point of failure is marked.]

Advantages

All environments (Integration, Staging, Live and QA, a new one) use the same layout and number of servers.

All the hardware specifications are the same across all servers in all environments.

Everything now runs on Windows (no more Linux and no need for bespoke Java coding).

The Mule has been replaced by a load balancer produced in-house.

The single points of failure have been reduced to just one: the Web Server, which is where the load balancer now resides.

Everything is now easily configurable, scalable, almost maintenance-free and, above all, ultra-reliable.

Disadvantages

A slight delay, barely noticeable in a Web UI, may be experienced if a request to a given Solr server fails and the load balancer has to redirect the request to another available server.

So what's so different in Solr Replication?

Indexing is performed on a master server.

As there is no Zookeeper component to replicate the indexes to the other servers, the onus is now on the slave servers to poll the master periodically and update their indexes from the master copies. This is a built-in feature of Solr, and the location of the master server and the polling time are easily configurable.
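The pull model can be sketched as follows. This is a simplified simulation, not Solr's actual replication code: the `Master` and `Slave` classes and their methods are hypothetical, and a real slave would poll over HTTP at the configured interval rather than on an explicit call.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical stand-in for the master server that does the indexing."""
    version: int = 0
    documents: list = field(default_factory=list)

    def index(self, doc: str) -> None:
        self.documents.append(doc)
        self.version += 1  # every change produces a new index version


@dataclass
class Slave:
    """Hypothetical slave that copies the master's index when it is behind."""
    version: int = 0
    documents: list = field(default_factory=list)

    def poll(self, master: Master) -> bool:
        """Run one polling cycle; return True if the index was updated."""
        if master.version == self.version:
            return False  # already up to date, nothing to copy
        self.documents = list(master.documents)
        self.version = master.version
        return True


master = Master()
slave = Slave()
master.index("doc-1")
print(slave.poll(master))  # first poll after indexing copies the new index
print(slave.poll(master))  # a second poll finds nothing new
```

Between polls a slave can serve slightly older results than the master holds, which is the price paid for not needing a coordinating component.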

Unlike the Mule in Cloud working, the extremely lightweight in-house load balancer does not know the state of the servers in the Solr ensemble. It knows only the last server it tried, so for the next query it will try another server it knows about. If that server is unable to process the query, it switches to another one, until there are no more servers to try, whereupon it sends an error email to the concerned parties (ADM).
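The behaviour described above can be sketched roughly as below. This is not the in-house load balancer itself, only an illustration of the same idea: the class name, the `send` callable that runs a query against one server, and the `notify` callable that stands in for the error email are all assumptions.

```python
from typing import Callable, Optional, Sequence


class NoServerAvailable(Exception):
    """Raised when every configured server has failed for a query."""


class RoundRobinBalancer:
    """Remembers only the last server tried; holds no server health state."""

    def __init__(
        self,
        servers: Sequence[str],
        send: Callable[[str, str], str],
        notify: Optional[Callable[[str], None]] = None,
    ) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.send = send      # hypothetical: runs the query against one server
        self.notify = notify  # hypothetical: e.g. sends the error email
        self.last = -1        # index of the last server tried

    def query(self, q: str) -> str:
        # Try each server at most once, starting after the last one used.
        for _ in range(len(self.servers)):
            self.last = (self.last + 1) % len(self.servers)
            server = self.servers[self.last]
            try:
                return self.send(server, q)
            except Exception:
                continue  # this server failed, move on to the next one
        message = f"all Solr servers failed for query: {q!r}"
        if self.notify is not None:
            self.notify(message)
        raise NoServerAvailable(message)


# Usage with a fake transport in which the first slave is down.
def fake_send(server: str, q: str) -> str:
    if server == "slave-1":
        raise ConnectionError("server down")
    return f"{server} answered {q}"


balancer = RoundRobinBalancer(["slave-1", "slave-2"], fake_send)
print(balancer.query("title:solr"))
```

Because no health state is kept, a dead server is retried on later queries; that retry is the source of the slight delay listed under Disadvantages.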

Scalability?

We normally perform Solr searches on the two slave servers. This is configured in the load balancer and in each Solr server in the environment.

We could, however, also use the master server for Solr searches if required.

We could easily add another slave server if required; it's all in the configuration files.

Configuration

There is a new in-house tool that has been built for easily changing and uploading configuration files.

Environments, master servers, Cores and replication polling times can easily be configured.

Progress and event logging is presented clearly.

Solr configuration files are under source control.

The solr.properties file, which should be present in each Core, controls the replication polling time for that Core. The configuration application sets it to the required timing (hours, minutes, seconds) and sets which server in the environment is to be the master.

The in-house load balancer must be configured separately and told which Solr servers to point at.

The Web application should be pointed at the load balancer.
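As an illustration of what the configuration application might write for each Core, here is a minimal sketch. The property keys, the function names and the example URL are assumptions made for this sketch, not the contents of the real solr.properties file; the HH:MM:SS form matches the hours, minutes, seconds timing mentioned above.

```python
def format_interval(hours: int, minutes: int, seconds: int) -> str:
    """Render a polling interval as HH:MM:SS."""
    if min(hours, minutes, seconds) < 0 or minutes > 59 or seconds > 59:
        raise ValueError("invalid interval")
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def render_properties(master_url: str, hours: int, minutes: int, seconds: int) -> str:
    """Build the text of a per-Core properties file.

    The property keys below are hypothetical, chosen only for this sketch.
    """
    lines = [
        f"replication.master.url={master_url}",
        f"replication.poll.interval={format_interval(hours, minutes, seconds)}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical master URL and a five-minute polling interval.
print(render_properties("http://solr-master:8983/solr/core1", 0, 5, 0))
```

Keeping the master location and polling interval in one small per-Core file is what makes it cheap to promote a different server to master or to give each Core its own timing.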

Conclusion Solr Replication implementation