Implementing Search in Web, Mobile, and IoT Applications: An Overview of DataStax Enterprise Search
- Clementine Parks
- 8 years ago
Table of Contents
Introduction
Why Search?
General Search Requirements
Traditional Deployment Strategies for Search
What is DataStax Enterprise?
What is DSE Search?
  Enterprise Search Support
  Distributed Enterprise Search
  Multi-Data Center and Cloud Search
  Always-On Search
  Online Elasticity
  Live Search
  Secure Search
  Fault-Tolerant Search
  Workload Isolation and Management
  Automatic Query Routing
  Solr Compatible
  Integration with Cassandra Query Language
  Integration with Spark Analytics
  Visual Management and Monitoring
Common Use Cases
  Document and Message Search
  Master Data Management (MDM)
  Real-Time Search Analytics
  Hybrid Search/Batch Analytics
DSE Search Customer Examples
  Clear Capital
  Penn Mutual
  Internet Identity
Conclusions
Appendix A: Search Feature Comparison
About DataStax
Introduction

Nearly every web, mobile, and Internet of Things (IoT) application has search functionality that helps its users locate the information they want. Depending on the industry and need, an application's search requirements may be very simple or astonishingly complex. While the need for search capabilities is ubiquitous in modern applications, it is surprising to find that most enterprises struggle to build high-performance, robust, and cost-effective search components into their business systems. Even though most database vendors include search functionality as part of their platforms, and specialized search software exists in both open source and proprietary form, enterprises still wrestle with scaling their systems to handle growing data volumes and user counts, and with keeping the search capabilities of their web and mobile applications always online.

This paper describes the general search requirements that most web, mobile, and IoT applications have, and the ways enterprises have commonly deployed search systems in the past. It then describes how NoSQL database systems are fast becoming the standard database platform for these types of applications, and why DataStax Enterprise, with its integrated enterprise search capabilities, can make developing powerful search components for an application fast, easy, and cost-effective.

Why Search?

Whether it's a retail web application that smartly guides users through their buying decisions or a mobile entertainment app that seems to know what each user wants, search technology is behind the scenes, personalizing every experience. Search functionality has become critical in every web and mobile application for helping users navigate directly to the products, services, and media they are interested in.

General Search Requirements

While every application is different, there are certain core search requirements that almost every modern web and mobile application shares.
When it comes to specific search features, the most routine must-haves include the following:

- Directed navigation: the ability to guide users through a series of choices to find what they want, most often accomplished through faceted search and other features such as wildcarding and grouping.
- Breadcrumbs: an important part of navigation for retail sites, breadcrumbs tell users where they are within a site and assist with upward navigation.
- Query assistance: a range of aids that help users enter a search request and that deliver results matching the request both directly and indirectly. Assistance can take the form of auto-completion, spell checking, synonym and acronym expansion, and "more like this" results.
- Relevancy control: governs the ranking of search results and is typically tuned extensively to ensure the proper placement of products and services an enterprise wishes to promote.
- Spotlights: part of relevancy control, spotlights let businesses directly influence search queries so that chosen products and services consistently rank high in search results.
- Personalization signals: help personalize search results using data such as the user's current location (e.g., enabled by geospatial search) and historical search patterns.
- Rich document handling: allows inspection of document contents in formats such as Adobe PDF, Microsoft Word, etc.
- Content integration: blends core search results with other integrated and related data (e.g., product reviews for products listed in search output).
- Analytics: supplies the ability to understand events such as failed searches, conversion metrics, and more.

From an architectural perspective, a search subsystem needs the following to support global web and mobile applications:
- Continuous uptime: users must be able to use an application's search component without encountering any outages or downtime.
- Multi-data center and cloud support: web and mobile applications "follow you everywhere," so a search subsystem must be able to span multiple data centers and cloud zones around the globe, keeping search operations fast regardless of a user's location.
- Scalable performance: the search system must be able to grow online to accommodate increasing data volumes and user connections while delivering consistent performance. New data should also be indexed and made available for search as quickly as possible.
- Standards-based interfaces: the search APIs should include usual and customary interfaces such as HTTP, XML, and others.

Traditional Deployment Strategies for Search

IT organizations learned long ago that, even though RDBMSs provided search options within their engines, search traffic contended for data and compute resources in ways that hurt the performance of transactional (OLTP) work. So, just as IT groups separated OLTP and analytics workloads, they began to break search out into a separate system as well. These mixed-workload situations typically result in sharded data management environments with separate data platforms for OLTP, analytics, and search, and an application that is specially coded to access each vendor's platform. In addition, data is constantly extracted, transformed, and loaded (ETL'd) between the three platforms, because data common to OLTP, analytic, and search requests must be present on all of them.
Figure 1: Typical Sharded Application

Even as NoSQL databases displace RDBMSs for web, mobile, and IoT applications, the data platform sharding approach persists, with only the vendor names changing (e.g., OLTP might be handled by Apache Cassandra, analytics by Apache Hadoop, and search by Apache Solr). Most agree that, regardless of the data platforms used, sharded systems are difficult to manage and maintain and can carry a higher than expected total cost of ownership, even when free open source software is used. Businesses needing to solve their mixed-workload problem want an easier approach that also offers a way to quickly build robust search functionality into their applications.

What is DataStax Enterprise?

As noted above, modern web, mobile, and IoT applications have evolved past centralized systems built on relational databases (RDBMSs). These applications require a database platform that can meet the scale, performance, and data distribution needs of radically connected systems.
Figure 2: The evolution of data-driven applications.

To meet these requirements, DataStax Enterprise (DSE) delivers Apache Cassandra in a database platform that meets the performance and availability demands of IoT, web, and mobile applications. It gives enterprises a secure, fast, always-on database that remains operationally simple when scaled in a single data center or across multiple data centers and clouds.

Figure 3: Components that make up DataStax Enterprise.

DataStax Enterprise provides everything needed to deploy NoSQL and Cassandra in production environments. It includes a certified version of Cassandra that is tested and optimized for production applications, along with advanced security features to protect sensitive data, management services that automatically perform key maintenance and tuning functions, advanced visual management and administration capabilities, and around-the-clock expert support.

DSE also solves the mixed-workload problem of web, mobile, and IoT applications by integrating analytics and search functionality into the platform, along with built-in workload isolation and replication that keep OLTP, analytics, and search workloads separate from one another. This eliminates the need for multiple data management providers and application sharding.
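The workload isolation described above rests on ordinary Cassandra replication: if OLTP, search, and analytics nodes are grouped into separate logical data centers, a keyspace's replication settings decide how many copies of each row every workload group receives. The minimal Python sketch below only builds the CQL statement (using Cassandra's `NetworkTopologyStrategy`); the keyspace name `store` and the data center names `Cassandra`, `Search`, and `Analytics` are hypothetical placeholders, since real names come from your cluster's snitch configuration.

```python
import re

# Keyspace names must be plain CQL identifiers in this sketch.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    Each workload group (logical data center) gets its own replica count, so
    every write is copied to OLTP, search, and analytics nodes by Cassandra
    replication instead of by an external ETL job.
    """
    if not _IDENT.match(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    if not replicas_per_dc or any(n < 1 for n in replicas_per_dc.values()):
        raise ValueError("every data center needs at least one replica")
    options = ", ".join(f"'{dc}': {n}" for dc, n in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical data center names: one per workload group.
stmt = create_keyspace_cql("store", {"Cassandra": 3, "Search": 2, "Analytics": 2})
print(stmt)
```

In this arrangement each workload group holds its own replicas, so search queries are served by the search nodes rather than by the nodes handling OLTP traffic.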
What is DSE Search?

DataStax Enterprise supplies built-in enterprise search functionality (DSE Search) over Cassandra OLTP data, tailored to the search requirements of modern web, mobile, and IoT applications. Built on Apache Solr, DSE Search provides full Solr compatibility along with additional enterprise search capabilities that let enterprises quickly build robust search components into their systems.

Adding enterprise search to a DSE cluster is straightforward. DSE Search can be enabled in a new or existing cluster by provisioning nodes devoted to search operations. For web or mobile applications with mixed workloads involving OLTP, analytics, and search, an administrator provisions as many nodes as each workload needs, and those nodes operate as a distinct group of compute resources within a single cluster. Data is then automatically replicated between the OLTP, analytics, and search nodes, so there is no need to manually ETL data between different systems and platforms.

DSE Search delivers all of the features and functionality of Solr, with additional enterprise search, uptime, and performance benefits. The following sections highlight the key benefits that make DSE Search well suited to the search requirements of today's web, mobile, and IoT applications.

Enterprise Search Support

DSE Search meets the general search requirements of web and mobile applications listed earlier. Use cases supported by DSE Search include general web search, full-text search, faceted search (categorization), hit prioritization and highlighting, log mining, rich document (PDF, MS Word, etc.) analysis, geospatial search, and social media matchups.

Distributed Enterprise Search

DSE Search is built for distributed web and mobile applications that need to easily search data contained in large data stores.
The divide-and-conquer architecture delivers consistently fast response times across large data volumes that may be distributed across multiple machines and locations.

Multi-Data Center and Cloud Search

DSE Search's distributed capabilities include running across multiple data centers and cloud availability zones in a multi-active (Active-Active through n-way Active) manner, which allows search operations on OLTP data to be carried out easily in different geographic regions. The key benefit is that search results can be returned to users in those locations as fast as possible. DSE Search differs from search solutions such as Elasticsearch in allowing full write and read operations in a multi-active manner; products like Elasticsearch are limited by their master-slave architectures and can support only Active-Passive configurations, in which a master and read-only slave machines are used for multi-data center operations.

Always-On Search

DSE Search is well suited to applications that need search functionality that is always available and never goes down. DSE's always-on architecture, built on Cassandra, is designed to deliver 100% uptime for search operations. DSE Search lets users keep multiple copies of search data across multiple nodes, data centers, and clouds, so even if certain machines or data centers go down, data remains available for search tasks.
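The divide-and-conquer query model mentioned under Distributed Enterprise Search can be pictured as scatter-gather: each node searches only the data it owns and returns its best local hits, and a coordinator merges them into a global top-k. The following is a toy, in-memory Python model of that idea, not DSE's internal implementation; in particular, the relevance function is a naive term count rather than Lucene scoring, and the sample shards are made up.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

Doc = tuple[str, str]  # (doc_id, text)


def score(text: str, terms: list[str]) -> int:
    """Naive relevance: total occurrences of the query terms in the text."""
    words = text.lower().split()
    return sum(words.count(t) for t in terms)


def search_shard(shard: list[Doc], terms: list[str], k: int) -> list[tuple[int, str]]:
    """Search one node's local data and return its top-k (score, doc_id) hits."""
    hits = ((score(text, terms), doc_id) for doc_id, text in shard)
    return heapq.nlargest(k, (h for h in hits if h[0] > 0))


def distributed_search(shards: list[list[Doc]], query: str, k: int = 3) -> list[str]:
    """Scatter the query to all shards in parallel, then gather a global top-k."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda s: search_shard(s, terms, k), shards))
    # Each shard sent at most k hits, so the merge step stays small.
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]


shards = [
    [("a1", "cassandra search at scale"), ("a2", "spark analytics")],
    [("b1", "search search cassandra everywhere"), ("b2", "cassandra cluster ops")],
]
print(distributed_search(shards, "search cassandra", k=2))  # ['b1', 'a1']
```

Because every shard returns at most k hits, the coordinator merges at most k times the number of shards candidates, however much data each node holds; that bounded merge is what keeps response times flat as data is spread over more machines.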
Online Elasticity

Additional capacity (i.e., more search nodes) can be added online, so search workloads can easily scale to meet growing data volumes and customer demand.

Live Search

DSE Search contains a unique Live Indexing feature that makes new data entered into the database immediately available for search. Whereas typical search systems can have a gap between when new data enters the system and when it is ready for search, DSE's Live Indexing feature indexes fresh data so it is quickly searchable. With Live Indexing enabled, indexing throughput doubles, and indexing throughput scales linearly with the number of CPU cores on each node.

Secure Search

DSE Search offers native support for user authentication, user authorization, data encryption, and firewall configuration. It does not require costly external plug-ins or external security configurations; all DSE Search settings are managed from a single configuration space. Security features available in DSE Search include LDAP, Active Directory, and Kerberos authentication; client-to-node and node-to-node encryption; and data auditing, which gives administrators a view into events happening in their DSE cluster.

Fault-Tolerant Search

DSE Search includes options to automatically retry search queries that fail because a node went down, transparently accessing other replicas that hold the same data. Another option allows partial results to be returned when the use case permits.

Workload Isolation and Management

DSE Search fully supports workload isolation and management, ensuring that search workloads do not compete with OLTP or analytics workloads for data or compute resources. Cassandra's replication automatically copies data among nodes, so there is no need to extract data from transactional databases and load it into a separate search system. Everything is contained within one database cluster.
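The two fault-tolerance options described under Fault-Tolerant Search (transparent retry on another replica, and optionally accepting partial results) can be modeled as follows. This is a hypothetical, in-memory Python sketch of the behavior, not DSE's code or API; the `Replica`, `Result`, and `fault_tolerant_search` names are invented here, and in DSE itself these behaviors are selected through query options whose exact names should be taken from the DSE Search documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    docs: dict[str, str]  # doc_id -> text held by this copy of the shard
    up: bool = True

    def query(self, term: str) -> list[str]:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return sorted(d for d, text in self.docs.items() if term in text.split())


@dataclass
class Result:
    doc_ids: list[str] = field(default_factory=list)
    partial: bool = False  # True if some shard had no live replica


def fault_tolerant_search(shards: list[list[Replica]], term: str, tolerant: bool = False) -> Result:
    result = Result()
    for replicas in shards:
        for replica in replicas:  # failover: try each copy of the shard in turn
            try:
                result.doc_ids.extend(replica.query(term))
                break
            except ConnectionError:
                continue
        else:  # no replica of this shard answered
            if not tolerant:
                raise ConnectionError("no live replica for a shard")
            result.partial = True  # tolerate the gap and flag the result
    return result


docs_a = {"a1": "red shirt", "a2": "blue shirt"}
docs_b = {"b1": "red hat"}
shard_a = [Replica("n1", docs_a, up=False), Replica("n2", docs_a)]  # first copy is down
shard_b = [Replica("n3", docs_b, up=False)]                          # only copy is down
print(fault_tolerant_search([shard_a, shard_b], "red", tolerant=True))
# Result(doc_ids=['a1'], partial=True)
```

Retrying on another replica hides the failed node from the caller entirely, while the partial-results switch trades completeness for availability only when the caller opts in.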
Automatic Query Routing

To help ensure fast response times, DSE Search automatically routes each search request to the best-performing replica in the cluster that holds the data needed to satisfy it. The system weighs factors including each search node's uptime, current workload, and network distance to the user, and sends the request to the node best able to handle it.

Solr Compatible

DSE Search is powered by a production-certified version of Apache Solr. It inherits all the capabilities of Solr and builds on them to provide more powerful enterprise search functionality. Anyone familiar with Solr can immediately begin developing with DSE Search using the same Solr APIs. Using Solr and Lucene as its foundation, DSE Search combines the ability to perform complex transactional queries with the scalability and high availability of Cassandra. In addition, DSE Search's Solr compatibility layer allows legacy search workloads to be transferred to DataStax Enterprise with no modification to existing client code or behavior.
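Automatic Query Routing can be thought of as a cost function evaluated over the live replicas that hold the requested data. The Python sketch below is hypothetical: the `NodeStats` fields mirror the three factors named in the text (uptime, current workload, network distance), but the weights, the one-hour warm-up rule, and the node names are invented for illustration and are not DSE's actual routing algorithm.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    up: bool
    uptime_hours: float  # recently restarted nodes are treated as less stable
    load: float          # current workload, 0.0 (idle) to 1.0 (saturated)
    latency_ms: float    # network distance to the requesting client


def routing_cost(n: NodeStats, w_load: float = 100.0, w_latency: float = 1.0,
                 w_uptime: float = 5.0) -> float:
    """Lower is better. Nodes up for less than an hour pay a warm-up penalty."""
    warmup_penalty = w_uptime if n.uptime_hours < 1.0 else 0.0
    return w_load * n.load + w_latency * n.latency_ms + warmup_penalty


def pick_replica(replicas: list[NodeStats]) -> NodeStats:
    """Choose the cheapest live replica; fail if none holds the data."""
    live = [r for r in replicas if r.up]
    if not live:
        raise RuntimeError("no live replica holds the requested data")
    return min(live, key=routing_cost)


replicas = [
    NodeStats("dc1-n1", True, 200.0, 0.90, 2.0),   # near but busy
    NodeStats("dc1-n2", True, 0.5, 0.10, 3.0),     # near, light load, just restarted
    NodeStats("dc2-n1", True, 500.0, 0.05, 80.0),  # idle but far away
    NodeStats("dc1-n3", False, 300.0, 0.0, 1.0),   # down, never considered
]
print(pick_replica(replicas).name)  # dc1-n2
```

The weights encode the real trade-off the routing step has to make: whether a nearby but busy node should beat a distant idle one. With these example numbers, the distant idle node (cost 85) would be chosen over the nearby saturated one (cost 92).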
Integration with Cassandra Query Language

Search (Solr) syntax is integrated with the Cassandra Query Language (CQL), which extends CQL so that it can operate as a powerful search language. Solr syntax (e.g., a wildcard search) can be passed directly through a CQL WHERE clause, so data can be searched via CQL as well as through the native Solr APIs. The CQL language and transport are cluster-aware, meaning they see changes to the schema or cluster topology in real time. If a node is added to the cluster, it is automatically added to the connection pool of every client. If a node is removed for maintenance or fails, it is removed from every client's connection pool, minimizing high-latency request timeouts and failures. Because the CQL protocol is cluster-aware, it avoids the added complexity, fragility, and cost of load balancers.

Integration with Spark Analytics

DSE Search integrates with Apache Spark's SparkSQL in the same way it does with CQL. Solr search syntax can be passed through a SparkSQL WHERE clause on DSE analytics nodes running Spark, which greatly expands Spark's analytic query ability and combines search and analytic functions in one statement.

Visual Management and Monitoring

DSE Search functionality and operations can easily be provisioned, managed, and monitored visually with DataStax OpsCenter.

Common Use Cases

In addition to typical search application usage, a number of common use cases benefit from DSE Search.

Document and Message Search

A common use case for DSE is as a document or message store. In a document store, the documents may be every document in the system or only the correspondence pertaining to a customer account. These records could be chat messages, correspondence, financial statements, reports, transactions, or other records. This model typically has two user personas: the auditor and the end user.
The auditor is concerned with all documents, across all users, that match certain criteria. The end user is concerned with which of their own documents match a particular key phrase or were sent by a particular sender. In this model, both the metadata and the full-text body of each record are indexed. When an auditor performs a search, records across all users can be returned; when an end user performs a search, the result set is filtered to only the records associated with that user's account. DSE Search can also accept locality information, so that for user-facing, high-volume, low-latency queries, the fan-out processing that other search software normally performs is avoided and only a single node needs to participate in query processing.

Master Data Management (MDM)

Master Data Management is a paradigm in which a central repository contains all information about an item. An item could be simple, like a T-shirt with 30 attributes, or complex, like a TV with over 600 attributes. With MDM, every field of an item is typically both indexed and searchable. Data is usually pulled from multiple sources such as suppliers, shippers, and the vendor's own internal processes. Typical applications include product catalogs for vendors, retailers, or manufacturers. The catalog could serve either internal processes or external customers; in both cases, the focus is on a large volume of low-latency queries.
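The CQL integration and the auditor/end-user split described above can be combined in one query builder. The Python sketch below only constructs CQL text: it uses the `solr_query` pseudo-column that DSE Search accepts in a CQL WHERE clause, carrying a JSON query with `q` (the search) and `fq` (a filter). The table `mail.messages`, its columns, and the `user_id` field are hypothetical, and the `route.partition` locality hint is an assumption to verify against the DSE Search documentation for your release.

```python
import json
from typing import Optional


def solr_phrase(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def cql_literal(text: str) -> str:
    """Wrap text in a CQL string literal; embedded single quotes are doubled."""
    return "'" + text.replace("'", "''") + "'"


def message_search_cql(table: str, text_query: str, user_id: Optional[str] = None) -> str:
    """Auditor search across all users if user_id is None, else an end-user search."""
    params: dict[str, object] = {"q": text_query}
    if user_id is not None:
        # Filter the result set to records owned by this user.
        params["fq"] = "user_id:" + solr_phrase(user_id)
        # ASSUMPTION: a partition-routing hint so only the node(s) owning this
        # user's partition take part in the query. Verify the parameter name and
        # value format in the DSE Search docs for your release before using it.
        params["route.partition"] = [user_id]
    solr_query = json.dumps(params, ensure_ascii=False)
    return f"SELECT id, sender, subject FROM {table} WHERE solr_query = {cql_literal(solr_query)};"


# Auditor: wildcard search over every user's messages (hypothetical table/columns).
print(message_search_cql("mail.messages", "body:invoice*"))
# End user: same search, restricted to one account.
print(message_search_cql("mail.messages", "body:invoice*", user_id="alice"))
```

Building the JSON with `json.dumps` and doubling single quotes for the CQL literal keeps user input from breaking out of either layer of quoting; a production client would more likely bind the query string as a prepared-statement parameter instead.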
Real-Time Search Analytics

Real-time analytics typically operates over an event stream of many small, machine-generated events. This could be log data from servers or marketing and revenue data from an online retailer. With real-time search analytics, the emphasis is on quickly generating a report of the events, users, etc., that satisfy some conditions. Examples include "find all servers where this exception occurred" or "count all users who have logged on in the previous 30 days." The focus is on counting or identifying records that match search criteria.

Hybrid Search/Batch Analytics

Real-time search analytics can fall short when a process needs to perform aggregations or deeper, more complicated calculations. For these types of calculations, DataStax Enterprise integrates Apache Spark, which is a natural fit for such use cases. Spark is a batch analytics framework with advanced functionality such as graph abstractions and machine learning, and DSE's integration of analytics and search allows customers to launch a batch job that uses a search query as its source. This greatly reduces the number of records that must be processed and thus reduces response times. Using this approach, it is possible to reduce complex batch-reporting times to seconds or even sub-second.

DSE Search Customer Examples

The following DataStax customer examples illustrate how DSE Search is being deployed in enterprise environments.

Clear Capital

Clear Capital is the premium provider of data and solutions for residential and commercial real estate asset valuation and collateral risk assessment for large financial services companies. Clear Capital facilitates the ordering, tracking, and delivery of valuation reports by leveraging massive data sets, human-based review, and automated review tools. Clear Capital currently stores valuation data for over 90% of all properties in the U.S.
With the largest source of valuation data for residential and commercial properties in the U.S., Clear Capital recognized a market opportunity to develop a software application that could digest its massive property database and quickly deliver highly accurate, recent valuation data to financial institutions and banks. The company understood that relational database systems were not built to support high volumes of small transactions, linear scalability, real-time performance, and continuous availability, all key requirements for its cutting-edge valuation review management software-as-a-service platform.

Clear Capital's deciding factor in choosing DataStax Enterprise was DSE Search together with its powerful analytics abilities. A major component of Clear Capital's appraisal system is its geospatial search and analysis capability. This level of analysis allows Clear Capital to gain valuable market insights based on aggregate statistics, which feed into the accuracy of the valuations delivered. "DataStax Enterprise powers our geospatial search and delivers real-time insights to our customers, who rely on the most recent data to support their market-level decisions," said David Prinzing, solutions architect at Clear Capital. "This was a significant reason why we chose DataStax Enterprise to power our system."

Penn Mutual

Penn Mutual is a life insurance and annuities company. Across almost 170 years of business, Penn Mutual has been dedicated to helping people do more in life by creating solutions that deliver the complete value of life insurance across all of life's stages.
In 2010, Penn Mutual's Information Management and Technology Division, the IT arm of the business, started a project called Core Services, which aimed to merge all data domains spread throughout the company into a single source by marrying its service-oriented architecture and master data management capabilities into one comprehensive system. Penn Mutual started with a traditional RDBMS approach for the persistence layer of Core Services, but soon realized that it could not meet its application performance or scalability requirements with the existing RDBMS footprint without a large cost commitment.

Penn Mutual chose DataStax Enterprise for its MDM system, with DSE Search as a prime motivator. DSE Search allows Penn Mutual to offer traditional data access services as well as ad hoc queries for building data discovery applications, with the end result being an improved ability to find information and pull reports.

Internet Identity

Internet Identity (IID) is a cyber security company that provides a platform for easily exchanging cyber threat intelligence between enterprises and governments. Fortune 500 companies and large government agencies leverage IID to detect and mitigate threats.

With high volumes of multi-structured data pulled from a wide array of sources, IID struggled to keep up with the ongoing flow of malicious threats because of its reliance on legacy relational database technology. The company moved to NoSQL and chose DataStax Enterprise in part because of DSE Search. DSE Search allows IID to easily index and search data, while DSE's integrated analytics lets it quickly identify security threats and deal with abnormal behavior.

IID's CTO remarked, "The fact that DataStax integrated three core elements with an operations console on top of it that allows us to monitor and measure was enormous. And I have all of this along with the scalability and availability of Apache Cassandra, advanced security capabilities, and 24x7 support, all for one-fifth the price that I would pay for a relational database."

Conclusions

DataStax Enterprise with DSE Search provides everything modern Internet enterprises need to build scalable, high-performance search capabilities into their web, mobile, and IoT applications. For more resources and downloads of DataStax Enterprise, visit the DataStax website today.

Appendix A: Search Feature Comparison

This section provides a general feature comparison between DSE Search, Apache Solr, and Elasticsearch (ES).

| Feature | DSE Search | Solr | Elasticsearch |
|---|---|---|---|
| INDEXING | | | |
| Schema creation (schema-less) | Yes | Yes | Yes |
| CJK support | Yes | Yes | Yes |
| Partial document updates (atomic) | Yes | Yes | Yes |
| Live Indexing | Yes | No | No |
| SEARCH/QUERY | | | |
| Query syntax | CQL + JSON (Solr format) | Key/value pair based, using / and () to delineate and nest queries | JSON |
| Distributed group by/collapse by field | Yes | Yes | Yes |
| Full-text search | Yes | Yes | Yes |
| Geospatial queries | Yes (Solr) | Yes | Yes |
| Field types and analyzers | Yes | Yes | Yes |
| Query rescore | Yes | Yes | Yes |
| Auto-mapping | Yes | Yes | Yes |
| Query-time join | Yes | Yes | Yes |
| Deep paging | Yes | Yes | Yes |
| Per-segment filters | Yes | No | Yes |
| Multi-threaded queries | Yes | No | No |
| Auto-retry | Yes | No | No |
| DISTRIBUTED SEARCH | | | |
| Distributed queries | Yes | Yes | Yes |
| Node discovery | Cassandra | Requires ZooKeeper | Zen Discovery |
| Coordination | Cassandra | Requires ZooKeeper | Self-contained |
| Automatic shard rebalancing | Cassandra | No | Yes |
| OPERATIONS | | | |
| Continuous availability | Yes | No | No |
| Integration of analytics and search workloads | Yes | No | No |
| Data resiliency | Yes | No | No |
| Shard splitting | Cassandra ring | No need to reindex | Requires reindexing when adding more shards |
| Linear scaling | Yes | Yes | Limited |
| Durability | Yes | NA | NA |
| Partition tolerance | Yes | Needs ZooKeeper | Split-brain |
| Consistency | Eventual (tunable) | Sync | Sync / Async |
| Admin interface | Yes | Yes | Yes |
| Fine-grained memory statistics | Yes | No | No |
| Multi-data center write/read anywhere | Yes | No | No |
| SECURITY | | | |
| Authentication system support (LDAP) | Yes | No | No |
| Encrypted communications | Yes | No | No |
| Audit logging | Yes | No | No |
| Kerberos | Yes | No | No |
| API | | | |
| API format | XML, CSV, JSON | XML, CSV, JSON | JSON |
| HTTP REST API | Yes (through Solr API) | Yes | Yes |
| Binary API | Yes | Yes | Yes |
| JMX support | Yes | Yes | No |
| Drivers | Comprehensive | SolrJ | Comprehensive |
| Output | Solr JSON output | Solr JSON output | JSON in, JSON out |
| OTHER | | | |
| Analytics | Yes | Through 3rd party | Through 3rd party |
| Integration with Spark analytics | Yes | No | Partial |
| Manageability | OpsCenter | Admin UI | Marvel |

About DataStax

DataStax delivers Apache Cassandra in a database platform purpose-built for the performance and availability demands of web, mobile, and IoT applications, giving enterprises a secure, always-on database that remains operationally simple when scaled in a single data center or across multiple data centers and clouds. DataStax has more than 500 customers in 38 countries, including leaders such as Netflix, Rackspace, Pearson Education, and Constant Contact, and spans verticals including web, financial services, telecommunications, logistics, and government. Based in San Mateo, Calif., DataStax is backed by industry-leading investors including Lightspeed Venture Partners, Meritech Capital, and Crosslink Capital.
More informationAnalytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
More informationWhy NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
More informationComparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER By DataStax Corporation September 2012 Contents Introduction... 3 Overview of HDFS... 4 The Benefits
More informationOn- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
More informationThe Multi-Model Database Cloud Applications in a Complex World
The Multi-Model Database Cloud Applications in a Complex World Table of Contents INTRODUCTION MULTI-MODEL: AN EVOLUTIONARY TALE FROM RDBMS TO NOSQL TO MULTI-MODEL DATASTAX ENTERPRISE AND MULTI-MODEL DECIDING
More informationINTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
More informationXpoLog Center Suite Data Sheet
XpoLog Center Suite Data Sheet General XpoLog is a data analysis and management platform for Applications IT data. Business applications rely on a dynamic heterogeneous applications infrastructure, such
More informationBig Data Solutions. Portal Development with MongoDB and Liferay. Solutions
Big Data Solutions Portal Development with MongoDB and Liferay Solutions Introduction Companies have made huge investments in Business Intelligence and analytics to better understand their clients and
More informationDataStax Enterprise, powered by Apache Cassandra (TM)
PerfAccel (TM) Performance Benchmark on Amazon: DataStax Enterprise, powered by Apache Cassandra (TM) Disclaimer: All of the documentation provided in this document, is copyright Datagres Technologies
More informationAffordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale
WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept
More informationAugmented Search for Web Applications. New frontier in big log data analysis and application intelligence
Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationEvaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group
NoSQL Evaluator s Guide McKnight Consulting Group William McKnight is the former IT VP of a Fortune 50 company and the author of Information Management: Strategies for Gaining a Competitive Advantage with
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationSo What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
More informationBuilding a Scalable News Feed Web Service in Clojure
Building a Scalable News Feed Web Service in Clojure This is a good time to be in software. The Internet has made communications between computers and people extremely affordable, even at scale. Cloud
More informationDominik Wagenknecht Accenture
Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna
More informationElasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationAtScale Intelligence Platform
AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationNo-SQL Databases for High Volume Data
Target Conference 2014 No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014 The New Connected World Needs a Revolutionary New DBMS Today The Internet of Things 1990 s Mobile 1970 s Mainfram
More informationXpoLog Competitive Comparison Sheet
XpoLog Competitive Comparison Sheet New frontier in big log data analysis and application intelligence Technical white paper May 2015 XpoLog, a data analysis and management platform for applications' IT
More informationScalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
More informationINDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES
INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security
More informationModern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers
Modern IT Operations Management Why a New Approach is Required, and How Boundary Delivers TABLE OF CONTENTS EXECUTIVE SUMMARY 3 INTRODUCTION: CHANGING NATURE OF IT 3 WHY TRADITIONAL APPROACHES ARE FAILING
More informationBenchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk
Benchmarking Couchbase Server for Interactive Applications By Alexey Diomin and Kirill Grigorchuk Contents 1. Introduction... 3 2. A brief overview of Cassandra, MongoDB, and Couchbase... 3 3. Key criteria
More informationHow To Use Hp Vertica Ondemand
Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater
More informationAn Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
More informationCloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise
Cloud Service Model Selecting a cloud service model Different cloud service models within the enterprise Single cloud provider AWS for IaaS Azure for PaaS Force fit all solutions into the cloud service
More informationScaleArc for SQL Server
Solution Brief ScaleArc for SQL Server Overview Organizations around the world depend on SQL Server for their revenuegenerating, customer-facing applications, running their most business-critical operations
More informationDataStax Enterprise Reference Architecture
DataStax Enterprise Reference Architecture DataStax Enterprise Reference Architecture 7.8.15 1 Table of Contents ABSTRACT... 3 INTRODUCTION... 3 DATASTAX ENTERPRISE... 3 ARCHITECTURE... 3 OPSCENTER: EASY-
More informationPlanning the Migration of Enterprise Applications to the Cloud
Planning the Migration of Enterprise Applications to the Cloud A Guide to Your Migration Options: Private and Public Clouds, Application Evaluation Criteria, and Application Migration Best Practices Introduction
More informationwww.basho.com Technical Overview Simple, Scalable, Object Storage Software
www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...
More informationQlik Sense Enabling the New Enterprise
Technical Brief Qlik Sense Enabling the New Enterprise Generations of Business Intelligence The evolution of the BI market can be described as a series of disruptions. Each change occurred when a technology
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More information[Hadoop, Storm and Couchbase: Faster Big Data]
[Hadoop, Storm and Couchbase: Faster Big Data] With over 8,500 clients, LivePerson is the global leader in intelligent online customer engagement. With an increasing amount of agent/customer engagements,
More informationBig Data Management and Security
Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More informationScaleArc idb Solution for SQL Server Deployments
ScaleArc idb Solution for SQL Server Deployments Objective This technology white paper describes the ScaleArc idb solution and outlines the benefits of scaling, load balancing, caching, SQL instrumentation
More informationHadoop in the Hybrid Cloud
Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big
More informationTrends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationOverview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationMicrosoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010
Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,
More informationCloud Based Application Architectures using Smart Computing
Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products
More informationFrom Spark to Ignition:
From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for
More informationObject Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.
Object Storage: A Growing Opportunity for Service Providers Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction For service providers, the rise of cloud computing is both a threat
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationData Services Advisory
Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationCloudwick. CLOUDWICK LABS Big Data Research Paper. Nebula: Powering Enterprise Private & Hybrid Cloud for DataStax Big Data
Nebula: Powering Enterprise Private & Hybrid Cloud for DataStax Big Data was commissioned to evaluate and test the Nebula One Private and Hybrid Cloud Appliance using DataStax, a leading Apache Cassandra
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationNetwork Services in the SDN Data Center
Network Services in the SDN Center SDN as a Network Service Enablement Platform Whitepaper SHARE THIS WHITEPAPER Executive Summary While interest about OpenFlow and SDN has increased throughout the tech
More informationMiddleware- Driven Mobile Applications
Middleware- Driven Mobile Applications A motwin White Paper When Launching New Mobile Services, Middleware Offers the Fastest, Most Flexible Development Path for Sophisticated Apps 1 Executive Summary
More informationMarkLogic Enterprise Data Layer
MarkLogic Enterprise Data Layer MarkLogic Enterprise Data Layer MarkLogic Enterprise Data Layer September 2011 September 2011 September 2011 Table of Contents Executive Summary... 3 An Enterprise Data
More informationBig Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable
More informationHP Virtualization Performance Viewer
HP Virtualization Performance Viewer Efficiently detect and troubleshoot performance issues in virtualized environments Jean-François Muller - Principal Technical Consultant - jeff.muller@hp.com HP Business
More informationSimplified Management With Hitachi Command Suite. By Hitachi Data Systems
Simplified Management With Hitachi Command Suite By Hitachi Data Systems April 2015 Contents Executive Summary... 2 Introduction... 3 Hitachi Command Suite v8: Key Highlights... 4 Global Storage Virtualization
More informationNon-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014
Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability
More informationFully Managed, High-performance Cassandra Service Powered by DataStax Enterprise
Fully Managed, High-performance Cassandra Service Powered by DataStax Enterprise Fully Managed, High-performance Cassandra Service Cover Table of Contents 1. Introducing Managed Cassandra 1 2. Challenges
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationAugmented Search for IT Data Analytics. New frontier in big log data analysis and application intelligence
Augmented Search for IT Data Analytics New frontier in big log data analysis and application intelligence Business white paper May 2015 IT data is a general name to log data, IT metrics, application data,
More informationTHE REALITIES OF NOSQL BACKUPS
THE REALITIES OF NOSQL BACKUPS White Paper Trilio Data, Inc. March 2015 1 THE REALITIES OF NOSQL BACKUPS TABLE OF CONTENTS INTRODUCTION... 2 NOSQL DATABASES... 2 PROBLEM: LACK OF COMPREHENSIVE BACKUP AND
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationData Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
More informationORACLE COHERENCE 12CR2
ORACLE COHERENCE 12CR2 KEY FEATURES AND BENEFITS ORACLE COHERENCE IS THE #1 IN-MEMORY DATA GRID. KEY FEATURES Fault-tolerant in-memory distributed data caching and processing Persistence for fast recovery
More informationIncrease Agility and Reduce Costs with a Logical Data Warehouse. February 2014
Increase Agility and Reduce Costs with a Logical Data Warehouse February 2014 Table of Contents Summary... 3 Data Virtualization & the Logical Data Warehouse... 4 What is a Logical Data Warehouse?... 4
More informationComparing Oracle with Cassandra / DataStax Enterprise
Comparing Oracle with Cassandra / DataStax Enterprise Table of Contents Table of Contents... 2 Abstract... 3 Introduction... 3 Oracle and Today s Online Applications... 3 Architectural Limitations... 3
More informationEnterprise Private Cloud Storage
Enterprise Private Cloud Storage The term cloud storage seems to have acquired many definitions. At Cloud Leverage, we define cloud storage as an enterprise-class file server located in multiple geographically
More informationwww.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging
More informationBig data blue print for cloud architecture
Big data blue print for cloud architecture -COGNIZANT Image Area Prabhu Inbarajan Srinivasan Thiruvengadathan Muralicharan Gurumoorthy Praveen Codur 2012, Cognizant Next 30 minutes Big Data / Cloud challenges
More informationGigaSpaces Real-Time Analytics for Big Data
GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and
More information