Apache Trafodion
ENTERPRISE CLASS OPERATIONAL SQL-ON-HADOOP

Table of contents

- Introducing Trafodion
- Trafodion overview
- Targeted Hadoop workload profile
- Transactional SQL application characteristics and challenges
- Trafodion innovations built upon the Hadoop software stack
- Leveraging HBase for performance, scalability, and availability
- Trafodion innovation: value-add improvements over vanilla HBase
- Salting of row keys
- Trafodion feature overview
- Full-functioned ANSI SQL language support
- Trafodion software architecture overview
- Integrating with native Hive and HBase data stores
- Trafodion process overview and SQL execution flow
- Trafodion's optimizer technology
- Extensible optimizer technology
- Optimized execution plans based on statistics
- Trafodion's data flow SQL executor technology with optimized DOP
- Trafodion optimizations for transactional SQL workloads
- Trafodion innovation: Distributed Transaction Management
- High availability and data integrity features
- Summary of Trafodion benefits
- Where to go for more information

Apache Trafodion (incubating) is an open source initiative to deliver an enterprise-class SQL-on-Hadoop DBMS engine that specifically targets transaction-protected operational workloads. Trafodion combines Apache HBase with transactional SQL technologies developed over more than 20 years of investment in database technology and solutions.

Introducing Trafodion

Trafodion is an open source initiative to develop an enterprise-class SQL-on-Hadoop DBMS engine that specifically targets big data transactional or operational workloads, as opposed to analytic workloads. Transactional SQL encompasses workloads previously described as OLTP (online transaction processing) workloads, which were generated in support of traditional enterprise-level transactional applications (ERP, CRM, etc.) and enterprise business processes. Additionally, transactions have evolved to include social and mobile data interactions and observations using a mixture of structured and semi-structured data.

Trafodion overview

- Comprehensive, full-functioned SQL DBMS that allows companies to reuse and leverage existing SQL skills to improve developer productivity.
- Extends Apache HBase by adding support for ACID (atomic, consistent, isolated, durable) transaction protection that guarantees data consistency across multiple rows, tables, and SQL statements.
- Includes many optimizations for low-latency read and write transactions in support of the fast response time requirements of transactional SQL workloads.
- Hosted applications can seamlessly access and join data from Trafodion, native HBase, and Hive tables without expensive replication or data movement overhead.
- Provides interoperability with new or existing applications and 3rd-party tools via support for standard ODBC and JDBC access.
- Designed to fit seamlessly within the existing IT infrastructure with no vendor lock-in by remaining neutral to the underlying Linux and Hadoop distributions.

Targeted Hadoop workload profile

Hadoop workloads can be broadly categorized into four types, as shown in Figure 1: Operational, Interactive, Non-Interactive, and Batch. These categories vary greatly in their response time expectations as well as the amount of data that is typically processed. The rightmost three categories are where the marketplace (vendors and customers) has predominantly focused its attention, so these are the most mature in terms of development efforts and solution offerings. For the most part, these categories represent efforts centered on analytics and business intelligence processing for big data problems. Such workloads are well positioned to leverage Hadoop's strengths and capabilities, MapReduce in particular.

In contrast, the leftmost workload, Operational, is an emerging Hadoop market category and therefore the least mature. In part, this is a direct result of Hadoop being perceived as having a number of weaknesses (or gaps) in addressing the requirements of transactional SQL workloads. Traditionally these workloads have been relegated to the domain of relational databases, but there is growing interest in and pressure to embrace them in Hadoop because of Hadoop's perceived benefits: significantly reduced costs, reduced vendor lock-in, and the ability to seamlessly scale to larger workloads and data volumes. This is exactly the workload that Trafodion is targeting.

Let's next look at the characteristics and requirements of this workload to better understand Hadoop's gaps and weaknesses, and how Trafodion addresses them.

Figure 1. Hadoop workload profiles

Transactional SQL application characteristics and challenges

Transaction-protected operational workloads are typically deemed mission critical because they help companies make money, touch their customers or prospects, or run and operate their business. They usually have very stringent requirements for response times (sub-second expectations), transactional data integrity, number of users, concurrency, availability, and data volumes. With the advent of the growing Internet of Things, the number and types of access devices have driven tremendous transaction and data growth, along with changes in the types of data that need to be captured and used as part of these transactions. These next-generation operational applications often require multi-structured data types, meaning that operational data is rapidly evolving to include a variety of data formats, for example transactional structured data combined with visual images.

Combined, these requirements can expose Hadoop limitations in transaction support, bulletproof data integrity, real-time performance, operational query optimization, and managing workloads comprised of a complex mix of concurrently executing transactions with varying priorities. Trafodion addresses each of these limitations and as a result provides a differentiated DBMS capable of hosting these applications and their data.

Trafodion innovations built upon the Hadoop software stack

Trafodion is designed to build upon and leverage the Apache Hadoop and HBase core modules. Operational applications using Trafodion transparently gain Hadoop's advantages of affordable performance, scalability, elasticity, and availability. Figure 2 depicts a subset of the Hadoop software stack; the items colored orange are specifically leveraged by Trafodion, namely HBase, HDFS, and ZooKeeper. To this stack, Trafodion adds (items colored green) ODBC/JDBC drivers, the Trafodion database software, and a new HBase distributed transaction management (DTM) subsystem for distributed transaction protection across multiple HBase regions. Trafodion interfaces to Hadoop services using their standard APIs. By maintaining API compatibility, Trafodion remains Hadoop-distribution neutral, eliminating vendor lock-in by offering customers a choice of distributions.

Trafodion is initially targeted to deliver innovation on top of Hadoop in these key areas:

- A full-featured ANSI SQL implementation whose database services are accessible via a standard ODBC/JDBC connection
- A SQL relational schema abstraction that makes Trafodion look and feel like any other relational database
- Distributed ACID transaction protection
- Performant response times for transactions comprised of both reads and writes
- Parallel optimizations for both transactional and operational reporting workloads

Figure 2. Trafodion and the Hadoop ecosystem

Leveraging HBase for performance, scalability, and availability

As stated previously, Trafodion is able to leverage all of the features, and thereby all of the advantages, attributed to HBase, including parallel performance, virtually unlimited scalability, elasticity, and availability/disaster recovery protection. These features are key to supporting operational workloads in production. For example:

- Fine-grained load balancing, scalability, and parallel performance are provided by standard HBase services, such as auto-sharding Trafodion table data across multiple regions and region servers.
- Data availability and recovery in the event a server or disk fails or is decommissioned are provided by standard Hadoop and HBase services such as replication and snapshots.

Additionally, Trafodion is able to transparently leverage distribution-specific (e.g. Cloudera, Hortonworks) features and capabilities, since it accesses these services via native HBase APIs. Powerful features such as compression or encryption can therefore be supplied under the covers for Trafodion-defined tables.

Next, let's look at how Trafodion brings innovation and value-add to vanilla HBase.

Trafodion innovation: value-add improvements over vanilla HBase

Although Trafodion stores its database objects in HBase/HDFS storage structures, it differs from and brings value-add over vanilla HBase in a multitude of ways:

- Trafodion provides a relational schema abstraction on top of HBase, which allows customers to leverage known and well-tested relational design methodologies and SQL programming skills.
- From a physical layout perspective, Trafodion uses standard HBase storage mechanisms (a column family store using key-value pairs) to store and access objects. Trafodion currently stores all columns in a single column family to improve access efficiency and speed for operational data. Additionally, Trafodion incorporates a column name encoding mechanism to save space on disk and to reduce messaging overhead, improving SQL performance.
- Unlike vanilla HBase, which treats stored data as an uninterpreted array of bytes, Trafodion-defined columns are assigned specific data types that Trafodion enforces when inserting or updating data. This not only greatly improves data quality and integrity, it also eliminates the need to develop application logic to parse and interpret the data contents.
- Vanilla HBase provides ACID transaction protection only at the row level. Trafodion extends ACID protection to application-defined transactions that can span multiple SQL statements, multiple tables, and multiple rows (see the sketch after this list). This greatly improves database integrity by protecting the database against partially completed transactions, i.e. ensuring that either the whole transaction is completely materialized in the database or none of it is.
- HBase's native API is very low level and is not a commonly used programming API. In contrast, Trafodion's API is ANSI SQL, a familiar and well-known programming interface that allows companies to leverage existing SQL knowledge and skills.
- Unlike HBase's key structure, which is a single uninterpreted array of bytes, Trafodion supports the common relational practice of allowing the primary key to be a composite key comprised of multiple columns.
- Finally, unlike vanilla HBase, Trafodion supports the creation of secondary indexes that can be used to speed transaction performance when accessing row data by a column value that is not the row key.
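To make the multi-statement ACID guarantee concrete, here is a minimal sketch of an explicit transaction using the BEGIN/COMMIT/ROLLBACK WORK control statements described later in this paper; the table and column names are hypothetical:

    -- Move funds between two accounts atomically (hypothetical schema).
    -- Either both UPDATE statements become durable, or neither does.
    BEGIN WORK;

    UPDATE accounts SET balance = balance - 100.00 WHERE account_id = 1001;
    UPDATE accounts SET balance = balance + 100.00 WHERE account_id = 2002;

    COMMIT WORK;
    -- On an error before the commit, ROLLBACK WORK undoes both updates.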

Salting of row keys

One known problem area for HBase is supporting transactional workloads where data is inserted into a table in row key order. When this happens, all of the I/O is concentrated on a single HBase region, which creates a server and disk hotspot and a performance bottleneck. To alleviate this problem, Trafodion provides an innovative feature called salting the row key. To enable this feature, the DBA specifies the number of partitions (i.e. regions) the table is to be split over when creating the table, e.g. SALT USING 4 PARTITIONS (a DDL sketch appears in the ANSI SQL section below). Trafodion creates the table pre-split with one region per salt value. An internal hash value column, _SALT_, is added as a prefix to the row key. Salting is handled automatically by Trafodion and is transparent to application-written SQL statements. As data is inserted into the table, Trafodion computes the salt value and directs the insert to the appropriate region. Likewise, Trafodion calculates the salt value when data is retrieved from the table and automatically generates predicates where feasible. MDAM technology (described in more detail in the section entitled "Trafodion optimizations for transactional SQL workloads") makes this process especially efficient. This is a very lightweight operation with little overhead or impact on direct key access operations. The benefits of salting are more even data distribution across regions and improved performance via hotspot elimination.

In summary, Trafodion incorporates a number of enhancements over vanilla HBase for the purposes of improving transaction performance, data integrity, and DBA/developer productivity (i.e. reducing application complexity through the use of standard and well-known relational practices and APIs).

Trafodion feature overview

Let's now look at a high-level overview of the Trafodion features. A more detailed drill-down into each of these features is provided in the sections below. Trafodion includes:

- An enterprise-class SQL DBMS that provides all of the features you would expect from one of the merchant relational database products on the market. The difference is that Trafodion leverages Hadoop services, i.e. HBase/HDFS, for data storage.
- Full-functioned ANSI SQL language support including data definition, data manipulation, transaction control, and database utilities.
- Linux and Windows ODBC/JDBC drivers.
- Distributed transaction management protection.
- Many SQL optimizations designed to improve operational workload performance.

All while retaining and extending the expected Hadoop benefits! Now let's dive into more details on these features.

Full-functioned ANSI SQL language support

Unlike most (if not all) NoSQL and other SQL-on-Hadoop products, Trafodion provides comprehensive ANSI SQL language support, including full-functioned data definition (DDL), data manipulation (DML), transaction control (TCL), and database utility support. Unlike vanilla HBase, Trafodion provides support for creating and managing traditional relational database objects including tables, views, secondary indexes, and constraints.
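Pulling together the DDL support just described with the salting feature from the previous section, here is a minimal sketch; all object names are hypothetical, and exact DDL details can vary by release:

    -- Typed columns, a composite primary key, and a salted row key.
    -- SALT USING 4 PARTITIONS pre-splits the table into four regions and
    -- prefixes the row key with the internal _SALT_ hash column.
    CREATE TABLE web_orders
    ( order_ts  TIMESTAMP NOT NULL
    , store_id  INT       NOT NULL
    , item_id   INT       NOT NULL
    , qty       INT
    , PRIMARY KEY (order_ts, store_id, item_id)
    )
    SALT USING 4 PARTITIONS;

    -- A secondary index to speed access by a column that is not the row key.
    CREATE INDEX ix_orders_item ON web_orders (item_id);

Because order_ts leads the key, sequential inserts would otherwise hotspot a single region; salting spreads them across all four regions.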
Columns (table attributes) are assigned Trafodion-enforced data types including numeric, character, varchar, date, time, interval, etc. Internationalization (I18N) support is provided via Unicode encodings, including UTF-8 and UCS2, as well as ISO88591, for both user data and the database metadata. Comparisons and data manipulation between differing data encodings are transparently handled via implicit casting and translation support.

Trafodion provides comprehensive, standard SQL data manipulation support including SELECT, INSERT, UPDATE, DELETE, and UPSERT/MERGE syntax, with language options including join variants, unions, WHERE predicates, aggregations (GROUP BY and HAVING), sort ordering, sampling, correlated and nested subqueries, cursors, and many SQL functions. Utilities are provided for updating the table statistics used by the optimizer for costing (i.e. selectivity/cardinality estimates) of plan alternatives, for displaying the chosen SQL execution plan, and for plan shaping, along with a command-line utility for interfacing with the database engine. Explicit control statements allow applications to define transaction boundaries and to abort transactions when warranted. Trafodion will support ANSI's GRANT/REVOKE semantics to define user privileges for managing and accessing database objects.

Trafodion software architecture overview

The Trafodion software architecture consists of three distinct layers: the client layer, the SQL database services layer, and the storage engine layer (see Figure 3).

Figure 3. Trafodion's 3-layer software architecture

The first layer is the Client Services layer, where the operational application resides. The operational application can be either customer written or enabled via a 3rd-party ISV tool or application. Access to the Trafodion database services layer is via a standard ODBC/JDBC interface using a Trafodion-supplied Windows or Linux client driver. Both type 2 and type 4 JDBC drivers are supported; the choice depends on the application's requirements for response times, number of connections, security, and other factors.

The second layer is the SQL layer, which consists of all the Trafodion database services. This layer encapsulates all of the services required for managing Trafodion database objects as well as efficiently executing submitted SQL database requests. Services include connection management, SQL statement compilation and optimized execution plan creation, SQL execution (both parallel and non-parallel) against Trafodion database objects, transaction management, and workload management. Trafodion provides transparent parallel SQL execution as warranted, eliminating the need for complex map-reduce programming.

The third layer is the Storage Engine layer, which consists of the standard Hadoop services leveraged by Trafodion: HBase, HDFS, and ZooKeeper. Trafodion database objects are stored in native Hadoop (HBase/HDFS) database structures. Trafodion handles the mapping of SQL requests into native HBase calls transparently on behalf of the operational application. Trafodion provides a relational schema abstraction on top of HBase; in this way traditional relational database objects (tables, views, secondary indexes) are supported using familiar DDL/DML semantics, including object naming, column definitions, data type support, etc.

Integrating with native Hive and HBase data stores

One of the more powerful capabilities of Trafodion is its extensibility to also support and access data stored in native Hive or HBase tables (non-Trafodion tables) using their native storage engines and data formats. The benefits that can be realized include:

- Ability to run queries against native HBase or Hive tables without needing to copy them into a Trafodion table structure
- Optimized access to HBase and Hive tables without complex map-reduce programming
- Ability to join data across disparate data sources (e.g. Trafodion, Hive, HBase)
- Ability to leverage HBase's inherent schema flexibility
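As an illustration of cross-source access, the sketch below joins the hypothetical Trafodion table from the earlier DDL example with a native Hive table; it assumes the HIVE catalog and schema convention Trafodion uses to expose Hive tables, and the Hive table name is hypothetical:

    -- Join operational data in Trafodion with reference data kept in Hive,
    -- with no copy or replication between the two stores.
    SELECT o.order_ts, o.qty, p.product_name
    FROM   web_orders o                -- Trafodion table
    JOIN   hive.hive.products p        -- native Hive table
           ON p.item_id = o.item_id
    WHERE  o.store_id = 42;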

Trafodion process overview and SQL execution flow

The Trafodion SQL layer is comprised of a number of services, or processes, used to handle connection requests and SQL execution. The process flow begins with the operational application or 3rd-party client tool. The Windows or Linux client accesses the Trafodion DBMS via the supplied ODBC/JDBC drivers. When the client requests to open a connection, Trafodion's database connection services (DCS) process the request and assign the connection to a Trafodion Master SQL process. Trafodion uses ZooKeeper to coordinate and manage the distribution of connection services across the cluster for load-balancing purposes, as well as to ensure that a client can immediately reconnect in the event the assigned Master process fails.

The Master process is responsible for coordinating the execution of SQL statements passed from the client application. The Master calls upon the Compiler and Optimizer process (CMP) to parse, compile, and generate the optimized execution plan for the SQL statements. If the optimized plan calls for parallel execution, the Master divides the work among Executor Server Processes (ESPs), which perform the work in parallel on behalf of the Master process. The results are passed back to the Master for consolidation. In situations where a highly complex plan is specified (e.g. large n-way joins or aggregations), multiple layers of ESPs may be requested. If a non-parallel plan is generated, the Master calls upon HBase services directly for optimal performance.

For distributed transaction protection, the Trafodion DTM service is called upon to ensure the ACID protection of transactions across the Hadoop cluster. The DTM calls upon a Trafodion-supplied HBase TRX service that provides transaction resource management on behalf of HBase. Last, but not least, vanilla HBase, HBase-trx, and HDFS services are called upon by either the Master or ESP processes, using standard and native APIs, to complete the I/O requests, i.e. retrieving and maintaining the database objects. Where appropriate, Trafodion will push SQL execution down into the HBase layer using filters or coprocessors.

Trafodion's optimizer technology

Optimizer technology represents one of Trafodion's greatest sources of differentiation versus alternative SQL-on-Hadoop projects or products. There are two primary areas to call out: the first is the extensible nature of the optimizer, allowing it to adapt to change and add improvements; the second is the sophistication and maturity of the optimizer in choosing the best optimized plan for execution.

Extensible optimizer technology

Trafodion's optimizer is based on the Cascades optimization framework, recognized as one of the most advanced and extensible optimizer frameworks available. Cascades is a hybrid optimization engine: it combines logical and physical operator transformation rules with costing models to generate the Trafodion optimizer. New rules, operators, and costing models can easily be added or changed, so the optimizer can evolve quickly and generate improved SQL execution plans.

Optimized execution plans based on statistics

The second area of differentiation is the sophistication and maturity of Trafodion's optimizer technology. First, let's explain the role of the various elements of the optimizer:

- SQL Normalizer: the parsed SQL statement is passed to the normalizer, which performs unconditional transformations, including subquery transformations, converting the SQL into a canonical form that can be optimized internally.
- SQL Analyzer: analyzes alternative join connectivity patterns, table access paths and methods, matching partition information, etc., for use by the optimizer's rules. The results are passed to the plan generator for consideration in costing the various plan alternatives.
- Table Statistics: captured equal-height histogram statistics identify data distributions for column data and correlations between columns. Sampling is used for large tables to reduce the overhead of generating the statistics.
- Cardinality Estimator: cardinalities, data skew, and histograms are computed for intermediate results throughout the operator tree.
- Cost Estimator: estimates node, I/O, and message costs for each operator while accounting for data skew at the operator level.
- Plan Generator: using the cost estimates, the optimizer considers alternative plans and chooses the plan with the lowest cost. Where feasible, the optimizer elects plans that incorporate SQL pushdown, sort elimination, and in-memory storage versus overflow to disk. It also determines the optimal degree of parallelism, including non-parallel plans.

In summary, the optimizer is designed to choose the execution plan that minimizes the system resources used and delivers the best response time. It provides optimizations for both operational transactions and reporting workloads.
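As a sketch of how these statistics are produced and plans are inspected in practice: the statements below follow Trafodion's UPDATE STATISTICS utility and execution-plan display support described above, though exact options vary by release, and the table name is the hypothetical one used earlier:

    -- Build equal-height histograms for the optimizer, sampling a fraction
    -- of a large table to keep the collection cost low.
    UPDATE STATISTICS FOR TABLE web_orders
        ON EVERY COLUMN SAMPLE RANDOM 5 PERCENT;

    -- Display the execution plan the optimizer chose for a query.
    EXPLAIN SELECT item_id, COUNT(*)
            FROM   web_orders
            WHERE  store_id = 42
            GROUP BY item_id;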
Trafodion's data flow SQL executor technology with optimized DOP

Trafodion's SQL executor uses a dataflow, scheduler-driven task model to execute the optimized query plan. Each operator in the plan is an independent task, and data flows between operators through in-memory queues (up and down) or by interprocess communication. Queues between tasks allow operators to exchange multiple requests or result rows at a time. A scheduler coordinates the execution of tasks, running each task whenever it has data in one of its input queues. Trafodion's executor model is starkly different from alternative SQL-on-Hadoop DBMSs that store intermediate results on disk, for example in spool space. In most cases, the Trafodion executor is able to process queries with data flowing entirely through memory, providing superior performance and reduced dependency on disk space and I/O bandwidth. The executor incorporates several types of parallelism:

- Partitioned parallelism is the ability to work on multiple data partitions in parallel. In a partitioned parallel plan, multiple operators all work on the same plan. Results are merged using multiple queues, or pipelines, enabling the preservation of the sort order of the input partitions. Partitioning is also called data parallelism because the data is the unit that gets partitioned into independently executable fractions.
- Pipelined parallelism is an inherent feature of the executor resulting from its dataflow architecture. This architecture interconnects all operators by queues, with the output of one operator piped as input to the next operator, and so on. The result is that each operator works independently of any other operator, producing its output as soon as its input is available. Pipelining occurs naturally and is engaged in almost all query plans.

- Operator parallelism is also an inherent feature of the executor architecture. In operator parallelism, two or more operators can execute simultaneously, that is, in parallel. Except for certain synchronization conditions, the operators execute independently.

Trafodion naturally provides parallelism without special processing, such as Hadoop map-reduce programming or coding, on the part of the application client. An individual query plan produced by the optimizer can contain any combination of partitioned, pipelined, or operator parallelism. The degree of parallelism at any plan stage may vary depending on the optimizer's heuristics.

Trafodion optimizations for transactional SQL workloads

Trafodion provides many compile-time and run-time optimizations for varying operational workloads, ranging from singleton row accesses for OLTP-like transactions to highly complex SQL statements used for operational reporting. Figure 4 depicts a number of these optimization features:

Figure 4. Optimized parallel execution

- A type 2 JDBC driver may be used, giving the client direct JNI access to HBase services to minimize service times.
- For many OLTP-like transactions, the Master can issue directed key access requests to HBase without needing intermediate ESP processes.
- For transactions including highly complex SQL statements (e.g. n-way joins or aggregations requiring rebroadcast or redistribution of data), a parallel plan involving one or more layers of ESPs can be used to significantly reduce service time.

Additional optimizations include:

- Masters and ESPs are retained after a connection is dropped and can be reused, eliminating startup and shutdown overhead.
- Compiled SQL plans are cached, eliminating unnecessary recompilation overhead.
- SQL pushdown using standard HBase services such as filters (e.g. start-stop key predicates) and coprocessors (e.g. count aggregates).
- Secondary index support.
- A patented access method known as the Multidimensional Access Method (MDAM) accelerates row retrieval using dimensional predicates. For example, assume a table whose row key is Week, Item, and Store, but the application supplies only Item and Store predicates. Without MDAM, the DBMS must either perform a full table scan or rely on a secondary index created on Item and Store. In contrast, MDAM utilizes the inherent HBase clustering of row keys to issue a series of probes and range jumps through the table, reading only the minimal set of rows required to process the SQL statement (see the sketch after this list). MDAM usage extends to a broad range of data retrieval requests (e.g. IN lists on multiple key index columns, NOT equal (<>) predicates, multivalued predicates, etc.), improving response times and reducing the need for additional secondary indexes. It is also used to access tables with a salted row key efficiently.
- Rowsets support, the ability to batch multiple SQL statements in a single request, reducing the number of message exchanges between the client and the database engine.
- Availability enhancements, including service persistence (via ZooKeeper) and automatic query resubmission.
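To make the MDAM example above concrete, here is a sketch against a hypothetical table mirroring the Week/Item/Store layout described in the text:

    -- Row key (clustering key) is (week_no, item_id, store_id), but the
    -- query supplies predicates only on the trailing key columns.
    SELECT week_no, item_id, store_id, sales_amt
    FROM   weekly_sales
    WHERE  item_id = 501 AND store_id = 42;

    -- Without MDAM: a full table scan, or a secondary index on
    -- (item_id, store_id), would be needed.
    -- With MDAM: the executor probes each distinct week_no value and
    -- range-jumps directly to the qualifying (item_id, store_id) rows,
    -- reading only the minimal set of rows.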

Figure 5 below summarizes many of the Trafodion optimizations discussed to this point. It illustrates that Trafodion provides optimizations both for operational transaction workloads, which typically have very stringent response time requirements (e.g. sub-second), and for operational query and reporting workloads, which typically have more relaxed response time requirements (e.g. minutes to hours) and may include highly complex SQL operations that are best run in parallel.

Figure 5. Trafodion workload optimizations

Trafodion innovation: Distributed Transaction Management

Vanilla HBase provides only single-table, row-level ACID protection. Trafodion's distributed transaction management (DTM), in combination with the HBase-TRX service, extends transaction protection to transactions spanning multiple SQL statements, multiple tables, or multiple rows of a single table. Additionally, Trafodion DTM provides protection in a distributed cluster configuration across multiple HBase regions using an inherent two-phase commit protocol. Transaction protection is automatically propagated across Trafodion components and processes. Trafodion eliminates the two-phase commit overhead for read-only transactions and for transactions updating only a single row; in the latter case, native HBase ACID protection is used. The DTM provides support for implicit (auto-commit) and explicit (BEGIN, COMMIT, ROLLBACK WORK) transaction control. Using HBase's Multi-Version Concurrency Control (MVCC) algorithm, Trafodion allows multiple transactions to access the same rows concurrently. In the case of concurrent updates, however, the first transaction to complete wins, and the other transactions are notified at commit time that they failed due to an update conflict (see the sketch below).

High availability and data integrity features

Trafodion leverages the inherent availability and data integrity features of HBase and HDFS. Additionally, Trafodion can leverage any enterprise-class availability extensions offered by a Hadoop distribution. On top of the HBase and HDFS features, Trafodion provides a number of high availability features, including:

- Persistent connectivity services that ensure a client is able to reestablish a connection in the event its DCS service fails
- Automatic query resubmission (AQR), which resubmits a SQL statement in the event the statement fails in flight
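A minimal sketch of the update-conflict behavior described in the DTM section above, using a hypothetical schema; the conflict notification itself is raised by Trafodion at commit time:

    -- Two sessions read and update the same row under MVCC.
    BEGIN WORK;
    UPDATE inventory SET qty = qty - 10 WHERE item_id = 42;
    -- If a concurrent transaction updated this row and committed first,
    -- the COMMIT below fails with an update-conflict error and the
    -- transaction does not materialize; the application can then retry
    -- (compare automatic query resubmission above).
    COMMIT WORK;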

Summary of Trafodion benefits

Trafodion delivers on the promise of a full-featured, optimized transactional SQL-on-Hadoop DBMS with full transactional data protection. This combination of HBase and an enterprise-class transactional SQL engine overcomes Hadoop's weaknesses in supporting operational workloads. Customers gain the following benefits:

- Ability to leverage in-house SQL skills and expertise instead of having to learn complex map-reduce programming.
- Seamless support for existing and new customer-written or ISV operational applications, driving investment protection and improved development productivity.
- Workload optimizations that provide the foundation for next-generation real-time transaction processing applications.
- Guaranteed transactional consistency across multiple SQL statements, tables, and rows.
- Complements existing Hadoop investments and benefits: reduced cost, scalability, and elasticity.

All with open source project sponsorship!

Where to go for more information

Learn more, and send questions, via the Apache Trafodion project website.

Copyright 2015 Esgyn Corporation. August 2015


More information

Protecting Big Data Data Protection Solutions for the Business Data Lake

Protecting Big Data Data Protection Solutions for the Business Data Lake White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Using RDBMS, NoSQL or Hadoop?

Using RDBMS, NoSQL or Hadoop? Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals Oracle University Contact Us: 1.800.529.0165 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This course is designed to deliver the fundamentals of SQL and PL/SQL along

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385 brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

LearnFromGuru Polish your knowledge

LearnFromGuru Polish your knowledge SQL SERVER 2008 R2 /2012 (TSQL/SSIS/ SSRS/ SSAS BI Developer TRAINING) Module: I T-SQL Programming and Database Design An Overview of SQL Server 2008 R2 / 2012 Available Features and Tools New Capabilities

More information

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Introduction For companies that want to quickly gain insights into or opportunities from big data - the dramatic volume growth in corporate

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Database System Concepts, 5th Ed. See www.db book.com for conditions on re use Chapter 1: Introduction Purpose of Database Systems View of Data Database Languages Relational Databases

More information

Performance Tuning for the Teradata Database

Performance Tuning for the Teradata Database Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Where is Hadoop Going Next?

Where is Hadoop Going Next? Where is Hadoop Going Next? Owen O Malley owen@hortonworks.com @owen_omalley November 2014 Page 1 Who am I? Worked at Yahoo Seach Webmap in a Week Dreadnaught to Juggernaut to Hadoop MapReduce Security

More information