Unleashing the Power of Hadoop for Big Data Analytics


THOUGHT LEADERSHIP SERIES AUGUST

Unleashing the Power of Hadoop for Big Data Analytics

Data analytics, long the obscure pursuit of analysts and quants toiling in the depths of enterprises, has emerged as the must-have strategy of organizations across the globe. Competitive edge comes not only from deciphering the whims of customers and markets but also from being able to predict shifts before they happen. Fueling the move of data analytics out of back offices and into the forefront of corporate strategy sessions is big data, now made enterprise-ready through technology platforms such as Hadoop and MapReduce. The Hadoop framework is seen as the most efficient file system and solution set to store and package big datasets for consumption by the enterprise, and MapReduce is the construct used to perform analysis over Hadoop files.

Hadoop was first conceived as a web search engine for Yahoo!, whose developers were inspired by Google's now-well-known MapReduce paper. It has become the cornerstone of a thriving big data marketplace. Estimates from Wikibon, the open source IT research community, put the worldwide big data market currently at approximately $18 billion, destined to reach roughly $50 billion in the years ahead.

For years, and into the present day, enterprises have been applying data analytics against structured, relational datasets derived from transactional systems, using a wide range of tools from a variety of vendors, from data warehousing platforms to front-end desktop-based analysis software. Now, with the universe of unstructured data rapidly expanding, a new frontier is opening up for analysis, enabling potentially far-reaching insights. Hadoop handles data that traditional relational databases, data warehouses, and other analytics platforms have been unable to effectively manage, including user-generated data from social media and machine-generated data from sensors, appliances, and applications.
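A common use of such machine-generated data is aggregating events by key. As a purely illustrative miniature (the log format and field layout below are invented for the sketch, not taken from any particular system), counting web events by HTTP status code looks like this on a single machine; Hadoop distributes exactly this kind of work across a cluster:

```python
from collections import Counter

# Hypothetical web-server log lines: "timestamp client status bytes"
LOG_LINES = [
    "2013-08-01T10:00:01 10.0.0.1 200 512",
    "2013-08-01T10:00:02 10.0.0.2 404 128",
    "2013-08-01T10:00:03 10.0.0.1 200 2048",
    "2013-08-01T10:00:04 10.0.0.3 500 64",
]

def status_counts(lines):
    """Aggregate request counts by HTTP status code."""
    counts = Counter()
    for line in lines:
        _, _, status, _ = line.split()
        counts[status] += 1
    return dict(counts)

print(status_counts(LOG_LINES))  # -> {'200': 2, '404': 1, '500': 1}
```

At big data scale the input would be files in HDFS rather than an in-memory list, but the aggregation logic is the same shape.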
Hadoop accomplishes this by applying more efficient formats and file systems to large datasets that would normally have been out of the reach of standard analytics solutions. Currently, the most prevalent application seen among Hadoop sites is log and event data analysis, particularly against the machine-generated data coming from web activity and devices. This may include the gathering and analysis of network traffic, applications, capacity requirements, security events, and web interactions. As adoption grows, Hadoop-based data may increasingly play a role in more strategic business information, such as sales analysis and workforce allocation.

WHY HADOOP?

Hadoop offers a range of advantages to data analytics efforts. First and foremost, it enables the processing and analysis of all forms of data, regardless of whether they are highly structured or unstructured. Hadoop is also more cost-effective than traditional analytics platforms such as data warehouses. With data warehouses, for example, investment needs to be made in the platform itself, along with investment in extract, transform, and load (ETL), data cleansing, and modeling technologies. As a result, data has to be deemed important enough to justify the data warehouse investment, limiting its use and any ability to experiment with or pilot new forms of analysis. In Hadoop environments, which also can accommodate data warehouse data, big data stores can be brought in and processed cost-effectively.

At Hadoop's core is the principle of moving analytics closer to where the data resides. The framework is based on clusters that distribute the computing jobs required for big data analysis across various nodes. Hadoop is also

cloud-friendly. While many enterprises choose to implement the framework within their data centers, Hadoop clusters can also be run from the cloud, either via cloud vendors or through hosting services.

There is also a robust ecosystem of tools and technologies that has developed around Hadoop. Not only is the framework supported by a range of commercial software vendors, but a number of open source tools are available as well, enabling enterprises to derive value from big data. Many advanced analytic tools on the market also now support Hadoop, enabling visualization, data mining, predictive analytics, and text analytics against big datasets.

There is also greater accuracy and flexibility possible in big data via Hadoop. First, analysis can be run against entire datasets, versus the smaller samples used in the past. In addition, the Hadoop Distributed File System packages datasets into files that can be easily absorbed by existing applications, without the need to upgrade them to a massively parallel version that can absorb big datasets.

CHALLENGES

While enterprise adoption of Hadoop is expanding, it brings new types of challenges, ranging from manual coding demands and skills requirements to a lack of native real-time capabilities. For example, the Hadoop Distributed File System does not offer the native resiliency or the real-time capabilities that enterprises have come to expect from enterprise-grade software packages. Hadoop is natively batch-oriented, and thus real-time analysis may not be available without additional tools. Plus, if the Hadoop system goes down, it may take some time to recover and restore the framework. In addition, the technology, first developed and released in 2006, is still relatively new on the scene, and implementations are still relatively immature. It should be noted, for example, that the operational framework is on version 1.0, as offered through the Apache Software Foundation.
To date, many implementations are seen either among the large web properties or within the depths of data management or IT departments, as pilot projects or as part of efforts to optimize operations within those departments. Hadoop also requires a high degree of skill and understanding to install and implement. Hadoop implementation and management skills, as well as MapReduce development skills, are in high demand and difficult to find. As a result, enterprises seeking to employ Hadoop-based data analysis environments will either require highly trained IT and data management departments, or will need to rely on third-party consultants.

STEPS TO SUCCESS

The following are steps to success for adopting Hadoop-based big data analytics in enterprises:

Learn the technology. The Hadoop framework and ecosystem introduce new sets of solutions, such as the Hadoop Distributed File System and the MapReduce engine, along with a range of add-on applications such as HBase, Hive, Pig, and Sqoop. There are numerous online training programs available, as well as online tutorials, webinars, books, and white papers to further acquaint enterprise teams with the features and technical details of Hadoop.

Develop a test environment to pilot Hadoop projects. Reference architectures are now available across the industry for various scenarios of Hadoop implementations. A test environment will also help in the selection of tools that will benefit users accessing the information coming from the Hadoop framework.

Work with the business. Hadoop may be more commonly used for optimizing internal IT or data management operations, but it is rapidly gaining ground as a strategic big data analytics platform as well. Hadoop isn't operated in isolation, since it involves pulling in data from different systems and then publishing that data out to different systems.
It's also important to develop use cases for the big data analysis project, to map out data flows and determine what data is required. Very importantly, there has to be a demonstrated return on investment: if Hadoop won't bring in additional revenue or cut costs for the business, it may not be worth implementing the technology. Solving the business problem is the ultimate return.

Provide and encourage training and skills development. The Hadoop platform requires specialized skill sets that are not readily available on the job market, across several different disciplines in the data analytics space: systems administration, application development, data analysis and stewardship, and networking expertise. Many of these skills are already resident within organizations, and many individuals in data management or IT departments will be able to grow into these roles.

MarkLogic: The Best Database for Apache Hadoop*

"The Best Database for Hadoop" is a very audacious claim. This essay will lay out the reasons why we believe it to be true, and why MarkLogic is the best and only choice for an enterprise-class database that integrates with Hadoop at both the storage and compute layers. The integration between MarkLogic and Hadoop provides a single platform that allows you to mix and match both real-time and analytics workloads, without having to duplicate data or create a one-off infrastructure.

WHAT IS HADOOP AND WHY IS IT IMPORTANT?

Hadoop is a framework for distributed processing over large groups of commodity machines. It is typically used on data that is too large or unpredictable for traditional databases or data warehouses, or for problems that are so computationally expensive that dividing and conquering in parallel is the only way to perform an analysis in a reasonable amount of time. Google popularized the MapReduce programming model with a paper in 2004 that described its distributed search indexing infrastructure. Hadoop is an open-source implementation of these concepts. It evolved out of search indexing work at Yahoo!* by Doug Cutting, the creator of Apache Nutch* and Apache Lucene*.

Hadoop was a response to not being able to handle what we would now call Big Data in legacy RDBMSs or data warehouses. For Google, there was no database at the time that could scale to handle a crawl of the entire web. And, even if a database could handle computation over that amount of data, the storage would have required a complex and expensive SAN or NAS infrastructure. HDFS, the Hadoop Distributed File System, addresses that cost issue by allowing you to scale out versus up on commodity hardware. However, even with the virtually infinite pool of storage and compute resources that Hadoop promises, organizations still need to get data to users securely and in real time.
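The divide-and-conquer model Google popularized can be illustrated with the canonical word-count example, sketched here as a single-process toy (the function names are illustrative; a real Hadoop job would run the map and reduce phases in parallel across cluster nodes, with the framework handling the shuffle):

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit (word, 1) pairs for one independent chunk of input."""
    return [(word, 1) for line in chunk for word in line.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Each chunk could be mapped on a different node; results are then merged.
chunks = [["the quick brown fox"], ["the lazy dog", "the end"]]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
result = reduce_phase(shuffle(intermediate))
print(result["the"])  # -> 3
```

Because each map call sees only its own chunk, scaling means adding machines and chunks, not a bigger machine, which is exactly the economics described above.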
MapReduce is fine for batch analysis, but what if you need to provide users with the ability to quickly find specific pieces of data and make granular updates to the data in real time? If you need to do near-instantaneous analysis and alerting for fraud detection, emergency crisis management, or risk mitigation or assessment, can you afford the time it would take for a MapReduce job to complete?

MARKLOGIC WITH HADOOP AS COMPUTE INFRASTRUCTURE

Low-latency queries and granular updates, of course, require a database. Hadoop alone is not equipped for this type of workload. The popular tech press would have you think it's a stark trade-off between legacy relational databases, which provide indexes, transactions, security, and enterprise operations, and the popular open-source NoSQL databases, which have a flexible data model and commodity scale-out while being distributed and fault-tolerant. What if you could have the best of both of these worlds: the flexibility and scalability of NoSQL along with the reliability and security organizations have come to trust in mature relational databases? MarkLogic is unique in the marketplace in providing the best of NoSQL while also being a hardened and proven enterprise-class database technology.
Created in 2001 to fill the need within enterprise organizations and government entities to store, manage, query, and search data, no matter the format or structure, MarkLogic has these NoSQL characteristics:

- Flexible, with a schema-free document data model (JSON, XML, text, binary)
- Fast, implemented in C++ and optimized for today's I/O systems
- Scalable, leveraging a shared-nothing distributed architecture and lock-free reads
- Highly available, with transactional consistency, automatic failover, and replication

As an Enterprise NoSQL database, MarkLogic was designed from the start to support enterprise-class and enterprise-scale application requirements, including:

- ACID (atomic, consistent, isolated, and durable) transactions
- Government-grade security features, including fine-grained privileges,

role-based security, document-level permissions, and HTTPS access
- Real-time indexing, full-text search, geospatial search, and alerting
- Proven reliability and uptime, with more than 500 deployed mission-critical and enterprise projects in government, media, financial services, energy, and other industries

MarkLogic provides all of the scalability on commodity hardware that's come to define the NoSQL space. Yet it doesn't force you to give up the enterprise capabilities that are required in a mission-critical application. Some will argue that it is possible to maintain enterprise capabilities with other databases, whether RDBMS or NoSQL, by moving data to the environment where it will be used. Whether you're setting up a data mart with a data warehouse or a search index, moving data around is costly. ETL is error-prone and brittle. Once the data is in two places, you not only have to pay for the extra storage but also deal with governance and security across multiple systems, often in different parts of an organization. Hadoop introduces yet another environment.

MarkLogic, however, has taken a different approach. In October of 2012 we introduced the ability to deploy MarkLogic directly into an existing Hadoop environment. This allows you to store all of your data in Hadoop's default file storage, which has several important benefits:

- You can build real-time enterprise applications for Hadoop-based data
- You can leverage existing (or upcoming) infrastructure investments to save time and money
- You will require less data movement and/or duplication over the data's life cycle
- You can support mixed workloads: index once, then serve real-time or batch
- You will save money by using cost-effective long-term and long-tail storage

The new release is not a fork of MarkLogic or Hadoop. MarkLogic can be deployed against any of the leading commercial Hadoop distributions, allowing administrators to leverage existing infrastructure.
Because HDFS is one of several file systems on which you can store MarkLogic data, administrators can also easily and consistently move data between SSD, local disk, SAN, NAS, S3, and HDFS to support specific SLAs and cost objectives without modifying downstream application code. And, because MarkLogic has been engineered from the beginning to be distributed and to minimize file-system I/O, this wasn't a complete re-engineering of MarkLogic. It's the same scalable, reliable database that you run on your POSIX file system; it can now simply live on top of the Hadoop file system without modifying existing application code. There is no other enterprise NoSQL database that can do that.

MarkLogic and Hadoop work together in a complementary manner in a Big Data ecosystem. Hadoop excels at offline analytics; MarkLogic excels at online applications. Hadoop has been deployed for model-building, while hundreds of decision-making applications rely and run on MarkLogic. Long-haul batch analysis is best done on Hadoop, while real-time queries and alerts require MarkLogic. Finally, the file system in Hadoop is distributed, while MarkLogic distributes indexes.

In October 2011, MarkLogic released the MarkLogic Connector for Apache Hadoop*. The connector is a drop-in extension to Hadoop that allows you to efficiently move data between MarkLogic and Hadoop using MapReduce. There are three target use cases for the connector; the first and most common is ETL. Hadoop is a common environment in which to stage raw data in order to move it into other downstream systems. The Hadoop Connector allows you to tap into the large ecosystem of existing libraries to transform and aggregate data before loading it into MarkLogic. This is, in fact, the fastest way to load data into MarkLogic: MarkLogic's bulk loading tool, mlcp, takes advantage of this by scheduling Hadoop jobs under the covers to load terabytes, or even petabytes, in parallel. MapReduce is only the first chapter in the MarkLogic + Hadoop story, though.

MapReduce Defined

When someone says Hadoop, they typically mean the entire ecosystem of projects. At the core, however, are the principal compute and storage infrastructure components mentioned above: MapReduce for distributed computation, with a divide-and-conquer methodology, and HDFS, the distributed file system. These are the most mature parts of the ecosystem and the foundation for all of the other pieces. MapReduce allows you to break large or complex processing into small, independent pieces. Map processes or filters a chunk of the total input data, and Reduce aggregates and collates intermediate results. The Map and Reduce processes work in parallel. You can scale by adding more commodity hardware, not by upgrading to bigger/faster hardware. The system is centrally coordinated, so if a node goes down, its work is rescheduled to another.

MARKLOGIC WITH HADOOP AS STORAGE INFRASTRUCTURE

Beyond its MapReduce processing capabilities, Hadoop, via its core Hadoop Distributed File System (HDFS), is also a cheap and reliable way to store Big Data. It is the default data storage for Hadoop, and scales to hundreds of petabytes on commodity hardware. HDFS sits right on top of raw local storage, so users do not need a SAN. Its other characteristics include:

- Designed only for reading large, opaque files from start to finish
- Optimized for aggregate throughput, not latency
- Write-once, read-many
- Automatic 3x replication for availability and performance on commodity hardware
- File-level security designed to prevent accidental corruption

HDFS is a cost-effective means to keep data that might otherwise have been discarded or archived to tape. It is not designed for the real-time data access needed by user-facing applications. A file system doesn't replace a database. There are no indexes in HDFS, so finding an individual record typically involves scanning through every record in a large file. That's great for large-scale analytics, where the computation might need to read every record. However, it doesn't work for queries that need to be interactive to support end-user applications.

Hadoop is designed for aggregate throughput. The primary consumer is, of course, MapReduce, which is all about reading large files end to end. Coincidentally, this also happens to be MarkLogic's I/O pattern. MarkLogic buffers writes to RAM, with an on-disk journal for durability, and periodically spills those buffers, or stands, to disk. As you get more stands, it becomes more expensive to run queries, so in the background MarkLogic consolidates small stands into larger stands, similar in principle to compaction in HBase. Individual reads are aggressively cached in their compressed on-disk format at the MarkLogic data nodes, as well as in their uncompressed form at the query evaluator nodes, to avoid disk seeks.

To describe how MarkLogic is integrated into Hadoop at the storage layer, it is important to first understand how MarkLogic stores data. MarkLogic stores its indexes and data in independent partitions, or forests (collections of JSON or XML trees). Hosts attach forests and provide CPU and RAM. Applications scale up by adding forests, and scale out by adding hosts.
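The index-versus-scan distinction drawn above can be sketched in a few lines. This is a toy in-memory analogy, not actual HDFS or MarkLogic code: without an index, finding one record means touching every record, while an index trades extra storage and write-time work for direct lookup.

```python
# A toy "file" of records; HDFS stores data as opaque files like this.
records = [{"id": i, "value": i * i} for i in range(100_000)]

def scan_lookup(records, target_id):
    """HDFS-style access: no index, so read records until a match is found."""
    for rec in records:
        if rec["id"] == target_id:
            return rec
    return None

# Database-style access: build an index once, then look records up directly.
index = {rec["id"]: rec for rec in records}

# Both return the same record, but the scan may touch all 100,000 records
# while the indexed lookup touches one.
assert scan_lookup(records, 99_999) == index[99_999]
```

Full scans are exactly what large-scale analytics wants (every record gets read anyway); interactive queries want the indexed path, which is the gap MarkLogic fills on top of HDFS.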
Using a shared file system as storage provides shared-disk failover, centralized management, economies of scale, and advanced features like flash backup, de-duplication, and compression. These features come at a cost, both in dollars and in complexity, and a truly shared architecture with pools of storage, as in a SAN or NAS, can be difficult to scale. MarkLogic can leverage Hadoop as a shared file system, using it to hold data and indexes, journals, offline archives, backups, and storage for binaries.

MarkLogic can also leverage Hadoop as a storage tier, allowing you to optimize among cost, performance, and availability. For example, you can benefit from less expensive Hadoop storage for archive data, with high density for efficiency and shared-disk failover, while using another tier of more expensive storage for active data, with low density for ingest performance and replication for high availability. A tiered storage infrastructure with MarkLogic lets you fluidly and consistently switch between active, historical, and archive data without expensive ETL or dedicated infrastructure. You can perform mixed batch and real-time workloads with Hadoop MapReduce and the MarkLogic Enterprise NoSQL Database.

DISTRIBUTION STRATEGY AND INTEL* PARTNERSHIP: ENTERPRISE HADOOP MEETS ENTERPRISE NOSQL

MarkLogic believes that Hadoop is an emerging part of mainstream enterprise infrastructure and is investing heavily in this future. Intel is investing in making Hadoop easier to deploy and manage, making it faster, and making it more secure. Intel* Distribution for Apache Hadoop* (IDH) allows organizations to be successful with Hadoop without having to be project committers, letting organizations focus on their applications, not their infrastructure. MarkLogic announced a partnership earlier this year with Intel that will provide integrated support for applications built with MarkLogic and IDH.
This is in addition to the Hadoop distributions that MarkLogic already supports, such as those from Hortonworks* and Cloudera*.

SUMMARY

Hadoop is still early in terms of mainstream adoption, but MarkLogic is committed to supporting the technology as an emerging component of mainstream enterprise infrastructure that is changing the economics of Big Data, and we are investing heavily in this future today. MarkLogic is the best database for Hadoop: organizations can deploy MarkLogic into an existing Hadoop stack to benefit from:

- Real-time enterprise applications for Hadoop
- Less data movement and duplication over the data's life cycle
- Mixed workloads for best value and efficiency: index once, then serve real-time or batch
- Cost-effective long-term and long-tail storage
- Leverage of existing (or upcoming) infrastructure investments
- MarkLogic security, high availability, disaster recovery, transactional consistency, search, and query

MarkLogic and Hadoop are complementary technologies that work well together for today's Big Data challenges. By partnering with Intel, MarkLogic becomes your one-stop shop for a fully supported platform that ensures low risk for your Big Data deployments. If you are currently implementing, or plan to implement, Big Data solutions, use MarkLogic as the foundation of your stack to get the best of batch processing and real-time interactivity.

MARKLOGIC

For more information, please visit

MarkLogic is a registered trademark of MarkLogic Corporation in the United States and/or other countries. All other trademarks mentioned are the property of their respective owners.


More information

Big Data at Cloud Scale

Big Data at Cloud Scale Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

In-Memory Analytics for Big Data

In-Memory Analytics for Big Data In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012 Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress

More information

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D. Big Data Technology ดร.ช ชาต หฤไชยะศ กด Choochart Haruechaiyasak, Ph.D. Speech and Audio Technology Laboratory (SPT) National Electronics and Computer Technology Center (NECTEC) National Science and Technology

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Introduction For companies that want to quickly gain insights into or opportunities from big data - the dramatic volume growth in corporate

More information

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples

More information

How To Use Hp Vertica Ondemand

How To Use Hp Vertica Ondemand Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Getting Started & Successful with Big Data

Getting Started & Successful with Big Data Getting Started & Successful with Big Data @Pentaho #BigDataWebSeries 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Your Hosts Today Davy Nys VP EMEA & APAC Pentaho Paul

More information

Understanding How Sensage Compares/Contrasts with Hadoop

Understanding How Sensage Compares/Contrasts with Hadoop Frequently Asked Questions Understanding How Sensage Compares/Contrasts with Hadoop 1. How does Sensage s approach to managing large, distributed data systems compare/contrast with Hadoop in terms of storage,

More information

WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP

WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP CLOUDERA WHITE PAPER 2 Table of Contents Introduction 3 Hadoop's Role in the Big Data Challenge 3 Cloudera:

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Oracle Database 12c Plug In. Switch On. Get SMART.

Oracle Database 12c Plug In. Switch On. Get SMART. Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

You Have Your Data, Now What?

You Have Your Data, Now What? You Have Your Data, Now What? Kevin Shelly, GVP, Global Public Sector Data is a Resource SLIDE: 2 Time to Value SLIDE: 3 Big Data: Volume, VARIETY, and Velocity Simple Structured Complex Structured Textual/Unstructured

More information

Making Sense of Big Data in Insurance

Making Sense of Big Data in Insurance Making Sense of Big Data in Insurance Amir Halfon, CTO, Financial Services, MarkLogic Corporation BIG DATA?.. SLIDE: 2 The Evolution of Data Management For your application data! Application- and hardware-specific

More information

Microsoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com;

Microsoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com; Microsoft Big Data Solutions Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com; Why/What is Big Data and Why Microsoft? Options of storage and big data processing in Microsoft Azure. Real Impact of Big

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies

More information

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING Using Cloudera to Improve Data Processing CLOUDERA WHITE PAPER 2 Table of Contents What is Data Processing? 3 Challenges 4 Flexibility and Data Quality

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP Your business is swimming in data, and your business analysts want to use it to answer the questions of today and tomorrow. YOU LOOK TO

More information

Big Data and Hadoop for the Executive A Reference Guide

Big Data and Hadoop for the Executive A Reference Guide Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the

More information

Big Data must become a first class citizen in the enterprise

Big Data must become a first class citizen in the enterprise Big Data must become a first class citizen in the enterprise An Ovum white paper for Cloudera Publication Date: 14 January 2014 Author: Tony Baer SUMMARY Catalyst Ovum view Big Data analytics have caught

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

Next-Generation Cloud Analytics with Amazon Redshift

Next-Generation Cloud Analytics with Amazon Redshift Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

Big Data: Beyond the Hype

Big Data: Beyond the Hype Big Data: Beyond the Hype Why Big Data Matters to You WHITE PAPER By DataStax Corporation March 2012 Contents Introduction... 3 Big Data and You... 5 Big Data Is More Prevalent Than You Think... 5 Big

More information

Microsoft Azure Data Technologies: An Overview

Microsoft Azure Data Technologies: An Overview David Chappell Microsoft Azure Data Technologies: An Overview Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Blobs... 3 Running a DBMS in a Virtual Machine... 4 SQL Database...

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Big Data, Advanced Analytics:

More information

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org

More information

Actian SQL in Hadoop Buyer s Guide

Actian SQL in Hadoop Buyer s Guide Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop

More information