Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC
Transcription
1 Jun Liu, Senior Software Engineer; Bianny Bian, Engineering Manager; SSG/STO/PAC
2 Agenda: Quick Overview of Impala; Design Challenges of an Impala Deployment; Case Study: Use a Simulation-Based Approach to Design and Optimize an Impala Cluster; What's Inside: Intel CoFluent Technology for Big Data. System Technologies & Optimization (STO)
3 Impala Overview: Open-source MPP query execution engine. Built natively for Hadoop. Efficiently accesses data stored in Hadoop using SQL. Pipelined execution mode enables fast data processing.
4 Design Challenges of an Impala Cluster (H/W): Meet Performance Requirements; Plan for the Future; Do Not Over-Provision. Data sizes shown in the figure: 10TB, 5TB, 10GB, 50GB, 1TB.
5 Example: Cluster Sizing. Requirement: a deep data analytic query over historical data should respond within 10 seconds.
6 Example: Storage Choice of One Use Case. In general, SSD is faster than HDD, but there are exceptions. Figure label: ~0.0448%.
7 Example: CPU Frequency. No impact on the illustrated workload when running on the Text-formatted table. Scales well when running on the Parquet-formatted table.
8 Design Challenges of an Impala Cluster (S/W): Software Configuration Options: HDFS Cache..., HDFS Block Size, Parquet Row Group Size.
9 Example: HDFS Caching
10 Design Challenge Summary. We have talked about deployment challenges in terms of hardware selections and settings and software configuration choices. There is NO ONE-SIZE-FITS-ALL solution to the design challenges one faces when deploying a system for production. Current Approach. Efficient Way to Predict System Performance?
11 Simulation Approach: Adjust workload (WL) setting; Deploy on Experimental Cluster; Collect and Analyze System Log; Change H/W config; Change H/W knobs; Simulation Plan; Generate Simulation Report.
12 Impala Simulator Overview. Impala Query Execution Simulation: Query Planning Flow; Plan Nodes, Plan Fragments, Execution Nodes Generation; Task Scheduling and Distribution; Data Processing Flow (Pull & Push); Data Distribution (Data Skew and Partitioning); Disk IO Scheduling and Scan Operations; Execution nodes.
13 One Banking Use Case Study. Offline Customer Account Historical Data Analysis: Complex and Deep Analytic Queries; Low Latency Interactive Queries; Reporting Queries. Initially evaluated on Hive, now Impala.
14 Step 1: Deploy an Experimental Cluster. Deploy a 4-node cluster. Use a small-scale data set.
15 Step 2: Collect Simulation Input. Hardware Configurations: Node Count, Processor, Storage, Network, Memory. Software Configurations: HDFS, Impala, File Format. Table / Column Metadata: COMPUTE STATS, SHOW TABLE STATS, DESC FORMATTED, SHOW COLUMN STATS. Query Profile (PROFILE); Tuple Descriptors; Impala Daemon Log.
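The table and column metadata named on this slide come from standard Impala SQL statements. As a minimal sketch, the helper below only builds that statement list for one table. The function name `metadata_statements` and the table name `accounts` are hypothetical, and actually running the statements (for example through impala-shell) is left out.

```python
def metadata_statements(table: str) -> list[str]:
    """Build the Impala metadata statements named on the slide for one table.

    Nothing is executed here; the caller decides how to submit the statements.
    """
    return [
        f"COMPUTE STATS {table}",       # gather table and column statistics
        f"SHOW TABLE STATS {table}",    # per-table / per-partition row counts and sizes
        f"DESCRIBE FORMATTED {table}",  # location, file format, table properties
        f"SHOW COLUMN STATS {table}",   # per-column distinct counts, nulls, sizes
    ]


def print_statements(table: str) -> None:
    """Print the statements terminated with semicolons, ready to paste."""
    for stmt in metadata_statements(table):
        print(stmt + ";")


# "accounts" is a hypothetical table name used only for illustration.
print_statements("accounts")
```

The query profile is collected separately: in impala-shell, issuing `PROFILE` after a query prints the profile of the most recent query.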
16 Example: Configure Table Metadata
17 Step 3: Baseline Validation on Experimental Cluster
18 Not just query execution time: we also compare against the Impala log file to check the duration of each stage. Stages shown: Exchange Execution Node; HashJoin Build Phase; Aggregation; HashJoin Probe Phase; Hdfs Scan Operation. Log excerpts: disk-io-mgr.cc: disk id (1) reading for...; exchange.cc: #rows... instance_id =... Disk Worker 4; Disk Worker 0.
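The per-stage comparison described on this slide can be sketched as a relative-error check between durations taken from the Impala log and durations reported by the simulator. The stage names and every number below are hypothetical placeholders, not values from the presentation, and `stage_errors` is an illustrative helper rather than part of any Intel tool.

```python
def stage_errors(measured: dict[str, float], simulated: dict[str, float]) -> dict[str, float]:
    """Relative error of the simulated duration per stage (0.05 means 5%)."""
    return {
        stage: abs(simulated[stage] - measured[stage]) / measured[stage]
        for stage in measured
    }


def report(measured: dict[str, float], simulated: dict[str, float]) -> None:
    """Print one line per stage with its relative error."""
    for stage, err in stage_errors(measured, simulated).items():
        print(f"{stage}: {err:.1%}")


# Hypothetical durations in seconds, for illustration only.
measured = {"hdfs_scan": 12.0, "hash_join_build": 3.0, "hash_join_probe": 5.0, "aggregation": 2.0}
simulated = {"hdfs_scan": 11.4, "hash_join_build": 3.3, "hash_join_probe": 5.0, "aggregation": 2.1}
report(measured, simulated)
```

Checking each stage, rather than only the end-to-end time, guards against errors in different stages cancelling each other out.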
19 Step 4: From Experimental Cluster to Production Cluster. We have completed baseline verification on an experimental cluster. Next: performance prediction for the production cluster. Simulation assumptions: upper and lower data distribution boundaries; small scale of the data.
20 Step 5: Simulation Plan for Production Cluster.
Software Configuration Matrix: File Format (Text, Avro, Parquet); Compression (No Compression, Snappy, GZIP); Partition (No Partition, Partitioned); Cache (No Cache, Cached).
Hardware Configuration Matrix: CPU Freq (2.7GHz, 2.4GHz, ... GHz); Network (1GbE, 10GbE); Cluster Size (2, 4); Disk Type (HDD, SSD).
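A simulation plan built from these two matrices can be sketched as a cross product of the option lists. This is an assumption about how the plan is enumerated: the slide does not say whether every combination is simulated, and a real plan would likely prune combinations that make no sense. Only values legible on the slide are used; the third CPU frequency is not legible and is left out. The names `SOFTWARE`, `HARDWARE` and `enumerate_plan` are illustrative.

```python
from itertools import product

# Option lists as legible on the slide.
SOFTWARE = {
    "file_format": ["Text", "Avro", "Parquet"],
    "compression": ["No Compression", "Snappy", "GZIP"],
    "partition": ["No Partition", "Partitioned"],
    "cache": ["No Cache", "Cached"],
}
HARDWARE = {
    "cpu_freq_ghz": [2.7, 2.4],  # a third value on the slide is illegible and omitted
    "network": ["1GbE", "10GbE"],
    "cluster_size": [2, 4],
    "disk_type": ["HDD", "SSD"],
}


def enumerate_plan(*matrices: dict) -> list[dict]:
    """Return every combination of options across the given matrices."""
    merged = {}
    for matrix in matrices:
        merged.update(matrix)
    keys = list(merged)
    return [dict(zip(keys, combo)) for combo in product(*(merged[k] for k in keys))]


plan = enumerate_plan(SOFTWARE, HARDWARE)
print(len(plan))  # 36 software combinations * 16 hardware combinations = 576
print(plan[0])
```

The size of the full cross product is a reminder of why running every configuration on real hardware is impractical and why a simulator is attractive here.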
21 Software Performance Prediction
22 Software Performance Prediction: > 40GB data to cache
23 Cache Impact on Text Formatted Data. With Cache: HdfsScanNode finishes at around 6 sec. Without Cache: HdfsScanNode finishes at around 12 sec.
24 Cache Impact on Text Formatted Data. Block for a short period waiting for RowBatches. Execution nodes are busy processing RowBatches.
25 Cache Impact on Parquet Formatted Data. With Cache; Without Cache. Fast Scan, CPU Bound.
26 Cache Impact on Parquet Formatted Data. CPU bound: scan speed does not affect the overall performance of query execution.
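One way to read the two cache results is a toy pipeline model: if scanning and row-batch processing overlap, the slower of the two sets the elapsed time. This model is an assumption made for illustration and is not the simulator's logic. The 12 s and 6 s scan times come from the Text slide; every other number is hypothetical.

```python
def pipelined_time(scan_s: float, compute_s: float) -> float:
    """Elapsed time when scan and compute fully overlap: the slower side wins."""
    return max(scan_s, compute_s)


def cache_speedup(scan_uncached_s: float, scan_cached_s: float, compute_s: float) -> float:
    """Ratio of uncached to cached elapsed time under the toy model."""
    return pipelined_time(scan_uncached_s, compute_s) / pipelined_time(scan_cached_s, compute_s)


# Scan-bound case: scan times from the Text slide, compute time hypothetical.
print(cache_speedup(12.0, 6.0, 5.0))  # 2.0

# CPU-bound case: all numbers hypothetical; a faster scan changes nothing.
print(cache_speedup(2.0, 1.0, 8.0))   # 1.0
```

Under this model caching only pays off while the scan is the slower side, which matches the slide's observation that a CPU-bound Parquet query gains nothing from a faster scan.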
27 Software Configuration Recommendation.
Baseline: Text, No Compression, No Partition, No Cache.
Reporting Workload: Avro, Snappy, Partitioned, Cached (1.1%, 7.37%, 7.94%, 4.62%).
Deep Analytic Workload: Parquet, GZIP, Partitioned, Cached (14.45%, 2.74%, -9.22%, 0.49%).
Annotations: 10x Files to Scan; CPU Intensive.
28 Hardware Performance Prediction
29 Hardware Performance Prediction. Labels: Network Transfer Cost: MS; Baseline; Network Transfer Cost: MS. Cluster sizes: 2 Nodes, 4 Nodes, 6 Nodes, 8 Nodes, 16 Nodes, 20 Nodes.
30 Hardware Performance Prediction: Expected Response Time
31 Overall Recommendation: 1.8GHz; 256 MB; No Compression; Text; 80%; 6 HDD; 10GbE; 4 Nodes. Execution Time (Baseline): ~63.3 seconds. Execution Time (Recommended): ~12.4 seconds. Cluster Size: < 4x, 8 Nodes; < 10x, 16 Nodes; > 10x, 20 Nodes.
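The gain implied by the two execution times on this slide is simple arithmetic. The sketch below uses only the approximate 63.3 s and 12.4 s figures from the slide; the function names are illustrative.

```python
BASELINE_S = 63.3      # execution time of the baseline configuration (approximate, from the slide)
RECOMMENDED_S = 12.4   # execution time of the recommended configuration (approximate, from the slide)


def speedup(baseline_s: float, tuned_s: float) -> float:
    """How many times faster the tuned configuration is."""
    return baseline_s / tuned_s


def reduction(baseline_s: float, tuned_s: float) -> float:
    """Fraction of the baseline execution time that was removed."""
    return 1 - tuned_s / baseline_s


print(f"{speedup(BASELINE_S, RECOMMENDED_S):.2f}x")   # 5.10x
print(f"{reduction(BASELINE_S, RECOMMENDED_S):.1%}")  # 80.4%
```

Because both inputs are approximate, the results are best read as roughly a 5x speedup, or about an 80% shorter execution time.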
32 What's Inside
33 Intel CoFluent Technology for Big Data. FASTER CLUSTER DEPLOYMENT: Explore deployment options and meet performance goals. OPTIMIZE CLUSTERS: Find performance bottlenecks and optimize software operation. SCALE UP WITH CONFIDENCE: Simulate to determine the minimum cost to meet your future demand.
34 Intel CoFluent Studio Based Simulation: Enables fast "What if?" analysis with a virtual system.
35 Layered Simulation Architecture. S/W Stack: Spark, M/R, HBase, HDFS, Impala. System Topology; Role Assignment; Build a cluster. OS, JVM, H/W. Resource Monitoring and Performance Library. CPU, Memory, Storage, Ethernet. Dynamic S/W & H/W Mapping. Discrete Events Simulation Kernel on SystemC.
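The bottom layer named on this slide is a discrete-event simulation kernel. As a generic illustration of that technique only, and not of CoFluent or SystemC, the sketch below keeps a time-ordered event queue and advances a virtual clock from event to event. The class `EventQueue`, the block names and the delays are all hypothetical.

```python
import heapq
from itertools import count


class EventQueue:
    """Minimal discrete-event kernel: pop the earliest event, advance the clock, run it."""

    def __init__(self) -> None:
        self.now = 0.0
        self._heap = []
        self._seq = count()  # tie-breaker so same-time events run in insertion order

    def schedule(self, delay: float, action) -> None:
        """Queue `action` to run `delay` time units after the current time."""
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), action))

    def run(self) -> float:
        """Process events until the queue is empty; return the final virtual time."""
        while self._heap:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)
        return self.now


def scan_block(name: str, log: list):
    """Event: a block finishes scanning, then its processing completes 0.5 units later."""
    def action(sim: EventQueue) -> None:
        log.append((sim.now, f"{name} scanned"))
        sim.schedule(0.5, lambda s: log.append((s.now, f"{name} processed")))
    return action


def demo():
    """Run two hypothetical block scans and return (end_time, event_log)."""
    log = []
    sim = EventQueue()
    sim.schedule(1.0, scan_block("block-0", log))
    sim.schedule(2.0, scan_block("block-1", log))
    return sim.run(), log


end_time, events = demo()
for when, what in events:
    print(when, what)
print("finished at", end_time)
```

Because the clock jumps straight to the next event instead of ticking through idle time, a simulation of this style can model a long workload without spending wall-clock time on the gaps between events.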
36 Software Stack Coverage: YARN
37 Hardware Coverage. Validated: 50 Nodes; SSD & HDD; Pooled Compute; Pooled Memory; Pooled I/O; 1GbE & 10GbE; Rack Scale Architecture.
38 Simulation Accuracy: High simulation accuracy is achieved for Big Data applications running on different cluster sizes, hardware configurations, and software stacks.
39 Fast Simulation: Simulation vs. Real Time, in minutes. Abstract Modeling; Event Driven Simulation. [Figure: Hardware - 4 node Cluster (min) versus Simulation Speed - Lenovo T420 (min), by number of concurrent uploading requests; values shown: 71, 36.]
40 Host machine to run simulations
41 Call to Action. Visit cofluent.intel.com for more information. Request white papers. Various customer success stories and use cases available: Optimize a 50-node Hive/MR Cluster; Predict the scalability of a large HBase Cluster; Software parameter tuning for Spark Applications. Demo in the showcase at the Intel booth.
42 cofluent.intel.com
43 Legal Notices and Disclaimers. Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Statements in this document that refer to Intel's plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel's results and plans is included in Intel's SEC filings, including the annual report on Form 10-K. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, CoFluent, Xeon, and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Intel Corporation.
44 Risk Factors. The above statements and any others in this document that refer to plans and expectations for the second quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as "anticipates," "expects," "intends," "plans," "believes," "seeks," "estimates," "may," "will," "should" and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel's actual results, and variances from Intel's current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be important factors that could cause actual results to differ materially from the company's expectations. Demand for Intel's products is highly variable and could differ from expectations due to factors including changes in business and economic conditions; consumer confidence or income levels; the introduction, availability and market acceptance of Intel's products, products used together with Intel products and competitors' products; competitive and pricing pressures, including actions taken by competitors; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers.
Intel's gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; and product manufacturing quality/yields. Variations in gross margin may also be caused by the timing of Intel product introductions and related expenses, including marketing expenses, and Intel's ability to respond quickly to technological developments and to introduce new products or incorporate new features into existing products, which may result in restructuring and asset impairment charges. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Results may also be affected by the formal or informal imposition by countries of new or revised export and/or import and doing-business regulations, which could be changed without prior notice. Intel operates in highly competitive industries and its operations have high costs that are either fixed or difficult to reduce in the short term. The amount, timing and execution of Intel's stock repurchase program could be affected by changes in Intel's priorities for the use of cash, such as operational spending, capital spending, acquisitions, and as a result of changes to Intel's cash flows or changes in tax laws. Product defects or errata (deviations from published specifications) may adversely impact our expenses, revenues and reputation. 
Intel's results could be affected by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel's ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. Intel's results may be affected by the timing of closing of acquisitions, divestitures and other significant transactions. A detailed discussion of these and other factors that could affect Intel's results is included in Intel's SEC filings, including the company's most recent reports on Form 10-Q, Form 10-K and earnings release. Rev. 4/14/15
More informationSAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011
SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,
More informationHADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com
More informationEloquence Training What s new in Eloquence B.08.00
Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium
More informationBest Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays
Best Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays Database Solutions Engineering By Murali Krishnan.K Dell Product Group October 2009
More informationSolution Brief: Microsoft SQL Server 2014 Data Warehouse Fast Track on System x3550 M5 with Micron M500DC Enterprise Value SATA SSDs
Vinay Kulkarni Solution Brief: Microsoft SQL Server 2014 Data Warehouse Fast Track on System x3550 M5 with Micron M500DC Enterprise Value SATA SSDs Solution Reference Number: BDASQLRMS51 The rapid growth
More informationAccelerating Business Intelligence with Large-Scale System Memory
Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness
More informationAccomplish Optimal I/O Performance on SAS 9.3 with
Accomplish Optimal I/O Performance on SAS 9.3 with Intel Cache Acceleration Software and Intel DC S3700 Solid State Drive ABSTRACT Ying-ping (Marie) Zhang, Jeff Curry, Frank Roxas, Benjamin Donie Intel
More informationSuccessfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples
More informationNFV Reference Platform in Telefónica: Bringing Lab Experience to Real Deployments
Solution Brief Telefonica NFV Reference Platform Intel Xeon Processors NFV Reference Platform in Telefónica: Bringing Lab Experience to Real Deployments Summary This paper reviews Telefónica s vision and
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationArchitectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
More informationAmerica s Most Wanted a metric to detect persistently faulty machines in Hadoop
America s Most Wanted a metric to detect persistently faulty machines in Hadoop Dhruba Borthakur and Andrew Ryan dhruba,andrewr1@facebook.com Presented at IFIP Workshop on Failure Diagnosis, Chicago June
More informationIntel Cloud Builder Guide to Cloud Design and Deployment on Intel Platforms
Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Platforms Ubuntu* Enterprise Cloud Executive Summary Intel Cloud Builder Guide Intel Xeon Processor Ubuntu* Enteprise Cloud Canonical*
More informationParquet. Columnar storage for the people
Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li nong@cloudera.com Software engineer, Cloudera Impala Outline Context from various
More informationIntel True Scale Fabric Architecture. Enhanced HPC Architecture and Performance
Intel True Scale Fabric Architecture Enhanced HPC Architecture and Performance 1. Revision: Version 1 Date: November 2012 Table of Contents Introduction... 3 Key Findings... 3 Intel True Scale Fabric Infiniband
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationOracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya
Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now
More informationImpala: A Modern, Open-Source SQL
Impala: A Modern, Open-Source SQL Engine Headline for Goes Hadoop Here Marcel Speaker Kornacker Name Subhead marcel@cloudera.com Goes Here CIDR 2015 Cloudera Impala Agenda Overview Architecture and Implementation
More informationPSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation
PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation 1. Overview of NEC PCIe SSD Appliance for Microsoft SQL Server Page 2 NEC Corporation
More informationCOLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service
COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationIntel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms
Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms Enomaly Elastic Computing Platform, * Service Provider Edition Executive Summary Intel Cloud Builder Guide
More informationExtended Attributes and Transparent Encryption in Apache Hadoop
Extended Attributes and Transparent Encryption in Apache Hadoop Uma Maheswara Rao G Yi Liu ( 刘 轶 ) Who we are? Uma Maheswara Rao G - umamahesh@apache.org - Software Engineer at Intel - PMC/committer, Apache
More informationJames Serra Sr BI Architect JamesSerra3@gmail.com http://jamesserra.com/
James Serra Sr BI Architect JamesSerra3@gmail.com http://jamesserra.com/ Our Focus: Microsoft Pure-Play Data Warehousing & Business Intelligence Partner Our Customers: Our Reputation: "B.I. Voyage came
More informationMicrosoft SQL Server: MS-10980 Performance Tuning and Optimization Digital
coursemonster.com/us Microsoft SQL Server: MS-10980 Performance Tuning and Optimization Digital View training dates» Overview This course is designed to give the right amount of Internals knowledge and
More informationBig Data for Big Science. Bernard Doering Business Development, EMEA Big Data Software
Big Data for Big Science Bernard Doering Business Development, EMEA Big Data Software Internet of Things 40 Zettabytes of data will be generated WW in 2020 1 SMART CLIENTS INTELLIGENT CLOUD Richer user
More informationMicrosoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010
Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,
More informationThe Flash-Transformed Financial Data Center. Jean S. Bozman Enterprise Solutions Manager, Enterprise Storage Solutions Corporation August 6, 2014
The Flash-Transformed Financial Data Center Jean S. Bozman Enterprise Solutions Manager, Enterprise Storage Solutions Corporation August 6, 2014 Forward-Looking Statements During our meeting today we will
More informationMOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012
MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course Overview This course provides students with the knowledge and skills to design business intelligence solutions
More informationApplication of Predictive Analytics for Better Alignment of Business and IT
Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD bzibitsker@beznext.com July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker
More informationIntel RAID SSD Cache Controller RCS25ZB040
SOLUTION Brief Intel RAID SSD Cache Controller RCS25ZB040 When Faster Matters Cost-Effective Intelligent RAID with Embedded High Performance Flash Intel RAID SSD Cache Controller RCS25ZB040 When Faster
More informationSafe Harbor Statement
Safe Harbor Statement "Safe Harbor" Statement: Statements in this presentation relating to Oracle's future plans, expectations, beliefs, intentions and prospects are "forward-looking statements" and are
More informationNear Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services
More informationIntelligent Business Operations
White Paper Intel Xeon Processor E5 Family Data Center Efficiency Financial Services Intelligent Business Operations Best Practices in Cash Supply Chain Management Executive Summary The purpose of any
More informationInge Os Sales Consulting Manager Oracle Norway
Inge Os Sales Consulting Manager Oracle Norway Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database Machine Oracle & Sun Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database
More information