Oracle Big Data Handbook





Oracle Press

Oracle Big Data Handbook

Tom Plunkett, Brian Macdonald, Bruce Nelson, Helen Sun, Khader Mohiuddin, Debra L. Harding, David Segleau, Gokula Mishra, Mark F. Hornick, Robert Stackowiak, Keith Laker

McGraw-Hill Education
New York, Chicago, San Francisco, Athens, London, Madrid, Mexico City, Milan, New Delhi, Singapore, Sydney, Toronto

Contents

Acknowledgments
Introduction

PART I  Introduction

1  Introduction to Big Data
    Big Data
    Google's MapReduce Algorithm and Apache Hadoop
    Oracle's Big Data Platform
    Summary

2  The Value of Big Data
    Am I Big Data, or Is Big Data Me?
    Big Data, Little Data It's Still Me
    What Happened?
    Now What?
    Reality, Check Please!
    What Do You Make of It?
    Information Chain Reaction (ICR)
    Big Data, Big Numbers, Big Business?
    Twitter
    Facebook
    Internal Source
    ICR: Connect
    ICR: Change
    Wanted: Big Data Value
    Big Data Example 1: Clinical Trial Research Within the Healthcare Industry
    Example 2: Improvements in Car Design for Driver Safety Within the Automotive Industry
    Summary

PART II  Big Data Platform

3  The Apache Hadoop Platform
    Software vs. Hardware
    The Hadoop Software Platform
    Hadoop Distributions and Versions
    The Hadoop Distributed File System (HDFS)
    Scheduling, Compute, and Processing
    Operating System Choices
    I/O and the Linux Kernel
    The Hadoop Hardware Platform
    CPU and Memory
    Network
    Disk
    Putting It All Together

4  Why an Appliance?
    Why Would Oracle Create a Big Data Appliance?
    What Is an Appliance?
    What Are the Goals of Oracle Big Data Appliance?
    Optimizing an Appliance
    Oracle Big Data Appliance Version 2 Software
    Oracle Big Data Appliance X3-2 Hardware
    Where Did Oracle Get Hadoop Expertise?
    Configuring a Hadoop Cluster
    Choosing the Core Cluster Components
    Assembling the Cluster
    What About a Do-It-Yourself Cluster?
    Total Costs of a Cluster
    Time to Value
    How to Build Out Larger Clusters
    Can I Add Other Software to Oracle Big Data Appliance?
    Drawbacks of an Appliance

5  BDA Configurations, Deployment Architectures, and Monitoring
    Introduction
    Big Data Appliance X3-2 Full Rack (Eighteen Nodes)
    Big Data Appliance X3-2 Starter Rack (Six Nodes)
    Big Data Appliance X3-2 In-Rack Expansion (Six Nodes)
    Hardware Modifications to BDA
    Software Supported on Big Data Appliance X3-2
    BDA Install and Configuration Process
    Critical and Noncritical Nodes
    Automatic Failover of the NameNode
    BDA Disk Storage Layout
    Adding Storage to a Hadoop Cluster
    Hadoop-Only Config and Hadoop+NoSQL DB
    Hadoop-Only Appliance
    Hadoop and NoSQL DB
    Memory Options
    Deployment Architectures
    Multitenancy and Hadoop in the Cloud
    Scalability
    Multirack BDA Considerations
    Installing Other Software on the BDA
    BDA in the Data Center
    Administrative Network
    Client Access Network
    InfiniBand Private Network
    Network Requirements
    Connecting to Data Center LAN
    Example Connectivity Architecture
    Oracle Big Data Appliance Restrictions on Use
    BDA Management and Monitoring
    Enterprise Manager
    Cloudera Manager
    Hadoop Monitoring Utilities: Web GUI
    Oracle ILOM
    Hue
    DCLI Utility

6  Integrating the Data Warehouse and Analytics Infrastructure to Big Data
    The Data Warehouse as a Historic Database of Record
    The Oracle Database as a Data Warehouse
    Why the Data Warehouse and Hadoop Are Deployed Together
    Completing the Footprint: Business Analyst Tools
    Building Out the Infrastructure

7  BDA Connectors
    Oracle Big Data Connectors
    Oracle Loader for Hadoop
    Online Mode
    Oracle OCI Direct Path Output
    JDBC Output
    Offline Mode
    Oracle Data Pump Output
    Delimited Text Output
    Installation of Oracle Loader for Hadoop
    Invoking Oracle Loader for Hadoop
    Input Formats
    DelimitedTextInputFormat
    RegexInputFormat
    AvroInputFormat
    HiveToAvroInputFormat
    KVAvroInputFormat
    Custom Input Formats
    Oracle Loader for Hadoop Configuration Files
    Loader Maps
    Additional Optimizations
    Leveraging InfiniBand
    Comparison to Apache Sqoop
    Oracle SQL Connector for HDFS
    Installation of Oracle SQL Connector for HDFS
    HIVE Installation
    Creating External Tables Using Oracle SQL Connector for HDFS
    ExternalTable Configuration Tool
    Data Source Types
    Configuration Tool Syntax
    Required Properties
    Optional Properties
    ExternalTable Tool for Delimited Text Files
    Testing DDL with -noexecute
    Adding a New HDFS File to the Location File
    Manual External Table Configuration
    Hive Sources
    ExternalTable Example
    Oracle Data Pump Sources
    Configuration Files
    Querying with Oracle SQL Connector for HDFS
    Oracle R Connector for Hadoop
    Oracle Data Integrator Application Adapter for Hadoop

8  Oracle NoSQL Database
    What Is a NoSQL Database System?
    NoSQL Applications
    Oracle NoSQL Database
    A Sample Use Case
    Architecture
    Client Driver
    Key-Value Pairs
    Storage Nodes
    Replication
    Smart Topology
    Online Elasticity
    No Single Point of Failure
    Data Management
    APIs
    CRUD Operations
    Multiple Update Operations
    Lookup Operations
    Transactions
    Predictable Performance
    Integration
    Installation and Administration
    Simple Installation
    Administration
    How Oracle NoSQL Database Stacks Up
    Useful Links

PART III  Analyzing Information and Making Decisions

9  In-Database Analytics: Delivering Faster Time to Value
    Introduction
    Oracle's In-Database Analytics
    Why Running In-Database Is So Important
    Introduction to Oracle Data Mining and Statistical Analysis
    Oracle's In-Database Advanced Analytics
    Oracle Data Mining
    Introduction to R
    Text Mining
    In-Database Statistical Functions
    Making BI Tools Smarter
    Spatial Analytics
    Understanding the Spatial Data Model
    Querying the Spatial Data Model
    Using Spatial Analytics
    Making BI Tools Smarter
    Graph-Based Analytics
    Graph Data Model
    Querying Graph Data
    Multidimensional Analytics
    Making BI Tools Smarter and Faster
    In-Database Analytics: Bringing It All Together
    Integrating Analytics into Extract-Load-Transform Processing
    Delivering Guided Exploration
    Delivering Analytical Mash-ups
    Conclusion

10  Analyzing Data with R
    Introduction to Open Source R
    CRAN, Packages, and Task Views
    GUIs and IDEs
    Traditional R and Database Interaction vs. Oracle R Enterprise
    Oracle's Strategic R Offerings
    Oracle R Enterprise
    Oracle R Distribution
    ROracle
    Oracle R Connector for Hadoop
    Oracle R Enterprise: Next-Level View
    Oracle R Enterprise Installation and Configuration
    Using Oracle R Enterprise
    Transparency Layer
    Embedded R Execution
    Predictive Analytics
    Oracle R Connector for Hadoop
    Invoking MapReduce Jobs
    Testing ORCH R Scripts Without the Hadoop Cluster
    Interacting with HDFS from R
    HDFS Metadata Discovery
    Working with Hadoop Using the ORCH Framework
    ORCH Predictive Analytics on Hadoop
    ORCHhive
    Oracle R Connector for Hadoop and Oracle R Enterprise Interaction
    Summary

11  Endeca Information Discovery
    Why Did Oracle Select Endeca?
    Product Suites Overview
    Endeca Information Discovery Platform
    Major Functional Areas
    Key Features
    Endeca Information Discovery and Business Intelligence
    Difference in Roles and Functions
    BI Development Process vs. Information Discovery Approach
    Complementary But Not Exclusive
    Architecture
    Oracle Endeca Server
    Oracle Endeca Studio
    Oracle Endeca Integration Suite
    Endeca on Exalytics
    Scalability and Load Balancing
    Unifying Diverse Content Sets
    Endeca Differentiator
    Industry Use Cases
    Hands-On with Endeca
    Installation and Configuration
    Developing an Endeca Application

12  Big Data Governance
    Key Elements of Enterprise Data Governance
    Business Outcome
    Information Lifecycle Management
    Regulatory Compliance and Risk Management
    Metadata Management
    Data Quality Management
    Master and Reference Data Management
    Data Security and Privacy Management
    Business Process Alignment
    How Does Big Data Impact Enterprise Data Governance?
    Modeled Data vs. Raw Data
    Types of Big Data
    Applying Data Governance to Big Data
    Leveraging Big Data Governance
    Industry-Specific Use Cases
    Utilities
    Healthcare
    Financial Services
    Retail
    Consumer Packaged Goods (CPG)
    Telecommunications
    Oil and Gas
    How Does Big Data Impact Data Governance Roles?
    Governance Roles and Organization
    An Approach to Implementing Big Data Governance

13  Developing Architecture and Roadmap for Big Data
    Architecture Capabilities for Big Data
    New Characteristics of Big Data
    Conceptual Architecture Capabilities of Big Data
    Product Capabilities and Tools
    Making Big Data Architecture Decisions
    Architecture Development Process for Realizing Incremental Values
    Overview of Oracle Information Architecture Framework
    Overview of Applied OADP for Information Architecture
    Big Data Architecture Development Process
    Impact on Data Management and BI Processes
    Traditional BI Development Process
    Big Data and Analytics Development Process
    Big Data Governance
    Traditional Data Governance Focus
    New Focus for Governance in Big Data
    Developing Skills and Talent
    Data Scientist
    Big Data Developer
    Big Data Administrator
    Big Data Best Practices
    Align Big Data Initiative with Specific Business Goals
    Ensure a Centralized IT Strategy for Standards and Governance
    Use a Center of Excellence to Minimize Training and Risk
    Correlate Big Data with Structured Data
    Provide High-Performance and Scalable Analytical Sandboxes
    Reshape the IT Operating Model

Index