Oracle R Hands-On: The Topics
Transcription
1 R Hands-On: The Topics
09:30 Welcome
09:45 Introduction to R Hands-On
10:15 Mini-course in the R language: language features, help facilities, GUIs for writing scripts; creating appealing graphics quickly and easily
11:00 Break
11:15 Showcase Part 1: Data mining with R; comparing the accuracy of two models
11:35 R Enterprise: simple usage, performance; Showcase Part 2: Data mining with R in the database
12:20 Lunch break
13:00 Big Data & R: Hadoop, MapReduce & R; R as a prototyping instrument for Big Data
13:30 Closing questions
2 Why and How Big Data, Now?
A new way data comes into being? Storage and handling of:
- Incidentally generated data
- Machine-generated mass data
- Communication data
- Geo data
- Text data
- Low-density data
The payoff: new business ideas, better insights, optimized processes.
Key questions: Which data are interesting? How should they be stored? Which analysis methods are suitable? What costs arise?
3 Data Warehouse and Big Data (architecture diagram)
External data: log files, web clicks, mails, call center, contracts, quotes, reports, web services, purchase data.
Internal data (classic BI): customers, suppliers, products, employees, warehouse, sales, accounting.
Integration (enterprise information): harmonization, master data checks, reference data, revenues/facts.
Storage: relational Oracle Database 12c (DWH), Hadoop Loader, HDFS, NoSQL DB, Hadoop.
User view: key figures, sandbox, event processing, SQL, real-time decisions, interactive dashboards, reporting & publishing, guided search & experiences, MapReduce framework, predictive analytics & mining.
4 In-Database Analytics: the Big Data Platform (diagram)
Big Data Appliance: optimized for Hadoop, R, and NoSQL processing (Hadoop, open-source R, NoSQL Database, Big Data Connectors).
Exadata: system of record, optimized for DW/OLTP (Database, Data Warehouse, Advanced Analytics, Data Integrator, Big Data Connectors).
Exalytics: optimized for analytics and in-memory workloads, embeds TimesTen (Business Intelligence Tools and Applications, Enterprise Performance Management, Endeca Information Discovery).
Event processing and applications sit across the platform.
5 R Enterprise: Predictive Analytics (architecture diagram)
Client side: the user's R engine, with other R packages plus the R Enterprise packages, issues SQL against the database server machine and receives results back in R.
Database side: user tables plus one or more R engines managed by the database, again with other R packages and the R Enterprise packages.
In-database techniques include linear models, clustering, segmentation, and neural networks.
Scale-out side: MapReduce nodes and HDFS nodes of a Hadoop cluster (BDA).
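To make the client-to-database flow concrete, here is a minimal sketch of fitting one of the listed techniques (a linear model) with Oracle R Enterprise from the client R engine. The connection values are placeholders, ONTIME_S stands in for a table in the user's schema, and ore.connect()/ore.lm() are used as they appear in ORE documentation; treat the exact signatures as assumptions.

# Minimal ORE sketch; placeholder credentials, assumes an ONTIME_S table.
library(ORE)
ore.connect(user = "rquser", sid = "orcl", host = "dbserver",
            password = "secret", all = TRUE)
ore.ls()                                   # tables appear as ore.frame proxies
# The model is fitted inside the database; only results return to the client.
mod <- ore.lm(ARRDELAY ~ DISTANCE + DEPDELAY, data = ONTIME_S)
summary(mod)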
6 Types of analysis using Hadoop
- Text mining
- Index building
- Graph creation and analysis
- Pattern recognition
- Collaborative filtering
- Prediction models
- Sentiment analysis
- Risk assessment
7 MapReduce
Provides parallelization and distribution with fault tolerance; MapReduce programs provide access to data on Hadoop.
"Map" phase: a map task typically operates on one HDFS block of data; map tasks process the smaller sub-problems, store their results in HDFS, and report success to the JobTracker.
"Reduce" phase: a reduce task receives sorted subsets of the map task results; one or more reducers compute the answers that form the final answer, which is stored in HDFS.
Computational processing can occur on unstructured or structured data, and the framework abstracts all housekeeping away from the developer.
8 MapReduce Example, Graphically Speaking (diagram)
(key, values) pairs flow from HDFS DataNodes into map tasks; each map emits (key A, values), (key B, values), (key C, values); shuffle and sort aggregates intermediate values by output key into (key, intermediate values); one reduce task per key produces the final key A, key B, and key C values.
9 Text analysis example: count the number of times each word occurs in a corpus of documents
The documents are divided into blocks in HDFS, with one mapper per block. Each mapper outputs a word with count 1 every time the word is encountered, e.g. the key-value pairs:
  The 1, Big 1, Data 1, word 1, count 1, example 1
Shuffle and sort routes all pairs for one key to the same reducer: one reducer receives only the key-value pairs for the word "Big" (Big 1 ... Big 1), sums up the counts, and outputs the final key-value result for "Big".
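Before looking at the ORCH version two slides below, the same flow can be simulated in a few lines of plain local R; this is an illustration only (the corpus is made up), with no Hadoop involved:

# Local simulation of the word-count MapReduce flow.
corpus <- c("The Big Data word count example", "Big Data is Big")

# Map phase: emit a (word, 1) pair for every word in every block (here: line).
pairs <- do.call(rbind, lapply(corpus, function(line) {
    words <- strsplit(line, " ")[[1]]
    data.frame(key = words, value = 1)
}))

# Shuffle and sort: group the intermediate values by output key.
groups <- split(pairs$value, pairs$key)

# Reduce phase: each "reducer" sums the counts for its key.
counts <- sapply(groups, sum)
counts["Big"]    # the reducer for "Big" sums up its 1s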
10 MapReduce Example, Graphically Speaking (diagram, annotated for word count)
For word count there is no key, only a value, as input to the mapper. The mapper output is a set of key-value pairs where the key is a word and the value is count=1. Shuffle and sort aggregates intermediate values by output key, so each reducer receives, for one word, the key (the word) and a set of counts, and outputs the word as key with the sum as value.
11 Mapper and reducer code in ORCH for Word Count

corpus <- scan("corpus.dat", what=" ", quiet=TRUE, sep="\n")
corpus <- gsub("[[:punct:]]", " ", corpus)   # reconstructed: this line was garbled in the transcription; presumably strips punctuation
input <- hdfs.put(corpus)                    # load the R data into HDFS
res <- hadoop.exec(dfs.id = input,           # specify and invoke the MapReduce job
    mapper = function(k, v) {
        # split words and output each word with count 1
        x <- strsplit(v[[1]], " ")[[1]]
        x <- x[x != '']
        out <- NULL
        for (i in 1:length(x))
            out <- c(out, orch.keyval(x[i], 1))
        out
    },
    reducer = function(k, vv) {
        # sum the count of each word
        orch.keyval(k, sum(unlist(vv)))
    },
    config = new("mapred.config",
        job.name      = "wordcount",
        map.output    = data.frame(key='', val=0),
        reduce.output = data.frame(key='', val=0)
    )
)
res
hdfs.get(res)
12 R Connector for Hadoop (architecture diagram)
An R client with ORD and CRAN packages talks to the Big Data Appliance: an R script becomes a Hadoop job with R mapper and reducer; the MapReduce nodes and HDFS nodes of the Hadoop cluster (BDA) run R with ORD and CRAN packages, and Sqoop connects the cluster to the database.
- Provides transparent access to the Hadoop cluster: MapReduce and HDFS-resident data
- Access and manipulate data in HDFS, database, and file system, all from R
- Write MapReduce functions in R and execute them through a natural R interface
- Leverage CRAN R packages to work on HDFS-resident data
- Transition work from lab to production deployment on a Hadoop cluster without requiring knowledge of Hadoop internals, the Hadoop CLI, or IT infrastructure
13 Exploring Available Data: HDFS, database, file system

HDFS:
hdfs.pwd()
hdfs.ls()
hdfs.mkdir("xq")
hdfs.cd("xq")
hdfs.ls()
hdfs.size("ontime_s")
hdfs.parts("ontime_s")
hdfs.sample("ontime_s", lines=3)

Database:
ore.ls()
names(ontime_s)
head(ontime_s, 3)

File system:
getwd()
dir()   # or list.files()
dir.create("/home/oracle/orch")
setwd("/home/oracle/orch")
dat <- read.csv("ontime_s.dat")
head(dat)
14 Load data into HDFS

Data from a file: use hdfs.upload (key is the first column, YEAR)
hdfs.rm('ontime_file')
ontime.dfs_file <- hdfs.upload('ontime_s2000.dat', dfs.name='ontime_file')
hdfs.exists('ontime_file')

Data from a database table: use hdfs.push (key column: DEST)
hdfs.rm('ontime_db')
ontime.dfs_d <- hdfs.push(ontime_s2000, key='dest', dfs.name='ontime_db')
hdfs.exists('ontime_db')

Data from an R data.frame: use hdfs.put (key column: DEST)
hdfs.rm('ontime_r')
ontime <- ore.pull(ontime_s2000)
ontime.dfs_r <- hdfs.put(ontime, key='dest', dfs.name='ontime_r')
hdfs.exists('ontime_r')
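A loaded object can be spot-checked from R with the helpers shown earlier; a short sketch (assuming the ontime_r object from above, and assuming hdfs.get() reads back a loaded object the same way it reads job results):

hdfs.exists('ontime_r')                  # TRUE if the load succeeded
hdfs.sample('ontime_r', lines=3)         # peek at a few lines in HDFS
ontime.check <- hdfs.get(ontime.dfs_r)   # pull the data back as an R data.frame
head(ontime.check)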
15 hadoop.exec() concepts
1. Mapper
   - Receives a set of rows from the HDFS file as (key, value) pairs
   - The key has the same data type as that of the input
   - The value can be of type list or data.frame
   - The mapper outputs (key, value) pairs using orch.keyval()
   - The value can be ANY R object packed using orch.pack()
2. Reducer
   - Receives (packed) input of the type generated by a mapper
   - Outputs (key, value) pairs using orch.keyval()
   - The value can be ANY R object packed using orch.pack()
3. Variables from the R environment can be exported to the mappers and reducers in the Hadoop environment using orch.export() (optional)
4. Job configuration (optional)
A skeleton tying these pieces together is sketched below.
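This is a sketch only: 'minrows' and the counting logic are invented for illustration, and the export parameter and orch.unpack() are assumed to complement the ORCH calls shown in the word-count example.

# Sketch: hadoop.exec() skeleton using orch.keyval/orch.pack/orch.export.
minrows <- 2                                # local variable shipped to the cluster
input <- hdfs.attach('ontime_r')
res <- hadoop.exec(dfs.id = input,
    mapper = function(k, v) {
        # pack the whole value (any R object) and emit it under its key
        orch.keyval(k, orch.pack(v))
    },
    reducer = function(k, vv) {
        # unpack all values for this key; emit a count if above the threshold
        vals <- lapply(vv, orch.unpack)
        if (length(vals) >= minrows)
            orch.keyval(k, length(vals))
    },
    export = orch.export(minrows)           # make 'minrows' visible in the reducer
)
hdfs.get(res)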
16 ORCH Dry Run
Enables R users to test code locally on a laptop before submitting the job to the Hadoop cluster, and supports testing/debugging of scripts: orch.dryrun(TRUE).
- A Hadoop cluster is not required for a dry run
- Sequential execution of the mapper and reducer code
- Creates row streams from the HDFS input into mapper and reducer
- Constrained by the memory available to R; it is recommended to subset/sample the input data to fit in memory
- Upon job success, the resulting data is put in HDFS
- No change to the R code is required for a dry run
17 Example: Test script in dry-run mode
Take the average arrival delay for all flights to SFO.

orch.dryrun(TRUE)
dfs <- hdfs.attach('ontime_r')
res <- NULL
res <- hadoop.run(dfs,
    mapper = function(key, ontime) {
        if (key == 'SFO') {
            keyval(key, ontime)
        }
    },
    reducer = function(key, vals) {
        sumad <- 0
        count <- 0
        for (x in vals) {
            if (!is.na(x$arrdelay)) { sumad <- sumad + x$arrdelay; count <- count + 1 }
        }
        res <- sumad / count
        keyval(key, res)
    }
)
res
hdfs.get(res)
18 Example: Test script on the Hadoop cluster, with one change
Take the average arrival delay for all flights to SFO; only the dry-run switch differs.

orch.dryrun(FALSE)
dfs <- hdfs.attach('ontime_r')
res <- NULL
res <- hadoop.run(dfs,
    mapper = function(key, ontime) {
        if (key == 'SFO') {
            keyval(key, ontime)
        }
    },
    reducer = function(key, vals) {
        sumad <- 0
        count <- 0
        for (x in vals) {
            if (!is.na(x$arrdelay)) { sumad <- sumad + x$arrdelay; count <- count + 1 }
        }
        res <- sumad / count
        keyval(key, res)
    }
)
res
hdfs.get(res)
19 Executing a Hadoop Job in Dry-Run Mode (diagram)
The Linux client retrieves data from HDFS and executes the script locally in the laptop's R engine. Components: R Connector for Hadoop client and an R distribution on the client; Hadoop cluster software, an R distribution, and the R Connector for Hadoop driver package on the BDA.
20 Executing a Hadoop Job on the Hadoop Cluster (diagram)
The Linux client submits the MapReduce job to the Hadoop cluster; mappers and reducers execute in R instances on the BDA task nodes. Components: R Connector for Hadoop client and an R distribution on the client; Hadoop cluster software, an R distribution, and the R Connector for Hadoop driver package on the BDA.
21 DATA WAREHOUSE