Massive Predictive Modeling using Oracle R Technologies Mark Hornick, Director, Oracle Advanced Analytics
2 Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
3 Agenda
- Massive Predictive Modeling
- Use cases
- Enabling technologies
4 Quick Survey: How many models have you built in your lifetime? > 10, > 100, > 1000, > …
5 [Figure: Massive Predictive Modeling, plotting Data Size (rows, from millions to billions) against # Models (from 1 to 100s), ranging from generalized to specialized models.]
6 [Figure: Data Size (rows, from millions to billions) against # Models (from 1 to 1000s) and # Models per Entity (from 1 to 100s), ranging from targeted modeling to broad coverage.]
7 Massive Predictive Modeling: Goals
- Build one or more models per entity, e.g., per customer
- Understand and/or predict entity behavior
- Aggregate results across entities, e.g., to assess future demand
[Figure: one model per customer; the customers' predictions are summed (Σ over cust = 1 … n) to give demand over time.]
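The aggregation goal can be written as a sum of per-entity forecasts. The following is a sketch in notation introduced here, not taken from the slides:
- $f_c$ is the model built for customer $c$.
- $\mathbf{x}_c(t)$ is that customer's input for period $t$.
- $n$ is the number of customers.

```latex
\hat{D}(t) = \sum_{c=1}^{n} \hat{y}_c(t), \qquad \hat{y}_c(t) = f_c\bigl(\mathbf{x}_c(t)\bigr)
```

Each $f_c$ can be fit and scored independently of the others. Only the final sum needs results from every entity.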
8 Massive Predictive Modeling: Challenges
- Effectively dealing with Big Data: hardware, software, network, storage
- Algorithms that scale and perform with Big Data
- Building many models in parallel
- Production deployment
- Storing and managing models
- Backup, recovery, and security
9 Use Cases
10 Predicting Customer Electricity Usage
11 Motivation: Energy Theft
- Detecting patterns of meter tampering
- Storing information about which meters have been tampered with
- Analysis and decision making
- Forecasting future behavior
An SA country loses US$4 billion per year due to energy theft.
12 Motivation: Different customers, different demands
- Create a demand and consumption curve for each customer
- Analysis: in which periods will the company have to deliver more energy?
- Price electricity for a given period
- Store information about each customer's consumption in different periods of the day
- Each customer has different demand and consumption patterns
- Customers decide when to use energy to reduce cost
- The company redirects energy to where it is most needed at the moment, saving on generation
13 Sensor Data Analysis
Model each customer's usage to understand behavior and to predict both individual usage and overall aggregate demand.
- Consider 200K customers, each with a utility smart meter taking 1 reading / meter / hour
- 200K x 8,760 hours / year = 1.752B readings per year
- 3 years' worth of data = 5.256B readings, or 26,280 readings per customer
- At 10 seconds to build each model: about 556 hours (23.2 days) serially, or 4.3 hours with 128 DOP
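The sizing arithmetic on this slide can be reproduced in a few lines of R. This also makes it easy to rerun the estimate with other fleet sizes or build times. The sketch uses only the slide's figures; the variable names are illustrative.

```r
customers      <- 200e3  # smart meters, one per customer
hours_per_year <- 8760   # 1 reading / meter / hour
years          <- 3
secs_per_model <- 10     # build time per customer model, as assumed on the slide
dop            <- 128    # degree of parallelism

readings_per_year     <- customers * hours_per_year   # 1.752 billion
readings_total        <- readings_per_year * years    # 5.256 billion
readings_per_customer <- hours_per_year * years       # 26,280

serial_secs    <- customers * secs_per_model          # 2 million seconds
serial_hours   <- serial_secs / 3600                  # about 555.6 hours
serial_days    <- serial_hours / 24                   # about 23.15 days
parallel_hours <- serial_secs / dop / 3600            # about 4.34 hours

cat(sprintf("readings/year: %.4g, total: %.4g, per customer: %.0f\n",
            readings_per_year, readings_total, readings_per_customer))
cat(sprintf("serial: %.1f h (%.2f days); with DOP %d: %.2f h\n",
            serial_hours, serial_days, dop, parallel_hours))
```

The parallel estimate assumes perfectly even work across the 128 parallel units. In practice, skewed partitions and scheduling overhead will add to it.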
14 Database-centric architecture: smart meter scenario (model build)
[Diagram: within Oracle Database, the data is partitioned by customer (c1, c2, …, ci, …, cn). An R script, "build model", stored in the R Script Repository, is invoked once per partition as f(dat, args, ...). The resulting models (Model c1 … Model cn) are stored in the R Datastore.]
15 Database-centric architecture: smart meter scenario (scoring)
[Diagram: within Oracle Database, an R script, "score data", from the R Script Repository is invoked once per customer partition (c1 … cn) as f(dat, args, ...). Each call uses that customer's model from the R Datastore and produces per-customer scores (scores c1 … scores cn).]
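The build and score flows in the two architecture diagrams can be emulated locally in base R, which shows the shape of the pattern without a database. This is a minimal sketch under the following assumptions:
- The data is simulated, with hypothetical columns `CUST_ID`, `HOUR` and `CONSUMPTION`.
- A named list stands in for the R Datastore.
- `split()` plays the role of partitioning on the customer ID.

```r
set.seed(42)
n_cust  <- 5
n_hours <- 48
# Hypothetical hourly usage data for a handful of customers.
usage <- data.frame(
  CUST_ID = rep(seq_len(n_cust), each = n_hours),
  HOUR    = rep(seq_len(n_hours) %% 24, times = n_cust)
)
usage$CONSUMPTION <- 1 + 0.1 * usage$CUST_ID + 0.05 * usage$HOUR +
  rnorm(nrow(usage), sd = 0.1)

# Partition on CUST_ID: one data.frame per customer, named by the ID.
parts <- split(usage, usage$CUST_ID)

# Build: call f(dat) once per partition; the named list acts as the datastore.
datastore <- lapply(parts, function(dat) lm(CONSUMPTION ~ HOUR, data = dat))

# Score: look up each partition's model by key and predict its rows.
scores <- do.call(rbind, lapply(parts, function(dat) {
  key <- as.character(dat$CUST_ID[1])
  data.frame(CUST_ID = dat$CUST_ID, HOUR = dat$HOUR,
             PRED = unname(predict(datastore[[key]], newdata = dat)))
}))

# Aggregate across customers: predicted total demand per hour of day.
demand <- tapply(scores$PRED, scores$HOUR, sum)
```

Because every partition is processed independently, the two `lapply()` calls are where a parallel or in-database executor would take over.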
16 How many lines of code do you think it should take to implement this?
17 Build models and store in database, partition on CUST_ID (14 lines)
    ore.groupApply(CUST_USAGE_DATA,
      CUST_USAGE_DATA$CUST_ID,                      # one partition per customer
      function(dat, ds.name) {
        cust_id <- dat$CUST_ID[1]
        mod <- lm(CONSUMPTION ~ . - CUST_ID, dat)   # per-customer linear model
        mod$effects <- mod$residuals <- mod$fitted.values <- NULL  # shrink before saving
        name <- paste("mod", cust_id, sep="")
        assign(name, mod)
        ds.name1 <- paste(ds.name, ".", cust_id, sep="")
        ore.save(list=name, name=ds.name1, overwrite=TRUE)  # one datastore per customer
        TRUE
      },
      ds.name="mydatastore", ore.connect=TRUE, parallel=TRUE
    )
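The build function drops `effects`, `residuals` and `fitted.values` before saving each model. This shrinks every stored object while keeping what `predict()` needs to score new data. The block below is a small base-R check of that idea on simulated data; the data and variable names are illustrative, not from the slides.

```r
set.seed(1)
dat <- data.frame(x1 = rnorm(10000), x2 = rnorm(10000))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(10000)

full <- lm(y ~ ., data = dat)
slim <- full
# Same stripping as in the build function: per-row vectors not needed for scoring.
slim$effects <- slim$residuals <- slim$fitted.values <- NULL

newdat <- data.frame(x1 = c(0, 1), x2 = c(1, 0))
size_full <- as.numeric(object.size(full))
size_slim <- as.numeric(object.size(slim))
same_pred <- isTRUE(all.equal(predict(full, newdat), predict(slim, newdat)))
cat(sprintf("full: %.0f bytes, slim: %.0f bytes, same predictions: %s\n",
            size_full, size_slim, same_pred))
```

Plain predictions still work after stripping. Standard errors and intervals (`se.fit = TRUE`, `interval = ...`) would not, because they rely on the residuals.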
18 Score customers in database, partition on CUST_ID (16 lines)
    ore.groupApply(CUST_USAGE_DATA_NEW,
      CUST_USAGE_DATA_NEW$CUST_ID,
      function(dat, ds.name) {
        cust_id <- dat$CUST_ID[1]
        ds.name1 <- paste(ds.name, ".", cust_id, sep="")
        ore.load(ds.name1)                          # restores the object "mod<cust_id>"
        name <- paste("mod", cust_id, sep="")
        mod <- get(name)
        prd <- predict(mod, newdata=dat)
        prd[as.integer(rownames(prd))] <- prd
        res <- cbind(CUST_ID=cust_id, PRED=prd)
        data.frame(res)
      },
      ds.name="mydatastore", ore.connect=TRUE, parallel=TRUE,
      FUN.VALUE=data.frame(CUST_ID=numeric(0), PRED=numeric(0))  # schema of the result
    )
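The slides store one object `mod<cust_id>` under the key `<datastore>.<cust_id>`. Outside the database, the same naming scheme can be mimicked with base R's `save()` and `load()`, writing one `.RData` file per customer. This is only a local analogue of `ore.save()`/`ore.load()`:
- A temporary directory serves as the store.
- The helpers `save_model` and `score_customer` are hypothetical.
- The columns `CUST_ID`, `HOUR` and `CONSUMPTION` are illustrative.

```r
store_dir <- file.path(tempdir(), "mydatastore")
dir.create(store_dir, showWarnings = FALSE)

# Hypothetical helper: save one customer's model as the object "mod<cust_id>".
save_model <- function(mod, cust_id, ds.name = "mydatastore") {
  name <- paste0("mod", cust_id)
  assign(name, mod)
  save(list = name, file = file.path(store_dir, paste0(ds.name, ".", cust_id, ".RData")))
  invisible(TRUE)
}

# Hypothetical helper: load the model back by name and score new rows.
score_customer <- function(dat, ds.name = "mydatastore") {
  cust_id <- dat$CUST_ID[1]
  env <- new.env()
  load(file.path(store_dir, paste0(ds.name, ".", cust_id, ".RData")), envir = env)
  mod <- get(paste0("mod", cust_id), envir = env)
  data.frame(CUST_ID = cust_id, PRED = unname(predict(mod, newdata = dat)))
}

# Usage with simulated data for one customer.
set.seed(7)
train <- data.frame(CUST_ID = 3, HOUR = 0:23)
train$CONSUMPTION <- 2 + 0.1 * train$HOUR + rnorm(24, sd = 0.05)
save_model(lm(CONSUMPTION ~ HOUR, data = train), cust_id = 3)
scored <- score_customer(data.frame(CUST_ID = 3, HOUR = c(6, 18)))
```

Loading into a fresh environment, rather than the caller's, keeps one customer's model from shadowing another's when many partitions run in the same session.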
19 Execution Examples (with DOP=24), 1 model per customer
[Chart: execution time (sec) for build and score against # rows (millions).]
- 1,000 models; data: 26,280,000 rows; total build time: 65.2 seconds; total scoring time: 25.7 seconds (all data)
- 10,000 models; data: 262,800,000 rows; total build time: 516 seconds; total scoring time: 217 seconds (all data)
- 50,000 models; data: 1,314,000,000 rows; total build time: … minutes; total scoring time: 18 minutes (all data)
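Dividing the slide's figures through gives per-model and per-row throughput, which makes the three runs easier to compare. The numbers are taken from the slide. The 50,000-model build time is not given in this transcript, so it is left as `NA`.

```r
runs <- data.frame(
  models     = c(1000, 10000, 50000),
  rows       = c(26280000, 262800000, 1314000000),
  build_secs = c(65.2, 516, NA),   # 50,000-model build time not given here
  score_secs = c(25.7, 217, 18 * 60)
)
runs$rows_per_model   <- runs$rows / runs$models        # readings per customer
runs$models_per_sec   <- runs$models / runs$build_secs  # build throughput
runs$score_rows_per_s <- runs$rows / runs$score_secs    # scoring throughput
print(runs)
```

Scoring throughput stays at roughly 1.0 to 1.2 million rows per second while the data grows 50-fold. This suggests close to linear scaling at a fixed DOP.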
20 Simulation
21 Compute distribution of generated random normal values
    simulation <- function(index, n) {
      set.seed(index)                               # reproducible per trial
      x <- rnorm(n)
      res <- data.frame(t(matrix(summary(x))))      # one row: the six summary statistics
      names(res) <- c("min", "q1", "median", "mean", "q3", "max")
      res$id <- index
      res
    }
    (res <- simulation(1, 1000))
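Before handing `simulation()` to the database, it can be checked locally. Calling it for indices 1 to 10 with `lapply()` and binding the rows gives the same one-row-per-trial shape that the parallel run collects. This sketch assumes only base R and is a stand-in for `ore.indexApply()`, not a replacement for it.

```r
simulation <- function(index, n) {
  set.seed(index)
  x <- rnorm(n)
  res <- data.frame(t(matrix(summary(x))))
  names(res) <- c("min", "q1", "median", "mean", "q3", "max")
  res$id <- index
  res
}

# Local stand-in for ore.indexApply(10, simulation, n = 1000): one row per trial.
local_stats <- do.call(rbind, lapply(1:10, simulation, n = 1000))
```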
22 Simulation with sample size 1000 over 10 trials
    res <- ore.indexApply(10, simulation, n=1000, FUN.VALUE=res[1,], parallel=TRUE)
    stats <- ore.pull(res)                          # bring the 10 result rows to the client
    library(reshape2)
    melt.stats <- melt(stats, id.vars="id")
    boxplot(value ~ variable, data=melt.stats, main="Distribution of Stats - sample 1000, 10 trials")
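If `reshape2` is not available, base R's `stack()` produces the same long format that `melt()` builds for the boxplot. The result has a `values` column plus a factor `ind` naming the statistic. This sketch assumes `stats` is a local data.frame like the one returned by `ore.pull(res)`. Here it is filled with simulated rows so the block runs on its own.

```r
set.seed(123)
# Hypothetical stand-in for ore.pull(res): 10 trials x 6 summary statistics plus id.
stats <- do.call(rbind, lapply(1:10, function(i) {
  s <- summary(rnorm(1000))
  data.frame(min = s[[1]], q1 = s[[2]], median = s[[3]],
             mean = s[[4]], q3 = s[[5]], max = s[[6]], id = i)
}))

# Long format without reshape2: drop id, then stack the six statistic columns.
long <- stack(stats[, setdiff(names(stats), "id")])
boxplot(values ~ ind, data = long,
        main = "Distribution of Stats - sample 1000, 10 trials")
```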
23 Simulation with sample sizes 10^(1:6) and 100 trials
    num.trials <- 100
    for (n in 10^(1:6)) {
      t1 <- system.time(stats <- ore.pull(ore.indexApply(num.trials, simulation, n=n,
                          FUN.VALUE=res[1,], parallel=TRUE)))[3]   # elapsed seconds
      cat("n=", n, ", time=", t1, "\n")
      melt.stats <- melt(stats, id.vars="id")
      boxplot(value ~ variable, data=melt.stats,
              main=paste("Distribution of Stats - sample", n, ",", num.trials, "trials"))
      gc()
    }
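The timing loop can be rehearsed on a single machine with the `parallel` package before it moves into the database. This is a sketch under the following assumptions:
- `parallel::mclapply()` serves as a local substitute for `ore.indexApply(..., parallel = TRUE)`. Forking is unavailable on Windows, so the sketch falls back to one core there.
- The sample sizes and trial count are smaller so the loop finishes quickly.

```r
library(parallel)

simulation <- function(index, n) {
  set.seed(index)
  x <- rnorm(n)
  res <- data.frame(t(matrix(summary(x))))
  names(res) <- c("min", "q1", "median", "mean", "q3", "max")
  res$id <- index
  res
}

cores <- if (.Platform$OS.type == "windows") 1L else 2L
num.trials <- 20
timings <- numeric(0)
for (n in 10^(1:4)) {
  # Elapsed wall-clock seconds for all trials at this sample size.
  t1 <- system.time(
    stats <- do.call(rbind, mclapply(seq_len(num.trials), simulation,
                                     n = n, mc.cores = cores))
  )[["elapsed"]]
  timings[as.character(n)] <- t1
  cat("n =", n, ", time =", t1, "\n")
}
```

Because each trial seeds itself with its index, the statistics are the same whether the trials run serially or in parallel. Only the timings change.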
24 Plot Results: sample sizes 10^(1:6) and 100 trials
25 Scalable Performance: varying the number of trials (10^x)
26 Enabling Technologies
27 Oracle R Enterprise (Oracle Advanced Analytics option to Oracle Database)
- Eliminate the memory constraint of the client R engine
- Minimize or eliminate data movement latency
- Execute R scripts through the database server machine for scalability and performance
- Achieve scalability and performance by leveraging Oracle Database as an HPC environment
- Enable integration and management of R scripts through SQL
- Operationalize entire R scripts in production applications, eliminating the need to port R code
- Avoid reinventing code to integrate R results into existing applications
[Diagram: the client R engine, with the transparency layer and ORE packages, connects to Oracle Database (user tables, in-database statistics) on the database server machine; SQL interfaces include SQL*Plus and SQL Developer.]
28 Oracle's R Technologies
- Oracle R Distribution and ROracle: software available to the R community for free
- Oracle R Enterprise
- Oracle R Advanced Analytics for Hadoop
Come to our booth to learn more.
29 Resources
- Oracle R Distribution
- ROracle
- Oracle R Enterprise
- Oracle R Advanced Analytics for Hadoop
- Book: Using R to Unlock the Value of Big Data
- Blog:
- Forum:
30 FastR: a new implementation of R in Java
- Uses the new Truffle interpreter framework and the Graal optimizing compiler in conjunction with the HotSpot JVM for high performance, scalability, and portability
- Dynamically compiles, adaptively optimizes, and deoptimizes at run time
- Joint effort: Oracle Labs (Germany, USA, Austria), JKU Linz (Austria), Purdue University (USA), TU Dortmund (Germany)
- Open-source project (research prototype!), GPLv2
More info at the poster session.
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationSAP HANA Reinventing Real-Time Businesses through Innovation, Value & Simplicity. Eduardo Rodrigues October 2013
Reinventing Real-Time Businesses through Innovation, Value & Simplicity Eduardo Rodrigues October 2013 Agenda The Existing Data Management Conundrum Innovations Transformational Impact at Customers Summary
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More informationIntegrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
More informationAn Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
More informationGraph Database Performance: An Oracle Perspective
Graph Database Performance: An Oracle Perspective Xavier Lopez, Ph.D. Senior Director, Product Management 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved. Program Agenda Broad Perspective
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More information