Your Best Next Business Solution Big Data In R 24/3/2010
|
|
|
- Norah Horton
- 10 years ago
- Views:
Transcription
1 Your Best Next Business Solution Big Data In R 24/3/2010
2 Big Data In R R Works on RAM Causing Scalability issues Maximum length of an object is 2^31-1 Some packages developed to help overcome this problem Source:
3 Big Data In R RODBC Package biglm Package ff Package bigmemory package snow package Source:
4 RODBC Package Connecting to external DB from R to retrieve and handle data stored in the DB RODBC package support connection to SQL-based database (DBMS) such as: Oracle, SQL Server, SQLite, MySQL and more Require an ODBC driver which usually comes with the DBMS Windows offer an ODBC driver to flat files and Excel Supports client-server architecture Source:
5 # Defining DSN: > odbcdriverconnect() RODBC Package
6 # Defining DSN: > odbcdriverconnect() RODBC Package
7 RODBC Package # Main Commands: odbcconnect(dsn, uid = "", pwd = "",...) odbcgetinfo(channel) sqlcolumns(channel, sqtable, ) sqlfetch(channel, sqtable,..., colnames = FALSE, rownames = TRUE) sqlquery(channel, query, errors = TRUE,..., rows_at_time)
8 RODBC Package # Open connection: > xf <- odbcconnect(dsn="sql Server") # Read table: > go<-sqlfetch(xf,sqtable="adult,colnames=t) > str(go) data.frame': obs. of 15 variables:. # Alternatively: > go1 <- sqlquery(xf, "select * from samples.dbo.adult") > str(go1) data.frame': obs. of 15 variables:.
9 RODBC Package This allow you to run any SQL command on the database > sqlquery(xf, "CREATE TABLE Rdemo (id INT IDENTITY,x1 float,x2 float)") character(0) > sqlcolumns(xf,"rdemo") TABLE_CAT TABLE_SCHEM TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME 1 Samples dbo RDemo Id 4 int identity 2 Samples dbo RDemo x1 6 float 3 Samples dbo RDemo x2 6 float..
10 RODBC Package We can use R to run processes which are difficult or impossible in DBMS Example: calculate lag values > for (i in 1:10) > { > LagRDemo<-sqlQuery(xf,paste("SELECT * FROM Rdemo WHERE Id BETWEEN ",(100000*(i-1)-10)," AND ",(100000*i), " ORDER BY Id")) > add.val<-(i!=1)*10 >LagRDemo=cbind(LagRDemo[(add.val+1):(add.val ),],lagDF(La grdemo,10)[(add.val+1):(add.val ),2:3]) > sqlsave(xf, LagRDemo, append = (i!=1), rownames=f) > }
11 Example: calculate lag values RODBC Package
12 Biglm Package Building Generalized linear models on big data Loading data into memory in chunks Processing the last chunk and updating the sufficient statistic required for the model Disposes the last chunk and loading the next chunk Repeats until end of file
13 Biglm Package library(biglm) make.data<-function(filename, chunksize,...){ conn<-null function(reset=false){ if(reset){ if(!is.null(conn)) close(conn) conn<<- file (description=filename, open="r") } else{ rval<-read.csv(conn, nrows=chunksize,...) if (nrow(rval)==0) { close(conn) conn<<-null rval<-null } return(rval) }}}
14 Biglm Package > airpoll<-make.data("c:\\rdemo.txt",chunksize=100000,colclasses = list ("numeric","numeric","numeric"),col.names = c("id","x1","x2")) > lmrdemo <-bigglm(id~x1+x2,data=airpoll) >summary(lmrdemo) Large data regression model: bigglm(id ~ x1 + x2, data = airpoll) Sample size = 1e+06 Coef (95% CI) SE p (Intercept) x x
15 ff Package One of the main problems when dealing with large data set in R is memory limitations On 32-bit OS the maximum amount of memory (i.e. virtual memory space) is limited to 2-4 GB Therefore, one cannot store larger data into memory It is impracticable to handle data that is larger than the available RAM for it drastically slows down performance.
16 ff Package The ff package offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions. It provides Memory-efficient storage of large data on disk and fast access functions. The ff package provides data structures that are stored on disk but behave as if they were in RAM by transparently mapping only a section (pagesize) in main memory A solution to the memory limitation problem is given by considering only parts of the data at a time, i.e. instead of loading the entire data set into memory only chunks thereof are loaded upon request
17 Source: ff Package
18 ff Package
19 ff Package > library ( ff ) > N <- 1,000 # sample size # > n <- 100 # chunk size # > years < : 2009 > types <- factor ( c ( " A ", " B "," C " ) ) # Creating a ( one-dimensional ) flat file : > Year <- ff ( years, vmode = 'ushort', length = N, update = FALSE, + filename = " d:/tmp/year.ff ", finalizer = "close" ) > Year ff (open) ushort length=1000 (1000) [1] [2] [3] [4] [5] [996] [997] [998] [999] [1000] :
20 ff Package # Modifying data: > for ( i in chunk ( 1, N, n ) ) + Year [ i ] <- sample ( years, sum ( i ), TRUE ) > Year ff (open) ushort length=1000 (1000) [1] [2] [3] [4] [996] [997] [998] [999] [1000] :
21 ff Package # And the same for : Type > Type <- ff ( types, vmode = 'quad', length = N, update =FALSE, + filename = " d:/tmp/type.ff ", finalizer = "close" ) > for ( i in chunk ( 1, N, n ) ) + Type [ i ] <- sample ( types, sum ( i ), TRUE ) > Type ff (open) quad length=1000 (1000) levels: A B C [1] [2] [3] [4] [996] [997] [998] [999] [1000] A A B B : C C B A C
22 ff Package # create a data.frame # > x <- ffdf ( Year = Year, Type = Type ) > x ffdf (all open) dim=c(1000,2), dimorder=c(1,2) row.names=null ffdf data Year Type A A B B : : : C C B A C >
23 ff Package The data used: ASA 2009 Data Expo: Airline on-time performance The data consisted of details of flight arrival and departure for all commercial flights within the USA, from October 1987 to April Nearly 120 million records, 29 variables (mostly integer-valued) > x <- read.big.matrix( AirlineDataAllFormatted.csv, header=true, + type= integer, backingfile= airline.bin, descriptorfile= airline.desc, + extracols= age ) Source:
24 ff Package The challenge: find min() on extracted first column; # With ff: > system.time(min(z[,1], na.rm=true)) user system elapsed > system.time(min(z[,1], na.rm=true)) user system elapsed # With bigmemory: > system.time(min(x[,1], na.rm=true)) user system elapsed > system.time(min(x[,1], na.rm=true)) user system elapsed Source:
25 ff Package The challenge: random extractions > theserows <- sample(nrow(x), 10000) > thesecols <- sample(ncol(x), 10) # With ff: > system.time(a <- z[theserows, thesecols]) user system elapsed # With bigmemory: > system.time(a <- x[theserows, thesecols]) user system elapsed Source:
26 bigmemory Package An R package which allows powerful and memory-efficient parallel analyses and data mining of massive data sets. Permits storing large objects (matrices etc.) in memory (on the RAM) using external pointer objects to refer to them. The data sets may also be file-backed, to easily manage and analyze data sets larger than available RAM. Several R processes on the same computer can also share big memory objects.
27 bigmemory Package BigMemory creates a variable X <- big.martix, such that X is a pointer to the dataset that is saved in the RAM or on the hard drive. When a big.matrix, X, is passed as an argument to a function, it is essentially providing call-byreference rather than call-by-value behavior Backingfile is the root name for the file(s) for the cache of X. Descriptorfile the name of the file to hold the filebacked description, for subsequent use with attach.big.matrix attach.big.matrix creates a new big.matrix object which references previously allocated shared memory or file-backed matrices.
28 bigmemory Package The default big.matrix is not shared across processes and is limited to available RAM. A shared big.matrix has identical size constraints as the basic big.matrix, but may be shared across separate R processes. A file-backed big.matrix may exceed available RAM by using hard drive space, and may also be shared across processes. big.matrix ( nrow, ncol, type = integer,.) shared.big.matrix ( nrow, ncol, type = integer,.) filedbacked.big.matrix ( nrow, ncol, type = integer,.) read.big.matrix ( filename, sep=,. )
29 bigmemory Package > library ( bigmemory) # Creating A new BigMemory object > X <- read.big.matrix ( BigMem.csv,type = double, backingfile = BigMem.bin, descriptorfile = BigMem.desc, shared = TRUE) > X An object of class big.matrix Slot "address": <pointer: 0x0196d058>
30 bigmemory Package > X [ 1:4, 1:4 ] [,1] [,2] [,3] [,4] [1,] [2,] [3,] [4,] # Creating an existing BigMemory object on a different machine > Y <- attach.big.matrix ( "BigMem.desc" ) > Y An object of class big.matrix Slot "address": <pointer: 0x01972b18>
31 bigmemory Package > X [1,1] = 1111 > X [ 1:4, 1:4 ] [,1] [,2] [,3] [,4] [1,] [2,] [3,] [4,] # On different R: > Y [ 1:4, 1:4 ] [,1] [,2] [,3] [,4] [1,] [2,] [3,] [4,]
32 bigmemory Package > Z <- shared.big.matrix ( 1000,70, type = double ) > describe( Z ) $sharedtype [1] "SharedMemory" $sharedname [1] "d177ab0c-348c-484e-864f e" $nrow [1] 1000 $ncol [1] 70 $rownames NULL $colnames NULL $type [1] "double"
33 snow Package Simple Network of Workstations An R package which supports simple parallel computing. The package provides high-level interface for using a workstation cluster for parallel computations in R. Snow relies on the Master/Slave model of communcation: One device (master) controls one or more other devices (slaves) Note: communication is orders of magnitude slower than computation. For efficient parallel computing a dedicated high-speed network is needed.
34 snow Package # Starting and Stopping clusters: The way to Initialize slave R processes depends on system configuration, for example: > cl <- makecluster( 2, type = SOCK" ) # Shut down the cluster and clean up any remaining connections between machines: > stopcluster ( cl )
35 snow Package clustercall ( cl, fun,...) clustercall calls a specified function with identical arguments on each node in the cluster. The arguments to clustercall are evaluated on the master, their values transmitted to the slave nodes which execute the function call. > myfunc <- function ( x ) { x + 1 } > myfunc_argument <- 5 > clustercall ( cl, myfunc, myfunc_argument ) [[1]] [1] 6 [[2]] [1] 6
36 snow Package Example: simulate random numbers > clusterapply(cl,c(4,2),runif) ]]1[[ ]]2[[ ]1[ ]1[ > system.time(clusterapply(cl,c( , ),runif)) user system elapsed > system.time(runif( )) user system elapsed
Deal with big data in R using bigmemory package
Deal with big data in R using bigmemory package Xiaojuan Hao Department of Statistics University of Nebraska-Lincoln April 28, 2015 Background What Is Big Data Size (We focus on) Complexity Rate of growth
rm(list=ls()) library(sqldf) system.time({large = read.csv.sql("large.csv")}) #172.97 seconds, 4.23GB of memory used by R
Big Data in R Importing data into R: 1.75GB file Table 1: Comparison of importing data into R Time Taken Packages Functions (second) Remark/Note base read.csv > 2,394 My machine (8GB of memory) ran out
Big Data and Parallel Work with R
Big Data and Parallel Work with R What We'll Cover Data Limits in R Optional Data packages Optional Function packages Going parallel Deciding what to do Data Limits in R Big Data? What is big data? More
The ff package: Handling Large Data Sets in R with Memory Mapped Pages of Binary Flat Files
UseR! 2007, Iowa State University, Ames, August 8-108 2007 The ff package: Handling Large Data Sets in R with Memory Mapped Pages of Binary Flat Files D. Adler, O. Nenadić, W. Zucchini, C. Gläser Institute
MANIPULATION OF LARGE DATABASES WITH "R"
MANIPULATION OF LARGE DATABASES WITH "R" Ana Maria DOBRE, Andreea GAGIU National Institute of Statistics, Bucharest Abstract Nowadays knowledge is power. In the informational era, the ability to manipulate
High-Performance Processing of Large Data Sets via Memory Mapping A Case Study in R and C++
High-Performance Processing of Large Data Sets via Memory Mapping A Case Study in R and C++ Daniel Adler, Jens Oelschlägel, Oleg Nenadic, Walter Zucchini Georg-August University Göttingen, Germany - Research
Package Rdsm. February 19, 2015
Version 2.1.1 Package Rdsm February 19, 2015 Author Maintainer Date 10/01/2014 Title Threads Environment for R Provides a threads-type programming environment
Package RODBC. R topics documented: June 29, 2015
Version 1.3-12 Revision $Rev: 3446 $ Date 2015-06-29 Title ODBC Database Access An ODBC database interface. Package RODBC June 29, 2015 SystemRequirements An ODBC3 driver manager and drivers. Depends R
Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
Large Datasets and You: A Field Guide
Large Datasets and You: A Field Guide Matthew Blackwell [email protected] Maya Sen [email protected] August 3, 2012 A wind of streaming data, social data and unstructured data is knocking at
Knocker main application User manual
Knocker main application User manual Author: Jaroslav Tykal Application: Knocker.exe Document Main application Page 1/18 U Content: 1 START APPLICATION... 3 1.1 CONNECTION TO DATABASE... 3 1.2 MODULE DEFINITION...
Database-driven library system
Database-driven library system Key-Benefits of CADSTAR 12.1 Characteristics of database-driven library system KEY-BENEFITS Increased speed when searching for parts You can edit/save a single part (instead
Scalable Strategies for Computing with Massive Data: The Bigmemory Project
Scalable Strategies for Computing with Massive Data: The Bigmemory Project John W. Emerson and Michael J. Kane http://www.bigmemory.org/ http://www.stat.yale.edu/~jay/ Associate Professor of Statistics,
Streaming Data, Concurrency And R
Streaming Data, Concurrency And R Rory Winston [email protected] About Me Independent Software Consultant M.Sc. Applied Computing, 2000 M.Sc. Finance, 2008 Apache Committer Working in the financial
Using IRDB in a Dot Net Project
Note: In this document we will be using the term IRDB as a short alias for InMemory.Net. Using IRDB in a Dot Net Project ODBC Driver A 32-bit odbc driver is installed as part of the server installation.
Taking R to the Limit: Part II - Large Datasets
Taking R to the Limit, Part II: Working with Large Datasets August 17, 2010 The Brutal Truth We are here because we love R. Despite our enthusiasm, R has two major limitations, and some people may have
StoreGrid Backup Server With MySQL As Backend Database:
StoreGrid Backup Server With MySQL As Backend Database: Installing and Configuring MySQL on Windows Overview StoreGrid now supports MySQL as a backend database to store all the clients' backup metadata
DBMS / Business Intelligence, SQL Server
DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.
Package biganalytics
Package biganalytics February 19, 2015 Version 1.1.1 Date 2012-09-20 Title A library of utilities for big.matrix objects of package bigmemory. Author John W. Emerson and Michael
Tier Architectures. Kathleen Durant CS 3200
Tier Architectures Kathleen Durant CS 3200 1 Supporting Architectures for DBMS Over the years there have been many different hardware configurations to support database systems Some are outdated others
Tableau Server 7.0 scalability
Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different
Computing with large data sets
Computing with large data sets Richard Bonneau, spring 009 mini-lecture 1(week 7): big data, R databases, RSQLite, DBI, clara different reasons for using databases There are a multitude of reasons for
The Saves Package. an approximate benchmark of performance issues while loading datasets. Gergely Daróczi [email protected].
The Saves Package an approximate benchmark of performance issues while loading datasets Gergely Daróczi [email protected] December 27, 2013 1 Introduction The purpose of this package is to be able
Connect to MySQL or Microsoft SQL Server using R
Connect to MySQL or Microsoft SQL Server using R 1 Introduction Connecting to a MySQL database or Microsoft SQL Server from the R environment can be extremely useful. It allows a research direct access
Fact Sheet In-Memory Analysis
Fact Sheet In-Memory Analysis 1 Copyright Yellowfin International 2010 Contents In Memory Overview...3 Benefits...3 Agile development & rapid delivery...3 Data types supported by the In-Memory Database...4
Semiparametric Regression of Big Data in R
Semiparametric Regression of Big Data in R Nathaniel E. Helwig Department of Statistics University of Illinois at Urbana-Champaign CSE Big Data Workshop: May 29, 2014 Nathaniel E. Helwig (University of
Journal of Statistical Software
JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/ The R Package bigmemory: Supporting Efficient Computation and Concurrent Programming with Large Data Sets.
ADAM 5.5. System Requirements
ADAM 5.5 System Requirements 1 1. Overview The schema below shows an overview of the ADAM components that will be installed and set up. ADAM Server: hosts the ADAM core components. You must install the
Hands-On Data Science with R Dealing with Big Data. [email protected]. 27th November 2014 DRAFT
Hands-On Data Science with R Dealing with Big Data [email protected] 27th November 2014 Visit http://handsondatascience.com/ for more Chapters. In this module we explore how to load larger datasets
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
FileMaker 13. ODBC and JDBC Guide
FileMaker 13 ODBC and JDBC Guide 2004 2013 FileMaker, Inc. All Rights Reserved. FileMaker, Inc. 5201 Patrick Henry Drive Santa Clara, California 95054 FileMaker and Bento are trademarks of FileMaker, Inc.
Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute
Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute JMP provides a variety of mechanisms for interfacing to other products and getting data into JMP. The connection
Database Administration with MySQL
Database Administration with MySQL Suitable For: Database administrators and system administrators who need to manage MySQL based services. Prerequisites: Practical knowledge of SQL Some knowledge of relational
Performance And Scalability In Oracle9i And SQL Server 2000
Performance And Scalability In Oracle9i And SQL Server 2000 Presented By : Phathisile Sibanda Supervisor : John Ebden 1 Presentation Overview Project Objectives Motivation -Why performance & Scalability
Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.
Portable Scale-Out Benchmarks for MySQL MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc. Continuent 2008 Agenda / Introductions / Scale-Out Review / Bristlecone Performance Testing Tools /
Scalable Strategies for Computing with Massive Data: The Bigmemory Project
Scalable Strategies for Computing with Massive Data: The Bigmemory Project Jay Emerson 1 and Mike Kane 2 http://www.bigmemory.org/ (1) http://www.stat.yale.edu/~jay/, @johnwemerson Associate Professor
SQL - QUICK GUIDE. Allows users to access data in relational database management systems.
http://www.tutorialspoint.com/sql/sql-quick-guide.htm SQL - QUICK GUIDE Copyright tutorialspoint.com What is SQL? SQL is Structured Query Language, which is a computer language for storing, manipulating
Semiparametric Regression of Big Data in R
Semiparametric Regression of Big Data in R Nathaniel E. Helwig Department of Statistics University of Illinois at Urbana-Champaign CSE Big Data Workshop: May 29, 2014 Nathaniel E. Helwig (University of
FileMaker 12. ODBC and JDBC Guide
FileMaker 12 ODBC and JDBC Guide 2004 2012 FileMaker, Inc. All Rights Reserved. FileMaker, Inc. 5201 Patrick Henry Drive Santa Clara, California 95054 FileMaker and Bento are trademarks of FileMaker, Inc.
Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)
Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure
Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.
Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap. 1 Oracle9i Documentation First-Semester 1427-1428 Definitions
Raima Database Manager Version 14.0 In-memory Database Engine
+ Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized
Vectorwise 3.0 Fast Answers from Hadoop. Technical white paper
Vectorwise 3.0 Fast Answers from Hadoop Technical white paper 1 Contents Executive Overview 2 Introduction 2 Analyzing Big Data 3 Vectorwise and Hadoop Environments 4 Vectorwise Hadoop Connector 4 Performance
Architecting the Future of Big Data
Hive ODBC Driver User Guide Revised: July 22, 2013 2012-2013 Hortonworks Inc. All Rights Reserved. Parts of this Program and Documentation include proprietary software and content that is copyrighted and
Integrating VoltDB with Hadoop
The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.
SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide
SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24 Data Federation Administration Tool Guide Content 1 What's new in the.... 5 2 Introduction to administration
Real Life Performance of In-Memory Database Systems for BI
D1 Solutions AG a Netcetera Company Real Life Performance of In-Memory Database Systems for BI 10th European TDWI Conference Munich, June 2010 10th European TDWI Conference Munich, June 2010 Authors: Dr.
Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.
Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult
Using ODBC with MDaemon 6.5
Using ODBC with MDaemon 6.5 Alt-N Technologies, Ltd 1179 Corporate Drive West, #103 Arlington, TX 76006 Tel: (817) 652-0204 2002 Alt-N Technologies. All rights reserved. Other product and company names
How to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc.
How to Ingest Data into Google BigQuery using Talend for Big Data A Technical Solution Paper from Saama Technologies, Inc. July 30, 2013 Table of Contents Intended Audience What you will Learn Background
Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze
Data Warehouse and Hive Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Decision support systems Decision Support Systems allowed managers, supervisors, and executives to once again see the
Architectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
Architecting the Future of Big Data
Hive ODBC Driver User Guide Revised: July 22, 2014 2012-2014 Hortonworks Inc. All Rights Reserved. Parts of this Program and Documentation include proprietary software and content that is copyrighted and
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
ANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
MySQL Storage Engines
MySQL Storage Engines Data in MySQL is stored in files (or memory) using a variety of different techniques. Each of these techniques employs different storage mechanisms, indexing facilities, locking levels
HOUG Konferencia 2015. Oracle TimesTen In-Memory Database and TimesTen Application-Tier Database Cache. A few facts in 10 minutes
HOUG Konferencia 2015 Oracle TimesTen In-Memory Database and TimesTen Application-Tier Database Cache A few facts in 10 minutes [email protected] What is TimesTen An in-memory relational database
XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12
XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines A.Zydroń 18 April 2009 Page 1 of 12 1. Introduction...3 2. XTM Database...4 3. JVM and Tomcat considerations...5 4. XTM Engine...5
ODBC Driver Version 4 Manual
ODBC Driver Version 4 Manual Revision Date 12/05/2007 HanDBase is a Registered Trademark of DDH Software, Inc. All information contained in this manual and all software applications mentioned in this manual
Querying Databases Using the DB Query and JDBC Query Nodes
Querying Databases Using the DB Query and JDBC Query Nodes Lavastorm Desktop Professional supports acquiring data from a variety of databases including SQL Server, Oracle, Teradata, MS Access and MySQL.
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
HP POLYSERVE SOFTWARE
You can read the recommendations in the user guide, the technical guide or the installation guide for HP POLYSERVE SOFTWARE. You'll find the answers to all your questions on the HP POLYSERVE SOFTWARE in
Replicating to everything
Replicating to everything Featuring Tungsten Replicator A Giuseppe Maxia, QA Architect Vmware About me Giuseppe Maxia, a.k.a. "The Data Charmer" QA Architect at VMware Previously at AB / Sun / 3 times
DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE
DSS Data & Diskpool and cloud storage benchmarks used in IT-DSS CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Geoffray ADDE DSS Outline I- A rational approach to storage systems evaluation
W H I T E P A P E R : T E C H N I C A L. Understanding and Configuring Symantec Endpoint Protection Group Update Providers
W H I T E P A P E R : T E C H N I C A L Understanding and Configuring Symantec Endpoint Protection Group Update Providers Martial Richard, Technical Field Enablement Manager Table of Contents Content Introduction...
HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg
HDB++: HIGH AVAILABILITY WITH Page 1 OVERVIEW What is Cassandra (C*)? Who is using C*? CQL C* architecture Request Coordination Consistency Monitoring tool HDB++ Page 2 OVERVIEW What is Cassandra (C*)?
FileMaker 11. ODBC and JDBC Guide
FileMaker 11 ODBC and JDBC Guide 2004 2010 FileMaker, Inc. All Rights Reserved. FileMaker, Inc. 5201 Patrick Henry Drive Santa Clara, California 95054 FileMaker is a trademark of FileMaker, Inc. registered
A survey of big data architectures for handling massive data
CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - [email protected] Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context
Introduction to SQL for Data Scientists
Introduction to SQL for Data Scientists Ben O. Smith College of Business Administration University of Nebraska at Omaha Learning Objectives By the end of this document you will learn: 1. How to perform
Data warehousing with PostgreSQL
Data warehousing with PostgreSQL Gabriele Bartolini http://www.2ndquadrant.it/ European PostgreSQL Day 2009 6 November, ParisTech Telecom, Paris, France Audience
Understanding the Benefits of IBM SPSS Statistics Server
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
G563 Quantitative Paleontology. SQL databases. An introduction. Department of Geological Sciences Indiana University. (c) 2012, P.
SQL databases An introduction AMP: Apache, mysql, PHP This installations installs the Apache webserver, the PHP scripting language, and the mysql database on your computer: Apache: runs in the background
CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1
CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level -ORACLE TIMESTEN 11gR1 CASE STUDY Oracle TimesTen In-Memory Database and Shared Disk HA Implementation
ODBC Chapter,First Edition
1 CHAPTER 1 ODBC Chapter,First Edition Introduction 1 Overview of ODBC 2 SAS/ACCESS LIBNAME Statement 3 Data Set Options: ODBC Specifics 15 DBLOAD Procedure: ODBC Specifics 25 DBLOAD Procedure Statements
ILMT Central Team. Performance tuning. IBM License Metric Tool 9.0 Questions & Answers. 2014 IBM Corporation
ILMT Central Team Performance tuning IBM License Metric Tool 9.0 Questions & Answers ILMT Central Team Contact details [email protected] https://ibm.biz/ilmt_forum https://ibm.biz/ilmt_wiki https://ibm.biz/ilmt_youtube
Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices
Sawmill Log Analyzer Best Practices!! Page 1 of 6 Sawmill Log Analyzer Best Practices! Sawmill Log Analyzer Best Practices!! Page 2 of 6 This document describes best practices for the Sawmill universal
Chapter 18: Database System Architectures. Centralized Systems
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
Big data in R EPIC 2015
Big data in R EPIC 2015 Big Data: the new 'The Future' In which Forbes magazine finds common ground with Nancy Krieger (for the first time ever?), by arguing the need for theory-driven analysis This future
Phire Architect Hardware and Software Requirements
Phire Architect Hardware and Software Requirements Copyright 2014, Phire. All rights reserved. The Programs (which include both the software and documentation) contain proprietary information; they are
RTI Database Integration Service. Getting Started Guide
RTI Database Integration Service Getting Started Guide Version 5.2.0 2015 Real-Time Innovations, Inc. All rights reserved. Printed in U.S.A. First printing. June 2015. Trademarks Real-Time Innovations,
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
Developing an ODBC C++ Client with MySQL Database
Developing an ODBC C++ Client with MySQL Database Author: Rajinder Yadav Date: Aug 21, 2007 Web: http://devmentor.org Email: [email protected] Assumptions I am going to assume you already know how
Classroom Demonstrations of Big Data
Classroom Demonstrations of Big Data Eric A. Suess Abstract We present examples of accessing and analyzing large data sets for use in a classroom at the first year graduate level or senior undergraduate
Report Paper: MatLab/Database Connectivity
Report Paper: MatLab/Database Connectivity Samuel Moyle March 2003 Experiment Introduction This experiment was run following a visit to the University of Queensland, where a simulation engine has been
Getting Started with RES ONE Automation 2015
Getting Started with RES ONE Automation 2015 Disclaimer Whilst every care has been taken by RES Software to ensure that the information contained in this document is correct and complete, it is possible
Tableau Server Scalability Explained
Tableau Server Scalability Explained Author: Neelesh Kamkolkar Tableau Software July 2013 p2 Executive Summary In March 2013, we ran scalability tests to understand the scalability of Tableau 8.0. We wanted
What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World
COSC 304 Introduction to Systems Introduction Dr. Ramon Lawrence University of British Columbia Okanagan [email protected] What is a database? A database is a collection of logically related data for
Installing and Configuring MySQL as StoreGrid Backend Database on Linux
Installing and Configuring MySQL as StoreGrid Backend Database on Linux Overview StoreGrid now supports MySQL as a backend database to store all the clients' backup metadata information. Unlike StoreGrid
Aradial Installation Guide
Aradial Technologies Ltd. Information in this document is subject to change without notice. Companies, names, and data used in examples herein are fictitious unless otherwise noted. No part of this document
SQL Server An Overview
SQL Server An Overview SQL Server Microsoft SQL Server is designed to work effectively in a number of environments: As a two-tier or multi-tier client/server database system As a desktop database system
Using SAS as a Relational Database
Using SAS as a Relational Database Yves DeGuire Statistics Canada Come out of the desert of ignorance to the OASUS of knowledge Introduction Overview of relational database concepts Why using SAS as a
Package bigrf. February 19, 2015
Version 0.1-11 Date 2014-05-16 Package bigrf February 19, 2015 Title Big Random Forests: Classification and Regression Forests for Large Data Sets Maintainer Aloysius Lim OS_type
