MANIPULATION OF LARGE DATABASES WITH "R"


Ana Maria DOBRE, Andreea GAGIU
National Institute of Statistics, Bucharest

Abstract

Nowadays, knowledge is power. In the information era, the ability to manipulate large datasets is essential for building long-term strategies. More and more companies and official statistics offices need to use large databases, but as the volume of data increases, so does the complexity of manipulating it. A basic rule of economics is that supply follows demand; accordingly, a great deal of software for manipulating big data has been created to meet this need. The aim of this paper is to highlight the performance of R in manipulating large databases. The paper is therefore primarily intended for readers already familiar with common database and statistical concepts. It illustrates how simply R, in both its open-source and commercial versions, can be used to handle the manipulation of large databases.

Keywords: R statistical software, data manipulation, big data, large databases, statistics
JEL Classification: C44, C61, C82, C87

Introduction

By big data we mean the huge volumes of information that new technologies and companies collect and register about individuals or processes. In practice, big data is any dataset or data application that does not fit into available RAM. Analysing data is certainly fun. Analysing big data is full of challenges, almost fascinating, but not quite as much fun; a great deal depends on how well the software meets the analyst's needs. In this paper we assume that R, in both its free open-source and commercial versions, can handle large databases with ease. R is widely used in every field where there is data: academia, business, official statistics and so on. Since many R users have very large computational needs on big data, various tools for the manipulation of large databases have been developed. By manipulation we mean binding together analysis, data mining, computation, visualization and much more.

1. Literature review

The importance of using a single piece of software able to perform all the stages of data analysis was first shown by Hodgess (2004), for models built either in SAS and FORTRAN or in a combination of Excel, FORTRAN and SAS. Currently, R packages can perform almost every type of data analysis: plots, cluster analysis, decomposition, sampling analysis, mapping, statistical regression and forecasting. In recent years, one of the major problems has been manipulating data from large datasets. Initially, computers could barely read in a large dataset, let alone display it. Gradually, computers have become able to handle larger and larger datasets. The book "Graphics of Large Datasets: Visualizing a Million" (Unwin, Theus, Hoffman, 2006) contributes an overview of the understanding and knowledge of graphics and data analysis for large databases. Revolution R Enterprise is built upon the powerful open-source

R statistics language; its message is "100% R and More". Revolution Analytics produces the commercial version of R for organizations and large-scale research, yet it is available for free to students, professors, researchers and open-source users. With commercial enhancements and professional support for real-world use, it brings higher performance, greater scalability and stronger reliability to R, at a fraction of the cost of other commercial products such as SPSS or SAS. R's popularity has grown in recent years and the trend is favourable, estimates suggesting that in about three years the number of R users will exceed the number of SAS and SPSS users. For the period May 2010 - May 2012, R was ranked first by 30% of respondents (Muenchen, 2012).

2. Tools for manipulation of large databases in R

First of all, we introduce some basic concepts about packages (Adler, 2010). Packages in R are collections of previously programmed functions for specific tasks. There are two types of packages: those that come with the base installation of R, and packages that must be manually downloaded and installed. The base installation refers to the big executable file that we download and install, and it contains the most common packages. To see which packages are already installed, we can click Packages -> Load package. There are hundreds of user-contributed packages that are not part of the base installation, most of which are available on the R website. Many of these packages allow the user to execute the same statistical calculations as commercial products.

2.1. Packages included with the base installation

Loading a package that came with the base installation may be done either by a mouse click or by entering a specific command. The user can click Packages -> Load package and select a package. The other method is to use the command library.
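The two steps of checking for and then loading a package can be combined in a small helper function. This is only a sketch: the name load_or_install is our own illustration, not a standard R function, and install.packages() assumes a reachable CRAN mirror.

```r
# Load a package, installing it first if it is not present.
# 'load_or_install' is an illustrative name, not a standard function.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # assumes a CRAN mirror is configured
  }
  library(pkg, character.only = TRUE)  # attach, giving access to its functions
}

# load_or_install("MASS")   # same effect as library(MASS) once installed
```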
For instance, to load the MASS package, the user should type the command:

> library(MASS)

This command gives access to all functions in the MASS package.

2.2. Packages not included with the base installation

Sometimes the process of loading a package is slightly more complicated. For instance, consider a paper in which data are plotted against their spatial locations (latitude and longitude), with the size of the dots proportional to the data values, and the text states that the graph was created with the bubble function from the gstat package. If we click Packages -> Load package, we will not find gstat: if a package does not appear in the list, it has not been installed. Hence this method can also be used to determine whether a package is part of the base installation. To obtain and install gstat, or any other available package, we can either download the zipped package from the R website and ask R to install it, or install it from within R.

2.3. Loading the package

There is a difference between installing and loading. Installing means adding the package to the base version of R; loading refers to gaining full access to all the functions in the package. The user cannot load a package that is not installed. To load the gstat package, either of the two methods described above can be used. Once it has been loaded, typing ?bubble will give instructions for using the function. To summarise the process of installing and loading packages: if a package is part of the base installation, or has previously been installed, we

should use the library function. If a package's operation depends on other packages, they will be loaded automatically, provided they have been installed; if not, they can be installed manually.

2.4. The quality of the packages

Some packages contain hundreds of functions written by leading scientists in their field, who have often written a book in which the methods are described. Other packages contain only a few functions that may have been used in a single published paper. Hence, there are packages from a whole range of contributors, from the enthusiastic PhD student to the professor who has published ten books. Every package is a research project reviewed at academic level (Caragea, Alexandru, Dobre, 2012). There are plenty of packages that handle data well when the data is small enough; things get complicated when big data is involved. There are several approaches to huge amounts of data; below we explain the mechanism R uses to handle data. R reads data into RAM all at once when the usual read.table function is used, and objects in R live entirely in memory. Keeping unnecessary data in RAM will eventually cause R to choke. Specifically, on most systems it is not possible to use more than 2 GB of memory; the range of indexes that can be used is limited due to the lack of a 64-bit integer data type in R; and on 32-bit systems, the maximum amount of virtual memory space is limited to between 2 and 4 GB. There are three major solutions in R:
- bigmemory: ideal for problems involving the analysis in R of manageable subsets of the data, or when an analysis is conducted mostly in C++. It is part of the "big" family, some of which we discuss in this study (the bigmemory, biglm and snow packages, among others);
- ff: file-based access to datasets that cannot fit in memory (the ff package);
- the possibility to use databases, which provide fast read/write access for data analysis (the RODBC and DBI packages).

2.5. Packages that handle big data
1. The bigmemory package is part of the Bigmemory Project. bigmemory and the related packages biganalytics, synchronicity, bigtabulate and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. bigmemory implements several matrix objects: big.matrix (an object that simply points to a data structure in C++), shared.big.matrix (similar to big.matrix, but shareable among multiple R processes) and filebacked.big.matrix (which does not point to an in-memory data structure; rather, it points to a file on disk containing the matrix, and that file can be shared across a cluster). Shared memory allows us to store data in RAM and share it among multiple processes. Suppose we want to store some data in shared memory so it can be read by multiple instances of R: this gives the user the ability to use multiple instances of R to perform different analytics simultaneously. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the dataset. The data structures may also be file-backed, allowing users to easily manage and analyse datasets larger than available RAM and to share them across the nodes of a cluster. Among the advantages of using the bigmemory package, we mention the possibility to store a matrix in memory, restart R, and regain access to the matrix without reloading the data, and to share the matrix among multiple R instances or sessions. There are several disadvantages: no communication among instances of R; being limited by available RAM, unless filebacked.big.matrix is used; and the matrix disappearing on reboot, unless filebacked.big.matrix is used.
2. The ff package is a solution based on using files. It provides data structures that are stored on disk. These data structures act as if they were in memory; only the necessary/active parts

of the data are mapped from disk into main memory. The package supports the standard R atomic types double, logical, raw and integer, as well as non-standard atomic and non-atomic types. ff has some special features differentiating it from bigmemory: support for arrays and data frames, not "just" matrices; good indexing for improved random-access performance; and fast filtering of data frames via the bit package.
3. The biglm package is a great alternative for performing the usual regression analyses, including logistic regression, on big data. Its general approach to building generalized linear models on big data is the following: load the data into memory in chunks; process the current chunk and update the sufficient statistics required for the model; dispose of the chunk and load the next one; repeat until the end of the file.
Revolution R Enterprise with RevoScaleR. The main drawback is that R is a memory-bound language: all data used in calculations - vectors, matrices, lists etc. - need to be held in memory. Even for modern computers with 64-bit address spaces and huge amounts of RAM, dealing with datasets of tens of gigabytes and hundreds of millions of rows (or larger) can present a significant challenge. The problem is not only one of capacity, but also of accommodating the data in memory for the analysis. Revolution Analytics addresses this with its initiative to extend the reach of R into production analysis of terabyte-class datasets. With the integrated RevoScaleR package, R users can process, visualize and model their largest datasets in a fraction of the time of legacy systems, without the need to deploy expensive or specialized hardware. RevoScaleR provides a new data file type, with extension .xdf, that has been optimized so that parts of an Xdf file can be accessed for independent processing. RevoScaleR also provides a new R class, RxDataSource, which has been designed to support the use of external memory algorithms with .xdf files.
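The chunked updating scheme of biglm can be sketched in a few lines. The example below assumes the biglm package is installed, and splits the built-in mtcars dataset into artificial chunks purely for illustration; in practice each chunk would be read from a file or database.

```r
# Sketch: fit a linear model chunk by chunk with biglm, so the whole
# dataset never needs to sit in memory at once.
# Assumes the biglm package is installed; mtcars stands in for a big file.
if (requireNamespace("biglm", quietly = TRUE)) {
  library(biglm)
  chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))  # fake "file chunks"

  fit <- biglm(mpg ~ wt + hp, data = chunks[[1]])  # first chunk initialises the model
  for (i in 2:length(chunks)) {
    fit <- update(fit, chunks[[i]])                # each chunk updates the statistics
  }
  print(coef(fit))
}
```

Because biglm keeps only the sufficient statistics between chunks, the final coefficients are identical to those of a single lm() call on the full dataset.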
(Rickert, 2011)
Oracle R Enterprise. Oracle R Enterprise, a component of the Oracle Advanced Analytics Option, enables users to run R commands on database-resident data, develop and refine R scripts, and leverage the parallelism and scalability of the database. Data analysts can run the latest open-source R packages and develop code without any need for SQL knowledge. Oracle R Enterprise can be used from the R console or from any R GUI/IDE.
The authors recommend loading packages only as the user needs them, in order to avoid the error "Error: cannot allocate vector of size X Mb". Otherwise, if the user loads all the packages immediately after starting R, the RAM will be too busy and R will not be able to allocate memory when running the commands in the script.

3. SWOT analysis for R in large data manipulation

In the following, we outline a SWOT analysis for R regarding large data manipulation, in order to clarify the possibilities R does and does not have.

3.1. Strengths

For R in general:
- open-source program with an improved commercial version;
- the costs of using R are related only to the training of users;
- works on various operating systems: Windows, Linux, Mac OS X;
- easy to install and configure;
- mature, state-of-the-art programming language: mix-and-match models, scripts and packages for the best results;
- it has several packages for big data support.

For Revolution Analytics in particular:
- the cost of using Revolution Analytics is very small compared with other similar software (see Table 1);

- processes, visualizes and models terabyte-class datasets in a fraction of the time of legacy products, without requiring expensive or specialized hardware;
- uses R on multiple cores on a single computer;
- the ability to process more data than can fit into memory, by creating XDF files;
- optimizes the process of streaming data from disk to memory, dramatically reducing the time needed for statistical analysis of large datasets;
- using only a commodity multi-processor computer with modest amounts of RAM, data processing and predictive modelling techniques can easily be performed on datasets with hundreds of millions of rows and hundreds of variables;
- easy to cut down the computation time for big data analytics simply by scaling with compute nodes;
- reduces processing time commensurately by extending the system to a small cluster of similar computers;
- the amount of data conversion and copying is minimized, saving time and speed;
- new variables and rows can be added without needing to rewrite the entire file;
- efficient parallelization of statistical and data-mining algorithms;
- multiple models can be analyzed jointly;
- descriptive statistics and cross-tabs on very large datasets;
- statistical modeling on very large datasets;
- the possibility of detecting collinearities in models, which can lead to wasted computations or even computational failures, and of removing them prior to doing any computations;
- support for relational databases;
- efficient object indexing;
- Oracle integration via Oracle R Enterprise;
- added support for native SAS file formats and conversion to XDF.

3.2. Weaknesses

- R reads data into memory by default;
- R is not wise enough to use more memory than is available;
- regardless of the number of cores on the CPU, R will only use one in a default build;
- the range of indexes that can be used is limited due to the lack of a 64-bit integer data type in R.

3.3. Opportunities

- the RevoScaleR package provides external memory algorithms that help R break through the memory/performance barrier;
- RevoScaleR functions are fast and efficient, enabling real data analysis to be performed on a million-row, 13 GB dataset on a common dual-core laptop;
- all of the RevoScaleR statistical functions produce objects that may be used as input to standard R functions;
- extensible programming framework: advanced R users can write their own functions to exploit the capabilities of XDF files and RxDataSource objects;
- re-use and reproduction of newly discovered techniques in the analytic operations the user is going to perform - this is difficult in SAS or SPSS;
- no one has commercialised the PSPP open-source alternative to SPSS the way Revolution Analytics did with open-source R;
- Revolution Analytics added support for native SAS file formats and conversion to XDF.
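As an illustration of the RevoScaleR workflow described above, a minimal sketch might look as follows. This assumes Revolution R Enterprise with the RevoScaleR package; the file employees.csv and the variable names are hypothetical.

```r
# Sketch of a RevoScaleR workflow (requires Revolution R Enterprise;
# "employees.csv" and the variable names are hypothetical).
library(RevoScaleR)

# Convert the text file once into the optimized .xdf format
rxImport(inData = "employees.csv", outFile = "employees.xdf", overwrite = TRUE)

# External-memory algorithms then work on the .xdf file chunk by chunk,
# without loading the whole dataset into RAM
info  <- rxGetInfo("employees.xdf", getVarInfo = TRUE)
model <- rxLinMod(Salary ~ JobTitle, data = "employees.xdf")
summary(model)
```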

3.4. Threats

- other packages and programming languages (SAS in particular) can read data from files on demand;
- SAS is based almost entirely on external memory algorithms;
- even without a capacity limit, computations may be too slow to be useful.

Table 1. Comparison of license prices for commonly used software for analysing big data

Software                        License Price
Revolution Analytics            $1,000
Oracle Advanced Analytics       $23,000/CPU and $460/Named User Plus
SAS Analytics Pro               EUR 5,640 (commercial/individual use, 1-user license)
IBM SPSS Statistics Premium     $15,800

Source: processing by the authors based on pricing available on the Internet.

4. Case study in R

The case study is based on a company personnel database. First, we describe how we connected the database to R; then we show some examples of using R to analyse the data.

Connecting to the database

There are essentially two ways to communicate with databases in R: one based on the ODBC protocol, and the other based on the general interface provided by the DBI package (R Special Interest Group on Databases, 2009) together with specific packages for each database management system (DBMS). If the user decides to use the ODBC protocol, it is necessary to ensure that the DBMS can communicate using this protocol, which may involve installing some drivers on the DBMS side. On the R side, it is only necessary to install the RODBC package. The DBI package implements a series of database interface functions that are independent of the database server actually used to store the data. The user only needs to indicate which communication interface to use at the first step, when establishing a connection to the database. This means that if the DBMS is changed, only a single instruction needs to change (the one that specifies the DBMS to communicate with). To achieve this independence, the user also needs to install other packages that take care of the communication details for each different DBMS.
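To illustrate the DBI route, the sketch below uses RSQLite as the backend purely as an easily available example (the case study itself uses MySQL via ODBC); swapping the DBMS would mean changing only the dbConnect() line. It assumes the DBI and RSQLite packages are installed, and the table contents are invented for the example.

```r
# Sketch of the DBI interface: the query code is DBMS-independent;
# only the driver passed to dbConnect() names the backend.
# RSQLite serves here as an example backend; the data is invented.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), ":memory:")   # swap driver to change DBMS

  dbWriteTable(con, "employees",
               data.frame(EmployeeID = 1:3, Salary = c(1000, 2000, 3000)))
  res <- dbGetQuery(con, "SELECT AVG(Salary) AS avg_salary FROM employees")
  print(res$avg_salary)

  dbDisconnect(con)
}
```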
R has many DBMS-specific packages for the major DBMSs; specifically, for communication with a MySQL database stored on some server, R has the RMySQL package.

Loading the data into R

Running on Windows. If R is running on Windows, regardless of whether the MySQL database server resides on the same PC or on another computer (possibly running another operating system), the simplest way to connect to the database from R is through the ODBC protocol. To use this protocol in R, the RODBC package needs to be installed. Before connecting to any MySQL database for the first time using the ODBC protocol, a few extra steps are necessary: namely, it is necessary to install the MySQL ODBC driver, which is called myodbc and can be downloaded from the MySQL site. This only needs to be done the first time ODBC is used to connect to MySQL. After installing this driver, we can create ODBC connections to MySQL databases residing on the computer.

According to the ODBC protocol, every database connection created has a name (the Data Source Name, or DSN in ODBC jargon). This name will be used to access the MySQL database from R. To create an ODBC connection on a Windows PC, we must use a program called "ODBC Data Sources", available in the Windows Control Panel. After running this program, we have to create a new User Data Source using the MySQL ODBC driver (myodbc) that we are supposed to have installed previously. During this creation process we will be asked for several things, such as the MySQL server address, the name of the database to which we want to establish a connection, and the name we give to this connection (the DSN). Once we have completed this process, which we only have to do the first time, we are ready to connect to this MySQL database from R. After loading the RODBC package, we establish a connection with our database using the previously created DSN, with the function odbcConnect(). We then use one of the functions available to query a table, in this case the sqlFetch() function, which obtains all rows of a table and returns them as a data frame object. Finally, we close the connection to the database with the odbcClose() function. The following R code establishes a connection to the employees database and loads information about the tables it contains.

> library(RODBC)
> ch <- odbcConnect("employees")
> sqlTables(ch)
  TABLE_CAT TABLE_SCHEM  TABLE_NAME TABLE_TYPE REMARKS
1 employees               countries      TABLE
2 employees             departments      TABLE
3 employees               employees      TABLE
4 employees             job_history      TABLE
5 employees                    jobs      TABLE
6 employees               locations      TABLE
7 employees                 regions      TABLE

In the following we present some examples of how to extract data using SQL queries.

> # the geographical location of the departments
> qry_dep_country <- "select DepartmentID, DepartmentName, locations.CountryISOCode
+    from departments inner join locations
+      on departments.LocationID = locations.LocationID
+    where DepartmentID in (select DepartmentID from employees group by DepartmentID)"
> dep_country <- sqlQuery(ch, qry_dep_country)
> print(dep_country)
  DepartmentID   DepartmentName CountryISOCode
1           10   Administration             US
2           20        Marketing             CA
3           30       Purchasing             US
4           40  Human Resources             UK
5           50         Shipping             US
6           60               IT             US
7           70 Public Relations             DE
8           90        Executive             US
                        Finance             US
                     Accounting             US
> # the number of employees grouped by departments and countries

> specialist_country <- merge(specialists, dep_country)
> specialist_country
  DepartmentID NoEmployees   DepartmentName CountryISOCode
                              Administration             US
                                   Marketing             CA
                                  Purchasing             US
                             Human Resources             UK
                                    Shipping             US
                                          IT             US
                            Public Relations             DE
                                   Executive             US
                                     Finance             US
                                  Accounting             US
> # the list of jobs and the average payment for each job
> qry_jobs_avg_pay <- "select count(*) as NoEmployees, avg(Salary) as avg_salary, JobTitle
+    from jobs inner join employees on jobs.JobCode = employees.JobCode
+    group by JobTitle"
> jobs_avg_pay <- sqlQuery(ch, qry_jobs_avg_pay)
> print(jobs_avg_pay)
  NoEmployees avg_salary JobTitle
                         Accountant
                         Accounting Manager
                         Administration Assistant
                         Administration Vice President
                         Finance Manager
                         Human Resources Representative
                         Marketing Manager
                         Marketing Representative
                         President
                         Programmer
                         Public Accountant
                         Public Relations Representative
                         Purchasing Clerk
                         Purchasing Manager
                         Shipping Clerk
                         Stock Clerk
                         Stock Manager
> # linear regression model of how the number of employees varies with salary and job title
> qry_jobs <- "select count(*) as NoEmployees, Salary, JobTitle
+    from jobs inner join employees on jobs.JobCode = employees.JobCode
+    group by JobTitle, Salary"
> jobs <- sqlQuery(ch, qry_jobs)
> print(jobs)
  NoEmployees Salary JobTitle
                     Accountant
                     Accountant
                     Accountant
                     Accountant
                     Accountant
                     Accounting Manager
                     Administration Assistant

                     Administration Vice President
                     Finance Manager
                     Human Resources Representative
                     Marketing Manager
                     Marketing Representative
                     President
                     Programmer
                     Programmer
                     Programmer
                     Programmer
                     Public Accountant
                     Public Relations Representative
                     Purchasing Clerk
                     Purchasing Clerk
                     Purchasing Clerk
                     Purchasing Clerk
                     Purchasing Clerk
                     Purchasing Manager
                     Shipping Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Clerk
                     Stock Manager
                     Stock Manager
                     Stock Manager
                     Stock Manager
                     Stock Manager
> summary(biglm(NoEmployees ~ Salary + JobTitle, data = jobs))
Large data regression model: biglm(NoEmployees ~ Salary + JobTitle, data = jobs)
Sample size = 41
                                       Coef (95% CI) SE p
(Intercept)
Salary
JobTitleAccounting Manager
JobTitleAdministration Assistant
JobTitleAdministration Vice President
JobTitleFinance Manager
JobTitleHuman Resources Representative
JobTitleMarketing Manager
JobTitleMarketing Representative
JobTitlePresident
JobTitleProgrammer
JobTitlePublic Accountant

JobTitlePublic Relations Representative
JobTitlePurchasing Clerk
JobTitlePurchasing Manager
JobTitleShipping Clerk
JobTitleStock Clerk
JobTitleStock Manager

Conclusion

In conclusion, R is a very powerful piece of software for handling datasets, big data included. R can perform almost all types of data analysis on big data, such as plots, cluster analysis, statistical regression and forecasts. The case study presented here is only a very small part of what R is able to perform. One of its biggest advantages is its flexibility, along with the low price of the commercial version compared with other similar software. Further research on R and big data should be considered, because both the possibilities of the software and the requirements of companies and official statistics offices are expanding continuously.

Acknowledgement

The present paper is part of a research project of the Romanian R-userRs Team (http://www.r-project.ro/). The authors would like to express special gratitude to Nicoleta Caragea and Ciprian Antoniade Alexandru, who provided support and guidance for this project.

References

Adler J., 2010, R in a Nutshell, O'Reilly
Caragea N., Alexandru A.C., Dobre A.M., 2012, "Bringing New Opportunities to Develop Statistical Software and Data Analysis Tools in Romania", The Proceedings of the VIth International Conference on Globalization and Higher Education in Economics and Business Administration, pp.
Edlefsen L., 2011, "RevoScaleR Speed and Scalability", Revolution Analytics
Hodgess E., 2004, "A Computer Evolution in Teaching Undergraduate Time Series", Journal of Statistics Education, 12 (3)
Muenchen R., 2012, The Popularity of Data Analysis Software, popularity/
R Development Core Team, 2005, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria
Rickert J., 2011, "Big Data Analysis with Revolution R Enterprise", Revolution Analytics
Ripley B., Lapsley M., 2012, "RODBC: ODBC Database Access. R package version", -project.org/package=rodbc
Rosario R., 2010, "Taking R to the Limit, Part II: Working with Large Datasets", bytemining.com/wp-content/uploads/2010/08/r_hpc_ii.pdf

Unwin A., Theus M., Hoffman H., 2006, Graphics of Large Datasets: Visualizing a Million, Springer Science, Singapore
* * * Big Memory Project
* * * Comprehensive R Archive Network
* * * Database source
* * * MySQL connector download
* * * MySQL configuration of DSN on Windows
* * * Oracle R Enterprise
* * * Revolution Analytics


Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

Big Data and Parallel Work with R

Big Data and Parallel Work with R Big Data and Parallel Work with R What We'll Cover Data Limits in R Optional Data packages Optional Function packages Going parallel Deciding what to do Data Limits in R Big Data? What is big data? More

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

MySQL databases as part of the Online Business, using a platform based on Linux

MySQL databases as part of the Online Business, using a platform based on Linux Database Systems Journal vol. II, no. 3/2011 3 MySQL databases as part of the Online Business, using a platform based on Linux Ion-Sorin STROE Romanian Academy of Economic Studies Romana Sq, no 6, 1 st

More information

Unit 5.1 The Database Concept

Unit 5.1 The Database Concept Unit 5.1 The Database Concept Candidates should be able to: What is a Database? A database is a persistent, organised store of related data. Persistent Data and structures are maintained when data handling

More information

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users. Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major

More information

FileMaker 12. ODBC and JDBC Guide

FileMaker 12. ODBC and JDBC Guide FileMaker 12 ODBC and JDBC Guide 2004 2012 FileMaker, Inc. All Rights Reserved. FileMaker, Inc. 5201 Patrick Henry Drive Santa Clara, California 95054 FileMaker and Bento are trademarks of FileMaker, Inc.

More information

Simba XMLA Provider for Oracle OLAP 2.0. Linux Administration Guide. Simba Technologies Inc. April 23, 2013

Simba XMLA Provider for Oracle OLAP 2.0. Linux Administration Guide. Simba Technologies Inc. April 23, 2013 Simba XMLA Provider for Oracle OLAP 2.0 April 23, 2013 Simba Technologies Inc. Copyright 2013 Simba Technologies Inc. All Rights Reserved. Information in this document is subject to change without notice.

More information

On a Hadoop-based Analytics Service System

On a Hadoop-based Analytics Service System Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN 2074-8523 On a Hadoop-based Analytics Service System Mikyoung Lee, Hanmin Jung, and Minhee Cho Korea Institute of Science and Technology

More information

Basics of Computational Physics

Basics of Computational Physics Basics of Computational Physics What is Computational Physics? Basic computer hardware Software 1: operating systems Software 2: Programming languages Software 3: Problem-solving environment What does

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Alteryx Predictive Analytics for Oracle R

Alteryx Predictive Analytics for Oracle R Alteryx Predictive Analytics for Oracle R I. Software Installation In order to be able to use Alteryx s predictive analytics tools with an Oracle Database connection, your client machine must be configured

More information

ORACLE DATABASE 10G ENTERPRISE EDITION

ORACLE DATABASE 10G ENTERPRISE EDITION ORACLE DATABASE 10G ENTERPRISE EDITION OVERVIEW Oracle Database 10g Enterprise Edition is ideal for enterprises that ENTERPRISE EDITION For enterprises of any size For databases up to 8 Exabytes in size.

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QlikView Technical Brief April 2011 www.qlikview.com Introduction This technical brief covers an overview of the QlikView product components and architecture

More information

9. Handling large data

9. Handling large data 9. Handling large data Thomas Lumley Ken Rice Universities of Washington and Auckland Seattle, June 2011 Large data R is well known to be unable to handle large data sets. Solutions: Get a bigger computer:

More information

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling

More information

Data Management, Analysis Tools, and Analysis Mechanics

Data Management, Analysis Tools, and Analysis Mechanics Chapter 2 Data Management, Analysis Tools, and Analysis Mechanics This chapter explores different tools and techniques for handling data for research purposes. This chapter assumes that a research problem

More information

R is Ready for Business

R is Ready for Business is eady for Business evolution 6 High- Analytics for the evolution is production grade analytics software built upon the powerful open source statistics language. With commercial enhancements and professional

More information

Sage CRM Technical Specification

Sage CRM Technical Specification Sage CRM Technical Specification Client Software This document outlines the recommended minimum software and hardware requirements for running Sage CRM. Please note that while the document refers to Sage

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Grow Revenues and Reduce Risk with Powerful Analytics Software

Grow Revenues and Reduce Risk with Powerful Analytics Software Grow Revenues and Reduce Risk with Powerful Analytics Software Overview Gaining knowledge through data selection, data exploration, model creation and predictive action is the key to increasing revenues,

More information

StruxureWare Data Center Expert 7.2.1 Release Notes

StruxureWare Data Center Expert 7.2.1 Release Notes StruxureWare Data Center Expert 7.2.1 Release Notes Table of Contents Page # Part Numbers Affected...... 1 Minimum System Requirements... 1 New Features........ 1 Issues Fixed....2 Known Issues...2 Upgrade

More information

Product Guide. Sawmill Analytics, Swindon SN4 9LZ UK sales@sawmill.co.uk tel: +44 845 250 4470

Product Guide. Sawmill Analytics, Swindon SN4 9LZ UK sales@sawmill.co.uk tel: +44 845 250 4470 Product Guide What is Sawmill Sawmill is a highly sophisticated and flexible analysis and reporting tool. It can read text log files from over 800 different sources and analyse their content. Once analyzed

More information

Qlik Sense scalability

Qlik Sense scalability Qlik Sense scalability Visual analytics platform Qlik Sense is a visual analytics platform powered by an associative, in-memory data indexing engine. Based on users selections, calculations are computed

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

R at the front end and

R at the front end and Divide & Recombine for Large Complex Data (a.k.a. Big Data) 1 Statistical framework requiring research in statistical theory and methods to make it work optimally Framework is designed to make computation

More information

Change Manager 5.0 Installation Guide

Change Manager 5.0 Installation Guide Change Manager 5.0 Installation Guide Copyright 1994-2008 Embarcadero Technologies, Inc. Embarcadero Technologies, Inc. 100 California Street, 12th Floor San Francisco, CA 94111 U.S.A. All rights reserved.

More information

Big-data Analytics: Challenges and Opportunities

Big-data Analytics: Challenges and Opportunities Big-data Analytics: Challenges and Opportunities Chih-Jen Lin Department of Computer Science National Taiwan University Talk at 台 灣 資 料 科 學 愛 好 者 年 會, August 30, 2014 Chih-Jen Lin (National Taiwan Univ.)

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc. Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE

More information

In-Database Analytics

In-Database Analytics Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing

More information

Phire Architect Hardware and Software Requirements

Phire Architect Hardware and Software Requirements Phire Architect Hardware and Software Requirements Copyright 2014, Phire. All rights reserved. The Programs (which include both the software and documentation) contain proprietary information; they are

More information

OpenText Actuate Big Data Analytics 5.2

OpenText Actuate Big Data Analytics 5.2 OpenText Actuate Big Data Analytics 5.2 OpenText Actuate Big Data Analytics 5.2 introduces several improvements that make the product more useful, powerful and flexible for end users. A new data loading

More information

Cloud Based Application Architectures using Smart Computing

Cloud Based Application Architectures using Smart Computing Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products

More information

Working with Greenplum Database using Toad for Data Analysts

Working with Greenplum Database using Toad for Data Analysts White Paper Working with Greenplum Database using Toad for Data Analysts The fundamental interoperability between Greenplum Database & Toad for Data Analysts in Windows Abstract This white paper briefly

More information

CUSTOMER Presentation of SAP Predictive Analytics

CUSTOMER Presentation of SAP Predictive Analytics SAP Predictive Analytics 2.0 2015-02-09 CUSTOMER Presentation of SAP Predictive Analytics Content 1 SAP Predictive Analytics Overview....3 2 Deployment Configurations....4 3 SAP Predictive Analytics Desktop

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

BIG DATA ANALYTICS For REAL TIME SYSTEM

BIG DATA ANALYTICS For REAL TIME SYSTEM BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage

More information

Oracle Database 11g Comparison Chart

Oracle Database 11g Comparison Chart Key Feature Summary Express 10g Standard One Standard Enterprise Maximum 1 CPU 2 Sockets 4 Sockets No Limit RAM 1GB OS Max OS Max OS Max Database Size 4GB No Limit No Limit No Limit Windows Linux Unix

More information

SAP Predictive Analysis Installation

SAP Predictive Analysis Installation SAP Predictive Analysis Installation SAP Predictive Analysis is the latest addition to the SAP BusinessObjects suite and introduces entirely new functionality to the existing Business Objects toolbox.

More information

Classroom Demonstrations of Big Data

Classroom Demonstrations of Big Data Classroom Demonstrations of Big Data Eric A. Suess Abstract We present examples of accessing and analyzing large data sets for use in a classroom at the first year graduate level or senior undergraduate

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

Oracle Data Miner (Extension of SQL Developer 4.0)

Oracle Data Miner (Extension of SQL Developer 4.0) An Oracle White Paper September 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node Denny Wong Oracle Data Mining

More information

DiskSavvy Disk Space Analyzer. DiskSavvy DISK SPACE ANALYZER. User Manual. Version 8.7. Jun Flexense Ltd.

DiskSavvy Disk Space Analyzer. DiskSavvy DISK SPACE ANALYZER. User Manual. Version 8.7. Jun Flexense Ltd. DiskSavvy DISK SPACE ANALYZER User Manual Version 8.7 Jun 2016 www.disksavvy.com info@flexense.com 1 1 Product Overview...3 2 Product Versions...7 3 Using Desktop Versions...8 3.1 Product Installation

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

Issues in Information Systems Volume 16, Issue I, pp. 219-225, 2015

Issues in Information Systems Volume 16, Issue I, pp. 219-225, 2015 MOVING TOWARD A SERVER-BASED VIRTUAL MACHINE HOSTING ENVIRONMENT IN SUPPORT OF UNIVERSITY INFORMATION TECHNOLOGY PROGRAMMING COURSES George Stefanek, PhD, Purdue University North Central, stefanek@pnc.edu

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Setting Up ALERE with Client/Server Data

Setting Up ALERE with Client/Server Data Setting Up ALERE with Client/Server Data TIW Technology, Inc. November 2014 ALERE is a registered trademark of TIW Technology, Inc. The following are registered trademarks or trademarks: FoxPro, SQL Server,

More information

IceWarp Server. Log Analyzer. Version 10

IceWarp Server. Log Analyzer. Version 10 IceWarp Server Log Analyzer Version 10 Printed on 23 June, 2009 i Contents Log Analyzer 1 Quick Start... 2 Required Steps... 2 Optional Steps... 2 Advanced Configuration... 5 Log Importer... 6 General...

More information

Distributed Framework for Data Mining As a Service on Private Cloud

Distributed Framework for Data Mining As a Service on Private Cloud RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

More information

Oracle 11g is by far the most robust database software on the market

Oracle 11g is by far the most robust database software on the market Chapter 1 A Pragmatic Introduction to Oracle In This Chapter Getting familiar with Oracle Implementing grid computing Incorporating Oracle into everyday life Oracle 11g is by far the most robust database

More information

Actian Vector in Hadoop

Actian Vector in Hadoop Actian Vector in Hadoop Industrialized, High-Performance SQL in Hadoop A Technical Overview Contents Introduction...3 Actian Vector in Hadoop - Uniquely Fast...5 Exploiting the CPU...5 Exploiting Single

More information

R-evolution in Time Series Analysis Software Applied on R-omanian Capital Market

R-evolution in Time Series Analysis Software Applied on R-omanian Capital Market R-evolution in Time Series Analysis Software Applied on R-omanian Capital Market Ciprian ALEXANDRU 1*, Nicoleta CARAGEA 2, Ana - Maria DOBRE 3 1 Ecological University of Bucharest - Faculty of Economics,

More information

R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015

R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015 R Tools Evaluation A review by Analytics @ Global BI / Local & Regional Capabilities Telefónica CCDO May 2015 R Features What is? Most widely used data analysis software Used by 2M+ data scientists, statisticians

More information

Tableau Server Scalability Explained

Tableau Server Scalability Explained Tableau Server Scalability Explained Author: Neelesh Kamkolkar Tableau Software July 2013 p2 Executive Summary In March 2013, we ran scalability tests to understand the scalability of Tableau 8.0. We wanted

More information

300 Intelligence Reporting. Sage 300 2016 Intelligence Reporting Customer Frequently asked questions

300 Intelligence Reporting. Sage 300 2016 Intelligence Reporting Customer Frequently asked questions 300 Intelligence Reporting Sage 300 2016 Intelligence Reporting Customer Table of contents 1. Overview of Sage Intelligence Reporting 3 2. Comparisons of Sage Intelligence Reporting and Sage Enterprise

More information

Prerequisites Guide. Version 4.0, Rev. 1

Prerequisites Guide. Version 4.0, Rev. 1 Version 4.0, Rev. 1 Contents Software and Hardware Prerequisites Guide... 2 anterradatacenter Version selection... 2 Required Software Components... 2 Sage 300 Construction and Real Estate ODBC... 2 Pervasive

More information

IBM SPSS Modeler 15 In-Database Mining Guide

IBM SPSS Modeler 15 In-Database Mining Guide IBM SPSS Modeler 15 In-Database Mining Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 217. This edition applies to IBM SPSS Modeler

More information

Embedded Operating Systems in a Point of Sale Environment. White Paper

Embedded Operating Systems in a Point of Sale Environment. White Paper Embedded Operating Systems in a Point of Sale Environment White Paper December 2008 Contents Embedded Operating Systems in a POS Environment... 3 Overview... 3 POS Operating Systems... 3 Operating Systems

More information

SysPatrol - Server Security Monitor

SysPatrol - Server Security Monitor SysPatrol Server Security Monitor User Manual Version 2.2 Sep 2013 www.flexense.com www.syspatrol.com 1 Product Overview SysPatrol is a server security monitoring solution allowing one to monitor one or

More information

StruxureWare Data Center Expert 7.2.4 Release Notes

StruxureWare Data Center Expert 7.2.4 Release Notes StruxureWare Data Center Expert 7.2.4 Release Notes Table of Contents Page # Part Numbers Affected...... 1 Minimum System Requirements... 1 New Features........ 1 Issues Fixed....3 Known Issues...3 Upgrade

More information

2015 Workshops for Professors

2015 Workshops for Professors SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

More information

Business Intelligence Getting Started Guide

Business Intelligence Getting Started Guide Business Intelligence Getting Started Guide 2013 Table of Contents Introduction... 1 Introduction... 1 What is Sage Business Intelligence?... 1 System Requirements... 2 Recommended System Requirements...

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

v7.1 Technical Specification

v7.1 Technical Specification v7.1 Technical Specification Copyright 2011 Sage Technologies Limited, publisher of this work. All rights reserved. No part of this documentation may be copied, photocopied, reproduced, translated, microfilmed,

More information

SPEX for Windows Client Server Version 8.3. Pre-Requisite Document V1.0 16 th August 2006 SPEX CS 8.3

SPEX for Windows Client Server Version 8.3. Pre-Requisite Document V1.0 16 th August 2006 SPEX CS 8.3 SPEX for Windows Client Server Version 8.3 Pre-Requisite Document V1.0 16 th August 2006 Please read carefully and take note of the applicable pre-requisites contained within this document. It is important

More information

Develop Predictive Models Using Your Business Expertise

Develop Predictive Models Using Your Business Expertise Clementine 8.5 Specifications Develop Predictive Models Using Your Business Expertise Clementine is an integrated data mining workbench, popular worldwide with data miners and business analysts alike.

More information

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis (Version 1.17) For validation Document version 0.1 7/7/2014 Contents What is SAP Predictive Analytics?... 3

More information

Symantec NetBackup 7 Clients and Agents

Symantec NetBackup 7 Clients and Agents Complete protection for your information-driven enterprise Overview Symantec NetBackup provides a simple yet comprehensive selection of innovative clients and agents to optimize the performance and efficiency

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Turn Big Data to Small Data

Turn Big Data to Small Data Turn Big Data to Small Data Use Qlik to Utilize Distributed Systems and Document Databases October, 2014 Stig Magne Henriksen Image: kdnuggets.com From Big Data to Small Data Agenda When do we have a Big

More information

Sage Intelligence Financial Reporting for Sage ERP X3 Version 6.5 Installation Guide

Sage Intelligence Financial Reporting for Sage ERP X3 Version 6.5 Installation Guide Sage Intelligence Financial Reporting for Sage ERP X3 Version 6.5 Installation Guide Table of Contents TABLE OF CONTENTS... 3 1.0 INTRODUCTION... 1 1.1 HOW TO USE THIS GUIDE... 1 1.2 TOPIC SUMMARY...

More information

Introduction. Why Use ODBC? Setting Up an ODBC Data Source. Stat/Math - Getting Started Using ODBC with SAS and SPSS

Introduction. Why Use ODBC? Setting Up an ODBC Data Source. Stat/Math - Getting Started Using ODBC with SAS and SPSS Introduction Page 1 of 15 The Open Database Connectivity (ODBC) standard is a common application programming interface for accessing data files. In other words, ODBC allows you to move data back and forth

More information

Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database

Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database White Paper Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database Abstract This white paper explores the technology

More information

Fast Analytics on Big Data with H20

Fast Analytics on Big Data with H20 Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,

More information