Bigger data analysis. Hadley Wickham, Chief Scientist, RStudio. Thursday, July 18, 2013
Transcription
1 Bigger data analysis. Hadley Wickham, Chief Scientist, RStudio. July 2013
2 1. What is data analysis? 2. Transforming data 3. Visualising data
3 What is data analysis?
4 Data analysis is the process by which data becomes understanding, knowledge and insight
5 Data analysis is the process by which data becomes understanding, knowledge and insight
6 Visualise Tidy Transform Model
7 Frequent data analysis learn to program
8 Cognition time Computation time
9 Visualise: ggplot2. Tidy: reshape2, stringr, lubridate. Transform: plyr. Model.
10 Computation time Cognition time
11 Visualise: bigvis. Tidy. Transform: dplyr. Model.
12 Data. Every commercial US flight: ~76 million flights. Total database: ~11 GB. >100 variables, but I'll focus on a handful: airline, delay, distance, flight time and speed.
13 Transformation
14 Split, apply, combine
Input (name, n): Al 2; Bo 4; Bo 0; Bo 5; Ed 5; Ed 10
Split by name: Al: 2 | Bo: 4, 0, 5 | Ed: 5, 10
Apply (total = sum of n): Al: 2 | Bo: 9 | Ed: 15
Combine (name, total): Al 2; Bo 9; Ed 15
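The split-apply-combine steps on this slide can be sketched in base R. This is only an illustration using the slide's toy data, not how plyr implements the pattern; the names `df`, `pieces`, `totals` and `result` are placeholders chosen for this example.

```r
# Toy data from the slide: one row per (name, n) pair.
df <- data.frame(
  name = c("Al", "Bo", "Bo", "Bo", "Ed", "Ed"),
  n    = c(2, 4, 0, 5, 5, 10),
  stringsAsFactors = FALSE
)

# Split: one data frame per name.
pieces <- split(df, df$name)

# Apply: reduce each piece to a one-row summary.
totals <- lapply(pieces, function(piece) {
  data.frame(name = piece$name[1], total = sum(piece$n),
             stringsAsFactors = FALSE)
})

# Combine: stack the per-group summaries back into one data frame.
result <- do.call(rbind, totals)
rownames(result) <- NULL
print(result)
```

The three steps are kept separate here to mirror the slide; in practice a single call such as `aggregate()` or `ddply()` performs all three.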
15 plyr functions by input type (rows) and output type (columns)
input \ output        array   data frame   list    nothing
array                 aaply   adply        alply   a_ply
data frame            daply   ddply        dlply   d_ply
list                  laply   ldply        llply   l_ply
n replicates          raply   rdply        rlply   r_ply
function arguments    maply   mdply        mlply   m_ply
16 (Same table as slide 15.)
17 [Figure: plyr functions (a_ply, alply, aaply, l_ply, daply, adply, laply, d_ply, llply, dlply, ldply, ddply, count) placed on two axes, "fun" and "use"; the use scale runs Never, Occasionally, Often, All the time]
18 Data analysis verbs
select: subset variables
filter: subset rows
mutate: add new columns
summarise: reduce to a single row
arrange: re-order the rows
19 Data analysis verbs + group by
select: subset variables
filter: subset rows
mutate: add new columns
summarise: reduce to a single row
arrange: re-order the rows
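To make the five verbs and "group by" concrete, here is a rough sketch of what each one does, written with base R only so it runs without extra packages. It is not dplyr's implementation, and the `flights` data frame and its columns are hypothetical values invented for this example.

```r
# Hypothetical flight-like data; column names and values are illustrative only.
flights <- data.frame(
  carrier  = c("AA", "AA", "UA", "UA", "WN"),
  distance = c(200, 1400, 900, 300, 650),  # miles
  time     = c(50, 180, 120, 60, 95)       # minutes in the air
)

# select: subset variables
selected <- flights[, c("carrier", "distance")]

# filter: subset rows
long_haul <- flights[flights$distance > 500, ]

# mutate: add new columns
flights$speed <- flights$distance / flights$time * 60

# summarise: reduce to a single row
overall <- data.frame(n = nrow(flights), mean_dist = mean(flights$distance))

# arrange: re-order the rows
by_distance <- flights[order(flights$distance), ]

# group by + summarise: one row per carrier
per_carrier <- aggregate(distance ~ carrier, data = flights, FUN = mean)
print(per_carrier)
```

Each base R idiom uses a different syntax for naming columns, which is part of the motivation for a small, consistent set of verbs.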
20 h <- readRDS("houston.rdata")
# ~2,100,000 x 6, ~57 meg; not huge, but substantial
library(plyr)
ddply(h, c("Year", "Month", "DayofMonth"), summarise, n = length(Year))
# user system elapsed
#
count(h, c("Year", "Month", "DayofMonth"))
# user system elapsed
#
21 # Often work with the same grouping variables
# multiple times, so define upfront. Also refer
# to variables in the same way
daily_df <- group_by(h, Year, Month, DayofMonth)
# Now summarise knows how to deal with grouped
# data frames
summarise(daily_df, n())
# user system elapsed
#
# 20x faster!
22 library(data.table)
h_dt <- data.table(h)
daily_dt <- group_by(h_dt, Year, Month, DayofMonth)
summarise(daily_dt, n())
# user system elapsed
#
# Exactly the same syntax, but 2.5x faster!
# Don't need to learn the idiosyncrasies of
# data.table; just 2 lines of code
23 # And dplyr also works seamlessly with databases:
ontime <- source_sqlite("flights.sqlite3", "ontime")
h_db <- filter(ontime, Origin == "IAH")
daily_db <- group_by(h_db, Year, Month, DayofMonth)
summarise(daily_db, n())
# user system elapsed
#
# user system elapsed
#
# Much slower, but not restricted to a predefined subset
# Could speed up by carefully crafting indices
24 # Behind the scenes
library(dplyr)
ontime <- source_sqlite("../flights.sqlite3", "ontime")
translate_sql(Year > 2005, ontime)
# <SQL> Year >
translate_sql(Year > 2005L, ontime)
# <SQL> Year > 2005
translate_sql(Origin == "IAD" | Dest == "IAD", ontime)
# <SQL> Origin = 'IAD' OR Dest = 'IAD'
years <- 2000:2005
translate_sql(Year %in% years, ontime)
# <SQL> Year IN (2000, 2001, 2002, 2003, 2004, 2005)
25 Data sources
Data frames (dplyr)
Data tables (dplyr)
SQLite tables (dplyr)
PostgreSQL, MySQL, SQL Server, ... MonetDB (planned)
Google BigQuery (bigrquery)
26 daily_df <- group_by(h, Year, Month, DayofMonth)
summarise(daily_df, n())
daily_dt <- group_by(h_dt, Year, Month, DayofMonth)
summarise(daily_dt, n())
daily_db <- group_by(h_db, Year, Month, DayofMonth)
summarise(daily_db, n())
# It doesn't matter how your data is stored
27 # It might even live on the web
library(bigrquery)
library(dplyr)
h_bq <- source_bigquery(billing_project, "ontime", "houston")
daily_bq <- group_by(h_bq, Year, Month, DayofMonth)
system.time(summarise(daily_bq, n()))
# ~2 seconds
# Storage = $80 / TB / Month
# Query = $35 / TB (100 GB free)
28 dplyr. Currently experimental and incomplete, but it works, and you're welcome to try it out.
library(devtools)
install_github("assertthat")
install_github("dplyr")
install_github("bigrquery")
Needs a development environment (
29 Google for: split apply combine dplyr
30 Visualisation
31 library(ggplot2)
library(bigvis)
# Can't use data frames :(
dist <- readRDS("dist.rds")
delay <- readRDS("delay.rds")
time <- readRDS("time.rds")
speed <- dist / time * 60
# There's always bad data
time[time < 0] <- NA
speed[speed < 0] <- NA
speed[speed > 761.2] <- NA
32 qplot(dist, speed, colour = delay) + scale_colour_gradient2()
33 One hour later... qplot(dist, speed, colour = delay) + scale_colour_gradient2()
34 x <- runif(2e5)
y <- runif(2e5)
system.time(plot(x, y))
35
36 user system elapsed
37 Goals
Support exploratory analysis (e.g. in R)
Fast on commodity hardware
100,000,000 in <5s
10^8 obs = 0.8 GB, ~20 vars in 16 GB
38 Insight
Bottleneck is number of pixels: 1d: 3,000; 2d: 3,000,000
Process: Condense (bin & summarise), Smooth, Visualise
39 Bin: floor((x - origin) / width)
40 Summarise
Count: histogram, KDE
Mean: regression, loess
Std. dev.
Quantiles: boxplots, quantile regression smoothing
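The condense step (bin, then summarise) can be sketched in a few lines of base R. This is a simplified illustration rather than how bigvis implements it: the fixed-width binning rule `floor((x - origin) / width)` is an assumption about what the binning slide shows, and the simulated `x` and `z` vectors stand in for real data.

```r
set.seed(1)
x <- runif(1e5, 0, 100)     # variable to bin (simulated)
z <- 2 * x + rnorm(1e5)     # variable to summarise within each bin (simulated)

origin <- 0
width  <- 10

# Bin: integer index of the fixed-width bin each observation falls into.
bin_id <- floor((x - origin) / width)

# Summarise: count and mean per bin. The result has one row per bin,
# however many observations went in.
condensed <- data.frame(
  x     = origin + sort(unique(bin_id)) * width,  # left edge of each bin
  count = as.vector(table(bin_id)),
  mean  = as.vector(tapply(z, bin_id, mean))
)
print(condensed)
```

Plotting `condensed` instead of the raw vectors means the cost of drawing depends on the number of bins, not the number of observations, which is the point of the "bottleneck is number of pixels" insight.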
41 [Figure: plot of count against dist]
dist_s <- condense(bin(dist, 10))
autoplot(dist_s)
42 [Figure: plot of count against dist, with timing output: user system elapsed]
dist_s <- condense(bin(dist, 10))
autoplot(dist_s)
43 [Figure: plot of count against time; an NA label appears on the plot]
time_s <- condense(bin(time, 1))
autoplot(time_s)
44 [Figure: plot of count against time]
autoplot(time_s, na.rm = TRUE)
45 [Figure: plot of count against time]
autoplot(time_s[time_s < 500, ])
46 [Figure: plot of count against time]
autoplot(time_s %% 60)
47 [Figure: plot of speed against dist, with a count scale]
48 [Figure: plot of speed against dist, with a count scale]
sd1 <- condense(bin(dist, 10), z = speed)
autoplot(sd1) + ylab("speed")
49 [Figure: plot of speed against dist, with a count scale and timing output: user system elapsed]
sd1 <- condense(bin(dist, 10), z = speed)
autoplot(sd1) + ylab("speed")
50 [Figure: plot of speed against dist, with a .count scale]
51 [Figure: plot of speed against dist, with a .count scale]
sd2 <- condense(bin(dist, 20), bin(speed, 20))
autoplot(sd2)
52 [Figure: plot of speed against dist, with a .count scale and timing output: user system elapsed]
sd2 <- condense(bin(dist, 20), bin(speed, 20))
autoplot(sd2)
53 Demo
shiny::runApp("mt/", 8002)
54 Google for: bigvis
55 Conclusions
56 Visualise: bigvis. Tidy. Transform: dplyr. Model.
