Bigger data analysis. Hadley, Chief Scientist, RStudio. Thursday, July 18, 2013
Transcription
1 Bigger data analysis Hadley Chief Scientist, RStudio July 2013
2 1. What is data analysis? 2. Transforming data 3. Visualising data
3 What is data analysis?
4 Data analysis is the process by which data becomes understanding, knowledge and insight
5 Data analysis is the process by which data becomes understanding, knowledge and insight
6 Visualise Tidy Transform Model
7 Frequent data analysis learn to program
8 Cognition time Computation time
9 Visualise: ggplot2. Tidy: reshape2, stringr, lubridate. Transform: plyr. Model.
10 Computation time Cognition time
11 Visualise: bigvis. Tidy. Transform: dplyr. Model.
12 Data. Every commercial US flight: ~76 million flights. Total database: ~11 Gb. >100 variables, but I'll focus on a handful: airline, delay, distance, flight time and speed.
13 Transformation
14 Split, Apply, Combine
    Input (name, n):          Al 2; Bo 4; Bo 0; Bo 5; Ed 5; Ed 10
    Split by name:            Al (2) | Bo (4, 0, 5) | Ed (5, 10)
    Apply (total per group):  Al 2 | Bo 9 | Ed 15
    Combine (name, total):    Al 2; Bo 9; Ed 15
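The three steps on this slide can be sketched in base R. This is only an illustration using the toy data from the slide, not how plyr or dplyr implement it; the variable names (pieces, totals, result) are made up for the example.

```r
# Toy data from the slide: one row per observation.
df <- data.frame(
  name = c("Al", "Bo", "Bo", "Bo", "Ed", "Ed"),
  n    = c(2, 4, 0, 5, 5, 10)
)

# Split: one vector of n per name.
pieces <- split(df$n, df$name)

# Apply: total each piece independently of the others.
totals <- vapply(pieces, sum, numeric(1))

# Combine: reassemble the per-group results into one data frame.
result <- data.frame(name = names(totals), total = unname(totals))
print(result)
```

Because each piece is processed on its own, the apply step is the part that can be swapped for a faster backend without changing the overall shape of the computation.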
15 plyr functions by input (rows) and output (columns)
    input \ output       array   data frame   list    nothing
    array                aaply   adply        alply   a_ply
    data frame           daply   ddply        dlply   d_ply
    list                 laply   ldply        llply   l_ply
    n replicates         raply   rdply        rlply   r_ply
    function arguments   maply   mdply        mlply   m_ply
16 (Repeats the table from slide 15.)
17 [Figure: how often each function is used (Never, Occasionally, Often, All the time). Labels shown: a_ply, alply, aaply, l_ply, daply, adply, laply, d_ply, llply, dlply, ldply, ddply, count.]
18 Data analysis verbs select: subset variables filter: subset rows mutate: add new columns summarise: reduce to a single row arrange: re-order the rows
19 Data analysis verbs + group by select: subset variables filter: subset rows mutate: add new columns summarise: reduce to a single row arrange: re-order the rows
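A short sketch of the five verbs plus group_by, written against the released dplyr package rather than the experimental version shown in the talk. The flights data frame and its column names are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical toy data; the column names are illustrative only.
flights <- data.frame(
  carrier  = c("AA", "AA", "UA", "UA", "UA"),
  distance = c(200, 1000, 500, 1500, 800),  # miles
  air_time = c(40, 130, 75, 190, 110),      # minutes
  delay    = c(5, -3, 20, 0, 12)            # minutes
)

x <- select(flights, carrier, distance, air_time)   # select: subset variables
x <- filter(x, distance >= 500)                     # filter: subset rows
x <- mutate(x, speed = distance / air_time * 60)    # mutate: add a new column (mph)
x <- group_by(x, carrier)                           # group by: one group per carrier
by_carrier <- summarise(x, n = n(), mean_speed = mean(speed))  # summarise: one row per group
by_carrier <- arrange(by_carrier, desc(mean_speed)) # arrange: re-order the rows
print(by_carrier)
```

Each verb takes a data frame as its first argument and returns a new one, which is why the steps chain without any special glue.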
20
    h <- readRDS("houston.rdata")
    # ~2,100,000 x 6, ~57 meg; not huge, but substantial
    library(plyr)
    ddply(h, c("Year", "Month", "DayofMonth"), summarise, n = length(Year))
    #  user system elapsed
    #
    count(h, c("Year", "Month", "DayofMonth"))
    #  user system elapsed
    #
21
    # Often work with the same grouping variables
    # multiple times, so define upfront. Also refer
    # to variables in the same way
    daily_df <- group_by(h, Year, Month, DayofMonth)
    # Now summarise knows how to deal with grouped
    # data frames
    summarise(daily_df, n())
    #  user system elapsed
    #
    # 20x faster!
22
    library(data.table)
    h_dt <- data.table(h)
    daily_dt <- group_by(h_dt, Year, Month, DayofMonth)
    summarise(daily_dt, n())
    #  user system elapsed
    #
    # Exactly the same syntax, but 2.5x faster!
    # Don't need to learn the idiosyncrasies of
    # data.table; just 2 lines of code
23
    # And dplyr also works seamlessly with databases:
    ontime <- source_sqlite("flights.sqlite3", "ontime")
    h_db <- filter(ontime, Origin == "IAH")
    daily_db <- group_by(h_db, Year, Month, DayofMonth)
    summarise(daily_db, n())
    #  user system elapsed
    #
    #  user system elapsed
    #
    # Much slower, but not restricted to a predefined subset
    # Could speed up by carefully crafting indices
24
    # Behind the scenes
    library(dplyr)
    ontime <- source_sqlite("../flights.sqlite3", "ontime")
    translate_sql(year > 2005, ontime)
    # <SQL> Year >
    translate_sql(year > 2005L, ontime)
    # <SQL> Year > 2005
    translate_sql(origin == "IAD" | Dest == "IAD", ontime)
    # <SQL> Origin = 'IAD' OR Dest = 'IAD'
    years <- 2000:2005
    translate_sql(year %in% years, ontime)
    # <SQL> Year IN (2000, 2001, 2002, 2003, 2004, 2005)
25 Data sources: data frames (dplyr); data tables (dplyr); SQLite tables (dplyr); PostgreSQL, MySQL, SQL Server, ... MonetDB (planned); Google BigQuery (bigrquery)
26
    daily_df <- group_by(h, Year, Month, DayofMonth)
    summarise(daily_df, n())
    daily_dt <- group_by(h_dt, Year, Month, DayofMonth)
    summarise(daily_dt, n())
    daily_db <- group_by(h_db, Year, Month, DayofMonth)
    summarise(daily_db, n())
    # It doesn't matter how your data is stored
27
    # It might even live on the web
    library(dplyr)
    library(bigrquery)
    h_bq <- source_bigquery(billing_project, "ontime", "houston")
    daily_bq <- group_by(h_bq, Year, Month, DayofMonth)
    system.time(summarise(daily_bq, n()))
    # ~2 seconds
    # Storage = $80 / TB / Month
    # Query = $35 / TB (100 GB free)
28 dplyr
Currently experimental and incomplete, but it works, and you're welcome to try it out.
    library(devtools)
    install_github("assertthat")
    install_github("dplyr")
    install_github("bigrquery")
Needs a development environment (
29 Google for: split apply combine dplyr
30 Visualisation
31
    library(ggplot2)
    library(bigvis)
    # Can't use data frames :(
    dist <- readRDS("dist.rds")
    delay <- readRDS("delay.rds")
    time <- readRDS("time.rds")
    speed <- dist / time * 60
    # There's always bad data
    time[time < 0] <- NA
    speed[speed < 0] <- NA
    speed[speed > 761.2] <- NA
32 qplot(dist, speed, colour = delay) + scale_colour_gradient2()
33 One hour later... qplot(dist, speed, colour = delay) + scale_colour_gradient2()
34 x <- runif(2e5) y <- runif(2e5) system.time(plot(x, y))
35
36 user system elapsed
37 Goals. Support exploratory analysis (e.g. in R). Fast on commodity hardware: 100,000,000 in <5s. 10^8 obs = 0.8 Gb, ~20 vars in 16 Gb.
38 Insight. Bottleneck is number of pixels: 1d: 3,000; 2d: 3,000,000. Process: Condense (bin & summarise), Smooth, Visualise.
39 Bin: $\lfloor (x - \mathrm{origin}) / \mathrm{width} \rfloor$
40 Summarise. Count: histogram, KDE. Mean: regression, loess. Std. dev. Quantiles: boxplots, quantile regression smoothing.
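The condense step (bin, then summarise) can be illustrated in base R. This is a rough sketch of the idea only, not bigvis's implementation: bin_index and condense_1d are hypothetical helpers written for this example, and the bin index is assumed to be floor((x - origin) / width).

```r
# Hypothetical helper: fixed-width bin index for each value of x.
bin_index <- function(x, origin, width) {
  floor((x - origin) / width)
}

# Hypothetical helper: condense x to one row per occupied bin,
# with a count and, if z is supplied, the mean of z in each bin.
condense_1d <- function(x, width, origin = 0, z = NULL) {
  keep <- !is.na(x)
  if (!is.null(z)) keep <- keep & !is.na(z)
  idx <- bin_index(x[keep], origin, width)
  out <- data.frame(
    bin_start = origin + sort(unique(idx)) * width,
    count     = as.vector(table(idx))
  )
  if (!is.null(z)) {
    out$mean <- as.vector(tapply(z[keep], idx, mean))
  }
  out
}

# Six observations collapse to three rows; missing values are dropped here.
x <- c(0, 5, 9.9, 10, 25, NA)
z <- c(1, 2, 3, 4, 5, 6)
out <- condense_1d(x, width = 10, z = z)
print(out)
```

The condensed table has at most (range / width) rows however many observations go in, which is why plotting the summary stays cheap once the number of bins is tied to the number of pixels.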
41
    dist_s <- condense(bin(dist, 10))
    autoplot(dist_s)
    [Plot: count against dist]
42
    dist_s <- condense(bin(dist, 10))
    autoplot(dist_s)
    #  user system elapsed
    #
    [Plot: count against dist]
43
    time_s <- condense(bin(time, 1))
    autoplot(time_s)
    [Plot: count against time, including an NA bin]
44
    autoplot(time_s, na.rm = TRUE)
    [Plot: count against time]
45
    autoplot(time_s[time_s < 500, ])
    [Plot: count against time]
46
    autoplot(time_s %% 60)
    [Plot: count against time]
47 [Plot: speed against dist, with a count legend]
48
    sd1 <- condense(bin(dist, 10), z = speed)
    autoplot(sd1) + ylab("speed")
    [Plot: speed against dist, with a count legend]
49
    sd1 <- condense(bin(dist, 10), z = speed)
    autoplot(sd1) + ylab("speed")
    #  user system elapsed
    #
    [Plot: speed against dist, with a count legend]
50 [Plot: speed against dist, with a count legend]
51
    sd2 <- condense(bin(dist, 20), bin(speed, 20))
    autoplot(sd2)
    [Plot: speed against dist, with a count legend]
52
    sd2 <- condense(bin(dist, 20), bin(speed, 20))
    autoplot(sd2)
    #  user system elapsed
    #
    [Plot: speed against dist, with a count legend]
53 Demo
    shiny::runApp("mt/", 8002)
54 Google for: bigvis
55 Conclusions
56 Visualise: bigvis. Tidy. Transform: dplyr. Model.