Streaming Data, Concurrency And R
|
|
|
- Tiffany Gilbert
- 10 years ago
- Views:
Transcription
1 Streaming Data, Concurrency And R Rory Winston [email protected]
2 About Me Independent Software Consultant M.Sc. Applied Computing, 2000 M.Sc. Finance, 2008 Apache Committer Working in the financial sector for the last 7 years or so Interested in practical applications of functional languages and machine learning Relatively recent convert to R ( 2 years)
3 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
4 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
5 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
6 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
7 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
8 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
9 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
10 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
11 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
12 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
13 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
14 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
15 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
16 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
17 R - Pros and Cons Pro Can be extremely elegant Comprehensive extension library Open-source Huge parallelization effort Fantastic reporting capabilities Con Can be clunky (S4) Bewildering array of overlapping extensions Inherently single-threaded
18 Parallelization vs. Concurrency R interpreter is single threaded Some historical context for this (BLAS implementations) Not necessarily a limitation in the general context Multithreading can be complex and problematic Instead a focus on parallelization: Distributed computation: gridr, nws, snow Multicore/multi-cpu scaling: Rmpi, Romp, pnmath/pnmath0 Interfaces to Pthreads/PBLAS/OpenMP/MPI/Globus/etc. Parallelization suits cpu-bound large data processing applications
19 Other Scalability and Performance Work JIT/bytecode compilation (Ra) Implicit vectorization a la Matlab (code analysis) Large ( RAM) dataset handling (bigmemory,ff) Many incremental performance improvements (e.g. less internal copying) Next: GPU/massive multicore...?
20 What Benefit Concurrency? Real-time (streaming to be more precise) data analysis Growing Interest in using R for streaming data, not just offline analyis GUI toolkit integration Fine-grained control over independent task execution "I believe that explicit concurrency management tools (i.e. a threads toolkit) are what we really need in R at this point." - Luke Tierney, 2001
21 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
22 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
23 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
24 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
25 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
26 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
27 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
28 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
29 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
30 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
31 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
32 Will There Be A Multithreaded R? Short answer is: probably not At least not in its current incarnation Internal workings of the interpreter not particularly amenable to concurrency: Functions can manipulate caller state («- vs. <-) Lazy evaluation machinery (promises) Dynamic State, garbage collection, etc. Scoping: global environments Management of resources: streams, I/O, connections, sinks Implications for current code Possibly in the next language evolution (cf. Ihaka?) Large amount of work (but potentially do-able)
33 Example Application Based on work I did last year and presented at UseR! 2008 Wrote a real-time and historical market data service from Reuters/R The real-time interface used the Reuters C++ API R extension in C++ that spawned listening thread and handled updates
34 Simplified Architecture R extension (C++) realtime bus
35 Example Usage rsub <- function(duration, items, callback) The call rsub will subscribe to the specified rate(s) for the duration of time specified by duration (ms). When a tick arrives, the callback function callback is invoked, with a data frame containing the fields specified in items. Multiple market data items may be subscribed to, and any combination of fields may be be specified. Uses the underlying RFA API, which provides a C++ interface to real-time market updates.
36 Real-Time Example # Specify field names to retrieve fields <- c("bid","ask","timcor") # Subscribe to EUR/USD and GBP/USD ticks items <- list() items[[1]] <- c("idn_selectfeed", "EUR=", fields) items[[2]] <- c("idn_selectfeed", "GBP=", fields) # Simple Callback Function callback <- function(df) { print(paste("received",df)) } # Subscribe for 1 hour ONE_HOUR <- 1000*(60)^2 rsub(one_hour, items, callback)
37 Issues With This Approach As R interpreter is single threaded, cannot spawn thread for callbacks Thus, interpreter thread is locked for the duration of subscription Not a great user experience Need to find alternative mechanism
38 Alternative Approach If we cannot run subscriber threads in-process, need to decouple Standard approach: add an extra layer and use some form of IPC For instance, we could: Subscribe in a dedicated R process (A) Push incoming data onto a socket R process (B) reads from a listening socket Sockets could also be another IPC primitive, e.g. pipes Also note that R supports asynchronous I/O (?isincomplete) Look at the ibrokers package for examples of this
39 The bigmemory package From the description: "Use C++ to create, store, access, and manipulate massive matrices" Allows creation of large matrices These matrices can be mapped to files/shared memory It is the shared memory functionality that we will use The next version (3.0) will be unveiled at UseR! 2009 big.matrix(nrow, ncol, type = "integer",...) shared.big.matrix(nrow, ncol, type = "integer",...) filebacked.big.matrix(nrow, ncol, type = "integer",...)
40 Sample Usage > library(bigmemory) # Note: I'm using pre-release > X <- shared.big.matrix(type="double", ncol=1000, nrow=1000) > X An object of class big.matrix Slot "address": <pointer: 0x7378a0>
41 $rownames NULL Create Shared Memory Descriptor > desc <- describe(x) > desc $sharedtype [1] "SharedMemory" $sharedname [1] "53f14925-dca1-42a8-a547-e1bccae999ce" $nrow [1] 1000 $ncol [1] 1000
42 Export the Descriptor In R session 1: > dput(desc, file="~/matrix.desc") In R session 2: > library(bigmemory) > desc <- dget("~/matrix.desc") > X <- attach.big.matrix(desc) Now R sessions A and B share the same big.matrix instance
43 Share Data Between Sessions R session 1: > X[1,1] < R session 2: > X[1,1] [1] Thus, streaming data can be continuously fed into session A And concurrently processed in session B
44 Summary Lack of threads not a barrier to concurrent analysis Packages like bigmemory, nws, etc. facilitate decoupling via IPC nws goes a step further, with a distributed workspace Many applications for streaming data: Data collection/monitoring Development of pricing/risk algorithms Low-frequency execution (??)...
45 References luke/r/thrgui/
Big Data and Parallel Work with R
Big Data and Parallel Work with R What We'll Cover Data Limits in R Optional Data packages Optional Function packages Going parallel Deciding what to do Data Limits in R Big Data? What is big data? More
Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture
Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts
Presto/Blockus: Towards Scalable R Data Analysis
/Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIA-UIUC-AL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew
Your Best Next Business Solution Big Data In R 24/3/2010
Your Best Next Business Solution Big Data In R 24/3/2010 Big Data In R R Works on RAM Causing Scalability issues Maximum length of an object is 2^31-1 Some packages developed to help overcome this problem
Why Threads Are A Bad Idea (for most purposes)
Why Threads Are A Bad Idea (for most purposes) John Ousterhout Sun Microsystems Laboratories [email protected] http://www.sunlabs.com/~ouster Introduction Threads: Grew up in OS world (processes).
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
General Introduction
Managed Runtime Technology: General Introduction Xiao-Feng Li ([email protected]) 2012-10-10 Agenda Virtual machines Managed runtime systems EE and MM (JIT and GC) Summary 10/10/2012 Managed Runtime
Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.
Parallel Computing: Strategies and Implications Dori Exterman CTO IncrediBuild. In this session we will discuss Multi-threaded vs. Multi-Process Choosing between Multi-Core or Multi- Threaded development
The Not Quite R (NQR) Project: Explorations Using the Parrot Virtual Machine
The Not Quite R (NQR) Project: Explorations Using the Parrot Virtual Machine Michael J. Kane 1 and John W. Emerson 2 1 Yale Center for Analytical Sciences, Yale University 2 Department of Statistics, Yale
Programming in C# with Microsoft Visual Studio 2010
Introducción a la Programación Web con C# en Visual Studio 2010 Curso: Introduction to Web development Programming in C# with Microsoft Visual Studio 2010 Introduction to Web Development with Microsoft
USE OF PYTHON AS A SATELLITE OPERATIONS AND TESTING AUTOMATION LANGUAGE
USE OF PYTHON AS A SATELLITE OPERATIONS AND TESTING AUTOMATION LANGUAGE Gonzalo Garcia VP of Operations, USA Property of GMV All rights reserved INTRODUCTION Property of GMV All rights reserved INTRODUCTION
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
Real Time Market Data and Trade Execution with R
Real Time Market Data and Trade Execution with R Jeffrey A. Ryan April 5, 2009 Contents 1 Introduction to Real Time Processing in R 2 1.1 Socket Connections in R....................... 2 1.1.1 Overview...........................
CSC 2405: Computer Systems II
CSC 2405: Computer Systems II Spring 2013 (TR 8:30-9:45 in G86) Mirela Damian http://www.csc.villanova.edu/~mdamian/csc2405/ Introductions Mirela Damian Room 167A in the Mendel Science Building [email protected]
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Open Data Partners and AdReady April 2012 1 Executive Summary AdReady is working to develop and deploy sophisticated
Enterprise Applications
Enterprise Applications Chi Ho Yue Sorav Bansal Shivnath Babu Amin Firoozshahian EE392C Emerging Applications Study Spring 2003 Functionality Online Transaction Processing (OLTP) Users/apps interacting
Syllabus for CS 134 Java Programming
- Java Programming Syllabus Page 1 Syllabus for CS 134 Java Programming Computer Science Course Catalog 2000-2001: This course is an introduction to objectoriented programming using the Java language.
Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment. Sanjay Kulhari, Jian Wen UC Riverside
Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment Sanjay Kulhari, Jian Wen UC Riverside Team Sanjay Kulhari M.S. student, CS U C Riverside Jian Wen Ph.D. student, CS U
EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN ACCELERATORS AND TECHNOLOGY SECTOR A REMOTE TRACING FACILITY FOR DISTRIBUTED SYSTEMS
EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN ACCELERATORS AND TECHNOLOGY SECTOR CERN-ATS-2011-200 A REMOTE TRACING FACILITY FOR DISTRIBUTED SYSTEMS F. Ehm, A. Dworak, CERN, Geneva, Switzerland Abstract
Enterprise Data Integration for Microsoft Dynamics CRM
Enterprise Data Integration for Microsoft Dynamics CRM Daniel Cai http://danielcai.blogspot.com About me Daniel Cai Developer @KingswaySoft a software company offering integration software and solutions
MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu
1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,
SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture
Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts
Chapter 10 Case Study 1: LINUX
MODERN OPERATING SYSTEMS Third Edition ANDREW S. TANENBAUM Chapter 10 Case Study 1: LINUX History of UNIX and Linux UNICS PDP-11 UNIX Portable UNIX Berkeley UNIX Standard UNIX MINIX Linux UNIX/Linux Goals
A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin
A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
Filtering Mail with Milter. David F. Skoll Roaring Penguin Software Inc.
Filtering Mail with Milter David F. Skoll Roaring Penguin Software Inc. Why filter mail? Overview Different filtering approaches Delivery agent (e.g. Procmail) Central filtering (Milter) Milter Architecture
Effective Java Programming. efficient software development
Effective Java Programming efficient software development Structure efficient software development what is efficiency? development process profiling during development what determines the performance of
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult
Design Document. Offline Charging Server (Offline CS ) Version 1.0. - i -
Design Document Offline Charging Server (Offline CS ) Version 1.0 - i - Document Scope Objective The information provided in this document specifies the design details of Operations of Offline Charging
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
Chapter 6 Concurrent Programming
Chapter 6 Concurrent Programming Outline 6.1 Introduction 6.2 Monitors 6.2.1 Condition Variables 6.2.2 Simple Resource Allocation with Monitors 6.2.3 Monitor Example: Circular Buffer 6.2.4 Monitor Example:
Eloquence Training What s new in Eloquence B.08.00
Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium
Engr. M. Fahad Khan Lecturer Software Engineering Department University Of Engineering & Technology Taxila
Engr. M. Fahad Khan Lecturer Software Engineering Department University Of Engineering & Technology Taxila Application Architectures Ref: Chapter 13 Software Engineering By Ian Sommerville, 7th Edition
EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications
ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani
Stream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
CRM Magic with Data Migration & Integration
CRM Magic with Data Migration & Integration Daniel Cai http://www.kingswaysoft.com http://danielcai.blogspot.com About me Daniel Cai Principal Developer @KingswaySoft An independent software company offering
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus A simple C/C++ language extension construct for data parallel operations Robert Geva [email protected] Introduction Intel
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
Event-based middleware services
3 Event-based middleware services The term event service has different definitions. In general, an event service connects producers of information and interested consumers. The service acquires events
Project Convergence: Integrating Data Grids and Compute Grids. Eugene Steinberg, CTO Grid Dynamics May, 2008
Project Convergence: Integrating Data Grids and Compute Grids Eugene Steinberg, CTO May, 2008 Data-Driven Scalability Challenges in HPC Data is far away Latency of remote connection Latency of data movement
Understanding the Value of In-Memory in the IT Landscape
February 2012 Understing the Value of In-Memory in Sponsored by QlikView Contents The Many Faces of In-Memory 1 The Meaning of In-Memory 2 The Data Analysis Value Chain Your Goals 3 Mapping Vendors to
BSC vision on Big Data and extreme scale computing
BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,
Deal with big data in R using bigmemory package
Deal with big data in R using bigmemory package Xiaojuan Hao Department of Statistics University of Nebraska-Lincoln April 28, 2015 Background What Is Big Data Size (We focus on) Complexity Rate of growth
Titolo del paragrafo. Titolo del documento - Sottotitolo documento The Benefits of Pushing Real-Time Market Data via a Web Infrastructure
1 Alessandro Alinone Agenda Introduction Push Technology: definition, typology, history, early failures Lightstreamer: 3rd Generation architecture, true-push Client-side push technology (Browser client,
rm(list=ls()) library(sqldf) system.time({large = read.csv.sql("large.csv")}) #172.97 seconds, 4.23GB of memory used by R
Big Data in R Importing data into R: 1.75GB file Table 1: Comparison of importing data into R Time Taken Packages Functions (second) Remark/Note base read.csv > 2,394 My machine (8GB of memory) ran out
System Software and TinyAUTOSAR
System Software and TinyAUTOSAR Florian Kluge University of Augsburg, Germany parmerasa Dissemination Event, Barcelona, 2014-09-23 Overview parmerasa System Architecture Library RTE Implementations TinyIMA
CSCI E 98: Managed Environments for the Execution of Programs
CSCI E 98: Managed Environments for the Execution of Programs Draft Syllabus Instructor Phil McGachey, PhD Class Time: Mondays beginning Sept. 8, 5:30-7:30 pm Location: 1 Story Street, Room 304. Office
A distributed system is defined as
A distributed system is defined as A collection of independent computers that appears to its users as a single coherent system CS550: Advanced Operating Systems 2 Resource sharing Openness Concurrency
Real Time Programming: Concepts
Real Time Programming: Concepts Radek Pelánek Plan at first we will study basic concepts related to real time programming then we will have a look at specific programming languages and study how they realize
New Design and Layout Tips For Processing Multiple Tasks
Novel, Highly-Parallel Software for the Online Storage System of the ATLAS Experiment at CERN: Design and Performances Tommaso Colombo a,b Wainer Vandelli b a Università degli Studi di Pavia b CERN IEEE
Harmonizing policy management with Murphy in GENIVI, AGL and TIZEN IVI
Harmonizing policy management with Murphy in GENIVI, AGL and TIZEN IVI 1 Long term TIZEN Objectives for harmonization Support in TIZEN for coexistence of GENIVI applications Allow portable business rules
Cloud Computing. Up until now
Cloud Computing Lecture 11 Virtualization 2011-2012 Up until now Introduction. Definition of Cloud Computing Grid Computing Content Distribution Networks Map Reduce Cycle-Sharing 1 Process Virtual Machines
Technical Properties. Mobile Operating Systems. Overview Concepts of Mobile. Functions Processes. Lecture 11. Memory Management.
Overview Concepts of Mobile Operating Systems Lecture 11 Concepts of Mobile Operating Systems Mobile Business I (WS 2007/08) Prof Dr Kai Rannenberg Chair of Mobile Business and Multilateral Security Johann
MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp
MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source
Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications
Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications Jun Wang, Xi Xiong, Peng Liu Penn State Cyber Security Lab 1 An inherent security limitation
Parallel Computing 37 (2011) 26 41. Contents lists available at ScienceDirect. Parallel Computing. journal homepage: www.elsevier.
Parallel Computing 37 (2011) 26 41 Contents lists available at ScienceDirect Parallel Computing journal homepage: www.elsevier.com/locate/parco Architectural support for thread communications in multi-core
Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert
Ubiquitous Computing Ubiquitous Computing The Sensor Network System Sun SPOT: The Sun Small Programmable Object Technology Technology-Based Wireless Sensor Networks a Java Platform for Developing Applications
IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus
Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Java the UML Way: Integrating Object-Oriented Design and Programming
Java the UML Way: Integrating Object-Oriented Design and Programming by Else Lervik and Vegard B. Havdal ISBN 0-470-84386-1 John Wiley & Sons, Ltd. Table of Contents Preface xi 1 Introduction 1 1.1 Preliminaries
JMulTi/JStatCom - A Data Analysis Toolkit for End-users and Developers
JMulTi/JStatCom - A Data Analysis Toolkit for End-users and Developers Technology White Paper JStatCom Engineering, www.jstatcom.com by Markus Krätzig, June 4, 2007 Abstract JStatCom is a software framework
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
Linux Process Scheduling Policy
Lecture Overview Introduction to Linux process scheduling Policy versus algorithm Linux overall process scheduling objectives Timesharing Dynamic priority Favor I/O-bound process Linux scheduling algorithm
Chapter 1: Operating System Models 1 2 Operating System Models 2.1 Introduction Over the past several years, a number of trends affecting operating system design are witnessed and foremost among them is
FIVE SIGNS YOU NEED HTML5 WEBSOCKETS
FIVE SIGNS YOU NEED HTML5 WEBSOCKETS A KAAZING WHITEPAPER Copyright 2011 Kaazing Corporation. All rights reserved. FIVE SIGNS YOU NEED HTML5 WEBSOCKETS A KAAZING WHITEPAPER HTML5 Web Sockets is an important
Characteristics of Java (Optional) Y. Daniel Liang Supplement for Introduction to Java Programming
Characteristics of Java (Optional) Y. Daniel Liang Supplement for Introduction to Java Programming Java has become enormously popular. Java s rapid rise and wide acceptance can be traced to its design
White Paper. Optimizing the Performance Of MySQL Cluster
White Paper Optimizing the Performance Of MySQL Cluster Table of Contents Introduction and Background Information... 2 Optimal Applications for MySQL Cluster... 3 Identifying the Performance Issues.....
Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework
Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework Vidya Dhondiba Jadhav, Harshada Jayant Nazirkar, Sneha Manik Idekar Dept. of Information Technology, JSPM s BSIOTR (W),
Distributed R for Big Data
Distributed R for Big Data Indrajit Roy HP Vertica Development Team Abstract Distributed R simplifies large-scale analysis. It extends R. R is a single-threaded environment which limits its utility for
IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez
IT of SPIM Data Storage and Compression EMBO Course - August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data
System Structures. Services Interface Structure
System Structures Services Interface Structure Operating system services (1) Operating system services (2) Functions that are helpful to the user User interface Command line interpreter Batch interface
Adobe LiveCycle Data Services 3 Performance Brief
Adobe LiveCycle ES2 Technical Guide Adobe LiveCycle Data Services 3 Performance Brief LiveCycle Data Services 3 is a scalable, high performance, J2EE based server designed to help Java enterprise developers
A Look through the Android Stack
A Look through the Android Stack A Look through the Android Stack Free Electrons Maxime Ripard Free Electrons Embedded Linux Developers c Copyright 2004-2012, Free Electrons. Creative Commons BY-SA 3.0
Japan Communication India Skill Development Center
Japan Communication India Skill Development Center Java Application System Developer Course Detail Track 2a Java Application Software Developer: Phase1 SQL Overview 70 Introduction Database, DB Server
CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson
CS 3530 Operating Systems L02 OS Intro Part 1 Dr. Ken Hoganson Chapter 1 Basic Concepts of Operating Systems Computer Systems A computer system consists of two basic types of components: Hardware components,
Parallel Computing for Data Science
Parallel Computing for Data Science With Examples in R, C++ and CUDA Norman Matloff University of California, Davis USA (g) CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint
PERFORMANCE COMPARISON OF COMMON OBJECT REQUEST BROKER ARCHITECTURE(CORBA) VS JAVA MESSAGING SERVICE(JMS) BY TEAM SCALABLE
PERFORMANCE COMPARISON OF COMMON OBJECT REQUEST BROKER ARCHITECTURE(CORBA) VS JAVA MESSAGING SERVICE(JMS) BY TEAM SCALABLE TIGRAN HAKOBYAN SUJAL PATEL VANDANA MURALI INTRODUCTION Common Object Request
Big-data Analytics: Challenges and Opportunities
Big-data Analytics: Challenges and Opportunities Chih-Jen Lin Department of Computer Science National Taiwan University Talk at 台 灣 資 料 科 學 愛 好 者 年 會, August 30, 2014 Chih-Jen Lin (National Taiwan Univ.)
ProTrack: A Simple Provenance-tracking Filesystem
ProTrack: A Simple Provenance-tracking Filesystem Somak Das Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology [email protected] Abstract Provenance describes a file
Real-time Java Processor for Monitoring and Test
Real-time Java Processor for Monitoring and Test Martin Zabel, Thomas B. Preußer, Rainer G. Spallek Technische Universität Dresden {zabel,preusser,rgs}@ite.inf.tu-dresden.de Abstract This paper introduces
Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago
Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago Outline Introduction Features Motivation Architecture Globus XIO Experimental Results 3 August 2005 The Ohio State University
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
OPTIMIZING YOUR PI SDK APPLICATIONS BUILDERS' CAFÉ WEBINAR SERIES
OPTIMIZING YOUR PI SDK APPLICATIONS BUILDERS' CAFÉ WEBINAR SERIES PRESENTERS Jay Lakumb Product Manager Charlie Henze Software Development Lead Cristóbal Escamilla vcampus Team Member 2 VCAMPUS-EXCLUSIVE
Java Environment for Parallel Realtime Development Platform Independent Software Development for Multicore Systems
Java Environment for Parallel Realtime Development Platform Independent Software Development for Multicore Systems Ingo Prötel, aicas GmbH Computing Frontiers 6 th of May 2008, Ischia, Italy Jeopard-Project:
SOFT 437. Software Performance Analysis. Ch 5:Web Applications and Other Distributed Systems
SOFT 437 Software Performance Analysis Ch 5:Web Applications and Other Distributed Systems Outline Overview of Web applications, distributed object technologies, and the important considerations for SPE
Operating Systems 4 th Class
Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science
Cloud Based Application Architectures using Smart Computing
Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products
Review from last time. CS 537 Lecture 3 OS Structure. OS structure. What you should learn from this lecture
Review from last time CS 537 Lecture 3 OS Structure What HW structures are used by the OS? What is a system call? Michael Swift Remzi Arpaci-Dussea, Michael Swift 1 Remzi Arpaci-Dussea, Michael Swift 2
Kafka & Redis for Big Data Solutions
Kafka & Redis for Big Data Solutions Christopher Curtin Head of Technical Research @ChrisCurtin About Me 25+ years in technology Head of Technical Research at Silverpop, an IBM Company (14 + years at Silverpop)
