R, paralelización, datos masivos y aplicaciones web: ejemplos del uso de R en bioinformática


1 R, paralelización, datos masivos y aplicaciones web: ejemplos del uso de R en bioinformática (R, parallelization, massive data and web applications: examples of the use of R in bioinformatics). Ramón Díaz-Uriarte, Dept. Bioquímica, Universidad Autónoma de Madrid, Madrid, Spain. rdiaz02@gmail.com. III Jornadas de Usuarios de R, 17 November 2011. (1 : 58)

2 License and copyright. This work is Copyright (c) 2011, Ramón Díaz-Uriarte, and is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit by-nc-sa/3.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA. Please respect the copyright. This material is provided freely; if you use it, I only ask that you use it according to the (very permissive) terms of the license: attribution, non-commercial use, and a share-alike license. If you have any doubts, ask me.

3 Outline

4 Biological context. Computational context.

5 Chromosomes. (From the Wikipedia; original source Glossary/Illustration/karyotype.shtml)

6 DNA protein. (From O. Rueda's PhD thesis)

7 Data from a microarray experiment. (Slide from Gema Moreno Bueno, Department of Biochemistry, UAM)

8 More microarray data. (Modified from students/peter_cock/r/heatmap/scaled_color_key.png)

9 DNA protein. (From O. Rueda's PhD thesis)

10 aCGH. [Figure: Log2(Ratio) along the chromosomes.] Arrays: a dot is a DNA fragment. Each array is a sample. Each array covers all chromosomes. (For analysis, location in the chromosome matters.) Calling gains and losses: hypothesis testing. Inferring the number of copy gains/losses: estimation. Citations on the slide: Barrett et al., 2004; Hupe & Barillot, 2005; Olshen, 2005.

11 Data, data, data (in Gigabytes). Expression arrays (mRNA): > 40,000 probes. Copy number with aCGH: > 400,000 common; some > 4 x

12 Multicores and computing clusters. Increases in CPU speed slowed down (< 20% per year since 2002). Increase in the number of cores: 2, 4, 8. Next 10 years? Inexpensive computing clusters with off-the-shelf components. We must design our programs from the start for parallel programming.

14 Standalone. Statistical Computing in Bioinformatics. [Diagram] Develop statistical methods. Implement existing approaches. Implement for statisticians and bioinformaticians: parallel computing, fault tolerance. Implement for wet lab users: web apps (user friendly, no installation). Increased speed (40x - 60x). Statistical rigour. Best practices.

15 R. Code available for many procedures (but a few years ago none parallelized!). Many computations are embarrassingly parallelizable: bootstrapping; cross-validation; arrays (or samples); arrays by chromosomes; parallel chains in MCMC. Figure production can be parallelized.
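The "embarrassingly parallelizable" cases listed above share one pattern: independent tasks mapped over a pool of workers. A minimal sketch of that pattern for bootstrapping follows. It is written in Python rather than R, and every name in it (`one_replicate`, `bootstrap_means`, the per-task seeding scheme) is a hypothetical choice for illustration, not code from the talk or from any of the packages it mentions. Threads are used only to keep the sketch self-contained; for CPU-bound work one would pass `ProcessPoolExecutor` as `executor_cls` when running the file as a script.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def one_replicate(task):
    """Resample `data` with replacement and return the mean of the resample."""
    data, seed = task
    rng = random.Random(seed)  # one seed per task: result does not depend on scheduling
    resample = [rng.choice(data) for _ in data]
    return statistics.fmean(resample)


def bootstrap_means(data, n_boot, workers=4, executor_cls=ThreadPoolExecutor):
    """Run n_boot independent bootstrap replicates on a pool of workers."""
    tasks = [(data, seed) for seed in range(n_boot)]
    with executor_cls(max_workers=workers) as pool:
        # Executor.map returns results in task order, whatever the completion order.
        return list(pool.map(one_replicate, tasks))


if __name__ == "__main__":
    values = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
    means = bootstrap_means(values, n_boot=200)
    print(len(means), min(means), max(means))
```

Because each task carries its own seed, the output is the same for any number of workers, which makes the parallel version easy to check against a sequential run.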

16 R. (Implement missing functionality: R/C.) MPI: R packages Rmpi, papply, snow, snowfall. Load balanced. Wrappers over mid-level functions in the package: ease updating. Parallelize: bootstrap samples/cross-validation runs; arrays; arrays by chromosomes (or a combination of both).

17 Is it worth it? Are speed improvements really worth the effort? Over what range of problems do we see improvements? With what hardware can we see improvements?

18 What do we gain? [Figure: user wall time (seconds) vs. number of arrays (samples), sequential vs. parallelized, for HMM, GLAD, CBS and BioHMM; 20,000 genes.]

19 What do we gain? Are speed improvements really worth the effort? Your effort: R CMD INSTALL ADaCGH2. Over what range of problems do we see improvements? 10 to 10^3 arrays/samples; 10^4 to 10^6 spots/genes. With what hardware can we see improvements? 2 cores to 120 cores. Smaller clusters: more cost effective. Single node/multi-core: less communication overhead.

20 Where is this running? varSelRF (CRAN). ADaCGH2 (BioConductor). SignS (launchpad).

21 Web apps: how

22 Applications for wet lab researchers. Analyze data in a reasonably short time. User-friendly access to methods that are statistically rigorous.

23 Web-based applications. User-friendly interface. No hardware/software hassles for end users. Parallelization is transparent. Method selection can be partially transferred (to us). Short user wall time: use (hardware/software) resources rarely available to individual biomedical researchers. Just type in a URL.

24 Sometimes collaborations feel like...

25 Parallelization in web applications. Statistical Computing in Bioinformatics. [Diagram] Develop statistical methods. Implement existing approaches. Implement for statisticians and bioinformaticians: parallel computing, fault tolerance. Implement for wet lab users: web apps (user friendly, no installation). Increased speed (40x - 60x). Statistical rigour. Best practices.

26 Main web-based applications. Dealing with raw data: remove artifacts from microarrays, missing data, replicate spots (DNMAD, prep). Statistical analysis (sensu stricto): segment aCGH (WaviCGH, ADaCGH); molecular signatures and survival data (SignS); differentially expressed genes (Pomelo_II); select genes for classification (Tnasas, GeneSrF). Annotation and interpretation: interpret results (IDClight, PaLS).

27 What do we gain? [Figure: user wall time (seconds) vs. number of simultaneous users, for GLAD, CBS, CGHseg and HMM; 40 arrays.]

28 How it works: some key ideas. Each run: parallelization (transparent for users); fault tolerance (network problems, machine crashes, bugs); check-pointing. Periodic tasks (keep the system running 24 h, 365 d): automatic monitoring; automated testing suite.

29 What happens. User -> head node (LVS): send request to one of the servers. CGI: data checking, file upload. Execution: Python program and R program (sequential and parallelized parts). The Python program handles: setting up LAM/MPI; starting R; fault tolerance; checking termination of R; checking run errors; formatting output. Autorefreshing HTML until final results.

30 What happens: details. [Flowchart] User -> head node (LVS) -> Apache on Server 2 (master); Servers 1, 3, ..., n are slaves. CGI: read data; create MPI universe; launch R and Rmpi. Rmpi started OK? If not after K attempts: stop execution, halt MPI universe, return error. If yes: continue R execution till end; monitor R execution and maintain R process counters. Is R done? No: return autorefreshing page. Yes: halt MPI universe, produce and return results pages.

31 MPI details. [Flowchart] Can we run? (Count other LAM daemons.) No: sleep. Yes: verify servers (modify LAM defs), boot a (new) LAM/MPI universe over servers 1...n, and start R, continuing from the last checkpoint (segmentation; figures, over subjects and chromosomes). Then check repeatedly: R crashed (bugs)? Rmpi crashed? LAM/MPI/nodes crashed? Run out of time? Are we done? When done: halt MPI universe, produce and return results pages. NFS provides the shared storage and the shared temporary storage.
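The restart logic in this flowchart (run, checkpoint, detect a crash, restart from the last checkpoint, give up after K attempts) can be sketched as follows. This is only an illustration of the control flow under simplifying assumptions: the job is a list of independent items, a crash is simulated by a `RuntimeError`, and the checkpoint is a JSON file. The function names and the file format are hypothetical; the production system described in the talk drives R and LAM/MPI instead.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the results stored so far, or an empty list if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return []


def save_checkpoint(path, done):
    """Write the checkpoint to a temporary file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(done, fh)
    os.replace(tmp, path)


def run_job(items, work, path):
    """Process items in order, skipping those already in the checkpoint."""
    done = load_checkpoint(path)
    for item in items[len(done):]:
        done.append(work(item))  # may raise: stands in for a crash
        save_checkpoint(path, done)
    return done


def supervise(items, work, path, max_attempts=5):
    """Restart the job after a crash, at most max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job(items, work, path), attempt
        except RuntimeError:
            continue  # a real supervisor would also reboot the MPI universe here
    raise RuntimeError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    crashed = set()

    def flaky_square(x):
        # Hypothetical worker: fails the first time it sees the value 3.
        if x == 3 and x not in crashed:
            crashed.add(x)
            raise RuntimeError("simulated crash")
        return x * x

    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    print(supervise([1, 2, 3, 4], flaky_square, ckpt))
```

Writing the checkpoint to a temporary file and renaming it means a crash during the write leaves the previous checkpoint intact.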

32 Where is this running?


35 Millions of spots. Hundreds or thousands of subjects. No need to hold everything in RAM at once. Package ff: memory-efficient storage of large data on disk and fast access functions. Combined with: shared storage.

36 ff. ff stores the object on disk. That object can be read from various R processes. Different R processes can write to different ff objects.
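The access pattern described here (one common on-disk object that many processes read, with each process writing its own separate output object) can be illustrated without ff. The sketch below uses Python's standard `mmap` and `array` modules as a stand-in; it is not the ff API, and the file layout (raw doubles), the function names and the toy "centering" computation are all assumptions made for the example. The workers are called one after another here; in real use each call would run in its own process.

```python
import mmap
import os
import tempfile
from array import array

ITEM = array("d").itemsize  # bytes per stored double


def write_common(path, values):
    """Master: store the full data set on disk once."""
    with open(path, "wb") as fh:
        array("d", values).tofile(fh)


def read_slice(path, start, stop):
    """Worker: map the file read-only and copy out only items [start, stop)."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            chunk = array("d")
            chunk.frombytes(mm[start * ITEM:stop * ITEM])
            return chunk


def worker(common_path, out_path, start, stop):
    """Read one slice of the common object and write results to a separate file."""
    chunk = read_slice(common_path, start, stop)
    mean = sum(chunk) / len(chunk)
    centered = array("d", (x - mean for x in chunk))
    with open(out_path, "wb") as fh:
        centered.tofile(fh)
    return out_path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    common = os.path.join(workdir, "common.bin")
    write_common(common, [float(i) for i in range(12)])
    # Three workers, each with its own slice and its own output file.
    outputs = [
        worker(common, os.path.join(workdir, "out%d.bin" % k), 4 * k, 4 * (k + 1))
        for k in range(3)
    ]
    print(outputs)
```

Since no two workers write to the same file, no locking is needed; only the read-only common file is shared.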

37-39 ff (I). [Diagram, built up over three slides] A common ff object is read (read only) by the R processes R 1, R 2, ..., R n. Each process writes its own ff object (ff 1, ff 2, ..., ff n). The R master combines them (ff all).

40-47 ff (II). [Diagram, built up over several slides] The R master reads the data (multicore) and writes the input ff object (ff in). The R processes R 1, R 2, ..., R n read ff in (read only), each over its own indices (i 1, i 2, ..., i n). Each process writes its own ff object (ff 1, ff 2, ..., ff n) and its own figure (Fig. 1, Fig. 2, ..., Fig. n). Results: ff out.

48 Where is this running? ADaCGH2 (BioConductor package). Web-based application.


51 Store and access (large) pre-computed results. HMM for aCGH data with Reversible Jump: Viterbi. Common regions: count on the Viterbi paths. Fitting the HMM and finding common regions are distinct operations. C: number-crunching. R: wrapper, figures/tables. C creates large amounts of data. In package RJaCGH (CRAN).

52-55 [Diagram, built up over four slides] Fit HMM: R calls C (HMM), which stores the Viterbi paths as gzipped files and returns the filenames to R. Find common regions: R passes the filenames to C (common regions), which reads the Viterbi data and returns the results. R produces figures and tables.
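The idea in this diagram is that the large intermediate output never travels through the calling process: the compute step writes compressed files and hands back only their names, and a later step streams those files when needed. A toy sketch of that hand-off follows. It is not RJaCGH code; the text file format, the function names and the integer state labels are hypothetical, and the "common region" count is reduced to counting, per position, how many stored paths are in a given state.

```python
import gzip
import os
import tempfile


def store_paths(paths_by_array, out_dir):
    """Write each array's state path to its own gzipped text file.

    Only the filenames are returned; the paths themselves stay on disk.
    """
    filenames = []
    for name, states in paths_by_array.items():
        fname = os.path.join(out_dir, name + ".txt.gz")
        with gzip.open(fname, "wt") as fh:
            fh.write(" ".join(str(s) for s in states))
        filenames.append(fname)
    return filenames


def count_common(filenames, state):
    """For each position, count in how many files the stored path equals `state`."""
    counts = None
    for fname in filenames:
        with gzip.open(fname, "rt") as fh:
            states = [int(tok) for tok in fh.read().split()]
        if counts is None:
            counts = [0] * len(states)
        for i, s in enumerate(states):
            counts[i] += (s == state)
    return counts


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    # Hypothetical state labels: 0 = no change, 1 = gain.
    files = store_paths({"array1": [0, 1, 1, 0], "array2": [1, 1, 0, 0]}, out_dir)
    print(files)
    print(count_common(files, state=1))
```

Only one file is decompressed at a time, so memory use depends on the length of one path rather than on the number of arrays.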

57 Web-based: a few things we've learned. Configuration sucks (if you need to modify > 1 file). Too many languages. Adding test cases to the testing suites:, R. Documentation: in the, pages, LaTeX... Too much R to catch errors. User interfaces: who designs them?

58 Fault tolerance and communication. Manual check for errors (R ain't Erlang). Too much network traffic. [Flowchart, as in "MPI details"] NFS shared storage. Verify servers (modify LAM defs). Boot (new) LAM/MPI. Start R: continue from last checkpoint. Sleep. MPI universe: servers 1...n. Run out of time? Are we done? R crashed (coding errors)? Rmpi crashed? LAM/MPI crashed (includes node crashes)? Halt MPI universe. Produce and return results pages. NFS shared temporary storage.

59 Solutions? Literate programming and org-mode. Alternatives to MPI and/or use Erlang... Keep things as they are (only a few painful events a year).

60 Single machine applications. Run applications on a single multicore machine (e.g., 12 cores). Just verify that the machine is up.

61-63 Rethinking web-based applications. Users can get into trouble. Sure, but we can do a good job... Provide state-of-the-art statistical and computational approaches (R ;-). Pedagogical examples and pipelines. Minimize the chance of users getting into trouble. Web-based applications are here to stay.

64 ... so... Forget about them: just write R (plus C, etc). Go for it: R will do its job (which is only part of the job); HPC available; no problem with large data sets. But other tools and work are necessary.

65 Regardless of web-based applications... Parallel computing can be used routinely (library(parallel) in R), combined with ff.

66 Acknowledgements. O. M. Rueda, A. Alibés, A. Cañada, E. R. Morrissey, M. L. Neves, D. Rico. Funding: Fundación de Investigación Médica Mutua Madrileña; Project TIC C02-02 of the Spanish MEC and BIO of the Spanish MICINN; Ramón y Cajal Programme of the Spanish Ministry of Education and Science. CNIO (Spanish National Cancer Research Center). The R users and developers, for a vibrant statistical computing community and an amazing platform. The organizing and scientific committees of the III Jornadas de Usuarios de R.

67 Trying to fit it all. Statistical Computing in Bioinformatics. [Diagram] Develop statistical methods. Implement existing approaches. Implement for statisticians and bioinformaticians: parallel computing, fault tolerance. Implement for wet lab users: web apps (user friendly, no installation). Information integration. Increased speed (40x - 60x). Statistical rigour. Best practices. Answer the biological question.

68 What do we gain? [Figure, two panels: fold increase in speed (relative to 1 CPU). Panel 1: effect of number of arrays (number of genes = 7399). Panel 2: effect of number of genes (number of arrays = 160). Series: original sequential; new sequential (1 CPU); parallelized with 2, 10, 20 and 60 CPUs.]

69 What do we gain? [Figure, two panels: user wall time (seconds) vs. number of simultaneous users. Breast data set (78 arrays x 4751 genes); DLBCL data set (160 arrays x 7399 genes).]

70 Tools... Org mode. Literate programming. R, Python, C/C++. Interactive debugging. Web app. frameworks. Bazaar-NG. Commit hooks. Admin. ConfigFiles. WebHelp. FunkLoad: whole system testing, regression testing, stress testing. "Continuous Integration". InstallScript. Distributed computing. Parallel. (MPI, others). Grid computing. Web services.

71 Too many languages. Impedance mismatch problem: "Building Web-based applications requires the mastering of a number of languages/technologies (e.g. HTML, CSS, CGI, ASP, PHP, XML, etc.). Such languages and technologies were created to address different aspects on a by-need evolutionary manner. The result is a plethora of tools that are fitted together in an ad hoc fashion." El-Ansary, Grolaux, Van Roy, Rafea (2005), Overcoming the Multiplicity of Languages and Technologies for Web-Based Development Using a Multi-paradigm Approach. R, C, HTML. Python: CGI, data entry, display. Python (and others): control and monitor MPI. Javascript: AJAX, figures.

72 Other solutions? Too many languages: use languages designed to overcome this problem (Hop, Links, QHTML). Fault tolerance and too much traffic: alternatives to MPI? Linda and tuple spaces (also between-language funct.). Roll-our-own based on Rserve. Have Erlang control R processes?

Contact: Ramón Díaz-Uriarte, Dept. Bioquímica, Universidad Autónoma de Madrid, Madrid, Spain. rdiaz02@gmail.com, http://ligarto.org/rdiaz


More information

Scientific and Technical Applications as a Service in the Cloud

Scientific and Technical Applications as a Service in the Cloud Scientific and Technical Applications as a Service in the Cloud University of Bern, 28.11.2011 adapted version Wibke Sudholt CloudBroker GmbH Technoparkstrasse 1, CH-8005 Zurich, Switzerland Phone: +41

More information

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2 DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing Slide 1 Slide 3 A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

More information

technology brief RAID Levels March 1997 Introduction Characteristics of RAID Levels

technology brief RAID Levels March 1997 Introduction Characteristics of RAID Levels technology brief RAID Levels March 1997 Introduction RAID is an acronym for Redundant Array of Independent Disks (originally Redundant Array of Inexpensive Disks) coined in a 1987 University of California

More information

Cloud computing is a marketing term that means different things to different people. In this presentation, we look at the pros and cons of using

Cloud computing is a marketing term that means different things to different people. In this presentation, we look at the pros and cons of using Cloud computing is a marketing term that means different things to different people. In this presentation, we look at the pros and cons of using Amazon Web Services rather than setting up a physical server

More information

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers Ntinos Krampis Asst. Professor J. Craig Venter Institute kkrampis@jcvi.org http://www.jcvi.org/cms/about/bios/kkrampis/

More information

Hadoop Distributed File System Propagation Adapter for Nimbus

Hadoop Distributed File System Propagation Adapter for Nimbus University of Victoria Faculty of Engineering Coop Workterm Report Hadoop Distributed File System Propagation Adapter for Nimbus Department of Physics University of Victoria Victoria, BC Matthew Vliet

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

Distributed RAID Architectures for Cluster I/O Computing. Kai Hwang

Distributed RAID Architectures for Cluster I/O Computing. Kai Hwang Distributed RAID Architectures for Cluster I/O Computing Kai Hwang Internet and Cluster Computing Lab. University of Southern California 1 Presentation Outline : Scalable Cluster I/O The RAID-x Architecture

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

Integrity Checking and Monitoring of Files on the CASTOR Disk Servers

Integrity Checking and Monitoring of Files on the CASTOR Disk Servers Integrity Checking and Monitoring of Files on the CASTOR Disk Servers Author: Hallgeir Lien CERN openlab 17/8/2011 Contents CONTENTS 1 Introduction 4 1.1 Background...........................................

More information

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence Exploring Oracle E-Business Suite Load Balancing Options Venkat Perumal IT Convergence Objectives Overview of 11i load balancing techniques Load balancing architecture Scenarios to implement Load Balancing

More information

Hardware Configuration Guide

Hardware Configuration Guide Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...

More information

Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications

Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications by Samuel D. Kounev (skounev@ito.tu-darmstadt.de) Information Technology Transfer Office Abstract Modern e-commerce

More information

Test Run Analysis Interpretation (AI) Made Easy with OpenLoad

Test Run Analysis Interpretation (AI) Made Easy with OpenLoad Test Run Analysis Interpretation (AI) Made Easy with OpenLoad OpenDemand Systems, Inc. Abstract / Executive Summary As Web applications and services become more complex, it becomes increasingly difficult

More information

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

CrashPlan PRO Enterprise Backup

CrashPlan PRO Enterprise Backup CrashPlan PRO Enterprise Backup People Friendly, Enterprise Tough CrashPlan PRO is a high performance, cross-platform backup solution that provides continuous protection onsite, offsite, and online for

More information

Introduction to DISC and Hadoop

Introduction to DISC and Hadoop Introduction to DISC and Hadoop Alice E. Fischer April 24, 2009 Alice E. Fischer DISC... 1/20 1 2 History Hadoop provides a three-layer paradigm Alice E. Fischer DISC... 2/20 Parallel Computing Past and

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with

More information

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from

More information

Wikimedia architecture. Mark Bergsma <mark@wikimedia.org> Wikimedia Foundation Inc.

Wikimedia architecture. Mark Bergsma <mark@wikimedia.org> Wikimedia Foundation Inc. Mark Bergsma Wikimedia Foundation Inc. Overview Intro Global architecture Content Delivery Network (CDN) Application servers Persistent storage Focus on architecture, not so much on

More information

Scaling Out With Apache Spark. DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf

Scaling Out With Apache Spark. DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf Scaling Out With Apache Spark DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf Your hosts Mathijs Kattenberg Technical consultant Jeroen Schot Technical consultant

More information

Server and Storage Sizing Guide for Windows 7 TECHNICAL NOTES

Server and Storage Sizing Guide for Windows 7 TECHNICAL NOTES Server and Storage Sizing Guide for Windows 7 TECHNICAL NOTES Table of Contents About this Document.... 3 Introduction... 4 Baseline Existing Desktop Environment... 4 Estimate VDI Hardware Needed.... 5

More information

WINDOWS AZURE AND WINDOWS HPC SERVER

WINDOWS AZURE AND WINDOWS HPC SERVER David Chappell March 2012 WINDOWS AZURE AND WINDOWS HPC SERVER HIGH-PERFORMANCE COMPUTING IN THE CLOUD Sponsored by Microsoft Corporation Copyright 2012 Chappell & Associates Contents High-Performance

More information

JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing RESEARCH ARTICLE JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing David K. Brown, David L. Penkler, Thommas M. Musyoka, Özlem Tastan Bishop*

More information

Overview of I/O Performance and RAID in an RDBMS Environment. By: Edward Whalen Performance Tuning Corporation

Overview of I/O Performance and RAID in an RDBMS Environment. By: Edward Whalen Performance Tuning Corporation Overview of I/O Performance and RAID in an RDBMS Environment By: Edward Whalen Performance Tuning Corporation Abstract This paper covers the fundamentals of I/O topics and an overview of RAID levels commonly

More information

Enterprise Service Bus

Enterprise Service Bus We tested: Talend ESB 5.2.1 Enterprise Service Bus Dr. Götz Güttich Talend Enterprise Service Bus 5.2.1 is an open source, modular solution that allows enterprises to integrate existing or new applications

More information

Google App Engine. Guido van Rossum Stanford EE380 Colloquium, Nov 5, 2008

Google App Engine. Guido van Rossum Stanford EE380 Colloquium, Nov 5, 2008 Google App Engine Guido van Rossum Stanford EE380 Colloquium, Nov 5, 2008 Google App Engine Does one thing well: running web apps Simple app configuration Scalable Secure 2 App Engine Does One Thing Well

More information

Hadoop Distributed File System (HDFS) Overview

Hadoop Distributed File System (HDFS) Overview 2012 coreservlets.com and Dima May Hadoop Distributed File System (HDFS) Overview Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized

More information

v7.1 Technical Specification

v7.1 Technical Specification v7.1 Technical Specification Copyright 2011 Sage Technologies Limited, publisher of this work. All rights reserved. No part of this documentation may be copied, photocopied, reproduced, translated, microfilmed,

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

RESEARCH GOAL. Enable BOINC to efficiently support apps that require interprocess communication.

RESEARCH GOAL. Enable BOINC to efficiently support apps that require interprocess communication. BOINC Workshop 11, Eshwar Rohit Supervisors: Dr. Jaspal Subhlok Dr. David P. Anderson SSL U.C, Berkeley RESEARCH GOAL Enable BOINC to efficiently support apps that require interprocess communication. Goals:

More information

SiteCelerate white paper

SiteCelerate white paper SiteCelerate white paper Arahe Solutions SITECELERATE OVERVIEW As enterprises increases their investment in Web applications, Portal and websites and as usage of these applications increase, performance

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services

More information

Fundamentals of LoadRunner 9.0 (2 Days)

Fundamentals of LoadRunner 9.0 (2 Days) Fundamentals of LoadRunner 9.0 (2 Days) Quality assurance engineers New users of LoadRunner who need to load test their applications and/or executives who will be involved in any part of load testing.

More information

NTP Software File Auditor for Windows Edition

NTP Software File Auditor for Windows Edition NTP Software File Auditor for Windows Edition An NTP Software Installation Guide Abstract This guide provides a short introduction to installation and initial configuration of NTP Software File Auditor

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Accelerating Wordpress for Pagerank and Profit

Accelerating Wordpress for Pagerank and Profit Slide No. 1 Accelerating Wordpress for Pagerank and Profit Practical tips and tricks to increase the speed of your site, improve conversions and climb the search rankings By: Allan Jude November 2011 Vice

More information

Tandberg Data AccuVault RDX

Tandberg Data AccuVault RDX Tandberg Data AccuVault RDX Binary Testing conducts an independent evaluation and performance test of Tandberg Data s latest small business backup appliance. Data backup is essential to their survival

More information

Load Balancing and Clustering in EPiServer

Load Balancing and Clustering in EPiServer Load Balancing and Clustering in EPiServer Abstract This white paper describes the main differences between load balancing and clustering, and details EPiServer's possibilities of existing in a clustered

More information

Big data management with IBM General Parallel File System

Big data management with IBM General Parallel File System Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers

More information

Parallels. Clustering in Virtuozzo-Based Systems

Parallels. Clustering in Virtuozzo-Based Systems Parallels Clustering in Virtuozzo-Based Systems (c) 1999-2008 2 C HAPTER 1 This document provides general information on clustering in Virtuozzo-based systems. You will learn what clustering scenarios

More information

Parallels Virtual Automation 6.1

Parallels Virtual Automation 6.1 Parallels Virtual Automation 6.1 Installation Guide for Windows April 08, 2014 Copyright 1999-2014 Parallels IP Holdings GmbH and its affiliates. All rights reserved. Parallels IP Holdings GmbH. c/o Parallels

More information

Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel Computation Parallel I/O (I) I/O basics Spring 2008 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network

More information

History of Disaster - The BioWare Community Site

History of Disaster - The BioWare Community Site Writing a social application in PHP/ MySQL and what happens when a million people show up on opening day " Duleepa Dups Wijayawardhana MySQL Community Team "!!"#$%&#'()*#+(,-.$/#+*0#,-$1#-2 Who the hell

More information

Web Application Threats and Vulnerabilities Web Server Hacking and Web Application Vulnerability

Web Application Threats and Vulnerabilities Web Server Hacking and Web Application Vulnerability Web Application Threats and Vulnerabilities Web Server Hacking and Web Application Vulnerability WWW Based upon HTTP and HTML Runs in TCP s application layer Runs on top of the Internet Used to exchange

More information

Disaster Recovery Remote off-site Storage for single server environment

Disaster Recovery Remote off-site Storage for single server environment . White Paper Disaster Recovery Remote off-site Storage for single server environment When it comes to protecting your data there is no second chance January 1, 200 Prepared by: Bill Schmidley CompassPoint

More information