Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich
1 Technical Challenges for Big Health Care Data Donald Kossmann Systems Group Department of Computer Science ETH Zurich
2 What is Big Data? technologies to automate experience Purpose answer difficult questions help make difficult decisions
3 Big Data Examples What is French translation of this presentation? How long does it take from Zurich to Vitznau? Where should Migros build next supermarket? Should my bank give Donald a mortgage? Often the answer is not precise more data (experience) improves results more data (experience) good for corner cases
4 Big Data in Health Care Understanding vs. Experience Doctor: Please, take this drug. Patient: Why? Doctor: Because it works. Precision Medicine Question: find right therapy for individual patient too complex to understand the answer patients cannot wait for science more data -> more coverage of multi-morbid helps socialize health care: we are all equally rich
5 [Figure: Cloud Era, ERP. Images: istockphoto, Fotolia]
6 Challenges and Concerns Cost Where to store the data? Which technologies to use? How to get answers in the most efficient way? Value Which data is worth keeping? How to cleanse my data? Which questions make sense? Which answers? Security How to protect my data (internal and external attackers)? How do I collaborate with others who have data? Goal: Get all three aspects right!
7 What is special about Health Big Data? Cost data is in silos; needs to be (logically) centralized first data integration difficult due to lack of standards Value data is incomplete and imprecise (subjective statements) fatal consequences of mis-predictions Security give control to patients yet allow global analysis
8 Agenda A Sharing Architecture Processing Encrypted Data
9 Data Silos Today clients clients clients clients clients HTTP(S) App Server App Server App Server App Server App Server SQL USZ USZ Dr. X Dr. X
10 Pros and Cons of Silos Pros Cost: proven technology Value: isolation of resources Security: well understood (as we will see) Cons Cost: expensive because no sharing of resources Value: cross-silo analytics difficult / impossible Security: on-premise security less rigorous
11 Shared Data Architecture (Cloud) clients clients clients clients clients HTTP(S) App Server App Server App Server App Server App Server SQL QP QP QP QP Storage Storage Storage
12 Metaphor Compare these two architectures with Shopping Mall vs. Amazon
13 Optimizing Shared Data Architecture In Shared Memory, access pattern is diffuse everybody wildly accesses all data nodes If indexes do not work, then optimize scans optimize for the worst case Looking at many queries, much data is relevant batch queries and kill many birds with one stone tradeoff between latency and throughput control latency (SLAs) by partitioning the data
14 Agenda A Sharing Architecture Processing Encrypted Data
15 Data in the Cloud Donald Dirk Joachim Köln Leipzig Bonn Data are not encrypted Great to process the data in the cloud Little protection against attackers
16 Encryption in the Cloud Joachim Donald Dirk ax$!2 A!(1T %&!ez Xz6!! Data is encrypted in the cloud. Confidentiality: Data protected against attackers. Cost: all processing on-premise with data shipping. Utility: analysis involves loss of sovereignty of data.
17 Cipherbase: Secure Co-processor Joachim %&!e% Xz6!! All computation done on secure hardware in the cloud Confidentiality: (visible) data encrypted at all times. Cost: special algorithms for tight integration. Utility: support all operations. (across silos next slide.)
18 Analytics Across Silos Joachim Donald Dirk ax$!2 %&!e% Xz6!! Donald & Joachim authorize Dirk to run specific query. Dirk only sees aggregated results. No raw data. Donald and Joachim only see their own data.
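The authorization rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the Cipherbase implementation: the class `TrustedAggregator`, the query name `avg_age`, and the sample rows are hypothetical, and encryption is left out entirely so that only the access rule is visible (owners read their own rows, an analyst receives an aggregate only if every involved owner has authorized that specific query).

```python
from statistics import mean

# Hypothetical catalogue of named aggregate queries (an assumption of this sketch).
QUERIES = {"avg_age": lambda rows: mean(r["age"] for r in rows)}


class TrustedAggregator:
    """Stand-in for the trusted component; data is kept in plaintext here for brevity."""

    def __init__(self):
        self._silos = {}      # owner -> list of rows
        self._grants = set()  # (owner, analyst, query_name)

    def upload(self, owner, rows):
        self._silos[owner] = list(rows)

    def authorize(self, owner, analyst, query_name):
        # An owner allows one analyst to run one specific query over the owner's data.
        self._grants.add((owner, analyst, query_name))

    def read_own(self, owner):
        # Owners only ever see their own raw data.
        return list(self._silos[owner])

    def run(self, analyst, query_name, owners):
        # Every owner involved must have granted exactly this query to this analyst.
        for owner in owners:
            if (owner, analyst, query_name) not in self._grants:
                raise PermissionError(f"{owner} has not authorized {query_name}")
        rows = [r for owner in owners for r in self._silos[owner]]
        return QUERIES[query_name](rows)  # only the aggregate leaves, never raw rows


agg = TrustedAggregator()
agg.upload("Donald", [{"age": 40}, {"age": 50}])
agg.upload("Joachim", [{"age": 60}])
agg.authorize("Donald", "Dirk", "avg_age")
agg.authorize("Joachim", "Dirk", "avg_age")
result = agg.run("Dirk", "avg_age", ["Donald", "Joachim"])
```

A real deployment would also have to guard against aggregates over very small groups, which can leak individual values; the sketch does not address that.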
19 Why trust trusted Hardware? Three options Dedicated co-processors: e.g., IBM 4970 Extensions to commodity processors: Intel SGX Custom hardware: FPGAs We chose FPGAs no operating system (less software to trust) open source the layout available and cheap
20 Summary General ICT Trends cloud computing: industrialization of computing digitally born data data and infrastructure sharing Huge opportunities Big Data: answer questions we cannot answer today Role of Technology help navigate cost vs. value vs. security triangle Big Data in Health Care leverages general trends (some specifics) important to manage expectations
21 Crescando Storage Manager (idisk) A distributed (relational) table: MM on NUMA horizontally partitioned distributed within and across machines Query / update interface SELECT * FROM table WHERE <any predicate> UPDATE table SET <anything> WHERE <any predicate> Some nice properties constant / predictable latency & data freshness solves the Amadeus use case support for Snapshot Isolation, monotonic writes
22 Design Operate MM like disk in shared-nothing architecture. Core ~ Spindle (many cores per machine & data center) all data kept in main memory (log to disk for recovery) each core scans one partition of data all the time Batch queries and updates: shared scans do trivial MQO (at scan level on system with single table) control read/update pattern -> no data contention Index queries, not data just as in the stream processing world predictable+optimizable: rebuild indexes every second Updates are processed before reads
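The shared-scan idea on this slide can be sketched as follows. This is a minimal single-threaded illustration, not Crescando's code: the names `Query`, `Update`, and `shared_scan` and the record layout are assumptions. One pass over a partition serves a whole batch, and for each record the updates are applied before the queries are evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Query:
    qid: int
    predicate: Callable[[Record], bool]


@dataclass
class Update:
    predicate: Callable[[Record], bool]
    changes: Record


def shared_scan(partition: List[Record],
                updates: List[Update],
                queries: List[Query]) -> Dict[int, List[Record]]:
    """Scan the partition once for the whole batch of updates and queries."""
    results: Dict[int, List[Record]] = {q.qid: [] for q in queries}
    for record in partition:
        # Updates first, in arrival order, so every query sees the updated record.
        for u in updates:
            if u.predicate(record):
                record.update(u.changes)
        # Then all queries of the batch are evaluated against the same record.
        for q in queries:
            if q.predicate(record):
                results[q.qid].append(dict(record))
    return results


partition = [
    {"id": 1, "city": "Zurich", "seats": 2},
    {"id": 2, "city": "Bonn", "seats": 0},
]
updates = [Update(lambda r: r["id"] == 2, {"seats": 5})]
queries = [
    Query(1, lambda r: r["seats"] > 0),
    Query(2, lambda r: r["city"] == "Bonn"),
]
results = shared_scan(partition, updates, queries)
```

The cost of the pass is shared by every query in the batch, which is why latency stays predictable as the number of queries grows; the price is that a query waits for the next scan cycle.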
23 Crescando on 1 Machine (N Cores) [Figure: Input Queue (Operations) -> Split -> Scan Threads -> Merge -> Output Queue (Result Tuples)]
24 Crescando on 1 Core [Figure labels: Queries + Updates; Predicate Indexes; Unindexed Queries; Active Queries; records; {record, {query-ids}} results; data partition with Record 0, Snapshot n, Snapshot n+1, Read Cursor, Write Cursor]
25 Scanning a Partition Record 0 Snapshot n+1 Snapshot n Read Cursor Write Cursor
26 Scanning a Partition Record 0 Snapshot n+1 Snapshot n Read Cursor Write Cursor Merge cursors
27 Scanning a Partition Record 0 Build indexes for next batch of queries and updates Snapshot n+1 Snapshot n Read Cursor Write Cursor Merge cursors
28 Implementation Details Optimization decide for batch of queries which indexes to build runs once every second (must be fast) Query + update indexes different indexes for different kinds of predicates e.g., hash tables, R-trees, tries,... must fit in L2 cache (better L1 cache) Probe indexes Updates in right order, queries in any order Persistence & Recovery Log updates / inserts to disk (not a bottleneck)
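"Index queries, not data" can be illustrated with a hash table over equality predicates. The sketch below is an assumption-laden illustration, not the Crescando implementation: the function names and the (query id, attribute, value) triple format are hypothetical, and only equality predicates are covered (the slide also names R-trees and tries for other predicate kinds). Each scanned record is probed against the index to find the ids of matching queries.

```python
from collections import defaultdict


def build_equality_index(queries):
    """Build {attribute: {value: set(query_ids)}} from (qid, attribute, value) triples."""
    index = defaultdict(lambda: defaultdict(set))
    for qid, attr, value in queries:
        index[attr][value].add(qid)
    return index


def probe(index, record):
    """Return the ids of all indexed queries whose equality predicate matches the record."""
    matches = set()
    for attr, by_value in index.items():
        # One hash lookup per indexed attribute, independent of the number of queries.
        matches |= by_value.get(record.get(attr), set())
    return matches


batch = [(1, "city", "Zurich"), (2, "city", "Bonn"), (3, "airline", "LX")]
index = build_equality_index(batch)
hit = probe(index, {"city": "Zurich", "airline": "LX"})
miss = probe(index, {"city": "Köln"})
```

Because the index is rebuilt for every batch, it only has to hold the predicates of the current queries, which is what makes it plausible to keep it cache-resident.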
29 Crescando vs. MySQL - Throughput Amadeus workload, vary updates Synthetic read-only workload, vary skew
30 Cipherbase: Secure Co-processor Idea: Farm out computation on encrypted data to co-processor Most database work on commodity hardware (cheap & fast) Logging, Locking / Synchronization, Buffer Management, Scheduling etc. Expressions on encrypted or (partially) homomorphically encrypted data Secure co-processor evaluates expressions on encrypted data Arithmetic, Comparisons and Intrinsics (MIN, MAX etc.) Trusted Code Base easy to verify Cloud DBMS Query Parsing Buffer Pool Resource Scheduling Logging/ Recovery Transaction Manager Locking Query Execution Access Methods SQL OS TM Encryption Key Expression Evaluation
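The division of labour described on this slide can be sketched as follows. Everything here is illustrative and hypothetical: the class `TrustedModule`, the helper names, and the toy XOR cipher (a keystream derived with SHA-256, which is NOT secure and only stands in for real encryption). The sketch also assumes that comparison outcomes are returned to the untrusted side in the clear. The point it shows is that the untrusted DBMS code handles only ciphertext and calls into a small trusted component, which alone holds the key, for each expression.

```python
import hashlib
import os


def _pad(key: bytes, nonce: bytes) -> bytes:
    # Toy keystream for the sketch; not a secure construction.
    return hashlib.sha256(key + nonce).digest()[:8]


def encrypt(key: bytes, value: int) -> bytes:
    """Client-side encryption of a 64-bit signed integer (toy cipher)."""
    nonce = os.urandom(8)
    plain = value.to_bytes(8, "big", signed=True)
    return nonce + bytes(p ^ k for p, k in zip(plain, _pad(key, nonce)))


def decrypt(key: bytes, ciphertext: bytes) -> int:
    nonce, body = ciphertext[:8], ciphertext[8:]
    plain = bytes(c ^ k for c, k in zip(body, _pad(key, nonce)))
    return int.from_bytes(plain, "big", signed=True)


class TrustedModule:
    """Stand-in for the secure co-processor: the only place the key is used."""

    def __init__(self, key: bytes):
        self._key = key

    def less_than(self, a: bytes, b: bytes) -> bool:
        return decrypt(self._key, a) < decrypt(self._key, b)

    def add(self, a: bytes, b: bytes) -> bytes:
        total = decrypt(self._key, a) + decrypt(self._key, b)
        return encrypt(self._key, total)


def encrypted_max(tm: TrustedModule, column):
    """Untrusted DBMS code: computes MAX over ciphertexts without seeing plaintext."""
    best = column[0]
    for ct in column[1:]:
        if tm.less_than(best, ct):
            best = ct
    return best


key = b"k" * 16
tm = TrustedModule(key)
column = [encrypt(key, v) for v in (3, 17, -4)]
max_ct = encrypted_max(tm, column)
sum_ct = tm.add(column[0], column[1])
```

Keeping the trusted component to a handful of expression primitives is what makes the "trusted code base easy to verify" claim plausible; logging, locking, and buffer management never need the key.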
31 TPC-C Results [Figure: normalized throughput; categories Plaintext, Customer, Strong/Weak, Strong/Strong; series Opt, NoOpt]
More informationScaleArc idb Solution for SQL Server Deployments
ScaleArc idb Solution for SQL Server Deployments Objective This technology white paper describes the ScaleArc idb solution and outlines the benefits of scaling, load balancing, caching, SQL instrumentation
More informationBuilding Scalable Applications Using Microsoft Technologies
Building Scalable Applications Using Microsoft Technologies Padma Krishnan Senior Manager Introduction CIOs lay great emphasis on application scalability and performance and rightly so. As business grows,
More informationCognos Performance Troubleshooting
Cognos Performance Troubleshooting Presenters James Salmon Marketing Manager James.Salmon@budgetingsolutions.co.uk Andy Ellis Senior BI Consultant Andy.Ellis@budgetingsolutions.co.uk Want to ask a question?
More informationConfiguring Apache Derby for Performance and Durability Olav Sandstå
Configuring Apache Derby for Performance and Durability Olav Sandstå Database Technology Group Sun Microsystems Trondheim, Norway Overview Background > Transactions, Failure Classes, Derby Architecture
More informationVirtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
More informationOutdated Architectures Are Holding Back the Cloud
Outdated Architectures Are Holding Back the Cloud Flash Memory Summit Open Tutorial on Flash and Cloud Computing August 11,2011 Dr John R Busch Founder and CTO Schooner Information Technology JohnBusch@SchoonerInfoTechcom
More informationHarnessing the Power of the Microsoft Cloud for Deep Data Analytics
1 Harnessing the Power of the Microsoft Cloud for Deep Data Analytics Today's Focus How you can operate your business more efficiently and effectively by tapping into Cloud based data analytics solutions
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More information