Big Data. Without a Big Database. Kate Matsudaira
1 Big Data Without a Big Database Kate Matsudaira
2-3 Two kinds of data
nicknames: user, transactional | reference, nontransactional
examples: user accounts, shopping cart/orders, user messages | product/offer catalogs, static geolocation data, dictionaries
created/modified by: users | business (you)
sensitivity to staleness: high | low
plan for growth: hard | easy
access optimization: read/write | mostly read
4
5 reference data
6 reference data user data
7
8 reference data
9 reference data user data
10 Performance Reminder
main memory read: 0.0001 ms (100 ns)
network round trip: 0.5 ms (500,000 ns)
disk seek: 10 ms (10,000,000 ns)
source:
11
12
13
14-17 The Beginning [diagram: load balancer, webapp (x2), load balancer, BIG DATABASE, data loader] Problems called out: availability, performance, scalability.
18-22 Replication [diagram: load balancer, webapp (x2), load balancer, BIG DATABASE, REPLICA, data loader] Problems called out: scalability, performance, operational overhead.
23-29 Local Caching [diagram: load balancer, webapp (x2), load balancer, BIG DATABASE, REPLICA, data loader, with local caches added] Problems called out: scalability, operational overhead, performance, consistency, long tail performance.
30 The Long Tail Problem: 80% of requests query 10% of entries (head); 20% of requests query the remaining 90% of entries (tail).
31-37 Big Cache [diagram: load balancer, webapp (x2), load balancer, BIG CACHE (preloaded), database, replica, data loader] Problems called out: operational overhead, performance, scalability, consistency, long tail performance.
38-41 Big Cache Technologies: memcached(b), ElastiCache (AWS), Oracle Coherence
42-48 Targeted at generic data/use cases. Dynamically assign keys to the nodes. Dynamically rebalance data. Scale horizontally. No assumptions about loading/updating data. Poor performance on cold starts.
49-51 Big Cache Technologies
operational overhead: additional hardware, additional configuration, additional monitoring
performance: extra network hop, slow scanning, additional deserialization
52-55 NoSQL to The Rescue? [diagram: load balancer, webapp (x2), load balancer, NoSQL Database, NoSQL Replica, data loader] Problems called out: some performance problems, some scalability problems, some operational overhead.
56-58 Remote Store Retrieval Latency [diagram: client, network, remote store]
TCP request: 0.5 ms
Lookup/write response: 0.5 ms
TCP response: 0.5 ms
read/parse response: 0.25 ms
Total time to retrieve single value: 1.75 ms
59-60 Total time to retrieve
A single value: from remote store: 1.75 ms; from memory: 0.001 ms (10 main memory reads)
Sequential access of 1 million random keys: from remote store: 30 minutes; from memory: 1 second
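The totals on this slide follow from the per-lookup figures quoted earlier in the deck. A minimal sketch of the arithmetic, assuming (as the slide does) that one in-memory lookup costs about 10 main memory reads; the class and method names are hypothetical:

```java
class RetrievalLatency {
    // Figures quoted on the slides, in nanoseconds.
    static final long MAIN_MEMORY_READ_NS = 100;
    static final long REMOTE_FETCH_NS = 1_750_000;   // 1.75 ms per value
    static final int MEMORY_READS_PER_LOOKUP = 10;   // assumption stated on the slide

    /** Total wall-clock seconds for sequential lookups at a fixed per-lookup cost. */
    static double totalSeconds(long perLookupNs, long lookups) {
        return perLookupNs * lookups / 1e9;
    }

    public static void main(String[] args) {
        long keys = 1_000_000;
        double remote = totalSeconds(REMOTE_FETCH_NS, keys);
        double local = totalSeconds(MAIN_MEMORY_READ_NS * MEMORY_READS_PER_LOOKUP, keys);
        System.out.printf("remote store: %.0f s (about %.0f minutes)%n", remote, remote / 60);
        System.out.printf("memory: %.0f s%n", local);
    }
}
```

The remote figure works out to 1,750 seconds, which the slide rounds to 30 minutes.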
61 The Truth About Databases
62 What I'm going to call as the hot data cliff: As the size of your hot data set (data frequently read at sustained rates above disk I/O capacity) approaches available memory, write operation bursts that exceeds disk write I/O capacity can create a trashing death spiral where hot disk pages that MongoDB desperately needs are evicted from disk cache by the OS as it consumes more buffer space to hold the writes in memory. MongoDB Source:
63 Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory. Redis source:
64 They are fast if everything fits into memory.
65
66-70 Can you keep it in memory yourself? [diagram: load balancer, webapp (x2), load balancer, each instance holding a full cache and a data loader, fed from BIG DATABASE] Results called out: operational relief, scales infinitely, performance gain, consistency problems.
71-72 Fixing Consistency: 1. Deployment Cells 2. Sticky user sessions [diagram: load balancer in front of a deployment cell of webapps, each with a full cache and a data loader]
73 How do you fit all of that data into memory? credit:
74 "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." Donald Knuth
75 How do you fit all that data in memory?
76 The Answer
77 Domain Model Design. "Domain Layer (or Model Layer): Responsible for representing concepts of the business, information about the business situation, and business rules. State that reflects the business situation is controlled and used here, even though the technical details of storing it are delegated to the infrastructure. This layer is the heart of business software." Eric Evans, Domain-Driven Design,
78-81 Domain Model Design Guidelines: #1 Keep it immutable #2 Use independent hierarchies #3 Optimize data
82 intern() your immutables [diagram: two map entries, K1 -> V1 and K2 -> V2, built from objects A through F, shown before and after equal objects are shared]
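83
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

// One weak canonicalizing map per class, so equal instances collapse to one shared object.
private final Map<Class<?>, Map<Object, WeakReference<Object>>> cache =
    new ConcurrentHashMap<Class<?>, Map<Object, WeakReference<Object>>>();

@SuppressWarnings("unchecked")
public <T> T intern(T o) {
    if (o == null) return null;
    Class<?> c = o.getClass();
    Map<Object, WeakReference<Object>> m = cache.get(c);
    if (m == null)
        cache.put(c, m = Collections.synchronizedMap(new WeakHashMap<Object, WeakReference<Object>>()));
    // Look up an equal instance that was interned earlier.
    WeakReference<Object> r = m.get(o);
    T v = (r == null) ? null : (T) r.get();
    if (v == null) {
        // First time this value is seen: it becomes the canonical instance.
        v = o;
        m.put(v, new WeakReference<Object>(v));
    }
    return v;
}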
84 Use Independent Hierarchies [diagram: a Product (id, title) alongside separate Product Summary, Offers, Specifications, Description, Reviews, Rumors, and Model History hierarchies, each keyed by productid instead of nested inside one Product Info object]
85 Collection Optimization
86 Leverage Primitive Keys/Values
Size in memory of a collection with 10,000 elements [0..9,999]:
java.util.ArrayList<Integer>: 200K
java.util.HashSet<Integer>: 546K
gnu.trove.list.array.TIntArrayList: 40K
gnu.trove.set.hash.TIntHashSet: 102K
Trove ("High Performance Collections for Java")
87 Optimize small immutable collections
Collections with a small number of entries (up to ~20):
class ImmutableMap<K, V> implements Map<K, V>, Serializable { ... }

class MapN<K, V> extends ImmutableMap<K, V> {
    final K k1, k2, ..., kN;
    final V v1, v2, ..., vN;

    @Override
    public boolean containsKey(Object key) {
        if (eq(key, k1)) return true;
        if (eq(key, k2)) return true;
        ...
        return false;
    }
    ...
}
88 Space Savings
java.util.HashMap: 128 bytes + 32 bytes per entry
compact immutable map: 24 bytes + 8 bytes per entry
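To make the MapN idea concrete, here is a minimal sketch of the two-entry case. The class name Map2 is hypothetical, and building on java.util.AbstractMap is a convenience chosen for this sketch, not necessarily how the ImmutableMap on the slide is implemented. It assumes the two keys are distinct.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical fixed-size map that keeps exactly two entries in plain fields. */
final class Map2<K, V> extends AbstractMap<K, V> {
    private final K k1, k2;
    private final V v1, v2;

    Map2(K k1, V v1, K k2, V v2) {
        this.k1 = k1; this.v1 = v1;
        this.k2 = k2; this.v2 = v2;
    }

    @Override public int size() { return 2; }

    @Override public boolean containsKey(Object key) {
        return Objects.equals(key, k1) || Objects.equals(key, k2);
    }

    @Override public V get(Object key) {
        if (Objects.equals(key, k1)) return v1;
        if (Objects.equals(key, k2)) return v2;
        return null;
    }

    // Built on demand, so no hash table or entry objects are retained.
    @Override public Set<Map.Entry<K, V>> entrySet() {
        Set<Map.Entry<K, V>> entries = new LinkedHashSet<>();
        entries.add(new SimpleImmutableEntry<>(k1, v1));
        entries.add(new SimpleImmutableEntry<>(k2, v2));
        return Collections.unmodifiableSet(entries);
    }
}
```

Lookups are a short chain of equality checks, which stays cheap for the small entry counts the slide targets. Mutating calls such as put throw UnsupportedOperationException, inherited from AbstractMap.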
89 Numeric Data Optimization
90 Price History Example
91 Example: Price History. Problem: Store daily prices for 1M products, 2 offers per product. Average price history length per product: ~2 years. Total price points: (1M + 2M) * 730 = ~2 billion
92 Price History: First attempt. TreeMap<Date, Double>: 88 bytes per entry * 2 billion = ~180 GB
93-95 Typical Shopping Price History [chart: price ($) vs. days]
96 Run Length Encoding: a a a a a a b b b c c c c c c -> 6 a 3 b 6 c
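The transformation on this slide can be sketched in a few lines. This is an illustrative encoder only; the class name and the space-separated output format are assumptions chosen to mirror the slide:

```java
class RunLengthEncoding {
    /** Encodes each run of repeated characters as "<count> <char>", separated by spaces. */
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int run = 1;
            // Extend the run while the next character matches.
            while (i + run < input.length() && input.charAt(i + run) == c) run++;
            if (out.length() > 0) out.append(' ');
            out.append(run).append(' ').append(c);
            i += run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaaaaabbbcccccc")); // prints "6 a 3 b 6 c"
    }
}
```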
97 Price History Optimization
Drop pennies
Store prices in primitive short (use scale factor to represent prices greater than Short.MAX_VALUE)
Encoding: positive: price (adjusted to scale); negative: run length (precedes price); zero: unavailable
Memory: 15 * 2 + 16 (array) + 24 (start date) + 4 (scale factor) = 74 bytes
98-99 Space Savings. Reduction compared to TreeMap<Date, Double>: 155 times. Estimated memory for 2 billion price points: 1.2 GB << 180 GB
100 Price History Model
public class PriceHistory {

    private final Date startDate; // or use org.joda.time.LocalDate
    private final short[] encoded;
    private final int scaleFactor;

    public PriceHistory(SortedMap<Date, Double> prices) { ... } // encode
    public SortedMap<Date, Double> getPricesByDate() { ... } // decode

    public Date getStartDate() { return startDate; }

    // Below computations implemented directly against encoded data
    public Date getEndDate() { ... }
    public Double getMinPrice() { ... }
    public int getNumChanges(double minChangeAmt, double minChangePct, boolean abs) { ... }
    public PriceHistory trim(Date startDate, Date endDate) { ... }
    public PriceHistory interpolate() { ... }
}
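The encode and decode bodies are not shown on the slides. The following is a minimal sketch of what the short[] encoding might look like, following the stated convention (positive: price, negative: run length preceding the price, zero: unavailable). It assumes whole-dollar prices that already fit in a short, so the scale factor is omitted; the class name PriceRunLength is hypothetical.

```java
import java.util.Arrays;

class PriceRunLength {
    /** Encodes one whole-dollar price per day; 0 marks a day with no price available. */
    static short[] encode(int[] daily) {
        short[] buf = new short[daily.length]; // a run of n >= 2 days needs only 2 slots
        int n = 0, i = 0;
        while (i < daily.length) {
            if (daily[i] < 0 || daily[i] > Short.MAX_VALUE)
                throw new IllegalArgumentException("price needs a scale factor: " + daily[i]);
            int run = 1;
            while (i + run < daily.length && daily[i + run] == daily[i] && run < Short.MAX_VALUE) run++;
            if (run > 1) buf[n++] = (short) -run; // negative: run length, precedes the price
            buf[n++] = (short) daily[i];          // positive: price, zero: unavailable
            i += run;
        }
        return Arrays.copyOf(buf, n);
    }

    /** Expands the encoded form back to one price per day. */
    static int[] decode(short[] encoded) {
        int days = 0, run = 1;
        for (short s : encoded) {
            if (s < 0) { run = -s; } else { days += run; run = 1; }
        }
        int[] daily = new int[days];
        int d = 0;
        run = 1;
        for (short s : encoded) {
            if (s < 0) { run = -s; continue; }
            for (int k = 0; k < run; k++) daily[d++] = s;
            run = 1;
        }
        return daily;
    }

    public static void main(String[] args) {
        int[] daily = {20, 20, 20, 20, 19, 19, 0, 0, 0, 25};
        System.out.println(Arrays.toString(encode(daily))); // prints [-4, 20, -2, 19, -3, 0, 25]
    }
}
```

Because shopping prices change rarely, most histories collapse to a handful of runs, which is where the space savings come from.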
101 Know Your Data
102 Compress text
103 String Compression: byte arrays
Use the minimum character set encoding
static Charset UTF8 = Charset.forName("UTF-8");

String s = "The quick brown fox jumps over the lazy dog"; // 43 chars, 136 bytes
byte[] b = "The quick brown fox jumps over the lazy dog".getBytes(UTF8); // 64 bytes
String s1 = "Hello"; // 5 chars, 64 bytes
byte[] b1 = "Hello".getBytes(UTF8); // 24 bytes

byte[] toBytes(String s) { return s == null ? null : s.getBytes(UTF8); }
String toString(byte[] b) { return b == null ? null : new String(b, UTF8); }
104 String Compression: shared prefix
Great for URLs
public class PrefixedString {
    private PrefixedString prefix;
    private byte[] suffix; // field name lost in the transcript; "suffix" is assumed

    @Override public int hashCode() { ... }
    @Override public boolean equals(Object o) { ... }
}
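The slide leaves the method bodies out. A minimal sketch of how a shared-prefix string could work is shown below; the class name SharedPrefixString, the constructor, and the choice to define equality on the full string value are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

final class SharedPrefixString {
    private final SharedPrefixString prefix; // shared by many instances, may be null
    private final byte[] suffix;             // only the part unique to this string

    SharedPrefixString(SharedPrefixString prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix.getBytes(StandardCharsets.UTF_8);
    }

    /** Rebuilds the full string by walking the prefix chain. */
    @Override public String toString() {
        String tail = new String(suffix, StandardCharsets.UTF_8);
        return prefix == null ? tail : prefix.toString() + tail;
    }

    // Equality is defined on the full string value, so two different
    // prefix/suffix splits of the same text compare equal.
    @Override public boolean equals(Object o) {
        return o instanceof SharedPrefixString && toString().equals(o.toString());
    }

    @Override public int hashCode() { return toString().hashCode(); }

    public static void main(String[] args) {
        SharedPrefixString base = new SharedPrefixString(null, "http://example.com/products/");
        System.out.println(new SharedPrefixString(base, "123")); // prints http://example.com/products/123
    }
}
```

Each URL then stores only its unique tail plus one reference to the shared prefix. This sketch rebuilds the string for equals and hashCode, which trades CPU for simplicity; a production version would likely compare bytes directly.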
105 String Compression: short alphanumeric case-insensitive strings
public abstract class AlphaNumericString {
    public static AlphaNumericString make(String s) {
        try {
            // Strings that parse as base-36 numbers are packed into a single long.
            return new Numeric(Long.parseLong(s, Character.MAX_RADIX));
        } catch (NumberFormatException e) {
            return new Alpha(s.getBytes(UTF8));
        }
    }

    protected abstract String value();

    @Override public String toString() { return value(); }

    private static class Numeric extends AlphaNumericString {
        long value;
        Numeric(long value) { this.value = value; }
        @Override protected String value() { return Long.toString(value, Character.MAX_RADIX); }
        @Override public int hashCode() { ... }
        @Override public boolean equals(Object o) { ... }
    }

    private static class Alpha extends AlphaNumericString {
        byte[] value;
        Alpha(byte[] value) { this.value = value; }
        @Override protected String value() { return new String(value, UTF8); }
        @Override public int hashCode() { ... }
        @Override public boolean equals(Object o) { ... }
    }
}
106-109 String Compression: Large Strings. Become the master of your strings! Gzip, bzip2. Just convert to byte[] first, then compress. Image source:
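As a sketch of "convert to byte[] first, then compress", the example below uses the JDK's built-in gzip streams. The class and method names are hypothetical, and bzip2 is not covered because the JDK does not ship a bzip2 codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressedText {
    /** Encodes the text as UTF-8 bytes, then gzips those bytes. */
    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverses compress(): gunzips, then decodes the bytes as UTF-8. */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) out.write(buf, 0, n);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Gzip has a fixed header and trailer overhead, so this only pays off for large strings such as the product descriptions the slide has in mind; short strings can come out larger than they went in.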
110-114 JVM Tuning. Make sure to use compressed pointers (-XX:+UseCompressedOops). Use low pause GC (Concurrent Mark Sweep, G1). Adjust generation sizes/ratios. Overprovision heap by ~30%. Image source:
115-118 JVM Tuning. Print garbage collection. If GC pauses are still prohibitive, then consider partitioning.
119 In summary
120 Know your Business
121 Know your data
122 Know when to optimize
123 The End! My company: My website: And much of the credit for this talk goes to Leon Stein for developing the technology. Thank you, Leon.
124 How do you load the data?
125 Cache Loading [diagram: load balancer in front of webapps, each with a full cache and a data loader; the data loaders read cooked datasets from a reliable file store (S3)]
126-129 Cache loading tips & tricks. Final datasets should be compressed and stored (e.g. S3). Keep the format simple (CSV, JSON). Poll for updates. Poll frequency == data inconsistency threshold.
130 Cache Loading Time Sensitivity
131 Cache Loading: low time sensitivity data
/tax-rates
  /date= tax-rates csv.gz
  /date= tax-rates csv.gz
  /date= tax-rates csv.gz
132 Cache Loading: medium/high time sensitivity
/prices /full /date= price-obs csv.gz /date= /inc /date= T csv.gz T csv.gz
133-135 Cache Loading Strategy: Swap. Cache is immutable, so no locking is required. Works well for infrequently updated data sets, and for datasets that need to be refreshed each update. Image src:
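One way to realize the swap strategy is to publish each freshly loaded, immutable snapshot through a single atomic reference. This is a sketch under that assumption, not necessarily the mechanism used in the talk; the class name SwappableCache is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class SwappableCache<K, V> {
    // Readers always see one complete snapshot: either the old one or the new one.
    private final AtomicReference<Map<K, V>> current =
            new AtomicReference<>(Collections.<K, V>emptyMap());

    /** Lock-free read against whichever snapshot is currently published. */
    V get(K key) {
        return current.get().get(key);
    }

    /** Copies the freshly loaded data into an immutable snapshot and publishes it in one step. */
    void swap(Map<K, V> freshData) {
        current.set(Collections.unmodifiableMap(new HashMap<>(freshData)));
    }
}
```

The old snapshot becomes garbage once in-flight readers finish with it, so during a swap the heap briefly holds two copies of the dataset. That is consistent with the advice elsewhere in the deck to overprovision the heap.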
136-139 Cache Loading Strategy: CRUD. Deletions can be tricky. Avoid full synchronization. Consider loading cache in small batches. Use one container per partition.
140 Concurrent Locking with Trove Map
import gnu.trove.map.TLongObjectMap;
import gnu.trove.map.hash.TLongObjectHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LongCache<V> {
    private TLongObjectMap<V> map = new TLongObjectHashMap<V>();
    private ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Lock r = lock.readLock(), w = lock.writeLock();

    // Many readers may hold the read lock at once; writers get exclusive access.
    public V get(long k) { r.lock(); try { return map.get(k); } finally { r.unlock(); } }
    public V update(long k, V v) { w.lock(); try { return map.put(k, v); } finally { w.unlock(); } }
    public V remove(long k) { w.lock(); try { return map.remove(k); } finally { w.unlock(); } }
}
141-143 Cache loading optimizations. Periodically generate serialized data/state. Validate with CRC or hash. Keep local copies.
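A minimal sketch of the "validate with CRC" step, using the JDK's java.util.zip.CRC32. How the expected checksum travels with the dataset (a sidecar file, object metadata, and so on) is not stated in the slides, so here it is simply passed in; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class DatasetChecksum {
    /** Computes the CRC-32 of a serialized dataset. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Returns true when the payload matches the checksum published alongside it. */
    static boolean isValid(byte[] payload, long expectedCrc) {
        return crc32(payload) == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] payload = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%08X%n", crc32(payload)); // prints CBF43926
    }
}
```

A loader would run this check after downloading a cooked dataset and before swapping it into the cache, keeping the previous local copy if validation fails.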
144-148 Dependent Caches [diagram: an instance holding product summary, offers, matching, and predictions caches with dependencies between them; a status aggregator (servlet) answers the load balancer health check]
149 Deployment Cell Status [diagram: the load balancer health check queries a cell status aggregator, which reaches the status aggregator of each webapp in the deployment cell over HTTP or JMX]
150 Hierarchical Status Aggregation
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationDistributed storage for structured data
Distributed storage for structured data Dennis Kafura CS5204 Operating Systems 1 Overview Goals scalability petabytes of data thousands of machines applicability to Google applications Google Analytics
More informationJBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing
JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing January 2014 Legal Notices JBoss, Red Hat and their respective logos are trademarks or registered trademarks of Red Hat, Inc. Azul
More informationScaling Database Performance in Azure
Scaling Database Performance in Azure Results of Microsoft-funded Testing Q1 2015 2015 2014 ScaleArc. All Rights Reserved. 1 Test Goals and Background Info Test Goals and Setup Test goals Microsoft commissioned
More informationScalability of web applications. CSCI 470: Web Science Keith Vertanen
Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches
More informationHDFS Architecture Guide
by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5
More information1. Comments on reviews a. Need to avoid just summarizing web page asks you for:
1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of
More informationHypertable Architecture Overview
WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for
More informationPhysical Data Organization
Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor
More informationTransactions and ACID in MongoDB
Transactions and ACID in MongoDB Kevin Swingler Contents Recap of ACID transactions in RDBMSs Transactions and ACID in MongoDB 1 Concurrency Databases are almost always accessed by multiple users concurrently
More informationA Deduplication File System & Course Review
A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror
More informationDistributed File Systems
Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.
More informationWhite Paper. Optimizing the Performance Of MySQL Cluster
White Paper Optimizing the Performance Of MySQL Cluster Table of Contents Introduction and Background Information... 2 Optimal Applications for MySQL Cluster... 3 Identifying the Performance Issues.....
More information<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store
Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb, Consulting MTS The following is intended to outline our general product direction. It is intended for information
More informationZingMe Practice For Building Scalable PHP Website. By Chau Nguyen Nhat Thanh ZingMe Technical Manager Web Technical - VNG
ZingMe Practice For Building Scalable PHP Website By Chau Nguyen Nhat Thanh ZingMe Technical Manager Web Technical - VNG Agenda About ZingMe Scaling PHP application Scalability definition Scaling up vs
More informationOn- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
More informationEnterprise Manager Performance Tips
Enterprise Manager Performance Tips + The tips below are related to common situations customers experience when their Enterprise Manager(s) are not performing consistent with performance goals. If you
More informationWebSphere Architect (Performance and Monitoring) 2011 IBM Corporation
Track Name: Application Infrastructure Topic : WebSphere Application Server Top 10 Performance Tuning Recommendations. Presenter Name : Vishal A Charegaonkar WebSphere Architect (Performance and Monitoring)
More informationRunning a Workflow on a PowerCenter Grid
Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)
More informationPART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
More informationResource Utilization of Middleware Components in Embedded Systems
Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system
More informationCache Configuration Reference
Sitecore CMS 6.2 Cache Configuration Reference Rev: 2009-11-20 Sitecore CMS 6.2 Cache Configuration Reference Tips and Techniques for Administrators and Developers Table of Contents Chapter 1 Introduction...
More informationTips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier
Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier Simon Law TimesTen Product Manager, Oracle Meet The Experts: Andy Yao TimesTen Product Manager, Oracle Gagan Singh Senior
More informationIn-Memory Performance for Big Data
In-Memory Performance for Big Data Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, Alistair Veitch VLDB 2014, presented by Nick R. Katsipoulakis A Preliminary Experiment
More informationApplication Performance in the Cloud
Application Performance in the Cloud Understanding and ensuring application performance in highly elastic environments Albert Mavashev, CTO Nastel Technologies, Inc. amavashev@nastel.com What is Cloud?
More informationDesign and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen
Design and Implementation of a Storage Repository Using Commonality Factoring IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Axion Overview Potentially infinite historic versioning for rollback and
More informationBigMemory & Hybris : Working together to improve the e-commerce customer experience
& Hybris : Working together to improve the e-commerce customer experience TABLE OF CONTENTS 1 Introduction 1 Why in-memory? 2 Why is in-memory Important for an e-commerce environment? 2 Why? 3 How does
More informationStorage in Database Systems. CMPSCI 445 Fall 2010
Storage in Database Systems CMPSCI 445 Fall 2010 1 Storage Topics Architecture and Overview Disks Buffer management Files of records 2 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query
More informationSimple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)
Naming vs. Locating Entities Till now: resources with fixed locations (hierarchical, caching,...) Problem: some entity may change its location frequently Simple solution: record aliases for the new address
More informationMySQL Storage Engines
MySQL Storage Engines Data in MySQL is stored in files (or memory) using a variety of different techniques. Each of these techniques employs different storage mechanisms, indexing facilities, locking levels
More informationCS514: Intermediate Course in Computer Systems
: Intermediate Course in Computer Systems Lecture 7: Sept. 19, 2003 Load Balancing Options Sources Lots of graphics and product description courtesy F5 website (www.f5.com) I believe F5 is market leader
More informationBigdata High Availability (HA) Architecture
Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources
More informationUsing Object Database db4o as Storage Provider in Voldemort
Using Object Database db4o as Storage Provider in Voldemort by German Viscuso db4objects (a division of Versant Corporation) September 2010 Abstract: In this article I will show you how
More informationUnit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3
Storage Structures Unit 4.3 Unit 4.3 - Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CD-ROM
More informationVirtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
More informationOracle WebLogic Server 11g Administration
Oracle WebLogic Server 11g Administration This course is designed to provide instruction and hands-on practice in installing and configuring Oracle WebLogic Server 11g. These tasks include starting and
More informationChapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design
Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden Robert C. Nickerson ISYS 464 Spring 2003 Topic 23 Database
More informationPTC System Monitor Solution Training
PTC System Monitor Solution Training Patrick Kulenkamp June 2012 Agenda What is PTC System Monitor (PSM)? How does it work? Terminology PSM Configuration The PTC Integrity Implementation Drilling Down
More informationSMALL INDEX LARGE INDEX (SILT)
Wayne State University ECE 7650: Scalable and Secure Internet Services and Architecture SMALL INDEX LARGE INDEX (SILT) A Memory Efficient High Performance Key Value Store QA REPORT Instructor: Dr. Song
More informationCHAPTER 1 - JAVA EE OVERVIEW FOR ADMINISTRATORS
CHAPTER 1 - JAVA EE OVERVIEW FOR ADMINISTRATORS Java EE Components Java EE Vendor Specifications Containers Java EE Blueprint Services JDBC Data Sources Java Naming and Directory Interface Java Message
More informationMS SQL Performance (Tuning) Best Practices:
MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware
More informationCloud Operating Systems for Servers
Cloud Operating Systems for Servers Mike Day Distinguished Engineer, Virtualization and Linux August 20, 2014 mdday@us.ibm.com 1 What Makes a Good Cloud Operating System?! Consumes Few Resources! Fast
More informationCourse Description. Course Audience. Course Outline. Course Page - Page 1 of 5
Course Page - Page 1 of 5 WebSphere Application Server 7.0 Administration on Windows BSP-1700 Length: 5 days Price: $ 2,895.00 Course Description This course teaches the basics of the administration and
More informationMongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15
MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You
More informationOriginal-page small file oriented EXT3 file storage system
Original-page small file oriented EXT3 file storage system Zhang Weizhe, Hui He, Zhang Qizhen School of Computer Science and Technology, Harbin Institute of Technology, Harbin E-mail: wzzhang@hit.edu.cn
More informationSDFS Overview. By Sam Silverberg
SDFS Overview By Sam Silverberg Why did I do this? I had an Idea that I needed to see if it worked. Design Goals Create a dedup file system capable of effective inline deduplication for Virtual Machines
More informationJava DB Performance. Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860
Java DB Performance Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860 AGENDA > Java DB introduction > Configuring Java DB for performance > Programming tips > Understanding Java DB performance
More informationOracle Database In-Memory The Next Big Thing
Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes
More informationAgenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
More informationIn-Memory Databases MemSQL
IT4BI - Université Libre de Bruxelles In-Memory Databases MemSQL Gabby Nikolova Thao Ha Contents I. In-memory Databases...4 1. Concept:...4 2. Indexing:...4 a. b. c. d. AVL Tree:...4 B-Tree and B+ Tree:...5
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationCloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
More information