Performance of LibRe protocol inside Cassandra workflow
January 2015, Paris

Sathiya Prabhu Kumar, Ph.D. student, CNAM
Under the supervision of Raja Chiky and Sylvain Lefebvre, LISITE - ISEP, 28 Rue Notre Dame des Champs, Paris, France,
and Eric Gressier-Soudan, CEDRIC - CNAM, 292 Rue Saint-Martin, Paris, France.
Contents

1 Introduction
2 LibRe
2.1 Algorithm Description
2.1.1 Update Operation
2.1.2 Read Operation
3 CaLibRe: Cassandra with LibRe
3.1 LibRe implementation inside Cassandra
4 YCSB: Yahoo Cloud Services Benchmark
4.1 Evaluating Data Consistency using YCSB
4.2 CaLibRe performance evaluation using YCSB
4.2.1 Test Setup
4.2.2 Test Evaluation
5 Conclusion
6 Future Works
1 Introduction

The stronger consistency options offered by most modern storage systems, a.k.a. NoSQL databases [10], take one of two forms. The first is reading or writing to all the replicas: Consistency-level ALL. The second is reading and writing to a majority of replicas: Consistency-level QUORUM. The limitations of these models include extra cost in request latency, a high number of messages, and reduced system availability when one or a few of the data replicas cannot be contacted. In large distributed storage systems, one or a few nodes may be down at any point in time. Hence, when reading from or writing to all the replicas, there is always a threat to system availability for some set of data.

Most modern storage systems are designed to be write optimized: writes should always succeed, and as fast as possible. In Cassandra, for example, the consistency option ANY and the Hinted Handoff mechanism confirm this behavior [7]. So, for a system that needs the highest write availability and a minimum read guarantee, the desired consistency level is ONE [7]. If a data item is written with consistency level ONE, the only way to preserve consistency of this data at read time is to read from all the replicas. But, as mentioned earlier, this can threaten system availability. The consistency option QUORUM [7] can tolerate a few replicas being down or unreachable, but the pass condition for writes becomes stronger, with a considerable cost on both read and write operations.

The performance of modern storage systems relies heavily on handling most of the data in memory. The system is designed to cache most of the recent data in order to serve read requests faster. When a request is forwarded to all or a majority of replicas to retrieve a data item, it is unlikely that every replica has the right data in its cache. Hence, the request latency is always dominated by the slowest replica answering the request. Moreover, since stronger consistency options always come at a cost in request latency, applying strong consistency at a peak point, where the number of users trying to access the same data is high, can overload the system.

Another problem raised by these consistency options concerns system elasticity. With the recent growth of cloud computing, the need for scalability has evolved into a need for elasticity. Scalability is the ability of a platform or architecture to handle increasing system demand by adding additional resources. Elasticity is the system's ability to handle varying demand efficiently with the available resources, adapting automatically to the current situation. Scalability is a long-term resolution, whereas elasticity is a short-term one. To achieve elasticity, the replica count of a data item needs to be altered dynamically, as in the case of CDNs (Content Delivery Networks) [17]. Holding a varying number of replicas instead of a static replica count enables elasticity, allowing the system to provision and de-provision resources automatically. With a varying number of replicas, the stronger consistency options, Consistency-level ALL or QUORUM, may be inefficient because the replica set for a given data item is continuously changing.
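To make the consistency options discussed above concrete, the following is a minimal, illustrative sketch of how a client picks a per-request consistency level with the DataStax Java driver; the contact point, keyspace and table are made-up examples and are not part of this work.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class ConsistencyLevelExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("demo");   // hypothetical keyspace

            // Fast, weakly consistent write: a single replica acknowledgement suffices.
            SimpleStatement write = new SimpleStatement(
                    "UPDATE users SET email = 'alice@example.org' WHERE id = 42");
            write.setConsistencyLevel(ConsistencyLevel.ONE);
            session.execute(write);

            // Stronger read: a majority of replicas must answer, at a higher latency.
            SimpleStatement read = new SimpleStatement("SELECT email FROM users WHERE id = 42");
            read.setConsistencyLevel(ConsistencyLevel.QUORUM);
            session.execute(read);

            cluster.close();
        }
    }

Raising the level to ConsistencyLevel.ALL makes every replica mandatory for the request and directly exposes the availability and latency problems described above.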
2 LibRe

Analyzing the pros and cons of the existing consistency options, we propose a new consistency protocol named LibRe: Library for Replications. The main goal of the protocol is to achieve stronger consistency while reading from ONE replica instead of ALL or a QUORUM of replicas, irrespective of the number of replicas disconnected during reads and/or writes. From the insight behind eventually consistent systems, at least one of the replicas contains the recent version of the needed data at any point in time. Thus, by forwarding read requests to the right node, the node that contains the recent version of the data, or in other words by restricting the forwarding of read requests to a stale node, the node that is not yet updated, the consistency of the data is preserved. Hence, the design considerations of LibRe are:

Ensure read requests are forwarded to the node holding the recent version of the needed data.
Ensure system availability and latency constraints are not violated.

One of the main challenges of the model is how to decide that a node is stale. A stale node is one on which the recent update for the needed data has not yet been applied. We cannot consider a node stale because it contains one or a few stale data items among huge datasets: the node is stale only for those particular data items and still consistent for the others. Hence, a registry is needed that monitors each write and update operation and provides information per data item, so that the same node can be classified as stale for particular data items and consistent for the remaining data items.
For example, consider a topology where servers are connected to a common high-speed network. If a particular node in the topology is down or separated from the network for a period of a few minutes, the node will be inconsistent only for the operations that happened during this period, and will remain consistent for the remaining data items. Hence, the core idea of our approach is to identify the messages lost by a node; in other words, the model tries to identify the updates missed by a particular node. This enables the system to stop forwarding requests to the stale node until it is consistent again. A node is considered stale if it contains stale data for the incoming request, and this restriction can be lifted once the lost updates are reapplied on the node. To achieve this, when a specific data item is updated on a node, the node has to make an announcement to the LibRe Registry. The announcement reads, roughly: the data with this resource identifier has been added/modified on this node. With this registry of node announcements, read requests can be forwarded to a node that contains the recent version of the needed data.

The next obvious question is where to keep the Registry. An entity that comes to mind immediately is the NameNode of the Hadoop Distributed File System (HDFS) [14, 16]. The NameNode, which manages add/copy/move/delete of files in the file system and helps locate the DataNodes where the needed data lives, does more or less what we want to do with LibRe. However, the NameNode manages metadata about large file blocks, whereas in LibRe the Registry has to be maintained for each individual data item, which is trickier than what the NameNode actually manages. Moreover, the NameNode of HDFS is a single point of failure and could become a bottleneck for real-time (OLTP) requests. The system that inspired the LibRe registry is ZooKeeper [8]: the initial implementation of the LibRe protocol used ZooKeeper for managing the registry information. However, for better performance, in the recent version of the LibRe protocol ZooKeeper is replaced by a Distributed Hash Table (DHT) [18]. With the DHT design, each node manages the registry information about a subset of the data.

2.1 Algorithm Description

Figure 1 shows the position of LibRe in the system architecture. The frontend is the node in the system to which the client connects; in a multi-reader, multi-writer architecture, each node in the system can take the role of frontend. When the frontend receives a read request, the target node to which the request is forwarded is retrieved via the LibRe protocol. As shown in Figure 1, the LibRe protocol consists of three components: the Registry, the Availability Manager and the Advertisement Manager. The Registry is a key-value store in which, for a given data-key, the set of replicas containing the recent version of the data is stored. The Availability Manager is contacted during read operations and is responsible for forwarding the read request to an appropriate node that holds the recent version of the data. During write operations, the replica notifies the Advertisement Manager about the current update; the Advertisement Manager is in turn responsible for updating the Registry, recording the recentness of the data. The whole setup is a Distributed Hash Table (DHT) [18].
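To make the Registry concrete, the following is a minimal, hedged Java sketch of the per-data-key state a LibRe node could keep and of the lookup used on the read path; the class and method names, and the map-based representation, are assumptions for illustration and not the actual LibRe code.

    import java.util.Collections;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArraySet;

    // Illustrative sketch of the LibRe Registry: for each data-key it remembers the
    // most recent version-id advertised so far and the replicas known to hold it.
    public class LibReRegistry {

        public static final class Entry {
            volatile long versionId;                                       // recentness of the last advertised update
            final Set<String> freshReplicas = new CopyOnWriteArraySet<>(); // replica addresses holding that version
        }

        private final Map<String, Entry> byDataKey = new ConcurrentHashMap<>();

        // Availability Manager side: which replicas may serve a consistent read?
        public Set<String> freshReplicasFor(String dataKey) {
            Entry e = byDataKey.get(dataKey);
            return e == null ? Collections.<String>emptySet() : e.freshReplicas;
        }

        // Advertisement Manager side: entries are created and refreshed on each
        // announcement, following Algorithm 1 in Section 2.1.1.
        Entry entryFor(String dataKey) {
            return byDataKey.computeIfAbsent(dataKey, k -> new Entry());
        }
    }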
While retrieving the set of replicas responsible for storing a given data-key, one among those replicas holds the three components for that particular data-key.

Figure 1: LibRe Architecture Diagram

Figures 2a and 2b show the sequence diagrams of a distributed data storage system with the LibRe contribution during write and read requests respectively. When a client issues a write/update request, the frontend forwards it to the appropriate replicas based on the chosen consistency level.
Figure 2: LibRe Sequence Diagram. (a) Write/Update Operation, (b) Read Operation

When the write operation is successfully accomplished by a replica, the replica checks whether the update has to be notified to the LibRe node. If yes, the update is notified to the Advertisement Manager of the corresponding LibRe node; if not, no action is taken. During read operations, the frontend checks whether the information about the corresponding data-key is managed by LibRe. If yes, the frontend forwards the request to the LibRe node. The LibRe node in turn finds an appropriate replica node to serve the read request, via the Availability Manager, and forwards the request to that node. Finally, after retrieving the needed data, the replica node sends the read response (result) directly to the client.

2.1.1 Update Operation

Algorithm 1 describes the contribution of LibRe's Advertisement Manager during an update operation. There are actually two cases of update operation:

Insert: when a data item is written to the storage system for the first time.
Update: when an existing data item is modified. The update can be issued either by a client or gossiped from another replica.

Algorithm 1 Update Operation
1: function log(dataKey, versionId, nodeIP)
2:   if Reg.exists(dataKey) then
3:     RegVersionId ← Reg.getVersionId(dataKey)
4:     if versionId = RegVersionId then
5:       replicas ← Reg.getReplicas(dataKey)
6:       replicas ← append(replicas, nodeIP)
7:       Reg.updateReplicas(dataKey, replicas)
8:     else if versionId > RegVersionId then
9:       replicas ← reinitialize(nodeIP)
10:      Reg.updateReplicas(dataKey, replicas)
11:      Reg.updateVersionId(versionId)
12:    end if
13:  else
14:    replicas ← nodeIP
15:    Reg.createEntry(dataKey, replicas)
16:    Reg.updateVersionId(versionId)
17:  end if
18: end function

When a replica node sends an advertisement message regarding an update, the Advertisement Manager of the LibRe node proceeds as follows. First the protocol checks whether the data-key already exists in the Registry (line 2). If the data-key exists in the Registry (update operation), the version-id logged in the Registry for the respective data-key is read (line 3). The version-id is simply a monotonically increasing number representing the recentness of the update, for instance the timestamp of the operation. If the version-id logged in the Registry matches the version-id of the operation (a gossiped update, line 4), the node's IP address is appended to the existing replica list (line 7). If the two version-ids do not match and the version-id of the operation is greater than the one in the Registry, meaning the update is new (line 8), the replica list for the data-key is reinitialized to the node's IP address (lines 9-10) and the version-id in the Registry is updated to the version-id of the operation (line 11). In the other case, if the data-key does not exist in the Registry (write operation, line 13), a new entry is created for the data-key with the node's IP address and the version-id of the operation (lines 14-16). This setup helps to achieve the Last Writer Wins policy [13].
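As a concrete illustration of Algorithm 1, the following is a hedged Java sketch of how an Advertisement Manager could apply an announcement (data-key, version-id, node IP) to its Registry; the plain map representation and the names are assumptions, not the actual CaLibRe code.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Illustrative sketch of Algorithm 1: applying a replica's announcement to the
    // Registry kept by the LibRe node responsible for the data-key.
    public class AdvertisementManager {

        private static final class Entry {
            long versionId;                               // version of the most recent known update
            final Set<String> replicas = new HashSet<>(); // replicas known to hold that version
        }

        private final Map<String, Entry> registry = new HashMap<>();

        public synchronized void log(String dataKey, long versionId, String nodeIp) {
            Entry entry = registry.get(dataKey);
            if (entry == null) {                          // first write for this data-key
                entry = new Entry();
                entry.versionId = versionId;
                entry.replicas.add(nodeIp);
                registry.put(dataKey, entry);
            } else if (versionId == entry.versionId) {    // gossiped copy of the same update
                entry.replicas.add(nodeIp);
            } else if (versionId > entry.versionId) {     // newer update: reset the replica list
                entry.versionId = versionId;
                entry.replicas.clear();
                entry.replicas.add(nodeIp);
            }
            // versionId < entry.versionId: stale announcement, ignored (Last Writer Wins).
        }
    }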
2.1.2 Read Operation

Algorithm 2 describes the LibRe policy during a read operation. Since the Registry holds information about the replicas holding the recent version of the needed data, the replica information is retrieved from the Registry (line 2) and a target node is chosen from these replicas (line 3), to which the request is forwarded (line 7). However, if the Registry does not contain information about the needed data-key, then in favor of system availability over consistency, the protocol follows the default system behavior to select a target node (lines 4-5).

Algorithm 2 LibRe Read
1: function getTargetNode(dataKey)
2:   replicas ← Reg.getReplicas(dataKey)
3:   targetNode ← getTargetNode(replicas)
4:   if targetNode is NULL then
5:     targetNode ← useDefaultMethod(dataKey)
6:   end if
7:   forwardRequestTo(targetNode)
8: end function

3 CaLibRe: Cassandra with LibRe

With the encouraging results obtained by simulation, we decided to implement LibRe inside a distributed storage system: Cassandra. Cassandra [9] is one of the most popular NoSQL systems available open source, continuously refined and enhanced by a large open-source community. Cassandra offers different consistency options for reading and writing; among the most used are Consistency-level ONE, QUORUM and ALL [7]. The reasons for choosing Cassandra to study LibRe's performance are its open-source nature and its ability to switch between different consistency levels. The motivation is to incorporate the LibRe protocol for accomplishing reads and writes and to compare its performance with Cassandra's native consistency options: ONE, QUORUM and ALL.

3.1 LibRe implementation inside Cassandra

The LibRe protocol is implemented inside Cassandra release version . In the native workflow, while querying a data item, the endpoint (replica) addresses are retrieved locally by matching the token number of the data against the token numbers of the nodes [18, 15, 6]. Each read/write operation is sent to all the endpoints; if the chosen consistency level is ONE, the operation succeeds after at least one successful response. With Consistency-level ALL, a successful response is expected from all the endpoints, whereas Consistency-level QUORUM expects successful responses from a majority of the endpoints. While retrieving the set of endpoint addresses by matching the token number of the data against the token numbers of the nodes, the first endpoint address, before the list is sorted by proximity, becomes the LibRe node for that data. A separate thread pool is designed for the LibRe messaging service in order to handle LibRe messages effectively.

In the LibRe implementation, write operations follow the native workflow of Cassandra, except that the node sends a LibRe-Announcement message to update the LibRe Registry at the end of a successful write (if the data has to be managed by LibRe). A Bloom filter is used to verify whether the data has to be managed by LibRe [4]. The LibRe-Announcement message contains the metadata about the update: the data-key associated with the update, the IP address of the node, and the version-id.
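The write-path hook can be pictured with the hedged sketch below: after a successful local write, the replica consults a Bloom filter of LibRe-managed keys and, on a hit, advertises the update. The Guava BloomFilter and the AnnouncementSender facade are assumptions for illustration; the actual integration uses Cassandra's internal messaging service.

    import java.nio.charset.StandardCharsets;
    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;

    // Illustrative write-path hook: decide whether a finished write must be
    // advertised to the LibRe node responsible for its data-key.
    public class LibReWriteHook {

        private final BloomFilter<String> managedKeys =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000);

        private final AnnouncementSender sender;   // hypothetical messaging facade
        private final String localAddress;

        public LibReWriteHook(AnnouncementSender sender, String localAddress) {
            this.sender = sender;
            this.localAddress = localAddress;
        }

        // Mark a data-key as managed by LibRe.
        public void manage(String dataKey) {
            managedKeys.put(dataKey);
        }

        // Called once the local write has been applied successfully.
        public void onWriteApplied(String dataKey, long versionId) {
            if (managedKeys.mightContain(dataKey)) {
                // Metadata only: the data-key, this node's address and the version-id.
                sender.sendAnnouncement(dataKey, localAddress, versionId);
            }
        }

        public interface AnnouncementSender {
            void sendAnnouncement(String dataKey, String nodeIp, long versionId);
        }
    }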
In the initial implementation of LibRe inside Cassandra, the hash of the modified value is used as the version-id, although the hash value is not monotonically increasing and cannot be compared directly to an existing version-id. When the LibRe node receives a LibRe-Announcement message, the LibRe Advertisement Manager takes control of it and handles the message: the messaging service un-marshals the needed data and transfers it to the Advertisement Manager, which updates the LibRe Registry. The Registry is updated by matching the version-id of the operation against the version-id of the data already registered in the Registry, as described in Section 2.1.1, except for the comparison of the two version-ids: in this case, assuming there is no delay in the update messages, a version-id that does not match the version-id in the Registry is always considered to be the new version. In a future implementation, the hash of the modified value will be replaced by the timestamp of the operation, which increases monotonically and facilitates the comparison.
During a read operation, if the data-key (row-key in Cassandra vocabulary) exists in the Bloom filter, instead of following the default path for querying the request, the node sends a LibRe-Availability message to the LibRe node. The LibRe-Availability message is prepared like a normal read message for the needed data, with the data-key placed on top of the message. When the LibRe node receives a LibRe-Availability message, the LibRe Availability Manager finds an appropriate replica node to query the data; the data-key added on top of the message helps to find the target node without un-marshalling the whole message. After finding a target node, the read message is extracted from the LibRe-Availability message and sent to that node. Handling the LibRe-Availability message this way saves time in un-marshalling and re-marshalling the whole message, and speeds up the read response process.

4 YCSB: Yahoo Cloud Services Benchmark

YCSB, the Yahoo Cloud Services Benchmark, is an open-source project written in Java for benchmarking cloud databases. The main goal of YCSB is to measure the performance of a system in terms of throughput, request latency, saturation point and scalability under common workload patterns. Unlike simulation tools, YCSB makes it possible to examine the physical features and behavior of the system by passing a live workload to the real system. The reason for building YCSB despite the existence of traditional benchmarks (e.g., TPC-C) is the generation gap in data storage systems: the current generation of data storage systems introduced notions like BASE [11] compliance in contrast to ACID [12], incorporating tradeoffs and adapting to diverse workload patterns. TPC-C-like benchmarking tools generate workloads that are more suitable for traditional RDBMS storage systems built around transactions, whereas modern storage systems are not meant for this kind of workload pattern. YCSB generates different types of workloads in compliance with the patterns suitable for modern storage systems. The workload patterns generated by YCSB include Update Heavy, Read Mostly, Read Only, Read Latest, Scan-Insert and Read-Modify-Write.
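For illustration, an Update-Heavy workload of this kind is typically described by a small properties file in the spirit of YCSB's bundled workloada; the counts below are placeholders rather than the parameters used in Section 4.2, and the exact binding name passed to the YCSB client depends on the YCSB version.

    # Update-Heavy workload: 50% reads, 50% updates, zipfian key popularity
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    readallfields=true
    recordcount=100000
    operationcount=100000
    readproportion=0.5
    updateproportion=0.5
    scanproportion=0
    insertproportion=0
    requestdistribution=zipfian

Such a file is then handed to the YCSB client together with the data-store binding, typically something like ./bin/ycsb run cassandra-10 -P workloads/workloada -threads 10 against a Cassandra cluster.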
4.1 Evaluating Data Consistency using YCSB

In a distributed data store, a data item is logically one and the same irrespective of its physical copies. A read of a data item arriving after an update should see the last updated value; a read that does not see the last updated value is called a stale read. To date, there is no established tool or library in the literature for evaluating data consistency (the number of stale reads), either in real time or via simulations. YCSB reports the performance of a system under the current workload via minimum latency, maximum latency, average latency, 95th percentile and 99th percentile, but does not count the number of stale reads. In YCSB, the workloads are generated with the number of client threads defined by the user.

As each data store is developed for different use-case models, the querying interfaces of the data stores technically differ from one another. In order to query different data stores, YCSB offers additional bindings, called Database Interface Layers (DB bindings), written specifically for the respective data stores to work with YCSB. The role of the Database Interface Layer is to translate the workloads produced by the workload generator into the format required to query the system. One difficulty in extending YCSB to measure a new metric alongside the offered ones is that, after each query, the Database Interface Layer returns the needed statistics to the status evaluation module instantly; once all operations are completed, the status evaluation module computes the needed metrics and prints the report of the test case to an output file. So a contract between the two layers has to be established safely, without affecting the existing computations, in order to compute a new metric.

In our case, we introduced a hash map data structure in the Cassandra Database Interface Layer to account for each I/O operation. During a write/update operation, the concatenation of the row-key and the column name forms the key of the hash map, and the column timestamp its value. During a read operation, the timestamp of the retrieved data is compared with the timestamp registered for that data in the hash map. If both values agree, the read is considered consistent; otherwise the stale read count is incremented by one.
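The bookkeeping described above corresponds, in a simplified form, to the hedged sketch below; the class and method names are assumptions, and the real modification lives inside the Cassandra DB binding of YCSB.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch of the stale-read accounting added to the Cassandra
    // Database Interface Layer: writes remember the timestamp applied to each
    // (row-key, column) pair, reads compare the retrieved timestamp against it.
    public class StaleReadTracker {

        // key = rowKey + columnName, value = timestamp of the last write issued by YCSB
        private final Map<String, Long> lastWritten = new ConcurrentHashMap<>();
        private long staleReads = 0;

        public void recordWrite(String rowKey, String columnName, long timestamp) {
            lastWritten.put(rowKey + columnName, timestamp);
        }

        // Returns true if the read saw the last recorded write, false for a stale read.
        public synchronized boolean recordRead(String rowKey, String columnName, long readTimestamp) {
            Long expected = lastWritten.get(rowKey + columnName);
            if (expected != null && readTimestamp < expected) {
                staleReads++;
                return false;
            }
            return true;
        }

        public synchronized long staleReadCount() {
            return staleReads;
        }
    }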
After each operation, along with the other statistics of the operation, the stale read count is returned to the status evaluation module. In this way, the status evaluation module prints the number of stale reads encountered during the test suite in the output file along with the default statistics.

4.2 CaLibRe performance evaluation using YCSB

Figure 3: CaLibRe Performance Evaluation. (a) Read Latency, (b) Write Latency, (c) Stale Reads

4.2.1 Test Setup

The experiment was conducted on a cluster of 19 Cassandra/CaLibRe instances, comprising 4 medium, 4 small and 11 micro instances of Amazon EC2 [1], plus 1 large instance for the YCSB test suite. All instances ran Ubuntu Server LTS, 64 bit. The workload pattern used for the test suite is workload-a (Update-Heavy), with a Record Count of , an Operation Count of , a Thread Count of 10 and a Replication Factor of 3. The test case compares the performance of 19 Cassandra instances with different consistency options, ONE, ALL and QUORUM, against 19 CaLibRe instances with consistency level ONE. Since the number of stale reads encountered in a small test setup is low, partial update propagation is injected into the Cassandra and CaLibRe clusters in order to measure system performance under this scenario. Hence, during an update operation, instead of propagating the update to all the replicas, the update is not propagated to one of them: since the Replication Factor is 3, the update is propagated to only 2 of the 3 replicas.

4.2.2 Test Evaluation

Figures 3a, 3b and 3c respectively show the read latency, the write latency and the number of stale reads of Cassandra with different consistency options against Cassandra with the LibRe protocol. In Figure 3a, the entity ONE represents read and write operations with consistency option ONE, and the entity QUORUM represents read and write operations with consistency option QUORUM. The entity ONE-ALL represents operations with write consistency option ONE and read consistency option ALL.
The entity CALIBRE represents the performance of the proposed LibRe protocol built inside Cassandra.

From the read latency graph (Figure 3a), we can see clearly that the 95th percentile latency of CALIBRE remains the same as that of the other consistency options of Cassandra. The 99th percentile latency of CALIBRE and of Cassandra with consistency level ONE are the same, and better than the other options ONE-ALL and QUORUM. The minimum and average latencies of CALIBRE are slightly elevated compared to Cassandra with consistency option ONE, but better than with consistency options QUORUM and ALL. On the other hand, looking at the write latency graph (Figure 3b), the consistency options ONE and ONE-ALL both perform the write only once, so the main comparator for CALIBRE during writes is the consistency option QUORUM. From the graph, it is clear that CALIBRE is better than consistency option QUORUM on all points: 99th percentile, minimum and average latencies. Compared to the entities ONE and ONE-ALL, a very small elevation is seen for CALIBRE, but it does not appear significant. Looking at the last graph (Figure 3c, stale reads), the number of stale reads recorded with consistency option ONE is clearly higher than with the other options. A few stale reads were found with the other consistency options too, but they are negligible compared to the number of requests. Hence it can be concluded that, except for consistency option ONE, the options ONE-ALL, QUORUM and CaLibRe provide tight consistency. In conclusion, CALIBRE provided tight consistency with better read and write latencies than the popular combinations of Cassandra's native consistency options.

5 Conclusion

The work described in this document aims at studying the tradeoff between consistency, latency and availability in modern storage systems. The idea was to track the updates of some of the data items in order to offer better tradeoffs in the system. The performance of the system with the proposed idea, LibRe, was evaluated in a real-world distributed data storage system: Cassandra. Since the number of stale reads encountered in a small test setup is low, partial update propagation was injected into the test setup to account for system performance under this scenario. The performance results lead to several conclusions regarding read and write latencies, and show that the LibRe protocol offers a better tradeoff between consistency, latency and availability in a distributed data storage system. In other words, if we can track the updates of a few vulnerable data items, the distributed data storage system can maintain a better tradeoff between consistency, latency and availability. Additional facts unveiled by our research include: restricting the forwarding of read requests to stale nodes preserves strong consistency; strong consistency can be achieved irrespective of the number of replicas that are up or down; and strong consistency can be achieved with minimal read and write guarantees. Measuring the consistency of a distributed data storage system remains challenging, both via simulations and via benchmarking the system with live workloads. Hence, in order to benchmark the system with live workloads, we extended one of the popular benchmarking tools, YCSB, to account for the consistency aspect of the system along with the existing metrics offered by YCSB.
6 Future Works

Most of the popular distributed data storage systems offer different levels of consistency and let the user/application choose the desired consistency level based on the severity of the request. But as user and system behavior changes dynamically over time, deciding on a consistency level in advance becomes tricky for application developers. Adding intelligence to the system to predict the needed consistency level based on user and/or system behavior is a broad area of research. With the help of statistical information, it is possible to predict the user's future behavior; for example, in the case of time-series data, the data that is accessed most over time follows a similar behavior in the future with respect to some influential factors. Some use cases of this type demand strong consistency at peak points and reduced consistency at non-peak points. In such use cases, in order to avoid overloading the system by applying strong consistency at the peak point, where the number of users trying to access a particular data item is high, the LibRe protocol can be used to ensure the consistency of the data with less load on the system. A sample use case that we are currently experimenting with is the Velib use case [3, 2]: an application that gives information about the number of bikes and/or bike slots available before taking or parking a bike at a Velib station. The article [5] gives background about the use case and sorts the Velib stations into three clusters: Balanced, Overloaded and Underloaded. The idea is to use the LibRe protocol if the request occurs during peak time and the needed information concerns a station that belongs to the Overloaded or Underloaded cluster.
References

[1] Amazon AWS.
[2] Velib - Mairie de Paris.
[3] Velib - public bicycle sharing system in Paris.
[4] A. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. In Internet Mathematics.
[5] Y. Chabchoub and C. Fricker. Classification of the Velib stations using k-means, dynamic time warping and DBA averaging method. IWCIM 14, IEEE Computer Society, Nov.
[6] F. Dabek. A Distributed Hash Table. PhD thesis, Massachusetts Institute of Technology, Nov.
[7] E. Hewitt. Cassandra: The Definitive Guide. O'Reilly Media, Inc., 1st edition.
[8] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Annual Technical Conference, USENIX ATC 10, pages 11-11, Berkeley, CA, USA. USENIX Association.
[9] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):35-40, Apr.
[10] J. Pokorny. NoSQL databases: A step to database scalability in web environment. In Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services, iiWAS 11, New York, NY, USA. ACM.
[11] D. Pritchett. BASE: An ACID alternative. Queue, 6(3):48-55, May.
[12] R. Ramakrishnan. Database Management Systems. McGraw-Hill, third edition.
[13] Y. Saito and M. Shapiro. Optimistic replication. ACM Comput. Surv., 37(1):42-81, Mar.
[14] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop Distributed File System. In Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pages 1-10, May.
[15] E. Sit. Storing and Managing Data in a Distributed Hash Table. PhD thesis, Massachusetts Institute of Technology, June.
[16] T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 1st edition.
[17] Wikipedia. Content delivery network.
[18] H. Zhang, Y. Wen, H. Xie, and N. Yu. Distributed Hash Table - Theory, Platforms and Applications. Springer Briefs in Computer Science. Springer.
