Virtual file system on NoSQL for processing high volumes of HL7 messages
Digital Healthcare Empowering Europeans
R. Cornet et al. (Eds.)
© 2015 European Federation for Medical Informatics (EFMI).
This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License.
doi: /

Eizen KIMURA¹ and Ken ISHIHARA
Dept. of Medical Informatics, Medical School of Ehime University

Abstract. The Standardized Structured Medical Information Exchange (SS-MIX) is intended to be the standard repository for HL7 messages, but it depends on a local file system, which limits its scalability. We implemented a virtual file system on NoSQL to bring modern computing technology into SS-MIX and to allow the system to integrate local patient IDs from different healthcare systems into a universal ID system. We describe its implementation on the MongoDB database and report its performance in a case study.

Keywords. HL7, Data Science, Distributed Computing, NoSQL, SS-MIX

Introduction

Leveraging big data analysis in the medical domain could break new ground in the management of lifestyle-related diseases and speed up drug development. The US government has announced the Big Data Research and Development Initiative, and as part of it the NIH announced that more than 200 terabytes of genomic data from the 1000 Genomes Project would be made available on Amazon Web Services [1]. In Japan, by contrast, the National Database (NDB) already contains more than 6.9 billion health insurance claims, yet there is still no concrete plan to develop a big data analysis framework. In 2006, the Ministry of Health, Labour and Welfare introduced the Standardized Structured Medical Information Exchange (SS-MIX) project to promote the exchange of health information among institutions [2].
The design concept of SS-MIX aims at simplicity: it uses standard file systems and stores HL7 messages in a standard directory structure. However, its development started before the Internet era, and it is intended for use with local file systems. The idea was that a clinic or hospital could provide patient data on a portable storage device (e.g., CD-ROM or USB memory stick), enabling a patient to take the data to another institution. SS-MIX also lacks a distributed data processing scheme and has limited scalability. Moreover, Japan has no nationwide patient ID system; instead, there are institution-specific IDs, regional patient IDs, clinical research registration IDs, and so on. A scheme is needed to aggregate these IDs and the associated medical records and to enable cross-sectional analysis. One way forward would be to preserve the simplicity of SS-MIX while adding large-scale storage, high-speed search, distributed processing, and the ability to aggregate multiple patient IDs into unique nationwide IDs.

Google has already built a distributed data storage system called BigTable [3]. By separating user data based on metadata, it offers an individualised user experience, even though all of the resources of every user are accumulated in a single cloud storage system [4]. In the present study, we leverage cloud technology to aggregate all patient medical records in SS-MIX, improve its search and distributed processing performance, and share medical information securely with stakeholders. To this end, we applied the same metadata scheme used in BigTable.

¹ Corresponding Author: Eizen Kimura, Medical School of Ehime University.

[Fig. 1 Storing and MapReduce HL7 messages on virtual file system. Panels: Original Raw HL7 Message; Serialized XML Message; BSON Message; Import BSON Messages into MongoDB Sharding Clusters (Routing Server (mongos), Config Server (mongod), Sharding Nodes (mongod)); Mongo Map Reduce Framework (Query, Mapping, Shuffling, Reducing, Final Result). Example question: the average cholesterol value of male adults, selecting HL7 records where sex is male (PID-8), birth date is between 1936/04/01 and 1972/03/31 (PID-7), and OBX has a JHDL entry.]

1. Methods

The virtual file system uses MongoDB (version 2.4.9) as the NoSQL backend. MongoDB offers distributed processing on multiple nodes via sharding [3]; our cluster consisted of 10 nodes. One node runs the routing server (mongos), which mediates between clients and the sharding nodes, one runs the config server (mongod), which holds the cluster metadata, and the remaining nodes process the distributed data. Each node is deployed on the Science Cloud [5] and runs CentOS 5.7 (64-bit) on an Intel Xeon X5675 at 3.97 GHz with 12 cores, 96 GB of RAM, and 2 × 10 Gbps network connections. The General Parallel File System (GPFS) [6] was built on a RAID 6 system consisting of 600 3-TB/7200-rpm hard disks; we linked this system to the 10 nodes using 10 Gbps connections.

Because MongoDB uses Binary JSON (BSON) as its internal representation [7], we developed a tool that converts raw HL7 version 2.x messages into BSON format and stores the converted messages under an HL7Message node of the document (Figs. 1, 2). The tool decomposes the message at every HL7 separator, arranges its contents according to the hierarchical structure of the BSON document, and assigns consecutive numbers to each element (e.g., OBX_3_0, OBX_3_1). It also extracts the patient ID, institution ID, and message type from the original HL7 message and arranges them as metadata that simulates SS-MIX standard storage under the SS-MIX node of the BSON document (Fig. 1).

The virtual file system that simulates SS-MIX storage was developed using Filesystem in Userspace (FUSE) [8]. It mounts a virtual SS-MIX storage on the host, translates file system requests into MongoDB queries, and returns the query results as file system responses. The FUSE module was written in Ruby using the FUSEfs module. It simulates the file system hierarchy using the metadata under the SS-MIX node in the BSON document (Fig. 1).

The tool for aggregating patient IDs adds an entry containing a universal patient ID and a new institution ID to the existing metadata under the SS-MIX node. This makes it possible to search all medical records for any given patient across healthcare facilities.

The system must be able to register data from nationwide healthcare systems efficiently and in real time, so we performed several evaluations. First, to assess the relationship between the number of sharding nodes and data registration performance, we averaged five measured processing times. We simulated a case in which each of 40 clients sent a request containing 100 HL7 messages and repeated this 500 times; thus, we measured the time required to register 2 million HL7 messages (40 × 100 × 500).
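The HL7-to-BSON decomposition described above can be illustrated with a minimal JavaScript sketch. It assumes the key-numbering scheme visible in Fig. 1: a key is the segment name plus the position of the field after splitting on "|" (so MSH_1 holds the encoding characters and PID_8 holds sex), components split on "^" get a further index, and empty values become null. The function names `toFieldValue` and `hl7ToDocument`, the sample message, and the rule that only repeated segment types (such as OBX) become arrays are hypothetical; repetitions (~), subcomponents (&), and escape sequences are left undecomposed. This is an illustration, not the authors' converter.

```javascript
// Hypothetical sketch of the HL7 v2.x -> nested-document conversion.
// Only field (|) and component (^) separators are decomposed; repetitions (~),
// subcomponents (&) and escape sequences are kept as plain text (assumption).

function toFieldValue(segName, index, raw) {
  // Empty fields become null, as in the BSON example of Fig. 1.
  if (raw === "") return null;
  // MSH_1 holds the encoding characters themselves, so it must not be split.
  if (segName === "MSH" && index === 1) return raw;
  if (!raw.includes("^")) return raw;
  const components = {};
  raw.split("^").forEach((comp, j) => {
    components[`${segName}_${index}_${j}`] = comp === "" ? null : comp;
  });
  return components;
}

function hl7ToDocument(message) {
  const grouped = {};
  // HL7 v2 segments are terminated by carriage returns; \n is also accepted.
  for (const line of message.split(/\r\n|\r|\n/)) {
    if (line.trim() === "") continue;
    const fields = line.split("|");
    const segName = fields[0];
    const segment = {};
    fields.forEach((raw, i) => {
      segment[`${segName}_${i}`] = toFieldValue(segName, i, raw);
    });
    (grouped[segName] = grouped[segName] || []).push(segment);
  }
  // Assumption: a segment type occurring once is stored as an object,
  // a repeated one (e.g. OBX) as an array, mirroring Fig. 1.
  const hl7 = {};
  for (const [name, segs] of Object.entries(grouped)) {
    hl7[name] = segs.length === 1 ? segs[0] : segs;
  }
  return { HL7Message: hl7 };
}

// Hypothetical sample message (PID values are invented for illustration).
const sample = [
  "MSH|^~\\&|XXXX|C|PRIORITYHEALTH",
  "PID|1||12345||LASTNAME^FIRSTNAME^INIT||19600101|M",
  "OBX|1|NM|JHDL^HDL Cholesterol (CAD)|1|62|CD:289^mg/dL|>40^>40",
  "OBX|2|NM|JTRIG^Triglyceride (CAD)|1|72|CD:289^mg/dL|35-150^35^150",
].join("\r");

const doc = hl7ToDocument(sample);
console.log(JSON.stringify(doc.HL7Message.OBX[0].OBX_3));
// prints {"OBX_3_0":"JHDL","OBX_3_1":"HDL Cholesterol (CAD)"}
```

Because MongoDB can query dotted paths through both embedded objects and arrays, a filter such as "HL7Message.OBX.OBX_3.OBX_3_0" matches whether a message carries one OBX segment or several.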
We repeated the registration test described above with the number of sharding nodes varied from one to eight. Next, we investigated how the number of concurrent connections and the number of bulk-transferred HL7 messages affected registration performance. With eight sharding nodes, we measured the average number of registrations per second while varying the number of concurrent connections (from 1 to 40 clients) and the number of HL7 messages per request (1000, 2000, 4000, and 8000), repeating each run 500 times.

To evaluate distributed data processing performance, we prepared a MapReduce scenario that collected laboratory data on the high-density lipoprotein (HDL) cholesterol levels of men in a specified age group (birth dates between 1936/04/01 and 1972/03/31; see Fig. 1). First, the system ran a query to narrow the HL7 records down to those matching our conditions (sex, PID-8; birth date, PID-7; and a laboratory test result whose OBX-3 is JHDL). In the Map step, the system extracts the JHDL test results from the OBX-5 field of the OBX segments in the matched HL7 messages. In the Reduce step, it counts the laboratory tests and sums the test result values from every node. In the Finalize step, it calculates the average HDL cholesterol value from the collected values. We ran this process 10 times while changing the number of nodes and the number of HL7 messages involved, and determined the average processing time.

2. Results

The average size of the HL7 messages was 824 bytes, and that of the BSON-converted messages was 3568 bytes. When 100 million HL7 messages were stored in MongoDB, their physical volume was GB. Figure 3 shows the relationship between the number of nodes and data registration performance. Registration performance increased up to four nodes and remained constant thereafter. Figure 4 shows the relationship between
the number of concurrent connections and the number of bulk-transferred HL7 messages. Registration performance improved as the number of concurrent connections increased, up to 34 simultaneous connections, where it peaked at 7664 messages per second; thereafter it reached a plateau. The number of bulk-transferred messages had no impact on overall performance. As long as each node held fewer than 30 million messages, the processing time was inversely proportional to the number of nodes: it fitted t = a/x (s) with R² = 0.987, where x is the number of nodes and a is a fitted constant, which indicates O(n)-order scalability.

[Fig. 2 SS-MIX schema on MongoDB]
[Fig. 3 Performance of bulk message transfer]
[Fig. 4 Performance of MongoDB sharding]
[Fig. 5 MapReduce processing time]

3. Discussion

MongoDB uses sharding keys to maintain O(n)-order search performance overall as sharding nodes are added. We had to use an evenly distributed assigned ID, rather than the patient ID, as the sharding key, because the distribution of patient IDs was known to vary considerably. This method shows high scalability in processing cross tabulations because it reduces the need to cross-reference data on other nodes. Because MongoDB depends on memory-mapped files [9], its performance drops sharply when its data exceed the capacity of the server memory. In our tests, performance degraded once 30 million messages were stored on a single node. MongoDB is a document-oriented NoSQL database that uses the BSON format as its internal storage representation and allows indexing of all document contents. Hence, we believe that MongoDB is suitable as a NoSQL infrastructure for structured documents such as HL7 CDA R2. Our system converts raw HL7 messages into BSON
format, and it proved to be scalable. Assuming servers similar to those used in the present study, 55 nodes would be sufficient to process one billion HL7 records from all healthcare institutions in Japan. Our system could run a MapReduce process in minutes and handle real-time streaming of laboratory results to detect anomalies, such as signs of the spread of an infectious disease. However, in our tests, registration performance eventually reached a plateau at the routing server. Hence, we will need to add routing server nodes to avoid this bottleneck in the future.

A previous study [10] used a system configuration similar to ours. The main difference is that it stores data files on a legacy file system: it builds metadata indexes for the files, stores them in MongoDB, and provides a virtual file system that shows the files selected by queries against the metadata. Our system, by contrast, stores the data directly in MongoDB and adds metadata for simulating a virtual file system, which overcomes the performance limitations of legacy file systems. Each healthcare setting can mount its own data through the virtual file system, separated from the data of other healthcare settings. Although our approach does not require a pre-existing file system, providing the virtual file system was necessary for compatibility with legacy applications, because SS-MIX storage requires a file system.

HL7 has been developing an innovative standards framework, Fast Healthcare Interoperability Resources (FHIR), for sharing medical information [11]. FHIR adopts JSON as a standard representation format, and JSON maps directly onto BSON. Therefore, FHIR documents stored in MongoDB could immediately become primary targets of parallel distributed processing. We will verify whether FHIR is a suitable format for distributed processing in cloud computing.
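The MapReduce scenario in the Methods section can be illustrated with the following JavaScript sketch, which runs the query, map, reduce, and finalize steps over an in-memory array instead of a sharded cluster. It is reconstructed from the functions shown in Fig. 1 under stated assumptions: the field paths follow Fig. 1, birth dates are assumed to be stored as YYYYMMDD strings, `parseFloat` is swapped in for `parseInt` so decimal results are not truncated, non-numeric results are skipped in map, and the helper names `matches` and `runMapReduce` as well as the sample documents are hypothetical. In the mongo shell, the same three functions would instead be passed to `db.<collection>.mapReduce(map, reduce, { query: ..., finalize: finalize, out: ... })`, with the sex and birth-date conditions expressed in `query`.

```javascript
// Sketch of the HDL-cholesterol MapReduce scenario, evaluated in memory.
// Field paths follow Fig. 1; the YYYYMMDD birth-date format, helper names
// and sample documents are hypothetical.

let emitted = [];
// Stand-in for MongoDB's global emit() available inside map functions.
function emit(key, value) {
  emitted.push({ key, value });
}

const map = function () {
  // One OBX may be stored as an object, several as an array.
  const obxList = [].concat(this.HL7Message.OBX || []);
  for (const obx of obxList) {
    if (obx.OBX_3 && obx.OBX_3.OBX_3_0 === "JHDL") {
      const value = parseFloat(obx.OBX_5);
      // Skip non-numeric results here: MongoDB does not call reduce for a key
      // that has only one value, so a NaN would otherwise reach finalize.
      if (!isNaN(value)) emit("JHDL", { sum: value, count: 1 });
    }
  }
};

const reduce = function (key, values) {
  // Returns the same shape it receives, so partial results can be re-reduced.
  const reduced = { sum: 0, count: 0 };
  values.forEach(function (v) {
    reduced.sum += v.sum;
    reduced.count += v.count;
  });
  return reduced;
};

const finalize = function (key, reduced) {
  return { sum: reduced.sum, count: reduced.count, average: reduced.sum / reduced.count };
};

// Stand-in for the query option: male (PID-8) and birth date (PID-7) in range.
// The JHDL condition on OBX-3 is enforced inside map.
function matches(doc) {
  const pid = doc.HL7Message.PID;
  return pid.PID_8 === "M" && pid.PID_7 >= "19360401" && pid.PID_7 <= "19720331";
}

// Query -> map -> group by key (shuffle) -> reduce -> finalize.
function runMapReduce(docs) {
  emitted = [];
  docs.filter(matches).forEach((doc) => map.call(doc));
  const groups = new Map();
  for (const { key, value } of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    result[key] = finalize(key, reduced);
  }
  return result;
}

const docs = [
  { HL7Message: { PID: { PID_7: "19600101", PID_8: "M" },
    OBX: [{ OBX_3: { OBX_3_0: "JHDL" }, OBX_5: "62" },
          { OBX_3: { OBX_3_0: "JTRIG" }, OBX_5: "72" }] } },
  { HL7Message: { PID: { PID_7: "19550505", PID_8: "M" },
    OBX: { OBX_3: { OBX_3_0: "JHDL" }, OBX_5: "48" } } },
  { HL7Message: { PID: { PID_7: "19650303", PID_8: "F" },
    OBX: { OBX_3: { OBX_3_0: "JHDL" }, OBX_5: "70" } } },
];
console.log(JSON.stringify(runMapReduce(docs)));
// prints {"JHDL":{"sum":110,"count":2,"average":55}}
```

Keeping the reduce output in the same `{ sum, count }` shape as the emitted values is what allows each sharding node to reduce its own partial results before they are combined, with the average computed only once in finalize.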
Acknowledgements: Data processing and other research were performed using the NICT Science Cloud at the National Institute of Information and Communications Technology (NICT) as a collaborative research project. This work was supported by MEXT KAKENHI Grant Number

References
[1] Office of Science and Technology Policy. Obama Administration Unveils Big Data Initiative: Announces $200 Million in New R&D Investments. 2012; Available from:
[2] Kimura M, Nakayasu K, Ohshima Y, Fujita N, Nakashima N, Jozaki H, et al. SS-MIX: A Ministry Project to Promote Standardized Healthcare Information Exchange. Methods of Information in Medicine. 2011;50(2):131.
[3] Chodorow K. Scaling MongoDB. O'Reilly Media, Inc.;
[4] Cooper J. How Entities and Indexes are Stored. 2009; Available from:
[5] Murata KT, Watari S, Nagatsuma T, Kunitake M, Watanabe H, Yamamoto K, et al. A Science Cloud for Data Intensive Sciences. Data Science Journal. 2013;12:WDS139-WDS46.
[6] Schmuck FB, Haskin RL. GPFS: A Shared-Disk File System for Large Computing Clusters. FAST. 2002;2:19.
[7] Cattell R. Scalable SQL and NoSQL Data Stores. ACM SIGMOD Record. 2011;39(4):
[8] Szeredi M. Filesystem in Userspace. 2013; Available from:
[9] Parker Z, Poe S, Vrbsky SV. Comparing NoSQL MongoDB to an SQL DB. Proceedings of the 51st ACM Southeast Conference; Savannah, Georgia: ACM; p
[10] Jacobi MR, editor. Applied Parallel Metadata Indexing. 4th Annual Computing and Information Technology Student Mini-Showcase; 2012: Los Alamos National Laboratory (LANL).
[11] HL7. FHIR: Fast Healthcare Interoperability Resources [cited /17]; Available from:
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
Big Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
IMPLEMENTING GREEN IT
Saint Petersburg State University of Information Technologies, Mechanics and Optics Department of Telecommunication Systems IMPLEMENTING GREEN IT APPROACH FOR TRANSFERRING BIG DATA OVER PARALLEL DATA LINK
Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis
, 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying
CONFIGURATION GUIDELINES: EMC STORAGE FOR PHYSICAL SECURITY
White Paper CONFIGURATION GUIDELINES: EMC STORAGE FOR PHYSICAL SECURITY DVTel Latitude NVMS performance using EMC Isilon storage arrays Correct sizing for storage in a DVTel Latitude physical security
DYNAMIC QUERY FORMS WITH NoSQL
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 7, Jul 2014, 157-162 Impact Journals DYNAMIC QUERY FORMS WITH
NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO [email protected] DAMA SF December 15, 2011
NoSQL - What we ve learned with mongodb Paul Pedersen, Deputy CTO [email protected] DAMA SF December 15, 2011 DW2.0 and NoSQL management decision support intgrated access - local v. global - structured v.
Supporting in- and off-hospital Patient Management Using a Web-based Integrated Software Platform
Digital Healthcare Empowering Europeans R. Cornet et al. (Eds.) 2015 European Federation for Medical Informatics (EFMI). This article is published online with Open Access by IOS Press and distributed under
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon [email protected] [email protected] XLDB
How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)
Scalability Results Select the right hardware configuration for your organization to optimize performance Table of Contents Introduction... 1 Scalability... 2 Definition... 2 CPU and Memory Usage... 2
Performance Analysis of Book Recommendation System on Hadoop Platform
Performance Analysis of Book Recommendation System on Hadoop Platform Sugandha Bhatia #1, Surbhi Sehgal #2, Seema Sharma #3 Department of Computer Science & Engineering, Amity School of Engineering & Technology,
IBM Rational Asset Manager
Providing business intelligence for your software assets IBM Rational Asset Manager Highlights A collaborative software development asset management solution, IBM Enabling effective asset management Rational
.NET User Group Bern
.NET User Group Bern Roger Rudin bbv Software Services AG [email protected] Agenda What is NoSQL Understanding the Motivation behind NoSQL MongoDB: A Document Oriented Database NoSQL Use Cases What is
Using Synology SSD Technology to Enhance System Performance Synology Inc.
Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_SSD_Cache_WP_ 20140512 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges...
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
NoSQL web apps. w/ MongoDB, Node.js, AngularJS. Dr. Gerd Jungbluth, NoSQL UG Cologne, 4.9.2013
NoSQL web apps w/ MongoDB, Node.js, AngularJS Dr. Gerd Jungbluth, NoSQL UG Cologne, 4.9.2013 About us Passionate (web) dev. since fallen in love with Sinclair ZX Spectrum Academic background in natural
How to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc.
How to Ingest Data into Google BigQuery using Talend for Big Data A Technical Solution Paper from Saama Technologies, Inc. July 30, 2013 Table of Contents Intended Audience What you will Learn Background
EOFS Workshop Paris Sept, 2011. Lustre at exascale. Eric Barton. CTO Whamcloud, Inc. [email protected]. 2011 Whamcloud, Inc.
EOFS Workshop Paris Sept, 2011 Lustre at exascale Eric Barton CTO Whamcloud, Inc. [email protected] Agenda Forces at work in exascale I/O Technology drivers I/O requirements Software engineering issues
these three NoSQL databases because I wanted to see a the two different sides of the CAP
Michael Sharp Big Data CS401r Lab 3 For this paper I decided to do research on MongoDB, Cassandra, and Dynamo. I chose these three NoSQL databases because I wanted to see a the two different sides of the
Big Data Visualization and Dashboards
Big Data Visualization and Dashboards Boney Pandya Marketing Manager Greg Harris Systems Engineer Follow us @Jinfonet #BigDataWebinar JReport Highlights Advanced, Embedded Data Visualization Platform:
MakeMyTrip CUSTOMER SUCCESS STORY
MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently
Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database
Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a
Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000
Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline
NoSQL. Thomas Neumann 1 / 22
NoSQL Thomas Neumann 1 / 22 What are NoSQL databases? hard to say more a theme than a well defined thing Usually some or all of the following: no SQL interface no relational model / no schema no joins,
Manifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Tunebot in the Cloud. Arefin Huq 18 Mar 2010
Tunebot in the Cloud Arefin Huq 18 Mar 2010 What is Tunebot? What is Tunebot? http://tunebot.cs.northwestern.edu Automated online music search engine for query-by-humming (QBH). What is Tunebot? http://tunebot.cs.northwestern.edu
Business Process Management with @enterprise
Business Process Management with @enterprise March 2014 Groiss Informatics GmbH 1 Introduction Process orientation enables modern organizations to focus on the valueadding core processes and increase
Ad Hoc Analysis of Big Data Visualization
Ad Hoc Analysis of Big Data Visualization Dean Yao Director of Marketing Greg Harris Systems Engineer Follow us @Jinfonet #BigDataWebinar JReport Highlights Advanced, Embedded Data Visualization Platform:
An Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs Analysis A practical case of why, what and how to use a NoSQL Database Management System instead of a relational one José Manuel Ciges Regueiro
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
White Paper February 2010. IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario
White Paper February 2010 IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario 2 Contents 5 Overview of InfoSphere DataStage 7 Benchmark Scenario Main Workload
