A method for handling multi-institutional HL7 data on Hadoop in the cloud


1 A method for handling multi-institutional HL7 data on Hadoop in the cloud
Masamichi Ishii *1, Yoshimasa Kawazoe *1, Akimichi Tatsukawa *2, Kazuhiko Ohe *2
*1 Department of Planning, Information and Management, The University of Tokyo Hospital, Japan
*2 Department of Medical Informatics and Economics, Graduate School of Medicine, The University of Tokyo, Japan
AGENDA
1. Introduction
2. Technology Brief
3. Implementation Processes & Outcomes
4. Conclusion

2 Many complaints. EMR systems are widespread, but clinicians (who are often also researchers) voice many complaints. Their demands for information access, both for clinical research and for evidence-based medicine, go unmet: they get little in return for the effort they spend keying data into EMRs.

3 Purposes of querying clinical data Classified and subtotaled 195 retrieval request sheets submitted by clinicians working for The University of Tokyo between 2006 and Figure 1. Purposes of querying clinical data

4 IF direct retrieval of the clinical data were available

5 Slowdown of on-premise DWH. The larger a data warehouse (DWH) grows, the longer each query job takes; run time rises linearly with data volume. Improving performance means scaling up the IT infrastructure, which leads to greater costs (perhaps exponentially so). Most DWHs are based on RDB technology, which requires ongoing maintenance as well as a thorough understanding of the relevant table schemas before you can launch a query job.

6 SS-MIX is growing. In Japan, an increasing number of medical institutions exchange medical records via SS-MIX. If on-premises SS-MIX storages were integrated, they would become clinical Big Data. SS-MIX: the Standardized Structured Medical-record Information eXchange, the Japanese de facto standard for HL7-format medical records.

7 IF multi-institutional SS-MIX storages became integrated, what would you want from them? Rapid, direct retrieval of clinical Big Data would become available.

8 Challenges: how to make the most of secondary use of clinical records. Our goal: providing IT infrastructure that makes it easier for clinicians to retrieve what they want from clinical Big Data directly, by themselves.

9 Technology Brief. Cloud computing technology plus a distributed processing architecture gives a convenient, cost-effective, and scalable environment in which clinical data can be shared among different medical institutions. Our prototype system is designed to meet these challenges. We address how the effective use of Hadoop and Pig can benefit clinicians in your medical institution who are eager to carry out EBM or epidemiological studies.

10 Components of IT infrastructure. We selected Hadoop as the distributed processing framework. To ensure future scalability, we built the Hadoop cluster in the cloud. To assure direct access to Big Data, we selected Apache Pig as the data retrieval component.

11 What's Pig? "Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets." (quoted) However, Pig has no built-in function for parsing HL7 messages (semi-structured data).

12 Newly developed tools. We developed a set of specific utilities to minimise clinical data search and retrieval time:
(1) Data migration tool: merges HL7 files and converts them to a file optimised for distributed processing in Hadoop
(2) HL7 message tabulating tool: user-defined functions (UDFs) for parsing HL7
These tools help users store, manage, and retrieve HL7 data on Hadoop in the cloud.

13 (1) Data migration tool. Efficient query execution in Hadoop depends on file size. The average size of an HL7 message (2-4 KB) is too small for distribution on Hadoop (default block size 64 MB).
The data migration tool reads the SS-MIX storage (HL7 message files) and:
- merges the files according to data type
- converts each HL7 file into one line (line feed code -> <br> tag; <EOF> -> line feed code)
- makes personal information anonymous
- encodes each file as UTF-8
- adds key values
Output: ADT for HDFS, PPR for HDFS, OUL for HDFS, RDE for HDFS, RAS for HDFS
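The merge-and-flatten step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function names `flatten_hl7` and `merge_messages` are hypothetical, the `<br>` marker is taken from the slide, and anonymisation and key values are left out.

```python
BR_TAG = "<br>"  # in-line marker that replaces segment breaks, as on the slide


def flatten_hl7(message):
    """Collapse one multi-line HL7 message into a single line.

    HL7 v2 segments normally end with a carriage return; CRLF and LF
    are tolerated as well. Empty trailing segments are dropped.
    """
    normalized = message.replace("\r\n", "\n").replace("\r", "\n")
    segments = [s for s in normalized.split("\n") if s]
    return BR_TAG.join(segments)


def merge_messages(messages):
    """Return the text of a merged file: one flattened message per line.

    The caller is assumed to write the result as UTF-8 (hypothetical
    usage: open(path, "w", encoding="utf-8").write(merged)).
    """
    return "".join(flatten_hl7(m) + "\n" for m in messages)


if __name__ == "__main__":
    demo = ["MSH|a\rPID|b\r", "MSH|c\r\nPID|d\r\n"]
    print(merge_messages(demo), end="")
```

Keeping one message per line means a line-oriented loader can split the merged file at any line boundary without cutting a message in half.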

14 (1) Data migration tool (adding key values)
HL7 filename (SS-MIX) conventions:
Patient id _ Transaction Date/Time _ Data Type _ Placer Order Number _ Date/time Of Message _ diagnosis and treatment department _ condition flag
OUL ^脳外^L
HL7 message (contents), stored as one line:
<HT> <HT>OUL-01<HT> <HT> <HT>08^diabetes^L<HT>1<HT> MSH ^~\& OUL^R22^OUL_R22 <br> PID ^^^^PI 匿名^患者名^^^^^L^I <br> PV O 32^^^^^C 32<br> SPM ^全血(添加物入り)^JC10^84^血漿^99Z13 <br> OBR _ E518^血糖尿糖^99Z <br> ORC SC _01 <br> OBX <br> OBX <LF>
LEGEND: <HT>: horizontal tabulation 0x09; <LF>: line feed 0x0a
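A rough sketch of how key values could be derived from the filename and prepended to the one-line message. It assumes that the seven filename parts never contain an underscore themselves and that every part becomes a key; which keys the authors' tool actually emits is not fully recoverable from the slide. The names `KEY_FIELDS`, `key_values_from_filename`, and `keyed_line`, and the example filename, are hypothetical.

```python
# Field order follows the SS-MIX filename convention listed on the slide.
KEY_FIELDS = (
    "patient_id",
    "transaction_datetime",
    "data_type",
    "placer_order_number",
    "message_datetime",
    "department",
    "condition_flag",
)


def key_values_from_filename(filename):
    """Split an SS-MIX style filename into its underscore-separated parts."""
    name = filename.rsplit("/", 1)[-1]  # drop any directory prefix
    parts = name.split("_")
    if len(parts) != len(KEY_FIELDS):
        raise ValueError("unexpected filename layout: " + filename)
    return parts


def keyed_line(filename, flattened_message):
    """Prefix a flattened message with tab-separated key values (<HT> = 0x09)."""
    return "\t".join(key_values_from_filename(filename) + [flattened_message])


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    fname = "0000001_20100601_OUL-01_000000012345_20100601120000000_08_1"
    print(keyed_line(fname, "MSH|x<br>PID|y"))
```

With the keys in leading tab-separated columns, a query can filter on patient, date, or data type without parsing the HL7 payload in the last column.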

15 (2) HL7 message tabulating tool
SS-MIX storage (HL7 message files):
MSH ^~\& OUL^R22^OUL_R22
PID ^^^^PI 匿名^患者名^^^^^L^I
PV O 32^^^^^C 32
SPM ^全血(添加物入り)^JC10^84^血漿^99Z13
OBR _ E518^血糖尿糖^99Z
ORC SC _01
OBX 0001 NM 3D ^グリコヘモグロビンA1c 全血(添加物入り)^JC10^ _84^ヘモグロビンA1c(JDS)^99Z % ^%^99Z F
SPM ^血清^JC10^85^血清^99Z13
OBR _ E564^生化学免疫^99Z
ORC SC _01
OBX 0001 NM 3C ^クレアチニン 血清^JC10^ _85^クレアチニン(Cre) ^99Z mg/dl^mg/dl^99z F
Pig script:
REGISTER /usr/lib/pig/p4udf.jar;
-- Bind the HL7-parsing UDF to the list of fields to extract
DEFINE NSSMIX p4udf.normalizessmix('pid_3_1 OBR_7 OBX_3_1 OBX_3_2 OBX_5');
-- Create relations & schemas
a = LOAD 'SSMIX2/OML UTF-8.ssmix2.log' AS (aa,bb,cc,dd,ee,ff,ssmix:chararray);
b = FOREACH a GENERATE NSSMIX(ssmix) AS MEISAI;
Result:
{ ( , , 3D , グリコヘモグロビンA1c 全血(添加物入り), 4.9 ), ( , , 3C , クレアチニン 血清, 1.11 ) }
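To make the tabulation concrete, here is a small Python sketch of what a UDF like the one above might do for the field list `pid_3_1 OBR_7 OBX_3_1 OBX_3_2 OBX_5`: emit one row per OBX segment, carrying the patient ID and the most recent OBR-7 along. This illustrates the idea and is not the authors' Java UDF; the sample message and its codes are invented, and HL7 repetitions and escape sequences are ignored.

```python
def field(fields, index, component=None):
    """Return HL7 field `index` (1-based after the segment name),
    optionally narrowed to a 1-based `^` component; "" if absent."""
    value = fields[index] if index < len(fields) else ""
    if component is None:
        return value
    comps = value.split("^")
    return comps[component - 1] if component - 1 < len(comps) else ""


def tabulate(one_line_message, segment_sep="<br>"):
    """Turn a flattened OUL message into rows of
    (PID-3.1, OBR-7, OBX-3.1, OBX-3.2, OBX-5)."""
    patient_id = ""
    collected_at = ""
    rows = []
    for segment in one_line_message.split(segment_sep):
        fields = segment.split("|")
        if fields[0] == "PID":
            patient_id = field(fields, 3, 1)
        elif fields[0] == "OBR":
            collected_at = field(fields, 7)  # applies to the OBX lines that follow
        elif fields[0] == "OBX":
            rows.append((patient_id, collected_at,
                         field(fields, 3, 1), field(fields, 3, 2),
                         field(fields, 5)))
    return rows


# Invented sample message; identifiers and codes are placeholders.
SAMPLE = "<br>".join([
    r"MSH|^~\&|LAB|HOSP|||20100601120000||OUL^R22^OUL_R22|1|P|2.5",
    "PID|||P0001^^^^PI||ANON^PATIENT",
    "OBR|1|||E518^glucose^99Z|||20100601093000",
    "OBX|1|NM|3D045^HbA1c^JC10||4.9|%|||||F",
    "OBR|2|||E564^chemistry^99Z|||20100601093000",
    "OBX|1|NM|3C015^Creatinine^JC10||1.11|mg/dl|||||F",
])

if __name__ == "__main__":
    for row in tabulate(SAMPLE):
        print(row)
```

All values come back as strings; a real query would cast OBX-5 to a number before comparing it with a threshold.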

16 Implementation Process & Outcomes
1) Setting up the Hadoop cluster on a cloud service
2) Investigating query pattern requirements
3) Merging HL7 messages
4) Adding key values to each HL7 message
5) Migrating merged files to HDFS
6) Querying clinical data with Pig scripts
7) Preliminary evaluation

17 1) Setting up the Hadoop Cluster on a Cloud Service
Figure 2. Hadoop Cluster (10 nodes in a cloud)
- Job / Name Node: cdh-00
- Data Nodes (HDFS): cdh-01 to cdh-10, each running CentOS V5.6 with 6 virtual CPUs, 16GB memory, and a 100GB virtual disk
- Client PC to launch Pig scripts: pig-01

18 2) Investigating Query Pattern Requirements. We classified and subtotaled 195 retrieval request sheets submitted by clinicians working for Tokyo Univ.
Figure 3. Frequency of using Data Type
- Dosing (RDE): 39
- Diagnosis (PPR): 67
- Lab. results (OUL): 35
- Patient property (ADT): 138
Figure 4. Frequency of data types used in the search requests

19 3) Merging HL7 messages 4) Adding key values to each HL7 message 5) Migrating merged files to HDFS
Data Flow Diagram (Overview):
- HIS/EHR sources: patient information; medication, injection; laboratory results; x-ray reports, CT reports, MRI reports, etc.; diagnosis; pathological diagnosis; endoscopic diagnosis; physiology diagnostics (electrocardiogram)
- SS-MIX server -> SS-MIX storage (standardized, HL7 V2.5)
- 3) 4) Data migration tool
- Gateway server, cloud computing services: 5) file transport / upload to HDFS
- Multi-institutional medical storage: HDFS for Institution A, Institution B, Institution C, ... n (each supplying HL7 V2.5)
The merged files consist of
- 450,000 patients' properties (ADT)
- 3.5 million disease diagnosis records (PPR)
- 10 million lab test result records (OUL)
- 2 million drug dosing records (RDE)
recorded between June 2010 and January

20 6) Querying clinical data with Pig scripts. Sample (benchmark query): retrieve the IDs of patients who had been diagnosed with type 2 diabetes mellitus, who visited as outpatients in 2010, and whose lab test results satisfied (HbA1c >= 6.5 %) and (CRE < 2.0 mg/dl). The interval between the two specimen collection dates/times must be within seven days.
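The lab-result part of this benchmark query can be sketched in plain Python to show the pairing logic. This is an assumption-laden illustration, not the authors' Pig script: the row layout `(patient_id, collected_at, test, value)`, the test labels `"HbA1c"` and `"CRE"`, and the sample data are invented, and the diagnosis and outpatient-visit filters are omitted (they would be further intersections on patient ID).

```python
from datetime import datetime, timedelta


def matching_patients(results, window=timedelta(days=7)):
    """Return sorted IDs of patients with HbA1c >= 6.5 and CRE < 2.0
    whose two specimens were collected within `window` of each other."""
    hba1c_times = {}
    cre_times = {}
    for patient_id, collected_at, test, value in results:
        if test == "HbA1c" and value >= 6.5:
            hba1c_times.setdefault(patient_id, []).append(collected_at)
        elif test == "CRE" and value < 2.0:
            cre_times.setdefault(patient_id, []).append(collected_at)

    matched = set()
    for patient_id, times in hba1c_times.items():
        others = cre_times.get(patient_id, [])
        # Any qualifying HbA1c/CRE pair inside the window is enough.
        if any(abs(t1 - t2) <= window for t1 in times for t2 in others):
            matched.add(patient_id)
    return sorted(matched)


# Invented sample rows for illustration.
SAMPLE_RESULTS = [
    ("P1", datetime(2010, 6, 1), "HbA1c", 7.1),
    ("P1", datetime(2010, 6, 5), "CRE", 1.1),   # 4 days apart: qualifies
    ("P2", datetime(2010, 6, 1), "HbA1c", 7.0),
    ("P2", datetime(2010, 7, 1), "CRE", 1.0),   # 30 days apart: too far
    ("P3", datetime(2010, 6, 1), "HbA1c", 5.0),  # below the HbA1c threshold
    ("P3", datetime(2010, 6, 2), "CRE", 1.0),
]

if __name__ == "__main__":
    print(matching_patients(SAMPLE_RESULTS))
```

In a distributed setting the same logic becomes two filters followed by a join on patient ID and a filter on the time difference.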

21 7) Preliminary evaluation. We defined benchmark queries and compared query performance in the cloud environment with performance in an on-premise DWH. Queries executed against the cloud-based HL7 storage were significantly faster than queries executed on the on-premise DWH. Figure 5. Result of sample benchmark query

22 Conclusion: outcomes and lessons learnt. A Hadoop cloud-based system designed to share HL7 messages between medical organisations has the potential to speed up data-retrieval queries. The system offers potential for efficient, fast data retrieval and substantial benefits to clinicians seeking information on specific medical data.

23 Acknowledgement. This research was funded by the Japan Society for the Promotion of Science (JSPS) through the Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program), initiated by the Council for Science and Technology Policy (CSTP).

24 The End (FIN)


More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information
