De-Identified Personal Health Care System Using Hadoop
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 5, No. 6, December 2015, pp. 1492-1499

De-Identified Personal Health Care System Using Hadoop
Dasari Madhavi, B.V. Ramana
Department of Information Technology, AITAM, Tekkali, A.P.

Article history: Received Jun 12, 2015; Revised Aug 18, 2015; Accepted Sep 4, 2015
Keywords: Base64, Big Data, Hadoop, Health care records, MapReduce

ABSTRACT
Hadoop technology plays a vital role in improving the quality of health care by delivering the right information to the right people at the right time, while reducing cost and time. Core health care functions such as admission, discharge, and transfer maintain patient data in Computer-based Patient Records (CPR), Personal Health Information (PHI), and Electronic Health Records (EHR). Medical Big Data is increasingly used in health care services and clinical research, and the biggest challenge for health care centers is the huge amount of data that flows into their systems daily. Crunching this Big Data and de-identifying it with traditional data mining tools is problematic. To de-identify personal health information, the MapReduce application presented here uses jar files that combine MapReduce code and Pig queries, together with user-defined functions (UDFs), to protect the health care dataset. The de-identified personal health care system executes MapReduce jobs and Pig queries on the health care dataset: the application takes a dataset containing patient information as input and de-identifies the personal health information it contains. De-identification using Hadoop is also suitable for social and demographic data.

Copyright 2015 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Dasari Madhavi, Department of Information Technology, AITAM, Tekkali, A.P. [email protected]

1. INTRODUCTION
Big Data refers to large and complex datasets of any type that are difficult to process with existing data management tools or traditional data processing applications. Big Data is about real-time analysis and data-driven decision making. It plays a crucial role in health care, and several health institutes use it to analyze large volumes of information. Health care organizations generate huge amounts of patient data today, which can be used to improve the quality of patient care and program analysis. Traditional systems cannot normalize that data, because the increasing digitization of health care means organizations often add terabytes of patient records to their data centers annually.

1.1. Hadoop Innovation in Health Care Intelligence
Many organizations have discovered that their existing data mining and analysis techniques are simply not up to the task of handling Big Data. One possible solution to this problem is to build a Hadoop cluster. Hadoop is an open-source distributed data storage and analysis framework that can process large volumes of structured, unstructured, and semi-structured data. Health care data tends to reside in multiple places, such as EMRs or EHRs, radiology, and pharmacy systems.
Aggregating the data that comes from all over the organization into a central system, such as an Enterprise Data Warehouse (EDW), makes it accessible and actionable. Hadoop tools in the health care industry provide secure analysis of large volumes of patient data and, at the same time, improve the reliability of clinical outcomes. A successful outcome is, for example, the renewal of a prescription within the expected time period; Hadoop can store renewal information and tie it to social media content and online reminders. Hadoop technology can play a major role in the health care industry and is very useful to the public sector, where it can improve patient safety and security.

1.2. Literature Survey
The increasing digitization of health care information is being analyzed with new techniques to improve the quality of care and health care outcomes and to minimize costs. Organizations must analyze internal and external patient information to measure risk and outcomes more accurately. At the same time, many clients are working to increase data transparency to produce new insights. Praveen Kumar et al. [1] proposed that Hadoop, built on MapReduce, is a powerful tool for managing huge amounts of data, and that its ecosystem provides fault-tolerance techniques. Emad A. Mohammed et al. [2] argued that big clinical data analytics should emphasize modelling of whole interacting processes in clinical settings, where clinical datasets can evolve into ultra-large-scale datasets. Arantxa Duque Barrachina et al. [3] proposed Hadoop techniques for categorising large datasets. K. Divya et al. [4] used a progressive encryption scheme for protecting data. Hongsong Chen [6] presented a novel Hadoop-based biosensor Sunspot wireless network architecture combining an ECC digital signature algorithm, a MySQL database, and Hadoop HDFS cloud storage, which a security administrator can use to protect and manage key data. Lidong Wang et al. [7] based their work on SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis and Radio Frequency Identification (RFID) technology.

2. RESEARCH METHOD
The following platforms and tools are used for de-identifying the dataset with Big Data analytics in health care. The work follows this procedure: 1) data collection, 2) Hadoop cluster and MapReduce, 3) experiments.

2.1. Data Collection
In this paper the minimum size considered for Big Data is a petabyte. Big Data is available in many sectors, such as web and social networking, machine-to-machine communication, large-scale transactions, biometric sensor data, human-generated data, the gaming industry, agriculture, and education.

2.2. Hadoop Cluster and MapReduce
Hadoop is a software framework that allows processing of large datasets across large clusters of computers. The Hadoop Distributed File System (HDFS) is a Java-based distributed file system that can store all kinds of data without prior organization. MapReduce is a programming model for processing large sets of data in parallel. A Hadoop cluster connects HDFS and MapReduce, so programs can be implemented for the cluster.

2.3. Experiment
Hadoop is an open-source framework, written in Java by the Apache Software Foundation, used to write applications that process huge amounts of data. It works in parallel on large clusters, which may have thousands of compute nodes, and it processes the data in a reliable, fault-tolerant manner.
Hadoop can be installed on the Cloudera distribution; after the installation completes, the HDFS daemons start automatically. HDFS has two main node types: the Master Node and the Data Nodes. The Master Node, or Name Node, is the master of the system: it maintains the file system namespace and manages the Data Nodes, splitting data into blocks that are distributed across the slave nodes. The Data Nodes provide the actual storage and are responsible for serving read and write requests from clients. MapReduce is an algorithm, or concept, for processing huge amounts of data quickly; as its name suggests, a job is divided into a Mapper and a Reducer.
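As a brief, hedged illustration of this read/write path (not part of the original paper), the following sketch shows a client writing a record into HDFS and reading it back through the Hadoop FileSystem API; the path name is hypothetical.

// Illustrative sketch only: a client writing to and reading from HDFS.
// The Name Node resolves the path to block locations; the Data Nodes serve the bytes.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);            // client handle to HDFS
        Path path = new Path("/health/demo.txt");        // hypothetical path

        FSDataOutputStream out = fs.create(path, true);  // writes are pipelined to Data Nodes
        out.writeBytes("patient record placeholder\n");
        out.close();

        BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(path)));
        System.out.println(in.readLine());               // reads are served by Data Nodes
        in.close();
        fs.close();
    }
}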
Figure 1. Hadoop application and infrastructure interactions

2.4. Typical Forms of Data De-identification
The HIPAA Privacy Rule protects most individually identifiable health information. A couple of common data de-identification techniques can be deployed to enhance protection in the Hadoop environment, namely storage-level encryption and data masking.

2.4.1. Storage-Level Encryption
With storage-level encryption, the whole volume that the dataset is stored in is encrypted at the disk-volume level while at rest in the data store. This protects against unauthorized personnel who may have physically obtained the disk from being able to read anything from it, and it is a valuable control in a Hadoop cluster or any large data store because disk repairs and swap-outs are commonplace. However, it does not safeguard the data from access while the disk is running in the system: decryption is applied automatically when the data is read by the running process, so live, vulnerable data is fully exposed to any user or system that gains access to that process.

2.4.2. Data Masking
Data masking is a valuable technique for obfuscating sensitive data, most commonly used to produce test and development data from live production data. However, masked data is meant to be irreversible, which limits its value for many analytic functions and post-processing requirements. Furthermore, there is no guarantee that the particular masking transformation chosen for a given sensitive data field fully obfuscates it from identification, especially when correlated with other data in the Hadoop data lake [7], and certain masking strategies may or may not be approved by auditors and assessors, affecting whether they really meet regulatory compliance requirements and provide safe harbor in the event of a data breach.

2.5. Cryptography: Base64 Algorithm
Base64 is a technique designed to represent an arbitrary sequence of octets (8 bits) in a printable text form, which enables passing binary data through channels that are designed for plain ASCII text, such as SMTP (Postel, 1982). It also permits embedding binary data in media that support ASCII text only, such as XML files. The Base64 content-transfer-encoding, or Base64 encoding, is defined in the MIME specifications.

3. RESULTS AND DISCUSSION
3.1. Preliminary Data Preparation
This work uses a dummy patient health dataset collected from the HSCIC (Health & Social Care Information Centre), containing the fields patient name, patient id, date of birth, id, gender, disease, and disease id. The data is maintained as a CSV (Comma-Separated Values) file.
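As a small, hedged illustration of the Base64 scheme described in Section 2.5 (not taken from the paper itself), the snippet below encodes and decodes a single field value of such a record with Apache Commons Codec, the library that is added to the classpath in the next section. Note that Base64 is a reversible encoding, not true encryption.

// Illustrative only: Base64 encoding/decoding of one field value with Apache Commons Codec.
import org.apache.commons.codec.binary.Base64;

public class Base64FieldDemo {
    public static void main(String[] args) throws Exception {
        String name = "John Smith";  // hypothetical patient name field
        String encoded = new String(Base64.encodeBase64(name.getBytes("UTF-8")), "UTF-8");
        String decoded = new String(Base64.decodeBase64(encoded.getBytes("UTF-8")), "UTF-8");
        System.out.println(encoded); // prints Sm9obiBTbWl0aA==
        System.out.println(decoded); // prints John Smith
    }
}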
Figure 2. Data Preparation Procedure

3.2. Preliminary Data Analysis
The dataset is in CSV (Comma-Separated Values) format. The results of this project are produced by encoding the plain-text fields into obfuscated data with the Base64 algorithm. A single-node Hadoop system is used, and the class path is first set to include the Hadoop and Commons Codec jar files:

export CLASSPATH=.:${HADOOP_HOME}/hadoop-core-cdh3u6.jar:${CLASSPATH}
export CLASSPATH=.:${HADOOP_HOME}/commons-codec-1.4.jar:${CLASSPATH}

Figure 3. Health care_sample_dataset1 (plain text)

The Java code for the mapper, reducer, and combiner is then compiled and packaged into a jar:

javac DeIdentifyData.java

Figure 4. Result of added manifest
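The paper does not reproduce DeIdentifyData.java itself. As a hedged sketch only, a mapper for this kind of job might look roughly like the following, Base64-encoding the identifying columns of each CSV record and passing the remaining fields through unchanged; the class name and column positions are assumptions, not the authors' code.

// Hypothetical sketch of a de-identification mapper (not the published DeIdentifyData code).
import java.io.IOException;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DeIdentifyMapperSketch extends Mapper<LongWritable, Text, NullWritable, Text> {

    // Assumed column positions of identifying fields: patient name, patient id, date of birth.
    private static final int[] SENSITIVE_COLUMNS = {0, 1, 2};

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        for (int col : SENSITIVE_COLUMNS) {
            if (col < fields.length) {
                // Replace the plain-text field with its Base64-encoded form.
                fields[col] = new String(Base64.encodeBase64(fields[col].getBytes("UTF-8")), "UTF-8");
            }
        }
        StringBuilder record = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) record.append(',');
            record.append(fields[i]);
        }
        context.write(NullWritable.get(), new Text(record.toString()));
    }
}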
3.3. Hadoop Cluster Result
The dataset is then placed into the Hadoop Distributed File System, the Hadoop job is run, and the output file is produced:

hadoop fs -put health care_sample_dataset1.csv /health/health care_sample_dataset1.csv
hadoop jar /home/cloudera/deidentifydata1.jar DeIdentifyData /health deintout124

Internally, the Mapper and Reducer processes are started.

Figure 5.1. Result of Mapper & Reducer initial stages
Figure 5.2. Result of status of the Mapper and Reducer
Figure 5.3. Result of Mapper & Reducer final stage
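For completeness, a hedged sketch of the driver class that the hadoop jar command above would invoke is shown below. It assumes the mapper sketch from the previous section; the job wiring (input/output paths taken from the command line, a pass-through reducer) is an assumption, not the published code.

// Hypothetical driver sketch for the de-identification job (the published source is not included in the paper).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DeIdentifyDriverSketch {

    // Pass-through reducer: simply writes out each de-identified record.
    public static class PassThroughReducer extends Reducer<NullWritable, Text, NullWritable, Text> {
        @Override
        protected void reduce(NullWritable key, Iterable<Text> values, Context context)
                throws java.io.IOException, InterruptedException {
            for (Text record : values) {
                context.write(key, record);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "de-identify health care dataset"); // constructor form available on CDH3-era Hadoop
        job.setJarByClass(DeIdentifyDriverSketch.class);
        job.setMapperClass(DeIdentifyMapperSketch.class);
        job.setReducerClass(PassThroughReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /health
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. deintout124
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}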
The Name Node status and the Job Tracker can then be checked in the browser.

Figure 5.4. Result of the output job file details
Figure 6. Result of Hadoop counters
Figure 7. Result of Mapper and Reducer completion graph
Figure 8. Result of the final output (encrypted text)

4. CONCLUSION
Big Data analytics has the potential to change the way health care providers use sophisticated technologies to gain insight from their clinical and other data repositories and to make informed decisions. In the future we will see rapid, widespread implementation and use of Big Data analytics across health care organizations and the health care industry. To that end, the challenges highlighted above must be addressed. As Big Data analytics becomes more mainstream, issues such as ensuring privacy, safeguarding security, establishing standards and governance, and continually improving the tools and technologies will attract attention. Big Data applications in health care are at an early stage of development, but rapid advances in platforms and tools can accelerate their maturation.

REFERENCES
[1] Praveen Kumar, et al., "Efficient Capabilities of Processing of Big Data using Hadoop Map Reduce", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, No. 6.
[2] Emad A. Mohammed, et al., "Applications of the MapReduce programming framework to clinical Big Data analysis: current landscape and future trends", Big Data Mining.
[3] Arantxa Duque Barrachina and Aisling O'Driscoll, "A Big Data methodology for categorising technical support requests using Hadoop and Mahout", Journal of Big Data.
[4] K. Divya and N. Sadhasivam, "Secure Data Sharing in Cloud Environment Using Multi Authority Attribute Based Encryption", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, No. 1.
[5] Sabia and Sheetal Kalra, "Applications of Big Data: Current Status and Future Scope", International Journal of Computer Applications, Vol. 3, No. 5.
[6] Hongsong Chen and Zhongchuan Fu, "Hadoop-Based Healthcare Information System Design and Wireless Security Communication Implementation", Hindawi Publishing Corporation, Mobile Information Systems.
[7] Lidong Wang and Cheryl Ann Alexander, "Medical Applications and Healthcare Based on Cloud Computing", International Journal of Cloud Computing and Services Science (IJ-CLOSER), Vol. 2, No. 4.
[8] Priyanka K and Nagarathna Kulennavar, "A Survey on Big Data Analytics in Health Care", International Journal of Computer Science and Information Technologies, Vol. 5, No. 4.
[9] Aditi Bansal, Ankita Deshpande, Priyanka Ghare, Seema Dhikale, and Balaji Bodkhe, "Healthcare Data Analysis using Dynamic Slot Allocation in Hadoop", International Journal of Recent Technology and Engineering (IJRTE), Vol. 3, No. 5.
[10] "Protecting Big Data: Protection Solutions for the Business Data Lake", technical review white paper.
[11] D. Peter Augustine, "Leveraging Big Data Analytics and Hadoop in Developing India's Health care Services", International Journal of Computer Applications, Vol. 89, No. 16.
[12] Muni Kumar N and Manjula R, "Role of Big Data Analytics in Rural Health Care - A Step Towards Svasth Bharath", International Journal of Computer Science and Information Technologies, Vol. 5, No. 6.
BIOGRAPHIES OF AUTHORS
D. Madhavi received her B.Tech degree in Information Technology from JNTU, Kakinada. She is currently an M.Tech (Information Technology) student at Aditya Institute of Technology and Management, Tekkali, India, an autonomous institute affiliated to JNTU, Kakinada. Her areas of interest are Data Mining and Network Security.

Dr. B.V. Ramana obtained his doctoral degree from Andhra University, India. He is currently working as Professor and Head of the Department of Information Technology, Aditya Institute of Technology and Management, India. He has published 18 papers in international journals and conferences.
