Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution


Mi-Jin Kim and Yun-Sik Yu
Convergence of IT Devices Institute Busan (CIDI), Dong-Eui University
agicap@deu.ac.kr, ysyu@deu.ac.kr

Abstract
Individual data items may not seem important for business purposes on their own. However, Big Data analytics use cases are increasing. Once individual data are collected in large volumes, they form a valuable aggregate, Big Data, from which hidden information can be extracted. Hadoop, one of the conventional Big Data analytics technologies, is to date a widely accepted technology for analyzing structured and unstructured Big Data. However, Hadoop is a batch processing system, so its response latency tends to grow with data size. This makes real-time analysis of massive, high-speed event data difficult in today's business environment and market conditions. In this paper, open source CEP (Complex Event Processing) technologies are used as an alternative suited to rapidly changing business needs. With them, we develop a real-time analytics system that can analyze thousands of events per second from event streams in real time with minimal latency, for application to medical institution ERP systems.

Keywords: Real-time analysis system, Big data, CEP

1. Introduction
In the smart era, social network, IoT, and life-log data are key drivers of the move into the Big Data era. Smart devices produce large amounts of data, which are collected in distributed files and then processed into useful information. Big Data is an intelligence-service technology that processes massive amounts of unstructured data and draws understanding from information that previously could not be interpreted or analyzed. This technology is expected to raise the bar for service quality and to change the role and value of experts in the near future [1].
Social media and Internet companies including Google, Facebook, Amazon, and Yahoo are attempting to operate public services such as transport, marine resource, security, and medical systems with Big Data, demonstrating the effectiveness of Big Data-based social analysis. Meanwhile, private-sector companies are rushing to develop Big Data-based business models to improve managerial conditions, marketing efficiency, profitability, and process efficiency. Individual data items may not seem important for business purposes on their own. However, Big Data analytics use cases are increasing. Once individual data are collected in large volumes, they form a valuable aggregate, Big Data, from which hidden information can be extracted. Hadoop, one of the conventional Big Data analytics technologies, is to date a widely accepted technology for analyzing structured and unstructured Big Data. However, Hadoop is a batch processing system, so its response latency tends to grow with data size. This makes real-time analysis of massive, high-speed event data difficult in the current business environment and market conditions. In addition, demand for Big Data is growing among corporate entities of all sizes. In practice, however, access to Big Data has been limited because analytics systems are expensive.

ISSN: IJSEIA Copyright © 2015 SERSC

In this paper, open source CEP (Complex Event Processing) technologies are used as an alternative suited to rapidly changing business needs. With them, we develop a real-time analytics system that can analyze thousands of events per second from event streams in real time with minimal latency, for application to medical institution ERP systems. The system supports efficient patient management and administrative management at medical institutions. It was implemented in the ERP systems of small and medium-sized hospitals to analyze patient-oriented information and equipment data.

2. Related Works

2.1. Big Data
Big Data refers to technology for extracting value and analytical results from massive structured and unstructured data sets that exceed the capacity of conventional database management tools to collect, store, manage, and analyze [2]. Broadly, the defining attributes of Big Data are called the 3Vs: Volume, Velocity, and Variety. More recently, Value or Complexity is sometimes added. From the perspective of using Big Data, one of the most important attributes is low cost, because Big Data originated from the aim of storing and processing data at lower cost on systems different from conventional ones [3]. Big Data technologies can be broadly divided into collection, storage, processing, analytics, presentation and utilization, and management (infrastructure and business) technologies. The technologies and issues differ by field. In collection technologies, the time needed to load massive, continuously growing data dominates the total time. In storage technologies, storing and managing the data is costly.
In processing and analytics technologies, there are also concerns about insufficient timeliness caused by long processing times, as well as about high processing and operating costs.

2.2. Hadoop
Hadoop is a solution centered on distributed processing technology and is currently the most favored platform for Big Data processing. It is a Java-based Apache open source framework that processes massive data using a relatively simple programming model [4]. Yahoo and Facebook use Hadoop as a core technology, and many other companies have built it into their own solutions. Hadoop consists of a distributed file system, HDFS (Hadoop Distributed File System), and a distributed processing framework, MapReduce. HDFS is an open source implementation modeled on GFS (Google File System) and therefore shares its characteristics. HDFS splits large files into 64 MB blocks (chunks) and stores three replicas of each block on DataNodes. Metadata recording which DataNodes hold each block is kept on the NameNode. HDFS uses a master/slave architecture made up of a single master, the NameNode, and multiple slaves, the DataNodes. As shown in Fig. 1, the NameNode manages the file system metadata and controls data I/O between clients and DataNodes. The DataNodes store the actual data, perform data I/O directly with clients, and replicate blocks.
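The block splitting and replication described above can be made concrete with a toy model of how a NameNode-like component records block placements. This is a minimal Python sketch under stated assumptions. Only the 64 MB block size and three-way replication come from the text. The `NameNode` class, the round-robin placement, and the node names are hypothetical simplifications; real HDFS placement is rack-aware.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB block (chunk) size, as described in the text
REPLICATION = 3                # three copies of every block

@dataclass
class NameNode:
    """Hypothetical, simplified NameNode: it only keeps block-placement metadata."""
    datanodes: list
    # metadata: file name -> list of (block index, DataNodes holding a replica)
    metadata: dict = field(default_factory=dict)

    def add_file(self, name: str, size_bytes: int) -> list:
        if len(self.datanodes) < REPLICATION:
            raise ValueError("need at least REPLICATION DataNodes")
        num_blocks = -(-size_bytes // BLOCK_SIZE)  # ceiling division
        placements = []
        for i in range(num_blocks):
            # Round-robin placement (an assumption; real HDFS is rack-aware).
            replicas = [self.datanodes[(i + r) % len(self.datanodes)]
                        for r in range(REPLICATION)]
            placements.append((i, replicas))
        self.metadata[name] = placements
        return placements

nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
for block, replicas in nn.add_file("patient_logs.dat", 200 * 1024 * 1024):
    print(block, replicas)
```

A 200 MB file yields four blocks: three full 64 MB blocks plus an 8 MB remainder. Each block is stored on three distinct DataNodes.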

Figure 1. HDFS Structure

MapReduce is a programming model for distributed, parallel processing. With it, data that previously took a long time to process can be handled in batch within a short time. It processes massive data as key-value pairs and achieves high-speed processing using binary search and hash algorithms. With these characteristics, Hadoop is used for storing and analyzing massive multimedia data as well as log data [5].

2.3. CEP (Complex Event Processing)
CEP is a technology that extracts meaningful data in real time from events arriving from various event sources and then performs the corresponding actions [6]. Event data here means stream data: continuous, unbounded, high-volume input in which time order matters. Such stream data cannot practically be processed and analyzed in real time by loading it into a conventional relational database. CEP is an event processing solution that provides real-time analysis of such stream data. It can process hundreds to millions of high-speed events in memory without first storing them in a database, file, or Hadoop. CEP processing is performed by a CEP engine. You define the events generated by various systems and register the event patterns to be extracted. The CEP engine then filters, aggregates, groups, and joins the event streams and detects the desired event patterns through pattern matching. CEP-based analysis proceeds in the order Visibility, Understanding, Insight. Visualized real-time internal data help users understand the connections and patterns among events. From these, the insights a company needs can be derived, allowing it to respond to the business environment in real time [7].
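A small in-memory sliding-window rule can illustrate the filter, aggregate, and pattern-matching steps a CEP engine performs. This is a minimal Python sketch, not the API of any particular CEP engine. The `Event` and `SlidingWindowRule` names, the 60-second window, the threshold, and the `heart_rate` stream are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    ts: float     # event timestamp in seconds
    source: str   # name of the event stream
    value: float

class SlidingWindowRule:
    """Hypothetical CEP-style rule: filter one stream, keep a time window in
    memory, aggregate it, and fire an alert when the windowed average exceeds
    a threshold."""

    def __init__(self, source: str, window_sec: float, threshold: float):
        self.source = source
        self.window_sec = window_sec
        self.threshold = threshold
        self.window = deque()  # events currently inside the time window
        self.total = 0.0       # running sum of values in the window

    def on_event(self, ev: Event) -> Optional[str]:
        if ev.source != self.source:  # 1. filtering
            return None
        self.window.append(ev)
        self.total += ev.value
        # 2. evict events older than the window (assumes in-order arrival)
        while self.window and ev.ts - self.window[0].ts > self.window_sec:
            self.total -= self.window.popleft().value
        avg = self.total / len(self.window)  # 3. aggregation
        if avg > self.threshold:              # 4. pattern match -> action
            return f"ALERT {self.source}: avg={avg:.1f} over {len(self.window)} events"
        return None

events = [
    Event(0, "heart_rate", 90),
    Event(10, "blood_pressure", 200),  # filtered out: different stream
    Event(20, "heart_rate", 105),
    Event(30, "heart_rate", 110),
    Event(100, "heart_rate", 80),      # earlier events fall out of the window
]
rule = SlidingWindowRule("heart_rate", window_sec=60, threshold=100)
for ev in events:
    alert = rule.on_event(ev)
    if alert:
        print(alert)  # ALERT heart_rate: avg=101.7 over 3 events
```

The running sum keeps each update cheap, and no event is written to disk. This mirrors the in-memory processing that distinguishes CEP from storing data first and querying it later.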

Figure 2. CEP Architecture

2.4. Hadoop and CEP for Big Data Approach
Big Data methodologies have mainly been Hadoop-oriented approaches focused on storage. More recently, however, the way Big Data is processed has become more important. Batch analysis based on the Hadoop ecosystem remains important. However, as a Gartner report notes, real-time distributed processing based on CEP is becoming increasingly important. Figure 3 shows how the analysis mechanisms of conventional databases and Hadoop-based batch processing differ from in-memory real-time processing of high-speed event streams [8]. In this study, we adopted CEP technology, which uses an in-memory analysis mechanism in which data are stored after being analyzed.

Figure 3. Analysis Processing Mechanism of Hadoop and CEP

2.5. Definitions and Characteristics of NoSQL
Unlike a conventional relational database (RDBMS), NoSQL refers to databases with a different design [9]. NoSQL uses a consistency model that is less restrictive than that of a conventional RDBMS and can store far more data, which makes it more effective for handling Big Data. NoSQL also offers horizontal scalability, as a new type of database designed to overcome the limitations of conventional relational databases. Under the CAP theorem for distributed systems, NoSQL systems must tolerate network partitions for distributed processing and therefore choose either AP or CP. Relational databases, by contrast, are characterized as CA. As a distributed database that delivers massive amounts of data to users in a cloud environment, NoSQL focuses on availability and fast response. Therefore, NoSQL is much more efficient than a conventional DBMS in connection pool management and fault-tolerance management.

Many enterprises use various types of NoSQL to provide users with cloud services, including Google's Bigtable, Amazon's Dynamo, and MongoDB. In particular, Cassandra [10] is a distributed database that combines characteristics of Google's Bigtable with those of Amazon's Dynamo. Its servers form ring-connected clusters, supporting fault tolerance and high availability. Figure 4 shows the results of a survey of NoSQL products and user preferences. Cassandra is the most popular owing to its superior scalability, followed by CouchDB, MongoDB, and HBase.

Figure 4. Results of NoSQL Product and Consumer Preferences Survey

NoSQL products are evolving rapidly, and each product suits different fields of application, as summarized in Table 1. On this basis, Cassandra was adopted in this study.

Table 1. Applicable Field of NoSQL Product
CouchDB (Erlang/Apache): master information that is rarely accumulated or changed.
Redis (C/C++/BSD): management of frequently changing information; stock information and analysis, real-time data collection, real-time communications, etc.
MongoDB (C++/AGPL): massive DBs and management of frequently changing information, etc.; cases where a conventional RDBMS is to be replaced.
Membase (Erlang & C/Apache 2.0): fields requiring short delay and high concurrency, where data are first stored in memory and disk serves as secondary storage; online gaming, etc.
Cassandra (Java/Apache): fields with more writes than reads; information processing in banking and financial institutions; real-time information analytics.
HBase (Java/Apache): real-time massive data processing.

A Cassandra deployment can be divided into multiple networked data centers.
A node is a single Cassandra server process running on one server, and one Cassandra logical storage consists of such connected nodes. Its structure is shown in Figure 5. Although Cassandra is distributed across and operated on multiple machines, it is designed to appear to an end user as a single instance, with the cluster as its outer structure. Cassandra distributes data across the cluster and assigns it to the cluster's nodes. Nodes can be added to and removed from a cluster. A newly added node needs a seed node to obtain cluster information. When a new node joins the Cassandra cluster, the seed node tells it how the nodes are connected in the ring. One or more seed nodes can be designated in a cluster.
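A simplified consistent-hashing ring can sketch the ring structure and the seed-node join described above. This Python model rests on several assumptions and is not Cassandra's implementation:

- Each node gets a single token from an MD5 hash of its name. Current Cassandra defaults to the Murmur3 partitioner with virtual nodes.
- Replicas are simply the next nodes clockwise, similar in spirit to SimpleStrategy.
- `join_cluster` is a hypothetical stand-in for the gossip-based bootstrap through a seed node.

```python
import bisect
import hashlib

def token(text: str) -> int:
    """Map a node name or row key to a position on the ring (128-bit MD5 hash)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, names=()):
        self._tokens = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for name in names:
            self.add_node(name)

    def add_node(self, name: str) -> None:
        t = token(name)
        if t not in self._owner:
            bisect.insort(self._tokens, t)
            self._owner[t] = name

    def members(self) -> list:
        return [self._owner[t] for t in self._tokens]

    def replicas(self, key: str, rf: int = 3) -> list:
        """Walk clockwise from the key's token and take the next rf nodes."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        count = min(rf, len(self._tokens))
        return [self._owner[self._tokens[(start + k) % len(self._tokens)]]
                for k in range(count)]

def join_cluster(seed: Ring, new_node: str) -> Ring:
    """Hypothetical bootstrap: the newcomer learns the ring topology from a seed,
    inserts itself, and the seed records the newcomer as well."""
    view = Ring(seed.members())
    view.add_node(new_node)
    seed.add_node(new_node)
    return view

seed = Ring(["node-a", "node-b", "node-c"])
view = join_cluster(seed, "node-d")
print(view.members())
print(view.replicas("patient:1001"))
```

After the join, the new node and the seed hold the same ring view, so both map any key to the same replica nodes.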

Figure 5. Cassandra Logical Storage Structure

3. System Composition and Design
CEP is currently emerging across business fields as the simplest and most powerful way to implement real-time business intelligence based on timely analysis. By processing and analyzing diverse events, it offers new value such as real-time monitoring, early alarms, and production field management. In this study, we aimed to provide a systematic and organized business environment for efficient patient management and administrative management at hospitals. To do so, we exploited the advantages of CEP together with the low cost of Big Data and built a real-time Big Data analytics system integrated with medical institution ERP systems, an area with few reported cases so far. In addition, the main adaptors and the data publishing and customization functions were implemented so that UI screens can be designed and developed according to the needs of each hospital.

The real-time analytics system architecture is shown in Fig. 6. The Event Adaptor converts the various real-time data entering and leaving the analytics system between external and internal event types, handling differences in data protocol and format. While mapping incoming events to the streams used for real-time analysis, the Event Collector also stores every incoming event in the NoSQL database, which also enables time-series analysis. Cassandra was used as the NoSQL database for batch event processing. In the Big Data Analyzer, data collected by the Event Collector are analyzed with the open source real-time CEP engine and a Hive-based batch layer. The results are then mapped into a form suitable for reporting. The Event Generator converts the real-time analysis results into the types and protocols the user requires and stores them in the database in real time.
The Reporter provides web-based analysis functions that let users access the CEP-based real-time analytics system visually. These include a dashboard, alarms, and analysis scheduling for real-time analysis results.
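The data flow through the components of Fig. 6 can be sketched as a chain of small Python classes. The class names mirror the components described above. Their methods, the JSON message format, and the field names (`type`, `time`, `data`) are hypothetical. A plain dictionary stands in for Cassandra, and the Hive-based batch layer and the Reporter are omitted.

```python
import json
from collections import defaultdict

class EventAdaptor:
    """Converts an external message (JSON, an assumed format) into an internal event."""
    def to_internal(self, raw: str) -> dict:
        msg = json.loads(raw)
        return {"stream": msg["type"], "ts": msg["time"], "payload": msg["data"]}

class EventCollector:
    """Maps each event to its stream and archives it for time-series analysis."""
    def __init__(self):
        self.streams = defaultdict(list)
        self.archive = defaultdict(list)  # stand-in for the NoSQL (Cassandra) store

    def collect(self, event: dict) -> None:
        self.streams[event["stream"]].append(event)
        self.archive[event["stream"]].append((event["ts"], event["payload"]))

class BigDataAnalyzer:
    """Real-time part only: counts events per stream (batch layer omitted)."""
    def analyze(self, collector: EventCollector) -> dict:
        return {stream: len(evs) for stream, evs in collector.streams.items()}

class EventGenerator:
    """Converts analysis results into the output format the user asked for (JSON here)."""
    def publish(self, results: dict) -> str:
        return json.dumps(results, sort_keys=True)

raw_messages = [
    '{"type": "admission", "time": 1, "data": {"ward": "A"}}',
    '{"type": "lab_result", "time": 2, "data": {"test": "CBC"}}',
    '{"type": "admission", "time": 3, "data": {"ward": "B"}}',
]
adaptor, collector = EventAdaptor(), EventCollector()
for raw in raw_messages:
    collector.collect(adaptor.to_internal(raw))
print(EventGenerator().publish(BigDataAnalyzer().analyze(collector)))
# {"admission": 2, "lab_result": 1}
```

Keeping conversion in the adaptor and generator means the analyzer sees only internal events. This is one way to read the architecture's separation between external protocols and the analysis core.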

Figure 6. CEP-based Real Time Analytics System Architecture

4. System Implementation
Figure 7 shows the processing sequence for incoming event sources in the implemented system. When the Legacy or Batch layer extracts an event source, the source is passed through the Data Adapter to the real-time processing layer, which processes it. The analytics results are then saved to the data repository or, if necessary, made available for monitoring through an external system API call.

Figure 7. Processing of Sequence for Incoming Event Source

In brief, the entire system operates as shown in Fig. 8. The Legacy system stores the logs generated by its own processes and requests real-time analysis for the major tasks among those processes. The Batch layer collects data through data aggregation and extracts data through analysis jobs. Depending on the extracted results, the data are either stored in the middle repository or submitted for real-time analysis. The real-time processing layer handles requests from the Legacy and Batch layers, requests any data it needs for processing, and saves the results to the middle repository. The processed results are passed to the monitoring system through API calls. The Dashboard visualizes the data according to user requests and saves the Dashboard-processed data to the middle repository.

Figure 8. Processing of the Entire System

Figure 9 shows the UI development screen of a hospital and the web-based real-time analysis that the Reporter provides for selected items. The Output Adaptor is first set for the items to be analyzed among the event stream data (randomized diagnosis data were entered as input).

Figure 9. UI Development Screen and Web-based Real Time Analysis Screen

Figure 10 is a screenshot of the Dashboard monitoring the analysis of all data under the Output Adaptor settings. Filters on Server, Adaptor, Event, and Hour let users analyze only the parts they need. Figure 11 is a screenshot of the analysis with the filters set as follows: Server selected as , Output Adaptor selected as Event, Event selected as event_5, and Hour selected as 13:00 ~ 18:
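The Dashboard's Server/Adaptor/Event/Hour filtering amounts to keeping only the records that match every selected criterion. The Python sketch below assumes a hypothetical record layout (`server`, `adaptor`, `event`, `time` keys) and treats an unset filter as "match anything". The hour range is half-open, and the sample values are illustrative rather than the settings shown in Figure 11.

```python
from datetime import datetime

def filter_results(records, server=None, adaptor=None, event=None, hours=None):
    """Return records matching every filter that is set (None means 'any').

    hours is a (start_hour, end_hour) pair; start is inclusive, end exclusive.
    """
    def matches(r):
        return ((server is None or r["server"] == server)
                and (adaptor is None or r["adaptor"] == adaptor)
                and (event is None or r["event"] == event)
                and (hours is None or hours[0] <= r["time"].hour < hours[1]))
    return [r for r in records if matches(r)]

records = [
    {"server": "srv1", "adaptor": "Event", "event": "event_5",
     "time": datetime(2015, 3, 2, 14, 30)},
    {"server": "srv1", "adaptor": "Event", "event": "event_5",
     "time": datetime(2015, 3, 2, 9, 0)},
    {"server": "srv2", "adaptor": "Event", "event": "event_3",
     "time": datetime(2015, 3, 2, 15, 0)},
]
selected = filter_results(records, adaptor="Event", event="event_5", hours=(13, 17))
print(len(selected))  # 1
```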

Figure 10. Screenshot of Monitoring Analyzed of All Data

Figure 11. Filter Functions for Monitoring Screen

5. Conclusion
According to a McKinsey report, value in the medical field is closely linked to reducing national health care expenditures and medical expenses and to enabling clinical trials. In this regard, we developed a CEP-based real-time analytics system applied to a hospital ERP system. This study thereby laid the groundwork for handling massive amounts of high-speed event data in rapidly changing environments. The system can help put a wide range of Big Data to use, especially in the health and medical fields, and make a socio-economic impact. It was also designed for small and medium-sized hospitals, where it can provide faster and more accurate information tailored to each hospital's needs. It can also help create economic value through efficient patient management and administrative management. Further R&D is needed on the use of data in hospitals. The system also needs to be extended so that it can be applied to companies in shipping and logistics.

References
[1] "7 Top Mega Trends in IT Industry for 2013", The Federation of Korean Information Industries, Big Data Policy, (2013), pp. 1.
[2] J. Gantz and D. Reinsel, "Extracting Value from Chaos", IDC iView, (2011) June, pp. 6.
[3] P. Russom, "Big Data Analytics", TDWI Research, Fourth Quarter, (2011), pp. 6.

[4] K.-W. Park, K.-J. Ban, S.-H. Song and E.-K. Kim, "Cloud-based Intelligent Management System for Photovoltaic Power Plants", Korea Institute of Electronic Communication Sciences, vol. 7, no. 3, (2012) June, pp.
[5] hdfs_design.html.
[6] D. C. Luckham and B. Frasca, "Complex event processing in distributed systems", Computer Systems Laboratory Technical Report CSL-TR , Stanford University, Stanford 28,
[7]
[8] = 108&boardStep=0&categoryUid, Lee Ho-cheol, No. 249,
[9] Wikipedia, "NoSQL".
[10] "NoSQL Database Comparison", (2011).

Authors

Mi-Jin Kim
Feb. 2004: Obtained Bachelor's degree in Computer Engineering at Dongeui University
Aug. 2008: Obtained Master's degree in Computer Education at the School of Education, Pukyong National University
Feb. 2011: Completed the doctoral course in Computer Engineering & Applications at Dongeui University
Sept. 2014 to present: Senior Researcher, Convergence of IT Devices Institute Busan, Dongeui University

Yun-Sik Yu
Feb. 1990: Obtained doctoral degree in Physics at Pusan National University
Mar. 1983 to present: Professor, Department of Radiological Science/IT Convergence, Dongeui University
Mar. 2008 to present: Director, Convergence of IT Devices Institute Busan, Dongeui University


More information

Cloud Computing based Livestock Monitoring and Disease Forecasting System

Cloud Computing based Livestock Monitoring and Disease Forecasting System , pp.313-320 http://dx.doi.org/10.14257/ijsh.2013.7.6.30 Cloud Computing based Livestock Monitoring and Disease Forecasting System Seokkyun Jeong 1, Hoseok Jeong 2, Haengkon Kim 3 and Hyun Yoe 4 1,2,4

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

Client Overview. Engagement Situation. Key Requirements

Client Overview. Engagement Situation. Key Requirements Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Impact of Big Data: Networking Considerations and Case Study

Impact of Big Data: Networking Considerations and Case Study 30 Impact of Big Data: Networking Considerations and Case Study Yong-Hee Jeon Catholic University of Daegu, Gyeongsan, Rep. of Korea Summary which exceeds the range possible to store, manage, and Due to

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis , 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Massive Cloud Auditing using Data Mining on Hadoop

Massive Cloud Auditing using Data Mining on Hadoop Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

How To Use Big Data For Telco (For A Telco)

How To Use Big Data For Telco (For A Telco) ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Design of Electric Energy Acquisition System on Hadoop

Design of Electric Energy Acquisition System on Hadoop , pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

Big Data Storage Architecture Design in Cloud Computing

Big Data Storage Architecture Design in Cloud Computing Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

A Survey of Distributed Database Management Systems

A Survey of Distributed Database Management Systems Brady Kyle CSC-557 4-27-14 A Survey of Distributed Database Management Systems Big data has been described as having some or all of the following characteristics: high velocity, heterogeneous structure,

More information

Generic Log Analyzer Using Hadoop Mapreduce Framework

Generic Log Analyzer Using Hadoop Mapreduce Framework Generic Log Analyzer Using Hadoop Mapreduce Framework Milind Bhandare 1, Prof. Kuntal Barua 2, Vikas Nagare 3, Dynaneshwar Ekhande 4, Rahul Pawar 5 1 M.Tech(Appeare), 2 Asst. Prof., LNCT, Indore 3 ME,

More information

Apriori-Map/Reduce Algorithm

Apriori-Map/Reduce Algorithm Apriori-Map/Reduce Algorithm Jongwook Woo Computer Information Systems Department California State University Los Angeles, CA Abstract Map/Reduce algorithm has received highlights as cloud computing services

More information

A Database Hadoop Hybrid Approach of Big Data

A Database Hadoop Hybrid Approach of Big Data A Database Hadoop Hybrid Approach of Big Data Rupali Y. Behare #1, Prof. S.S.Dandge #2 M.E. (Student), Department of CSE, Department, PRMIT&R, Badnera, SGB Amravati University, India 1. Assistant Professor,

More information

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method Improving Data Processing Speed in Big Data Analytics Using HDFS Method M.R.Sundarakumar Assistant Professor, Department Of Computer Science and Engineering, R.V College of Engineering, Bangalore, India

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com.

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. A Big Data Hadoop Architecture for Online Analysis. Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. Ramlal Naik L Acme Tele Power LTD Haryana, India ramlalnaik@gmail.com. Abstract Big Data

More information

Big Data Big Data/Data Analytics & Software Development

Big Data Big Data/Data Analytics & Software Development Big Data Big Data/Data Analytics & Software Development Danairat T. danairat@gmail.com, 081-559-1446 1 Agenda Big Data Overview Business Cases and Benefits Hadoop Technology Architecture Big Data Development

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Big Data Use Case: Business Analytics

Big Data Use Case: Business Analytics Big Data Use Case: Business Analytics Starting point A telecommunications company wants to allude to the topic of Big Data. The established Big Data working group has access to the data stock of the enterprise

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05 Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst White Paper EMC s Enterprise Hadoop Solution Isilon Scale-out NAS and Greenplum HD By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst February 2012 This ESG White Paper was commissioned

More information