Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution
Mi-Jin Kim and Yun-Sik Yu
Convergence of IT Devices Institute Busan (CIDI), Dong-Eui University
agicap@deu.ac.kr, ysyu@deu.ac.kr

Abstract
Individual data might not be thought of as important for business purposes. However, Big Data analytics use cases are increasing, because once individual data are collected in large volumes as Big Data, they become a valuable aggregate from which hidden information can be extracted. Hadoop, one of the conventional Big Data analytics technologies, is to date a widely accepted technology for analyzing structured and unstructured Big Data. However, because Hadoop is a batch processing system, its response latency tends to grow with the volume of data, which makes real-time analysis of massive amounts of high-speed event data difficult under current business and market conditions. In this paper, open source CEP (Complex Event Processing)-based technologies are used as an alternative suited to rapidly changing business, and a real-time analytics system is developed that can analyze thousands of event streams per second in real time without noticeable latency, so that it can be applied to medical institution ERP systems.

Keywords: Real-time analysis system, Big data, CEP

1. Introduction
In the smart era, social network, IoT, and life-log data are key elements driving the transition to the Big Data era. Smart devices produce large amounts of data, which are collected in distributed files and then processed into useful information. Big Data is an intelligence-service technology that processes massive amounts of unstructured data to make sense of information that previously could be neither understood nor analyzed. This technology will raise the quality of services for users and change the role and value of experts in the near future [1].
Social media and Internet companies such as Google, Facebook, Amazon, and Yahoo have demonstrated the effectiveness of Big Data-based social analysis, and Big Data is also being used to operate public services such as transport, marine resource, security, and medical systems. Meanwhile, private-sector companies are hurrying to develop new Big Data-based business models to improve managerial conditions and marketing efficiency, as well as profitability and process efficiency.

Individual data alone may seem unimportant for business purposes, yet Big Data analytics use cases keep increasing, because data collected in large volumes become a valuable aggregate from which hidden information can be found. Hadoop is to date the most widely accepted conventional technology for analyzing structured and unstructured Big Data. As a batch processing system, however, Hadoop is prone to response latency that grows with data size, which makes real-time analysis of massive amounts of high-speed event data difficult under current business and market conditions. In addition, demand for Big Data analysis is growing at corporate entities of all levels. In practice, however, access to Big Data has been limited and difficult because of the high cost of analytics systems.

ISSN: IJSEIA Copyright © 2015 SERSC
In this paper, open source CEP-based technologies are used as an alternative suited to rapidly changing business, and a real-time analytics system is developed that can analyze thousands of event streams per second in real time without noticeable latency, so that it can be applied to medical institution ERP systems. The system, which supports efficient patient management and administrative management at medical institutions, was implemented in the ERP systems of small and medium-sized hospitals to analyze patient-oriented information and equipment data.

2. Related Works
2.1. Big Data
Big Data refers to technology for extracting value from, and analyzing, massive structured and unstructured data sets that exceed the capacity of conventional database management tools to collect, store, manage, and analyze [2]. Broadly, the attributes of Big Data are called the 3Vs: Volume, Velocity, and Variety; more recently, Value or Complexity is sometimes added. From the perspective of using Big Data, one of the most important attributes is low cost, because Big Data originated from the goal of storing and processing data at lower cost on systems different from conventional ones [3]. Big Data technologies are broadly divided into collection, storage, processing, analytics, presentation and utilization, and management (infrastructure and business) technologies. Looking at the issues in each technical field, in collection technologies the time needed to load massive, continuously growing data accounts for most of the total time, while in storage technologies saving and managing the data is costly.
In processing and analytics technologies, concerns have also been raised over insufficient timeliness caused by long processing times, as well as over high processing and operating costs.

2.2. Hadoop
Hadoop is a solution centered on distributed processing technology and is currently the most favored platform for Big Data processing. It is a Java-based Apache open source framework for processing massive data with a relatively simple programming model [4]. Hadoop is used as a core technology by Yahoo and Facebook and has been built into many other companies' own solutions. Hadoop consists of a distributed file system called HDFS (Hadoop Distributed File System) and a distributed processing system called MapReduce. HDFS is an open source system modeled on GFS (Google File System) and therefore shares its characteristics. HDFS divides large files into 64 MB chunks and stores three copies of each chunk on different DataNodes. Metadata recording which DataNodes hold each chunk is stored on the NameNode. HDFS has a master/slave structure consisting of one master, the NameNode, and multiple slaves, the DataNodes. As shown in Fig. 1, the NameNode manages the file system metadata and controls data I/O between clients and DataNodes. DataNodes store the actual data, perform data I/O directly with clients, and copy blocks.
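The chunking and replica bookkeeping described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not HDFS code: the `NameNode` class, its `add_file` and `locate` methods, and the round-robin placement are hypothetical; real HDFS places replicas with a rack-aware policy and uses a configurable block size (64 MB is the value cited in the text).

```python
# Simplified model of HDFS chunking and NameNode metadata (illustrative only).
# Assumption: replicas go round-robin to distinct DataNodes; real HDFS uses
# rack-aware placement, which is not modeled here.
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size cited in the text
REPLICATION = 3                # number of copies kept for each chunk


@dataclass
class NameNode:
    datanodes: list[str]
    # file name -> [(block index, DataNodes holding that block), ...]
    metadata: dict[str, list[tuple[int, list[str]]]] = field(default_factory=dict)
    _cursor: int = field(default=0, init=False)  # round-robin start position

    def add_file(self, name: str, size_bytes: int) -> list[tuple[int, list[str]]]:
        """Split a file into blocks and record where each replica lives."""
        if len(self.datanodes) < REPLICATION:
            raise ValueError("need at least as many DataNodes as replicas")
        n_blocks = -(-size_bytes // BLOCK_SIZE)  # ceiling division
        blocks = []
        for index in range(n_blocks):
            # REPLICATION consecutive list positions are always distinct nodes
            holders = [self.datanodes[(self._cursor + i) % len(self.datanodes)]
                       for i in range(REPLICATION)]
            self._cursor = (self._cursor + 1) % len(self.datanodes)
            blocks.append((index, holders))
        self.metadata[name] = blocks
        return blocks

    def locate(self, name: str, offset: int) -> list[str]:
        """Return the DataNodes holding the block that contains byte `offset`."""
        return self.metadata[name][offset // BLOCK_SIZE][1]


if __name__ == "__main__":
    namenode = NameNode(["dn1", "dn2", "dn3", "dn4"])
    for index, holders in namenode.add_file("patients.log", 200 * 1024 * 1024):
        print(index, holders)
```

With four DataNodes, a 200 MB file yields four blocks (the last one partially filled), each recorded with three distinct DataNodes; a client would ask `locate` which nodes to read from and then fetch the data from a DataNode directly.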
Figure 1. HDFS Structure

The MapReduce framework is a programming model for distributed and parallel processing in which data that used to take a long time to process can be processed in batch in a short time. It lets users define key-value processing over massive data and achieves high-speed processing using binary search and hash algorithms. Owing to these characteristics, Hadoop is used for storing and analyzing massive multimedia data as well as log data [5].

2.3. CEP (Complex Event Processing)
CEP is a complex event processing technology that extracts meaningful data in real time from events arriving from various event sources and performs the corresponding actions [6]. Event data here means stream data: massive, continuous, unbounded input in which time order is important. Such stream data cannot be processed and analyzed in real time by loading it into a conventional relational database. CEP is an event data processing solution that provides real-time analysis of such stream data; that is, it can process hundreds to millions of high-speed events from various streams in memory in real time, without first saving them to a database, a file, or Hadoop. CEP processing is performed by a CEP engine. Once the events generated by various systems are defined and the event patterns to be extracted are registered, the CEP engine filters, aggregates, groups, and joins the event streams, and detects the desired event patterns through pattern matching. CEP-based analysis basically proceeds in the order Visibility → Understanding → Get Insight: visualized real-time internal data help users understand the connections and patterns among events, from which the insight the enterprise needs can be gained so that it can respond to the business environment in real time [7].
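To make the key-value model concrete, the following single-process Python sketch mirrors the map, shuffle, and reduce data flow for a word count over log lines. The function names are hypothetical and nothing here is Hadoop's Java API; hash partitioning stands in for the hash-based grouping mentioned above, and sorting keys within each partition stands in for the sort step that precedes reduce.

```python
# Single-process sketch of the MapReduce data flow (not Hadoop's API):
# map emits key-value pairs, a hash partitioner routes each key to one
# reducer, and reduce folds all values that share a key.
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for each word in one input record, e.g. a log line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]], n_reducers: int) -> list[dict[str, list[int]]]:
    """Group values by key; hashing sends every value of a key to the same reducer."""
    partitions: list[dict[str, list[int]]] = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        partitions[hash(key) % n_reducers][key].append(value)
    return partitions


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Fold all values for one key into a single count."""
    return key, sum(values)


def run_job(records: Iterable[str], n_reducers: int = 2) -> dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    result: dict[str, int] = {}
    for partition in shuffle(pairs, n_reducers):
        for key in sorted(partition):  # each reducer handles its keys in sorted order
            word, total = reduce_phase(key, partition[key])
            result[word] = total
    return result
```

For example, `run_job(["admit patient", "discharge patient", "admit"])` returns a dictionary equal to `{"admit": 2, "patient": 2, "discharge": 1}`. In Hadoop the map and reduce tasks would run on different nodes, which is why the partitioner must send all values for one key to the same reducer.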
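The filter, group, aggregate, and pattern-matching steps performed by a CEP engine can be sketched in memory as below. The `Event` fields, the `SlidingWindowPattern` class, and the example rule (at least `min_count` qualifying events from the same source within `window_s` seconds) are illustrative assumptions, not the API of any particular CEP engine; the point is that each event is evaluated as it arrives, without being written to a database first.

```python
# In-memory CEP-style operator (illustrative, not a real engine's API):
# filter -> group by source -> time-based sliding window -> aggregate ->
# pattern check. Assumes events from one source arrive in timestamp order.
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    ts: float      # event time in seconds
    source: str    # hypothetical key, e.g. a ward or device ID
    value: float   # hypothetical measurement


class SlidingWindowPattern:
    """Fires when >= min_count filtered events from one source fall within window_s seconds."""

    def __init__(self, window_s: float, min_count: int,
                 predicate: Callable[[Event], bool]) -> None:
        self.window_s = window_s
        self.min_count = min_count
        self.predicate = predicate
        self.windows: dict[str, deque[Event]] = defaultdict(deque)

    def on_event(self, e: Event) -> dict | None:
        if not self.predicate(e):                  # filtering
            return None
        window = self.windows[e.source]            # grouping by source
        window.append(e)
        while e.ts - window[0].ts > self.window_s:
            window.popleft()                       # evict events older than the window
        if len(window) >= self.min_count:          # pattern matching
            avg = sum(x.value for x in window) / len(window)  # aggregation
            return {"source": e.source, "count": len(window), "avg": avg, "ts": e.ts}
        return None
```

With `window_s=60`, `min_count=3`, and the predicate `lambda e: e.value > 38.0`, three qualifying readings from one source at t = 0, 20, and 50 s produce an alert on the third, while a later reading at t = 120 s starts a fresh window because the older events have been evicted. A production engine would usually also suppress repeated alerts while the pattern keeps matching; that is omitted here.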
Figure 2. CEP Architecture

2.4. Hadoop and CEP for Big Data Approach
Big Data methodologies have mainly been Hadoop-oriented approaches focused on storage. More recently, however, how Big Data is analyzed has become more important. Batch analysis based on the Hadoop ecosystem remains important, but, as a Gartner report notes, the CEP-based approach of real-time distributed processing is growing increasingly important. Figure 3 shows how the analysis mechanisms of a conventional DB and of Hadoop-based batch processing differ from in-memory real-time processing of high-speed event streams [8]. In this study, we adopted CEP technology, which uses an in-memory analysis mechanism in which data are saved after analysis.

Figure 3. Analysis Processing Mechanism of Hadoop and CEP

2.5. Definitions and Characteristics of NoSQL
NoSQL refers to databases designed differently from the conventional relational database (RDBMS) [9]. NoSQL uses a consistency model that is less restrictive than that of the conventional RDBMS and can store far more data, which makes it more effective for handling Big Data. Another characteristic of NoSQL is horizontal scalability, which this new type of database uses to overcome the limitations of conventional relational databases. In terms of the CAP theorem for distributed systems, NoSQL chooses AP or CP, both of which tolerate network partitions, for distributed processing, while relational databases choose CA. As distributed databases that serve massive amounts of data to users in cloud environments, NoSQL systems focus on availability and fast response. Therefore, NoSQL is much more efficient than the conventional DBMS in connection pool management and fault-tolerance management. Many enterprises use various types of NoSQL to provide users with cloud environments, including Google's Bigtable, Amazon's Dynamo, and MongoDB. In particular, Cassandra [10] is a distributed database that combines the characteristics of Google's Bigtable with those of Amazon's Dynamo; its servers form a ring-connected cluster, supporting fault tolerance and high availability. Figure 4 shows the results of a survey of NoSQL products and user preferences: Cassandra is the most popular owing to its superior scalability, followed by CouchDB, MongoDB, and HBase.

Figure 4. Results of NoSQL Product and Consumer Preferences Survey

NoSQL products are advancing actively, and their suitable application fields differ from product to product. Based on the following table, Cassandra was applied in this study.

Table 1. Applicable Field of NoSQL Product
Product | Language/License | Applicable field
CouchDB | Erlang/Apache | Master information that is not frequently accumulated or changed
Redis | C/C++/BSD | Management of frequently changing information; stock information/analysis/real-time data collection; real-time communications, etc.
MongoDB | C++/AGPL | Massive DBs; management of frequently changing information, etc.; replacing a conventional RDBMS
Membase | Erlang & C/Apache 2.0 | Fields requiring shorter delay and higher concurrency; data saved to memory first, with disk used as the secondary repository; online gaming, etc.
Cassandra | Java/Apache | Fields with more writes than reads; information processing in banking and financial institutions; real-time information analytics
HBase | Java/Apache | Real-time massive data processing

Cassandra servers are divided among multiple networked data centers.
A node is one Cassandra server process running on one server, and one Cassandra logical store is composed of such connected nodes. The structure of Cassandra logical storage is shown in Figure 5. Cassandra is in fact distributed across and operated on multiple machines, but it is designed to look to an end user like a single instance, with the cluster as its outermost structure. Cassandra distributes data across the cluster and assigns it to each node of the cluster. Nodes can be added to and removed from the cluster. A new node requires a seed node to provide cluster information: when a new node is added to the Cassandra cluster, the seed node tells it how the nodes are connected in the ring structure. One or more seed nodes can be designated in a cluster.
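The ring structure and data placement described above can be sketched with consistent hashing. This is a simplification under stated assumptions: node tokens are assigned here by hashing the node name with MD5 (Cassandra's older RandomPartitioner was MD5-based; current versions default to Murmur3), a key's replicas are the first node clockwise from its token plus the next `rf - 1` nodes, in the spirit of Cassandra's SimpleStrategy, and seed-node gossip and virtual nodes are not modeled. The `Ring` class and its methods are hypothetical.

```python
# Consistent-hashing token ring in the spirit of Cassandra (simplified):
# tokens come from MD5 of the node name, and a key's replicas are the
# first node whose token is >= the key's token plus the next rf-1 nodes
# clockwise. Seed-node gossip and virtual nodes are not modeled.
from __future__ import annotations

import bisect
import hashlib


def token(s: str) -> int:
    """Map a string to a position on a 128-bit ring."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, rf: int = 3) -> None:
        self.rf = rf                      # replication factor
        self.tokens: list[int] = []       # sorted node tokens
        self.owner: dict[int, str] = {}   # token -> node name

    def add_node(self, name: str) -> None:
        t = token(name)
        bisect.insort(self.tokens, t)
        self.owner[t] = name

    def remove_node(self, name: str) -> None:
        t = token(name)
        self.tokens.remove(t)
        del self.owner[t]

    def replicas(self, key: str) -> list[str]:
        if not self.tokens:
            raise RuntimeError("ring has no nodes")
        # first node clockwise from the key's token, wrapping past the end
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        count = min(self.rf, len(self.tokens))
        return [self.owner[self.tokens[(start + i) % len(self.tokens)]]
                for i in range(count)]
```

Because placement depends only on positions on the ring, adding or removing a node changes the replica sets only for keys in the token ranges near that node, which is what lets nodes join or leave without reshuffling the whole cluster.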
Figure 5. Cassandra Logical Storage Structure

3. System Compositions and Designs
CEP technology is currently emerging across business fields as the simplest and most powerful method for implementing real-time business intelligence based on timely analysis. By processing and analyzing various events, it provides new value such as real-time monitoring, early alarms, and production-site management. In this study, we aimed to provide a systematic and organized business environment for efficient patient management and administrative management at hospitals. To this end, we used the advantages of CEP, keeping in mind the low cost expected of Big Data, and established a Big Data real-time analytics system combined with medical institution ERP systems, a field in which there are still few applied cases. In addition, the main Adaptor and data publisher/customizing functions were implemented so that UI screens can be identified and developed according to the needs of each hospital. The real-time analytics system architecture is shown in Fig. 6. The Event Adaptor converts the various real-time data flowing into and out of the analytics system between internal and external event types, including data protocols and formats. While mapping incoming events to the streams used for real-time analysis, the Event Collector also saves the events arriving from each system to NoSQL, which enables time-series analysis as well. Cassandra was used as the NoSQL DB for batch event processing. The Big Data Analyzer analyzes the data collected by the Event Collector using the open source real-time analysis engine (CEP) and a Hive-based batch layer, and then maps the results into a form that can be used for reporting. The Event Generator converts the real-time analysis results into the types and protocols the user wants and saves the results to a database in real time.
The Reporter provides Web-based analysis functions that give users a visual interface to the CEP-based real-time analytics system, including a Dashboard, Alarms, and Analysis scheduling for real-time analysis results.
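The division of labor among the Fig. 6 components can be sketched as a chain of small classes. Only the component names come from the text; the CSV input format, the `InternalEvent` type, the list standing in for Cassandra, and the per-department count used as the "analysis" are hypothetical choices made for illustration.

```python
# Hypothetical sketch of the Fig. 6 flow: Event Adaptor -> Event Collector ->
# Big Data Analyzer -> Event Generator. Formats and the "analysis" are assumptions.
from __future__ import annotations

import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class InternalEvent:
    stream: str
    ts: int
    payload: dict[str, str]


class EventAdaptor:
    """Converts an external message (assumed CSV: stream,ts,patient_id,dept) to an internal event."""
    def to_internal(self, line: str) -> InternalEvent:
        stream, ts, patient_id, dept = line.strip().split(",")
        return InternalEvent(stream, int(ts), {"patient_id": patient_id, "dept": dept})


class EventCollector:
    """Maps events to analysis streams and archives every event for batch/time-series use."""
    def __init__(self) -> None:
        self.archive: list[InternalEvent] = []   # stand-in for the NoSQL (Cassandra) store
        self.streams: dict[str, list[InternalEvent]] = {}

    def collect(self, event: InternalEvent) -> None:
        self.archive.append(event)
        self.streams.setdefault(event.stream, []).append(event)


class BigDataAnalyzer:
    """Placeholder analysis: number of events per department in a stream."""
    def analyze(self, events: list[InternalEvent]) -> dict[str, int]:
        return dict(Counter(e.payload["dept"] for e in events))


class EventGenerator:
    """Converts results into the output format a consumer asks for (JSON here)."""
    def to_external(self, stream: str, result: dict[str, int]) -> str:
        return json.dumps({"stream": stream, "result": result}, sort_keys=True)


def run_pipeline(lines: list[str]) -> list[str]:
    adaptor, collector = EventAdaptor(), EventCollector()
    analyzer, generator = BigDataAnalyzer(), EventGenerator()
    for line in lines:
        collector.collect(adaptor.to_internal(line))
    return [generator.to_external(stream, analyzer.analyze(events))
            for stream, events in collector.streams.items()]
```

Keeping conversion at the edges (Adaptor in, Generator out) means the analysis code only ever sees one internal event type, whatever protocols the hospital systems on either side use.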
Figure 6. CEP-based Real Time Analytics System Architecture

4. System Implementation
Figure 7 shows the processing sequence for event sources arriving in real time in the implemented system. When an event source is extracted by the Legacy or Batch Layer, it is transferred through the Data Adapter. The Real-time Processing Layer then processes the delivered event source, and the analysis results are saved to the data repository or, if necessary, made available for monitoring through an external system API call.

Figure 7. Processing of Sequence for Incoming Event Source

In brief, the entire system is processed as shown in Fig. 8. The Legacy system saves the log details generated by its own processes and requests real-time analysis processing for the major tasks among those processes. The Batch Layer collects data by aggregation and extracts data through analysis; depending on the extracted results, it either stores them in the intermediate repository or requests real-time analysis. The Real-time Processing Layer handles the requests from the Legacy system and the Batch Layer, requesting the data it needs for processing and saving the results to the intermediate repository, and the monitoring system API is called for the processed results. The Dashboard visualizes the data according to the user's request and then saves the Dashboard-processed data to the intermediate repository.
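The Batch Layer's branch in Fig. 8, storing an aggregate or forwarding it for real-time analysis, might look like the following. The paper does not state the decision criterion, so the count threshold, the record field `task`, and the `batch_step` function are assumptions; a list stands in for the intermediate repository and a standard-library `Queue` for the real-time request channel.

```python
# Hypothetical sketch of the Batch Layer branch in Fig. 8. The threshold
# criterion is an assumption; the paper does not specify one.
from __future__ import annotations

from collections import Counter
from queue import Queue


def batch_step(log_records: list[dict[str, str]], threshold: int,
               repository: list[dict], realtime_requests: Queue) -> Counter:
    """Aggregate Legacy log records by task, then store or escalate each aggregate."""
    counts = Counter(record["task"] for record in log_records)   # data aggregation
    for task, n in counts.items():
        item = {"task": task, "count": n}
        if n >= threshold:
            realtime_requests.put(item)    # request real-time analysis
        else:
            repository.append(item)        # keep in the intermediate repository
    return counts
```

Using a queue for escalations decouples the Batch Layer from the Real-time Processing Layer, so either side can run at its own pace.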
Figure 8. Processing of the Entire System

Figure 9 is a screenshot of the Web-based real-time analysis that the Reporter provides for selected items when the Output Adaptor is set, on a hospital's UI development screen, for the event stream items needed for analysis (randomized diagnosis data were entered).

Figure 9. UI Development Screen and Web-based Real Time Analysis Screen

Figure 10 is a screenshot of the Dashboard monitoring the analysis of all data under the Output Adaptor settings. Through the filter functions, Server, Adaptor, Event, and Hour can also be set so that only the necessary parts are analyzed. Figure 11 is a screenshot of the analysis with the filters set as follows: Server selected as , Output Adaptor selected as Event, Event selected as event_5, and Hour selected as 13:00 ~ 18:
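The Server/Adaptor/Event/Hour filter on the monitoring screen amounts to a conjunction of optional predicates. The sketch below assumes a hypothetical `Result` record and treats the hour range as start-inclusive and end-exclusive; it is not the system's actual implementation.

```python
# Hypothetical sketch of the monitoring-screen filter: each criterion is
# optional (None means "any"), and all given criteria must match.
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Result:
    server: str
    adaptor: str
    event: str
    at: datetime


def filter_results(records: Iterable[Result], server: str | None = None,
                   adaptor: str | None = None, event: str | None = None,
                   hours: tuple[int, int] | None = None) -> list[Result]:
    """Keep records matching every given filter; hours = (start, end), end exclusive."""
    selected = []
    for r in records:
        if server is not None and r.server != server:
            continue
        if adaptor is not None and r.adaptor != adaptor:
            continue
        if event is not None and r.event != event:
            continue
        if hours is not None and not (hours[0] <= r.at.hour < hours[1]):
            continue
        selected.append(r)
    return selected
```

Calling `filter_results` with no criteria returns every record, which corresponds to the unfiltered "all data" view; each criterion that is set narrows the result further.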
Figure 10. Screenshot of Monitoring Analyzed of All Data
Figure 11. Filter Functions for Monitoring Screen

5. Conclusion
According to a McKinsey report, value in the medical field is closely connected to reducing national health care expenditures and medical expenses, as well as to enabling clinical trials. In this regard, by developing a CEP-based real-time analytics system applied to hospital ERP systems, this study established a foundation for handling massive amounts of high-speed event data in rapidly changing environments; this can help put a wide range of Big Data to use, especially in health and medical fields, and have a socio-economic impact. The system was designed to be applicable to small and medium-sized hospitals as well: it can provide faster and more accurate information that meets the specific needs of each hospital and enable hospitals to create economic value through efficient patient management and administrative management. Further R&D is needed on data use at hospitals, as well as on complementing the system so that it can be applied to many corporate entities in shipping and logistics.

References
[1] "7 Top Mega Trends in IT Industry for 2013", The Federation of Korean Information Industries, Big Data Policy, (2013), pp. 1.
[2] J. Gantz and D. Reinsel, "Extracting Value from Chaos", IDC IVIEW, (2011) June, pp. 6.
[3] P. Russom, "Big Data Analytics", TDWI Research, Fourth Quarter, (2011), pp. 6.
[4] K.-W. Park, K.-J. Ban, S.-H. Song and E.-K. Kim, "Cloud-based Intelligent Management System for Photovoltaic Power Plants", Korea Institute of Electronic Communication Sciences, vol. 7, no. 3, (2012) June, pp.
[5] hdfs_design.html.
[6] D. C. Luckham and B. Frasca, "Complex event processing in distributed systems", Computer Systems Laboratory Technical Report CSL-TR , Stanford University, Stanford 28,
[7]
[8] = 108&boardStep=0&categoryUid, Lee Ho-cheol, No. 249,
[9] Wikipedia, "NoSQL".
[10] "NoSQL Database Comparison", (2011).

Authors
Mi-Jin Kim
Feb. 2004: Obtained Bachelor's Degree in Computer Engineering at Dongeui University
Aug. 2008: Obtained Master's Degree in Computer Education at the Graduate School of Education, Pukyong National University
Feb. 2011: Completed Doctoral Degree Course in Computer Engineering & Applications at Dongeui University
Sept. 2014 to present: Senior Researcher, Convergence of IT Devices Institute Busan, Dongeui University

Yun-Sik Yu
Feb. 1990: Obtained Doctor's Degree in Physics at Pusan National University
Mar. 1983 to present: Professor, Department of Radiological Science/IT Convergence, Dongeui University
Mar. 2008 to present: Director, Convergence of IT Devices Institute Busan, Dongeui University
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
More informationProblem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis
, 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying
More informationInternals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
More informationMassive Cloud Auditing using Data Mining on Hadoop
Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed
More informationAnalytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
More informationHow In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationDesign of Electric Energy Acquisition System on Hadoop
, pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University
More informationSQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
More informationRole of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
More informationBig Data Storage Architecture Design in Cloud Computing
Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,
More informationData Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
More informationA Survey of Distributed Database Management Systems
Brady Kyle CSC-557 4-27-14 A Survey of Distributed Database Management Systems Big data has been described as having some or all of the following characteristics: high velocity, heterogeneous structure,
More informationGeneric Log Analyzer Using Hadoop Mapreduce Framework
Generic Log Analyzer Using Hadoop Mapreduce Framework Milind Bhandare 1, Prof. Kuntal Barua 2, Vikas Nagare 3, Dynaneshwar Ekhande 4, Rahul Pawar 5 1 M.Tech(Appeare), 2 Asst. Prof., LNCT, Indore 3 ME,
More informationApriori-Map/Reduce Algorithm
Apriori-Map/Reduce Algorithm Jongwook Woo Computer Information Systems Department California State University Los Angeles, CA Abstract Map/Reduce algorithm has received highlights as cloud computing services
More informationA Database Hadoop Hybrid Approach of Big Data
A Database Hadoop Hybrid Approach of Big Data Rupali Y. Behare #1, Prof. S.S.Dandge #2 M.E. (Student), Department of CSE, Department, PRMIT&R, Badnera, SGB Amravati University, India 1. Assistant Professor,
More informationComposite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
More informationBig Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
More informationAssociate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
More informationImproving Data Processing Speed in Big Data Analytics Using. HDFS Method
Improving Data Processing Speed in Big Data Analytics Using HDFS Method M.R.Sundarakumar Assistant Professor, Department Of Computer Science and Engineering, R.V College of Engineering, Bangalore, India
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationOpen source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationComprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
More informationData-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion
More informationApache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
More informationBIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics
BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are
More informationSuresh Lakavath csir urdip Pune, India lsureshit@gmail.com.
A Big Data Hadoop Architecture for Online Analysis. Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. Ramlal Naik L Acme Tele Power LTD Haryana, India ramlalnaik@gmail.com. Abstract Big Data
More informationBig Data Big Data/Data Analytics & Software Development
Big Data Big Data/Data Analytics & Software Development Danairat T. danairat@gmail.com, 081-559-1446 1 Agenda Big Data Overview Business Cases and Benefits Hadoop Technology Architecture Big Data Development
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationApache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
More informationLecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department
More informationGAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationTalend Real-Time Big Data Sandbox. Big Data Insights Cookbook
Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the
More informationHortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
More informationOpen source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationBig Data Use Case: Business Analytics
Big Data Use Case: Business Analytics Starting point A telecommunications company wants to allude to the topic of Big Data. The established Big Data working group has access to the data stock of the enterprise
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationHadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
More informationIntroduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05
Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970
More informationSnapshots in Hadoop Distributed File System
Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationDATA MINING WITH HADOOP AND HIVE Introduction to Architecture
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of
More informationMapReduce with Apache Hadoop Analysing Big Data
MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationEMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst
White Paper EMC s Enterprise Hadoop Solution Isilon Scale-out NAS and Greenplum HD By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst February 2012 This ESG White Paper was commissioned
More information