International Journal of Current Engineering and Technology
E-ISSN , P-ISSN
INPRESSCO, All Rights Reserved
Available at

Research Article

Real time monitoring of system counters using Map Reduce Framework for effective analysis

Sandeep B. Aher * and Poonam D. Lambhate
Computer Engineering, JSPM's JSCOE, Pune, India. Information Technology, JSPM's JSCOE, Pune, India
*Corresponding author: Sandeep B. Aher

Accepted 27 July 2015, Available online 28 July 2015, Vol.5, No.4 (Aug 2015)

Abstract

Several monitoring software packages provide the flexibility to add various servers and performance system counters for real-time monitoring. Application service managers use performance system counters to monitor server and application health. It is constructive to detect anomalies earlier, take preventive action and minimize the probability of system failures, in order to reduce downtime. Considering the retention period and the various global and process level system counters, the amount of data to be processed is huge. An approach that is well known for offline processing is proposed here for real-time data monitoring and analysis. The most notable solution proposed for managing and processing big data is the MapReduce framework. Although the MapReduce framework is very suitable for batch jobs, there is an increasing demand for non-batch processing of big data, such as interactive jobs, real-time queries and big data streams, for faster and more efficient monitoring.

Keywords: Integrated monitoring, performance counters, distributed monitoring

1. Introduction

With the advent of parallel processing on enormous data, processes run with multiple instances and a varying number of threads. Various processes, such as daemon processes, batch processes, system processes and backup processes, run continuously on servers and utilize resources. Some processes run for a long duration while others run for a short duration.

Monitoring helps to manage system resources effectively, to make system decisions, and to evaluate and examine systems. If process level data is captured with a data collection interval of one second, the raw file size is a few GB per hour on each server. There is a need for a cost-effective monitoring solution. A monitoring application requires both offline and online processing. Big data is used extensively in offline processing, but its use in online processing is very limited. This paper explores the use of a big data framework for real-time processing and proposes the MapReduce framework for real-time data processing. In-memory processing is faster than disk-based processing, and experiments show that this approach is feasible for processing huge offline volumes as well as online volumes.

2. Literature Survey

2.1 Apache Spark

Apache Spark is an open-source cluster computing framework originally developed in the AMP Lab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms. Spark also supports a pseudo-distributed local mode, usually used only for development or testing purposes, where distributed storage is not required and the local file system can be used instead; in this scenario, Spark runs on a single machine with one executor per CPU core. Spark had over 465 contributors in 2014, making it the most active project in the Apache Software Foundation and among Big Data open source projects.

2.2 Comparative Study with relevant Journals

Various international journal papers on monitoring systems and their underlying implementation technology were studied.
In (Masahiro Yoshizawa, et al, 2014), most monitoring tasks are performed not by any single monitoring package but by a suitable combination of multiple monitoring packages. A combination of two types of monitoring has been proposed: global level monitoring and log level monitoring.

In (M. Yoshizawa, et al, 2011), integrated monitoring software is shown to be effective for reducing the cost of monitoring SaaS-based systems. Data models for real-time data and system configurations, which make it easier to associate the real-time data, are also presented.

In (Rakesh Bhatnagar, et al, 2013), the authors discuss the various performance parameters and criteria of the grid monitoring system Ganglia. However, Ganglia is not an integrated monitoring system and does not have a built-in notification or alert system.

In (Brian Laub, et al, 2014), emerging research systems focus on scalable, distributed monitoring capable of quickly detecting anomalies and alerting administrators to them. Nagios and Tivoli monitoring have centralized architectures and generate a large data flow in the network.

In (Yassine Tabaa, et al, 2012), the authors discuss faster and more efficient computing. Hadoop MapReduce suffers significant trouble when dealing with iterative algorithms; as a consequence, many alternative frameworks that support this class of algorithms were created. The most important enhancement of Spark is that it allows some data to be shared between the mappers while they are computing, a fact that significantly reduces the data required over the network.

2.3 Comparative Study with relevant Journals

To monitor real-time process level data, the data has to be collected very frequently. Given the number of running processes in an environment and the data collection interval, the volume of raw data will be very large. The MapReduce programming framework can be used for parallel processing of large amounts of data. Processing runs on different servers in a distributed manner. The framework is scalable, fault tolerant and easily extendable. Map transforms a set of data into key-value pairs, and Reduce aggregates this data into a scalar. A reducer receives all the data for an individual key from all the mappers.
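The map and reduce roles described above can be illustrated with a minimal sketch in plain Python. It does not use Hadoop or Spark; the sample data, the function names and the choice of the mean as the aggregate are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical samples: (process_id, cpu_percent), collected once per second.
samples = [(101, 12.0), (202, 40.0), (101, 18.0), (202, 44.0), (303, 5.0)]

def map_phase(records):
    """Emit one (key, value) pair per record; the key is the process ID."""
    for pid, cpu in records:
        yield pid, cpu

def shuffle(pairs):
    """Group values by key so that one reducer sees all values for a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate the values of each key into a single scalar (the mean)."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}

average_cpu = reduce_phase(shuffle(map_phase(samples)))
print(average_cpu)
```

In a distributed framework the shuffle step is what moves every value for a given key to the same reducer; here it is only a dictionary of lists.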
The MapReduce framework operates exclusively on <key; value> pairs. To reduce the challenges of using MapReduce for real-time processing, in-memory computing solutions will be used. Some of them are Apache Spark, GridGain and XAP.

3. Problem Statement

There are restrictions on using MapReduce (Hadoop) for real-time process level monitoring, for the following reasons:

1. MapReduce basically processes offline files stored in a file system, whereas the proposed solution is for real-time data analysis.
2. All of the input must be ready before a job starts, which prevents MapReduce from serving real-time processing use cases.
3. MapReduce cannot run continuous computations and queries. The processing is disk based and hence slower for online or real-time data processing.

Since the MapReduce framework (Hadoop) is not suitable for these non-batch workloads, new solutions are proposed for these new challenges.

4. Proposed System

4.1 Objective

The main objective is to offer efficient and conclusive monitoring to different users of an IT system, including the monitoring team, the performance test team and the capacity planning team. The integrated monitoring system has the following objectives:

- Faster and more efficient processing of data
- To identify the problematic process or application earlier
- To report the problem immediately to the appropriate team
- To take temporary precautions so that the impact on the current system is minimal
- To save cost, effort and time in fixing the issue
- To gain more confidence in the system and its monitoring capabilities
- To improve the availability of applications
- To lower the number of anomalies

The proposed system is experimental and can be a replacement for existing monitoring systems. Existing systems mostly provide standalone global level statistics and use the SNMP protocol for data processing (Anshul Kaushik, 2010). SNMP supports standard performance counters, and its earlier versions have security issues.
The proposed monitoring application, built on the MapReduce framework, provides real-time processing of process level statistics.

4.2 System Architecture

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in the main program, as shown in Figure 1. Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers, which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for the application. Next, it sends the application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks for the executors to run (Matei Zaharia).

With the current architecture, performance is good for up to 50 servers. If there are more servers, a loosely coupled architecture will have to be considered.
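As a rough analogy for the driver and executor roles described above, the following plain-Python sketch uses a thread pool in place of cluster executors. It is not the Spark API; the sample data, the function names and the partitioning scheme are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (process_id, memory_mb) samples held by the driver program.
samples = [(101, 250), (202, 900), (101, 300), (303, 120), (202, 850), (303, 150)]

def split(records, parts):
    """Driver side: divide the records into roughly equal partitions."""
    return [records[i::parts] for i in range(parts)]

def task(partition):
    """Executor side: compute the peak memory per process in one partition."""
    peak = {}
    for pid, mem in partition:
        peak[pid] = max(mem, peak.get(pid, mem))
    return peak

def merge(partials):
    """Driver side: combine the partial results returned by the executors."""
    result = {}
    for partial in partials:
        for pid, mem in partial.items():
            result[pid] = max(mem, result.get(pid, mem))
    return result

# The pool stands in for the executors; the driver submits one task per partition.
with ThreadPoolExecutor(max_workers=2) as executors:
    partials = list(executors.map(task, split(samples, 2)))

peak_memory = merge(partials)
print(peak_memory)
```

The point of the sketch is only the division of labour: the coordinating program partitions the data and merges results, while the workers run the same task on different partitions.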
Fig.1 Architecture Diagram: Experimental system

4.3 Mathematical Model

Let the system S be defined as:

S = {I, O, P, F, S}

I = {Sn, Pc, t, It}, where:
Sn: Server name to be monitored.
Pc: Performance counter to be measured.
t: Time period for which output should be displayed.
It: Interval time at which the output is regularly updated.

P = {P1, P2, P3, P4, P5} = {Data collection, Data processing, Clustering, Data conversion, Display}

1. Collect data using the data collection agent on each server.
2. Process data as per the input criteria received from the application service manager.
3. Cluster data by the process ID of high resource consumption.
4. Convert the received data to array format.
5. Display output in graphical view.

F = Failure to display the real-time streaming graph.
S = Implementation of a real-time MapReduce framework to display streaming process level performance counter data.

Figure 2 shows the state transition diagram representing the major states of the proposed system and the transitions between them.

Fig.2 State transition diagram

5. Implementation Details

The implemented system is divided into two main modules. The first is the data collection agent on the servers to be monitored (offline processing), and the second is the data processing and display module. The data processing and display module is common to both offline and real-time processing. The data processing flow and display algorithm are explained below in detail.

ALGORITHM: Data processing of global and process level performance monitors

1. Start
2. INPUT parameters: server name, performance counter, TIME
3. IF [TIME is CURRENT] THEN CONTINUE to step 4, ELSE GOTO step 5
4. [Ensure data collection agent is running] IF [Data collection agent = RUNNING] THEN GOTO step 6, ELSE MESSAGE "Start data collection agent", GOTO step
5. [Ensure historical data is available] IF [Data is present on monitoring server] THEN CONTINUE to step 6, ELSE MESSAGE "NO DATA PRESENT", GOTO step
6. Read data from server: IF [TIME is CURRENT] THEN read streaming data, ELSE read off-line data from files
7. Invoke MapReduce algorithm: IF [PROCESS level data] key = process ID; IF [GLOBAL level data] key = system time
8. IF [PROCESS level data] apply the clustering algorithm to cluster data by process ID; IF [GLOBAL level data] apply the filtering algorithm to filter data by selected time
9. Convert selected data to JSON format
10. Display graph
11. END

6. Experimental Setup and Results

6.1 Experimental Setup: Real time processing

The experiment was performed on desktop machines running the Windows 7 operating system. A console application was developed in Visual Studio using the PDH (Performance Data Helper) libraries to collect Windows system level data. Data is collected at a regular interval of one second, and collection continues until real-time process level data is requested by another machine or the central server. The monitoring server was developed in Java, and JFreeChart was used for graph display.

Two different monitoring servers were developed for comparison. The first uses ordinary client-server communication, with the Scanner class breaking the input into tokens. The input is then passed to the graph for as long as the data collection agent is running. The second monitoring server was developed with Apache Spark. The JavaReceiverInputDStream class was used to define the input stream that connects to the agent and receives data over the network. The process (key) and the dynamic value to monitor are selected when connecting to the data collection agent. flatMap transformations were used instead of Scanner to obtain a sequence of tokens, which contain both the key and the runtime value. The tokens are collected in an array and then displayed in the graph. The reduce job takes this input and filters out the dynamic value required for display in the graph.

6.2 Result Analysis

The programs were run to display real-time monitoring data using the Apache Spark framework. The system counter Free Memory (MB) was analyzed graphically in real time, as shown in Figure 3.

Fig.3 Process level free memory (MB) monitoring system counter

The key contributions of the MapReduce framework are not the actual map and reduce functions, but the scalability and fault tolerance achieved for a variety of applications by optimizing the execution engine once. As such, a single-threaded implementation of MapReduce (such as MongoDB) will usually not be faster than a traditional (non-MapReduce) implementation; any gains are usually only seen with multi-threaded implementations.

Conclusion

The implemented system is experimental and can be a replacement for existing monitoring systems. Big data (Hadoop) is widely used for offline processing but its use in real-time processing is very limited.
1) The MapReduce framework can be implemented for real-time processing with Apache Spark.
2) It works on commodity hardware, resulting in a cost-effective application system.

Future Work

The monitoring system with the MapReduce framework can be further enhanced with multiple graphs and enriched features. It can be further commercialized as a low-cost offering for global and process level data monitoring. For a real-time implementation, the application has to be tuned with an optimized number of threads.

References

Sandeep Aher, Poonam Lambhate, (2015) Integrated Monitoring System, International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: , Volume: 2, Issue: 12
Sandeep Aher, Poonam Lambhate, (2015) Data Mining and Information Retrieval track, cpgcon 2015-Paper
Masahiro Yoshizawa, Tatsuya Sato, Ken Naono, (September 2014) Integrated Monitoring Software for Application Service Managers, IEEE Transactions on Network and Service Management, Vol. 11, No. 3
M. Yoshizawa and K. Naono, (2011) Design and evaluation of integrated monitoring software for SaaS-based systems, in Proc. APNOMS, pp. 14.
Rakesh Bhatnagar, Dr. Jayesh Patel, (2013) Performance Analysis of A Grid Monitoring System - Ganglia, International Journal of Emerging Technology and Advanced Engineering, Volume 3, Issue 8
Brian Laub, Chengwei Wang, Karsten Schwan, Chad Huneycutt, (2014) Towards Combining Online and Offline Management for Big Data Applications, International Conference on Autonomic Computing (ICAC 14)
Yassine Tabaa, Abdellatif Medou, (2012) Towards a next generation of scientific computing in the Cloud, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 6, No 3
Anshul Kaushik, (2010) Use of open source technologies for enterprise server monitoring using SNMP, International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 07
Performance management across the application lifecycle Monitor
A. van Hoorn et al., (2009) Continuous monitoring of software services: Design and application of the Kieker framework, Dept. Comput. Sci., Univ. Kiel, Kiel, Germany, TR0921.
B. Krishnamurthy, A. Neogi, B. Sengupta, and R. Singh, (2008) Data tagging architecture for system monitoring in dynamic environments, in Proc. NOMS, pp
J. Wei, Y. Zhao, K. Jiang, R. Xie, and Y. Jin, (2011) Analysis farm: A cloud-based scalable aggregation and query platform for network log analysis, in Proc. Int. Conf. CSC, pp
Dr. Evangelos Bekiaris, Christos Petsos, Kostas Kalogirou, (2012) Energy Management Web Application: Brokerage Agent Front End Application, 36 International Conference on Power and Energy Systems, Lecture Notes in Information Technology, Vol. 13
Manoj V, (2014) Comparative study of nosql document, column store databases and evaluation of CASSANDRA, International Journal of Database Management Systems (IJDMS), Vol. 6, No. 4
Jaspreet Kaur, Harpreet Kaur, Kamaljeet Kaur, (2013) A review on document oriented and column oriented, International Journal of Computer Trends and Technology, Volume 4, Issue 3
MapReduce Programming Model for .NET-based Cloud Computing, (2009)
Architecture Governance Logical Architectural Diagramming Guidelines, pro.iankoenig.com/docs/logicalarchitecturaldrawingguidelinesv2.pdf
Saeed Shahrivari, Saeed Jalili, Beyond Batch Processing: Towards Real-Time and Streaming Big Data, Computer Engineering Department, Tarbiat Modares University (TMU)
park/ /ch01.html, Chapter 1. Introduction to Data Analysis with Spark
Events over time chart using Flot Charts
Jitendra Kumar, Richard T. Mills, Forrest M. Homan, William W. Hargrove, (2011) Parallel k-means Clustering for Quantitative Ecoregion Delineation using Large Data Sets, International Conference on Computational Science
Weizhong Zhao, Huifang Ma, Qing He, Parallel K-Means Clustering Based on MapReduce, Chinese Academy of Sciences, Graduate University of Chinese Academy of Sciences
Apache Spark, fast and general engine for large-scale data processing
BIG DATA TOOLS. Top 10 open source technologies for Big Data
BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed
Processing of Hadoop using Highly Available NameNode
Processing of Hadoop using Highly Available NameNode 1 Akash Deshpande, 2 Shrikant Badwaik, 3 Sailee Nalawade, 4 Anjali Bote, 5 Prof. S. P. Kosbatwar Department of computer Engineering Smt. Kashibai Navale
Managing large clusters resources
Managing large clusters resources ID2210 Gautier Berthou (SICS) Big Processing with No Locality Job( /crawler/bot/jd.io/1 ) submit Workflow Manager Compute Grid Node Job This doesn't scale. Bandwidth
Large scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
Click Stream Data Analysis Using Hadoop
Governors State University OPUS Open Portal to University Scholarship Capstone Projects Spring 2015 Click Stream Data Analysis Using Hadoop Krishna Chand Reddy Gaddam Governors State University Sivakrishna
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
Mining Interesting Medical Knowledge from Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan - Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from
A Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
High Performance Cluster Support for NLB on Window
High Performance Cluster Support for NLB on Window [1]Arvind Rathi, [2] Kirti, [3] Neelam [1]M.Tech Student, Department of CSE, GITM, Gurgaon Haryana (India) [email protected] [2]Asst. Professor,
Fault Tolerance in Hadoop for Work Migration
Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous
Keywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing
Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Survey on Load
Scalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
CiteSeer x in the Cloud
Published in the 2nd USENIX Workshop on Hot Topics in Cloud Computing 2010 CiteSeer x in the Cloud Pradeep B. Teregowda Pennsylvania State University C. Lee Giles Pennsylvania State University Bhuvan Urgaonkar
Lecture 10 - Functional programming: Hadoop and MapReduce
Lecture 10 - Functional programming: Hadoop and MapReduce Sohan Dharmaraja For today Big Data and Text analytics Functional
A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS
A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS Abstract T.VENGATTARAMAN * Department of Computer Science, Pondicherry University, Puducherry, India. A.RAMALINGAM Department of MCA, Sri
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
Introduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM
STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM Albert M. K. Cheng, Shaohong Fang Department of Computer Science University of Houston Houston, TX, 77204, USA http://www.cs.uh.edu
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
Big Data Challenges in Bioinformatics
Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Platforms Jordi Torres [email protected] Talk outline! We talk about Petabyte?
http://www.paper.edu.cn
A platform for massive railway information data storage # SHAN Xu 1, WANG Genying 1, LIU Lin 2** (1. Key Laboratory of Communication and Information Systems, Beijing Municipal Commission
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technology, Solapur, Maharashtra,
Assignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Client Overview. Engagement Situation. Key Requirements
Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision
How To Analyze Log Files In A Web Application On A Hadoop Mapreduce System
Analyzing Web Application Log Files to Find Hit Count Through the Utilization of Hadoop MapReduce in Cloud Computing Environment Sayalee Narkhede Department of Information Technology Maharashtra Institute
Survey on Scheduling Algorithm in MapReduce Framework
Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM's Imperial College of Engineering and Research, Pune, India
NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB
bankmark UG (haftungsbeschränkt) Bahnhofstraße 1 9432 Passau Germany www.bankmark.de [email protected] T +49 851 25 49 49 F +49 851 25 49 499 NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB,
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
INSIGHT Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
Mesos: A Platform for Fine-Grained Resource Sharing in Data Centers (II)
UC BERKELEY Mesos: A Platform for Fine-Grained Resource Sharing in Data Centers (II) Anthony D. Joseph LASER Summer School September 2013 My Talks at LASER 2013 1. AMP Lab introduction 2. The Datacenter
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
Dell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert [email protected]/
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Data-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan [email protected] Abstract Every day, we create 2.5 quintillion
Design of Electric Energy Acquisition System on Hadoop
, pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University
marlabs driving digital agility WHITEPAPER Big Data and Hadoop
marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil
