# Implementing Graph Pattern Mining for Big Data in the Cloud

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 graph, we have to choose some initial points or nodes for If an isomorphism exists between two graphs, then the graphs mining. Here we have taken those points as initial points, are called isomorphic and we write. In the case which have the highest number of associativity. As we are when the bijection is a mapping of a graph onto itself, i.e., dealing with social networks, the nodes of graph will be the when G and H are one and the same graph, the bijection is people and edges will be the connection among people. called an automorphism of G. Associativity of a node points to the number of connections of that node. The top-k nodes having highest associativity will be Algorithm 3: Isomorphism Test chosen as the initial points to start mining. Require: g1, g2 Algorithm 1: Initial node set Require: k, number of initial nodes Ensure: node set V = {v i } 1: Sort the nodes in decreasing order of the no of time it appears 2: V = {v i 1 i k} 3: output V After discovering the initial k-points to start mining, the points should be extended to discover single-edged graph patterns. For this extension purpose, we have used Breadth First Search approach in our proposed system. Ensure: true or false 1: create a pattern edge between every pair of nodes in each graph. 2: match the pattern graphs 3: if pattern graph G p 1 is isomorphic to pattern graph G p 2 4: output true 5: else output false C. Pattern Pruning Phase Definition 1: Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a search key ) and explores the neighbour nodes first, before moving to the next level neighbours. Algorithm 2: Set Up Single-edged graphs Require: V initial node set Ensure: G init = {g 1, g 2,,g k }, k single edge graphs 1: for each v i V 2: G init {(v i, v o ) 1 i k, v o G Λ v o V} 3: Output G init B. Isomorphism test phase In this section, Isomorphism test will be conducted among the patterns mined. After getting the initial single-edged patterns, the next step is to perform isomorphism test among the singleedged patterns. Isomorphism test ensures that two isomorphic patterns are not considered to be two different patterns. Besides, the count of that pattern should be increased. Definition 2: In graph theory, an isomorphism of graphs G and H is a bijection between the vertex sets of G and H, f: V(G) V(H) such that any two vertices u and v of G are adjacent in G if and only if ƒ(u) and ƒ(v) are adjacent in H. This kind of bijection is generally called "edge-preserving bijection", in accordance with the general notion of isomorphism being a structure-preserving bijection. It is the last phase of our framework after mining and isomorphism test. Actually all the phases keep executing continuously step by step. After getting the initial single-edged patterns or sub-graphs, isomorphism test is conducted on the resulted patterns. The isomorphism test results in increasing the count for those patterns which are isomorphic and the count of remaining patterns remains the same. As there is a minimum support count or threshold is attached with our approach, we will choose those patterns which have support count greater than the minimum support count or threshold. In any pattern discovery mechanism this approach is taken as we all are mostly interested with frequently occurring patterns. The patterns or sub-graphs with higher count are more interesting to study. Thus in our approach in pruning phase, all the patterns or sub-graphs with support count less than the minimum support count or threshold will be pruned or discarded. After pruning of infrequent patterns, again the frequent patterns are extended, isomorphism tests are done and the infrequent patterns are pruned. This process continues until we reach the stopping conditions. There are two stopping conditions, occurrence of either one terminates our approach. 1. All the graph patterns discovered in the previous step are infrequent. 2. All the nodes of the graph are visited. Once a node is visited by some pattern while extending, it is marked as visited and cannot be visited again. Algorithm 4: Prune Infrequent Patterns Require: G intermediate = intermediate set of graphs, t threshold 3198

4 Ensure: G frequent frequent set of graphs No. of frequent patterns 1: for each g i G intermediate : calculate Sup g, the support count of g i 3: if Sup g > t 4: G g i 5: output G frequent init_nodes=10 init_nodes=15 init_nodes=20 Algorithm 5: Edge Extension Require: G i, the input graph Fig 2: No. of frequent patterns against number of initial nodes. 2) Effect of threshold value chosen on result: Ensure: G ext or null 1: if any nodes are left in the graph 2: G ext = add an edge to the subgraph using BFS 3: if all nodes are exhausted 4: prune the subgraph 5: output G ext IV. EXPERIMENTAL RESULTS AND ANALYSIS In this section, we present the performance evaluation of proposed approach with real data sets [14]. The setting of the environment will be introduced in section IV A. Result analysis will be discussed in section IV B. A. Experimental environment This paper implements our proposed approach on Hadoop in a cloud computing environment consisting of 4 virtual machines. One node acts as master node and others serve as slave nodes. We are using AWS EC2 instances of type t2.macro, each one is having 1 vcpu, and 1 GiB memory. B. Result analysis Users have to give the input partitioned graphs, the number of initial nodes, the feature based on which the patterns will be discovered (work id, education type etc.) and the threshold value. Depending upon the feature chosen, the number of initial nodes and threshold, different patterns will be discovered. 1) Effect of number of initial nodes chosen on result: As the initial nodes for mining can be specified by users, the number of patterns retrieved will vary if the selected feature and threshold are kept constant. In figure 2, we have recorded number of frequent patterns against number of initial nodes chosen for mining. (Feature: education;type, threshold:6). The threshold value also can be adjusted by user. The number of patterns will vary if the number of initial nodes for mining and feature is kept constant. In figure 3, we have recorded the number of frequent patterns against threshold value. (Feature: education;type, number of initial nodes:20). Fig 3: No. of frequent patterns against number of threshold values V. CONCLUSION Big data mining is now a hot research question, MapReduce brings a simple distributed programming. This paper proposes an approach based on Hadoop and cloud computing in handling big graph data. In the face of mining large data sets, the parallelization is a good solution. Our proposed approach is an approximation-based approach which will results in frequent graph patterns ranging from single-edged to n-edged graph patterns based on the initial node set, feature based on which patterns are searched for and the threshold value. The graph patterns can be studied for different purposes. Also we can get patterns from combination of feature sets from the patterns listed in the input graph. The proposed approach can be used for other social network graphs if the data format is same as the input dataset. In future work, more real big data sets can be examined for this approach. Moreover, we can improve the way of finding the initial nodes for pattern mining, pattern extension and stopping conditions No. of frequent patterns threshold=4 threshold=6 threshold=8 3199

5 REFERENCES [1] P. Anchuri, M. J. Zaki, O. Barkol, R. Bergman, Y. Felder, S. Golan, and A. Sityon, Graph mining for discovering infrastructure patterns in configuration management databases, Knowledge and information systems, vol. 33, no. 3, pp , [2] V. Bonnici, R. Giugno, A. Pulvirenti, D. Shasha, and A. Ferro, A subgraph isomorphism algorithm and its application to biochemical data, BMC Bioinformatics, vol. 14, no. Suppl7, p. S13, [3] S. Ghazizadeh and S. S. Chawathe, Seus: Structure extraction using summaries, in Discovery science. Springer, 2002, pp [4] C. Borgelt, T. Meinl, and M. Berthold, Moss: a program for molecular substructure mining, in Proceedings of the 1 st international workshop on open source data mining: frequent pattern mining implementations. ACM, 2005, pp [5] L. B. Holder, D. J. Cook, and S. Djoko, Substucture discovery in the subdue system. in KDD Workshop, 1994, pp [6] Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis, Mizan: a system for dynamic load balancing in large-scale graph processing, in Proceedings of the 8 th ACM European Conference on Computer Systems. ACM, 2013, pp [7] U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos, Gbase: an efficient analysis platform for large graphs, The VLDB Journal, vol. 21, no. 5, pp , [8] [9] Chun-Chien Chen, Kuan-Wei Lee, Chi-Chieh Chang, De- Nian Yang and Ming-Syan Chen Efficient Large Graph Pattern Mining for Big Data in the Cloud. [10] [11] [12] [13] oud. [14] Standford large network dataset collection, 3200

### A programming model in Cloud: MapReduce

A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value

### Improving Apriori Algorithm to get better performance with Cloud Computing

Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become

### CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

### International Journal of Innovative Research in Computer and Communication Engineering

FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,

### Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

### Cloud Computing Summary and Preparation for Examination

Basics of Cloud Computing Lecture 8 Cloud Computing Summary and Preparation for Examination Satish Srirama Outline Quick recap of what we have learnt as part of this course How to prepare for the examination

### LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT

LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT Samira Daneshyar 1 and Majid Razmjoo 2 1,2 School of Computer Science, Centre of Software Technology and Management (SOFTEM),

### Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction

### Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

### Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

### Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

### CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING

Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila

### Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

### Visualisation in the Google Cloud

Visualisation in the Google Cloud by Kieran Barker, 1 School of Computing, Faculty of Engineering ABSTRACT Providing software as a service is an emerging trend in the computing world. This paper explores

Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed

### Evaluating partitioning of big graphs

Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed

### Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

### Advances in Natural and Applied Sciences

AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Clustering Algorithm Based On Hadoop for Big Data 1 Jayalatchumy D. and

### HadoopRDF : A Scalable RDF Data Analysis System

HadoopRDF : A Scalable RDF Data Analysis System Yuan Tian 1, Jinhang DU 1, Haofen Wang 1, Yuan Ni 2, and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China {tian,dujh,whfcarter}@apex.sjtu.edu.cn

### CiteSeer x in the Cloud

Published in the 2nd USENIX Workshop on Hot Topics in Cloud Computing 2010 CiteSeer x in the Cloud Pradeep B. Teregowda Pennsylvania State University C. Lee Giles Pennsylvania State University Bhuvan Urgaonkar

### Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of

CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant

### Early Cloud Experiences with the Kepler Scientific Workflow System

Available online at www.sciencedirect.com Procedia Computer Science 9 (2012 ) 1630 1634 International Conference on Computational Science, ICCS 2012 Early Cloud Experiences with the Kepler Scientific Workflow

### Operating Stoop for Efficient Parallel Data Processing In Cloud

RESEARCH INVENTY: International Journal of Engineering and Science ISBN: 2319-6483, ISSN: 2278-4721, Vol. 1, Issue 10 (December 2012), PP 11-15 www.researchinventy.com Operating Stoop for Efficient Parallel

### A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools

### A Novel Cloud Based Elastic Framework for Big Data Preprocessing

School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview

### Performance Analysis of Hadoop for Query Processing

211 Workshops of International Conference on Advanced Information Networking and Applications Performance Analysis of Hadoop for Query Processing Tomasz Wiktor Wlodarczyk, Yi Han, Chunming Rong Department

### III Big Data Technologies

III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

### Cloud Computing based on the Hadoop Platform

Cloud Computing based on the Hadoop Platform Harshita Pandey 1 UG, Department of Information Technology RKGITW, Ghaziabad ABSTRACT In the recent years,cloud computing has come forth as the new IT paradigm.

### Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

### R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5

Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 uskenbaevar@gmail.com, 2 abu.kuandykov@gmail.com,

### Graph Algorithms using Map-Reduce

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web 1/7 Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society.

### A Professional Big Data Master s Program to train Computational Specialists

A Professional Big Data Master s Program to train Computational Specialists Anoop Sarkar, Fred Popowich, Alexandra Fedorova! School of Computing Science! Education for Employable Graduates: Critical Questions

### Distributed Framework for Data Mining As a Service on Private Cloud

RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

### Mobile Storage and Search Engine of Information Oriented to Food Cloud

Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

### Large-Scale Data Processing

Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

### Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

### International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

### INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

### imgraph: A distributed in-memory graph database

imgraph: A distributed in-memory graph database Salim Jouili Eura Nova R&D 435 Mont-Saint-Guibert, Belgium Email: salim.jouili@euranova.eu Aldemar Reynaga Université Catholique de Louvain 348 Louvain-La-Neuve,

### Cloud computing - Architecting in the cloud

Cloud computing - Architecting in the cloud anna.ruokonen@tut.fi 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices

### Convex Optimization for Big Data: Lecture 2: Frameworks for Big Data Analytics

Convex Optimization for Big Data: Lecture 2: Frameworks for Big Data Analytics Sabeur Aridhi Aalto University, Finland Sabeur Aridhi Frameworks for Big Data Analytics 1 / 59 Introduction Contents 1 Introduction

### International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: editor@ijermt.org November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering

### Cloud Computing Now and the Future Development of the IaaS

2010 Cloud Computing Now and the Future Development of the IaaS Quanta Computer Division: CCASD Title: Project Manager Name: Chad Lin Agenda: What is Cloud Computing? Public, Private and Hybrid Cloud.

### Introduction to DISC and Hadoop

Introduction to DISC and Hadoop Alice E. Fischer April 24, 2009 Alice E. Fischer DISC... 1/20 1 2 History Hadoop provides a three-layer paradigm Alice E. Fischer DISC... 2/20 Parallel Computing Past and

### Survey on Load Rebalancing for Distributed File System in Cloud

Survey on Load Rebalancing for Distributed File System in Cloud Prof. Pranalini S. Ketkar Ankita Bhimrao Patkure IT Department, DCOER, PG Scholar, Computer Department DCOER, Pune University Pune university

### Management & Analysis of Big Data in Zenith Team

Management & Analysis of Big Data in Zenith Team Zenith Team, INRIA & LIRMM Outline Introduction to MapReduce Dealing with Data Skew in Big Data Processing Data Partitioning for MapReduce Frequent Sequence

### Scalable Multiple NameNodes Hadoop Cloud Storage System

Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai

### A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM Ramesh Maharjan and Manoj Shakya Department of Computer Science and Engineering Dhulikhel, Kavre, Nepal lazymesh@gmail.com,

### Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of

### Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

### Resource Scalability for Efficient Parallel Processing in Cloud

Resource Scalability for Efficient Parallel Processing in Cloud ABSTRACT Govinda.K #1, Abirami.M #2, Divya Mercy Silva.J #3 #1 SCSE, VIT University #2 SITE, VIT University #3 SITE, VIT University In the

### Mining Interesting Medical Knowledge from Big Data

IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from

Investigating Hadoop for Large Spatiotemporal Processing Tasks David Strohschein dstrohschein@cga.harvard.edu Stephen Mcdonald stephenmcdonald@cga.harvard.edu Benjamin Lewis blewis@cga.harvard.edu Weihe

### UPS battery remote monitoring system in cloud computing

, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology

### Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

### Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

### Big Data on Microsoft Platform

Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

### Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

### NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

### CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

### marlabs driving digital agility WHITEPAPER Big Data and Hadoop

marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil

### Energy Efficient MapReduce

Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

### Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

### Efficient Analysis of Big Data Using Map Reduce Framework

Efficient Analysis of Big Data Using Map Reduce Framework Dr. Siddaraju 1, Sowmya C L 2, Rashmi K 3, Rahul M 4 1 Professor & Head of Department of Computer Science & Engineering, 2,3,4 Assistant Professor,

### Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

### Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

### An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus

An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus Tadashi Ogino* Okinawa National College of Technology, Okinawa, Japan. * Corresponding author. Email: ogino@okinawa-ct.ac.jp

### Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input

### BSPCloud: A Hybrid Programming Library for Cloud Computing *

BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,

### The big data revolution

The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

### Final Project Proposal. CSCI.6500 Distributed Computing over the Internet

Final Project Proposal CSCI.6500 Distributed Computing over the Internet Qingling Wang 660795696 1. Purpose Implement an application layer on Hybrid Grid Cloud Infrastructure to automatically or at least

+ Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data

### A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.

### Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2

### Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability

### Teaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee

Teaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee Technology in Pedagogy, No. 8, April 2012 Written by Kiruthika Ragupathi (kiruthika@nus.edu.sg) Computational thinking is an emerging

### Optimization and analysis of large scale data sorting algorithm based on Hadoop

Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,

### Distributed Apriori in Hadoop MapReduce Framework

Distributed Apriori in Hadoop MapReduce Framework By Shulei Zhao (sz2352) and Rongxin Du (rd2537) Individual Contribution: Shulei Zhao: Implements centralized Apriori algorithm and input preprocessing

### Keywords: Big Data, HDFS, Map Reduce, Hadoop

Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning

### A Service for Data-Intensive Computations on Virtual Clusters

A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King rainer.schmidt@arcs.ac.at Planets Project Permanent

### Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

### MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration

MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration Hoi-Wan Chan 1, Min Xu 2, Chung-Pan Tang 1, Patrick P. C. Lee 1 & Tsz-Yeung Wong 1, 1 Department of Computer Science

### Graph Database Proof of Concept Report

Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

### Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

, 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying

### Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759

### A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

### Analysis of MapReduce Algorithms

Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model

### Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL

### Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load

Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load Pooja.B. Jewargi Prof. Jyoti.Patil Department of computer science and engineering,

### Big Data and Scripting Systems beyond Hadoop

Big Data and Scripting Systems beyond Hadoop 1, 2, ZooKeeper distributed coordination service many problems are shared among distributed systems ZooKeeper provides an implementation that solves these avoid