MapReduce Approach to Collective Classification for Networks
Wojciech Indyk, Tomasz Kajdanowicz, Przemyslaw Kazienko, and Slawomir Plamowski
Wroclaw University of Technology, Wroclaw, Poland
Faculty of Computer Science and Management
{wojciech.indyk,tomasz.kajdanowicz,kazienko,slawomir.plamowski}@pwr.wroc.pl

Abstract. This paper considers the collective classification problem for big data sets using the MapReduce programming model. We propose an implementation of a label propagation algorithm for networks. The method was examined on a real dataset from the telecommunication domain. The results indicate that it can be used to classify nodes in order to propose new offerings or tariffs to customers.¹

Keywords: MapReduce, collective classification, classification in networks, label propagation

1 Introduction

Relations between objects in many systems are commonly modelled by networks. Examples include hyperlinks connecting web pages, citations between papers, conversations, and social interactions in social portals. Network models are in turn a basis for different types of processing and analysis. One of them is node classification (labelling of the nodes in the network). Node classification has a deep theoretical background. However, due to new phenomena appearing in artificial environments such as social networks on the Internet, the problem of node classification has recently been re-invented and re-implemented.

Nodes in networks may be classified either by inference based on the known profiles of these nodes (the regular concept of classification) or based on relational information derived from the network. The second approach utilizes information about connections between nodes (the structure of the network) and can be very useful in assigning labels to the nodes being classified. For example, it is very likely that a given web page x is related to sport (label sport) if x is linked by many other web pages about sport.
Hence, a form of collective classification should be provided, with simultaneous decision making on every node's label rather than classifying each node separately. Such an approach allows taking into account correlations between connected nodes, which usually deliver undervalued knowledge.

¹ This is not the final version of this paper. You can find the final version on the publisher web page.
Moreover, the growing trend of data explosion in transactional systems requires more sophisticated methods in order to analyse enormous amounts of data. There is a huge need to process big data in parallel, especially in complex analyses like collective classification. A MapReduce approach to collective classification, which is able to process huge data sets, is proposed and examined in the paper.

Section 2 covers related work, while Section 3 presents a proposal of a MapReduce approach to label propagation in the network. Section 4 contains a description of the experimental setup and the obtained results. The paper is concluded in Section 5.

2 Related Work

2.1 Collective classification

Collective classification problems may be solved using two main approaches: within-network and across-network inference. Within-network classification, in which training entities are connected directly to the entities whose labels are to be classified, stands in contrast to across-network classification, where models learnt from one network are applied to another similar network [8]. Overall, networked data have several unique characteristics that simultaneously complicate and provide leverage to learning and classification. Among others, statistical relational learning (SRL) techniques were introduced, including probabilistic relational models, relational Markov networks, and probabilistic entity-relationship models [9, 10].

Two distinct types of classification in networks may be distinguished: one based on a collection of local conditional classifiers and one based on classification stated as a single global objective function. The best-known implementations of the first approach are iterative classification (ICA) and the Gibbs sampling algorithm (GS), whereas examples of the latter are loopy belief propagation (LBP) and mean-field relaxation labeling (MF) [11].

2.2 MapReduce programming model

MapReduce is a programming model for data processing derived from functional languages [3].
MapReduce breaks the processing into two consecutive phases: the map phase and the reduce phase. Big data processing usually requires parallel execution, and MapReduce provides and manages it. Processing starts with splitting the data into separate chunks. Each data chunk must conform to the <key, value> format, according to the input file configuration. Each data chunk is then processed by a Map function. Map takes an input pair and produces a set of <key, value> pairs. All values associated with the same key are grouped together and passed to the Reduce phase. The Reduce function accepts a key and a set of values for that key. It processes the given values and returns a new <key, value> pair to be saved as the output of processing. Usually a reducer produces one <key, value> pair. Both the Map and Reduce phases need to be specified and implemented by the user [1, 2]. The process is presented in Fig. 1.
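The map, group-by-key and reduce steps described above can be illustrated with a small in-memory sketch. It is not Hadoop code and not taken from the paper: the function names run_mapreduce, word_count_mapper and word_count_reducer are hypothetical, and the word-count task is only an assumed example chosen for brevity.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run one map phase and one reduce phase over (key, value) records."""
    grouped = defaultdict(list)
    # Map phase: every input pair may emit any number of (key, value) pairs.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Grouping: values with the same key are collected together.
            grouped[out_key].append(out_value)
    # Reduce phase: each key and its list of values yields one output pair.
    return dict(reducer(key, values) for key, values in grouped.items())


def word_count_mapper(_line_number, line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.split():
        yield word, 1


def word_count_reducer(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


if __name__ == "__main__":
    lines = [(0, "a b a"), (1, "b c")]
    print(run_mapreduce(lines, word_count_mapper, word_count_reducer))
```

In a real cluster the grouping step is performed by the framework between the two phases; here it is a plain dictionary so that the data flow stays visible.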
Fig. 1. The MapReduce programming model

MapReduce is able to process very large datasets thanks to the initial split of data into small chunks. The most common open-source implementation of the MapReduce model is the Apache Hadoop library [4]. Apache Hadoop is a framework that allows distributed processing of large data sets across clusters of computers, offering local computation and storage. Hadoop delivers high availability not through hardware but through failure handling at the application layer. A single MapReduce phase in Hadoop is called a Job. A Job consists of a map method, a reduce method, input data files and configuration.

3 Collective Classification by Means of Label Propagation Using MapReduce

The most common way to utilize the information of labelled and unlabelled data is to construct a graph from the data and perform a Markov random walk on it. The idea of a Markov random walk has been used multiple times [5-7] and involves defining a probability distribution over the labels for each node in the graph. For labelled nodes the distribution reflects the true labels. The aim is then to recover this distribution for the unlabelled nodes. Such a Label Propagation approach allows performing classification based on relational data.

Let $G(V, E, W)$ denote a graph with vertices $V$, edges $E$ and an $n \times n$ edge weight matrix $W$. According to [6], in a weighted graph $G(V, E, W)$ with $n = |V|$ vertices, label propagation may be solved by linear equations 1 and 2.

$$\forall i, j \in V: \quad \sum_{(i,j) \in E} w_{ij} F_i = \sum_{(i,j) \in E} w_{ij} F_j \qquad (1)$$

$$\forall i \in V: \quad \sum_{c \in classes(i)} F_i = 1 \qquad (2)$$

where $F_i$ denotes the probability density of classes for node $i$. Let us assume the set of nodes $V$ is partitioned into labelled vertices $V_L$ and unlabelled vertices $V_U$, $V = V_L \cup V_U$. Let $F_u$ denote the probability distribution over the labels associated
with vertex $u \in V$. For each node $v \in V_L$, for which $F_v$ is known, a dummy node $v'$ is inserted such that $w_{v'v} = 1$ and $F_{v'} = F_v$. This operation is equivalent to the clamping discussed in [6]. Let $V_D$ be the set of dummy nodes. Equations 1 and 2 can then be solved with the Iterative Label Propagation algorithm, see Algorithm 1.

Algorithm 1 The pseudo code of Iterative Label Propagation algorithm.
repeat
  for all $v \in (V \cup V_D)$ do
    $F_v = \frac{\sum_{(u,v) \in E} w_{uv} F_u}{\sum_{(u,v) \in E} w_{uv}}$
  end for
until convergence

As can be observed, at each iteration of Iterative Label Propagation certain operations are performed on each of the nodes. These operations are calculated based on local information only, namely the node's neighbourhood. This fact can be utilized in a parallel version of the algorithm, see Algorithm 2.

Algorithm 2 The pseudo code of MapReduce approach to Iterative Label Propagation algorithm.
map <node; adjacencylist>
  for all n ∈ adjacencylist do
    propagate <n; node.label, n.weight>
  end for
reduce <n, list(node.label, weight)>
  propagate <n, $\frac{\sum node.label \cdot weight}{\sum weight}$>

The MapReduce version of the Iterative Label Propagation algorithm consists of two phases. The Map phase takes all labelled and dummy nodes and propagates their labels to all nodes in the adjacency list, taking into account the edge weights between nodes. The Reduce phase calculates a new label for each node with at least one labelled neighbour. Reducers calculate the new label for a node based on a list of its labelled neighbours and the relation strength between nodes (weight). The final result, namely a new label for a particular node, is computed as a weighted sum of label probabilities from the neighbourhood.

4 Experiments and Results

For the purpose of the experimental setup, the telecommunication network was built over a 3-month history of phone calls from a leading European telecommunication
company. The original dataset consisted of about phone calls and more than 16 million unique users. All communication facts (phone calls) were performed using one of 38 tariffs, of 4 types each. In order to limit the amount of data and simplify the task to meet the limitations of the hardware environment, only two types of phone calls were extracted and utilized in the experiments. Users were labelled with the class conditional probability of tariffs: the sum of the durations of outgoing phone calls in a particular tariff was divided by the total duration of all outgoing calls. Eventually, final dataset consisted of users. Afterwards, the user network was built, where the connection strength between particular users was calculated according to equation 3.

$$e_{ij} = \frac{2 d_{ij}}{d_i + d_j} \qquad (3)$$

where $d_{ij}$ denotes the total duration of calls between users $i$ and $j$, $d_i$ the total duration of outgoing calls of user $i$, and $d_j$ the total duration of incoming calls of user $j$. Obtained network was composed of weighted edges between aforementioned users.

The goal of the experiment was to predict the class conditional probability of tariff for unlabelled users. The initial set of labelled nodes (training set) for collective prediction consisted of 37% of users, chosen randomly according to a uniform distribution. The rest of the nodes should potentially belong to the test set. However, due to the properties of the examined algorithm, some nodes could not be reached and thus could not have a label assigned: these nodes did not possess incoming edges, so the algorithm was not able to propagate label probabilities to them. Eventually, the final test set was composed of only 2% of users, distributed over the whole network. Nevertheless, the rest of the nodes were utilized to keep the structure of the network and the propagation of labels, see Fig. 2.

The collective classification algorithm was implemented in the MapReduce programming model. It consists of six Jobs, each accomplishing map-reduce phases.
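The core collective classification Job follows Algorithm 2. A minimal in-memory sketch of one such map-reduce pass is given below. It is not the authors' Hadoop implementation: the names lp_map, lp_reduce and lp_iteration, the dictionary-based label distributions and the toy graph with dummy nodes a' and b' are all assumptions made for illustration.

```python
from collections import defaultdict


def lp_map(node, label, adjacency_list):
    """Map: a labelled node sends its label distribution to each neighbour."""
    for neighbour, weight in adjacency_list:
        yield neighbour, (label, weight)


def lp_reduce(node, messages):
    """Reduce: weighted average of the label distributions sent to a node."""
    total_weight = sum(weight for _, weight in messages)
    new_label = defaultdict(float)
    for label, weight in messages:
        for cls, prob in label.items():
            new_label[cls] += weight * prob / total_weight
    return node, dict(new_label)


def lp_iteration(labels, adjacency):
    """One map-reduce pass; nodes that have no label yet emit nothing."""
    grouped = defaultdict(list)
    for node, adjacency_list in adjacency.items():
        if node in labels:
            for key, value in lp_map(node, labels[node], adjacency_list):
                grouped[key].append(value)
    return dict(lp_reduce(node, msgs) for node, msgs in grouped.items())


if __name__ == "__main__":
    # Dummy nodes a' and b' clamp the known labels of a and b (weight 1).
    dummy = {"a'": {"t1": 1.0, "t2": 0.0}, "b'": {"t1": 0.0, "t2": 1.0}}
    adjacency = {
        "a'": [("a", 1.0)],
        "b'": [("b", 1.0)],
        "a": [("c", 3.0)],
        "b": [("c", 1.0)],
    }
    labels = dict(dummy)
    for _ in range(2):
        # Dummy labels are merged back so they stay fixed in every pass.
        labels = {**dummy, **lp_iteration(labels, adjacency)}
    print(labels["c"])
```

In the toy graph, node c receives the label of a with weight 3 and the label of b with weight 1, so after the second pass its distribution is the weighted average of the two. Nodes without incoming edges never receive a message, which mirrors the unreachable nodes mentioned above.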
A detailed description of the Jobs is presented in Table 1. The convergence criterion in the algorithm was controlled by the change ɛ of the conditional probability for each node. The algorithm kept iterating as long as this change was greater than ɛ. The experiment was organised in order to examine the computational time devoted to each of the map-reduce steps as well as the number of iterations of the algorithm. The time was measured for three distinct values of ɛ = {0.01, 0.001, }. The final assessment of the implemented algorithm was measured using the mean square error between the predicted label probability and the known (true) label probability. The Mean Square Error (MSE) was the same for all three ɛ values. Therefore, we did not observe significant changes in the performance of the algorithm while examining different values of the convergence criterion ɛ. However, as presented in Table 2 and Fig. 3, the value of the convergence criterion ɛ has an impact on the number of executions of the implemented jobs. The less restrictive it is, the fewer executions of jobs need to be performed.
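The stopping rule and the error measure described above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, a node that had no label in the previous iteration is assumed to start from probability 0, and the max_iterations guard is an added assumption.

```python
def max_label_change(previous, current):
    """Largest absolute change of any class probability between iterations."""
    changes = [
        abs(prob - previous.get(node, {}).get(cls, 0.0))
        for node, dist in current.items()
        for cls, prob in dist.items()
    ]
    return max(changes, default=0.0)


def iterate_until_converged(step, labels, epsilon, max_iterations=100):
    """Apply step repeatedly while the maximal change is greater than epsilon."""
    iterations = 0
    while iterations < max_iterations:
        updated = step(labels)
        change = max_label_change(labels, updated)
        labels = updated
        iterations += 1
        if change <= epsilon:
            break
    return labels, iterations


def mean_square_error(predicted, true):
    """MSE over all class probabilities of nodes that received a prediction."""
    errors = [
        (predicted[node].get(cls, 0.0) - prob) ** 2
        for node, dist in true.items()
        if node in predicted
        for cls, prob in dist.items()
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy step (assumption): every probability moves halfway towards 1.0.
    def halve_gap(labels):
        return {n: {c: (p + 1.0) / 2 for c, p in d.items()} for n, d in labels.items()}

    final, runs = iterate_until_converged(halve_gap, {"x": {"t1": 0.0}}, epsilon=0.1)
    print(final, runs, mean_square_error(final, {"x": {"t1": 1.0}}))
```

The per-node difference and the maximum over all nodes correspond in spirit to the two comparison Jobs listed in Table 1; a smaller ɛ keeps the loop running for more passes.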
Fig. 2. Types of nodes that have been utilized in the experiments: labelled and unlabelled, training and testing ones, used only for label propagation and omitted

Fig. 3. Execution time in [s] of map-reduce jobs for distinct convergence criterion ɛ
Table 1. MapReduce jobs implemented in the algorithm.
Job name                   Job description
adjacencylist              the job takes an edge list as input and returns an adjacency list for all nodes
dummyadjlistandlabels      the job creates a list of dummy nodes with labels according to algorithm [9] and updates the adjacency list with newly created edges from dummy nodes
mergeadjlistandlabel       the job merges the list of node labels with the adjacency list, resulting in the collective classification input
collectiveclassification   the job processes collective classification data according to the algorithm and results in a new label list
singlelabelscomparison     the job returns the absolute difference of the class conditional probability of labels between the current iteration and the previous iteration
alllabelcomparison         the job returns the maximal difference of the input list (absolute difference of class conditional probability)

Table 2. Execution time in [s] and number of executions of map-reduce jobs for distinct convergence criterion ɛ
Job name                   ɛ = 0.01            ɛ =                 ɛ =
                           No. exec.  Time     No. exec.  Time     No. exec.  Time
adjacencylist
dummyadjlistandlabels
mergeadjlistandlabel
collectiveclassification
singlelabelscomparison
alllabelcomparison

The results obtained during the experiments (MSE, execution time) indicate that the proposed MapReduce approach to the implementation of the Iterative Label Propagation algorithm correctly performs parallel computation and yields satisfactory prediction results. Moreover, it is able to accomplish prediction on a big dataset, which is impossible to achieve in reasonable time with a single-threaded version of the algorithm.

5 Conclusions

The problem of collective classification using the MapReduce programming model was considered in the paper. We introduced a proposal for an implementation of the Iterative Label Propagation algorithm in the network. Thanks to that, the method can perform complicated calculations on big data sets.
The proposed method was examined on a real dataset from the telecommunication domain. The results indicated that it can be used to classify nodes in order to propose new offerings or tariffs to customers. Further experimentation will consider a comparison of the presented method with other approaches. Moreover, further studies with much bigger data will be conducted.

Acknowledgement

This work was supported by The Polish National Center of Science the research project , and Fellowship co-financed by The European Union within The European Social Fund.

References

1. Ekanayake, J., Pallickara, S., Fox, G., MapReduce for Data Intensive Scientific Analyses, Proceedings of the 2008 Fourth IEEE International Conference on eScience.
2. Dean, J., Ghemawat, S., MapReduce: simplified data processing on large clusters. Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, Berkeley, USA, 2004, USENIX Association, pp. 1024.
3. White, T., Hadoop: The Definitive Guide, O'Reilly.
4. Hadoop official web site: hadoop.apache.org.
5. Szummer, M., Jaakkola, T., Clustering and efficient use of unlabeled examples. In Proceedings of Neural Information Processing Systems (NIPS).
6. Zhu, X., Ghahramani, Z., Lafferty, J., Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the International Conference on Machine Learning (ICML).
7. Azran, A., The rendezvous algorithm: Multiclass semi-supervised learning with Markov random walks. In Proceedings of the International Conference on Machine Learning (ICML).
8. Jensen, D., Neville, J., Gallagher, B., Why collective inference improves relational classification. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
9. Desrosiers, C., Karypis, G., Within-network classification using local structure similarity. Lecture Notes in Computer Science 5781.
10. Knobbe, A., de Haas, M., Siebes, A., Propositionalisation and aggregates.
In Proceedings of the Fifth European Conference on Principles of Data Mining and Knowledge Discovery.
11. Kramer, S., Lavrac, N., Flach, P., Propositionalization approaches to relational data mining. In: Džeroski, S. (ed.) Relational Data Mining, Springer-Verlag, 2001.
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
PLAANN as a Classification Tool for Customer Intelligence in Banking
PLAANN as a Classification Tool for Customer Intelligence in Banking EUNITE World Competition in domain of Intelligent Technologies The Research Report Ireneusz Czarnowski and Piotr Jedrzejowicz Department
Analysis of MapReduce Algorithms
Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 [email protected] ABSTRACT MapReduce is a programming model
Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment
Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,
BSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China [email protected],
A Study on Data Analysis Process Management System in MapReduce using BPM
A Study on Data Analysis Process Management System in MapReduce using BPM Yoon-Sik Yoo 1, Jaehak Yu 1, Hyo-Chan Bang 1, Cheong Hee Park 1 Electronics and Telecommunications Research Institute, 138 Gajeongno,
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Distributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING
Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila
A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs In a Workflow Application
2012 International Conference on Information and Computer Applications (ICICA 2012) IPCSIT vol. 24 (2012) (2012) IACSIT Press, Singapore A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs
Large-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Learning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
Learning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
MOBILE SOCIAL NETWORKS FOR LIVE MEETINGS
Computer Science 13 (4) 2012 http://dx.doi.org/10.7494/csci.2012.13.4.87 Michał Wrzeszcz Jacek Kitowski MOBILE SOCIAL NETWORKS FOR LIVE MEETINGS Abstract In this article, we present an idea of combining
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
Evaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist [email protected], [email protected], [email protected] Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce
Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce Delip Rao Dept. of Computer Science Johns Hopkins University [email protected] David Yarowsky Dept. of Computer Science
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Machine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou [email protected] Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
A Learning Based Method for Super-Resolution of Low Resolution Images
A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 [email protected] Abstract The main objective of this project is the study of a learning based method
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
Apache Hama Design Document v0.6
Apache Hama Design Document v0.6 Introduction Hama Architecture BSPMaster GroomServer Zookeeper BSP Task Execution Job Submission Job and Task Scheduling Task Execution Lifecycle Synchronization Fault
Classification On The Clouds Using MapReduce
Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal [email protected] Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal [email protected]
Final Project Proposal. CSCI.6500 Distributed Computing over the Internet
Final Project Proposal CSCI.6500 Distributed Computing over the Internet Qingling Wang 660795696 1. Purpose Implement an application layer on Hybrid Grid Cloud Infrastructure to automatically or at least
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Keywords: Big Data, HDFS, Map Reduce, Hadoop
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.
Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated
Mining Interesting Medical Knowledge from Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from
The Performance Characteristics of MapReduce Applications on Scalable Clusters
The Performance Characteristics of MapReduce Applications on Scalable Clusters Kenneth Wottrich Denison University Granville, OH 43023 [email protected] ABSTRACT Many cluster owners and operators have
Map/Reduce Affinity Propagation Clustering Algorithm
Map/Reduce Affinity Propagation Clustering Algorithm Wei-Chih Hung, Chun-Yen Chu, and Yi-Leh Wu Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology,
Hadoop Technology for Flow Analysis of the Internet Traffic
Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet
Open Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *
Send Orders for Reprints to [email protected] 766 The Open Electrical & Electronic Engineering Journal, 2014, 8, 766-771 Open Access Research on Application of Neural Network in Computer Network
Optimization and analysis of large scale data sorting algorithm based on Hadoop
Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview
Big Data: Big N. V.C. 14.387 Note. December 2, 2014
Big Data: Big N V.C. 14.387 Note December 2, 2014 Examples of Very Big Data Congressional record text, in 100 GBs Nielsen s scanner data, 5TBs Medicare claims data are in 100 TBs Facebook 200,000 TBs See
Tracking Groups of Pedestrians in Video Sequences
Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.
A Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
http://www.wordle.net/
Hadoop & MapReduce http://www.wordle.net/ http://www.wordle.net/ Hadoop is an open-source software framework (or platform) for Reliable + Scalable + Distributed Storage/Computational unit Failures completely
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
A Stock Pattern Recognition Algorithm Based on Neural Networks
A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo [email protected] Xun Liang [email protected] Xiang Li [email protected] Abstract pattern respectively. Recent
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
SCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
Tensor Factorization for Multi-Relational Learning
Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany [email protected] 2 Siemens AG, Corporate
NEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
Ensembles and PMML in KNIME
Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany [email protected]
Social Media Mining. Data Mining Essentials
Introduction Data production rate has increased dramatically (Big Data) and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data. Businesses and customers
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
Distributed Structured Prediction for Big Data
Distributed Structured Prediction for Big Data A. G. Schwing ETH Zurich [email protected] T. Hazan TTI Chicago M. Pollefeys ETH Zurich R. Urtasun TTI Chicago Abstract The biggest limitations of learning
Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University [email protected]
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University [email protected] 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
Research on the Performance Optimization of Hadoop in Big Data Environment
Vol.8, No.5 (015), pp.93-304 http://dx.doi.org/10.1457/idta.015.8.5.6 Research on the Performance Optimization of Hadoop in Big Data Environment Jia Min-Zheng Department of Information Engineering, Beijing
Hadoop Scheduler with Deadline Constraint
Hadoop Scheduler with Deadline Constraint Geetha J 1, N UdayBhaskar 2, P ChennaReddy 3, Neha Sniha 4 1,4 Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore,
Cloud Computing based on the Hadoop Platform
Cloud Computing based on the Hadoop Platform Harshita Pandey 1 UG, Department of Information Technology RKGITW, Ghaziabad ABSTRACT In recent years, cloud computing has come forth as the new IT paradigm.
MapReduce/Bigtable for Distributed Optimization
MapReduce/Bigtable for Distributed Optimization Keith B. Hall Google Inc. [email protected] Scott Gilpin Google Inc. [email protected] Gideon Mann Google Inc. [email protected] Abstract With large data
Fault Tolerance in Hadoop for Work Migration
Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous
Hadoop on a Low-Budget General Purpose HPC Cluster in Academia
Hadoop on a Low-Budget General Purpose HPC Cluster in Academia Paolo Garza, Paolo Margara, Nicolò Nepote, Luigi Grimaudo, and Elio Piccolo Dipartimento di Automatica e Informatica, Politecnico di Torino,
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Large Scale Multi-view Learning on MapReduce
Large Scale Multi-view Learning on MapReduce Cibe Hariharan Ericsson Research India Email: [email protected] Shivashankar Subramanian Ericsson Research India Email: [email protected] Abstract
Introducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
Shareability and Locality Aware Scheduling Algorithm in Hadoop for Mobile Cloud Computing
Shareability and Locality Aware Scheduling Algorithm in Hadoop for Mobile Cloud Computing Hsin-Wen Wei 1,2, Che-Wei Hsu 2, Tin-Yu Wu 3, Wei-Tsong Lee 1 1 Department of Electrical Engineering, Tamkang University
Self Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, [email protected]
How To Find Influence Between Two Concepts In A Network
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example
MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
Component Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Course: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 [email protected] Abstract Probability distributions on structured representation.
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
Resource Scalability for Efficient Parallel Processing in Cloud
Resource Scalability for Efficient Parallel Processing in Cloud ABSTRACT Govinda.K #1, Abirami.M #2, Divya Mercy Silva.J #3 #1 SCSE, VIT University #2 SITE, VIT University #3 SITE, VIT University In the
Analysis and Modeling of MapReduce's Performance on Hadoop YARN
Analysis and Modeling of MapReduce's Performance on Hadoop YARN Qiuyi Tang Dept. of Mathematics and Computer Science Denison University [email protected] Dr. Thomas C. Bressoud Dept. of Mathematics and
Stabilization by Conceptual Duplication in Adaptive Resonance Theory
Stabilization by Conceptual Duplication in Adaptive Resonance Theory Louis Massey Royal Military College of Canada Department of Mathematics and Computer Science PO Box 17000 Station Forces Kingston, Ontario,
Neural Networks and Back Propagation Algorithm
Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland [email protected] Abstract Neural Networks (NN) are important
Introduction to Parallel Programming and MapReduce
Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant
Data Mining in the Swamp
WHITE PAPER Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all
degrees of freedom and are able to adapt to the task they are supposed to do [Gupta].
1.3 Neural Networks 19 Neural Networks are large structured systems of equations. These systems have many degrees of freedom and are able to adapt to the task they are supposed to do [Gupta]. Two very
Implementing Graph Pattern Mining for Big Data in the Cloud
Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya [email protected]
