Online Fraud Detection Model Based on Social Network Analysis


Journal of Information & Computational Science :7 (05) May, 05

Peng Wang a,b, Ji Li a,b, Bigui Ji a,b
a College of Computer Science, Chongqing University, Chongqing, China
b Key Laboratory for Dependable Service Computing in Cyber Physics Society, Ministry of Education, Chongqing, China

Abstract

With the rapid development of the Internet, the way we live and think has changed. Because of anonymity and the low cost of legal sanctions, e-commerce has been booming on the Internet. Unfortunately, this rapid commercial success has also made e-commerce sites a lucrative medium for committing fraud. We therefore propose a method for detecting and preventing fraud carried out through fraudulent platforms. First, we implemented a parallel web crawling agent to collect real user and transaction data. Second, we propose the Reverse Graph and Common Trade Cumulative Graph (CTCG) theory to extract features of common transactions. Third, we extract graph-level features based on the PageRank and K-core clustering algorithms, replace the PageRank values with more reasonable TrustRank values, and add BadRank values to identify potential fraudulent users. Finally, we conduct a series of experiments using Random Forest and verify the performance of our method on real transaction cases. In summary, the proposed model is effective in identifying potential fraudulent users on fraudulent platforms.

Keywords: Fraudulent Platform; Social Network Analysis; Reverse Graph; CTCG; Random Forest

1 Introduction

With the rapid development of the Internet, the way we live and think has changed. Because of anonymity and the low cost of legal sanctions, e-commerce has been booming on the Internet, and people around the world trade commodities worth millions of dollars every day.
eBay, the world's largest online e-commerce trading platform, announced in its third-quarter 2013 earnings report [1] that revenue for the quarter was $3.9 billion. Unfortunately, according to an Internet fraud report issued by the Internet Crime Complaint Center (IC3) [2], a joint operation between the FBI and the National White Collar Crime Center (NW3C), the number of complaints about Internet fraud increased from 4,4 per month in 0 to 4,5 per month in 0. From January, 0 to December 3, 0, IC3 received a total of 9,74 complaints, representing a 39.4% (4,90) increase over the previous year. The total amount lost increased from $7. million in 00 to $55 million in 0. Among the assorted fraud types, non-delivery of merchandise ranked number one (.%). These figures indicate that online fraud causes significant losses.

Corresponding author. Email address: [email protected] (Peng Wang). Copyright 05 Binary Information Press. DOI: 0.733/jics00590

Despite the prevalence of online transaction fraud, trading platforms have no systematic solution for identifying fraudsters; they rely only on a user's transactions and personal information to judge the user's credibility. The most popular e-commerce trading platforms, such as eBay and, in China, TaoBao, use a reputation mechanism based on accumulated feedback. Users can exploit anonymity and low online auction fees to create multiple accounts and inflate their rating scores through sham transactions, and then deceive buyers with the resulting high scores. Besides fraud groups, a currently more popular approach is to boost credibility through a fraudulent platform. In this sense, the feedback mechanism itself promotes fraud.

Several remedies have been proposed. Rubin et al. [3] proposed a new reputation system for online trading sites in 2005. Wang and Chiu [5] studied Internet auction fraud with a focus on detecting the abnormal rating behaviors of known auction fraud groups. J. S. Chang et al. [4] proposed a phased model based on the time-line. S. J. Lin et al. [6] combined a ranking concept with social network analysis to detect collusive groups in online auctions. None of these, however, provides a way to detect fraud carried out through a fraudulent platform. We therefore propose a method that can. First, we implemented a parallel web crawling agent to collect real user and transaction data.
Second, we propose the Reverse Graph and Common Trade Cumulative Graph theory to extract features of common transactions. Third, we extract graph-level features based on the PageRank and K-core clustering algorithms, replace the PageRank values with more reasonable TrustRank values, and add BadRank values to identify potential fraudulent users. Finally, we conduct experiments using Random Forest and verify the performance of our method on real transaction cases. In summary, the proposed model is effective in identifying potential fraudulent users on fraudulent platforms.

2 Related Work

2.1 Social Network Analysis

PageRank

The PageRank algorithm uses the link information between pages to assign each page a global importance score [7]. Its main idea is that the importance of a page is closely related to the pages pointing to it, and that page importance is interactive and mutually reinforcing. The PageRank score of a page p is defined as:

$$r(p) = \alpha \sum_{q:(q,p) \in E} \frac{r(q)}{o(q)} + (1 - \alpha) \frac{1}{N} \qquad (1)$$

where α is the damping (attenuation) coefficient, o(q) is the number of out-links of page q, and N is the number of pages. The equivalent matrix equation is:

$$\mathbf{r} = \alpha M \mathbf{r} + (1 - \alpha) \frac{1}{N} \mathbf{1}_N \qquad (2)$$

where $\mathbf{1}_N$ is the all-ones vector of length N. As Eq. (1) shows, the PageRank score of a page p consists of two parts: one part comes from the pages pointing to p; the other is static and equal for all pages. All PageRank scores can be computed by iterating Eq. (2), and the iteration converges in the strict mathematical sense. In practice, however, a fixed number of iterations is usually set. In ordinary PageRank every page receives the same static score, which does not change during the iteration. Pages may instead be given unequal static scores, obtained from prior knowledge or statistical analysis:

$$\mathbf{r} = \alpha T \mathbf{r} + (1 - \alpha) \mathbf{d} \qquad (3)$$

where $\mathbf{d}$ is a static vector satisfying $d(i) \ge 0$ and $\sum_i d(i) = 1$ $(i = 1, \ldots, N)$, and $d(i)$ is the static PageRank score of the i-th page. This method is called Biased PageRank.

TrustRank

TrustRank is a link-analysis technique for the semi-automated detection of spam pages. It was published jointly by researchers from Yahoo! and Stanford University in 2004 [8], and a patent application was filed in 00. The TrustRank values of the pages are computed as:

$$\mathbf{t} = \beta T \mathbf{t} + (1 - \beta) \mathbf{s} \qquad (4)$$

where $\mathbf{t}$ is the vector of the TrustRank values of all pages, β is the attenuation coefficient, and $\mathbf{s}$ is the static TrustRank vector, i.e. the initial values of all pages.

BadRank

BadRank values are computed as:

$$\mathbf{b} = \beta U \mathbf{b} + (1 - \beta) \mathbf{s} \qquad (5)$$

where U is the transpose of the original web link graph, and β and $\mathbf{s}$ have the same meaning as in Eq. (4). The BadRank computation is otherwise essentially the same as that of TrustRank.

K-core

K-core is a hot topic in social network research.
Extracting highly relevant sub-structures, such as communities, groups and cores, from a complex social network and finding the relationships between them helps to describe the topology of real-world complex networks, and K-core is a basic and important concept for this kind of decomposition. K-core is widely used in the social and behavioral sciences, in social network clustering [10] and in describing the evolution of sparse graphs [11]; it is also used in bioinformatics and network visualization [12]. A K-core is a maximal sub-graph in which each node is adjacent to at least k other nodes. It is also regarded as an essential complement to density measures, which may not capture many features of the global network. The mathematical definition of a k-core is:

Definition 1 Let G = (V, L) be a graph, where V is the set of vertices and L is the set of lines (edges or arcs), and denote n = |V|, m = |L|. A sub-graph H = (W, L|W) induced by the set W is a k-core, or a core of order k, if and only if $\forall v \in W: \deg_H(v) \ge k$, and H is the maximum sub-graph with this property.

The K-core value describes the position of a vertex (core or periphery) in graph G: the greater the K-core value of a vertex, the closer the vertex is to the center of G [13]. Fig. 1 shows the K-core decomposition of a simple graph; the red vertices have K-core value 3 and form the core of the graph.
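To make the measures of this section concrete, the following is a minimal Python sketch, not the authors' implementation. `biased_pagerank` iterates the recurrence x = βTx + (1 − β)s shared by Biased PageRank, TrustRank and BadRank, and `core_numbers` computes K-core values by repeatedly peeling off a minimum-degree vertex. The function names, the edge-list and adjacency-dict inputs, the default β = 0.85, the fixed iteration count and the uniform seed vectors are all assumptions made for illustration.

```python
from collections import defaultdict


def biased_pagerank(edges, static, beta=0.85, iters=50):
    """Iterate x = beta * T x + (1 - beta) * s for a fixed number of rounds.

    edges  : list of (src, dst) links
    static : dict node -> static score s (assumed to sum to 1)
    Nodes without out-links simply leak their score in this sketch.
    """
    nodes = set(static)
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
        nodes.update((src, dst))
    x = {n: static.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - beta) * static.get(n, 0.0) for n in nodes}
        for src in nodes:
            targets = out.get(src, [])
            for dst in targets:
                nxt[dst] += beta * x[src] / len(targets)
        x = nxt
    return x


def trustrank(edges, trusted, beta=0.85):
    """Trust flows forward along links from a hand-picked trusted seed set."""
    seed = {n: 1.0 / len(trusted) for n in trusted}
    return biased_pagerank(edges, seed, beta)


def badrank(edges, bad, beta=0.85):
    """Same iteration on the transposed graph, seeded with known bad nodes."""
    seed = {n: 1.0 / len(bad) for n in bad}
    return biased_pagerank([(dst, src) for src, dst in edges], seed, beta)


def core_numbers(adj):
    """K-core value of every vertex of an undirected graph.

    adj : dict node -> set of neighbours (must be symmetric)
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # peel a min-degree vertex
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# Hypothetical toy data: a chain a -> b -> c, and a triangle with a tail.
links = [("a", "b"), ("b", "c")]
trust = trustrank(links, trusted={"a"})
bad = badrank(links, bad={"c"})

triangle_with_tail = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"},
}
cores = core_numbers(triangle_with_tail)
```

On the chain, trust decays with distance from the trusted seed `a`, while the BadRank score seeded at `c` decays in the opposite direction because it runs on the transposed links. In the second example the triangle vertices get K-core value 2 and the pendant vertex gets 1.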

Fig. 1: Sketch of the K-core decomposition for a small graph.

2.2 Development of E-commerce Fraud

E-commerce fraud can be divided into multiple-account fraud, groups of fraudulent accomplices, and fraudulent platforms. Fraudsters create many accounts to evade fraud detection. Fraudsters also make use of accomplices, who behave like honest users except that they interact heavily with a small set of fraudsters in order to boost the fraudsters' reputation. A fraudulent platform provides a common place for such users.

Fig. 2: Fraudsters and accomplices form a near-bipartite core graph (node groups: fraudsters, accomplices, honest users).

Fig. 3: The relationship between the users and the different platforms (fraudulent platform, transaction platform).

3 Detection Model

3.1 Reverse Graph (RG) Theory

Our analysis shows that the most popular form of fraud is platform fraud. To reflect this fact clearly and to quantify it reasonably, we propose the Reverse Graph (RG) theory. The RG reflects the common transactions between users.

Definition 2 Let G = (V, E, W) be a graph with edges $e_i, e_j \in E$, $e_i = (v_{i,1}, v_{i,2})$, $e_j = (v_{j,1}, v_{j,2})$ and $v_{i,1}, v_{i,2}, v_{j,1}, v_{j,2} \in V$. If and only if $v_{i,2} = v_{j,2}$, the Reverse Graph $G' = (V', E', W')$ generated by G contains the vertices $v_{i,1}, v_{j,1} \in V'$ and the edge $e'_{i,j} = (v_{i,1}, v_{j,1}) \in E'$, and $W'(e'_{i,j})$ accumulates $\min\{W(e_i), W(e_j)\}$, where $W'(e'_{i,j})$ denotes the weight of $e'_{i,j}$, and $W(e_i)$, $W(e_j)$ denote the weights of $e_i$ and $e_j$ in G.

Fig. 4 shows a simple undirected graph being transformed into its reverse graph. Given the massive data and the time complexity, we generate the reverse graph in parallel using MapReduce: the weighted edges E of the original graph G are used to build a weighted adjacency matrix A, from which the Reverse Graph G' is generated.
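The Reverse Graph construction above can be sketched in a few lines of Python. This is a single-machine illustration under assumptions: the function name `reverse_graph`, the `(u, v, weight)` edge-list input and the sorted-pair edge keys are hypothetical, and the paper's own implementation is a MapReduce job whose details are not given here. Grouping edges by a shared vertex plays the role of the shuffle step, and enumerating pairs within each group plays the role of the reduce step.

```python
from collections import defaultdict
from itertools import combinations


def reverse_graph(edges):
    """Build reverse-graph edge weights from undirected weighted edges.

    edges : iterable of (u, v, weight)
    Returns a dict {(a, b): weight} with a < b. Two vertices a and b are
    linked when they share a neighbour k, and each shared neighbour adds
    min(W(a, k), W(b, k)) to the weight of (a, b).
    """
    neighbours = defaultdict(dict)
    for u, v, w in edges:
        neighbours[u][v] = w
        neighbours[v][u] = w

    weights = defaultdict(float)
    for shared, around in neighbours.items():
        for a, b in combinations(sorted(around), 2):
            weights[(a, b)] += min(around[a], around[b])
    return dict(weights)


# Hypothetical toy data: two buyers that both trade with the same two sellers.
trades = [
    ("b1", "s1", 3), ("b2", "s1", 2),
    ("b1", "s2", 1), ("b2", "s2", 4),
]
rg = reverse_graph(trades)
```

In this toy graph the buyers `b1` and `b2` share both sellers, so their reverse-graph edge accumulates min(3, 2) + min(1, 4) = 3. Because the input is undirected, the two sellers are linked in the same way through their shared buyers.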
The weight of a vertex in the reverse graph can be expressed as the sum of the weights of the edges connected to that vertex.

Fig. 4: The transformation to reverse graph on a simple undirected graph: (a) G, (b) reverse graph G'.

Fig. 5: The process of generating the reverse graph from the original graph (original graph, edge-weighted adjacency matrix, edge-weighted reverse graph).

We found that an edge weight equal to 1 carries no information, so we subtract 1 from each edge weight when calculating the vertex weights in the reverse graph. The weight $W'(v')$ of a reverse-graph vertex $v'$ is therefore:

$$W'(v') = \sum_{(v', v'_t) \in E'} \left( W'(v', v'_t) - 1 \right) \qquad (6)$$

where $W'(v', v'_t)$ is the weight of the edge $(v', v'_t)$ in the reverse graph.

3.2 Common Trade Cumulative Graph

As analyzed above, the reverse graph is well suited to detecting fraud accomplices, but it is not a good measure of the behavior of the fraudsters themselves. In fraud groups, a fraudster uses a number of accomplices to raise his own credibility, while the accomplices trade with honest users to raise theirs. To capture this, whenever two users have common transactions we accumulate those common transactions onto the seller involved, which gives a measure for the sellers. To express this clearly and to quantify it reasonably, we propose the Common Trade Cumulative Graph (CTCG).

Definition 3 Let G = (V, E, W) be a directed graph and $v_k \in V$. If and only if there exist edges $e_{i,k}, e_{j,k} \in E$ connected with vertex $v_k$, with $v_i, v_j \in V$, $e_{i,k} = (v_i, v_k)$ and $e_{j,k} = (v_j, v_k)$, then the CTCG can be expressed as $G'' = (V'', E'', W'')$, where $W''(v_k)$ accumulates $\max\{0, W'((v_i, v_j)) - C\}$. Here C is a constant, $W'((v_i, v_j))$ is the weight of the edge $(v_i, v_j) \in E'$ in the reverse graph G' of the directed graph G, and $W''(v_k)$ is the weight of vertex $v_k$ in the CTCG.

Fig. 6 shows a simple directed graph being transformed into its CTCG.
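The reverse-graph vertex weights and the CTCG accumulation described above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names, the `(buyer, seller, weight)` input format, the default values of `offset` and `c`, and the choice to link two buyers only through a shared seller are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def group_by_seller(edges):
    """seller -> {buyer: total weight} from directed (buyer, seller, w) edges."""
    buyers_of = defaultdict(dict)
    for buyer, seller, w in edges:
        buyers_of[seller][buyer] = buyers_of[seller].get(buyer, 0) + w
    return buyers_of


def common_trade_weights(edges):
    """Reverse-graph edge weights: buyers linked through sellers they share."""
    weights = defaultdict(float)
    for buyers in group_by_seller(edges).values():
        for i, j in combinations(sorted(buyers), 2):
            weights[(i, j)] += min(buyers[i], buyers[j])
    return dict(weights)


def reverse_vertex_weights(rg, offset=1.0):
    """Sum of (edge weight - offset) over the reverse-graph edges at each vertex."""
    totals = defaultdict(float)
    for (i, j), w in rg.items():
        totals[i] += w - offset
        totals[j] += w - offset
    return dict(totals)


def ctcg_weights(edges, c=1.0):
    """For each seller, accumulate max(0, W'(i, j) - c) over its buyer pairs."""
    rg = common_trade_weights(edges)
    result = {}
    for seller, buyers in group_by_seller(edges).items():
        result[seller] = sum(
            max(0.0, rg[(i, j)] - c) for i, j in combinations(sorted(buyers), 2)
        )
    return result


# Hypothetical toy data: b1 and b2 repeatedly buy from the same sellers.
trades = [
    ("b1", "s1", 2), ("b2", "s1", 2),
    ("b1", "s2", 1), ("b2", "s2", 1), ("b3", "s2", 1),
    ("b3", "s3", 5),
]
rg = common_trade_weights(trades)
buyer_scores = reverse_vertex_weights(rg)
seller_scores = ctcg_weights(trades)
```

Here `b1` and `b2` share two sellers, so their reverse-graph edge has weight 3 and each gets a vertex weight of 2, while `b3` shares only one weight-1 transaction with each of them and scores 0. Seller `s3` has a single buyer, so no buyer pair contributes and its CTCG weight stays at 0.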

Fig. 6: The transformation to CTCG on a simple directed graph: (a) original graph G, (b) reverse graph G', (c) CTCG G''.

3.3 The Feature Set

Building on the Reverse Graph, the Common Trade Cumulative Graph, and the SNA and user-level features introduced above, the main attributes are designed for the currently popular fraudulent platforms but can also be used to detect fraud groups. The feature set is kept relatively compact. It contains a number of attributes common to buyers and sellers as well as several role-specific ones, so that it represents the transaction behavior of buyers and sellers on the platform well.

Table 1: Feature set

Feature attribute     | Seller, Buyer | Tips
W'(u)                 | Y             | The weight of the vertex in the reverse graph
W''(u)                | Y             | The weight of the vertex in the CTCG
TrustRank             | Y             |
BadRank               | Y             |
K-core                | Y, Y          |
Mean                  | Y, Y          |
Variance              | Y, Y          |
Frequency             | Y             |
Ratio of trading      | Y             |
Shop conversion rate  | Y             | Obtained from the trading platform

4 Experiments

4.1 Datasets

We used crawlers to obtain user information and transaction records, then cleansed, converted and aggregated the data; the black and white lists were labeled manually. Together these data form the experimental data set, and each statistical period has a corresponding sub-data set. Fig. 7 shows how the data set changes from month to month. The numbers of buyers/sellers and of transactions correspond to the left vertical axis, while the blacklist and cumulative blacklist counts correspond to the right vertical axis.

Fig. 7: The size of dataset among months (left axis: users, i.e. buyers/sellers, and number of transactions; right axis: blacklist and cumulative blacklists; horizontal axis: month).

4.2 Model Evaluation and Analysis

Accuracy is a commonly used evaluation criterion in classification. It reflects the overall performance of a classifier on a data set, but it does not reflect performance on unbalanced data sets. For example, suppose a data set contains 000 samples, of which 0 are positive and the rest are negative. A classifier that assigns all samples to the negative class obtains 99% accuracy, yet it is useless in practice. The classification of unbalanced data sets therefore needs a more reasonable evaluation criterion. Commonly used criteria are Precision, Recall and the F-measure, which are calculated from the entries of the confusion matrix in Table 2.

During the experiment, we trained on the data sets up to November 0 and used the trained model to predict the December data, varying M (the number of months) and N (the number of trees in the random forest) to observe the predicted F-measure as well as the model training and prediction time. Fig. 8(a) shows the relationship between classification performance and M when the number of trees is N = 300, and Fig. 8(b) shows the relationship with the number of trees in the random forest. Fig. 9 shows the proposed model applied to the detection of user behavior on the eBay platform in 0. It shows that, as the number of positive samples (fraudsters) increases, the accuracy of the current model in detecting future user behavior increases markedly.
The detection model obtains Precision = 5.% and Recall = .%, so using the cumulative feature vector of fraudsters is reasonable and effective. The Reverse Graph and Common Trade Cumulative Graph proposed in this paper reflect the common trading behavior of buyers and sellers and the common transaction users of sellers. Fig. 10 shows the impact of these two important features: they improve the performance of the model.

Table 2: The confusion matrix of binary classification

                   | Collusive account   | Normal account
Collusive account  | tp (true positive)  | fp (false positive)
Normal account     | fn (false negative) | tn (true negative)
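For reference, the standard definitions in terms of the confusion matrix entries are Precision = tp / (tp + fp), Recall = tp / (tp + fn), and F-measure = 2 · Precision · Recall / (Precision + Recall). The short Python sketch below applies them. The function name `evaluate`, the choice of the balanced F-measure (the text does not state a weighting) and the sample counts are assumptions made for illustration; the counts are not taken from the paper's data.

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, precision, recall and balanced F-measure from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical unbalanced data set: 10 fraudsters among 1000 accounts.
# A classifier that labels every account as normal:
all_negative = evaluate(tp=0, fp=0, fn=10, tn=990)
# A classifier that finds 8 of the 10 fraudsters with 2 false alarms:
useful = evaluate(tp=8, fp=2, fn=2, tn=988)
```

The all-negative classifier scores high on accuracy but has zero recall and zero F-measure, which is why accuracy alone is a poor criterion on unbalanced data. The second classifier has only slightly higher accuracy yet a far better F-measure.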

Fig. 8: The size of dataset among months. Panels: (a) F-measure and time (h) by month, N = 300; (b) F-measure and time (h) by the number of trees, M = 3.

Fig. 9: The performance of user behavior detection on eBay in 0 (F-measure, Precision and Recall, Feb to Dec).

Fig. 10: Effect of the introduction of two attributes to the detection performance (F-measure with and without CTCG, Feb to Dec).

We chose random forest as the classification algorithm because it has inherent advantages on unbalanced classification data sets. We also compared it with other classification algorithms, and Table 3 shows the results. The neural network is a BP network and the SVM uses a polynomial kernel function. Compared with the other algorithms, random forest performs well. The C5.0 decision tree performed worst, mainly because of the number of samples and the manual quantization.

Table 3: The performance of comparing different algorithms (columns: Method, Precision, Recall, F-measure; rows: Random Forest, Neural Network, SVM, C5.0 Decision Tree).

The detection model presented in this paper is mainly based on SNA and can detect platform fraud. To demonstrate its performance, we compared it with other SNA-based detection algorithms, namely those of Wang and Chiu [5] and Lin et al. [6], in terms of F-measure, Precision and Recall. The method of Wang and Chiu [5] achieves high precision, mainly because it imposes a stringent sub-graph condition on the larger transaction graph. However, on current fraudulent transaction platforms, fraudsters and their allies are distributed across the whole network and do not easily form such sub-graphs, so the method of Wang and Chiu misses many fraudsters and its recall drops sharply. The model of Lin et al. [6] does not consider imbalanced classification, which severely limits it on massive, imbalanced trading platforms. The proposed model can detect not only traditional fraud groups but also the fraud groups on the currently popular fraudulent platforms. It also handles imbalanced classification, and on this basis it achieves good detection performance. We include the K-core attribute used by Wang and Chiu and a variant of the TrustRank value used by Lin et al.; in addition, we introduce BadRank [9] and propose attributes of common transaction behavior for the detection of platform fraud. We designed a parallel algorithm based on MapReduce to process the massive data. In summary, time complexity is the bottleneck of the model, but because feature extraction and model training can both be parallelized well, the running time can easily be reduced by adding machines.

Fig. 11: The performance compared to the related approaches (Recall, Precision and F-measure of Wang, Lin and our model, Sept to Dec).

5 Conclusions

This study proposes a series of cost-effective measures for improving the efficiency of fraud detection in e-commerce. By analyzing the behavior of fraudulent users on the trading platform, we proposed the Reverse Graph theory and CTCG to extract common features of transactions, and we designed MapReduce parallel algorithms to calculate the feature values based on these two theories in the experiments. Building on the theory and practice of other scholars, we selected several important graph-level user features from SNA, including TrustRank values, BadRank values and K-core values.
The TrustRank and BadRank values measure a user's credibility, while the K-core value is used to find closely connected sub-graphs in the transaction data. This paper also presents an efficient parallel crawler system for obtaining user and transaction data. The crawler is based mainly on a distributed Java design and a multi-threading mechanism. It uses the trade blacklist (fraudsters) published on the web site as the initial set of users to crawl and reaches other users' data by level-by-level traversal, so that the system obtains the most useful user data in the shortest time.

References

[1] eBay Inc. Reports Strong Third Quarter 2013 Results
[2] Internet Crime Complaint Center Annual Report
[3] S. Rubin, M. Christodorescu, An auctioning reputation system based on anomaly, Proceedings of the 12th ACM Conference on Computer and Communications Security, Alexandria, 2005
[4] J. S. Chang, W. H. Chang, An early fraud detection mechanism for online auctions based on phased modeling, Proceedings of the 2009 International Workshop on Mobile Systems E-Commerce and Agent Technology, Taipei, Taiwan, 2009
[5] J. C. Wang, C. C. Chiu, Recommending trusted online auction sellers using social network analysis, Expert Systems with Applications, 34(3), 00, -79
[6] S. J. Lin, Y. Y. Jheng, C. H. Yu, Combining ranking concept and social network analysis to detect collusive groups in online auctions, Expert Systems with Applications, 39(0), 03
[7] L. Page, S. Brin, The PageRank Citation Ranking: Bringing Order to the Web, Tech. Rep., Stanford University, 99
[8] Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen, Combating web spam with TrustRank, Proceedings of the 30th International Conference on Very Large Databases (VLDB), 2004
[9] J. Botelho, C. Antunes, Combining social network analysis with semi-supervised clustering: A case study on fraud detection, Proceeding of Mining Data Semantics (MDS 0) in Conjunction with SIGKDD, 0, -7
[10] Stephen B. Seidman, Network structure and minimum degree, Social Networks, 5(3), 93, 9-7
[11] B. Bollobas, The Evolution of Sparse Graphs, Graph Theory and Combinatorics, Academic Press, London, 94
[12] Marco Gaertler, Maurizio Patrignani, Dynamic analysis of the autonomous system graph, The nd International Workshop on Inter-Domain Performance and Simulation (IPS), 2004, 3-4
[13] Yanchao Zhang, Research on Information Dissemination and Opinion Evolution in the Social Networking Service, Beijing Jiaotong University, 0


More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Random forest algorithm in big data environment

Random forest algorithm in big data environment Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Social Media Mining. Network Measures

Social Media Mining. Network Measures Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
