Part III: Graph-based spam/fraud detection algorithms and apps
|
|
|
- Darcy Mathews
- 10 years ago
- Views:
Transcription
1 Part III: Graph-based spam/fraud detection algorithms and apps L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 159
2 Part III: Outline Algorithms: relational learning Collective classification Relational inference Applications: fraud and spam detection Online auction fraud Accounting fraud Fake review spam Web spam L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 160
3 Collective classification (CC) Anomaly detection as a classification problem spam/non-spam , malicious/benign web page, fraud/legitimate transaction, etc. Often connected objects guilt-by-association Label of object o in network may depend on: Attributes (features) of o Labels of objects in o s neighborhood Attributes of objects in o s neighborhood CC: simultaneous classification of interlinked objects using above correlations L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 161
4 Problem sketch Graph (V, E) Nodes as variables X: observed Y: TBD Edges observed relations Goal: label Y nodes?? nodes; web pages, edges; hyperlinks, labels; SH or CH: student/course page; features nodes are keywords; ST: student, CO: course, CU: curriculum, AI: artificial intelligence L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 162
5 Collective classification applications Document classification Part of speech tagging Link prediction Optical character recognition Image/3Ddata segmentation Chakrabarti+ 98, Taskar+ 02 Entity resolution in sensor networks Spam and fraud detection Lafferty+ 01 Taskar+ 03 Taskar+ 03 Anguelov+ 05, Chechetka+ 10 Chen+ 03 Pandit+ 07, Kang+ 11 L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 163
6 Taxonomy Graph Anomaly Detection Static graphs Dynamic graphs Graph algorithms Plain Feature based Structural features Recursive features Community based Attributed Structure based Substructures Subgraphs Community based Plain Distance based Feature-distance Structure distance Structure based phase transition Learning models RMNs PRMs RDNs MLNs Inference Relational netw. classification Iterative classification Gibbs sampling Belief propagation L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 164
7 Collective classification models Relational Markov Networks (RMNs) Taskar, Abbeel, Koller 03 Relational Dependency Networks (RDNs) Neville&Jensen 07 Probabilistic Relational Models (PRMs) Friedman, Getoor, Koller, Pfeffer+ 99 Markov Logic Networks (MLNs) Richardson&Domingos 06 L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 165
8 Collective classification inference Exact inference is NP hard for arbitrary networks Approximate inference techniques [in this tutorial] Relational classifier Macskassy&Provost 03,07 Iterative classification alg. (ICA) Neville&Jensen 00, Lu&Getoor 03, McDowell+ 07 Gibbs sampling IC Gilks et al. 96 Loopy belief propagation Yedidia et al. 00 Note: All the above are iterative L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 166
9 (prob.) Relational network classifier A simple relational classifier Class probability of Yi is a weighted average of class probabilities of its neighbors Repeat for each Yi and label c prn challenges: P( Y i c) 1 Z ( Y, Y i j w( Y ) E Macskassy&Provost 03 Convergence not guaranteed Some initial class probabilities should be biased or no propagation Cannot use attribute info i, Y j ) P( Y L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 167 j c)
10 Iterative classification Main idea: classify node Yi based on its attributes as well as neighbor set Ni s labels Convert each node Yi to a flat vector ai Various #neighbors aggregation count mode proportion mean exists L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 168
11 Iterative classification a 1 t=0 count a 2 t=0 L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 169
12 IC Bootstrap Iterative classification Main idea: classify Yi based on Ni Convert each node Yi to a flat vector ai Various #neighbors aggregation Use local classifier f(ai) (e.g., SVM, knn, ) to compute best value for yi Repeat for each node Yi Reconstruct feature vector ai Update label to f(ai) (hard assignment) Until class labels stabilize or max # iterations Note: convergence not guaranteed L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 170
13 Iterative classification 1 a 1 t=1 2 f(a 1 t=1 )=SH f(a 2 t=0 )=CH L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 171
14 Iterative classification f(a 1 t=1 )=SH a 2 t=1 1 1 f(a 2 t=1 )=CH L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 172
15 Sample Burn-in Bootstrap Gibbs sampling Main idea: Convert each node Yi to a flat vector ai Use local classifier f(ai) to compute best value for yi Repeat B times for each node Yi Reconstruct feature vector ai Update label to f(ai) (hard assignment) Repeat S times for each node Yi Sample yi from f(ai) Increase count c(i, yi) by 1 Assign to each Yi label L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 173
16 IC and GS challenges Feature construction for local classifier f f often needs fixed-length vector choice of aggregation (avg, mode, count, ) choice of relations (in-, out-links, both) choice of neighbor attributes (all?, top-k confident?) Local classifier f requires training choice of classifier (LR, NB, knn, SVM, ) Node ordering for updates (random, diversity based) Convergence Run time (many iterations for GS) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 174
17 Collective classification inference Exact inference is NP hard for arbitrary networks Approximate inference techniques [in this tutorial] Relational classifier Macskassy&Provost 03,07 Iterative classification alg. (ICA) Neville&Jensen 00, Lu&Getoor 03, McDowell+ 07 Gibbs sampling IC Gilks et al. 96 Loopy belief propagation Yedidia et al. 00 Note: All the above are iterative L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 175
18 Relational Markov Nets Undirected dependencies Potentials on cliques of size 1 Potentials on cliques of size 2 (label-attribute) (label-observed label) (label-label) For pairwise RMNs max clique size is 2 L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 176
19 pairwise Markov Random Field For an assignment y to all unobserved Y, pmrf is associated with probability distr: known potential Node labels as random variables prior belief (1-clique potentials) compatibility potentials (label-label) observed potentials (label-observed label) (label-attribute) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 177
20 L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 178
21 pmrf interpretation Defines a joint pdf of all unknown labels P(y x) is the probability of a given world y Best label yi for Yi is the one with highest marginal probability Computing one marginal probability P(Yi = yi) requires summing over exponential # terms #P problem approximate inference loopy belief propagation L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 179
22 Loopy belief propagation Invented in 1982 [Pearl] to calculate marginals in Bayes nets. Also used to estimate marginals (=beliefs), or most likely states (e.g. MAP) in MRFs Iterative process in which neighbor variables talk to each other, passing messages I (variable x1) believe you (variable x2) belong in these states with various likelihoods When consensus reached, calculate belief L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 180
23 Loopy belief propagation details j 1) Initialize all messages to 1 2) Repeat for each node: i k k k 3) When messages stabilize : L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 181
24 m12(sh) = (0.0096* *0.1) / (m12(sh) + m12(ch)) ~0.35 m12(ch) = (0.0096* *0.9) / (m12(sh) + m12(ch)) ~0.65 L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 182
25 Loopy belief propagation Advantages: Easy to program & parallelize General: can apply to any graphical model w/ any form of potentials (higher order than pairwise) Challenges: Convergence is not guaranteed (when to stop) esp. if many closed loops Potential functions (parameters) require training to estimate learning by gradient-based optimization: convergence issues during training L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 183
26 Taxonomy Graph Anomaly Detection Static graphs Dynamic graphs Graph algorithms Plain Feature based Structural features Recursive features Community based Attributed Structure based Substructures Subgraphs Community based Plain Distance based Feature-distance Structure distance Structure based phase transition Learning models RMNs PRMs RDNs MLNs Inference Relational netw. classification Iterative classification Gibbs sampling Belief propagation Applications Fraud detection Spam detection L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 184
27 Part III: Outline Algorithms: relational learning Collective classification Relational inference Applications: fraud and spam detection (1) Online auction fraud (2) Accounting fraud (3) Fake review spam (4) Web spam L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 185
28 (1) Online auction fraud Chau et al. 06 Auction sites: attractive target for fraud 63% complaints to Federal Internet Crime Complaint Center in U.S. in 2006 Average loss per incident: = $385 Often non-delivery fraud: $$$ Seller Buyer L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 186 Chau+PKDD 06 modified with permission
29 Online auction fraud detection Insufficient solution: Look at individual features, geographic locations, login times, session history, etc. Hard to fake: graph structure Capture relationships between users Q: How do fraudsters interact with other users and among each other? in addition to buy/sell relations, there is a feedback mechanism L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 187
30 Feedback mechanism Each user has a reputation score Users rate each other via feedback $$$ Reputation score: = 71 Reputation score: 15-1 = 14 Q: How do fraudsters game the feedback system? L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 188 Chau+PKDD 06 modified with permission
31 Auction roles Do they boost each other s reputation? They form near-bipartite cores (2 roles) accomplice trades w/ honest, looks legit fraudster trades w/ accomplice fraud w/ honest L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) Chau+PKDD modified with permission
32 Detecting online fraud How to find near-bipartite cores? How to find roles (honest, accomplice, fraudster)? Use Belief Propagation! How to set BP parameters (potentials)? prior beliefs: prior knowledge, unbiased if none compatibility potentials: by insight L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 190 Chau+PKDD 06 modified with permission
33 BP in action Initialize prior beliefs of fraudsters to P(f)=1 At each iteration, for each node, compute messages to its neighbors Initialize other nodes as unbiased Continue till convergence Compute beliefs, use most likely state L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 191 Chau+PKDD 06 modified with permission
34 Computing beliefs roles P(honest) P(accomplice) P(fraudster) A E B C D L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 192 Chau+PKDD 06 modified with permission
35 (2) Accounting fraud McGlohon et al. 09 Problem: Given accounts and their transaction relations, find most risky ones Inventory Accounts Payable Cash Bad Debt Non-Trade A/R Accounts Receivable Revenue Orange Cty Revenue Los Angeles Revenue San Diego Revenue San Francisco Revenue Other L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 193
36 Accounting fraud detection Domain knowledge to flag certain nodes Assume homophily ( guilt by association ) Use belief propagation prior beliefs 2 states (risky R, normal NR) final beliefs end risk scores compatibility potentials L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 194
37 Social Network Analytic Risk Evaluation Prior beliefs (noisy domain knowledge) Round-dollar entries Many late postings Large number of returns Many entries reversed late in period f # Flags Compatibility potentials (by homophily) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 195 McGlohon+KDD 09 modified with permission
38 Social Network Analytic Risk Evaluation Before After Ignore, no corroborating evidence SNARE Focus on staff posting to A/R from headquarters True positive rate Baseline (flags only) False positive rate 1380 accounts 3820 transactions L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 196 McGlohon+KDD 09 modified with permission
39 (3) Fake review spam Review sites: attractive target for spam Often hype/defame spam Paid spammers L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 197
40 Fake review spam detection Behavioral analysis [Jindal & Liu 08] individual features, geographic locations, login times, session history, etc. Language analysis [Ott et al. 11] use of superlatives, many self-referencing, rate of misspell, many agreement words, Hard to fake: graph structure Capture relationships between reviewers, reviews, stores L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 198
41 Graph-based detection Reviewer r trustiness T(r) [Wang et al. 11] Trustiness T(r) Honesty Hr L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 199
42 Graph-based detection Store s reliability R(s) review v rating L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 200
43 Graph-based detection Review v honesty H(v) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 201
44 Graph-based detection Reviewer r trustiness T(r) Store s reliability R(s) Review v honesty H(v) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 202
45 Graph-based detection Algorithm: iterate trustiness, reliability, and honesty scores in a mutual recursion similar to Kleinberg s HITS algorithm non-linear relations Challenges: Convergence not guaranteed Cannot use attribute info Parameters: agreement time window t, review similarity threshold (for dis/agreement) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 203
46 Part III: Outline Algorithms: relational learning Collective classification Relational inference Applications: fraud and spam detection Online auction fraud Accounting fraud Fake review spam Web spam L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 204
47 (4) Web spam Spam pages: pages designed to trick search engines to direct traffic to their websites L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 205
48 Web spam Challenges: pages are not independent what features are relevant? small training set noisy labels (consensus is hard) content very dynamic L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 206
49 Web spam Many graph-based solutions TrustRank SpamRank Anti-trustRank Propagating trust and distrust Know your neighbors Guilt-by-association [Gyöngyi et al. 04] [Benczur et al. 05] [Krishnan et al. 06] [Wu et al. 06] [Castillo et al. 07] [Kang et al. 11] L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 207
50 Web spam Main idea: exploit homophily and reachability L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 208
51 TrustRank: combating web spam Main steps: Find seed set S of good pages (e.g. using oracle) Compute trust scores by biased (personalized) PageRank from good pages Intuition: spam pages are hardly reachable from trustworthy pages Hard to acquire direct inlinks from good pages [Gyöngyi et al. 04] L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 209
52 TrustRank mathematically Remember PageRank score of a page p: details In closed form: damping factor Personalized PageRank: Transition matrix 1/ S for S nodes of interest (seeds) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 210
53 SpamRank: link spam detection [Benczur et al. 05] Intuition: PageRank distribution of good set of supporters should be power law (as in entire Web) Page v is a supporter of page i if: PPRi(v) > 0 For each page i get PageRank scores of all supporters of i test PageRank histogram for power law calculate irregularity score s(i) SpamRank PPR(s) Note: no user labeling (as for TrustRank) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 211 i
54 Know your neighbors [Castillo et al. 07] Graph-based techniques can help improve feature-based classifiers Clustering Feature Extraction Classification Smoothing Propagation Graph features: reciprocity, assortativity, TrustRank, PageRank, Content features: fraction visible text, compression rate, entropy of trigrams, Stack Graphical Learning L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 212
55 Smoothing clustering Split graph into many clusters. (e.g. by METIS) If majority of nodes in cluster are spam, then all pages in cluster are spam. L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 213
56 Smoothing propagation Propagate predictions using random walks. PPR(s); s(i): spamicity score by baseline classifier (backward and/or forwards steps) L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 214
57 Smoothing stacked learning Create additional features by combining predictions for related nodes e.g., avg. spamicity score p of neighbors r(h) of h h similar to prn classifier by Macskassy&Provost can repeat, although 1-2 steps add most gain L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 215
58 Part III: References (alg.s and app.s) P. Sen,G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective Classification in Network Data. AI Magazine, 29(3):93-106, S. A. Macskassy and F. Provost. A Simple Relational Classifier. KDD Workshops, S. Pandit, D. H. Chau, S. Wang, C. Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW, M. McGlohon, S. Bay, M. G. Anderle, D. M. Steier, C. Faloutsos: SNARE: a link analytic system for graph labeling and risk detection. KDD, Zhongmou Li, Hui Xiong, Yanchi Liu, Aoying Zhou. Detecting Blackhole and Volcano Patterns in Directed Networks. ICDM, pp , L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 216
59 Part III: References (alg.s and app.s) (2) G. Wang, S. Xie, B. Liu, P. S. Yu. Review Graph based Online Store Review Spammer Detection. ICDM, Zoltán Gyöngyi, Hector Garcia-molina, Jan Pedersen. Combating web spam with TrustRank. VLDB, Andras A. B., Karoly C., Tamas S., Mate U. SpamRank - Fully Automatic Link Spam Detection. AIRWeb, Vijay Krishnan, Rashmi Raj: Web Spam Detection with Anti-Trust Rank. AIRWeb, pp , Baoning W., Vinay G., and Brian D. D.. Propagating Trust and Distrust to Demote Web Spam. WWW, Castillo, C. and Donato, D. and Gionis, A. and Murdock, V. and Silvestri, F. Know your neighbors: web spam detection using the web topology. SIGIR, pp , L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 217
60 Other references Matthew J. Rattigan, David Jensen: The case for anomalous link discovery. SIGKDD Explorations 7(2): (2005) Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, Philip S. Yu. Mining Behavior Graphs for "Backtrace" of Noncrashing Bugs. SDM, Shetty, J. and Adibi, J. Discovering Important Nodes through Graph Entropy: The Case of Enron Database. KDD, workshop, Boden B., Günnemann S., Hoffmann H., Seidl T. Mining Coherent Subgraphs in Multi-Layer Graphs with Edge Labels. KDD K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, L. Li, C. Faloutsos. RolX: Structural Role Extraction & Mining in Large Graphs. KDD, J. Neville, O. Simsek, D. Jensen, J. Komoroske, K. Palmer, and H. Goldberg. Using Relational Knowledge Discovery to Prevent Securities Fraud. KDD, pp , H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Rec. Let., 19(3-4): , L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 218
61 Other references B. T. Messmer and H. Bunke. A new algorithm for error-tolerant subgraph isomorphism detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20(5): , P. J. Dickinson, H. Bunke, A. Dadej, M. Kraetzl: Matching graphs with unique node labels. Pattern Anal. Appl. 7(3): (2004) Peter J. Dickinson, Miro Kraetzl, Horst Bunke, Michel Neuhaus, Arek Dadej: Similarity Measures For Hierarchical Representations Of Graphs With Unique Node Labels. IJPRAI 18(3): (2004) Kraetzl, M. and Wallis, W. D., Modality Distance between Graphs. Utilitas Mathematica, 69, , Gaston, M. E., Kraetzl, M. and Wallis, W. D., Graph Diameter as a Pseudo-Metric for Change Detection in Dynamic Networks, Australasian Journal of Combinatorics, 35, , B. Pincombe. Detecting changes in time series of network graphs using minimum mean squared error and cumulative summation. ANZIAM J. 48, pp.c450 C473, L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 219
62 Tutorial Outline Motivation, applications, challenges Part I: Anomaly detection in static data Overview: Outliers in clouds of points Anomaly detection in graph data Part II: Event detection in dynamic data Overview: Change detection in time series Event detection in graph sequences Part III: Graph-based algorithms and apps Algorithms: relational learning Applications: fraud and spam detection L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 220
63 Conclusions Graphs are powerful tools to detect Anomalies Events Fraud/Spam in complex real-world data (attributes, (noisy) side information, weights, ) Nature of the problem highly dependent on the application domain Each problem formulation needs a different approach L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 221
64 Open challenges Anomalies in dynamic graphs dynamic attributed graphs (definitions, formulations, real-world scenarios) temporal effects: node/edge history (not only updates) Fraud/spam detection adversarial robustness cost (to system in measurement, to adversary to fake, to user in exposure) detection timeliness and other system design aspects; e.g. dynamicity, latency L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 222
65 Q & A [email protected] anomalies events fraud/spam L. Akoglu & C. Faloutsos Anomaly detection in graph data (ICDM'12) 223
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Online Fraud Detection Model Based on Social Network Analysis
Journal of Information & Computational Science :7 (05) 553 5 May, 05 Available at http://www.joics.com Online Fraud Detection Model Based on Social Network Analysis Peng Wang a,b,, Ji Li a,b, Bigui Ji
Spam Detection with a Content-based Random-walk Algorithm
Spam Detection with a Content-based Random-walk Algorithm ABSTRACT F. Javier Ortega Departamento de Lenguajes y Sistemas Informáticos Universidad de Sevilla Av. Reina Mercedes s/n 41012, Sevilla (Spain)
Big Data Analytics Process & Building Blocks
Big Data Analytics Process & Building Blocks Duen Horng (Polo) Chau Georgia Tech CSE 6242 A / CS 4803 DVA Jan 10, 2013 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos
MapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Part 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please
Top Top 10 Algorithms in Data Mining
ICDM 06 Panel on Top Top 10 Algorithms in Data Mining 1. The 3-step identification process 2. The 18 identified candidates 3. Algorithm presentations 4. Top 10 algorithms: summary 5. Open discussions ICDM
Top 10 Algorithms in Data Mining
Top 10 Algorithms in Data Mining Xindong Wu ( 吴 信 东 ) Department of Computer Science University of Vermont, USA; 合 肥 工 业 大 学 计 算 机 与 信 息 学 院 1 Top 10 Algorithms in Data Mining by the IEEE ICDM Conference
Insider Threat Detection Using Graph-Based Approaches
Cybersecurity Applications & Technology Conference For Homeland Security Insider Threat Detection Using Graph-Based Approaches William Eberle Tennessee Technological University [email protected] Lawrence
SOCIAL NETWORK DATA ANALYTICS
SOCIAL NETWORK DATA ANALYTICS SOCIAL NETWORK DATA ANALYTICS Edited by CHARU C. AGGARWAL IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA Kluwer Academic Publishers Boston/Dordrecht/London
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.
Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital
Comparison of Ant Colony and Bee Colony Optimization for Spam Host Detection
International Journal of Engineering Research and Development eissn : 2278-067X, pissn : 2278-800X, www.ijerd.com Volume 4, Issue 8 (November 2012), PP. 26-32 Comparison of Ant Colony and Bee Colony Optimization
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA
AFRAID: Fraud Detection via Active Inference in Time-evolving Social Networks
AFRAID: Fraud Detection via Active Inference in Time-evolving Social Networks Véronique Van Vlasselaer Tina Eliassi-Rad Leman Akoglu Monique Snoeck Bart Baesens Department of Decision Sciences and Information
CMU SCS Large Graph Mining Patterns, Tools and Cascade analysis
Large Graph Mining Patterns, Tools and Cascade analysis Christos Faloutsos CMU Roadmap Introduction Motivation Why big data Why (big) graphs? Patterns in graphs Tools: fraud detection on e-bay Conclusions
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman
Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman 2 Intuition: solve the recursive equation: a page is important if important pages link to it. Technically, importance = the principal
Dan French Founder & CEO, Consider Solutions
Dan French Founder & CEO, Consider Solutions CONSIDER SOLUTIONS Mission Solutions for World Class Finance Footprint Financial Control & Compliance Risk Assurance Process Optimization CLIENTS CONTEXT The
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Multi-Class and Structured Classification
Multi-Class and Structured Classification [slides prises du cours cs294-10 UC Berkeley (2006 / 2009)] [ p y( )] http://www.cs.berkeley.edu/~jordan/courses/294-fall09 Basic Classification in ML Input Output
The Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University [email protected]
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University [email protected] 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
Spam Host Detection Using Ant Colony Optimization
Spam Host Detection Using Ant Colony Optimization Arnon Rungsawang, Apichat Taweesiriwate and Bundit Manaskasemsak Abstract Inappropriate effort of web manipulation or spamming in order to boost up a web
Course: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 [email protected] Abstract Probability distributions on structured representation.
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
DON T FOLLOW ME: SPAM DETECTION IN TWITTER
DON T FOLLOW ME: SPAM DETECTION IN TWITTER Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA [email protected] Keywords: Abstract: social
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Evaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model
Evaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model Trung Le 1, Ba Quy Tran 2, Hanh Dang Thi My 3, Thanh Hung Ngo 4 1 GSR, Information System Lab., University of
Mimicking human fake review detection on Trustpilot
Mimicking human fake review detection on Trustpilot [DTU Compute, special course, 2015] Ulf Aslak Jensen Master student, DTU Copenhagen, Denmark Ole Winther Associate professor, DTU Copenhagen, Denmark
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
How To Perform An Ensemble Analysis
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
A Learning Based Method for Super-Resolution of Low Resolution Images
A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 [email protected] Abstract The main objective of this project is the study of a learning based method
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.
Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm
Data Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
Machine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou [email protected] Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
KEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
AN INTRODUCTION TO SOCIAL NETWORK DATA ANALYTICS
Chapter 1 AN INTRODUCTION TO SOCIAL NETWORK DATA ANALYTICS Charu C. Aggarwal IBM T. J. Watson Research Center Hawthorne, NY 10532 [email protected] Abstract The advent of online social networks has been
Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
Social Prediction in Mobile Networks: Can we infer users emotions and social ties?
Social Prediction in Mobile Networks: Can we infer users emotions and social ties? Jie Tang Tsinghua University, China 1 Collaborate with John Hopcroft, Jon Kleinberg (Cornell) Jinghai Rao (Nokia), Jimeng
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Bayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
A Game Theoretical Framework for Adversarial Learning
A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
Rule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
Introduction to Deep Learning Variational Inference, Mean Field Theory
Introduction to Deep Learning Variational Inference, Mean Field Theory 1 Iasonas Kokkinos [email protected] Center for Visual Computing Ecole Centrale Paris Galen Group INRIA-Saclay Lecture 3: recap
Investigating Clinical Care Pathways Correlated with Outcomes
Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges
Large-scale Data Mining: MapReduce and Beyond Part 2: Algorithms. Spiros Papadimitriou, IBM Research Jimeng Sun, IBM Research Rong Yan, Facebook
Large-scale Data Mining: MapReduce and Beyond Part 2: Algorithms Spiros Papadimitriou, IBM Research Jimeng Sun, IBM Research Rong Yan, Facebook Part 2:Mining using MapReduce Mining algorithms using MapReduce
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi 15/07/2015 IEEE IJCNN
Big Data Analytics Building Blocks; Simple Data Storage (SQLite)
Big Data Analytics Building Blocks; Simple Data Storage (SQLite) Duen Horng (Polo) Chau Georgia Tech CSE6242 / CX4242 Jan 9, 2014 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Introduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
Removing Web Spam Links from Search Engine Results
Removing Web Spam Links from Search Engine Results Manuel EGELE [email protected], 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features
How To Find Influence Between Two Concepts In A Network
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing
Web Mining using Artificial Ant Colonies : A Survey
Web Mining using Artificial Ant Colonies : A Survey Richa Gupta Department of Computer Science University of Delhi ABSTRACT : Web mining has been very crucial to any organization as it provides useful
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Combating Web Spam with TrustRank
Combating Web Spam with TrustRank Zoltán Gyöngyi Hector Garcia-Molina Jan Pedersen Stanford University Stanford University Yahoo! Inc. Computer Science Department Computer Science Department 70 First Avenue
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique
A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique Aida Parbaleh 1, Dr. Heirsh Soltanpanah 2* 1 Department of Computer Engineering, Islamic Azad University, Sanandaj
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then
