The Use of Merging Algorithm to Real Ranking for Graph Search
- Christina Hines
A. Mohammad Reza Nami, B. Mehdi Ebadian
Faculty of Electrical, Computer, and IT Engineering, Islamic Azad University, Qazvin Branch, Qazvin, IRAN

ABSTRACT
Ranking is becoming an important problem in many fields, especially in information retrieval. This paper presents an automatic technique for monitoring spam in the Web graph. The technique combines information from two different sources: Truncated PageRank and semi-streaming graph algorithms. We further study the heuristic ranking framework and measure the PageRank of link farms. Twenty-six articles from 15 venues have been reviewed and classified within the taxonomy in order to organize and structure existing work in the field of information retrieval.

Keywords: Information retrieval (IR), PageRank (PR), streaming algorithms, Internet marketing, spam, search engine optimization (SEO).

Spam is any attempt to deceive a search engine's relevancy algorithm, or anything that "would not be done if search engines did not exist". The difference between spam and SEO (search engine optimization) is therefore an ethical one. The relationship between website owners and search-engine administrators is adversarial.

Streaming graph algorithms: Suppose that we have a very large undirected, unweighted graph (starting at hundreds of millions of vertices, with about 10 edges per vertex) that is not distributed and is processed by a single thread only, and that we want to run breadth-first searches on it. We expect these searches to be I/O-bound, so we need a disk page layout that suits BFS; disk space is not an issue. A search can start at any vertex with equal probability. Intuitively, this means minimizing the number of edges between vertices on different disk pages, which is a graph partitioning problem. The graph itself looks like spaghetti: think of a random set of points, randomly interconnected, with some bias towards shorter edges.
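The layout problem above can be illustrated with a simple heuristic that is not taken from the paper. Vertices are placed on disk pages in BFS order, so that vertices reached close together in a search tend to share a page. The result is then scored by the number of edges whose endpoints fall on different pages. This sketch assumes an in-memory adjacency list and a fixed number of vertices per page. The function names `bfs_page_layout` and `cut_edges` are hypothetical, and a production system could instead use a dedicated graph-partitioning tool.

```python
from collections import deque


def bfs_page_layout(adj, page_size):
    """Assign each vertex a disk-page id by numbering vertices in BFS order.

    adj: list of neighbour lists for an undirected graph (hypothetical input format).
    page_size: number of vertices stored per page (an assumption of this sketch).
    """
    page_of = [-1] * len(adj)
    next_slot = 0
    for start in range(len(adj)):          # cover every connected component
        if page_of[start] != -1:
            continue
        page_of[start] = next_slot // page_size
        next_slot += 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if page_of[v] == -1:       # first visit: give v the next free slot
                    page_of[v] = next_slot // page_size
                    next_slot += 1
                    queue.append(v)
    return page_of


def cut_edges(adj, page_of):
    """Count undirected edges whose endpoints lie on different pages."""
    return sum(
        1
        for u, neighbours in enumerate(adj)
        for v in neighbours
        if u < v and page_of[u] != page_of[v]
    )


if __name__ == "__main__":
    # A path 0-4-1-5-2-6-3-7 whose vertex labels are scrambled relative to its shape.
    edges = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3), (3, 7)]
    adj = [[] for _ in range(8)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    naive = [v // 4 for v in range(8)]      # pages assigned by vertex id
    print(cut_edges(adj, naive), cut_edges(adj, bfs_page_layout(adj, 4)))
```

On this toy path the id-based layout cuts all 7 edges, while the BFS-order layout cuts only 1.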
1 Introduction
Search engines have become one of the most lucrative businesses on the Internet. They mediate between the Web platform and information seekers, ranking Web pages to create a short list of high-quality results. Since a large share of visits originates from search engines, and most users click only on the first few results, there is a strong incentive to create pages that score highly independently of their real merit.

SPAM: Each new communication medium creates an opportunity for sending unsolicited messages. Types of electronic spam include e-mail spam, instant-messaging spam (SPIM), Internet-telephony spam (SPIT), spam by mobile phone, spam by fax, and so on. On the Web, spam works within the request-response paradigm of HTTP, and its goal is to deceive search engines.

Figure 1. Link farm (link-based Web spam, i.e., topological spam).
Web spam techniques are classified into two groups: content (keyword) spam and link spam. Link spam changes a site's link structure by creating a link farm, which is a set of densely connected pages built to deceive the ranking algorithm by boosting one member of the group. Our spam-detection algorithm targets pages that receive most of their link-based ranking by participating in link farms, but that have little relationship with the rest of the graph. Not every such link is spam: links may come from bought advertising, or from expired domains that are bought for legitimate purposes. Topological spamming is spamming achieved by using link farms.

Link-based and content-based analysis offer two orthogonal approaches. A weakness of link-based analysis is that some spam pages are statistically close to non-spam pages. Its threat is hybrid spam structures, and its opportunity is that link farms are expensive to build. A weakness of content-based analysis is that it is less resilient to changes in spammers' strategies. Its threats are hybrid spam structures and the fact that copying an entire Web site (changing only a few out-links) is inexpensive. The two approaches should therefore be used together.

Figure 2. Web graph and supporter distribution. Distribution of the fraction of distinct supporters found at varying distances (normalized), obtained by backward breadth-first visits from a sample of nodes, in four large Web graphs. The number of new distinct supporters increases up to a certain distance and then decreases, because the graph is limited in size and we approach its effective diameter.

2 Algorithm Framework
Fetterly et al. [2004] hypothesized that studying statistical distributions of page properties is a good way to detect spam pages: "in a number of these distributions, outlier values are web spam". Baeza-Yates et al. [2006] introduced damping functions for rank propagation. We want to explore the neighborhood of a page and decide whether its link structure was artificially generated or not. This poses two algorithmic challenges:
1. how to compute statistics of the neighborhood of each page simultaneously in a huge Web graph;
2. how to use these statistics to detect and demote Web spam.

2.1 Supporter
If there is a link from page x to page y, the author of page x is recommending page y. Page x is a supporter of page y at distance d if the shortest path from x to y formed by links in E has length d.

Figure 3. PageRank of different buckets. PageRank (PR) of pages in the eu.int subdomain, calculated to show the different distributions in high- and low-ranked sites.

For a sample of nodes in large Web graphs, breadth-first search (BFS) can be used instead of computing the distribution for all nodes. Advantage: it is inexpensive. Disadvantage: it needs memory to mark each visited node, and repeating the BFS from every node takes O(N^2) time. Solution: compute supporters only for a subset of suspicious nodes. Constraint: we do not know a priori whether a node is suspected of being spam.
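The supporter definition can be made concrete with a backward breadth-first visit, the same kind of visit used for Figure 2. Following links in reverse from a target page, every page first reached at depth d is a supporter at distance d. This minimal sketch assumes that the graph fits in memory as a list of (x, y) pairs, each meaning that x links to y. The name `supporters_by_distance` is hypothetical. Running it once per node is exactly the cost that leads the text above to restrict BFS to a sample or to suspicious nodes.

```python
from collections import deque


def supporters_by_distance(edges, target, max_distance):
    """Return counts[d] = number of distinct supporters of `target` at distance d.

    edges: iterable of (x, y) pairs, meaning page x links to page y.
    counts[0] is always 0 because the target is not counted as its own supporter.
    """
    incoming = {}
    for x, y in edges:
        incoming.setdefault(y, []).append(x)   # reverse the links for a backward visit

    dist = {target: 0}
    counts = [0] * (max_distance + 1)
    queue = deque([target])
    while queue:
        node = queue.popleft()
        d = dist[node]
        if d == max_distance:
            continue                            # do not expand beyond the horizon
        for src in incoming.get(node, ()):
            if src not in dist:                 # first time reached = shortest distance
                dist[src] = d + 1
                counts[d + 1] += 1
                queue.append(src)
    return counts


if __name__ == "__main__":
    links = [("a", "t"), ("b", "t"), ("c", "a"), ("c", "b"), ("d", "c"), ("t", "a")]
    print(supporters_by_distance(links, "t", 3))
```

Here pages a and b support t at distance 1, c at distance 2 and d at distance 3, so the call returns [0, 2, 1, 1]. The link t → a does not make t its own supporter.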
Algorithm 1: Link-analysis algorithm.
The link-analysis algorithm uses the semi-streaming model. Its output metric is a score vector that uses O(N log N) bits of memory. For Web spam detection we use a PageRank-style computation instead of BFS. A BFS measures the centrality of nodes in the tree of one specific node, not of all nodes, whereas PageRank computes a score for all nodes in the graph at the same time.

2.2 TRUNCATED PAGERANK
Truncated PageRank is a link-based ranking function that reduces the importance of neighbors that are topologically close to the target node. Its damping function ignores the direct contribution of the first levels of links. Spam pages should be very sensitive to changes in the damping factor of the PageRank calculation.

Let A be the N × N citation matrix of G = (V, E):

$$a_{xy} = 1 \iff (x, y) \in E \tag{1}$$

Let P be the row-normalized citation matrix, in which all rows sum to one and rows of zeros are replaced by 1/N to avoid rank sinks. Then

$$W = \sum_{t=0}^{\infty} \frac{\mathrm{damping}(t)}{N} P^{t}, \qquad \mathrm{damping}(t) = \begin{cases} 0 & t \le T \\ C\,\alpha^{t} & t > T \end{cases} \tag{2}$$

where C is a normalization constant and α is the damping factor.

Algorithm 2: Bit-propagation algorithm for estimating the number of distinct supporters at distance d of all nodes.

Figure 4. Truncated PageRank for four truncation levels. Comparing PR and TPR for values of T from 1 to 4, the two are closely correlated, and the correlation decreases as more levels are truncated.

2.3 ESTIMATION OF SUPPORTERS
We use probabilistic counting to estimate the number of supporters of all vertices in the graph at the same time.

Figure 6. Distances of supporters for three types. Comparison of the estimated average number of supporters against the observed values in a sample of nodes, assuming

$$\epsilon = \frac{1}{N} \tag{3}$$

Figure 5. Bit propagation: a bit is 1 if the corresponding supporter is present and 0 if not. In the bit-propagation algorithm, if page y has a link to page x, the bit vector of page x is updated as x ← x OR y.
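The Truncated PageRank of equation (2) in section 2.2 can be evaluated without ever forming W. A uniform vector is pushed through P one step at a time, and each step is added with weight damping(t). The result is the score vector 1ᵀW, that is, the column sums of W. This is a plain-Python illustration, not the paper's semi-streaming Algorithm 1. Two choices are assumptions. First, the infinite sum is cut off after `max_terms` steps. Second, C is set to (1 - α)/α^(T+1) so that the damping weights sum to one. The name `truncated_pagerank` is hypothetical.

```python
def truncated_pagerank(edges, n, T, alpha=0.85, max_terms=100):
    """Score vector sum_t damping(t)/N * 1^T P^t with the truncated damping of eq. (2).

    edges: (x, y) pairs meaning x links to y, with nodes numbered 0..n-1.
    T: number of truncated levels; T = -1 gives ordinary PageRank.
    max_terms: cut-off for the infinite sum (an assumption of this sketch).
    """
    out = [set() for _ in range(n)]
    for x, y in edges:
        out[x].add(y)                      # a_xy is 0/1, so duplicate links collapse

    # Assumed normalisation: the sum over t > T of C * alpha**t equals 1.
    C = (1 - alpha) / alpha ** (T + 1)

    v = [1.0 / n] * n                      # (1/N) * 1^T P^0
    score = [0.0] * n
    for t in range(max_terms):
        if t > T:                          # damping(t) = 0 for t <= T
            weight = C * alpha ** t
            for i in range(n):
                score[i] += weight * v[i]
        # v <- v P: row-normalised P, with zero rows replaced by 1/N.
        nxt = [0.0] * n
        dangling = 0.0
        for x in range(n):
            if out[x]:
                share = v[x] / len(out[x])
                for y in out[x]:
                    nxt[y] += share
            else:
                dangling += v[x]
        for i in range(n):
            nxt[i] += dangling / n
        v = nxt
    return score
```

With T = -1 the weights become (1 - α)α^t for every t ≥ 0, which is ordinary PageRank with uniform teleportation. Running the same graph with T = 1 to 4 and comparing the resulting vectors mirrors the PR versus TPR comparison of Figure 4.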
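The bit-propagation idea of section 2.3 can be sketched in the spirit of probabilistic counting. Each node starts with a k-bit vector in which every bit is set independently with probability ε. After d rounds of x ← x OR y over all links y → x, a bit of x is still 0 only if none of the n nodes within backward distance d (x included) set it. That happens with probability (1 - ε)^n, and solving for n gives the estimate computed below. Several details are assumptions of this sketch rather than details taken from the paper: the estimator itself, the vector length k = 4096, the default ε = 1/N, and the name `estimate_supporters`.

```python
import math
import random


def estimate_supporters(edges, num_nodes, distance, num_bits=4096, eps=None, seed=0):
    """Estimate, for each node, how many distinct nodes reach it in <= `distance` links.

    edges: (src, dst) pairs meaning page src links to page dst (so src supports dst).
    The estimate includes the node itself. num_bits, eps and seed are assumptions.
    """
    rng = random.Random(seed)
    if eps is None:
        eps = 1.0 / num_nodes              # mirrors the eps = 1/N setting of eq. (3)

    # Initial vectors: each bit set independently with probability eps.
    masks = []
    for _ in range(num_nodes):
        mask = 0
        for bit in range(num_bits):
            if rng.random() < eps:
                mask |= 1 << bit
        masks.append(mask)

    # Bit propagation: for every link y -> x, x <- x OR y (synchronous rounds).
    for _ in range(distance):
        updated = list(masks)
        for src, dst in edges:
            updated[dst] |= masks[src]
        masks = updated

    # P(bit still 0) = (1 - eps) ** n, so n ~ ln(zeros / num_bits) / ln(1 - eps).
    estimates = []
    for mask in masks:
        zeros = num_bits - bin(mask).count("1")
        if zeros == 0:
            estimates.append(math.inf)     # saturated: num_bits too small for this n
        else:
            estimates.append(math.log(zeros / num_bits) / math.log(1.0 - eps))
    return estimates
```

Suppose 200 pages all link to page 0. Then `estimate_supporters(edges, 201, 1)[0]` should come out close to 201, counting the 200 supporters plus page 0 itself, while a page with no in-links should come out close to 1. Updating from the previous round's vectors keeps the distance bound exact, at the cost of one extra copy per round.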
Table 1. Performance of the classifier in this article on UK2012 and UK2013 (true positives, false positives, and F1 for each dataset), using the following attribute sets: degree (D); D + PageRank (P); D + P + TrustRank; D + P + Truncated PageRank; D + P + estimated supporters; all attributes.

(b) Estimation with adaptive bit propagation, dividing ε by two at each iteration.

3 Classification
Precision:
$$P = \frac{tp}{tp + fp} = \frac{\#\,\text{spam hosts classified as spam}}{\#\,\text{hosts classified as spam}}$$
Recall:
$$R = \frac{tp}{tp + fn} = \frac{\#\,\text{spam hosts classified as spam}}{\#\,\text{spam hosts}}$$
False-positive rate:
$$FP = \frac{\#\,\text{normal hosts classified as spam}}{\#\,\text{normal hosts}}$$
False-negative rate:
$$FN = \frac{\#\,\text{spam hosts classified as normal}}{\#\,\text{spam hosts}}$$

Table 3. Performance using PageRank, supporters, and degree. For each dataset (UK pages and UK hosts, for each of the two UK collections), the table gives the experimental result (true positive, false positive, F-measure) and the previous F-measure from Table IV.

Table 2. Criterion "F" (Web spam techniques classification)

| | Retrieved (classified as spam) | Not retrieved |
|---|---|---|
| Relevant (spam hosts) | tp: #spam hosts classified as spam | fn |
| Nonrelevant (normal hosts) | fp | tn: #normal hosts not classified as spam |
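The measures of section 3 can be computed directly from the four cells of the confusion matrix in Table 2. This sketch also computes the F-measure reported in Tables 1 and 3, assuming it is the usual F1 = 2PR/(P + R). The function name `spam_metrics` and the convention of returning 0.0 when a denominator is zero are hypothetical.

```python
def spam_metrics(tp, fp, fn, tn):
    """Precision, recall, error rates and F1 from confusion-matrix counts.

    tp: spam hosts classified as spam     fp: normal hosts classified as spam
    fn: spam hosts classified as normal   tn: normal hosts classified as normal
    """
    def ratio(num, den):
        return num / den if den else 0.0   # assumed convention for empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "fp_rate": ratio(fp, fp + tn),     # normal hosts classified as spam / normal hosts
        "fn_rate": ratio(fn, tp + fn),     # spam hosts classified as normal / spam hosts
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 spam hosts and 900 normal hosts.
    print(spam_metrics(tp=80, fp=20, fn=20, tn=880))
```

With these hypothetical counts, precision, recall and F1 are all 0.8, the false-negative rate is 0.2, and the false-positive rate is 20/900.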
Figure 7. Best iteration for finding a suitable distance.

4 Conclusions
PageRank, the link-analysis technique used here, assigns every node in the Web graph a numerical score between 0 and 1, known as its PageRank. With the help of this paper, website owners and webmasters can decide which SEO practices are worthwhile and will give a good return on investment. Finally, the use of regularization methods that exploit the topology of the graph and the locality hypothesis [Davison 2000b] is promising. Those methods have been shown to be useful for general Web classification tasks [Zhang et al. 2006; Angelova and Weikum 2006; Qi and Davison 2006], and they can be used to improve the accuracy of Web spam detection systems [Castillo et al. 2007].
REFERENCES
[1] Alexa Inc., last accessed on May 17, 2011.
[2] Antoniol, G. and Guéhéneuc, Y. G., "Feature Identification: An Epidemiological Metaphor", IEEE Transactions on Software Engineering, vol. 32, no. 9, 2006, pp
[3] Binkley D, Gold G, Harman M, Li Z, Mahdavi K (2008) An empirical study of the relationship between the concepts expressed in source code and dependence. J Syst Software 81:
[4] Cornelissen B, Zaidman A, van Deursen A, Moonen L, Koschke R (2009) A systematic survey of program comprehension through dynamic analysis. IEEE Trans Software Eng (TSE) 35(5):
[5] De Alwis B, Murphy GC (2008) Answering conceptual queries with Ferret. 30th International Conference on Software Engineering (ICSE 08), Leipzig, Germany
[6] De Lucia, A., Fasano, F., Oliveto, R., and Tortora, G., "Recovering Traceability Links in Software Artefact Management Systems", ACM Transactions on Software Engineering and Methodology
[7] Egyed, A., Binder, G., and Grunbacher, P., "STRADA: A Tool for Scenario-Based Feature-to-Code Trace Detection and Analysis", in Proc. of IEEE/ACM 29th International Conference on Software Engineering (ICSE'07), 2007, pp
[8] Eaddy M, Aho AV, Antoniol G, Guéhéneuc YG (2008a) CERBERUS: tracing requirements to source code using information retrieval, dynamic analysis, and program analysis. 16th IEEE International Conference on Program Comprehension (ICPC 08), Amsterdam, The Netherlands
[9] Eaddy M, Zimmermann T, Sherwood K, Garg V, Murphy G, Nagappan N, Aho AV (2008b) Do crosscutting concerns cause defects? IEEE Trans Software Eng 34(4):
[10] Gay G, Haiduc S, Marcus M, Menzies T (2009) On the use of relevance feedback in IR-based concept location. 25th IEEE International Conference on Software Maintenance (ICSM 09), Edmonton, Canada
[11] Grant S, Cordy JR, Skillicorn DB (2008) Automated concept location using independent component analysis. 15th Working Conference on Reverse Engineering (WCRE 08), Antwerp, Belgium
[12] Hayes, J. H., Dekhtyar, A., and Sundaram, S. K., "Advancing candidate link generation for requirements tracing: the study of methods", IEEE Transactions on Software Engineering, vol. 32, no. 1, January, pp
[13] Hill E, Pollock L, Vijay-Shanker K (2009) Automatically capturing source code context of NL-queries for software maintenance and reuse. 31st IEEE/ACM International Conference on Software Engineering (ICSE 09), Vancouver, British Columbia, Canada
[14] Kothari, J., Denton, T., Mancoridis, S., and Shokoufandeh, A., "On Computing the Canonical Features of Software Systems", in 13th IEEE Working Conference on Reverse Engineering (WCRE'06), Benevento, Italy
[15] Kuhn, A., Ducasse, S., and Gîrba, T., "Semantic Clustering: Identifying Topics in Source Code", Information and Software Technology, vol. 49, no. 3, March 2006, pp
[16] Lawrance J, Bellamy R, Burnett M (2007) Scents in programs: does information foraging theory apply to program maintenance? IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 07), IEEE
[17] Liu D, Marcus A, Poshyvanyk D, Rajlich V (2007) Feature location via information retrieval based filtering of a single scenario execution trace. 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 07), Atlanta, Georgia
[18] Li Z (2009) Identifying high-level dependence structures using slice-based dependence analysis. Ph.D. thesis, King's College London, University of London
[19] Lukins S, Kraft N, Etzkorn L (2008) Source code retrieval for bug location using latent dirichlet allocation. 15th Working Conference on Reverse Engineering (WCRE 08), Antwerp, Belgium
[20] Poshyvanyk, D., Guéhéneuc, G. Y., Marcus, A., Antoniol, G., and Rajlich, V., "Feature Location using Probabilistic Ranking of Methods based on Execution Scenarios and Information Retrieval", IEEE Transactions on Software Engineering, vol. 33, no. 6, June 2007, pp
[21] Rajlich, V., "Changing the Paradigm of Software Engineering", in Communications of ACM, vol. August, 2006, pp
[22] Salah, M., Mancoridis, S., Antoniol, G., and Di Penta, M., "Scenario-driven dynamic analysis for comprehending large software systems", in Proc. of 10th European Conference on Software Maintenance and Reengineering (CSMR'06)
[23] Shepherd, D., Fry, Z., Gibson, E., Pollock, L., and Vijay-Shanker, K., "Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns", in Proc. of International Conference on Aspect Oriented Software Development (AOSD'07), 2007, pp
[24] Simmons, S., Edwards, D., Wilde, N., Homan, J., and Groble, M., "Industrial tools for the feature location problem: an exploratory study", Journal of Software Maintenance: Research and Practice, vol. 18, no. 6, 2006, pp
[25] WordStreamTools, on May 10, 2011
[26] Zhao, W., Zhang, L., Liu, Y., Sun, J., and Yang, F., "SNIAFL: Towards a Static Non-interactive Approach to Feature Location", ACM Transactions on Software Engineering and Methodology, vol. 15, no. 2, 2006, pp
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering
A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences
Removing Web Spam Links from Search Engine Results
Removing Web Spam Links from Search Engine Results Manuel EGELE [email protected], 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features
Feature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
Application of Data Mining Techniques for Improving Software Engineering
Application of Data Mining Techniques for Improving Software Engineering Wahidah Husain 1, Pey Ven Low 2, Lee Koon Ng 3, Zhen Li Ong 4, School of Computer Sciences, Universiti Sains Malaysia 11800 USM,
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Conclusions and Future Directions
Chapter 9 This chapter summarizes the thesis with discussion of (a) the findings and the contributions to the state-of-the-art in the disciplines covered by this work, and (b) future work, those directions
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, [email protected] Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
