1st Semester 2007/2008
- Eugenia Sutton
1 Departamento de Engenharia Informática, Instituto Superior Técnico, 1st Semester 2007/2008
2 Outline
3 Outline
4 Exploiting Text
How is text exploited? Two main directions: information retrieval and information extraction.
5 Information Extraction
Entity and relationship (link) extraction.
Entity resolution/matching.
Other types of extraction: events, opinions, sentiments.
IE bibliography.
6 Goals: representation, organization, storage of, and access to information items, in order to provide the user with easy access to information.
The emphasis is on information.
7 Information Retrieval vs. Data Retrieval
Data retrieval: given a specified condition (e.g. {lab, ethics} ⊆ document), find all items that satisfy the condition.
Information retrieval: given a user query, find all items that contain information relevant to the user's needs.
However, how do you characterize the user's information need?
9 Translating the user information need
An example: "Find all pages containing information on the ethical treatment of animals for medical experiments. The pages should contain references to recent related scientific articles, together with an enumeration of known existing alternatives for different medical fields." (Try this on Google.)
Usually this is translated to the keyword query "ethics animals medical experiments", but is this a suitable translation?
11 Outline
12 IR Tasks: document processing, indexing, crawling, query processing, distributed IR, string processing, ... processing, ad-hoc retrieval, classification, clustering, filtering, question answering, ...
13 The Process
14 IR Models
Classic models: Boolean, Vector, Probabilistic.
Alternative models: Fuzzy, Extended Boolean, ...; LSI, Neural Networks, ...; Belief Network, Language Models, ...
15 Outline
16 Index Terms
In the classic IR models, documents are represented by index terms (full text or selected keywords; with or without structure).
Not all terms are equally useful, so index terms can be weighted.
We assume that terms are mutually independent; this is, of course, a simplification.
17 An Example
Example document: I heartily accept the motto, "That government is best which governs least"; and I should like to see it acted up to more rapidly and systematically. Carried out, it finally amounts to this, which also I believe, "That government is best which governs not at all"; and when men are prepared for it, that will be the kind of government which they will have.
18 An Example
Index terms: I, accept, acted, all, also, amounts, and, are, at, be, believe, best, carried, finally, for, government, governs, have, heartily, is, it, kind, least, like, men, more, motto, not, of, out, prepared, rapidly, see, should, systematically, that, the, they, this, to, up, when, which, will.
19 An Example
Index terms with their number of occurrences in the example document: I 3, accept 1, acted 1, all 1, also 1, amounts 1, and 3, are 1, at 1, be 1, believe 1, best 2, carried 1, finally 1, for 1, government 3, governs 2, have 1, heartily 1, is 2, it 3, kind 1, least 1, like 1, men 1, more 1, motto 1, not 1, of 1, out 1, prepared 1, rapidly 1, see 1, should 1, systematically 1, that 3, the 2, they 1, this 1, to 3, up 1, when 1, which 4, will 2.
21 An Example
Logical view of the documents.
[Table: documents (rows) by index terms accept, acted, all, ..., government, governs, ... (columns)]
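The index-term example can be reproduced with a few lines of Python. This is only a sketch: the tokenizer (lower-casing and keeping alphabetic runs) and the names `index_terms`, `counts`, `vocabulary` and `vector` are assumptions made for illustration, not something the slides specify.

```python
import re
from collections import Counter

# The example document from the slides.
TEXT = (
    "I heartily accept the motto, That government is best which governs "
    "least; and I should like to see it acted up to more rapidly and "
    "systematically. Carried out, it finally amounts to this, which also I "
    "believe That government is best which governs not at all; and when men "
    "are prepared for it, that will be the kind of government which they "
    "will have."
)

def index_terms(text):
    """Lower-case the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

# Number of occurrences of every index term in the document.
counts = Counter(index_terms(TEXT))

# Logical view of this single document: one raw count per vocabulary term.
vocabulary = sorted(counts)
vector = [counts[term] for term in vocabulary]

print(counts["government"], counts["which"], counts["i"])
```

With several documents, the same `vocabulary` would be shared and each document would contribute one such row of counts.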
22 Documents as Vectors
Documents are represented as vectors $d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$, where $w_{i,j}$ is the weight of term $i$ in document $j$.
Queries are also vectors: $q = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$.
Vector operations can be used to compare queries with documents (or documents with documents).
23 An example
Suppose the vocabulary has two terms, $k_1$ = men and $k_2$ = government.
Two documents, $d_1$ and $d_2$, can be defined as, for instance, $d_1 = (2.2, 5.2)$ and $d_2 = (4.9, 1.0)$.
24 An example
[Figure: $d_1 = (2.2, 5.2)$ and $d_2 = (4.9, 1.0)$ drawn as vectors; axes: men and government]
25 Defining Document Vectors
Two questions are still unanswered:
1. How do we define term weights?
2. How do we compare documents to queries?
26 Defining Term Weights: TF (Term frequency)
Term frequency is a measure of term importance within a document.
Definition: Let $N$ be the total number of documents in the system and $n_i$ be the number of documents in which term $k_i$ appears. The normalized frequency of a term $k_i$ in document $d_j$ is given by:
$$f_{i,j} = \frac{freq_{i,j}}{\max_l freq_{l,j}}$$
where $freq_{i,j}$ is the number of occurrences of term $k_i$ in document $d_j$.
27 Defining Term Weights: IDF ((Inverse) Document frequency)
Document frequency is a measure of term importance within a collection.
Definition: The inverse document frequency of a term $k_i$ is given by:
$$idf_i = \log \frac{N}{n_i}$$
28 Defining Term Weights: TF-IDF
Definition: The weight of a term $k_i$ in document $d_j$ for the vector space model is given by the tf-idf formula:
$$w_{i,j} = f_{i,j} \times \log \frac{N}{n_i}$$
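The three definitions above (normalized term frequency, inverse document frequency, and their product) can be sketched in Python as follows. The function name `tf_idf`, the toy documents, and the choice of the natural logarithm are assumptions for illustration; the slides do not fix the base of the logarithm, and the sketch assumes every document is non-empty.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)                    # N: total number of documents
    doc_freq = Counter()                  # n_i: number of documents containing term k_i
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        freq = Counter(tokens)            # freq_{i,j}: raw occurrences in this document
        max_freq = max(freq.values())     # max_l freq_{l,j}
        weights.append({
            # w_{i,j} = f_{i,j} * log(N / n_i), natural log assumed
            term: (count / max_freq) * math.log(n_docs / doc_freq[term])
            for term, count in freq.items()
        })
    return weights

# Hypothetical three-document collection.
docs = [
    ["government", "governs", "government", "men"],
    ["men", "motto"],
    ["government", "kind"],
]
w = tf_idf(docs)
print(w[0])
```

Note that a term appearing in every document gets weight 0, since $\log(N/n_i) = \log 1 = 0$: such a term does not help to tell documents apart.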
29 Document Similarity
Similarity between documents and queries is a measure of the correlation between their vectors.
Documents/queries that share the same terms, with similar weights, should be more similar.
Thus, as a similarity measure, we use the cosine of the angle between the vectors:
$$sim(d_j, q) = \frac{d_j \cdot q}{\lVert d_j \rVert \, \lVert q \rVert} = \frac{\sum_{i=1}^{t} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{t} w_{i,q}^2}}$$
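30 An example
[Figure: $d_1$, $d_2$ and the query $q$ drawn as vectors on the men and government axes; the angles $\alpha$ and $\theta$ between $q$ and the document vectors have $\cos(\alpha) = 0.9$ and $\cos(\theta) = 0.8$]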
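As a minimal sketch of the cosine measure, the code below ranks the two example documents $d_1 = (2.2, 5.2)$ and $d_2 = (4.9, 1.0)$ against a query. The query vector `q = (1.0, 1.0)` is a made-up value for illustration and is not the query drawn in the slides, so the resulting cosines are not meant to reproduce the figure.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length, non-zero weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Document vectors over the vocabulary (men, government), as in the example.
d1 = (2.2, 5.2)
d2 = (4.9, 1.0)
q = (1.0, 1.0)  # hypothetical query, not taken from the slides

# Rank documents by decreasing similarity to the query.
ranking = sorted(
    [("d1", cosine(d1, q)), ("d2", cosine(d2, q))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)
```

Because the cosine depends only on the angle, scaling a document vector (for example, doubling every weight) does not change its score.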
31 Outline
32 Traditional IR vs. Web IR
Traditional IR systems: the worth of a document regarding a query is intrinsic to the document; documents are self-contained units; documents are descriptive and truthful.
The World Wide Web: indefinitely growing; non-textual content; documents are not self-complete; no coherence of style, vocabulary, language, ...; most web queries are 2 words long.
33 Web IR
More information to explore: multimedia (images, video, sound); (semi-)structured content; hyperlinks.
34 Hyperlink graph analysis
Hypermedia is a social network.
Social network theory: extensive research in applying graph notions; centrality and prestige; co-citation (relevance judgment).
Applications: Web search (HITS, Google); classification and topic distillation.
35 Ranking Through Link Analysis
Ranking search results.
Problems: keyword queries are not selective enough; documents do not have enough text.
Solution: use graph notions of popularity/prestige, e.g., use the PageRank algorithm.
36 Outline
37 Link
Each page is a node without any textual properties.
Each hyperlink is an edge connecting two nodes, with possibly only a positive edge-weight property.
38 Two perspectives
1. The prestige of a page is proportional to the sum of the prestige scores of the pages linking to it.
2. The idea of a random surfer on a strongly connected web graph.
39 Overview of PageRank
Pre-computes a rank vector: provides a priori (offline) importance estimates for all pages on the Web; independent of the search query.
In-degree prestige: not all votes are worth the same; the prestige of a page depends on the prestige of the citing pages.
Pre-compute a query-independent prestige score.
Query time: prestige scores are used in conjunction with query-specific IR scores.
40 The algorithm
$E$ is the adjacency matrix of the Web:
$$E[u, v] = \begin{cases} 1 & \text{iff there is a link from } u \text{ to } v \\ 0 & \text{otherwise} \end{cases}$$
The out-degree of node $u$ is given by $N_u = \sum_v E[u, v]$.
Start with an initial prestige vector $p_0[u]$.
Compute
$$p_{i+1}[v] = \sum_{(u,v) \in E} \frac{p_i[u]}{N_u}$$
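The iteration above can be sketched in Python on a toy graph. The adjacency-list representation, the function name `prestige_step`, the three-page graph, the uniform starting vector and the fixed number of iterations are all assumptions for illustration. The graph is chosen to be strongly connected and aperiodic, so the iteration settles on a fixed vector.

```python
def prestige_step(links, p):
    """One iteration: p_next[v] = sum over links (u, v) of p[u] / N_u."""
    p_next = {v: 0.0 for v in links}
    for u, targets in links.items():
        for v in targets:
            # Each page splits its prestige evenly among its out-links.
            p_next[v] += p[u] / len(targets)
    return p_next

# Hypothetical graph: every page appears as a key and has at least one out-link.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

# Initial prestige vector p_0: uniform (an assumption; the slides leave it open).
p = {u: 1.0 / len(links) for u in links}
for _ in range(100):
    p = prestige_step(links, p)

print(p)
```

On this graph the scores approach a = 0.4, b = 0.2, c = 0.4: page b receives only half of a's prestige, while c collects the other half plus all of b's.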
41 Computing
42 Computing
43 Computing
44 Computing
45 Problems of Convergence
The Web graph is not strongly connected: only a fourth of the graph is!
The Web graph is not aperiodic.
Rank sinks: pages without out-links; directed cyclic paths.
46 A simple fix
Two-way choice at each node:
With a certain probability $d$ ($0.1 < d < 0.2$), the surfer jumps to a random page on the Web.
With probability $1 - d$, the surfer decides to choose, uniformly at random, an out-neighbor.
$$p_{i+1}[v] = \frac{d}{N} + (1 - d) \sum_{(u,v) \in E} \frac{p_i[u]}{N_u}$$
where $N$ is the total number of pages and $N_u$ is the out-degree of page $u$.
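A minimal sketch of the fixed iteration follows, using the slide's convention that $d$ is the probability of a random jump (many other presentations use $d$ for the probability of following a link instead). The function name `pagerank`, the default `d=0.15`, the iteration count and the toy graph are assumptions for illustration. The sketch also assumes every page has at least one out-link; pages without out-links would need extra handling that the formula above does not describe.

```python
def pagerank(links, d=0.15, iterations=100):
    """Iterate p_next[v] = d/N + (1 - d) * sum over links (u, v) of p[u] / N_u."""
    n = len(links)                        # N: total number of pages
    p = {u: 1.0 / n for u in links}       # uniform start (assumed)
    for _ in range(iterations):
        p_next = {v: d / n for v in links}            # random-jump share
        for u, targets in links.items():
            for v in targets:
                p_next[v] += (1 - d) * p[u] / len(targets)
        p = p_next
    return p

# Hypothetical graph with a rank sink: b and c form a directed cycle
# that prestige can enter (from a) but never leave.
links = {"a": ["b"], "b": ["c"], "c": ["b"]}
ranks = pagerank(links)
print(ranks)
```

Without the jump term, page a would end up with zero prestige and the scores of b and c would keep oscillating; with it, a keeps the minimum score $d/N$ and the iteration converges.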
47 PageRank architecture at Google
The ranking of pages is more important than the exact values of $p$: page ranks converged in 52 iterations for a crawl with 322 million links.
Pre-compute and store the PageRank of each page; PageRank is independent of any query or textual content.
The ranking scheme combines PageRank with textual match: unpublished; many empirical parameters, human effort and regression testing.
48 Questions?
Part 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please
More informationAn Analysis of Factors Used in Search Engine Ranking
An Analysis of Factors Used in Search Engine Ranking Albert Bifet 1 Carlos Castillo 2 Paul-Alexandru Chirita 3 Ingmar Weber 4 1 Technical University of Catalonia 2 University of Chile 3 L3S Research Center
More informationPractical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
More informationSupervised Learning Evaluation (via Sentiment Analysis)!
Supervised Learning Evaluation (via Sentiment Analysis)! Why Analyze Sentiment? Sentiment Analysis (Opinion Mining) Automatically label documents with their sentiment Toward a topic Aggregated over documents
More informationWeb Search. 2 o Semestre 2012/2013
Dados na Dados na Departamento de Engenharia Informática Instituto Superior Técnico 2 o Semestre 2012/2013 Bibliography Dados na Bing Liu, Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2nd
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationTF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt
TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article
More informationSearch engines: ranking algorithms
Search engines: ranking algorithms Gianna M. Del Corso Dipartimento di Informatica, Università di Pisa, Italy ESP, 25 Marzo 2015 1 Statistics 2 Search Engines Ranking Algorithms HITS Web Analytics Estimated
More informationHomework 2. Page 154: Exercise 8.10. Page 145: Exercise 8.3 Page 150: Exercise 8.9
Homework 2 Page 110: Exercise 6.10; Exercise 6.12 Page 116: Exercise 6.15; Exercise 6.17 Page 121: Exercise 6.19 Page 122: Exercise 6.20; Exercise 6.23; Exercise 6.24 Page 131: Exercise 7.3; Exercise 7.5;
More informationThe PageRank Citation Ranking: Bring Order to the Web
The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized
More informationSocial Business Intelligence Text Search System
Social Business Intelligence Text Search System Sagar Ligade ME Computer Engineering. Pune Institute of Computer Technology Pune, India ABSTRACT Today the search engine plays the important role in the
More informationLarge-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
More informationInformation Retrieval Models
Information Retrieval Models Djoerd Hiemstra University of Twente 1 Introduction author version Many applications that handle information on the internet would be completely inadequate without the support
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationMining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
More informationSocial Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users
More informationRanking on Data Manifolds
Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname
More informationEng. Mohammed Abdualal
Islamic University of Gaza Faculty of Engineering Computer Engineering Department Information Storage and Retrieval (ECOM 5124) IR HW 5+6 Scoring, term weighting and the vector space model Exercise 6.2
More informationDevelopment of an Enhanced Web-based Automatic Customer Service System
Development of an Enhanced Web-based Automatic Customer Service System Ji-Wei Wu, Chih-Chang Chang Wei and Judy C.R. Tseng Department of Computer Science and Information Engineering Chung Hua University
More informationRecognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28
Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object
More informationAn Information Retrieval using weighted Index Terms in Natural Language document collections
Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia
More informationYifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University
Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University Presented by Qiang Yang, Hong Kong Univ. of Science and Technology 1 In a Search Engine Company Advertisers
More informationCSCI 5417 Information Retrieval Systems Jim Martin!
CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 9 9/20/2011 Today 9/20 Where we are MapReduce/Hadoop Probabilistic IR Language models LM for ad hoc retrieval 1 Where we are... Basics of ad
More informationSearch Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
More informationInternational Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer.
RESEARCH ARTICLE SURVEY ON PAGERANK ALGORITHMS USING WEB-LINK STRUCTURE SOWMYA.M 1, V.S.SREELAXMI 2, MUNESHWARA M.S 3, ANIL G.N 4 Department of CSE, BMS Institute of Technology, Avalahalli, Yelahanka,
More informationSo today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
More informationIntroduction to Information Retrieval http://informationretrieval.org
Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:
More informationQuery Recommendation employing Query Logs in Search Optimization
1917 Query Recommendation employing Query Logs in Search Optimization Neha Singh Department of Computer Science, Shri Siddhi Vinayak Group of Institutions, Bareilly Email: singh26.neha@gmail.com Dr Manish
More informationPSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
More informationdm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
More informationWeb Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
More informationMedical Information-Retrieval Systems. Dong Peng Medical Informatics Group
Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval
More informationIndex Terms Domain name, Firewall, Packet, Phishing, URL.
BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationIncorporating Window-Based Passage-Level Evidence in Document Retrieval
Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological
More informationW. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
More informationIntelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives
Intelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives Search The Way You Think Copyright 2009 Coronado, Ltd. All rights reserved. All other product names and logos
More informationInteractive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe
More informationTrust and Reputation Management
Trust and Reputation Management Omer Rana School of Computer Science and Welsh escience Centre, Cardiff University, UK Omer Rana (CS, Cardiff, UK) CM0356/CMT606 1 / 28 Outline 1 Context Defining Trust
More informationExam in course TDT4215 Web Intelligence - Solutions and guidelines -
English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed
More informationThe world s largest matrix computation. (This chapter is out of date and needs a major overhaul.)
Chapter 7 Google PageRank The world s largest matrix computation. (This chapter is out of date and needs a major overhaul.) One of the reasons why Google TM is such an effective search engine is the PageRank
More informationData Pre-Processing in Spam Detection
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 11 May 2015 ISSN (online): 2349-784X Data Pre-Processing in Spam Detection Anjali Sharma Dr. Manisha Manisha Dr. Rekha Jain
More informationAnalysis of Web Archives. Vinay Goel Senior Data Engineer
Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction
More informationInformation Retrieval Elasticsearch
Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches
More informationMining Web Informative Structures and Contents Based on Entropy Analysis
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 1, JANUARY 2004 1 Mining Web Informative Structures and Contents Based on Entropy Analysis Hung-Yu Kao, Shian-Hua Lin, Member, IEEE Computer
More informationUSING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
More informationDATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationIC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com>
IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration
More informationPDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this
More informationDepartment of Cognitive Sciences University of California, Irvine 1
Mark Steyvers Department of Cognitive Sciences University of California, Irvine 1 Network structure of word associations Decentralized search in information networks Analogy between Google and word retrieval
More informationTowards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
More informationAnalysis of MapReduce Algorithms
Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model
More informationRecommending Web Pages using Item-based Collaborative Filtering Approaches
Recommending Web Pages using Item-based Collaborative Filtering Approaches Sara Cadegnani 1, Francesco Guerra 1, Sergio Ilarri 2, María del Carmen Rodríguez-Hernández 2, Raquel Trillo-Lado 2, and Yannis
More informationRecommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationTopic models for Sentiment analysis: A Literature Survey
Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.
More informationKEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
More informationWhy is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationBig Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time
Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time Edward Bortnikov & Ronny Lempel Yahoo! Labs, Haifa Class Outline Link-based page importance measures Why
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web
More informationEnhancing the Ranking of a Web Page in the Ocean of Data
Database Systems Journal vol. IV, no. 3/2013 3 Enhancing the Ranking of a Web Page in the Ocean of Data Hitesh KUMAR SHARMA University of Petroleum and Energy Studies, India hkshitesh@gmail.com In today
More informationWeb Graph Analyzer Tool
Web Graph Analyzer Tool Konstantin Avrachenkov INRIA Sophia Antipolis 2004, route des Lucioles, B.P.93 06902, France Email: K.Avrachenkov@sophia.inria.fr Danil Nemirovsky St.Petersburg State University
More informationIBM SPSS Modeler Social Network Analysis 15 User Guide
IBM SPSS Modeler Social Network Analysis 15 User Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 25. This edition applies to IBM
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationData Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
More informationText and Web Mining A big challenge for Data Mining. Nguyen Hung Son Warsaw University
Text and Web Mining A big challenge for Data Mining Nguyen Hung Son Warsaw University Outline Text vs. Web mining Search Engine Inside: Why Search Engine so important Search Engine Architecture Crawling
More informationRecommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1
Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components
More informationDynamical Clustering of Personalized Web Search Results
Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked
More informationA Comparison Framework of Similarity Metrics Used for Web Access Log Analysis
A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis Yusuf Yaslan and Zehra Cataltepe Istanbul Technical University, Computer Engineering Department, Maslak 34469 Istanbul, Turkey
More informationPageRank Conveniention of a Single Web Pageboard
Local Approximation of PageRank and Reverse PageRank Ziv Bar-Yossef Department of Electrical Engineering Technion, Haifa, Israel and Google Haifa Engineering Center Haifa, Israel zivby@ee.technion.ac.il
More informationPractical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction