Identity Matching and Geographical Movement of Open-Source Software Mailing List Participants

1 Identity Matching and Geographical Movement of Open-Source Software Mailing List Participants
Midterm presentation
Erik Kouters, Eindhoven University of Technology, June 19

2 Introduction: Geographical Movement
- Skilled workers typically travel more than the average person.
- Open-source software (OSS) contributors: here, mailing list participants.
- New technique to uncover the migration flows of OSS mailing list participants: each email represents a person being at a location at a certain point in time.
- Movement/migration flows of OSS mailing list participants compared to those of the average person.
- Mobility of OSS mailing list participants.

3 Introduction: Identity Matching
- 12.7% of OSS mailing list participants use multiple email addresses.
- 27.1% of the email addresses have an owner who uses multiple addresses.
- Data are aggregated on email address by default, but need to be aggregated on person (identity matching).

4 Introduction: Case Study
- Case study: GNOME, which is well known amongst researchers.
- GNOME's mailing list archives were extracted on April 11.
- The mailing list archives include 2,202,746 emails and 73,920 email addresses.

5 Process Overview
- Parsing archives (extract, parse): for each email address, the IP address and timestamp of every email, e.g. Person1: Address1 - (IP1, Timestamp1), (IP2, Timestamp2); Person2: Address2 - (IP3, Timestamp3), (IP4, Timestamp4).
- IP resolution: each IP address is resolved to a location, e.g. Person1: Address1 - (Loc1, Timestamp1), (Loc2, Timestamp2); Person2: Address2 - (Loc3, Timestamp3), (Loc4, Timestamp4).
- Identity matching: addresses that belong to the same individual are merged, e.g. Person1: Address1 - (Loc1, Timestamp1), (Loc2, Timestamp2); Address2 - (Loc3, Timestamp3), (Loc4, Timestamp4).
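The pipeline ends with a per-person list of (location, timestamp) observations. A minimal sketch of that last aggregation, assuming identity matching has produced a mapping from email address to a person id. `aggregate_by_person`, `candidate_moves` and the sample records are hypothetical, and treating every change of location between consecutive emails as a candidate move is a simplifying assumption, not necessarily how migrations are derived in this work.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_person(observations, person_of):
    """Group (address, location, timestamp) records by person.

    `person_of` maps an email address to a person id (the output of identity
    matching). Unmapped addresses count as their own person, which mirrors
    the default aggregation on email address.
    """
    timelines = defaultdict(list)
    for address, location, timestamp in observations:
        person = person_of.get(address, address)
        timelines[person].append((timestamp, location, address))
    for timeline in timelines.values():
        timeline.sort()  # chronological order (ties broken by location, address)
    return dict(timelines)


def candidate_moves(timeline):
    """Yield (from_location, to_location, timestamp) whenever two consecutive
    observations of one person resolve to different locations.
    Simplifying assumption: every change counts; no noise filtering."""
    for (_, prev_loc, _), (ts, loc, _) in zip(timeline, timeline[1:]):
        if loc != prev_loc:
            yield prev_loc, loc, ts


# Made-up records: two addresses that identity matching assigned to "p1".
observations = [
    ("a@example.org", "Eindhoven", datetime(2010, 1, 1)),
    ("b@example.com", "Berlin", datetime(2011, 1, 1)),
    ("a@example.org", "Eindhoven", datetime(2010, 6, 1)),
]
timelines = aggregate_by_person(
    observations, {"a@example.org": "p1", "b@example.com": "p1"})
moves = list(candidate_moves(timelines["p1"]))
```

Without the mapping, each address keeps its own timeline and the move from Eindhoven to Berlin is invisible, which is the point made on slide 3.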

6 Raw Emails
Raw emails contain headers: From, To, Subject, Date, Received.
[Figure: annotated example of a raw email; its Received headers mention the hosts xena, storm, localhost and gnome]

7 Parsed Emails
Each email yields a name, an email address, an IP address and a timestamp.
Filter out automatically sent emails:
- Announcements (gnome-announce-list)
- Request Tracker ("via RT")
- Git repository (commits-list)
- New tarballs on the FTP site (ftp-release-list)
Aggregate the parsed data on email address.
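A sketch of the parsing step with Python's standard `email` package. The function names, the rule that the sender's IP is the first public bracketed IPv4 address found when reading the `Received` headers from the earliest hop upwards, and the check for "via RT" in the sender name are all assumptions for illustration; real archives also contain IPv6 addresses and irregular headers that this sketch ignores.

```python
import ipaddress
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Lists with automated traffic, as named on the slide; the list an email came
# from is assumed to be known from the archive being parsed.
AUTOMATED_LISTS = {"gnome-announce-list", "commits-list", "ftp-release-list"}
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sender_ip(received_headers):
    """Return the first public IPv4 address, searching the Received headers
    from the earliest hop (the last header) upwards; None if there is none."""
    for header in reversed(received_headers):
        for candidate in BRACKETED_IPV4.findall(header):
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:  # e.g. 999.1.1.1
                continue
            if not ip.is_private:  # skips 127.0.0.1 (localhost), 10.x, 192.168.x
                return candidate
    return None


def parse_email(raw, list_name):
    """Return (name, address, ip, timestamp), or None for automated emails."""
    if list_name in AUTOMATED_LISTS:
        return None
    msg = message_from_string(raw)
    name, address = parseaddr(msg.get("From", ""))
    if "via RT" in name:  # assumption: Request Tracker puts "via RT" in the name
        return None
    date = msg.get("Date")
    timestamp = parsedate_to_datetime(date) if date else None
    return name, address.lower(), sender_ip(msg.get_all("Received", [])), timestamp


raw = (
    "Received: from mail.gnome.org (localhost [127.0.0.1]) by mail.gnome.org\n"
    "Received: from laptop ([93.184.216.34]) by mail.gnome.org\n"
    "From: David <David@Example.org>\n"
    "Date: Tue, 01 Jun 2010 10:00:00 +0200\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
parsed = parse_email(raw, "example-list")  # hypothetical list name
```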

8 IP Resolution
- Resolve each IP address to a location at city level using multiple (commercial and non-commercial) solutions.
- Resolving an IP address to a location is not 100% accurate, especially at city level:

Solution      Country level   City level
IP2Location   > 99.5%         77%
MaxMind       99.8%           64.33%
IPInfoDB      99.5%           60%
HostIP        100%            unknown

9 IP Resolution (contd.)
- We use the location that most solutions agree on.
- Our data set includes 227,751 unique IP addresses.
- IP addresses resolved to a country: 226,005 (99.23%).
- IP addresses resolved to a city: 189,879 (83.37%).
- A high number of IP addresses are resolved, but the precision of the resolved locations is unknown.
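One way to implement "the location that most solutions agree on" is a majority vote, first on the country and then on the city within the winning country, so that an IP address can still be resolved to a country when the solutions disagree or are silent about the city. This sketch assumes each solution's answer has already been normalised to a (country, city) pair; it does not call any IP2Location, MaxMind, IPInfoDB or HostIP API, and the tie-breaking rule is an assumption.

```python
from collections import Counter


def consensus_location(results):
    """results: one (country, city) pair per geolocation solution; either part
    may be None when that solution could not resolve it.

    Country: majority vote over all solutions. City: majority vote among the
    solutions that agree on that country. Ties go to the first value seen
    (Counter.most_common keeps insertion order for equal counts)."""
    countries = Counter(country for country, _ in results if country)
    if not countries:
        return None, None
    country = countries.most_common(1)[0][0]
    cities = Counter(city for c, city in results if c == country and city)
    city = cities.most_common(1)[0][0] if cities else None
    return country, city
```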

10 Identity Matching: Introduction
- Alias: (name, email address).
- Identity matching: identifying which aliases belong to the same individual.
- Existing algorithms: the simple algorithm and Bird et al.'s algorithm.
- We developed a new algorithm which is:
  - more customisable (i.e. more parameters/thresholds to fit the algorithm to the data set);
  - more robust with respect to noisy and inconsistent data.

11 Identity Matching: Simple & Bird et al.'s Algorithms
[Figure panels: simple algorithm; Bird et al.'s algorithm]
Our data set contains 146 david@... addresses.
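To see why 146 david@... addresses are a problem, consider a hypothetical naive rule that merges aliases on the email local part alone. This is an illustration only; it is not the definition of the simple algorithm or of Bird et al.'s algorithm, and the sample aliases are made up.

```python
from collections import defaultdict


def merge_by_local_part(aliases):
    """Hypothetical naive rule: aliases whose email local part (the text
    before '@') is identical are taken to be the same person."""
    groups = defaultdict(list)
    for name, email in aliases:
        local_part = email.split("@", 1)[0].lower()
        groups[local_part].append((name, email))
    return dict(groups)


aliases = [
    ("David Smith", "david@a.example"),
    ("David Jones", "david@b.example"),
    ("Dave Smith", "dsmith@a.example"),
]
groups = merge_by_local_part(aliases)
```

Two clearly different people end up in the "david" group, while the likely match between "David Smith" and "Dave Smith" is missed. With 146 such addresses, a rule like this over-merges badly.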

12 New Algorithm: Overview
Aliases → convert to vector space model → document-term matrix → add terms with Levenshtein similarity → Levenshtein document-term matrix → apply TF-IDF model → TF-IDF document-term matrix → apply SVD → decomposed matrices → cosine similarity → matched aliases

13 New Algorithm: Memory Independent
- The previous version (ICSM ERA '12) depended on memory: with 4 GB RAM it supported data sets of up to 10,000 aliases.
- The current version is memory-independent, parallelised, and optimised for speed (using a profiler).

14 New Algorithm: Vector Space Model
Example aliases: (John Travolta, johnt@doma) and (John Joseph Travolta, j.travolta@domb). Each alias is a document, i.e. a column of the document-term matrix A; the rows are the terms john, johnt, joseph, jtravolta and travolta. The first alias does not contain the term jtravolta, so its entry is 0.
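A sketch of building the document-term matrix from aliases. The tokenisation (lower-cased name words plus the email local part with punctuation removed, domain ignored) is an assumption; it happens to produce the five terms of the example above. `alias_terms` and `document_term_matrix` are hypothetical names.

```python
import re


def alias_terms(name, email):
    """Assumed tokeniser: lower-cased name words plus the email local part
    with punctuation stripped ("j.travolta" -> "jtravolta"); the domain is
    ignored."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    local = re.sub(r"[^a-z0-9]", "", email.split("@", 1)[0].lower())
    return set(words) | ({local} if local else set())


def document_term_matrix(aliases):
    """Rows are terms (sorted), columns are aliases; an entry is 1.0 if the
    term occurs in the alias, else 0.0."""
    docs = [alias_terms(name, email) for name, email in aliases]
    terms = sorted(set().union(*docs))
    matrix = [[1.0 if term in doc else 0.0 for doc in docs] for term in terms]
    return terms, matrix


terms, A = document_term_matrix([
    ("John Travolta", "johnt@doma"),
    ("John Joseph Travolta", "j.travolta@domb"),
])
```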

15 New Algorithm: Levenshtein Similarity
Same aliases and matrix A, after adding terms with Levenshtein similarity: the entry of jtravolta for the first alias (John Travolta, johnt@doma) changes from 0 to 8/9, because jtravolta is similar to that alias's term travolta.
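A sketch of the Levenshtein step. It assumes the similarity of two terms is 1 - distance / length of the longer term, which gives 8/9 for jtravolta versus travolta (one deletion, nine characters). It further assumes a zero entry is filled with the best similarity to any term already present in the alias when that value reaches a threshold; both that rule and `min_sim=0.75` are assumptions, not parameters taken from the slides.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer term."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def add_levenshtein_terms(terms, matrix, min_sim=0.75):
    """Return a copy of the term x alias matrix in which each zero entry is
    replaced by the best similarity between that term and any term present
    in the same alias, if it reaches min_sim (hypothetical threshold)."""
    result = [row[:] for row in matrix]
    n_aliases = len(matrix[0]) if matrix else 0
    for col in range(n_aliases):
        present = [t for t, row in zip(terms, matrix) if row[col] > 0]
        for r, term in enumerate(terms):
            if matrix[r][col] == 0 and present:
                best = max(similarity(term, p) for p in present)
                if best >= min_sim:
                    result[r][col] = best
    return result


# Columns: (John Travolta, johnt@doma) and (John Joseph Travolta,
# j.travolta@domb), tokenised into name words plus the email local part
# (an assumption).
terms = ["john", "johnt", "joseph", "jtravolta", "travolta"]
matrix = [[1, 1], [1, 0], [0, 1], [0, 1], [1, 1]]
result = add_levenshtein_terms(terms, matrix)
```

Under these assumptions jtravolta gets 8/9 in the first column, and johnt gets 0.8 in the second column through its similarity to john, while joseph (similarity 1/3 to john) stays 0.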

16 New Algorithm: TF-IDF
- Reduces the value of common terms and increases the value of rare terms.
- The original model scales to the number of documents $m$: $\mathrm{idf}(\mathit{term}_i) = \log \frac{m}{\sum_{j=1}^{m} a_{ij}}$
- The new model scales to the most frequent term $t_{\max}$: $\mathrm{idf}(\mathit{term}_i) = \log \frac{\mathrm{df}(t_{\max})}{\sum_{j=1}^{m} a_{ij}}$
- df = document frequency: the number of documents in which a term occurs.
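The two IDF variants can be sketched as follows, with `matrix[i][j]` standing for $a_{ij}$ (term $i$, alias $j$). Following the slide, df counts the documents in which a term occurs, while the denominator sums the possibly fractional entries left by the Levenshtein step. Weighting each entry as $a_{ij} \cdot \mathrm{idf}(\mathit{term}_i)$ is an assumption about how the TF-IDF matrix is formed, and `tfidf` is a hypothetical name.

```python
import math


def tfidf(matrix, scale_to_max_df=True):
    """matrix[i][j] = a_ij, the (possibly fractional) weight of term i in
    alias j. Returns a_ij * idf(term_i) with
      original: idf = log(m / sum_j a_ij)
      new:      idf = log(df(t_max) / sum_j a_ij)
    where df counts the aliases in which a term occurs (a_ij > 0) and t_max
    is the term with the highest df. Every term occurs in at least one
    alias, so the row sums are positive."""
    m = len(matrix[0])
    row_sums = [sum(row) for row in matrix]
    doc_freqs = [sum(1 for a in row if a > 0) for row in matrix]
    numerator = max(doc_freqs) if scale_to_max_df else m
    idf = [math.log(numerator / s) for s in row_sums]
    return [[a * w for a in row] for row, w in zip(matrix, idf)]


# Four terms (rows) over three aliases (columns).
matrix = [[1, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
new_weights = tfidf(matrix)
original_weights = tfidf(matrix, scale_to_max_df=False)
```

With the new model the most frequent term (row 0, occurring in two aliases) gets weight log(2/2) = 0, whereas the original model still gives it log(3/2); rarer terms get log 2 instead of log 3.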

17 New Algorithm: SVD
- Singular value decomposition (SVD) mathematically uncovers topics.
- Rank reduction removes the least important topics, theoretically removing noise.
- A memory-independent version was challenging: it would take 8 days on our data set.
- Results differ only slightly without SVD, so we skip computing the SVD.
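With the SVD skipped, matching reduces to comparing the alias columns of the TF-IDF matrix by cosine similarity. A minimal sketch; `matched_pairs`, the threshold 0.7 and the sample weights are hypothetical. Comparing all pairs costs roughly n²/2 similarity computations for n aliases, which is one reason a memory-independent, parallelised implementation matters at this scale.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def matched_pairs(matrix, threshold=0.7):
    """matrix is term x alias; its columns are the alias vectors.
    Returns the index pairs (i, j), i < j, whose cosine similarity reaches
    the hypothetical threshold."""
    columns = list(zip(*matrix))
    return [(i, j)
            for i in range(len(columns))
            for j in range(i + 1, len(columns))
            if cosine(columns[i], columns[j]) >= threshold]


weights = [[0.4, 0.5, 0.0],
           [0.7, 0.6, 0.0],
           [0.0, 0.1, 0.9]]
pairs = matched_pairs(weights)
```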

18 New Algorithm: Overview
Aliases → convert to vector space model → document-term matrix → add terms with Levenshtein similarity → Levenshtein document-term matrix → apply TF-IDF model → TF-IDF document-term matrix → apply SVD → decomposed matrices → cosine similarity → matched aliases

19
- Recently performed manual identity matching on the mailing list data.
- Compare our algorithm to the simple and Bird et al.'s algorithms on a new, bigger data set.
- Will send a questionnaire to confirm the identity matching and the migrations uncovered from emails.
- Analyse the confirmed migration flows of mailing list participants.
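One way to compare the algorithms against the manual identity matching is pairwise precision and recall: every pair of aliases placed in the same person counts as one prediction. This metric choice and the helper names are assumptions; the slides do not say how the comparison will be scored.

```python
from itertools import combinations


def pairs_from_groups(groups):
    """Turn a grouping (one set of aliases per person) into the set of
    unordered alias pairs that the grouping declares to be the same person."""
    return {frozenset(pair)
            for group in groups
            for pair in combinations(sorted(group), 2)}


def pairwise_scores(predicted_groups, reference_groups):
    """Precision, recall and F1 of predicted alias pairs against a manual
    reference grouping."""
    predicted = pairs_from_groups(predicted_groups)
    reference = pairs_from_groups(reference_groups)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


predicted = [{"a", "b", "c"}]
manual = [{"a", "b"}, {"c", "d"}]
scores = pairwise_scores(predicted, manual)
```

Here the algorithm proposes three pairs (a-b, a-c, b-c), of which only a-b is confirmed, and it misses c-d: precision 1/3, recall 1/2, F1 0.4.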

20 Questions
Thank you for listening! Questions?

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

More information

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances It is possible to construct a matrix X of Cartesian coordinates of points in Euclidean space when we know the Euclidean

More information

Text Analytics Illustrated with a Simple Data Set

Text Analytics Illustrated with a Simple Data Set CSC 594 Text Mining More on SAS Enterprise Miner Text Analytics Illustrated with a Simple Data Set This demonstration illustrates some text analytic results using a simple data set that is designed to

More information

MIMO CHANNEL CAPACITY

MIMO CHANNEL CAPACITY MIMO CHANNEL CAPACITY Ochi Laboratory Nguyen Dang Khoa (D1) 1 Contents Introduction Review of information theory Fixed MIMO channel Fading MIMO channel Summary and Conclusions 2 1. Introduction The use

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe

More information

Google Analytics: Tracking Where a Visitor Goes on Your Web Site

Google Analytics: Tracking Where a Visitor Goes on Your Web Site Tutorial Google Analytics: Tracking Where a Visitor Goes on Your Web Site Overview: My Books and More Mail s integration with Google Analytics allows you to track web site activity that results from My

More information

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Detection of Distributed Denial of Service Attack with Hadoop on Live Network Detection of Distributed Denial of Service Attack with Hadoop on Live Network Suchita Korad 1, Shubhada Kadam 2, Prajakta Deore 3, Madhuri Jadhav 4, Prof.Rahul Patil 5 Students, Dept. of Computer, PCCOE,

More information

Detecting Network Anomalies. Anant Shah

Detecting Network Anomalies. Anant Shah Detecting Network Anomalies using Traffic Modeling Anant Shah Anomaly Detection Anomalies are deviations from established behavior In most cases anomalies are indications of problems The science of extracting

More information

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article

More information

Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin

Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-Universität zu Berlin Introduction Context of work: Error-based online failure prediction: error

More information

CHAPTER VII CONCLUSIONS

CHAPTER VII CONCLUSIONS CHAPTER VII CONCLUSIONS To do successful research, you don t need to know everything, you just need to know of one thing that isn t known. -Arthur Schawlow In this chapter, we provide the summery of the

More information

Load Testing at Yandex. Alexey Lavrenuke

Load Testing at Yandex. Alexey Lavrenuke Load Testing at Yandex Alexey Lavrenuke Load Testing at Yandex What is Yandex Yet another indexer Yet another indexer Yandex Yandex s mission is to help people discover new opportunities in their lives.

More information

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes

More information

Integrated Migration Tool

Integrated Migration Tool IceWarp Unified Communications Integrated Migration Tool Version 10.4 Printed on 16 April, 2012 Contents Integrated Migration Tool 1 How It Works... 2 Performing Migration... 3 Set up the Domain in IceWarp

More information

Studying E-mail Graphs for Intelligence Monitoring and Analysis in the Absence of Semantic Information

Studying E-mail Graphs for Intelligence Monitoring and Analysis in the Absence of Semantic Information Studying E-mail Graphs for Intelligence Monitoring and Analysis in the Absence of Semantic Information Petros Drineas, Mukkai S. Krishnamoorthy, Michael D. Sofka Bülent Yener Department of Computer Science,

More information

Possibilities of Automation of the Caterpillar -SSA Method for Time Series Analysis and Forecast. Th.Alexandrov, N.Golyandina

Possibilities of Automation of the Caterpillar -SSA Method for Time Series Analysis and Forecast. Th.Alexandrov, N.Golyandina Possibilities of Automation of the Caterpillar -SSA Method for Time Series Analysis and Forecast Th.Alexandrov, N.Golyandina theo@pdmi.ras.ru, nina@ng1174.spb.edu St.Petersburg State University, Russia

More information

Integrated Migration Tool

Integrated Migration Tool IceWarp Unified Communications Version 11.3 Published on 1/6/2015 Contents... 4 Performing Migration... 5 Set up the Domain in IceWarp Server... 5 Create Migrator Email Account... 6 Configure Migration

More information

Horizontal Traceability for Just-In-Time Requirements: The Case for Open Source Feature Requests

Horizontal Traceability for Just-In-Time Requirements: The Case for Open Source Feature Requests JOURNAL OF SOFTWARE: EVOLUTION AND PROCESS J. Softw. Evol. and Proc. 2014; xx:xx xx Published online in Wiley InterScience (www.interscience.wiley.com). Horizontal Traceability for Just-In-Time Requirements:

More information

Overview. Accessing the User Interface. Logging In. Resetting your Password

Overview. Accessing the User Interface. Logging In. Resetting your Password Overview The message filtering service lets a company easily provide real-time spam and virus filtering, attack blocking, and email-traffic monitoring across a user deployment of any size. Users receive

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Text mining & Information Retrieval Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki

More information

BIG Biomedicine and the Foundations of BIG Data Analysis

BIG Biomedicine and the Foundations of BIG Data Analysis BIG Biomedicine and the Foundations of BIG Data Analysis Insider s vs outsider s views (1 of 2) Ques: Genetics vs molecular biology vs biochemistry vs biophysics: What s the difference? Insider s vs outsider

More information

Chapter 8 Monitoring and Logging

Chapter 8 Monitoring and Logging Chapter 8 Monitoring and Logging This chapter describes the SSL VPN Concentrator status information, logging, alerting and reporting features. It describes: SSL VPN Concentrator Status Active Users Event

More information

Lecture Topic: Low-Rank Approximations

Lecture Topic: Low-Rank Approximations Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

Intellicus Enterprise Reporting and BI Platform

Intellicus Enterprise Reporting and BI Platform Intellicus Cluster and Load Balancer Installation and Configuration Manual Intellicus Enterprise Reporting and BI Platform Intellicus Technologies info@intellicus.com www.intellicus.com Copyright 2012

More information

Disjoint sparsity for signal separation and applications to hybrid imaging inverse problems

Disjoint sparsity for signal separation and applications to hybrid imaging inverse problems Disjoint sparsity for signal separation and applications to hybrid imaging inverse problems Giovanni S Alberti (joint with H Ammari) DMA, École Normale Supérieure, Paris June 16, 2015 Giovanni S Alberti

More information

Kiwi SyslogGen. A Freeware Syslog message generator for Windows. by SolarWinds, Inc.

Kiwi SyslogGen. A Freeware Syslog message generator for Windows. by SolarWinds, Inc. Kiwi SyslogGen A Freeware Syslog message generator for Windows by SolarWinds, Inc. Kiwi SyslogGen is a free Windows Syslog message generator which sends Unix type Syslog messages to any PC or Unix Syslog

More information

Development of an Enhanced Web-based Automatic Customer Service System

Development of an Enhanced Web-based Automatic Customer Service System Development of an Enhanced Web-based Automatic Customer Service System Ji-Wei Wu, Chih-Chang Chang Wei and Judy C.R. Tseng Department of Computer Science and Information Engineering Chung Hua University

More information

UBUNTU DISK IO BENCHMARK TEST RESULTS

UBUNTU DISK IO BENCHMARK TEST RESULTS UBUNTU DISK IO BENCHMARK TEST RESULTS FOR JOYENT Revision 2 January 5 th, 2010 The IMS Company Scope: This report summarizes the Disk Input Output (IO) benchmark testing performed in December of 2010 for

More information

Automation Engine 14. Troubleshooting

Automation Engine 14. Troubleshooting 4 Troubleshooting 2-205 Contents. Troubleshooting the Server... 3. Checking the Databases... 3.2 Checking the Containers...4.3 Checking Disks...4.4.5.6.7 Checking the Network...5 Checking System Health...

More information

Service Performance Management: Pragmatic Approach by Jim Lochran

Service Performance Management: Pragmatic Approach by Jim Lochran www.pipelinepub.com Volume 3, Issue 12 Service Performance Management: Pragmatic Approach by Jim Lochran As the mix of service provider offerings become more IP centric, the need to overhaul existing service

More information

AccuRead OCR. Administrator's Guide

AccuRead OCR. Administrator's Guide AccuRead OCR Administrator's Guide April 2015 www.lexmark.com Contents 2 Contents Overview...3 Supported applications...3 Supported formats and languages...3 OCR performance...4 Sample documents...6 Configuring

More information

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor

More information

Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.

Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset. White Paper Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset. Using LSI for Implementing Document Management Systems By Mike Harrison, Director,

More information

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Thomas Reilly Data Physics Corporation 1741 Technology Drive, Suite 260 San Jose, CA 95110 (408) 216-8440 This paper

More information

Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo s Solution

Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo s Solution Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo s Solution Rie Johnson Tong Zhang 1 Introduction This document describes our entry nominated for the second prize of the Heritage

More information

Comparison of Standard and Zipf-Based Document Retrieval Heuristics

Comparison of Standard and Zipf-Based Document Retrieval Heuristics Comparison of Standard and Zipf-Based Document Retrieval Heuristics Benjamin Hoffmann Universität Stuttgart, Institut für Formale Methoden der Informatik Universitätsstr. 38, D-70569 Stuttgart, Germany

More information

Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group

Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval

More information

An Information Retrieval using weighted Index Terms in Natural Language document collections

An Information Retrieval using weighted Index Terms in Natural Language document collections Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia

More information

CREATING REPORTS AND EXPORTING DATA

CREATING REPORTS AND EXPORTING DATA CREATING REPORTS AND EXPORTING DATA in HP Web Jetadmin CONTENTS Overview... 3 Create reports... 4 Data collection basics... 4 Data collection types... 4 Report generation basics... 5 Report types... 6

More information

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on

More information

Ensim WEBppliance 3.1 for Linux (LH) Release Notes

Ensim WEBppliance 3.1 for Linux (LH) Release Notes Ensim WEBppliance 3.1 for Linux (LH) Release Notes June 04, 2002 These release notes cover the following topics of Ensim WEBppliance 3.1 for Linux (LH). About WEBppliance 3.1 for Linux (LH) New features

More information

Data Mining Techniques

Data Mining Techniques 15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses

More information

Redmine: A project management software tool. January, 2013

Redmine: A project management software tool. January, 2013 Redmine: A project management software tool January, 2013 Outline Introduction to Redmine. Important concepts of Redmine. How to use Redmine. 1 Introduction: What is Redmine? Redmine is a project management

More information

Requirements Traceability Recovery

Requirements Traceability Recovery MASTER S THESIS Requirements Traceability Recovery - A Study of Available Tools - Author: Lina Brodén Supervisor: Markus Borg Examiner: Prof. Per Runeson April 2011 Abstract This master s thesis is focused

More information

Introduction Installation firewall analyzer step by step installation Startup Syslog and SNMP setup on firewall side firewall analyzer startup

Introduction Installation firewall analyzer step by step installation Startup Syslog and SNMP setup on firewall side firewall analyzer startup Introduction Installation firewall analyzer step by step installation Startup Syslog and SNMP setup on firewall side firewall analyzer startup Configuration Syslog server add and check Configure SNMP on

More information

AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS

AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-19-25 Available online at http://www.bioinfo.in/contents.php?id=32 AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS

More information

Detecting E-mail Spam Using Spam Word Associations

Detecting E-mail Spam Using Spam Word Associations Detecting E-mail Spam Using Spam Word Associations N.S. Kumar 1, D.P. Rana 2, R.G.Mehta 3 Sardar Vallabhbhai National Institute of Technology, Surat, India 1 p10co977@coed.svnit.ac.in 2 dpr@coed.svnit.ac.in

More information

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

More information

WP36: Composed Service Accounting Architecture Definition

WP36: Composed Service Accounting Architecture Definition WP36: Composed Service Accounting Architecture Definition D36.4: A set of Accounting Building Blocks for Automatically Composed Services Project funded by the European Community under the Information Society

More information

University of Lille I PC first year list of exercises n 7. Review

University of Lille I PC first year list of exercises n 7. Review University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients

More information

Stellar Phoenix Exchange Server Backup

Stellar Phoenix Exchange Server Backup Stellar Phoenix Exchange Server Backup Version 1.0 Installation Guide Introduction This is the first release of Stellar Phoenix Exchange Server Backup tool documentation. The contents will be updated periodically

More information

Identifying User Behavior in domainspecific

Identifying User Behavior in domainspecific Identifying User Behavior in domainspecific Repositories Wilko VAN HOEK a,1, Wei SHEN a and Philipp MAYR a a GESIS Leibniz Institute for the Social Sciences, Germany Abstract. This paper presents an analysis

More information

Statistical Feature Selection Techniques for Arabic Text Categorization

Statistical Feature Selection Techniques for Arabic Text Categorization Statistical Feature Selection Techniques for Arabic Text Categorization Rehab M. Duwairi Department of Computer Information Systems Jordan University of Science and Technology Irbid 22110 Jordan Tel. +962-2-7201000

More information

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear

More information

10426: Large Scale Project Accounting Data Migration in E-Business Suite

10426: Large Scale Project Accounting Data Migration in E-Business Suite 10426: Large Scale Project Accounting Data Migration in E-Business Suite Objective of this Paper Large engineering, procurement and construction firms leveraging Oracle Project Accounting cannot withstand

More information

Partek Flow Installation Guide

Partek Flow Installation Guide Partek Flow Installation Guide Partek Flow is a web based application for genomic data analysis and visualization, which can be installed on a desktop computer, compute cluster or cloud. Users can access

More information

Improved Credential and SSL Configuration for EE 7

Improved Credential and SSL Configuration for EE 7 Improved Credential and SSL Configuration for EE 7 1. Introduction: SSL, trust stores, keystores and credential repositories are generally difficult areas to configure for Java EE environments. The configuration

More information

Chapter 6. About This Chapter. Before You Begin. Windows 2000 Naming Schemes. [Previous] [Next]

Chapter 6. About This Chapter. Before You Begin. Windows 2000 Naming Schemes. [Previous] [Next] [Previous] [Next] Chapter 6 R e s o l v i n g N e t w o r k H o s t N a m e s About This Chapter Both clients and servers on a network must resolve the user-friendly host names to the Internet Protocol

More information

Service Level Agreement

Service Level Agreement Service Level Agreement Addendum Dedicated Server Managed Server Service Versie 1.0 6/08/2012 Telenet N.V.-S.A., Liersesteenweg 4, 2800 Mechelen, Belgium l BTW-TVA BE0473.416.418 RPR-RPM Mechelen l IBAN

More information

1 Log visualization at CNES (Part II)

1 Log visualization at CNES (Part II) 1 Log visualization at CNES (Part II) 1.1 Background For almost 2 years now, CNES has set up a team dedicated to "log analysis". Its role is multiple: This team is responsible for analyzing the logs after

More information

The Spectrum of Data Integration Solutions: Why You Should Have Them All

The Spectrum of Data Integration Solutions: Why You Should Have Them All HAWTIN, STEVE, Schlumberger Information Systems, Houston TX; NAJIB ABUSALBI, Schlumberger Information Systems, Stavanger, Norway; LESTER BAYNE, Schlumberger Information Systems, Stavanger, Norway; MARK

More information

HP JETADVANTAGE SECURITY MANAGER. Adding and Tracking Devices

HP JETADVANTAGE SECURITY MANAGER. Adding and Tracking Devices HP JETADVANTAGE SECURITY MANAGER Adding and Tracking Devices CONTENTS Overview... 2 General Description... 2 Detailed Description... 4 Resolve IP Address to Hostname... 4 Resolve Hostname/DNS Alias to

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

Homework 2. Page 154: Exercise 8.10. Page 145: Exercise 8.3 Page 150: Exercise 8.9

Homework 2. Page 154: Exercise 8.10. Page 145: Exercise 8.3 Page 150: Exercise 8.9 Homework 2 Page 110: Exercise 6.10; Exercise 6.12 Page 116: Exercise 6.15; Exercise 6.17 Page 121: Exercise 6.19 Page 122: Exercise 6.20; Exercise 6.23; Exercise 6.24 Page 131: Exercise 7.3; Exercise 7.5;

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

IceWarp Server Upgrade

IceWarp Server Upgrade IceWarp Unified Communications IceWarp Version 11.4 Published on 2/9/2016 Contents... 3 Best Practices... 3 Planning Upgrade... 3 Prior Upgrade... 3 Upgrade... 3 After Upgrade... 3 Upgrade to Version 11.3...

More information

Kaiser Permanente Member Complaints Text Mining Project. Data and Information Management Enhancement

Abstract: NW has a regional goal of reducing member complaints by 25% over two years

Using MailStore to Archive MDaemon Email

This guide details how to archive all inbound and outbound email using MailStore, as well as archiving any emails currently found in the users' accounts in MDaemon.

AlienVault Unified Security Management (USM) 4.x-5.x. Deployment Planning Guide

USM 4.x-5.x Deployment Planning Guide, rev. 1. Copyright AlienVault, Inc. All rights reserved. The AlienVault Logo, AlienVault,

Mercy Health System. St. Louis, MO. Process Mining of Clinical Workflows for Quality and Process Improvement

Paul Helmering, Executive Director, Enterprise Architecture; Pete Harrison, Data Analyst, Mercy

Working With Flow Data in an Academic Environment in the DDoSVax Project at ETH Zuerich

Arno Wagner, wagner@tik.ee.ethz.ch, Communication Systems Laboratory, Swiss Federal Institute of Technology Zurich (ETH

Mobillion components. Mobillion Centre. Mobillion Connectors

Components: Mobillion Centre, Mobillion Connectors, Mobillion Mobile Client, Mobillion Desktop Console. Enables efficient coordination within the business organization by receiving and exchanging

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions. Every

Firewall Server 7.2. Release Notes. What's New in Firewall Server 7.2

BorderWare Technologies is pleased to announce the release of version 7.2 of the Firewall Server. This release includes the following new features and improvements. What's

Indexing Full Packet Capture Data With Flow

FloCon, January 2011. Randy Heins, Intelligence Systems Division. Overview: Full packet capture systems can offer a valuable service provided that they are: Retaining

Nonlinear Iterative Partial Least Squares Method

Numerical Methods for Determining Principal Component Analysis Abstract Factors. Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

CA Nimsoft Monitor. snmptd Guide. v3.0 series

Legal Notices: Copyright 2013, CA. All rights reserved. Warranty: The material contained in this document is provided "as is," and is subject to being changed,

Server Load Prediction

Suthee Chaidaroon (unsuthee@stanford.edu), Joon Yeong Kim (kim64@stanford.edu), Jonghan Seo (jonghan@stanford.edu). Abstract: Estimating server load average is one of the methods that

Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC

Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC Paper 073-29 Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC ABSTRACT Version 9 of SAS software has added functions which can efficiently

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

International Black Sea University, Computer Technologies and Engineering Faculty. Elaboration of an Algorithm of Detecting Tests Dimensionality.

Stereo Camera Calibration: Introduction, Epipolar Geometry, Calibration Methods, Further Readings

12.10.2004. Overview: Introduction, Summary / Motivation, Depth Perception, Ambiguity of Correspondence

Computational Optical Imaging - Optique Numérique: Deconvolution

Winter 2014, Ivo Ihrke. Outline: Deconvolution Theory, example 1D deconvolution, Fourier method, Algebraic method

Developing an Application Tracing Utility for Mule ESB Application on EL (Elastic Search, Log stash) Stack Using AOP

Mohan Bandaru, Amarendra Kothalanka, Vikram Uppala. Student, Department of Computer Science

How To Solve The KDD Cup 2010 Challenge

A Lightweight Solution to the Educational Data Mining Challenge. Kun Liu, Yan Xing, Faculty of Automation, Guangdong University of Technology, Guangzhou, 510090, China. catch0327@yahoo.com, yanxing@gdut.edu.cn

1st Semester 2007/2008

Departamento de Engenharia Informática, Instituto Superior Técnico. Exploiting Text: How is text exploited? Two main directions: Extraction

The Advantages of Enterprise Historians vs. Relational Databases

GE Intelligent Platforms. Comparing Two Approaches for Data Collection and Optimized Process Operations.

The Sage Evolution Branch Accounting Solution. Uninterrupted Secure Accurate. Branch Accounting 1

Table of Contents: Executive summary; Objective of this document; Who is meant to read this document; The Sage Evolution

Automating the Measurement of Open Source Projects

Daniel German, Department of Computer Science, University of Victoria, dmgerman@uvic.ca; Audris Mockus, Avaya Labs, Department of Software Technology Research

HP IMC User Behavior Auditor

Administrator Guide. Abstract: This guide describes the User Behavior Auditor (UBA), an add-on service module of the HP Intelligent Management Center. UBA is designed for IMC

1 Review of Least Squares Solutions to Overdetermined Systems

cs4: introduction to numerical analysis /9/0. Lecture 7: Rectangular Systems and Numerical Integration. Instructor: Professor Amos Ron. Scribes: Mark Cowlishaw, Nathanael Fillmore. Review of Least Squares

Secure Web Gateway 11.7 Upgrade Release Notes

August 2015. Trustwave is pleased to announce that the upgrade path for Secure Web Gateway to version 11.7 is now available. For more information on SWG 11.7,

Passive Network Traffic Analysis: Understanding a Network Through Passive Monitoring Kevin Timm,

Network IDS devices use passive network monitoring extensively to detect possible threats. Through passive

Understanding the Impact of Weights Constraints in Portfolio Theory

Thierry Roncalli, Research & Development, Lyxor Asset Management, Paris, thierry.roncalli@lyxor.com. January 2010. Abstract: In this article,

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Learning Objectives and Learning Outcomes. Learning Objectives: 1. Concepts of data integration 2. Needs and advantages of using data

Technical Specification. Solutions created by knowledge and needs

The industrial control and alarm management system that integrates video, voice and data. Technical overview: Process Architecture, OPC-OCI
