# You Will Get Mail! Predicting the Arrival of Future


## Transcription

is non-negligible. We associate with each candidate a score representing the probability of its appearance in the given time window. These three steps, namely (1) discovering events and causal relations, (2) analyzing inboxes, and (3) the actual template prediction, are detailed below.

### 3.4 Observed events and causal relations

We consider two kinds of causal relations: (1) between two templates, and (2) between a temporal event and a template. Recall that our data includes inboxes in which all messages arrived in the time window $[t_{bgn}, t_{cur}]$. For a template or a temporal event $S$ and a template $T$, we define $\mathrm{Cnt}_1(S, T)$ as the overall number of times the pair $(S, T)$ was observed, where $S$ appeared at least two weeks prior to $t_{cur}$. Note that we increment this counter only for pairs $(S, T)$ in which $S$ appeared before $T$ and no other $S$ or $T$ appeared between them. Similarly, $\mathrm{Cnt}_2(S, T)$ is the analog that counts the number of times that the pair $(S, T)$ was observed, but with $T$ appearing at least two weeks after $t_{bgn}$. We use these truncated (two-week) counts to avoid inaccuracies that relate to the exact times at which our time window begins and ends. We also define $\mathrm{Cnt}_1(S)$ as the total number of times $S$ was observed at least two weeks prior to $t_{cur}$, and $\mathrm{Cnt}_2(T)$ as the number of times $T$ was observed at least two weeks after $t_{bgn}$.

We first focus on the case that $S$ is a template. We say that $S$ causes $T$ if

$$\frac{\mathrm{Cnt}_2(S, T)}{\mathrm{Cnt}_2(T)} > \theta_1, \quad \text{and} \quad \frac{\mathrm{Cnt}_2(S, T)}{\mathrm{Cnt}_2(T)} \cdot \frac{\mathrm{Cnt}_2(S)}{\mathrm{Cnt}_2(T, S)} > \theta_2. \tag{1}$$

In a similar way, we say that $T$ is caused by $S$ if

$$\frac{\mathrm{Cnt}_1(S, T)}{\mathrm{Cnt}_1(S)} > \theta_1, \quad \text{and} \quad \frac{\mathrm{Cnt}_1(S, T)}{\mathrm{Cnt}_1(S)} \cdot \frac{\mathrm{Cnt}_1(T)}{\mathrm{Cnt}_1(T, S)} > \theta_2, \tag{2}$$

where $\theta_1$ and $\theta_2$ are parameters (eventually given the values 0.3 and 2, respectively, after a grid search). In both equations, the first inequality asserts a correlation between $S$ and $T$. Informally, in equation 1 we get that $\Pr[S \mid T]$ is large, and in equation 2 we get that $\Pr[T \mid S]$ is large.
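As a rough illustration of equations 1 and 2, the two tests can be written as small predicates over precomputed counts. This is only a sketch: the function and argument names are hypothetical, the counts are assumed to have been gathered elsewhere, and the handling of a zero reverse count (treated as passing the second inequality) is an assumption that the text does not address.

```python
THETA_1 = 0.3  # threshold values reported in the text after the grid search
THETA_2 = 2.0


def _passes(pair_st, denom, other_total, pair_ts):
    """Shared form of equations (1) and (2).

    pair_st     -- count of (S, T) pairs in the forward order
    denom       -- Cnt(T) for equation (1), Cnt(S) for equation (2)
    other_total -- Cnt(S) for equation (1), Cnt(T) for equation (2)
    pair_ts     -- count of (T, S) pairs, i.e. the reverse order
    """
    if denom == 0:
        return False
    rate = pair_st / denom
    if rate <= THETA_1:
        return False  # first inequality fails: no strong correlation
    if pair_ts == 0:
        # Assumption: if the reverse order was never observed, the second
        # inequality is treated as satisfied.
        return True
    return rate * other_total / pair_ts > THETA_2


def s_causes_t(cnt2_st, cnt2_t, cnt2_s, cnt2_ts):
    """Equation (1), evaluated on the Cnt_2 counts."""
    return _passes(cnt2_st, cnt2_t, cnt2_s, cnt2_ts)


def t_caused_by_s(cnt1_st, cnt1_s, cnt1_t, cnt1_ts):
    """Equation (2), evaluated on the Cnt_1 counts."""
    return _passes(cnt1_st, cnt1_s, cnt1_t, cnt1_ts)
```

For instance, a pair that is observed equally often in both orders passes the first inequality but not the second, which matches the stated purpose of filtering relations that are likely driven by a common unobserved event.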
The latter inequality in both equations filters spurious relations, where both $S$ and $T$ are caused by unobserved events.¹ We emphasize that although those causal relations may seem similar or even identical, they are inherently different. For example, suppose $S$ is a purchase notification template of a very small vendor and $T$ is a shipment notification template of a prime corporation. It is quite natural that $S$ causes $T$. However, the appearance of $T$ does not indicate that it was caused by $S$, since only a small fraction of the shipments done by the corporation are for this small vendor. As another example, consider the case that $S$ is a product promotion template and $T$ is a purchase notification template. This time it seems natural to say that, given $T$, it was caused by $S$. However, the opposite does not seem true, that is, the appearance of $S$ does not indicate that it causes $T$, since promotions are more likely not to lead to purchases.

¹ Note that this approach does not remove all spurious relations, yet it does eliminate the vast majority of them, which is sufficient for practical purposes.

We now turn to analyzing whether a temporal event $S$ causes a template $T$. In this case, it seems less natural to focus on the corresponding counts, since temporal events are consistently interleaved with templates, and hence some relations may look spurious. Therefore, we instead validate that the time difference between $S$ and $T$ is consistent. This is achieved by validating that the variance of the time differences is small. More formally, for a template or a temporal event $S$ and a template $T$, we define $\delta(S, T)$ to be the random variable that captures the time difference for the pair $(S, T)$ in which $S$ appeared before $T$ and no $S$ or $T$ appeared between them. We statistically infer this random variable from the data. We also define $\delta(S)$ to be the (constant) time difference between two consecutive temporal events of $S$.
We say that the temporal event $S$ causes $T$ if

$$\mathrm{stdev}(\delta(S, T)) < \theta_3 \cdot \delta(S), \tag{3}$$

where $\theta_3$ is a parameter (eventually given the value 0.1 by a grid search).

Let $S$ be a template or a temporal event. We define $\mathrm{Follow}(S)$ to be the set of templates that may be caused by $S$, that is, the union of all templates $T$ such that $S$ causes $T$. Note that, by our assumption, given the appearance of a template $S$, at most one $T \in \mathrm{Follow}(S)$ may appear due to $S$. However, in case $S$ is a temporal event, each $T \in \mathrm{Follow}(S)$ can appear independently. We also define $\mathrm{Precede}(T)$ to be the collection of templates that may give rise to the appearance of template $T$, namely, templates $S$ such that $T$ is caused by $S$.

### Analyzing an inbox

The goal of this step is to identify causal relations between messages in the inbox. This step is essential since any message that already gave rise to a message within the inbox cannot cause another message. Consequently, we should ignore those messages when we predict the arrival of future messages. We explain below how to estimate, for each message, the probability that it causes some other message in the inbox as well as the probability that it is caused by some message.

We first focus on an important feature in our approach, namely time differences. Recall that $\delta(S, T)$ is a random variable that captures the time difference for the pair $(S, T)$, where $S$ is a template or a temporal event and $T$ is a template. We assume that the distribution $D_\delta(S, T)$ describes the random variable $\delta(S, T)$. As $D_\delta(S, T)$ is unknown, we approximate it by considering all the time differences for the pair $(S, T)$ in our data and remembering their 99 percentiles, while assuming a uniform distribution within each bucket. More formally, let $t_1, \ldots, t_{99}$ be the 99 percentiles of the observed time differences between $S$ and $T$. We assume that $F_{S,T}$, the probability density function of $D_\delta(S, T)$, has a density of $1/(100(t_{i+1} - t_i))$ for any value in $[t_i, t_{i+1}]$.
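The percentile-based approximation of $F_{S,T}$ can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the percentiles are computed with the standard library's `statistics.quantiles` using the inclusive method (one of several possible percentile definitions), the outer edges are taken as 0 and the maximum observed difference, and a zero-width bucket caused by tied percentiles is reported with density 0, which is a simplification.

```python
import bisect
from statistics import quantiles


def percentile_edges(time_diffs):
    """Return the 101 bucket edges t_0..t_100 for the observed time differences.

    t_1..t_99 are the percentiles; t_0 = 0 and t_100 = max are the outer edges.
    Needs at least two observations.
    """
    diffs = sorted(float(d) for d in time_diffs)
    inner = quantiles(diffs, n=100, method="inclusive")  # 99 cut points
    return [0.0] + inner + [diffs[-1]]


def density(edges, x):
    """Approximate density at x: 1 / (100 * bucket width) inside a bucket."""
    if x < edges[0] or x > edges[-1]:
        return 0.0
    # Index of the bucket [edges[i], edges[i + 1]] that contains x.
    i = min(bisect.bisect_right(edges, x) - 1, len(edges) - 2)
    width = edges[i + 1] - edges[i]
    # Assumption: a zero-width bucket (tied percentiles) is a point mass;
    # this sketch returns 0 there instead of an infinite density.
    return 1.0 / (100.0 * width) if width > 0 else 0.0
```

Each bucket carries probability mass 1/100 by construction, so narrow buckets (where observations are dense) get a high density and wide buckets get a low one.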
In this approximation, $t_0 = 0$ and $t_{100}$ is the maximum time difference.

Let us focus on an inbox, and let $E = (T, t)$ be a message in the inbox. Suppose the inbox contains the messages and temporal events $E_1 = (S_1, t_1), \ldots, E_k = (S_k, t_k)$ such that

1. $t_i \le t$ for every $i \in [k]$, and
2. $T \in \mathrm{Follow}(S_i)$ or $S_i \in \mathrm{Precede}(T)$ for every $i \in [k]$.

We posit that the probability that the message $E$ was not caused by any of $E_1, \ldots, E_k$ is

$$p_T = \frac{\mathrm{Cnt}_2(\neg(S_1, \ldots, S_k), T)}{\mathrm{Cnt}_2(T)},$$

where the expression $\mathrm{Cnt}_2(\neg(S_1, \ldots, S_k), T)$ stands for the number of times template $T$ is observed in our data while there are no appearances of $S_1, \ldots, S_k$ during some (parametrized) time period preceding it (eventually two weeks were selected by a grid search). We emphasize that although $p_T$ is not an accurate estimation of the probability that $E$ was not caused by any of $E_1, \ldots, E_k$, it is a simple expression that provides a good approximation for a carefully selected time period. In the case that $E$ was caused by one of $E_1, \ldots, E_k$, which happens with probability $1 - p_T$, we assume that the probability that $E_i$ gave rise to $E$ is proportional to

$$\frac{\mathrm{Cnt}_1(S_i, T)}{\mathrm{Cnt}_1(S_i)} \cdot F_{S_i,T}(t - t_i),$$

where $F_{S_i,T}$ is the (approximate) probability density function of $D_\delta(S_i, T)$, which is generated using the process described before.

The above discussion accounts for the distribution of messages or events being the parent (or cause) of a specific message. We are interested in the probabilities of all relations within an inbox. To estimate these probabilities, we pick those relations at random according to the calculated distribution. Specifically, given an inbox $E_1, \ldots, E_n$, we go over the messages in the inbox from the oldest one to the most recent. For each message under consideration, we randomly assign a parent (or possibly reflect the case in which a message does not have a parent) according to the above-mentioned probability distribution. Based on our random choices, we adjust the probabilities and continue to the next message. In particular, if we decide that message $E_i$ is the parent of $E_j$, we set to zero the probability that $E_i$ is the parent of any message $E_k$ such that $k > j > i$, and normalize the relevant probabilities. We repeat this random process of identifying parental relations multiple times to get more accurate estimations $p_{parent}(E_i)$ (respectively, $p_{child}(E_i)$) of the probability that each message $E_i$ caused another message (respectively, was caused by a message or event) in the inbox.

### Predicting future templates

Given an inbox that consists of the messages and temporal events $E_1 = (S_1, t_1), \ldots, E_n = (S_n, t_n)$, we want to predict the templates that will arrive within the next $\Delta$ time units. We first compute a list of candidate templates that have a non-negligible chance of appearing. This collection is composed of all the templates in the inbox and all the templates in $\bigcup_{i=1}^{n} \mathrm{Follow}(S_i)$. Let $\mathcal{T}$ be this set of templates. We compute a score for each $T \in \mathcal{T}$ that indicates how likely it is to appear within the given time window.
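Returning briefly to the inbox analysis step above, the repeated random parent assignment can be sketched as a small Monte Carlo routine. Everything here is an assumption made for illustration: the function name, the input layout (`no_parent[j]` holds $p_T$ for message $j$, and `parent_weights[j]` maps each candidate parent index to its unnormalized weight), and in particular the renormalization rule, which here renormalizes over the remaining candidates together with the no-parent option. The `reusable` set marks temporal events, which the text allows to cause several templates.

```python
import random


def estimate_relations(no_parent, parent_weights, reusable=frozenset(),
                       runs=1000, seed=0):
    """Estimate p_parent and p_child for every inbox entry by repeated sampling.

    Entries are indexed from oldest (0) to most recent (n - 1).
    """
    rng = random.Random(seed)
    n = len(no_parent)
    parent_hits = [0] * n
    child_hits = [0] * n
    for _ in range(runs):
        used = set()        # messages that already caused something in this run
        parents_seen = set()
        for j in range(n):
            weights = parent_weights[j]
            total = sum(weights.values())
            options, probs = [None], [no_parent[j]]
            for i, w in weights.items():
                if i in used or total <= 0:
                    continue  # a used message can no longer be a parent
                options.append(i)
                probs.append((1.0 - no_parent[j]) * w / total)
            if sum(probs) <= 0:
                continue  # no option left: treat the message as parentless
            # random.choices normalizes the weights, which performs the
            # renormalization after used candidates were dropped.
            choice = rng.choices(options, weights=probs, k=1)[0]
            if choice is not None:
                child_hits[j] += 1
                parents_seen.add(choice)
                if choice not in reusable:
                    used.add(choice)
        for i in parents_seen:
            parent_hits[i] += 1
    p_parent = [h / runs for h in parent_hits]
    p_child = [h / runs for h in child_hits]
    return p_parent, p_child
```

Averaging over many runs turns the individual random assignments into the fractional estimates $p_{parent}(E_i)$ and $p_{child}(E_i)$ used in the prediction step.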
The list of templates that we finally predict consists of all templates whose score exceeds some threshold, which can be adjusted to increase recall or precision.

In order to assign a probability score to a given template $T \in \mathcal{T}$, we first calculate $\alpha_0$, which is defined as the probability that $T$ appears due to an unobserved event in the next $\Delta$ time units. We model this generation process as a series of Bernoulli trials that decide, in each time unit, whether $T$ arrives. For this purpose, we consider the collection $C$ of all messages $E_i$ whose template is $T$. We let

$$\mathrm{Cnt}(C) = \sum_{E \in C} (1 - p_{child}(E))$$

be the (fractional) expected number of messages in the inbox that appeared due to an unobserved event. We use $q_T$ to denote the probability that $T$ spontaneously appears in a single time unit. Specifically, $q_T = \min\{1, \mathrm{Cnt}(C)/(t_{cur} - t_{bgn})\}$. Consequently, the probability that $T$ appears in a time window of length $\Delta$ is $\alpha_0 = 1 - (1 - q_T)^{\Delta}$.

In order to account for the creation of $T$ due to causality relations, we consider only those templates and temporal events $S_i$ such that $T \in \mathrm{Follow}(S_i)$. Let us assume without loss of generality that every $S_i$ satisfies this property. In case $S_i$ is a template, we define

$$\beta_i = \frac{\mathrm{Cnt}_1(S_i, T)}{\mathrm{Cnt}_1(S_i)} \cdot \int_{t = t_{cur} - t_i}^{t_{cur} - t_i + \Delta} F_{S_i,T}(t)\,dt.$$

This quantity is the estimate of the probability that $S_i$ causes $T$. Dealing with temporal events is a bit trickier since temporal events occur periodically, and thus there are future events that occur after $t_{cur}$ but before $t_{cur} + \Delta$. Clearly, we would like to take them into account. In case $S_i$ is a temporal event that happens every $t_{spn}$ time units, we define

$$\beta_i = \frac{\mathrm{Cnt}_1(S_i, T)}{\mathrm{Cnt}_1(S_i)} \cdot \int_{t = \max\{t_{cur} - t_i,\, 0\}}^{\min\{t_{cur} - t_i + \Delta,\, t_{spn}\}} F_{S_i,T}(t)\,dt.$$

Recall that a message can cause at most one other message. Since some of the messages in the inbox may have already caused other messages in the inbox, we set $\alpha_i = \beta_i \cdot (1 - p_{parent}(E_i))$ for all messages $E_i$.
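The quantities $\alpha_0$, $\beta_i$ and $\alpha_i$ defined above translate fairly directly into code. The sketch below uses hypothetical function names and assumes that `interval_mass(a, b)` is a caller-supplied function returning the integral of $F_{S_i,T}$ over $[a, b]$ (for example, derived from the percentile buckets); nothing about its implementation is taken from the text.

```python
def spontaneous_probability(p_child_values, t_bgn, t_cur, delta):
    """alpha_0 = 1 - (1 - q_T)^delta for one template T.

    p_child_values -- p_child(E) for every inbox message E whose template is T
    """
    cnt_c = sum(1.0 - p for p in p_child_values)   # fractional Cnt(C)
    q_t = min(1.0, cnt_c / (t_cur - t_bgn))        # per-time-unit probability
    return 1.0 - (1.0 - q_t) ** delta


def beta_template(cnt1_pair, cnt1_s, interval_mass, t_i, t_cur, delta):
    """beta_i when S_i is a template observed at time t_i."""
    lo = t_cur - t_i
    return cnt1_pair / cnt1_s * interval_mass(lo, lo + delta)


def beta_temporal(cnt1_pair, cnt1_s, interval_mass, t_i, t_cur, delta, t_spn):
    """beta_i when S_i is a temporal event recurring every t_spn time units."""
    lo = max(t_cur - t_i, 0.0)
    hi = min(t_cur - t_i + delta, t_spn)
    if hi <= lo:
        return 0.0  # the prediction window does not overlap the event's span
    return cnt1_pair / cnt1_s * interval_mass(lo, hi)


def alpha_message(beta, p_parent):
    """alpha_i = beta_i * (1 - p_parent(E_i)) for a message E_i."""
    return beta * (1.0 - p_parent)
```

Passing `interval_mass` as a parameter keeps the scoring logic independent of how the time-difference distribution is stored.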
The term $\alpha_i$ is the estimate of the probability that $E_i$ caused $T$ (given the underlying inbox). Note that if $E_i$ is a temporal event then we set $\alpha_i = \beta_i$. Since we assume independence between the different generation processes, we estimate the probability that $T$ is generated in the mentioned time window by

$$1 - \prod_{i=0}^{n} (1 - \alpha_i).$$

### 4. DATA INSIGHTS

An important signal in our approach is the time differences between pairs $(S, T)$, where $S$ is a template or a temporal event and $T$ is a template. This information is utilized in both the inbox analysis and the prediction components. Given a pair $(S, T)$, it may seem natural to approximate the distribution $D_\delta(S, T)$ by some standard distribution, such as Gaussian or Zipf, and mine its underlying parameters. This would allow us to represent the probability density function concisely and efficiently. Unfortunately, such an approach fails miserably. In almost all cases, the distribution $D_\delta(S, T)$ has a complex structure. Furthermore, different $(S, T)$ pairs often exhibit widely differing time-difference behavior, which is highly indicative of the relation between them (see the figures and discussion below). Luckily, as we are only interested in pairs $(S, T)$ that co-occur often, we obtain a large number of samples from the mentioned distribution. As a consequence, we decided to approximate the probability density function of $D_\delta(S, T)$ by recording percentiles 1 through 99 of the observed time differences, while assuming a uniform distribution within each bucket defined by two consecutive percentiles. We note that such an approximation not only provides a good estimation of the unknown distribution, but also scales well.

Figure 1 includes a few examples that demonstrate how template pairs may have different distributions.
For each of the graphs in the figure, the template pair is indicated in the caption above the graph, the x-axis represents the time differences between instances of that pair (with a step size of 1 hour), and the y-axis represents the number of pair instances that were observed. For example, the upper graph demonstrates the relation between the template "your ebay item sold." from ebay.com and the template "we're transferring money to your bank" from paypal.com. This graph confirms, for instance, that most commonly the transfer of money immediately follows the sale. Also, one can see that the number of times money was transferred decreases as time passes (with the caveat that this overall trend exhibits diurnal fluctuations).

One can derive additional insights from these graphs. The second-to-bottom graph suggests that walmart.com most commonly sends a survey two weeks after a purchase has been made, while the bottom graph suggests that processing photo printing orders at walgreens.com usually takes less than 3 hours, although there are some orders that are ready after 10 hours (maybe due to nighttime). Another example, presented in Figure 2, shows that when users indicate that they have forgotten their password on ebay.com, it typically takes less than 10 minutes until they change it. There are

### Reputation Network Analysis for Email Filtering

Reputation Network Analysis for Email Filtering Jennifer Golbeck, James Hendler University of Maryland, College Park MINDSWAP 8400 Baltimore Avenue College Park, MD 20742 {golbeck, hendler}@cs.umd.edu

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

### Introduction to time series analysis

Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples

### Customer Analytics. Turn Big Data into Big Value

Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

### 11.3 BREAK-EVEN ANALYSIS. Fixed and Variable Costs

385 356 PART FOUR Capital Budgeting a large number of NPV estimates that we summarize by calculating the average value and some measure of how spread out the different possibilities are. For example, it

### Why is Internal Audit so Hard?

Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

### Using News Articles to Predict Stock Price Movements

Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,

### Invited Applications Paper

Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### The Cost of Annoying Ads

The Cost of Annoying Ads DANIEL G. GOLDSTEIN Microsoft Research and R. PRESTON MCAFEE Microsoft Corporation and SIDDHARTH SURI Microsoft Research Display advertisements vary in the extent to which they

### Spam Testing Methodology Opus One, Inc. March, 2007

Spam Testing Methodology Opus One, Inc. March, 2007 This document describes Opus One s testing methodology for anti-spam products. This methodology has been used, largely unchanged, for four tests published

### The Enron Corpus: A New Dataset for Email Classification Research

The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu

### Web Content Summarization Using Social Bookmarking Service

ISSN 1346-5597 NII Technical Report Web Content Summarization Using Social Bookmarking Service Jaehui Park, Tomohiro Fukuhara, Ikki Ohmukai, and Hideaki Takeda NII-2008-006E Apr. 2008 Jaehui Park School

### Frequency Matters. The keys to optimizing email send frequency

The keys to optimizing email send frequency Email send frequency requires a delicate balance. Send too little and you miss out on sales opportunities and end up leaving money on the table. Send too much

### QUANTIFIED THE IMPACT OF AGILE. Double your productivity. Improve Quality by 250% Balance your team performance. Cut Time to Market in half

THE IMPACT OF AGILE QUANTIFIED SWAPPING INTUITION FOR INSIGHT KEY FIndings TO IMPROVE YOUR SOFTWARE DELIVERY Extracted by looking at real, non-attributable data from 9,629 teams using the Rally platform

### CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

### The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch

The Viability of StockTwits and Google Trends to Predict the Stock Market By Chris Loughlin and Erik Harnisch Spring 2013 Introduction Investors are always looking to gain an edge on the rest of the market.

### So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

### Bayesian Spam Filtering

Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary amaobied@ucalgary.ca http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating

### Image Content-Based Email Spam Image Filtering

Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among

### The Need for Training in Big Data: Experiences and Case Studies

The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor

### H. The Study Design. William S. Cash and Abigail J. Moss National Center for Health Statistics

METHODOLOGY STUDY FOR DETERMINING THE OPTIMUM RECALL PERIOD FOR THE REPORTING OF MOTOR VEHICLE ACCIDENTAL INJURIES William S. Cash and Abigail J. Moss National Center for Health Statistics I. Introduction

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Chapter 20: Data Analysis

Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

### 1 Uncertainty and Preferences

In this chapter, we present the theory of consumer preferences on risky outcomes. The theory is then applied to study the demand for insurance. Consider the following story. John wants to mail a package

### Software Solutions Digital Marketing Business Services. Email Marketing. What you need to know

Software Solutions Digital Marketing Business Services Email Marketing What you need to know Contents Building Your Email List 1 Managing Your Email List. 2 Designing Your Emails 3 Branding Your Emails.....

### Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

### Globally Optimal Crowdsourcing Quality Management

Globally Optimal Crowdsourcing Quality Management Akash Das Sarma Stanford University akashds@stanford.edu Aditya G. Parameswaran University of Illinois (UIUC) adityagp@illinois.edu Jennifer Widom Stanford

### Savita Teli 1, Santoshkumar Biradar 2

Effective Spam Detection Method for Email Savita Teli 1, Santoshkumar Biradar 2 1 (Student, Dept of Computer Engg, Dr. D. Y. Patil College of Engg, Ambi, University of Pune, M.S, India) 2 (Asst. Proff,

### A survey on click modeling in web search

A survey on click modeling in web search Lianghao Li Hong Kong University of Science and Technology Outline 1 An overview of web search marketing 2 An overview of click modeling 3 A survey on click models

### Predicting IMDB Movie Ratings Using Social Media

Predicting IMDB Movie Ratings Using Social Media Andrei Oghina, Mathias Breuss, Manos Tsagkias, and Maarten de Rijke ISLA, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

### Decision Theory. 36.1 Rational prospecting

36 Decision Theory Decision theory is trivial, apart from computational details (just like playing chess!). You have a choice of various actions, a. The world may be in one of many states x; which one

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Keywords social media, internet, data, sentiment analysis, opinion mining, business

Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real time Extraction

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Software Engineering 4C03 SPAM

Software Engineering 4C03 SPAM Introduction As the commercialization of the Internet continues, unsolicited bulk email has reached epidemic proportions as more and more marketers turn to bulk email as

### Designing a Lead Lifecycle in Salesforce

Designing a Lead Lifecycle in Salesforce A Best Practices White Paper for Response Management Better data. Better marketing. Table of Contents Introduction 4 The Words We Use 4 What is a Lead? 4 Evolving

### A Description of Consumer Activity in Twitter

Justin Stewart A Description of Consumer Activity in Twitter At least for the astute economist, the introduction of techniques from computational science into economics has and is continuing to change

### Simple Language Models for Spam Detection

Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to

### BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Encoding Text with a Small Alphabet

Chapter 2 Encoding Text with a Small Alphabet Given the nature of the Internet, we can break the process of understanding how information is transmitted into two components. First, we have to figure out

### Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### Stock Market Forecasting Using Machine Learning Algorithms

Stock Market Forecasting Using Machine Learning Algorithms Shunrong Shen, Haomiao Jiang Department of Electrical Engineering Stanford University {conank,hjiang36}@stanford.edu Tongda Zhang Department of

### The Bass Model: Marketing Engineering Technical Note 1

The Bass Model: Marketing Engineering Technical Note 1 Table of Contents Introduction Description of the Bass model Generalized Bass model Estimating the Bass model parameters Using Bass Model Estimates

### Webmail Friends & Exceptions Guide

Webmail Friends & Exceptions Guide Add email addresses to the Exceptions List and the Friends List in your Webmail account to ensure you receive email messages from family, friends, and other important

### Introduction. A. Bellaachia Page: 1

Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

### Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

### Dissecting the Learning Behaviors in Hacker Forums

Dissecting the Learning Behaviors in Hacker Forums Alex Tsang Xiong Zhang Wei Thoo Yue Department of Information Systems, City University of Hong Kong, Hong Kong inuki.zx@gmail.com, xionzhang3@student.cityu.edu.hk,

### MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS 2014. November 7, 2014. Machine Learning Group

Big Data and Its Implication to Research Methodologies and Funding Cornelia Caragea TARDIS 2014 November 7, 2014 UNT Computer Science and Engineering Data Everywhere Lots of data is being collected and

### Marketing Mix Modelling and Big Data P. M Cain

1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

### EnterGroup offers multiple spam fighting technologies so that you can pick and choose one or more that are right for you.

CONFIGURING THE ANTI-SPAM In this tutorial you will learn how to configure your anti-spam settings using the different options we provide like Challenge/Response, Whitelist and Blacklist. EnterGroup Anti-Spam

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### Effective Replenishment Parameters By Jon Schreibfeder

WINNING STRATEGIES FOR THE DISTRIBUTION INDUSTRY Effective Replenishment Parameters By Jon Schreibfeder >> Compliments of Microsoft Business Solutions Effective Replenishment Parameters By Jon Schreibfeder

Performance Workload Design The goal of this paper is to show the basic principles involved in designing a workload for performance and scalability testing. We will understand how to achieve these principles

Customer Referral Programs A How-To Guide to Help You Generate Better Sales Leads Whatare Customer Referral Programs? Customer referral programs are a simple, low cost and effective marketing strategy,

### Commtouch RPD Technology. Network Based Protection Against Email-Borne Threats

Network Based Protection Against Email-Borne Threats Fighting Spam, Phishing and Malware Spam, phishing and email-borne malware such as viruses and worms are most often released in large quantities in

### Math 370/408, Spring 2008 Prof. A.J. Hildebrand. Actuarial Exam Practice Problem Set 2 Solutions

Math 70/408, Spring 2008 Prof. A.J. Hildebrand Actuarial Exam Practice Problem Set 2 Solutions About this problem set: These are problems from Course /P actuarial exams that I have collected over the years,

### Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

### INBOX. How to make sure more emails reach your subscribers

INBOX How to make sure more emails reach your subscribers White Paper 2011 Contents 1. Email and delivery challenge 2 2. Delivery or deliverability? 3 3. Getting email delivered 3 4. Getting into inboxes

### Applying Data Science to Sales Pipelines for Fun and Profit

Applying Data Science to Sales Pipelines for Fun and Profit Andy Twigg, CTO, C9 @lambdatwigg Abstract Machine learning is now routinely applied to many areas of industry. At C9, we apply machine learning

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Leveraging Social Media

Leveraging Social Media Social data mining and retargeting Online Marketing Strategies for Travel June 2, 2014 Session Agenda 1) Get to grips with social data mining and intelligently split your segments

### South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

### LEVERAGING BIG DATA & ANALYTICS TO IMPROVE EFFICIENCY. Bill Franks Chief Analytics Officer Teradata July 2013

LEVERAGING BIG DATA & ANALYTICS TO IMPROVE EFFICIENCY Bill Franks Chief Analytics Officer Teradata July 2013 Agenda Defining The Problem Defining The Opportunity Analytics For Compliance Analytics For

### The Scientific Guide To: Email Marketing 30% OFF

The Scientific Guide To: Email Marketing 30% OFF Who is this guide for? All Marketing and ecommerce Managers at B2C companies. Introduction Science gives us the power to test assumptions by creating experiments

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has increased dramatically (Big Data) and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data Businesses and customers

### From Web Analytics to Engagement Analytics

White paper: From Web Analytics to Engagement Analytics. Table of Contents: Executive Summary; Engagement Analytics; Measuring Quality with Engagement Value Points; Engagement Analytics Examples; Overall

### Spam Filtering based on Naive Bayes Classification. Tianhao Sun

Spam Filtering based on Naive Bayes Classification Tianhao Sun May 1, 2009 Abstract This project discusses the popular statistical spam filtering process: naive Bayes classification. A fairly famous

### About the Junk E-mail Filter

About the Junk E-mail Filter Applies to: Microsoft Office Outlook 2003 The Junk E-mail Filter in Outlook is turned on by default,

### A Practical Application of Differential Privacy to Personalized Online Advertising

A Practical Application of Differential Privacy to Personalized Online Advertising Yehuda Lindell Eran Omri Department of Computer Science Bar-Ilan University, Israel. lindell@cs.biu.ac.il,omrier@gmail.com

### DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

### Computer Science Teachers Association Analysis of High School Survey Data (Final Draft)

Computer Science Teachers Association Analysis of High School Survey Data (Final Draft) Eric Roberts and Greg Halopoff May 1, 2005 This report represents the first draft of the analysis of the results

### CONFIGURING FUSEMAIL ANTI-SPAM

CONFIGURING FUSEMAIL ANTI-SPAM In this tutorial you will learn how to configure your anti-spam settings using the different options we provide like FuseFilter, Challenge/Response, Whitelist and Blacklist.

### How to Drive. Online Marketing through Web Analytics! Tips for leveraging Web Analytics to achieve the best ROI!

How to Drive Online Marketing through Web Analytics! Tips for leveraging Web Analytics to achieve the best ROI! an ebook by - Delhi School Of Internet marketing Table of Content Introduction Chapter 1:

### Lab 11. Simulations. The Concept

Lab 11 Simulations In this lab you'll learn how to create simulations to provide approximate answers to probability questions. We'll make use of a particular kind of structure, called a box model, that

### Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

### Discovering process models from empirical data

Discovering process models from empirical data Laura Măruşter (l.maruster@tm.tue.nl), Ton Weijters (a.j.m.m.weijters@tm.tue.nl) and Wil van der Aalst (w.m.p.aalst@tm.tue.nl) Eindhoven University of Technology,

### Cold Calling in the 21st Century

Contents Sales Messages vs. Marketing Messages Sales Messages Are Active Ways to Promote Your Products Marketing Messages Passively Let Potential Clients Make Up Their Own Minds How Your Salespeople Can

### Observing Data Quality Service Level Agreements: Inspection, Monitoring, and Tracking

A DataFlux White Paper Prepared by: David Loshin Observing Data Quality Service Level Agreements: Inspection, Monitoring, and Tracking Leader in Data Quality and Data Integration www.dataflux.com 877 846

### Microsoft Outlook 2010 Hints & Tips

IT Services Microsoft Outlook 2010 Hints & Tips Contents: Introduction; What Outlook Starts Up In; Sending Email Hints; Tracking a Message; Saving a Sent Item; Delay Delivery of a Single

### A Framework for Highly Available Services Based on Group Communication

A Framework for Highly Available Services Based on Group Communication Alan Fekete fekete@cs.usyd.edu.au http://www.cs.usyd.edu.au/~fekete Department of Computer Science F09 University of Sydney 2006,

### ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

### Predicting Airfare Prices

Predicting Airfare Prices Manolis Papadakis Introduction Airlines implement dynamic pricing for their tickets, and base their pricing decisions on demand estimation models. The reason for such a complicated

### Why Modern B2B Marketers Need Predictive Marketing

Why Modern B2B Marketers Need Predictive Marketing Sponsored by www.raabassociatesinc.com info@raabassociatesinc.com www.mintigo.com info@mintigo.com Introduction Marketers have used predictive modeling

### Reducing the Costs of Employee Churn with Predictive Analytics

Reducing the Costs of Employee Churn with Predictive Analytics How Talent Analytics helped a large financial services firm save more than \$4 million a year Employee churn can be massively expensive, and

### Text Analytics Beginner's Guide. Extracting Meaning from Unstructured Data

Text Analytics Beginner's Guide Extracting Meaning from Unstructured Data Contents Text Analytics 3 Use Cases 7 Terms 9 Trends 14 Scenario 15 Resources 24 2 2013 Angoss Software Corporation. All rights

### Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

### Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

### LASTLINE WHITEPAPER. Using Passive DNS Analysis to Automatically Detect Malicious Domains

LASTLINE WHITEPAPER Using Passive DNS Analysis to Automatically Detect Malicious Domains Abstract The domain name service (DNS) plays an important role in the operation of the Internet, providing a two-way

### The Best of Both Worlds:

The Best of Both Worlds: A Hybrid Approach to Calculating Value at Risk Jacob Boudoukh, Matthew Richardson and Robert F. Whitelaw Stern School of Business, NYU The hybrid approach combines the two most

### Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality

Measurement and Metrics Fundamentals Lecture Objectives Provide some basic concepts of metrics Quality attribute metrics and measurements Reliability, validity, error Correlation and causation Discuss