User Authentication/Identification From Web Browsing Behavior




US Naval Research Laboratory
PI: Myriam Abramson, Code 5584
Shantanu Gore, SEAP Student, Code 5584
David Aha, Code 5514
Steve Russell, Code 5584
first.last@nrl.navy.mil

DARPA AAUTH Meeting, 09/19/13

The views, opinions, and/or findings contained in this article/presentation are those of the author/presenter and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense.

Outline
- Objective
- Human Subject Research
- Web behavior features
- User Authentication with Ensemble of One-class SVMs
- Associative Patterns of Web Browsing Behavior
- Future work

Active Authentication Performer Overview and Status
Naval Research Laboratory (funded by NRL)

Program Overview and Status

Biometric: Identification of users through Web browsing behavior.
Develop the theoretical foundations and supporting algorithms for the detection, tracking and prediction of Web browsing behavior using information available from the address line of the browser. The biometric is the activity patterns that can be captured in the browser, including the timing of clicks, the type of page visited, the length of a session, the revisit rate, etc.

Behavioral Web Analytics

Key Objectives
- Identify, extract and analyze features of Web behavior collected in a user study.
- Investigate structured prediction methods to authenticate users based on their Web browsing behavior.
- Develop a genre palette to categorize webpages for authentication and identification purposes.
- Analyze a large clickstream dataset obtained from comScore, Inc.

Status
- Browser extensions for tracking and monitoring user Web behavior completed and deployed in an ongoing user study.
- Completed detailed analysis of the initial user study dataset and identified key features of Web browsing behavior.
- Completed user authentication approach using an ensemble of one-class SVMs and the random subspace method: best FRR 11%, best FAR 7%; average FRR 17%, average FAR 18%.
- Investigation of spatio-temporal models of Web browsing behavior with structured prediction methods under way.

Team Members
- Principal Investigator: Myriam Abramson, Code 5584
- David W. Aha, Code 5514
- Steve Russell, Code 5584

Clickstream data
Clickstream data: UserId, Time, URL visited, browsing agent
[Diagram labels: Click, Server, Clickstream data, Internet Access Log]

Human Subject Research
12 volunteers!

Web Behavior Features
Sessions: series of consecutive clicks delimited by pauses of 30 minutes or longer
Global session features
- Session duration
- Session length
- Day-of-week
- Time-of-day
- Number of unique hosts
Time-variant distributions
- Time-between-revisit distribution
- Pause distribution
- Burstiness distribution
- Genre distribution [1]

[1] http://www.diffbot.com
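The session definition and the global session features above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function names, the dictionary keys, and the sample URLs are hypothetical, and it assumes that "session length" means the number of clicks while "session duration" means elapsed time.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

SESSION_GAP = timedelta(minutes=30)  # from the slide: pauses of 30 minutes or longer

def split_sessions(clicks, gap=SESSION_GAP):
    """Group (timestamp, url) clicks into sessions separated by pauses >= gap."""
    sessions = []
    for ts, url in sorted(clicks):
        if sessions and ts - sessions[-1][-1][0] < gap:
            sessions[-1].append((ts, url))   # pause is short: same session
        else:
            sessions.append([(ts, url)])     # first click, or pause >= gap
    return sessions

def global_features(session):
    """Global features of one session (key names are illustrative)."""
    start, end = session[0][0], session[-1][0]
    return {
        "duration_s": (end - start).total_seconds(),
        "length": len(session),              # assumed: number of clicks
        "day_of_week": start.weekday(),      # 0 = Monday
        "hour_of_day": start.hour,
        "unique_hosts": len({urlparse(url).netloc for _, url in session}),
    }

# Made-up clicks for illustration only.
clicks = [
    (datetime(2013, 9, 19, 9, 0), "http://example.com/a"),
    (datetime(2013, 9, 19, 9, 5), "http://example.org/b"),
    (datetime(2013, 9, 19, 9, 50), "http://example.com/c"),
]
sessions = split_sessions(clicks)
features = [global_features(s) for s in sessions]
```

The 45-minute pause before the third click starts a second session, so the sample data yields two sessions.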

Time-variant distributions
- Time-between-revisit: time between webpage revisits within a certain timeframe
- Pauses: time interval between two consecutive clicks
- Burstiness: difference between two consecutive pauses
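A small sketch of the three quantities defined above, using timestamps in seconds. The function names and the `window` parameter are hypothetical, and two details are assumptions the slides do not settle: burstiness is taken as a signed difference, and a revisit means a return to the exact same URL.

```python
def pauses(timestamps):
    """Time intervals between consecutive clicks."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def burstiness(timestamps):
    """Differences between consecutive pauses (signed; an assumption)."""
    p = pauses(timestamps)
    return [b - a for a, b in zip(p, p[1:])]

def revisit_times(clicks, window=None):
    """Time between successive visits to the same URL.

    If window is given, gaps longer than window are ignored
    (one reading of "within a certain timeframe").
    """
    last_seen, gaps = {}, []
    for ts, url in clicks:
        if url in last_seen:
            gap = ts - last_seen[url]
            if window is None or gap <= window:
                gaps.append(gap)
        last_seen[url] = ts
    return gaps

# Made-up (timestamp, url) clicks for illustration only.
clicks = [(0, "a"), (10, "b"), (25, "a"), (27, "b"), (60, "a")]
times = [t for t, _ in clicks]
```

Histograms of these lists over a time span would give the pause, burstiness, and time-between-revisit distributions.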

User Authentication Task: One-class SVMs
- Unsupervised learning problem
- Like clustering, but solves a discriminative problem (self or not self)
- Moves the data to a high-dimensional space with a kernel (e.g. Gaussian kernel)
- LibSVM: takes the origin as the only support vector from the complement class
- Authentication metrics: false rejection rate (FRR) and false acceptance rate (FAR)
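The two authentication metrics can be computed from accept/reject decisions as follows. This is a sketch of the standard definitions only; the function name and the sample decisions are made up, and the classifier that produces the decisions (a one-class SVM in the slides) is outside the example.

```python
def frr_far(genuine_accepted, impostor_accepted):
    """Return (FRR, FAR) from lists of boolean accept decisions.

    genuine_accepted:  decisions on sessions that belong to the user (self)
    impostor_accepted: decisions on sessions from other users (not self)
    """
    # FRR: fraction of genuine sessions that were wrongly rejected.
    frr = sum(not a for a in genuine_accepted) / len(genuine_accepted)
    # FAR: fraction of impostor sessions that were wrongly accepted.
    far = sum(bool(a) for a in impostor_accepted) / len(impostor_accepted)
    return frr, far

# Made-up decisions: 2 of 10 genuine sessions rejected, 1 of 10 impostors accepted.
genuine = [True] * 8 + [False] * 2
impostor = [False] * 9 + [True]
frr, far = frr_far(genuine, impostor)
```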

Ensemble Learning: Random Subspace (Abramson et al., FLAIRS-26)
- Varies the set of features of an ensemble of learners (one-class SVMs) for diversity
- Pool of learners with different feature sets
- Select a subset of learners with weighted sampling on internal 2-fold cross-validation
- Weighted vote
Findings:
- No best feature(s) across all volunteers (a profile-based approach should work)
- Shorter time spans with high resolution are better discriminators
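The pool, weighted selection, and weighted vote steps might look roughly like the sketch below. It is not the method from the cited paper: to stay dependency-free it swaps the one-class SVM for a simple centroid-and-radius learner, and every name, parameter, and default (`CentroidLearner`, `pool_size`, `subset_size`, `n_selected`) is hypothetical. Selection here samples with replacement, which is a simplification.

```python
import math
import random

class CentroidLearner:
    """Stand-in one-class learner (NOT an SVM): accepts points near the centroid."""
    def __init__(self, features):
        self.features = features              # indices of the feature subset

    def fit(self, rows):
        cols = [[r[f] for r in rows] for f in self.features]
        self.center = [sum(c) / len(c) for c in cols]
        self.radius = max(self._dist(r) for r in rows)
        return self

    def _dist(self, row):
        return math.sqrt(sum((row[f] - c) ** 2
                             for f, c in zip(self.features, self.center)))

    def accepts(self, row):
        return self._dist(row) <= self.radius

def cv_weight(features, rows):
    """Internal 2-fold CV: mean acceptance rate on held-out genuine rows."""
    half = len(rows) // 2
    folds = [(rows[:half], rows[half:]), (rows[half:], rows[:half])]
    rates = []
    for train, held_out in folds:
        learner = CentroidLearner(features).fit(train)
        rates.append(sum(learner.accepts(r) for r in held_out) / len(held_out))
    return sum(rates) / 2

def build_ensemble(rows, n_features, pool_size=20, subset_size=3,
                   n_selected=7, seed=0):
    rng = random.Random(seed)
    # Pool of learners, each defined by a random feature subset.
    pool = [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(pool_size)]
    weights = [cv_weight(f, rows) + 1e-9 for f in pool]  # epsilon: no all-zero weights
    # Weighted sampling (with replacement) favors subsets that did well in CV.
    chosen = rng.choices(range(pool_size), weights=weights, k=n_selected)
    return [(CentroidLearner(pool[i]).fit(rows), weights[i]) for i in chosen]

def ensemble_accepts(ensemble, row):
    """Weighted vote: accept if more than half of the total weight accepts."""
    yes = sum(w for learner, w in ensemble if learner.accepts(row))
    return yes > sum(w for _, w in ensemble) / 2

# Synthetic data for illustration only.
data_rng = random.Random(1)
genuine = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
ensemble = build_ensemble(genuine, n_features=6)
impostor = [10.0] * 6
```

Because each learner sees a different feature subset, a profile can rely on whichever features discriminate best for that user, which fits the finding that no single feature is best across all volunteers.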

Empirical results

Associative Patterns of Web Browsing Behavior (Abramson et al., AAAI Fall Symp)

Temporal ordering matters!
Shuffling clicks and partitioning into training and test sets preserves the original distribution and gives 100% prediction accuracy using Hamming distance in the NRL study dataset! But preserving the temporal order of the clicks gives only 75% prediction accuracy.
[Figure labels: Volunteer 1 Train, Volunteer 1 Test, Volunteer 2 Test]
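The two partitioning schemes compared above differ only in whether the clicks are shuffled before the cut. The sketch below illustrates that difference and a plain Hamming distance; the function names and the 50/50 split are hypothetical, and how the study encodes clicks as binary vectors is not specified here.

```python
import random

def temporal_split(clicks, train_fraction=0.5):
    """Train on the earliest clicks and test on the later ones."""
    cut = int(len(clicks) * train_fraction)
    return clicks[:cut], clicks[cut:]

def shuffled_split(clicks, train_fraction=0.5, seed=0):
    """Shuffle first, so train and test are drawn from the same mix of clicks."""
    shuffled = clicks[:]
    random.Random(seed).shuffle(shuffled)
    return temporal_split(shuffled, train_fraction)

def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))

# Clicks stand in as their time-ordered indices, for illustration only.
clicks = list(range(10))
t_train, t_test = temporal_split(clicks)
s_train, s_test = shuffled_split(clicks)
```

With the temporal split, every test click is later than every training click, so any drift in behavior over time shows up as test error. The shuffled split hides that drift, which is one plausible reading of the gap between 100% and 75%.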

Hopfield Identification Approaches

Identification accuracy (%), temporal sessions

Method       NRL study        comScore
             1st    Top 2     1st    Top 2
Tournament   75     83        72     75
All-pairs    75     100       73     81
Hamming      75     100       72     79

Tournament Approach
No significant difference with Hamming distance metric

Future Work
- Temporal predictive models (CRFs)
- Genre classification with categories pertinent to identification/authentication
- Robustness of predictive analytics
  - Concept drift (context change)
  - Label noise
  - Partially-labelled sequences
- Intent recognition, e.g. evasive behavior