Computer Forensics Application. ebay-uab Collaborative Research: Product Image Analysis for Authorship Identification



Similar documents
Big Data Analytics Process & Building Blocks

DATA MINING TECHNIQUES AND APPLICATIONS

Online Fraud Detection Model Based on Social Network Analysis

Lecture 13: Validation

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Big Data Analytics Building Blocks; Simple Data Storage (SQLite)

Application of Data Mining based Malicious Code Detection Techniques for Detecting new Spyware

Face Recognition For Remote Database Backup System

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

IMAGE PROCESSING BASED APPROACH TO FOOD BALANCE ANALYSIS FOR PERSONAL FOOD LOGGING

Introduction to Data Mining

Music Mood Classification

Data Analysis of Trends in iphone 5 Sales on ebay

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

How To Create A Text Classification System For Spam Filtering

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network

Customer Classification And Prediction Based On Data Mining Technique

Introduction to Data Mining

Learn to Personalized Image Search from the Photo Sharing Websites

Machine Learning Capacity and Performance Analysis and R

DATA PREPARATION FOR DATA MINING

Big Data: Image & Video Analytics

Analyzing survey text: a brief overview

Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector

PSG College of Technology, Coimbatore Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

Image Classification for Dogs and Cats

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Social Prediction in Mobile Networks: Can we infer users emotions and social ties?

Final Project Report

AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION

SOURCE SCANNER IDENTIFICATION FOR SCANNED DOCUMENTS. Nitin Khanna and Edward J. Delp

A Logistic Regression Approach to Ad Click Prediction

Author Gender Identification of English Novels

Galaxy Morphological Classification

Classification algorithm in Data mining: An Overview

Micro blogs Oriented Word Segmentation System

Employer Health Insurance Premium Prediction Elliott Lui

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

Cross-Validation. Synonyms Rotation estimation

Blog Post Extraction Using Title Finding

Florida International University - University of Miami TRECVID 2014

Social Media Mining. Data Mining Essentials

The Data Mining Process

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

Steven C.H. Hoi School of Information Systems Singapore Management University

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

Automatic Reconstruction of Parametric Building Models from Indoor Point Clouds. CAD/Graphics 2015

A Novel Classification Approach for C2C E-Commerce Fraud Detection

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Cluster Analysis for Evaluating Trading Strategies 1

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Machine Learning Approach to Handle Fraud Bids

Statistical Feature Selection Techniques for Arabic Text Categorization

Customer Relationship Management using Adaptive Resonance Theory

W6.B.1. FAQs CS535 BIG DATA W6.B If the distance of the point is additionally less than the tight distance T 2, remove it from the original set

Hong Kong Stock Index Forecasting

AN ENHANCED APPROACH FOR CONTENT FILTERING IN SPAM DETECTION

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA

Dynamic Composition Techniques for Video Production

MapReduce Approach to Collective Classification for Networks

Evaluating Software Products - A Case Study

Spend Enrichment: Making better decisions starts with accurate data

Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.

Design call center management system of e-commerce based on BP neural network and multifractal

A Feature Selection Methodology for Steganalysis

MusicGuide: Album Reviews on the Go Serdar Sali

IBM SPSS Data Preparation 22

Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning

Data Mining for Knowledge Management. Classification

Pattern Classification Techniques for the Ultrasonographic Periodontal Probe Introduction Figure 1:

How To Use Neural Networks In Data Mining

Information Management course

Automatic Text Processing: Cross-Lingual. Text Categorization

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

CATEGORIZATION OF SIMILAR OBJECTS USING BAG OF VISUAL WORDS AND k NEAREST NEIGHBOUR CLASSIFIER

Data Mining. Nonlinear Classification

Scalable Developments for Big Data Analytics in Remote Sensing

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

Introduction. A. Bellaachia Page: 1

Research on Sentiment Classification of Chinese Micro Blog Based on

A Framework of User-Driven Data Analytics in the Cloud for Course Management

Travis Goodwin & Sanda Harabagiu

Semantic Concept Based Retrieval of Software Bug Report with Feedback

Open issues and research trends in Content-based Image Retrieval

Robust Network Traffic Classification

Transcription:

Computer Forensics Application ebay-uab Collaborative Research: Product Image Analysis for Authorship Identification

Project Overview A new framework that provides additional clues extracted from images and free-form texts for: Authorship identification Multi-account linking

Three main modules: Picture Quality Assessment: classifying product listing images into professional/amateur/stock images Image Editing Style Summarization: summarize the common patterns shared by images under the same seller Image Embedded Text Analysis: extract and summarize embedded text from listing images

Module I: Picture Quality Assessment Assessment scheme I: classify images into {professional, amateur} classes Quality-based classification: Professional: clean (or human edited) background, object in focus, good lighting, etc. Amateur: otherwise.

Module I: Picture Quality Assessment Algorithm: Visual features that encode background clutter, color, lighting, sharpness, etc. are computed for all images. An SVM model is trained on the visual features and used to label the testing data. Five-fold cross validation is performed on the training data to select the optimal parameters for SVM (achieving 90.1% accuracy). The optimal parameters are used to re-train the model on all training data. The re-trained model is applied to label the testing data (achieving 89.9% accuracy). http://students.cis.uab.edu/galabing/ebay/images/q uality

Module I: Picture Quality Assessment Assessment scheme II: classify images into {stock, non-stock} classes A different way to categorize images: whether they are unique to one user, or shared by multiple users (stock). Algorithm: Match near-duplicate images across users: given a testing image, we retrieve all of its near-duplicates, and count the number of users to determine whether the testing image is a stock image. http://students.cis.uab.edu/galabing/ebay/images/stock/s tock-table.html

Module II Image Editing Style Summarization SIFT (Scale-invariant feature transform) feature based summarization

Module II Image Editing Style Summarization ---SIFT feature based Interest point features capture image points with distinctive texture so that the same points can be robustly matched across multiple images. We assume that many ebay users have developed distinctive editing styles over time to embed to their product images (logos, background, etc.) Such editing styles are highly repetitive within one user's images, but mostly distinctive among different users'. We detect and encode such editing styles for each user using interest point features, and in turn use the encoded editing styles to automatically predict the ownership for unlabeled images.

Module II Image Editing Style Summarization ---SIFT feature based We collected 917 images from 10 ebay users, and applied a 10-fold cross validation. Give a set of (training) images for a user: Extract PCA-SIFT features from all images. Match PCA-SIFT features among all pairs of images. Naïve method: O(N^2) Our method: O(NlogN) Both yield almost identical results In total, we achieved 96.6% of accuracy in ownership prediction. http://students.cis.uab.edu/galabing/ebay/images /signature2/

Lin Yang, Wei-Bang Chen, Chengcui Zhang, John Johnstone, Gary Warner, and Song Gao, Profiling Online Auction Sellers by Image Editing Styles, Accepted for publication, IEEE Multimedia.

Computer Forensics Applications (cont.) Uncovering auction fraud from ebay transaction graph - Initial study

Data set: ebay transaction feedbacks A total of 220,000 (two-hundred and twenty thousand) users are crawled. Idea of belief propagation: Fraudsters create two types of identities - fraud and accomplice, where fraud identities are the ones used eventually to carry out the actual fraud, and the accomplice identities are the ones used to help build the reputation for the fraud identities. This pattern forms a near bipartite core in the transaction graph.

Algorithm: Each vertex in the transaction graph is labeled by one of {fraud, accomplice, honest} based on their pattern of interaction with other vertexes. Belief propagation (BP) is used to optimize the labeling across the entire graph by maximizing the joint probabilities of all the vertexes. Honest user model: Barabasi-Albert model

Evaluation results on the sparse ebay transaction dataset 20% accomplice 50% fraud??? What can be improved: Network too sparse (average degree is ~5, ideally >=10) Initial probabilities (1/3, 1/3, 1/3) may not make sense. BP seems not to scale well with large graphs.