Crowdsourcing Fraud Detection Algorithm Based on Psychological Behavior Analysis

Similar documents
A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

Studying an Adaptive Evaluation Game using Ebbinghaus' Forgetting Theory and Leitner's Flashcard Mnemonic

How To Filter Spam Image From A Picture By Color Or Color

Domain Classification of Technical Terms Using the Web

UPS battery remote monitoring system in cloud computing

An Efficiency Keyword Search Scheme to improve user experience for Encrypted Data in Cloud

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

English Idiom Educational Game Design & Implementation Using Acceleration Sensors

Research on Operation Management under the Environment of Cloud Computing Data Center

Open Access A Facial Expression Recognition Algorithm Based on Local Binary Pattern and Empirical Mode Decomposition

Operation and Maintenance Management Strategy of Cloud Computing Data Center

An Imbalanced Spam Mail Filtering Method

Designing and Embodiment of Software that Creates Middle Ware for Resource Management in Embedded System

Collecting Polish German Parallel Corpora in the Internet

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

Design of Remote data acquisition system based on Internet of Things

CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW

Blog Post Extraction Using Title Finding

Modern Agricultural Digital Management Network Information System of Heilongjiang Reclamation Area Farm

Security System in Cloud Computing for Medical Data Usage

The Evaluation Model of HD Interactive TV Shopping Service

Fault Analysis in Software with the Data Interaction of Classes

Benchmarking the Performance of XenDesktop Virtual DeskTop Infrastructure (VDI) Platform

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services

Modeling of Knowledge Transfer in logistics Supply Chain Based on System Dynamics

Design call center management system of e-commerce based on BP neural network and multifractal

Lasso-based Spam Filtering with Chinese s

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

A Study of Low Cost Meteorological Monitoring System Based on Wireless Sensor Networks

Ultrasonic Detection Algorithm Research on the Damage Depth of Concrete after Fire Jiangtao Yu 1,a, Yuan Liu 1,b, Zhoudao Lu 1,c, Peng Zhao 2,d

Enhancing Quality of Data using Data Mining Method

A Survey on Bilingual Teaching in Higher Education Institute in the Northeast of China

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network

Open Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Research on ERP Teaching Model Reform for Application-oriented Talents Education

Subordinating to the Majority: Factoid Question Answering over CQA Sites

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow

Performance Evaluation On Human Resource Management Of China S Commercial Banks Based On Improved Bp Neural Networks

A Method for Load Balancing based on Software- Defined Network

U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 1, 2015 ISSN

Mimicking human fake review detection on Trustpilot

Research on Trust Management Strategies in Cloud Computing Environment

An Empirical Study of Application of Data Mining Techniques in Library System


A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers

Botnet Detection Based on Degree Distributions of Node Using Data Mining Scheme

Application of Data Mining Techniques in Intrusion Detection

Research and Implementation of View Block Partition Method for Theme-oriented Webpage

Design of Hospital EMR Management System

How To Use Neural Networks In Data Mining

Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

Less naive Bayes spam detection

The Research of Vancl Network Marketing

Big Data with Rough Set Using Map- Reduce

Research on Credibility Measurement Method of Data In Big Data

CENG 734 Advanced Topics in Bioinformatics

Research on Capability Assessment of IS Outsourcing Service Providers

Analysis of Regression Based on Sampling Weights in Complex Sample Survey: Data from the Korea Youth Risk Behavior Web- Based Survey

A WEB-BASED KNOWLEDGE AIDED TUTORING SYSTEM FOR VEGETABLE SUPPLY CHAIN

Journal of Chemical and Pharmaceutical Research, 2014, 6(5): Research Article

A Proxy-Based Data Security Solution in Mobile Cloud

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

A Prediction-Based Transcoding System for Video Conference in Cloud Computing

Study on the Evaluation for the Knowledge Sharing Efficiency of the Knowledge Service Network System in Agile Supply Chain

The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network

Direct Evidence Delay with A Task Decreases Working Memory Content in Free Recall

SEO Techniques for various Applications - A Comparative Analyses and Evaluation

Understanding Learners Cognitive Abilities: A Model of Mobilizing Non-English Majors Cognitive Abilities in the Process of Their Writing in English

Comments on "public integrity auditing for dynamic data sharing with multi-user modification"

SH-Sim: A Flexible Simulation Platform for Hybrid Storage Systems

User Behavior Research of Information Security Technology Based on TAM

Design of Electric Energy Acquisition System on Hadoop

Semantic Concept Based Retrieval of Software Bug Report with Feedback

Comparing Tag Clouds, Term Histograms, and Term Lists for Enhancing Personalized Web Search

Review of the Techniques for User Management System

The Study on the Effect of Background Music on Customer Waiting Time in Restaurant

E-commerce Transaction Anomaly Classification

Crime Hotspots Analysis in South Korea: A User-Oriented Approach

How To Cluster On A Search Engine

Network Attack Platform

Transcription:

, pp.138-142 http://dx.doi.org/10.14257/astl.2013.31.31 Crowdsourcing Fraud Detection Algorithm Based on Psychological Behavior Analysis Li Peng 1,2, Yu Xiao-yang 1, Liu Yang 2, Bi Ting-ting 2 1 Higher Educational Key Laboratory for Measuring and Control Technology, Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, 150080 Harbin, China 2 School of Computer Science and Technology, Harbin University of Science and Technology, 150080 Harbin, China {pli, yuxiaoyang }@hrbust.edu.cn. Abstract. This paper present a new crowdsourcing fraud detection method based on psychological behavior analysis. We apply Ebbinghaus forgetting curve to find out the spammer according to the psychological difference between deception and reliable behavior. The experimental results show that our method is effective and feasible. Keywords: Crowdsourcing, Ebbinghaus Forgetting Curve; Fraud Detection 1 Introduction Crowdsourcing, a new organization form and cooperation pattern in the process of enterprise production, was grown with the rapid popularity of Internet [1]. Enterprises actively utilize many online user resources and allocate the outsourcing task to the interest groups by means of crowdsourcing technology, solving some limitations of traditional outsourcing service. In recent years, crowdsourcing technology has become a focus of research and researchers around the world have realized it in some practical applications. For example, Rensnik combined monolingual crowdsourcing and targeted paraphrasing to improve the quality of pure machine translation [2] ; Hwang introduced mobile-based crowdsourcing into the field of environmental audio recognition, to improve the performance of mobile devices in filtering background noise [3] ; Fritz applied crowdsourcing to global land cover, to solve the problem of ignoring potential land in statistical work [4]. The rapid development of crowdsourcing technology drew the Text REtrieval Conference s (TREC) attention and crowdsourcing track has attracted more than 100 groups around the world to join the competition in TREC 2013. Unfortunately, some unreliable workers emerged due to profit driven in the practical application of crowdsourcing. Their results seriously reduce the quality and bring about the initiator s judgment biases [5]. Therefore, the kernel of crowdsourcing quality improvement is how to find the spammer. Fraud behavior is a kind of people's mental activity; psychological methods maybe make effective judgment on it in a ISSN: 2287-1233 ASTL Copyright 2013 SERSC

certain extent. Consequently, we put forward a crowdsourcing fraud detection method based on psychological analysis according to the psychological difference between deception and reliable behavior. 2 Behavior Analysis Based on Ebbinghaus Forgetting Curve The German psychologist H.Ebbinghaus researched the basic rule of human memory and oblivion, and put forth "the function of time and memory" as shown in the formula (1) [6]. OriginalLearning stands for the number of writing from memory, when he remembered all materials in the first time. After a while, Relearning is the number. Thus retention scores are obtained and represented by SavingScore. OrignalLearning Relearning SavingScore = 100 (1) OrignalLearning According to the formula (1), Ebbinghaus forgetting curve can be drawn (see Figure 1 ). In this figure, the longitudinal axis represents the memory retention scores; The horizontal axis represents elapsed time since learning. The cure conducts a quantitative expression for the forgetting rules in learning process resulting oblivion can be calculated. Human oblivion is an unbalanced development, Memory is forgotten very quickly in the initial stage, and then slows down gradually, after a certain time almost no longer forgotten. Fig. 1. Ebbinghaus forgetting curve Fraud is usually divided into two basic types. One is random spammer, they submit results randomly and it is difficult to identify them; the other is uniform spammer, they just submit results regularly and this type is easy to be found [7]. Therefore, we only focus on the former. The principle of our detection method is crowdsourcing participants will produce different memory rules owing to different psychological state in the process of judgment to the crowdsourcing task. Their behavior is a specific embodiment of mental activity, no matter he is a spammer or credible person. The trusted participant will strictly comply with the requirements and think hard when judging the relevance Copyright 2013 SERSC 139

of pairs, so It will produce a deep memory in his mind. This is the general process of human memory accord with Ebbinghaus forgetting rule. However, the spammer will only spend little or no effort on crowdsourcing and complete tasks mechanically. They lack the understanding memory about task content and their forgetting states don t comply with Ebbinghaus forgetting rule. According to above psychological differences and quantitative expression of Ebbinghaus forgetting curve, the flow chat of our algorithm is shown in Fig.2. Workers will rejudge a certain amount of repeated pairs in limited time, and then the work s SavingScore and threshold is compared, low SavingScore workers are considered as the spammer. Begin Initialize A O W T Start Timer 1 Select a pair from A or W r andomly W is empty? Select a pair fr om A r andomly TImer 1 r each 40min for the first time? yes The pair is from A? Escor e>= 58? Escor e>= 40? Put the pair into O after being ju dged Start Timer 2 Put the pair into W after 20 min Put the pair into T after being judged in time allocated Timer 1 r each 40min and 100 pairs in T yes Calculate wor k er s savingscore Clean O W T Filter the wor k er s ju dgments Label the worker as spammer End Fig. 2. The workflow of our algorithm 140 Copyright 2013 SERSC

3 The Results and Analysis of Experiments We constructed an online crowdsourcing experiment platform to verify the validity of our method. In the Foreign Broadcast Information Service data, a total of 1380 documents were selected from 4 themes as task document collection. All of these documents have the standard relevance labels, and contain 1120 non-relevant documents, 260 relevant documents. The experiment platform randomly assigned these tasks to 76 online students with computer background, let them label pairs. we utilize LAM(Logistic Average Misclassification )and AUC(receiver operating characteristic curve)to evaluate the effectiveness of our method. Our method filter 13 ones from 76 workers, 11 spammers and 2 trusted workers. In addition, 3 spammers are not detected; recall and precision are 0.93 and 0.79 respectively. Final labels were selected by majority vote before and after filtration, using AUC and LAM assessed 4 topics respectively, and then assessed the whole. As a result, the overall LAM decreased by 3.7%, AUC increased by 8%. 5 Conclusion This paper proposed an effective solving strategy on crowdsourcing fraud detection by means of psychological behavior analysis method. We creatively apply Ebbinghaus forgetting curve to find out the spammer according to the psychological difference between fraud and reliable behavior. This is an exciting exploration because we successfully applied the psychological method to the field of computer science. The experimental results show that our method is effective and feasible. Acknowledgement. This paper is supported by National Natural Science Foundation of China ( 61103149) and China Postdoctoral Science Foundation (2011M500682). References 1. Dai, Peng, Christopher, H., Daniel, S.: POMDP-based Control of Workflows for Crowdsourcing. Artificial Intelligence. 202, 52--85 (2013) 2. Resnik, P., Buzek, O., Kronrod, Y., Quinn, A.J., Bederson, B.B.: Using Targeted Paraphrasing and Monolingual Crowdsourcing to Improve Translation. ACM Transactions on Intelligent Systems and Technology. 43, 185--197 (2013) 3. Hwang, K., Lee, S.Y.: Environmental Audio Scene and Activity Recognition through Mobile-based Crowdsourcing. IEEE Transaction on Consumer Electronics. 582, 700-- 705(2012) 4. Fritz, S., McCalium, I., Schill, C., Perger, C.: Geo-wiki.org: The Use of Crowdsourcing to Improve global land cover. Remote Sensing. 13, 345--354(2009) 5. Zhang, Z.Q.:Research on Crowdsourcing Quality Control Strategies and Evaluation Algorithm. Chinese Journal of Computer. 368, 1636--1649(2013) Copyright 2013 SERSC 141

6. Luo, N., Yuan, F.:Detection User s Long-term Interest Based on Ebbinghaus Forgetting Carve. ICIC Express Letters, Part B: Applications. 25, 1151--1155(2011) 7. Jeroen, B.P., Arijen P.D.: Obtaining High-quality Relevance Judgments Using Crowdsourcing. 165, IEEE Internet Computing. 20--27(2012) 142 Copyright 2013 SERSC