Social media based analytical framework for financial fraud detection




Abstract: As more and more companies go public, an increasing number of financial frauds are exposed. Conventional auditing practices focus primarily on statistical analysis of the structured financial and market activity data in financial statements. However, this approach may fail when financial reports are misleading or carefully planned, as in the high-profile frauds at Lehman Brothers, WorldCom, and Enron. With the rapid growth of Web 2.0, social media has become a popular venue for public firms, investors, and other stakeholders to share information and interact with each other. This paper is a first attempt to use textual content in social media to detect financial fraud. A systematic analytical framework is proposed, comprising a statistical language model analysis part, a social network analysis part, and a target-dependent sentiment analysis part. Based on this analytical framework, different supervised machine learning algorithms are used to classify fraud and non-fraud firms. The advantages of this framework over traditional financial fraud detection techniques are to be tested in future research.

Keywords: Financial fraud detection; Social media; Analytical framework

1. Introduction

As more and more companies go public, an increasing number of financial frauds are exposed. Falsification of financial information by public companies not only causes significant financial losses to shareholders at large but also triggers the bankruptcy of some well-known companies. For instance, the Enron case has been described as the biggest auditing failure, and it led to a loss of nearly USD 11 billion to stockholders [1]. Given such heavy losses, there is an urgent need to detect and identify financial fraud, which is instrumental in ensuring a fair, open, and transparent financial market.

Wang, et al. [2] defined financial fraud as a deliberate act that is contrary to law, rule, or policy with intent to obtain unauthorized financial benefit. In this paper, we deal with the detection of financial statement fraud (FSF), one of the most severe kinds of financial fraud. The broadly accepted definition of financial statement fraud, articulated by the National Commission on Fraudulent Financial Reporting [3], is intentional or reckless conduct, whether by act or omission, that results in materially misleading financial statements.

Conventional auditing practice and academic research rely heavily on statistical analysis of the structured financial and market activity data in financial statements to expose a firm's fraudulent behavior [4, 5]. However, conventional fraud detection methods do not work well under current circumstances because of the limitations of financial statements. On the one hand, since financial statements are carefully planned reports prepared ex ante, management has enough time to conceal the true financial condition and is able to circumvent audit mechanisms through accounting shenanigans. On the other hand, a financial statement is a monologue, a one-way pitch that does not take into account feedback from, or interaction with, investors.

Given these shortcomings, a new data source that can provide timely, dynamic, and up-to-date decision support for investors is needed. In the era of Web 2.0, Internet-based social media platforms have become an efficient channel for investor relations management: companies use them to communicate with investors and other stakeholders after major company events, such as the release of quarterly and annual reports, and especially during a business crisis, such as being accused of fraud by investors or a short-seller [6]. Because information on social media is timely, dynamic, up to date, and wide in coverage, it shows potential as a data source for detecting financial fraud.
Nevertheless, the volume and variety of potentially relevant business information are huge, placing a cognitive load on both shareholders and data analysts. The research question of this paper is how to employ the information on social media effectively for financial fraud detection. To answer this question, we first divide the unstructured social media information related to financial statement fraud into three categories, and then propose an analytical framework with three components, one for each kind of data.

This paper unfolds as follows. Section 2 reviews the extant literature and defines the research gap. Section 3 presents the analytical framework for analyzing information in social media. Section 4 deals with data collection and analysis. Conclusions are given in the last section.

2. Literature review

Since financial fraud detection is a classic research topic, we summarize the financial and non-financial measures, together with the methodologies, that have been used in the literature to detect financial fraud.

In terms of measure selection, a common thread in prior literature is to develop a checklist, also known as red flags, for auditors to use in their auditing practice [7]. However, Cecchini, et al. [4] stated that there are some problems with the red flag checklist as a fraud detection mechanism. Another thread employs quantitative financial ratios to detect potential financial fraud. For these ratios, prior studies invariably used data taken from annual financial reports. Most studies used 8 to 10 financial ratios [8-12]. Other researchers used larger feature sets of more than 20 ratios [13]. The most popular trend is to combine quantitative variables (including financial ratios) and qualitative variables (including the red flags) in an integrated model [14-17].

Recently, researchers have started to analyze textual data in financial statements, i.e., the MD&A section. Cecchini, et al. [18] extracted linguistic cues from textual information and aggregated them with financial ratios to classify fraud and non-fraud firms. Humpherys, et al. [19] made the first attempt to use linguistic cues alone to detect financial statement fraud. Glancy and Yadav [20] dealt directly with textual data in the MD&A section.
To handle these different measures, we summarize five types of methodologies used in the literature (listed in Table 1). Most of these studies used a single method and compared their results with earlier work. In contrast, some researchers employed more than two methods and made an overall comparison [10, 16, 21-23].

Methodology                     Specific techniques                      References                    Total
Statistical model               Discriminant analysis                    [13]                          1
                                Logistic regression model                [8, 11, 12, 15, 21, 24-26]    8
Fuzzy set theory                Fuzzy set model                          [27, 28]                      2
Multi-criteria decision model   UTADIS and MHDIS model                   [29, 30]                      2
Data mining                     Neural networks                          [9, 10, 14, 21, 22, 31]       6
                                Decision trees                           [10, 21, 22, 32]              4
                                Bayesian belief network                  [10, 22]                      2
                                K-nearest neighbours                     [22]                          1
                                Support vector machine                   [4, 22]                       2
Text mining                     Linguistic cues based method             [18, 19, 33]                  3
                                A computational fraud detection model    [20]                          1

Table 1 Methodologies used in prior fraudulent financial statement detection studies

As Table 1 shows, previous literature mainly used financial statements to extract the features of fraud and non-fraud companies, and then employed different data mining and text mining techniques to classify them. However, the financial statement is inherently limited, as discussed in the Introduction. Because of this drawback of the data source in existing research, this paper tests the usability of the extensive and interactive data in social media for financial fraud detection. Following the design science research paradigm, a social media based analytical framework for detecting financial fraud is developed below.

3. Analytical framework and research methodology

Nowadays, firms broadcast information (e.g., news and financial announcements) in social media, and investors tend to rely on information generated in social media to assist their investment decisions. This paper draws on this information to support financial fraud detection decisions. Information about a firm's financial statement in social media is mainly presented in textual format and can be categorized into three types. The first type is the material, such as reports and announcements, released online by the focal firm itself; these are viewed as internal factors. The second type is interaction factors, which concern the contact network between the firm and its investors and other stakeholders. The third type is the sentiment of financial news and the public mood regarding the focal firm's financial operating conditions and fraud behaviors; these can be called external factors. Considering the three kinds of data simultaneously, an analytical framework is proposed to address all the information in social media for financial fraud detection (Figure 1).

Figure 1 Analytical framework for social media based financial fraud detection

The dashed part of Figure 1 represents the data source and analysis techniques of previous research, in which financial ratios and/or textual information in financial statements are analyzed by different data mining and/or text mining techniques. The three solid diamond parts in Figure 1 comprise the main parts of this research. They are the statistical language model analysis part for analyzing internal factors, the social network analysis part for analyzing management's social network connections, and the target-dependent sentiment analysis part for analyzing the views of financial news and the public mood. Each part is discussed in detail below.

3.1 Statistical language model analysis

Zhou, et al. [34] have demonstrated the effectiveness of the statistical language model (SLM) in deception detection. To analyze the reports and announcements of the focal firm, we introduce SLM into the realm of fraudulent financial statement detection. An SLM estimates the occurrence probability of a span of text, which can be adapted to detect the strategic use of deceptive language in financial statements. However, SLM is limited by its inability to capture long-span information. To overcome this limitation, the latent semantic analysis (LSA) framework is integrated with SLM. The integrated model not only addresses the inability of SLM to capture long-span information, but also extracts semantic patterns from textual information. Using this integrated model, the reports and announcements of a firm are represented by a document vector, which can be used to distinguish fraudulent financial statements from non-fraudulent ones.
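
As a rough illustration of how an SLM assigns a probability to a span of text, the sketch below trains one bigram model per class and labels a new text by whichever model gives it the higher log-probability. It is a minimal sketch under stated assumptions, not the model proposed in this paper: the add-one smoothing, the whitespace tokenizer, the names `BigramModel` and `classify`, and the four toy sentences are all hypothetical, and the LSA integration described above is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more care)."""
    return text.lower().split()


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, documents):
        self.context_counts = Counter()  # how often each word occurs as a left context
        self.bigram_counts = Counter()   # how often each (previous, current) pair occurs
        self.vocab = set()
        for doc in documents:
            tokens = ["<s>"] + tokenize(doc) + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def log_prob(self, text, vocab_size):
        """Smoothed log-probability of the whole text under this model."""
        tokens = ["<s>"] + tokenize(text) + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


def classify(text, fraud_model, clean_model):
    """Label a text by the class model that assigns it the higher probability."""
    # Shared vocabulary size, plus one slot for unseen words.
    vocab_size = len(fraud_model.vocab | clean_model.vocab) + 1
    fraud_score = fraud_model.log_prob(text, vocab_size)
    clean_score = clean_model.log_prob(text, vocab_size)
    return "fraud" if fraud_score > clean_score else "non-fraud"


# Hypothetical toy corpora, for illustration only; they are not data from this study.
fraud_docs = [
    "revenue growth exceeded all expectations again",
    "we expect record revenue growth without any risk",
]
clean_docs = [
    "revenue declined due to weaker demand",
    "we report a loss due to restructuring costs",
]
fraud_model = BigramModel(fraud_docs)
clean_model = BigramModel(clean_docs)

print(classify("record revenue growth exceeded expectations", fraud_model, clean_model))
print(classify("revenue declined due to restructuring costs", fraud_model, clean_model))
```

A bigram model conditions each word only on its immediate predecessor, which makes the long-span limitation noted above concrete: any dependency reaching further back than one word is invisible to it.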

This approach is expected to improve on existing linguistic cues based methods [18, 19] in at least two respects. First, there is no need to extract a predefined set of cues in advance, which is time consuming and labor intensive. Second, SLM can model the dependency relationships between words in natural language, which are of great importance but are ignored in previous research.

3.2 Social network analysis

On social media platforms, managers make statements about the operating conditions and financial capabilities of the focal firm. At the same time, investors can interact and argue with them, and share their own opinions about the firm's performance. Through this interaction, managers, investors, and other stakeholders form a social network in which both normal and fraudulent information is transferred between nodes. Pak and Zhou [35] made the first attempt to use a social structure model to investigate deception in computer-mediated communication. Chiu, et al. [36] used social networks to detect Internet auction fraud and found that the structures of trade networks differ greatly between fraudulent sellers and normal ones. Based on this previous work, we assume that the structural characteristics of the online social networks of fraudulent and non-fraudulent firms are quite different. The social structure features used in this paper include centrality (i.e., in-degree, out-degree, closeness, betweenness), cohesion, similarity, and multiplexity.

3.3 Target-dependent sentiment analysis

Since the business press acts as an intermediary for information transfer from the firm to the investment community, the views of financial news on a firm's financial statement can serve as a reliable reference for investors. In addition, sentiment in the public mood can also help to find clues about a firm's financial statement. However, unlike the text handled in general sentiment analysis, financial news and public posts in social media contain content that is not only about financial fraud.
In other words, only a small part of the content covers information about fraud, and little of it discusses financial fraud directly. Following Jiang, et al. [37] and Chen, et al. [38], a target-dependent sentiment analysis framework is introduced to classify the sentiment of financial news and the public mood as positive, negative, or neutral with respect to the financial fraud behavior of the firm. Three steps are involved in the target-dependent sentiment analysis process. First, the topic of financial statement fraud is subdivided into several related topics, such as financial misstatement, misconduct, and irregularity. Second, extended targets are identified for each of the related topics. Third, target-dependent features are extracted for each topic. Finally, the overall feature of a piece of financial news or public mood is formed by integrating the features of all topics.

4. Data collection and data analysis

4.1 Sample selection

Before collecting data, we first identify fraud and non-fraud firms. To find firms involved in fraudulent financial statements, we use the Accounting and Auditing Enforcement Releases (AAERs), which are federal materials issued by the SEC (Securities and Exchange Commission) alleging accounting and/or auditing misconduct in US-listed companies. We carefully check all the AAERs issued in the last ten years and identify 123 fraud firms. For each fraud firm, the first fiscal year covered by the alleged violation is selected for collecting related information in social media. Another 123 non-fraud firms are matched on the basis of year, size, and industry, directly from COMPUSTAT.

4.2 Data collection

For each firm, we collect three types of information from different social media platforms within three months after the release of the financial statement for the first fiscal year. Management's reports and announcements, and financial news, are collected from three social media platforms, i.e., Yahoo! Finance, Seeking Alpha, and Twitter. The interaction content within the network of firms and stakeholders is extracted from Twitter. Public posts are extracted from Seeking Alpha and Twitter.

4.3 Data analysis

After data collection, several text mining and social network analysis techniques are used to extract the corresponding features. Then five types of supervised learning classifiers, i.e., logistic regression, decision trees (C4.5), neural networks, Bayesian belief network, and support vector machine, are trained and tested on the fraud and non-fraud samples using five-fold cross-validation. Performance metrics, such as accuracy, precision, recall, false positives, and false negatives, are used to evaluate the results.

5. Conclusions

In this paper, we propose a systematic analytical framework for using social media content as a new source for detecting a firm's financial fraud behavior. Its potential ability to detect financial fraud and its advantages over traditional methods will be tested in our future research. Social media based financial fraud detection techniques are not meant to replace earlier methods, i.e., those using financial statements as the source for detecting fraud. Rather, this research offers a new, complementary way to discover the truth.

References

[1] Bratton, Tul. L. Rev., v76, p1275, 2001.
[2] Wang, et al., in SMC, 2006.
[3] Treadway Commission, National Commission on Fraudulent Financial Reporting, New York, 1987.
[4] Cecchini, et al., Management Science, v56, p1146-1160, 2010.
[5] Ravisankar, et al., Decision Support Systems, v50, p491-500, 2011.
[6] Alfonso and Suzanne, Journal of Contingencies & Crisis Management, v16, p143-153, 2008.
[7] Albrecht and Romney, in Proceedings of the 1980 Touche Ross, 1980, p101-119.
[8] Beneish, Financial Analysts Journal, p24-36, 1999.
[9] Green and Choi, Auditing: A Journal of Practice & Theory, v16, p14-28, 1997.
[10] Kirkos, et al., Expert Systems with Applications, v32, p995-1003, 2007.
[11] Persons, Journal of Applied Business Research, v11, p38-46, 2011.
[12] Spathis, Managerial Auditing Journal, v17, p179-191, 2002.
[13] Kaminski, et al., Managerial Auditing Journal, v19, p15-28, 2004.
[14] Fanning and Cogger, International Journal of Intelligent Systems in Acc., Fin. & Mgt., v7, p21-41, 1998.
[15] Summers and Sweeney, Accounting Review, p131-146, 1998.
[16] Gaganis, Intelligent Systems in Accounting, Finance & Management, v16, p207-229, 2009.
[17] Abbasi, et al., MIS Quarterly, v36, p1293-1327, 2012.
[18] Cecchini, et al., Decision Support Systems, v50, p164-175, 2010.
[19] Humpherys, et al., Decision Support Systems, v50, p585-594, 2011.
[20] Glancy and Yadav, Decision Support Systems, v50, p595-601, 2011.
[21] Liou, Managerial Auditing Journal, v23, p650-662, 2008.
[22] Kotsiantis, et al., International Journal of Computational Intelligence, v3, p104-110, 2006.
[23] Perols, Auditing, v30, p19-50, 2011.
[24] Bell and Carcello, Auditing: A Journal of Practice & Theory, v19, p169-184, 2000.
[25] Dechow, et al., Contemporary Accounting Research, v28, p17-82, 2011.
[26] Yuan, et al., International Journal of Management, v25, p322-335, 2008.
[27] Deshmukh, et al., Managerial Finance, v23, p35-48, 1997.
[28] Deshmukh and Talluru, International Journal of Intelligent Systems in Acc., Fin. & Mgt., v7, p223-241, 1998.
[29] Pasiouras, et al., European Journal of Operational Research, v180, p1317-1330, 2007.
[30] Spathis, et al., European Accounting Review, v11, p509-535, 2002.
[31] Lin, et al., Managerial Auditing Journal, v18, p657-665, 2003.
[32] Bai, et al., International Journal of Information Technology & Decision Making, v7, p339-359, 2008.
[33] Larcker and Zakolyukina, Journal of Accounting Research, v50, p495-540, 2012.
[34] Zhou, et al., Knowledge and Data Engineering, IEEE Transactions on, v20, p1077-1081, 2008.
[35] Pak and Zhou, Decision Support Systems, 2013.
[36] Chiu, et al., International Journal of Electronic Commerce, v15, p123-147, 2011.
[37] Jiang, et al., in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, v1, p151-160, 2011.
[38] Chen, et al., in ICWSM, 2012.