Combating Web Fraud with Predictive Analytics Dave Moore Novetta Solutions dmoore@novetta.com
Novetta Solutions Formerly, International Biometric Group (IBG) Consulting DoD, DHS, DRDC IR&D Identity Cyber
Fundamental problem Machines are the proxies of personal identity. Attributing machine activity to a person is difficult, even when the session is authenticated. Contrast this to the pre-internet society, where presence established trust.
Fundamental problem Old question Are you who you claim to be? New question Are you what you claim to be? Both questions are equally relevant in our generation of ubiquitous computing.
Machine-enabled anonymity Account takeover Click & impression fraud Content scraping Espionage Fake account registration Identity theft Spam Vandalism Vulnerability scanning Vulnerability exploitation
Machine-enabled anonymity
Edward Snowden acquired ~1.7MM NSA files using a Web crawler. Bradley Manning used a simple Web client to acquire files.
Sanger, David E. and Eric Schmitt, Snowden Used Low-Cost Tool to Best N.S.A., The New York Times, 8 Feb 2014, <http://www.nytimes.com/2014/02/09/us/snowden-used-low-cost-tool-to-best-nsa.html?_r=1>.
Fisher, Max, The free Web program that got Bradley Manning convicted of computer fraud, The Washington Post, 30 Jul 2013, <http://www.washingtonpost.com/blogs/worldviews/wp/2013/07/30/the-free-web-program-that-got-bradleymanning-convicted-of-computer-fraud/>.
How can we distinguish humans from bots? Bot traps Challenge-response IP address reputation Device fingerprinting Limited, ineffective, and burdensome
Predictive analytics (PA) What is it, really? PA is the application of software and statistical modeling to predict the outcome of an unknown, future event based on prior knowledge. Why is it a buzzword? PA describes any software that uses statistical models to make decisions. Most applications of Machine Learning (ML) do this. Everyone is now predictive. PA and authentication are identical in our use case, where the future event in question is whether a user agent will commit fraud.
What's a user agent? A user agent is an application that requests content from the Web on behalf of a person. Web browsers: Internet Explorer, Firefox, Chrome, Safari, ... Search engine crawlers: GoogleBot, BingBot, YandexBot, Slurp, ... Everyone else
User agents make assertions of identity. Firefox 27.0, Windows 7
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Host: www.google.com
DNT: 0
Connection: keep-alive
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User agents make assertions of identity. This is true for all major desktop and mobile Web browsers, as well as search engine crawlers.
User agents make assertions of identity. User agents can claim to be anything. Spoofing is trivial. Rightfully, Web security experts often advise not to take those assertions at face value.
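To illustrate how little effort spoofing takes, here is a minimal sketch using Python's standard `urllib.request.Request`. The URL and the helper name `build_spoofed_request` are hypothetical, and no request is actually sent; the point is only that the User-Agent header is whatever the client chooses to write.

```python
from urllib.request import Request

# Hypothetical target URL, used for illustration only; nothing is sent.
URL = "http://www.example.com/"

# Any client can copy a browser's User-Agent string verbatim.
FIREFOX_27_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) "
    "Gecko/20100101 Firefox/27.0"
)


def build_spoofed_request(url: str, claimed_ua: str) -> Request:
    """Build a request whose User-Agent header is whatever the caller claims."""
    return Request(url, headers={"User-Agent": claimed_ua})


req = build_spoofed_request(URL, FIREFOX_27_UA)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

A script built this way is indistinguishable from Firefox if a server looks at the User-Agent string alone.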
User agents make assertions of identity. Novetta computer scientists have discovered it is entirely possible to harness those assertions to detect bots and combat Web fraud.
Basic concept Gather statistics on the behaviors of user agents. Train an ML classifier (e.g. neural network) to learn the behaviors of known user agents. Deploy the classifier to detect false assertions of identity on the premises of a Web application.
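The three steps above can be sketched roughly as follows. This is not Novetta's implementation: the slide suggests a neural network, whereas this sketch swaps in a small categorical naive Bayes classifier so that it runs on the standard library alone. The feature set (`header_order`, `accept_encoding`, `has_dnt`), the training observations, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Made-up observations: ordered (name, value) header pairs per agent family.
FIREFOX_HEADERS = [
    ("Host", "www.example.com"),
    ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-us,en;q=0.5"),
    ("Accept-Encoding", "gzip, deflate"),
    ("DNT", "0"),
    ("Connection", "keep-alive"),
]
SCRIPT_HEADERS = [
    ("Accept-Encoding", "identity"),
    ("Host", "www.example.com"),
    ("User-Agent", "example-script/1.0"),
    ("Connection", "close"),
]


def extract_features(headers):
    """Step 1: reduce a request to categorical behavior features (assumed set)."""
    names = [name.lower() for name, _ in headers]
    values = {name.lower(): value for name, value in headers}
    return {
        "header_order": ",".join(names),
        "accept_encoding": values.get("accept-encoding", ""),
        "has_dnt": str("dnt" in values),
    }


class NaiveBayes:
    """Step 2: categorical naive Bayes with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.vocab = defaultdict(set)               # feature -> values seen in training

    def fit(self, samples):
        for features, label in samples:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[(label, name)][value] += 1
                self.vocab[name].add(value)

    def log_score(self, features, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in features.items():
            count = self.feature_counts[(label, name)][value]
            # The extra +1 in the denominator reserves mass for unseen values.
            denom = self.class_counts[label] + len(self.vocab[name]) + 1
            score += math.log((count + 1) / denom)
        return score

    def predict(self, features):
        return max(self.class_counts, key=lambda label: self.log_score(features, label))


def is_spoofed(model, headers, claimed):
    """Step 3: flag a request whose behavior does not match its claimed family."""
    return model.predict(extract_features(headers)) != claimed


model = NaiveBayes()
model.fit([(extract_features(FIREFOX_HEADERS), "firefox")] * 3
          + [(extract_features(SCRIPT_HEADERS), "script")] * 3)

# A script that copies Firefox's User-Agent string but keeps its own header behavior.
spoofed = [(n, FIREFOX_HEADERS[1][1]) if n == "User-Agent" else (n, v)
           for n, v in SCRIPT_HEADERS]
print(is_spoofed(model, spoofed, "firefox"))
print(is_spoofed(model, FIREFOX_HEADERS, "firefox"))
```

The User-Agent value itself is deliberately left out of the features: it is the claim being tested, so only the surrounding behavior is used as evidence.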
Feature selection
Device features: Packet headers, Capability test results, Geolinguistic validation, IP address validation
Human features: Keystroke dynamics, Mouse dynamics, Touch and swipe dynamics, Request time deltas
How it performs ~0.15% equal error rate (EER) when the claim is a desktop or mobile Web browser. Higher error rates for lesser known user agents. This rarely matters in practice.
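For readers unfamiliar with the metric, the equal error rate is the operating point at which the false accept rate (spoofed agents accepted) equals the false reject rate (genuine agents rejected). The sketch below shows one simple way to estimate it by sweeping thresholds over classifier scores. The scores are made up and have nothing to do with the ~0.15% figure reported above; the function name and the convention that higher scores mean "more likely genuine" are assumptions.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the threshold where FAR and FRR are closest.

    A request is accepted when its score is >= threshold.
    FAR = fraction of impostor scores accepted.
    FRR = fraction of genuine scores rejected.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Illustrative scores only.
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.3]
impostor_scores = [0.1, 0.2, 0.4, 0.5, 0.65]

eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

With these ten made-up scores, one genuine request in five is rejected and one spoofed request in five is accepted at the chosen threshold, so the estimate is 20%.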
How it performs Fast, efficient We can confidently determine the likelihood of spoofing in the first request of a session. Robust Not dependent on JavaScript, which users can disable.
Policies for effective implementation
Allow: Standard desktop and mobile Web browsers verified by the proposed system. Standard search engine crawlers verified by hostname lookups. Custom exceptions.
Deny: Everyone else.
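The hostname lookup for crawlers is commonly done as a reverse DNS lookup followed by a confirming forward lookup. A minimal sketch follows, assuming the domain suffixes in `CRAWLER_DOMAINS` (check each search engine's own documentation before relying on them). The function names are hypothetical, and the demonstration uses stub resolvers with example values so that it runs without network access.

```python
import socket

# Assumed suffixes for illustration; confirm against each crawler's documentation.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_lookup(ip):
    """Resolve an IP address to its primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Resolve a hostname to its list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def verify_crawler(ip, claimed, reverse=reverse_lookup, forward=forward_lookup):
    """Allow a claimed crawler only if reverse and forward DNS agree."""
    suffixes = CRAWLER_DOMAINS.get(claimed.lower())
    if not suffixes:
        return False  # unknown crawler: falls through to "deny everyone else"
    try:
        hostname = reverse(ip).lower()
    except OSError:
        return False
    if not hostname.endswith(suffixes):
        return False
    try:
        # The forward lookup must map the hostname back to the same IP.
        return ip in forward(hostname)
    except OSError:
        return False


# Stub resolvers with example values, so the demo needs no network access.
def fake_reverse(ip):
    return "crawl-66-249-66-1.googlebot.com"


def fake_forward(hostname):
    return ["66.249.66.1"]


print(verify_crawler("66.249.66.1", "GoogleBot", fake_reverse, fake_forward))
print(verify_crawler("203.0.113.9", "GoogleBot", lambda ip: "host.example.net", fake_forward))
```

The forward confirmation matters because the owner of an IP range controls its reverse DNS records and could otherwise point them at a crawler's domain.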
Applications: Breach prevention, Fraud prevention, Scraping prevention, Spam prevention, Threat intelligence
Implementations: Web (HTTP), Email (SMTP), VoIP (SIP)
Takeaways Personal identity and user agent identity are equally important in establishing trust on the Internet. User agent assertions are verifiable, especially for the everyday Web browsers. User agent verification enhances privacy by establishing trust for anonymous sessions.
Questions? dmoore@novetta.com