Welcome
Opening Remarks: Chairwoman Edith Ramirez, Federal Trade Commission
Presentation: Framing the Conversation. Solon Barocas, Princeton University Center for Information Technology Policy
Big Data: A Tool for Inclusion or Exclusion? Framing the Conversation
Solon Barocas, Princeton University
Big Data
- Volume, Velocity, Variety
- Observational: transactional data
- Self-reported and user-generated: social media
- Experimental: A/B testing
Data Mining (Machine Learning)
Data Mining
Automates the process of discovering useful patterns (regularities) upon which subsequent decision-making can rely.
Learning: the accumulated set of discovered relationships in the dataset is commonly called a model, and these models can be employed to automate the process of:
- classifying entities or activities of interest
- estimating the value of unobserved variables
- predicting future outcomes
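The learn-then-apply loop described above can be sketched in miniature. This is a toy illustration with invented data and a hypothetical one-number "model" (a learned score threshold), not any real scoring system: the rule is extracted from labeled examples, then reused to classify a new, unseen case.

```python
# Minimal sketch of the pattern-discovery loop: learn a rule from labeled
# examples (the "model"), then apply it to new cases. All data are invented.

def train_threshold(examples):
    """Pick the score threshold that best separates the labeled examples."""
    best_t, best_correct = None, -1
    for t in sorted(x for x, _ in examples):
        # Count how many examples the rule "score >= t" classifies correctly.
        correct = sum((x >= t) == label for x, label in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Training data: (activity score, was_flagged) pairs -- the "examples".
training = [(12, False), (15, False), (22, False),
            (48, True), (55, True), (61, True)]

model = train_threshold(training)   # here the whole "model" is one number: 48

def classify(score):
    return score >= model           # apply the learned rule to a new case

print(model)          # learned decision boundary
print(classify(50))   # a case the model has never seen
```

The point of the sketch is only the division of labor: the training data determine the rule, and the rule then makes decisions automatically.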
Data Mining as Discrimination
By definition, data mining is always a form of statistical (and therefore seemingly rational) discrimination. The very point of data mining is:
- to provide a rational basis upon which to distinguish between individuals
- to reliably confer to the individual the qualities possessed by those who seem statistically similar
How Data Mining Can Discriminate
1. Defining the target variable
2. Collecting and labeling the training data
3. Feature selection
4. Proxies
5. Masking
1. Target Variables
Determine how to solve the problem at hand by translating it into a question about the value of some target variable. This treats the target variable as a function of some other set of observed characteristics, e.g., A, B, C → X; D, E, F → Y.
The Art of Data Mining
The proper specification of the target variable is frequently not obvious, and it is the data miner's task to define it. The definition of the target variable and its associated class labels will determine what data mining happens to find. And it is possible to parse the problem and define the target variable in such a way that protected classes happen to be subject to systematically less favorable determinations.
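The sensitivity to problem framing can be made concrete with a toy example (all names and numbers invented): the same candidate pool, scored under two different definitions of the target variable "good employee", yields different winners.

```python
# Toy illustration: which candidate counts as "good" depends entirely on how
# the target variable is defined. Data are invented for the sketch.

employees = [
    {"name": "A", "tenure_years": 8, "sales": 40},
    {"name": "B", "tenure_years": 2, "sales": 95},
    {"name": "C", "tenure_years": 6, "sales": 55},
]

# Definition 1: "good employee" = stayed the longest.
best_by_tenure = max(employees, key=lambda e: e["tenure_years"])["name"]

# Definition 2: "good employee" = sold the most.
best_by_sales = max(employees, key=lambda e: e["sales"])["name"]

# Same data, different target variable, different outcome.
print(best_by_tenure, best_by_sales)
```

If tenure and sales happen to correlate differently with protected-class membership, the framing choice alone can shift which group receives favorable determinations.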
2. Training Data
Data mining is really a way to learn by example. The data that function as examples are known as training data: quite literally, the data that train the model to behave in a certain way. Two types of problems can arise:
- Data collection: a skewed set of examples
- Labeling of examples: setting a bad example
Data Collection
Data mining is especially sensitive to statistical bias because it aims to extract general rules from a particular set of examples. These rules will only hold for future cases if the cases in the training data resemble those to which they are applied. But data gathered for routine business purposes tend to lack the rigor of social-scientific data collection. Firms tend to perform such analyses in order to change the composition of their customer base.
Data Collection: Uncounted, Unaccounted, Discounted
The quality and representativeness of records might vary in ways that correlate with class membership:
- less involved in the formal economy and its data-generating activities
- unequal access to and less fluency in the technology necessary to engage online
- less profitable customers or less important constituents, and therefore less interesting as targets of observation
The under- and over-representation of members of protected classes is not always evident.
Data Collection: Limiting Future Contact
Worse, skewed results may lead to decision procedures that limit the future contact companies have with specific groups, skewing still further the sample upon which subsequent analyses will be performed. They would deny members of these populations the opportunity to prove that they buck the apparent trend.
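The feedback loop described above can be simulated in miniature. This is a hypothetical sketch with invented group names and score distributions, not a model of any real lender: because outcomes are only observed for approved applicants, the group rejected more often contributes far fewer examples to every later training sample.

```python
# Sketch of the "limiting future contact" loop (all numbers invented):
# the firm only observes outcomes for applicants it approves, so a group
# that is rejected more often steadily vanishes from the training data.
import random

random.seed(0)  # deterministic for illustration

def simulate(rounds=5, cutoff=600):
    observed = {"group_x": 0, "group_y": 0}
    for _ in range(rounds):
        # Two equally sized applicant pools with different score distributions.
        applicants = ([("group_x", random.gauss(650, 50)) for _ in range(100)] +
                      [("group_y", random.gauss(590, 50)) for _ in range(100)])
        # Only approved applicants ever generate outcome data.
        for group, score in applicants:
            if score >= cutoff:
                observed[group] += 1
    return observed

counts = simulate()
print(counts)  # group_y supplies far fewer observed outcomes per round
```

Members of the under-observed group never get the chance to demonstrate, in the data, that they buck the apparent trend.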
Labeling Examples
- Sometimes a rather straightforward affair, e.g., spam/not spam
- Sometimes a laborious process that is fraught with peril, e.g., good/bad customer
Labeling Examples: Reproduce Past Prejudice
So long as prior decisions affected by some form of prejudice serve as examples of correctly rendered determinations, data mining will necessarily infer rules that exhibit the same prejudice (e.g., Fair Isaac's comments in the early debates about credit scoring). Data mining can turn the conscious prejudice or implicit bias of individuals involved in previous decision-making into a formalized rule that would systematically discount all applicants in this way. For all their potential problems, the labels applied to the training data must serve as ground truth.
Labeling Examples: Reflect Current Prejudice
Not only can data mining inherit prior prejudice through the mislabeling of examples, it can also reflect current prejudice through the ongoing behavior of users taken as inputs to data mining. When relying on data mining to cater to the demonstrated preferences of users, companies may unintentionally adopt the prejudices that guide users' behavior.
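The label-bias mechanism is easy to exhibit with invented data. In this hypothetical sketch, past hiring decisions serve as the "correct" labels; because those decisions penalized one group regardless of qualifications, anything fit to them inherits the penalty.

```python
# Sketch (invented data): biased past decisions become the training labels.
# Since the labels are treated as ground truth, a model fit to them learns
# the prejudice as if it were a correct rule.

past_decisions = [
    # (qualification_score, group, hired) -- equally qualified candidates,
    # but group "b" was never hired.
    (90, "a", True), (85, "a", True), (60, "a", False),
    (90, "b", False), (85, "b", False), (60, "b", False),
]

def hire_rate(group):
    outcomes = [hired for _score, g, hired in past_decisions if g == group]
    return sum(outcomes) / len(outcomes)

# Any model trained on these labels will infer that group "b" makes a "bad"
# hire -- that is literally what the examples say.
print(hire_rate("a"), hire_rate("b"))
```

Nothing in the fitting procedure can detect that the labels, rather than the candidates, are the problem.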
3. Feature Selection
The process of settling on the specific set of input variables. Protected classes may find that they suffer a disproportionate cost of errors:
- the coarseness and comprehensiveness of the criteria that permit statistical discrimination
- the uneven rates at which different groups happen to be subject to erroneous determinations
These are artifacts of statistical reasoning rather than decision-maker prejudice or bias in the dataset.
At What Cost?
Obtaining information that is sufficiently rich to draw precise distinctions can be expensive. Even marginal improvements in accuracy may come at significant practical cost, and may justify a less exacting and encompassing analysis. Do the relatively higher costs involved in gaining more data about marginalized groups justify subjecting them to higher error rates? Should these groups bear the disproportionate burden of erroneous determinations, even if this means that the majority enjoys greater accuracy in decision-making?
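The uneven-error-rate point can be illustrated with a deliberately extreme toy dataset (invented, with hypothetical group labels): a single coarse feature happens to track the true outcome for the majority group but not for the minority group, so the cheap model's mistakes all land on the minority.

```python
# Sketch (invented data): one coarse feature predicts the outcome well for
# the majority group and badly for the minority group, so the error burden
# of the cheaper model falls disproportionately on the minority.

data = [
    # (group, coarse_feature, true_outcome)
    ("maj", 1, True), ("maj", 1, True), ("maj", 0, False), ("maj", 0, False),
    ("min", 1, False), ("min", 0, True),
]

def error_rate(group):
    rows = [(f, y) for g, f, y in data if g == group]
    # The coarse model simply predicts outcome == feature.
    errors = sum((f == 1) != y for f, y in rows)
    return errors / len(rows)

print(error_rate("maj"), error_rate("min"))  # errors concentrate on "min"
```

The disparity here is an artifact of which features were (affordably) collected, not of any prejudice in the decision rule.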
4. Proxies
The very same criteria that correctly sort individuals according to their predicted profitability, for example, may also sort individuals according to class membership. Decision-makers' reasonable priorities as profit-seekers unintentionally recapitulate the inequality that happens to exist in society. These discoveries reveal the simple fact of inequality, but they also reveal that members of protected classes are frequently the groups in the relatively less favorable position. Better data will simply expose the exact extent of inequality.
Redundant Encodings
In many instances, making accurate determinations will mean considering factors that are somehow correlated with proscribed features. There is no obvious way to determine how correlated a relevant attribute must be with class membership to be worrisome, nor is there a self-evident way to determine when an attribute is sufficiently relevant to justify its consideration, despite the fact that it is highly correlated with class membership.
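A redundant encoding can be demonstrated with invented data. In this hypothetical sketch, a facially neutral attribute (a made-up "zip_code") almost perfectly encodes group membership because of residential patterns, so membership can be recovered from the neutral attribute alone even though the protected attribute is never an input.

```python
# Sketch (invented data): "zip_code" is facially neutral, but residential
# patterns make it a near-perfect stand-in for class membership -- a model
# using it discriminates by proxy.

records = [
    ("10001", "group_a"), ("10001", "group_a"), ("10001", "group_a"),
    ("10002", "group_b"), ("10002", "group_b"), ("10002", "group_a"),
]

def predict_group(zip_code):
    # Majority group per zip code: the "redundant encoding" of membership.
    groups = [g for z, g in records if z == zip_code]
    return max(set(groups), key=groups.count)

# Fraction of records whose group is recovered from the neutral attribute.
recovered = sum(predict_group(z) == g for z, g in records) / len(records)
print(recovered)
```

Dropping the protected attribute from the inputs does nothing here; the information re-enters through the correlated, facially relevant feature.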
5. Masking
Data mining could also breathe new life into traditional forms of intentional discrimination, because decision-makers with prejudicial views can mask their intentions by exploiting each of the mechanisms enumerated above. It is also possible to infer class membership. Still, unintentional discrimination is likely to be far more common than the kinds of discrimination that could be pursued intentionally.
Conclusion
- Unintentionality
- Exacerbate Inequality
- No Ready Answer
Solon Barocas
sbarocas@princeton.edu | solon.barocas.org | @s010n
Solon Barocas and Andrew Selbst, "Big Data's Disparate Impact," http://ssrn.com/abstract=2477899
Panel 1: Assessing the Current Environment
- Kristin Amerling, U.S. Senate Committee on Commerce, Science, and Transportation
- danah boyd, Microsoft Research & New York University
- Mallory Duncan, National Retail Federation
- Gene Gsell, SAS
- David Robinson, Robinson + Yu
- Joseph Turow, University of Pennsylvania
Break
Panel 2: What's On the Horizon With Big Data?
- Alessandro Acquisti, Carnegie Mellon University
- Pamela Dixon, World Privacy Forum
- Cynthia Dwork, Microsoft Research
- Mark MacCarthy, Software & Information Industry Association
- Stuart Pratt, Consumer Data Industry Association
- Nicol Turner-Lee, Minority Media and Telecommunications Council
Lunch
Remarks: Commissioner Julie Brill, Federal Trade Commission
Presentation: Digging Into the Data. Latanya Sweeney & Jinyan Zang, Federal Trade Commission
Panel 3: Surveying the Legal Landscape
- Leonard Chanin, Morrison & Foerster
- Carol Miaskoff, Equal Employment Opportunity Commission
- Montserrat Miller, Arnall Golden Gregory
- C. Lee Peeler, Council of Better Business Bureaus
- Peter Swire, Georgia Institute of Technology
Break
Panel 4: Considerations on the Path Forward
- Christopher Calabrese, American Civil Liberties Union
- Daniel Castro, Information Technology and Innovation Foundation
- Jeanette Fitzgerald, Epsilon
- Jeremy Gillula, Electronic Frontier Foundation
- Michael Spadea, Promontory Financial Group
- Christopher Wolf, Future of Privacy Forum
Closing Remarks: Jessica Rich, Bureau of Consumer Protection, Federal Trade Commission