Privacy: Legal Aspects of Big Data and Information Security




Privacy: Legal Aspects of Big Data and Information Security
Presentation at the 2nd National Open Access Workshop, 21-22 October 2013, Izmir, Turkey
John N. Gathegi, University of South Florida, Tampa, FL
Visiting Professor, Hacettepe University, Ankara, Turkey

Characteristics of big data and data mining: --Big data refers to massive amounts of seemingly unrelated data collected from a variety of sources that are aggregated in massive data depository systems.

--usually data sets too big for common database software to manage or process.

--3 defining features (Rubinstein, 2013):
1. availability of massive data continuously collected in multiple ways, including:
-online
-mobile devices
-location tracking
-data sharing apps
-smart environment interactions and monitoring (e.g. Internet of Things); big data will increasingly be derived from the Internet of Things
-web 2.0 user-generated data, including personal information sharing
2. use of high-speed, high-transfer-rate computers with massive storage capability, utilizing the cloud computing model
3. use of new computational frameworks... for storing and analyzing this huge volume of data
--summary: more data, faster computers, new analytic techniques

Data mining: extraction of information from massive amounts of data, leading to unexpected new knowledge: associations, patterns, and meanings that were previously buried in the data. Analyzing the data requires massively complex data mining algorithms and statistical methods.

--Think of Google: --email data (gmail); search data; personal information, web navigation data, geographic location data, voice communication data, video communication data, image management and processing data, translation data

--major benefits to industry and society in the area of innovations and service delivery (e.g. medical research, traffic management), but also some downsides, especially in the area of privacy.

--Think of Facebook: nearly a billion users uploading personal information

Rubinstein (2013) notes several intertwined trends that are presenting great challenges to privacy:
-the popularity of social networking sites that permit individuals to voluntarily share personal data
-the growth of cloud computing
-the ubiquity of mobile devices and of physical sensors that transmit geo-location information
-the growing use of data mining technologies enabling the aggregation and analysis of data from multiple sources
--Add to this Open Access and you have a problem!

According to Nicholas Terry (2012): Data aggregation and customer profiling are hardly news. The developments that mark out big data are the scale of the data collection and the increasing sophistication of predictive analytics.

--problems:
-data mining; profiling (cookies are not the primary concern anymore)
-finding hidden correlations, enabling interesting predictions
--right to be forgotten (addressed somewhat in Europe but almost ignored in the US)
--subverted by the ability to re-identify data subjects using non-personal data
-blurring the line between personal and non-personal data
-data aggregation to provide anonymity loses its meaning
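The re-identification problem noted above can be made concrete with a small sketch. It assumes a "de-identified" data set that still carries quasi-identifiers (ZIP code, birth year, sex) and a separate public list containing names; every record, name, and field below is invented for illustration and does not come from the presentation.

```python
# Hypothetical data: all names and values are invented for illustration only.
deidentified_health = [
    {"zip": "33620", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "33620", "birth_year": 1982, "sex": "M", "diagnosis": "asthma"},
    {"zip": "33701", "birth_year": 1975, "sex": "F", "diagnosis": "hypertension"},
]
public_register = [
    {"name": "A. Example", "zip": "33620", "birth_year": 1975, "sex": "F"},
    {"name": "B. Sample", "zip": "33620", "birth_year": 1982, "sex": "M"},
    {"name": "C. Placeholder", "zip": "33620", "birth_year": 1982, "sex": "M"},
]

# Fields that are not personal data on their own but can identify in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, public):
    """Return (name, diagnosis) pairs where the quasi-identifiers match exactly one person."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for row in deidentified:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the data subject
            matches.append((candidates[0], row["diagnosis"]))
    return matches


reidentified = link(deidentified_health, public_register)
print(reidentified)
```

In this toy data only the first health record is linked to a name: the second matches two people in the public list and the third matches none. The more attributes the two data sets share, the more often the match becomes unique.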

Consider this --purchase by Walmart in 2012 of Social Calendar (a Facebook application). Walmart already had ShopyCat, a Facebook app of its own that is a gift-recommendation service. Why purchase and not build its own?

Points to --weakest link: over-reliance on informed consent (most people do not read or understand disclosures, and have no idea about the subsequent use, or even custody, of their personal information)

Other BD problems
--Also allows automated decision-making about individuals, e.g., creditworthiness, insurance eligibility, etc.
--process opaque and affords little chance for individual feedback or correction of the underlying data
--BD users unable to provide adequate notice of purpose and use of data to individuals, since they cannot tell in advance what they will find
--Users cannot effectively consent to the use of their information because they cannot monitor the correlations made possible by the data mining

--dangers of predictive analysis
-Target analysis producing a pregnancy prediction score based on women customers' purchase patterns (identification of pregnancy and prediction of due date), e.g., daughter sent baby ads, upsetting father
-Pre-crime police departments (as in the movie Minority Report) apprehending criminals based on prediction of their future deeds (thought police?)
-redlining certain neighborhoods (for insurance purposes, social services, etc.)

--Tene and Polonetsky (2013) make the very salient points that: In a big data world, what calls for scrutiny is often not the accuracy of the raw data but rather the accuracy of the inferences drawn from the data. Inaccurate, manipulative or discriminatory conclusions may be drawn from perfectly innocuous, accurate data.

--de-identification is often reversible
--privacy v. societal benefit, e.g., Tene and Polonetsky (2013) pose the following question: what if the analysis of de-identified online search engine logs enables
-identification of a life-threatening epidemic in x% of cases
-saving y lives
-assuming a z% chance of re-identification for a certain subset of search engine users
should such an analysis be permitted?
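One way to reason about the chance of re-identification is to measure how small the smallest group of indistinguishable records is. The sketch below uses k-anonymity, a standard measure that the slides do not mention and that is introduced here only as an illustration. The records, field names, and the `generalize` rule are hypothetical.

```python
from collections import Counter

# Hypothetical records: values are invented for illustration only.
records = [
    {"zip": "33620", "birth_year": 1975, "sex": "F"},
    {"zip": "33621", "birth_year": 1978, "sex": "F"},
    {"zip": "33701", "birth_year": 1982, "sex": "M"},
    {"zip": "33705", "birth_year": 1985, "sex": "M"},
]
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def smallest_group(rows, fields):
    """Return k: the size of the smallest group sharing the same values for `fields`."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(counts.values())


def generalize(row):
    """Coarsen a record: keep 3 ZIP digits and round the birth year down to its decade."""
    return {
        **row,
        "zip": row["zip"][:3] + "**",
        "birth_year": row["birth_year"] // 10 * 10,
    }


k_before = smallest_group(records, QUASI_IDENTIFIERS)
k_after = smallest_group([generalize(r) for r in records], QUASI_IDENTIFIERS)
print(k_before, k_after)
```

With k equal to 1, at least one person is unique on these attributes and therefore exposed to linkage. Coarsening raises k to 2 here, at the cost of less precise data, which is the same privacy versus benefit trade-off the question above poses.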

No surprise that it is in the health area that privacy has received the most sympathy and attention. But even here, the US, for example, has depended on HIPAA, which is supposed to protect against disclosure of patient data. However, as Terry (2012) points out, HIPAA protects against disclosure, not against collection! He notes that a lot of traditional health information circulates in a mainly HIPAA-free zone.

--Harvard researchers collected data on Facebook users to study changes in their interests and friendships over time. They released the data to the world for research because it was supposed to be anonymous. Other researchers quickly found that they could de-anonymize parts of the dataset.

On the other hand: Stanford researchers discovered that taking an antidepressant drug together with a cholesterol-reducing drug raises patients' blood glucose to diabetic levels. They analyzed data in adverse effect reporting data sets and created a symptomatic footprint for diabetes-inducing drugs, then searched for this footprint in interactions between pairs of drugs. Four pairs with this effect were found, among them Paxil and Pravachol. Next they examined Bing search engine logs to see whether people who searched for both drugs were more likely to also report the symptoms than those who searched for only one drug. They found support in the data and potentially saved the lives of 1 million Americans.

Industry not the only BD driver
--In 2012 President Obama deployed a Big Data R&D initiative to advance the science and technology of managing, analyzing, visualizing and extracting information from large, diverse, distributed, and heterogeneous data sets.
--Terry (2012) also notes that in the future BD will come from less structured sources including "[w]eb-browsing data trails, social network communications, sensor data and surveillance data. Much of it is 'exhaust data,' or data created unintentionally as a byproduct of social networks, web searches, smartphones, and other online behaviors."

This means that with industry, social behavior, and government behind it, BD is only going to grow larger, and the privacy problems associated with it are going to grow not in tandem, but exponentially.

Ethics
Look beyond the law; ethics of BD research
--availability makes it ethical?
--research ethics boards have insufficient understanding of the process of anonymizing and mining data, or the errors that can lead to data becoming personally identifiable
--effects may not be realized until many years into the future
--data contributors (e.g. social networkers) usually do not have researchers as their audience
--many have no idea of the processes currently gathering and using their data
--difference between being in public and being public

--even in the area of litigation, electronic discovery can uncover both criminal acts and non-criminal embarrassing acts

Conclusions
--BD is here to stay
--Increasingly happening in the cloud, and with open access
--Erasing the notion of public/private space distinction

Hierarchy in the BD World
3 classes of people in Big Data World (Manovich, 2011):
(1) those that create data (consciously or by leaving digital footprints)
(2) those who have the means to collect it
(3) those who have the expertise to analyze it (smallest group, and most privileged)
-A pyramid?

Tene and Polonetsky (2013) note that presently the benefits of big data do not accrue to the individuals whose data is harvested, only to the big businesses that use such data:
--those who aggregate and mine this data neither view their informational assets as public goods held on trust nor seem particularly interested in protecting the privacy of their data subjects. The truth lies in the opposite, because the big data business model is selling information about their data subjects.
To make it less of a pyramid, they advocate the empowerment of individuals in controlling their information by giving them meaningful rights to access their data in usable, machine-readable format.
advantages:
-unleash innovation for user-side applications and services
-give an incentive to users to participate in the data economy (by aligning their own self-interest with broader societal goals)


What you think about this proposal will have to be a debate we are willing to undertake, today or another day! Thank you! jgathegi@usf.edu