Balancing Access to Information While Preserving Privacy, Security and Governance in the Era of Big Data

PHEMI Health Systems Process Automation and Big Data Warehouse
http://www.phemi.com

Executive Summary

This white paper explores the critical role that privacy, security and governance play in ensuring appropriate access to information in the age of Big Data. Big Data is a technology that has been used extensively for about 10 to 15 years by Google, Netflix and major social media applications. Because it scales economically to vast datasets, Big Data lets companies aggregate varied and diverse data sources quickly, so they can innovate and respond rapidly to changing business requirements.

However, until now, Big Data has suffered from weak privacy, security and governance controls, essentially preventing certain organizations (namely those in healthcare, the public sector and large corporate enterprises) from realizing the competitive benefits that come from Big Data. Innovative technology providers are now addressing the very real challenges of privacy, security and governance in such a way that organizations can effectively mine and analyse their data for insight while still protecting an individual's privacy rights.

Privacy deals with who is authorized to access certain information. Security provides the technical methods to safeguard private information from unauthorized persons, while governance covers the stewardship and enforcement of roles, and the checks and balances to support the business needs. It is critical to adopt a new technology architecture that strikes the balance between privacy, security and governance while still ensuring appropriate and necessary access to information. With the new architecture in place, organizations will be able to roll out innovative services ranging from population health management to business intelligence to Open Data information sharing.

Introduction

The average cost of a healthcare data breach is estimated to exceed $3.5 million, adding up to a potential cost to the healthcare industry alone of as much as $5.6 billion annually. The impact on corporate reputation, fines, lost business and senior executive careers has been felt across industries. The average cost to a company of a data breach was US$3.5 million, 15 percent more than in the previous year. [1]

It is well understood that a majority of breaches can be traced back to improper employee handling of data, highlighting the importance of proper data governance and privacy controls that go beyond traditional perimeter-based firewall and security infrastructure.

Privacy, security and governance each play an integral role in ensuring appropriate access to information. Too much control can dramatically limit business competitiveness and profits, as information protection stifles efficiency and adds unnecessary cost. It can also mean that patients die with their privacy intact, largely because doctors can't access the right information at the right time. Too little control means that personal and other private information can be compromised, costing businesses millions of dollars in fines and damaging personal and corporate reputations, with a subsequent loss of business and trust.

A new approach is required: one that embeds privacy, governance and security into the core of the data warehouse technology. The approach must allow the data to be mined for knowledge and insight while simultaneously ensuring the data is used only for permitted purposes, guaranteeing that an individual's privacy remains intact.

The Role of Big Data

Invented over 10 years ago by Google and Yahoo, Big Data has quickly evolved to become a cornerstone technology for internet companies. Today, Big Data is used by hundreds of millions of consumers daily whenever they visit Google, Netflix, Amazon, eBay or a social media website. The business case for adopting Big Data tools is well established and compelling.
- A potential 60% increase in retailers' operating margins with Big Data [2]
- $300 billion in potential annual value to US healthcare [2]
- Up to a 50% decrease in product development and assembly costs in manufacturing industries [2]
- Big Data and successful analytics credited as a key differentiator in the US 2012 presidential election campaign [3]

Unlocking Data to Drive Innovation and Competitiveness

Big Data [4] is widely credited with three core attributes that distinguish it from the traditional relational database approach:

1. Big Data is proven to scale economically from terabytes to hundreds of petabytes at a third of the cost of a traditional relational data warehouse.
2. Big Data is able to quickly aggregate a variety of information, from relational databases to unstructured documents such as Microsoft Word, PDF, text, images or telemetry, all without requiring complex schema changes.
3. Big Data gives organizations the agility to quickly innovate, develop new applications to mine the data, and rapidly respond to changing business requirements.

However, with its roots in the internet and social media sector, Big Data has long suffered from weak privacy, security and governance capabilities. Consequently, the public sector, healthcare and large enterprises have been relegated to the role of observers, experimenting with and piloting this promising new technology.

Integrating Big Data in your Privacy, Security and Governance Strategy

The evolution of computing services has been driven by the ever-increasing volume of information now available in the digital economy. Government and the private sector are focusing on harvesting this data to provide better, quicker and more valuable services to citizens and customers. Within the data-driven economy, the sheer volume and variety of data types create challenges with respect to potential security and privacy breaches. Conversely, the value held within the data cannot be realized without appropriate access.

Protecting privacy, security and governance used to be the purview of applications, which stood guard over the data. However, initiatives like Privacy by Design (explored in more detail later) are actively engaging Big Data and policy management in the uniform enforcement of privacy, security and governance across an organization.

Privacy

Privacy deals with the rights of an individual to control when, how and to what degree personal and private information is made visible for use. Managing what constitutes appropriate use of information at a fine-grained level, across a large data warehouse, can be an enormous challenge, fraught with serious consequences if not structured and administered properly from the outset. This problem is particularly evident when considering Personally Identifiable Information (PII) or other information classified as private or sensitive within documents or databases: information that may span multiple data sources, data owners or data sharing agreements.
Although Privacy Impact Assessments may document policies, procedures and risks, a technical solution must play a role in managing and enforcing policy adherence.

Privacy by Design

To address these challenges, a Privacy by Design (PbD) [5] framework has been defined and recognized as the global privacy standard in a landmark resolution by the International Conference of Data Protection and Privacy Commissioners in Jerusalem. Since then, the 7 Foundational Principles of PbD have been translated into more than 30 official languages. Privacy by Design advances the view that the future of privacy cannot be assured solely by compliance with legislation and regulatory frameworks; rather, privacy assurance must become an organization's default mode of operation.

Whereas many systems treat privacy and security as an afterthought, PHEMI Central was designed from the ground up to incorporate the 7 Foundational Principles of Privacy by Design.

1. PROACTIVE NOT REACTIVE; PREVENTATIVE NOT REMEDIAL
PHEMI Central contains a sophisticated Governance Policy Manager that data stewards use to define what data may be stored, how it may be manipulated, and who is allowed to read the data. Permitted personal information can only be collected in the PHEMI system if enabled by the Governance Policy Manager. The data steward creates all policies for user access to the system and datasets. A policy enforcement layer in the PHEMI architecture enforces strict policy rules.

2. PRIVACY AS THE DEFAULT SETTING
Data can only be stored in PHEMI Central once the data steward establishes a privacy policy. Furthermore, the system default prevents anyone other than the data owner from accessing the data. The data steward can relax these defaults to allow specific users and roles to access virtual databases within the Big Data Warehouse. The PHEMI solution was designed on the foundation that privacy policies determine what data can be accessed through the de-identification/anonymization processes defined by the Governance Policy Manager. Policies include fine-grained write-only, read-only and read/write permissions.

3. PRIVACY EMBEDDED INTO DESIGN
As PHEMI Central ingests data, it converts it into a digital asset composed of the data itself plus metadata: a wrapper that embeds privacy/governance rules, policies and semantics for the digital asset being gathered and stored. A policy enforcement engine ensures that all users have the appropriate access privileges throughout the Big Data Warehouse. Users can only view digital assets if they have been granted access in a policy established by the data steward using the Governance Policy Manager. Fine-grained policies enforce de-identification of information.

4. FULL FUNCTIONALITY: POSITIVE SUM, NOT ZERO SUM
PHEMI believes an individual's privacy rights can co-exist with an organization's legitimate interests and objectives based on an individual's consent. PHEMI's ground-up approach to privacy works symbiotically with clinician needs for complex data analysis. PHEMI's treatment of data as digital assets means that healthcare data can be enriched and analyzed for meaningful statistics and patterns, with privacy enforcement underpinning the process. No complicated overhead or tradeoff is needed for policy enforcement because users only interact with approved data that has passed through the Governance Policy Manager.

5. END-TO-END SECURITY: FULL LIFECYCLE PROTECTION
PHEMI brings lifecycle privacy to Big Data. All digital assets are immutable and only accessible through the PHEMI Governance Policy Manager. Our metadata tagging enables automatic data retention enforcement, access controls and compliance management. For example, if a digital asset has metadata indicating that the asset must be held for ten years, then the data will be automatically deleted on the tenth anniversary. Furthermore, all assets are version controlled so the data steward can review and recover any changes. Additionally, the audit and logging system in PHEMI Central uses the same digital asset approach to store and manage log files, ensuring that logs are also immutable and only accessible to specific administrative roles.

6. VISIBILITY AND TRANSPARENCY: KEEP IT OPEN
PHEMI's privacy policy implementation is transparent and accessible to data stewards. De-identification and anonymization process rules can also be easily provided to individuals and organizations. PHEMI produces detailed audit logs to verify privacy policy enforcement.

7. RESPECT FOR USER PRIVACY: KEEP IT USER-CENTRIC
PHEMI recognizes an individual's paramount right to privacy, and our technology core reflects this. With default privacy settings and a sophisticated governance and policy management approach, we ensure an individual's privacy remains intact. Yet our unique implementation also gives clinicians and researchers access to a rich dataset of healthcare information for medical innovation and research based on patient consent.

Sophisticated Access Controls that Scale

The traditional role-based access control model, consisting of users/roles/privileges, has been the mainstay of access control for decades. However, as the number of users and datasets grows, the variety of privacy policies governing the appropriate use of information can quickly become overwhelming, and the risk of a data breach increases. A Big Data warehouse needs to be able to migrate seamlessly from simple role-based access to a more scalable and versatile attribute-based access control.
A scalable access control system should be able to leverage metadata to describe the attributes of a user and the attributes of the content. These access control rules also need to cover data processing functions and MapReduce jobs. Modern access control capabilities can then enforce arbitrarily simple or elaborate access control rules, marrying user and content metadata attributes.

The Affordable Care Act and Accountable Care Organizations, for instance, stretch across the continuum of care and are quickly pushing the limits of role-based access controls. For example, cardiologists who practice in the General Hospital Electrophysiology clinic are allowed to view a person's preliminary and final ECG while they are on the hospital network. Other physicians are not allowed to see the preliminary reports, only the final interpretation. Researchers, however, are only allowed to see a de-identified/redacted version of the patient's interpreted ECG. And no one is permitted to see the ECG outside of the General Hospital network. This complexity can only be managed through attribute-based access controls. Standards such as XACML provide a flexible and dynamic access control policy framework.

Identity Management

Traditional identity management and security technologies create a protective wall around data sources and applications. Security zones, firewalls and identity and access management (IDAM) systems govern coarse-grained access to resources and services, including authenticating users to ensure (with a certain level of assurance) that the user is who they purport to be. These security and IDAM systems have focused on service-level access management, which enforces policies at the coarse-grained level of the service but not on the fine-grained content itself. There is an implied trust between the access management service and the data objects, and without some form of entitlements management solution, that trust becomes a key point of fragility within the system. In transactional systems, this works because the interaction is often a push to a system consuming the transaction, and that system has the ability to accept or reject the request/message based on its own policies.
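
To make the contrast between coarse-grained service access and fine-grained entitlements concrete, the General Hospital ECG rules described under Sophisticated Access Controls that Scale can be sketched as an attribute-based decision function. This is only an illustration under stated assumptions: the attribute names, the network and clinic labels, and the `ecg_view` function are hypothetical and are not part of PHEMI Central or of XACML.

```python
from dataclasses import dataclass
from enum import Enum


class View(Enum):
    DENY = "deny"
    DEIDENTIFIED = "de-identified final interpretation"
    FINAL_ONLY = "final interpretation only"
    FULL = "preliminary and final ECG"


@dataclass(frozen=True)
class User:
    # Hypothetical user attributes; a real system would draw these from metadata.
    role: str            # e.g. "physician", "researcher"
    specialty: str = ""
    clinic: str = ""
    network: str = ""    # network the request originates from


HOSPITAL_NETWORK = "general-hospital"   # assumed label for the hospital network
EP_CLINIC = "electrophysiology"         # assumed label for the clinic


def ecg_view(user: User) -> View:
    """Decide which view of an ECG record the user may see."""
    # No one may see the ECG outside the General Hospital network.
    if user.network != HOSPITAL_NETWORK:
        return View.DENY
    if user.role == "physician":
        # Cardiologists in the Electrophysiology clinic see everything.
        if user.specialty == "cardiology" and user.clinic == EP_CLINIC:
            return View.FULL
        # Other physicians see only the final interpretation.
        return View.FINAL_ONLY
    if user.role == "researcher":
        return View.DEIDENTIFIED
    # Anything not explicitly permitted is denied.
    return View.DENY
```

The decision depends on several user attributes at once (role, specialty, clinic, network), which is what makes the rule awkward to express as a fixed set of roles. Adding a new condition means adding an attribute check rather than multiplying roles.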
Although security and access management services are still needed, the risk factor is different from that of data/information-based systems, where data mining and the extraction of data have different privacy implications. If a transactional system is being used for querying or updating personally identifiable information (PII), the same rules apply, and the system providing the data would need to provide fine-grained access control as well. This would imply that user identity information is embedded within the transactional messages.

Traditional IDAM service-level access control creates a number of issues in an environment with sensitive data:

- It does not provide the full solution that is often required to meet regulatory and legal compliance.
- IDAM and security systems are add-ons that sit outside the core system, so they do not meet PbD requirements.
- Programmers often embed code to enable fine-grained access control, which scales poorly.
- IDAM provides front-gate authorization but does not provide entitlement management for the actual digital assets within the system.
- Often a network-based trust model is set up between systems, with all queries going through a single user within the data source system.
- Auditing for compliance is much more difficult without the ability to track access controls at the digital asset level.

PHEMI Central is a Big Data platform that enables a PbD framework and unlocks the potential for fine-grained access control and policy management for individual digital assets. Using PHEMI Central's governance management capability, a data steward can impose policies on digital assets that are then enforced through the policy enforcement engine. Access control can be provided down to the element level within the metadata or within derived data extracted from the digital assets themselves. The ability to anonymize or mask some data elements and expose only those that the user or system is entitled to provides a very high degree of assurance that the right information is being delivered to the right person, at the right time, for the right reason. With PHEMI Central, the power to manage risk is given to the data steward directly and is not passed on to programmers or trusted system interfaces that are susceptible to privacy breaches. A much richer set of data is available, along with a much more powerful policy management system.

De-Identification and Redaction On-Demand

An important part of any privacy strategy is the ability to de-identify personal information. This includes the ability to prevent the sharing of personally identifiable information by masking the information, redacting an image, or using more sophisticated data dependency algorithms to reduce the risk of re-identification. Based on specific policy rules, the de-identification process should be enforceable on demand when the data is read, reducing data sprawl and the risk of data consistency errors.
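
As a minimal sketch of de-identification applied when the data is read, the following keeps a single stored record and derives a masked view on demand, so no second de-identified copy has to be kept in sync. The field names, the per-field actions, the key and the `read_record` function are all assumptions made for illustration; they do not describe PHEMI Central's implementation, and simple masking of this kind does not by itself rule out re-identification.

```python
import hashlib
import hmac

# Hypothetical per-field policy for the de-identified view.
# Fields not listed here are redacted by default.
POLICY = {
    "name": "redact",
    "health_number": "pseudonymize",
    "birth_date": "generalize_year",
    "ecg_interpretation": "keep",
}

SECRET_KEY = b"example-key-held-by-the-data-steward"  # placeholder value only


def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash so records remain linkable."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def read_record(record: dict, deidentify: bool) -> dict:
    """Return a copy of the stored record, masked at read time if required."""
    if not deidentify:
        return dict(record)
    view = {}
    for field, value in record.items():
        action = POLICY.get(field, "redact")
        if action == "keep":
            view[field] = value
        elif action == "pseudonymize":
            view[field] = pseudonymize(value)
        elif action == "generalize_year":
            view[field] = value[:4]  # keep only the year of an ISO date string
        else:
            view[field] = "[REDACTED]"
    return view
```

Because the stored record is never modified, the same asset can serve an entitled clinician in full and a researcher in masked form, with the choice made by policy at read time.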

Security

Whereas privacy rules describe who is allowed to see what, when and how, security describes the technical methods by which privacy and access are safeguarded. Relational database installations, like first-generation Big Data solutions, rely on a trusted relationship between the data repository and the application. As more and more personally identifiable information is aggregated, the consequences of a data breach increase, driving the need for a more robust security strategy. A Big Data warehouse must embrace multiple layers of security.

Security at Rest

A Big Data warehouse should be able to encrypt data at rest within the data repository. For performance reasons, it is usually unnecessary to encrypt all data. Instead, encryption of only personally identifiable information is advised.

Security in Flight

All communications between data sources, data consumers and the Big Data warehouse should be encrypted using either the Secure Sockets Layer (SSL) or Transport Layer Security (TLS) protocol with 256-bit encryption.

Governance

According to the American Health Information Management Association (AHIMA), only 11% of healthcare organizations consider their information governance program mature. [6] An effective IT governance strategy defines and enforces user roles with appropriate checks and balances to ensure appropriate behavior throughout the data lifecycle (creation, storage, use, archiving and retiring of information). An effective IT governance strategy should include the following key capabilities.

Audit Logs

A Big Data Warehouse must maintain complete audit logs of system and user operations, including account creation/modification/deletion, policy creation/modification/deletion, dataset creation/modification/deletion, etc. These log files must be completely tamper-proof for all users. However, approved users should be able to filter log files and export the information for downstream analysis.

Data Immutability

Unlike typical relational database systems that only protect certain columns (keys), an effective Big Data Warehouse should store all data in a write-once (append-only) data system in which data is never modified and can only be deleted at a pre-specified end-of-life date. This approach provides assurance of data integrity for audit and compliance requirements.

Version Control

A typical enterprise data warehouse is unable to recover data that has been accidentally modified or deleted, making corrections a time-consuming process of searching through system backups to find old data to restore.
A Big Data Warehouse should be able to keep a simple history of old revisions and allow administrators to trace changes over time, including the ability to audit who made the change and when. This approach provides a complete record of data history for audit and compliance requirements.

Rollback

A system administrator should be able to trace changes to data and roll back changes as appropriate, while honoring Data Immutability, Audit Log and Version Control requirements.

Data Verification

All data collected and derived in the Big Data Warehouse should include a checksum held in metadata. This approach allows the system to quickly detect if data has been corrupted or tampered with.

Retention Policy

A Big Data Warehouse should be able to prevent users from deleting data during a configured retention period and should also automatically de-identify, delete or otherwise process information when the retention period expires.

Data Validation

Unlike typical relational database systems, where external Extract, Transform, Load (ETL) tools cleanse and validate data, an effective Big Data governance strategy should perform the cleansing and validation process within the system and log all transformation operations to properly track data provenance. Where information is incomplete or out of valid range, an alert should notify the data steward.

Data Veracity

Data Verification is an anti-corruption feature of PHEMI Central that checks for changes in your data as it is imported, since data can be altered during transmission. Our alert system notifies you when your data is corrupted, helping assure the integrity of your entire dataset. A traditional data warehouse has no such alert system and does not identify errors in your dataset as they arise.

Separation of Roles

An effective governance strategy must maintain a clear separation of roles so that no single user can create accounts, read/write data and modify log files.

About PHEMI

About PHEMI Central

PHEMI Central is a Big Data Warehouse with capabilities far beyond traditional approaches.
The system uses proven Big Data technology to unlock information trapped in non-relational and unstructured data, scaling to petabytes at a fraction of the cost of traditional solutions.

Moreover, PHEMI Central incorporates innovative new privacy, security and governance capabilities to handle the increasing complexity of modern data warehouse implementations. PHEMI Central supports the collection, curation and analysis of rich datasets, enabling organizations to roll out innovative services ranging from population health management to business intelligence to Open Data information sharing. PHEMI Central lets organizations turn their focus from the task of collecting and storing data to the more strategic role of curating data, deriving insights and offering services.

PHEMI Central is available as software or as an appliance for deployment in the enterprise data center. Specifically designed to provide the agility necessary to meet the increasing volume, variety and velocity of today's enterprise data, PHEMI Central incorporates an innovative Privacy by Design architecture to bring the power of Big Data to healthcare, the public sector and large enterprises with complex data requirements.

- Break down the information silos and automatically collect and report high-quality, real-time and comprehensive data.
- Mine the Big Data warehouse by rapidly adopting new and innovative applications from best-of-breed vendors or developing custom solutions in-house.
- Offer applications like Population Health, Business Intelligence, Open Data Information Sharing, Readmission Risk Analysis, Personalized Medicine, Post-Market Surveillance and Research Registries.
- Improve productivity and data quality by reducing data entry tasks, automatically converting unstructured documents to structured data, and auto-populating known information where possible.
- Protect information privacy, security and governance without compromising legitimate right to use.
- Scale to petabytes and beyond at a fraction of the cost of traditional enterprise data warehouse alternatives.

About PHEMI

PHEMI is a process automation and Big Data platform company that unlocks patient data to improve clinic productivity, patient outcomes and medical research. The PHEMI team architected PHEMI Clinical and PHEMI Central from the ground up specifically for healthcare, life sciences and the public sector, fully incorporating the 7 Foundational Principles of Privacy by Design as applied to Big Data. PHEMI is based in Vancouver, BC, Canada. For more information, please visit www.phemi.com.

Dictionary of Terms

Privacy: Controls who is authorized to access certain information.
Security: Provides the technical methods to safeguard private information from unauthorized persons.
Governance: Covers the stewardship and enforcement of roles, checks and balances to support the business needs.
Digital Asset: The data itself plus metadata, a wrapper that embeds privacy/governance rules, policies and semantics.