BEIJING BOSTON BRUSSELS CHICAGO DALLAS FRANKFURT GENEVA HONG KONG HOUSTON LONDON LOS ANGELES NEW YORK PALO ALTO SAN FRANCISCO SHANGHAI SINGAPORE SYDNEY TOKYO WASHINGTON, D.C. Technology Assisted Review Goes Left: Predictive Analytics In Information Governance Jeffrey C. Sharer November 11, 2014 This presentation has been prepared by Sidley Austin LLP for informational and training purposes only and does not constitute legal advice. This information is not intended to create, and receipt of it does not constitute, a lawyer-client relationship. Readers should not act upon this without seeking advice from professional advisers.
Agenda
  I. Why Are We Here?
  II. What Is Predictive Analytics?
  III. Modernizing Information Governance
    A. Modernizing Records Retention
    B. Information Classification
    C. Defensible Deletion
  IV. Moving Forward
Why Are We Here?
  Gartner predicts that, by 2017, 33 percent of Fortune 100 organizations will experience an information crisis, due to their inability to effectively value, govern and trust their enterprise information.
  Source: Press Release, Gartner, Inc., Gartner Says One-Third of Fortune 100 Organizations Will Face an Information Crisis by 2017 (Feb. 27, 2014)
Consequences of Increased Data Volumes
  Increased IT infrastructure costs: 78%
  Cannot find information when needed: 73%
  Increased regulatory compliance risk: 67%
  Increased e-discovery costs: 59%
  Need to recreate information previously created
  Increased risk of data leakage
  Increased complexity of protecting intellectual property or trade secrets
  41% 49% 47%
  No serious consequences: 2%
  Source: Council for Information Auto-Classification, The Information Explosion: How Organizations are Dealing With It (Oct 2011) (available at http://www.infoautoclassification.org/survey.php).
Litigation and Regulatory Risks
  Risk of sanctions for failures to identify, preserve, or collect relevant data
  Risk of missed deadlines and sanctions resulting from inability to respond timely and effectively to discovery requests
  Risk that cost-prohibitive e-discovery influences settlement dynamic
  Risk of cross-border conflicts between U.S. discovery obligations and foreign privacy laws
Data Privacy and Security Risks
  Risks of violating data privacy laws, for example:
    Improper (unsecure) handling of protected information, such as social security numbers, financial accounts, etc.
    Data leakage to locations outside company's environment (e.g., personal devices, Dropbox, Google)
    Retention of information past required destruction (e.g., upon expiration of purpose)
    Inadvertently transferring protected data out of jurisdiction
  Risk of losing protected or sensitive information in data breach, potentially resulting in notification obligations, regulatory or civil exposure, damage to reputation, and other harm to company
  Risk of audit failures (internal or external) exposed to customers, clients, or regulators, or publicly through discovery in litigation
Data Security Challenges
  Safeguarding data and protecting information security systems is increasingly complicated
    International consumer base accessing sites online
    Cloud computing / third-party service providers
    The human component
    Persistent, coordinated cyberattacks
  Corporations face significant risks:
    Loss of intellectual property and operational disruption
    Customer litigation, regulatory enforcement, or shareholder lawsuits
Compliance Risks
  Accelerating global regulations: FCPA, Dodd-Frank, GLBA, SOX, Basel III, AML, ABC, PCI, etc.
  Compliance violations are more difficult to detect where offending data is obscured by millions of other grains of sand
  Noise created by debris hinders performance of predictive analytics and other tools and processes that otherwise might detect violations
  Potential for less sensitive data to land in insecure storage
Risks In Dark Data
  Risk of legal liabilities, compliance violations, smoking guns, and other unpleasant surprises lurking in dark data
  Clogged systems can obscure legal liabilities and compliance issues and make identification difficult even upon search and inquiry
  Unknown liabilities may be identified, or known liabilities may increase in severity, through due diligence in connection with business transactions or other sensitive negotiations
  Expired data produced in litigation or regulatory proceedings can trigger additional lawsuits or investigations
The Internet of Things
  Digitization of the Physical World:
    Physical objects are equipped with unique identifiers that can be digitally tracked
    RFID tags used to identify and inventory objects (and by proxy, people)
    Smartphone control: sensors embedded in objects send data to smartphone, informing user's actions
    Currently focused on home systems and appliances (home security, smart appliances, thermostats)
And The Challenges Are Not Going Away
  50-fold growth in the digital universe from the beginning of 2010 through the end of 2020
  40 zettabytes in 2020 will be the equivalent of:
    57 times the amount of all of the grains of sand on all the beaches on earth
    Saved on Blu-ray discs, without sleeves or cases, would weigh as much as 424 Nimitz-class aircraft carriers
    5,247 GB of data for every person in the world
  And the Internet of Things
  [Chart: size of the digital universe in exabytes; vertical axis from 0 to 40,000; 2014 marked]
  Source: IDC, The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East (Dec. 2012)
What Is Predictive Analytics?
Predictive Analytics Is Everywhere
The Electronic Discovery Reference Model
Predictive Analytics In IG
  Leverages technology to backstop humans
  Even low-tech auto-classification can be very effective at identifying certain sensitive information types
    Policy rule engines
    Word indexes and metadata
    Complex and challenging to maintain
    Difficult to scale
  Machine learning tools
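As a rough illustration of the "low-tech" end of that spectrum, a policy rule engine can be as simple as a set of named patterns checked against each document. The sketch below is an assumption for illustration only: the rule names, the two regular expressions, and the function `classify_by_rules` are hypothetical and do not come from any particular product. Real rule sets are much larger, which is one reason they become hard to maintain and scale.

```python
import re

# Hypothetical rule set: names and patterns are illustrative assumptions,
# not a complete or validated definition of either data type.
RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "payment_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify_by_rules(text):
    """Return the sorted names of every rule whose pattern appears in text."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(text))


print(classify_by_rules("Employee SSN 123-45-6789 on file"))
print(classify_by_rules("Lunch at noon"))
```

Pattern rules of this kind can flag well-structured identifiers, but they say nothing about what a document is about, which is the gap machine learning tools are meant to fill.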
Classification Through Machine Learning
  Applying lessons learned in e-discovery, earlier in information lifecycle
  Growing body of research showing computer-assisted classification to be as or more effective than human classification
  Question is not whether any approach is perfect; question is whether it's more effective than alternatives
  Solid acceptance by courts and regulators in litigation and enforcement contexts
  Effective for early warning systems
  Not an easy button: requires significant up-front work and ongoing maintenance, but has potential to be more effective and efficient over the long run
How Does Auto-Classification Work?
  Training: Humans teach machine classification from sample documents
  Classification: Machine applies learning to other documents and classifies them
  Validation: Humans QC machine output and provide further training if needed
  Application: Machine classification becomes one aspect of overall IG process
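The training, classification, and validation steps can be sketched in a few lines. This is a minimal sketch only: it assumes a small naive Bayes word-count model written with the Python standard library, which is a stand-in chosen for brevity and not the method of any specific review or governance tool. The sample documents, the labels "contract" and "routine", and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Training: learn per-label word counts from human-labeled samples."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(model, text):
    """Classification: pick the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def validate(model, reviewed_docs):
    """Validation: share of human-reviewed documents the machine labels the same way."""
    hits = sum(1 for text, label in reviewed_docs if classify(model, text) == label)
    return hits / len(reviewed_docs)


# Hypothetical training sample labeled by human reviewers.
training_set = [
    ("signed master services agreement with vendor", "contract"),
    ("amendment to lease agreement executed", "contract"),
    ("team lunch on friday at noon", "routine"),
    ("reminder parking garage closed friday", "routine"),
]
# Hypothetical QC sample: machine output is compared with human judgment.
qc_sample = [
    ("vendor agreement signed today", "contract"),
    ("lunch reminder for friday", "routine"),
]

model = train(training_set)
agreement = validate(model, qc_sample)
print(f"Agreement with human QC sample: {agreement:.0%}")
```

If agreement on the QC sample falls below whatever threshold the organization sets, the disputed documents would be added to the training set and the model retrained before its labels feed the broader IG process.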
Modernizing Information Governance
Modernizing Information Governance
  Looking forward, looking back
  Leveraging technology to break down traditional information silos
  Key issues to address:
    Types of records
    Variety of storage media
    Regulatory sprawl
    Data protection laws
    International considerations
    Accessibility to end users
    Legacy data stores
Looking Forward
  Active and newly created records
  Integration of IG, business, legal, and compliance
  Potential compliance functions:
    Enhancing security of sensitive data
      Segregation
      Access controls by user, content type, or other criteria specific to individual records
      Anonymization, redaction, and expungement
    Surveillance and early warning systems
    Automated legal holds, disposition upon expiration, and other handling
Automating Information Governance
  Tired but true: People, process, and technology.
  Identifying and capturing records
    Classification through content templates
    Classification through user-selected folders
    Hybrid classification: User-based but analytics provides defaults, performs QC, etc.
    Machine-based: Analytics-based using rules and algorithms to identify and classify records
  Data security
    Machine-powered security classification, redaction and/or expungement
Further Applications
  Managing records
    Assign retention periods based on machine classification
    Ability to manage-in-place, and know what's being managed
  Accessing records
    Search on steroids using text- and concept-based search
  Automated disposition according to retention schedule
    Used properly, powerful means of driving compliance
    Process requires necessary checks and approvals
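One way to picture retention periods assigned from machine classification, with disposition gated by checks and approvals, is the sketch below. Everything in it is an assumption for illustration: the `Record` fields, the two categories, and the retention periods in `RETENTION_YEARS` are hypothetical, and real periods depend on the legal and regulatory requirements that apply to each record type. The function only proposes candidates; it deletes nothing.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule, in years, keyed by machine-assigned category.
RETENTION_YEARS = {"contract": 7, "routine": 1}


@dataclass
class Record:
    record_id: str
    category: str          # label assigned by machine classification
    created: date
    legal_hold: bool = False


def expiry_date(record):
    """Date on which the record's retention period ends."""
    target_year = record.created.year + RETENTION_YEARS[record.category]
    try:
        return record.created.replace(year=target_year)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return record.created.replace(year=target_year, day=28)


def disposition_candidates(records, today):
    """IDs of expired records not under legal hold, queued for human approval."""
    return [
        r.record_id
        for r in records
        if not r.legal_hold and expiry_date(r) <= today
    ]


records = [
    Record("a", "routine", date(2012, 1, 15)),
    Record("b", "contract", date(2012, 1, 15)),
    Record("c", "routine", date(2012, 1, 15), legal_hold=True),
]
print(disposition_candidates(records, date(2014, 11, 11)))
```

Keeping the legal-hold check and the approval queue separate from the expiry calculation mirrors the point that automated disposition still requires necessary checks and approvals.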
Looking Back
  Classification of existing records
    Identify and secure sensitive information already dispersed throughout organization
  Remediation of expired data, a/k/a defensible deletion
    Reduce unnecessary retention
    Reduce storage and litigation costs
    Reduce legal and compliance risks
    Create business value
Benefits of Defensible Data Reduction
  Reduce legal and compliance risks
    Data privacy and security
    Elimination of dark data mitigates surprise smoking guns
    Organization knows what it has and what its risks are
  Save money
    Hard and soft costs associated with data storage and maintenance
    E-discovery and regulatory response
  Create business value
    Operating efficiencies, e.g., IT staffing and infrastructure
    Employees better able to find information when needed
    Business better able to extract value from data that remains
Moving Forward
It Takes A Village
  Internal: several stakeholders, including:
    Records Management
    Information Technology
    In-House Counsel
    Data Privacy and Security
    Business Units
  External resources may include one or more of:
    Outside counsel, to advise on legal and regulatory requirements and defense of process
    Consulting firm, to assist with planning and execution of workflows, sampling protocols, and validation of results
    Technology provider, for technology needed to index, collect, and analyze data across relevant sources
Action Plan: Information Governance
  Next 7 days
    Consider maturity of information governance within your enterprise: think strategically, focus on value propositions
  Next 90 days
    Identify starting point
    Assemble steering committee
    Set project milestones and goals
  Next 18 months
    Measure progress against milestones
    Measure results against goals
    Consider additional opportunities
    Stay the course
Questions?
  Jeffrey Sharer
  Sidley Austin LLP
  One South Dearborn Street
  Chicago, Illinois 60603
  (312) 853-7028
  jsharer@sidley.com