Auto Classification and the Holy Grail for Records Managers
- Martin Hudson
Transcription
1 Auto Classification and the Holy Grail for Records Managers. Doug Magnuson, Information Lifecycle Governance Solution Leader for North America
2 Doug Magnuson, Information Lifecycle Governance Solutions. Focus since 1997. IBM: Information Lifecycle Governance Solutions Executive. Huron Consulting Group: Managing Director; strategic consulting to the CLO of Fortune 500 companies in the areas of information governance, eDiscovery, risk, and compliance. Carefree Technologies: President; developed and implemented software applications for Enterprise Content Management solutions. Education: BS, Industrial Engineering. (503)
Doug has nearly 30 years of experience in business process design, change management, and systems improvement. For the last 12 years he has focused on Enterprise Content, Document, Email, Records Management, and eDiscovery systems. Doug has assisted with the development of business plans and software specifications, and has provided guidance through software selection processes. He has provided oversight for implementation, process improvement, and system conversion efforts. Doug is a frequent speaker on topics related to Enterprise Content Management.
Representative examples of Doug's experience:
Developed information management strategies.
Architected solutions, including the development of frameworks and reference models describing system interaction.
Prepared business plans, process improvements, software specifications, and system requirements for content, document, and records management systems.
Provided guidance through software and vendor selection processes, including development of detailed RFPs, business use cases, and demonstration scripts.
Conducted implementation and conversion oversight, ensuring all critical business needs are addressed.
Advised major information management vendors in the design, development, and improvement of their electronic content management software.
3 Auto Classification and the Holy Grail for Records Managers. Session Plan: Using the IBM Watson Jeopardy! Challenge to help tell the story of the advancements the industry (not just IBM) is making in Natural Language Processing. Learning Goals: Understand the basic concepts of Natural Language Processing and Content Analytics and how they support the needs of records managers. Apply these unprecedented capabilities for records managers: first, to understand authoritatively and quickly what is currently being retained by their organization; and second, to have confidence that temporary and transient items can be identified and disposed of. Be a leader: by attending this session you will learn about the business case, the operational benefits, and the compelling need for using content analytics in your organization.
4 Agenda Obstacles to Managing the Information that Matters Best Practice Readiness Leveraging Content Analytics for Records and Info Management Summary 4
5 The Information Flood Will Continue to Challenge Governance Processes. 90% of the information in the world was created in the last 2 years. 44x: the additional amount of information that will exist in the universe by
6 Very Simple Savings Proposition: Dispose of Unnecessary Data Enterprise Information 6
7 Transform Traditional Practices with New Outcomes (traditional emphasis, then high-value shift).
Records Retention to Defensible Disposal: Retention for legal and regulatory duties and business value is necessary but not sufficient in the economic climate. Disposal of unnecessary data reduces legal and IT costs and aligns information costs with information value, consistent with IT and business objectives to contain costs.
Policy Publication to Instrumentation: Instrumenting retention, holds, and disposal policy execution on application data and unstructured data ensures compliance and enables efficient, consistent disposal of unnecessary information to eliminate run rate costs immediately and sustainably.
Risk Monitoring to Cost Take Out: Reframing our information governance objectives to not only reduce risk but also improve information economics can contribute significant savings to our IT cost reduction objectives by enabling systematic disposal of unnecessary data and the ability to recover assets rapidly.
8 The Economic Benefits of Defensible Disposal Are Compelling. We could spend $35m less next year and lower our run rate. We could lower run rate $3m now and spend $24m less over 3 years. We could free up $150m to drive revenue and profit.
9 Where Content Analytics Can Help. On April 5, 2010, twenty-nine miners were killed in a terrible accident. This same mine was fined a total of $382,000 for "serious" unrepentant violations. In the previous month, the authorities cited the mine for 57 safety infractions. The mine received two citations the day before the explosion, and in the last five years it has been cited for 1,342 safety violations. Could improved records and information visibility have prevented this accident?
10 Information Growth Outpaces Human Capacity. There are not enough humans to deal with the problem. Even if there were enough people to deal with the problem: humans make mistakes and misinterpret meaning; manual action is costly; humans are inconsistent and choose to opt out. According to a GAO report [1], agency policies on preserving records are not followed consistently. In a recent Cohasset study, manual classification costs 17 cents per document; for an organization with 25M documents this would represent a cost of $3M. Typically less than 25% of users actually declare records. In a 2005 Department of the Navy study, only 12.5% of documents were classified with exact accuracy at least 75% of the time.
[1] GAO Report: Federal Records: Agencies Face Challenges in Managing E-Mail, Apr
11 The day of reckoning is here. Keeping everything forever drives unsustainable costs; routine disposal of information drives significant savings. Traditional approaches do not work: more human beings is not the answer. Content decommissioning and routine disposal have an immediate and ongoing impact on the IT budget. Increased spending without routine disposal eventually consumes the IT budget.
12 Principle #1: Data Growth Outpaces Storage Budgets and Business Processes.
Run rate costs double quickly if volume grows >30%, which consumes the CIO budget. [Chart: Storage, direct procurement costs in millions.]
Information volume overwhelms information governance processes and undermines their effectiveness. Governance processes have not matured to reflect volume, specifically how to: define and execute legal holds and data collection; apply retention schedules to electronic information; align storage and manage information based on specific legal obligations and business value; provision, decommission, and dispose of data.
High Risks & Mitigation Burden: This leads to excess data and cost as well as operational challenges that in turn contribute to risk: difficulty disposing of unnecessary data; complexity in applying legal holds; inefficiencies in data management and governance. 16 governance processes are impacted by high data volume, such as placing holds, collecting evidence, and decommissioning systems, along with their inherent risks, represented in A-O.
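To make the "costs double quickly" claim concrete, here is a minimal sketch of the doubling-time arithmetic. It assumes, as a simplification not stated on the slide, that cost scales in proportion to volume and that growth compounds at a constant annual rate; the function name is illustrative only.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate.

    Assumption: cost tracks volume one-to-one (constant unit cost).
    """
    if annual_growth <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth)

if __name__ == "__main__":
    for rate in (0.10, 0.30, 0.50):
        years = doubling_time_years(rate)
        print(f"{rate:.0%} annual growth: doubles in about {years:.1f} years")
```

Under these assumptions, 30% annual growth doubles volume in roughly two and a half years, which is shorter than many multi-year budget cycles.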
13 Principle #2: Increased Spending Consumes IT Budget. Most IT budgets are >80% committed to existing projects and are flat or declining. Budgets can't contain rising "keep everything" costs and still provide for strategic IT investments. Eventually "keep everything" models consume the remaining IT budget dollars (failure scenario). [Chart: share of IT budget, 0% to 100%, already committed to existing projects vs. available for strategic investments.]
14 Principle #3: Routine Disposal Has Positive Impact on IT Budget. Impact of decommissioning and routine disposal: immediate reduction in supporting storage and infrastructure needs and costs from content decommissioning; ongoing disposal ensures controlled information growth and preservation of the IT budget; enables strategic IT investments to be made as needed. Stakeholders must agree on defensible processes for decommissioning and disposal of information or face the failure scenario. [Chart: share of IT budget, 0% to 100%, already committed to existing projects vs. available for strategic investments.]
15 The Form of Current Practices Intensifies the Challenge. Disconnected silos are the problem and the source of high cost and risk.
DUTY (Matter, Hold): Describes holds by custodians involved; communicates holds to custodians rather than IT. Generally focused on email and files for its holds efforts. Relies on IT to keep everything, unconcerned about IT cost, but struggles with the cost of eDiscovery on so much data.
VALUE (Department): The retention schedule doesn't reflect their need for information, so they ignore it, but they may revolt if it is automated. Fighting to drive profit up and back office costs down. Angry about chargeback costs; want better system performance and more from their data.
DUTY (Laws & Regs, Retention Schedule): 100-page record schedule on the intranet, organized by class; relies upon volunteer effort to apply the schedule to electronic information. May emphasize retention and regulatory compliance for 5-10% of enterprise information rather than enabling systematic deletion of unnecessary data.
ASSET (Systems, Information): Has petabytes of data but no idea what is needed or why, so has to assume it is all valuable. Organizes data by system and server names. Paying the full cost of compliance while struggling to reconcile doubling data with a shrinking budget.
[Diagram figures as transcribed: Matter ,000; Hold 30, ,000; Department 8,000; 5,000 12,000; Systems 2,000-8,000; Information 3PBs 100PBs. Billion choices for IT to triangulate laws, lawsuits, business value with data.]
16 Stakeholder Alignment Yields Best Practice Benefits. [Diagram: DUTY (Matter, Hold), VALUE (Department), DUTY (Laws & Regs, Retention Schedule), ASSET (Systems, Information).]
LEGAL, Modernize eDiscovery Process: precise, reliable legal holds; assess evidence in place, collect less; lower legal risk and cost.
BUSINESS, State Information Value: guidance on information utility; participate in volume reduction; align around value.
IT, Optimize Information Volume: dispose of and retire unnecessary data; optimize storage based on value; lower information cost.
RECORDS, Modernize Retention Process: address electronic information; executable schedules can be automated; lower legal risk and cost.
17 Results: Lowers Operational Cost and Risk. Curbs storage growth and lowers run rate permanently, with program leadership, process improvement, and technology from IBM. [Charts: storage direct procurement costs, showing run rate reduction and growth avoidance; estimated risk or mitigation burden reduction.]
Information Lifecycle Governance Program: executive charter for an enterprise initiative; processes, capabilities, and accountability to achieve cost and risk reduction benefits through process improvements, expertise, and technology.
Value-Based Archiving & Defensible Disposal: archive to shrink storage and align cost to value; dispose of rather than store unnecessary data.
Extend and automate retention management: include electronic data that has business value in addition to records for regulatory requirements; automate retention schedules across all information to enable reliable, systematic disposal.
Automate the legal holds and eDiscovery process: structure and automate the legal holds process to lower risk, increase precision, and enable disposal; analyze in place to reduce unnecessary collection, processing, and review.
18 Agenda Obstacles to Managing the Information that Matters Best Practice Readiness Leveraging Content Analytics for Records and Info Management Summary 18
19 Information Has a Lifespan Requiring Disposition. 95% expires: almost all information has a retention policy, and very little should be kept forever. 90% born digital: almost all is born digital, and the rest should become digital. [Chart: frequency of access and use over time.]
20 Begin with a shared system: Policy and Process Integration Across Information Stakeholders Enables Disposal, Lowers Cost and Risk. Strategy and execution drive business outcomes with structure, defined processes, metrics, capacity, and accountability.
STRATEGY. Governance program driving savings and risk metrics: charter, directive, and accountability for the enterprise program; savings achievement cadence and reporting. Program office to coordinate stakeholders and drive benefit achievement: ensures cross-silo engagement and progress toward maturity targets and financial objectives; change management.
EXECUTION. Technology provides capacity to improve and integrate processes and to consistently and defensibly dispose and decommission: automates processes, ensures transparency, provides capacity; accelerated deployment to drive faster savings. Reclamation removes excess storage and infrastructure: savings-prioritized reclamation and recovery of infrastructure to drive P&L benefit.
>$300M enterprise value created over 3 years with lower legal and IT costs and reduced risk.
21 Process Capabilities & Requirements.
PROCESS TRANSPARENCY, Unified Governance: transparency across stakeholder processes; common governance data model and enterprise map; linkage of duties and value to information assets and business processes; governance analytics.
CREATE, USE, Optimal Accessibility: communicate value and duration; tap governance liaisons; access valuable information more easily; analytics on volume and cost of information.
HOLD, DISCOVER, Rigorous Discovery: robust, affirmative legal holds for people, records, and data; preserve-in-place automation where disposition occurs; efficient data analysis and collection; legal cost and risk analytics.
RETAIN, ARCHIVE, Value-Based: taxonomy and regulatory requirements; business value inventory; reliable, executable retention schedules for records and information of value; archive during period of value only; information cost and risk analytics.
STORE, SECURE, Efficient Storage: store and optimize by value; meet SLAs for structured and unstructured information access; ILG execution capability and enablement (holds, retention, disposal, collection) for data; data hygiene and governance.
DISPOSE, Defensible Disposal: catalog of information value and duty by asset; legacy data clean-up and application retirement; procedures and capabilities for disposal by source; risk and cost dashboard for the information portfolio.
22 Best Practices for the Information Lifecycle.
1. Optimize business activities to: automatically record and preserve evidence of transactions, events, and processes; reduce the high costs of eDiscovery; enforce records retention policies; reduce infrastructure costs; perform periodic assessments to determine what information should be kept.
2. Archive, collect, and classify both data and content to decommission systems while preserving access to the information.
3. Declare and control official business records.
4. Respond to eDiscovery requests more efficiently.
5. Routinely dispose of information defensibly.
6. Audit and govern the entire process while optimizing the underlying storage systems and infrastructure based on the value of information and associated legal duty.
23 GARP
Accountability: executive owner, delegated program responsibilities, documented program policies and procedures.
Integrity: guarantee of authenticity and reliability.
Protection: protect records and information that are private, confidential, privileged, secret, or essential to business continuity.
Compliance: compliance with applicable laws, regulations, and policies.
Availability: ensure timely, efficient, and accurate retrieval of needed information.
Retention: maintain records as dictated by legal, regulatory, fiscal, operational, and historical requirements.
Disposition: secure and appropriate disposition for records that are no longer required.
Transparency: documented recordkeeping program.
Source: ARMA International GARP Maturity Model
24 GARP includes an Information Governance Maturity Model to help organizations: evaluate recordkeeping programs and practices; identify gaps between current practices and the desirable level of maturity for each principle; assess the risk(s) to the organization, based on the biggest gaps. Source: ARMA International GARP Maturity Model
25 Agenda Obstacles to Managing the Information that Matters Best Practice Readiness Leveraging Content Analytics for Records and Info Management Summary 25
26 Automation Best Practices for RIM: Technologies for Identifying & Managing Information.
LEVEL 2 (In Development), User-Driven: Users make collection and records categorization decisions by reviewing each content item.
LEVEL 3 (Essential), Automated Collection & Declaration: Administrative users build automated collection and records declaration policies that use rule-based metadata.
LEVEL 4 (Proactive), Advanced Classification: Administrative users build automated collection and records declaration policies that use rule-based metadata and advanced contextual classification.
LEVEL 5 (Transformational), Content Analytics: Administrative users build automated collection and records declaration policies that use rule-based metadata, advanced contextual classification, and advanced content analytics.
27 Level 2 (In Development): User-driven Collection & Declaration. Users decide and control how content is declared. Highly subjective and inaccurate. Assumes all users understand records policies.
NARA user participation: a National Archives and Records Administration Electronic Records Management initiative focused on user-driven records declaration. 6+ month study. Significant user drop-off after the training period. End users frequently refuse outright to categorize content. Silos full of existing content abound, resulting in large backlogs in addition to new content. Manual declaration with an emphasis on user training is an outdated practice.
28 Level 3 (Essential): Automated Collection & Declaration. Administrative users build automated collection and records declaration policies that use rule-based metadata. Assess, monitor, identify, and collect information from all locations to facilitate RIM activities: legal preservation holds; records identification and declaration; information archiving. Rule-based policies examine sources across enterprise silos, identify relevant information, and collect it into a consolidated managed repository for RIM policy enforcement.
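As an illustration of what a rule-based metadata policy could look like, here is a minimal sketch. The rule names, metadata fields, share path, and record categories are all hypothetical; the deck does not describe a rule syntax, so this is one plausible shape rather than any product's actual policy format.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over a document's metadata
    record_class: str                # hypothetical record category to declare

# Hypothetical rules: the first matching rule wins.
RULES = [
    Rule(
        "invoices",
        lambda m: m.get("file_type") == "pdf"
        and "invoice" in m.get("file_name", "").lower(),
        "FIN-Invoices",
    ),
    Rule(
        "hr-share",
        lambda m: m.get("location", "").startswith("//hr-share/"),
        "HR-Personnel",
    ),
]

def declare(metadata: dict) -> Optional[str]:
    """Return the record class of the first matching rule, or None."""
    for rule in RULES:
        if rule.matches(metadata):
            return rule.record_class
    return None

if __name__ == "__main__":
    print(declare({"file_type": "pdf", "file_name": "Invoice_0042.pdf"}))
    print(declare({"file_type": "txt", "file_name": "notes.txt"}))
```

Documents that match no rule come back as `None`, which is where the higher automation levels (contextual classification, content analytics) would take over.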
29 Level 4 (Proactive): Advanced Classification. Administrative users build automated collection and records declaration policies that use rule-based metadata and advanced contextual classification. Faster, more accurate collection automation: increases collection accuracy with intelligent policy decisions based on both metadata and content context, without burdening users. Flexible automation: rapidly trains via a learn-by-example approach, with flexible automation levels to accelerate adoption and acceptance. Incorporates user feedback in real time to improve understanding. Auditable logic documents classification decisions for improved defensibility.
30 Decision Plans Layer Multiple Methods for Records Classification. Decision plans combine approaches to classification. Context-based classification delivers high accuracy; rules-based classification addresses hard-and-fast requirements. Combining methods delivers the best results. [Chart: consistency, accuracy, and consistent participation and enforcement (low to high) against cost savings and productivity (low to high). Labels: Ask, Inspect, Imply; Manual Classification, Rules Based Classification, Context Based Classification, Multiple Methods.]
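The layering idea can be sketched as a small decision plan that tries a hard-and-fast rule first, then a context-based classifier, and falls back to manual review when confidence is low. Everything here is an assumption for illustration: the categories, training examples, the `legal-hold` folder rule, the 0.5 threshold, and the use of bag-of-words cosine similarity as a stand-in for the context-based classifier.

```python
import math
from collections import Counter
from typing import Optional, Tuple

# Hypothetical learn-by-example training texts per category.
EXAMPLES = {
    "Contract": ["this agreement is entered into by the parties",
                 "the parties agree to the terms"],
    "Invoice": ["amount due upon receipt of invoice",
                "invoice total and payment terms"],
}

def tokens(text: str) -> Counter:
    return Counter(text.lower().split())

# One word-count profile per category, built from its examples.
PROFILES = {cat: sum((tokens(t) for t in texts), Counter())
            for cat, texts in EXAMPLES.items()}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rule_based(metadata: dict) -> Optional[str]:
    # Hard-and-fast rule (hypothetical): the legal-hold folder is always "Legal".
    return "Legal" if metadata.get("folder") == "legal-hold" else None

def context_based(text: str) -> Tuple[str, float]:
    scores = {cat: cosine(tokens(text), prof) for cat, prof in PROFILES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def decision_plan(metadata: dict, text: str,
                  threshold: float = 0.5) -> Tuple[Optional[str], str]:
    """Return (category, method); category is None when a person must decide."""
    category = rule_based(metadata)
    if category:
        return category, "rule"
    category, confidence = context_based(text)
    if confidence >= threshold:
        return category, "context"
    return None, "manual review"

if __name__ == "__main__":
    print(decision_plan({"folder": "legal-hold"}, "anything"))
    print(decision_plan({}, "the parties agree to the terms of this agreement"))
    print(decision_plan({}, "lunch menu for friday"))
```

Returning the method alongside the category keeps a simple audit trail of why each decision was made, in the spirit of the auditable logic mentioned for Level 4.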
31 Rule Systems: the Effect of Real-Time Learning. Use rule systems to act on existing metadata or keywords available in the process, content system, or document properties. [Chart: Manual Classification, Rules Based Classification, Context Based Classification, and Multiple Methods, on axes running low to high.]
32 Context Based Classification. Use context-based classification to inspect the document when there is not enough metadata already available. Simple rules or keyword-based analysis can be too coarse to make fine distinctions between long-form texts with very different intent. [Chart: Manual Classification, Rules Based Classification, Context Based Classification, and Multiple Methods, on axes running low to high.]
33 Critical dimensions of classification. Use manual classification for high-value documents or when other methods do not provide enough information. Consider the volumes of information. Increasing volume and variety of information magnifies the challenges of consistency and cost burdens.
Manual vs. automated:
Accuracy: 92% X 60 90% 46%
Cost (per doc): $0.17 manual; < $0.01 automated
Consistency: <50% manual; 100% automated
[Chart: Manual Classification, Rules Based Classification, Context Based Classification, and Multiple Methods, on axes running low to high.]
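The per-document figures become easier to weigh at volume. The sketch below multiplies them out for a hypothetical collection of one million documents; the volume is an assumption chosen for illustration, and since the slide gives automated cost only as "< $0.01", one cent is used here as an upper bound.

```python
MANUAL_CENTS_PER_DOC = 17     # per-document manual cost from the slide
AUTOMATED_CENTS_PER_DOC = 1   # slide says "< $0.01"; 1 cent is an upper bound

def classification_cost_dollars(doc_count: int, cents_per_doc: int) -> float:
    """Total classification cost in dollars for a given document volume."""
    return doc_count * cents_per_doc / 100

if __name__ == "__main__":
    volume = 1_000_000  # hypothetical document count
    manual = classification_cost_dollars(volume, MANUAL_CENTS_PER_DOC)
    automated = classification_cost_dollars(volume, AUTOMATED_CENTS_PER_DOC)
    print(f"manual:    ${manual:,.0f}")
    print(f"automated: at most ${automated:,.0f}")
```

Working in whole cents avoids floating-point rounding surprises until the final division.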
34 Classification at the US Army.
Challenge: A Government Accountability Office (GAO) report surveying 4 federal agencies revealed non-compliance with NARA regulations, specifically with email. Factors contributing to noncompliance included insufficient training and oversight as well as the difficulties of managing large volumes of email. [1] Training 1.2 million users is a logistical impossibility given the scale of the organization, is poorly aligned to users' skills, and is an inefficient use of their time.
Solution: Utilize IBM Classification Module in IBM's archiving and records management solution to automate record categorization without burdening users.
Benefits: 85% automation after Phase 1; 99% automation after Phase 2. Each phase was tested on approximately 600,000 messages (a different corpus each phase). Very high satisfaction with each pass when reviewed manually by a Records Manager for accuracy.
"As a records manager with a 25-year background in federal and civilian records management, I believe the automatic categorization of information is the next logical evolution in managing the records of an organization." [2] Brenda Fletcher, records manager, United States Army
ROI Projections: 900 TB of disk savings annually; $1.8M in hardware savings alone, independent of human costs and consistency of classification.
[1] GAO Report: Federal Records: Agencies Face Challenges in Managing E-Mail, Apr
[2] IBM Case Study: Achieving compliance and controlling costs with automated categorization of email for records management. A look at best practices and a U.S. Army ROI use case.
35 Level 5 (Transformational): Content Analytics. Administrative users build automated collection and records declaration policies that use rule-based metadata, advanced contextual classification, and advanced content analytics. The only way to effectively deal with massive amounts of records and information; search alone has proven to fail. [Diagram: bloated production systems with inefficient storage and content in the wild, separated into unnecessary information and necessary information.]
36 Traditional approaches are converging: Enterprise Search, Business Intelligence, and Text Analytics are converging toward Content Analytics.
More than keyword search is needed: "Making unstructured data searchable is now a presumed primary interface for applications of all kinds, as well as for intranets and content repositories." Whit Andrews, Rita Knox, Gartner
Analyzing unstructured content is no longer optional: "For many business process professionals, access to structured data, even when supported by BI or predictive analytics, lacks sufficient context for customer service, finance, and other areas where communications with customers involves many channels." Craig Le Clair, Forrester
Increasing in business importance: "Early adopters of [text analytics] are already gaining a competitive advantage. Organizations that fail to do so will be at risk." Sue Feldman, IDC
Converging toward content analytics: "Every enterprise should understand how content analytics can produce answers to its critical questions; understanding this now will make it possible to exploit these tools as their availability proliferates." Rita Knox, Gartner
37 Content Analytics can enable content archival, expiration, and disposal. Content Analytics helps you gain control by eliminating unneeded content and content systems while preserving valued content. One customer found 1200 copies of the same policy document, including 5 different versions, distributed across enterprise file servers. [Diagram: bloated production systems with inefficient storage and content in the wild, separated into unnecessary information and necessary information.]
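Finding the 1200 copies in that example starts with spotting byte-identical files. A minimal sketch of exact-duplicate detection by content hash follows; the paths and contents are hypothetical, and this is a generic technique rather than a description of how any particular product does it.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group paths whose contents are byte-identical (same SHA-256 digest)."""
    groups = defaultdict(list)
    for path, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]

if __name__ == "__main__":
    # Hypothetical file-server snapshot: path -> file bytes.
    snapshot = {
        "/share/a/policy.doc": b"Travel policy v1",
        "/share/b/policy.doc": b"Travel policy v1",
        "/share/c/policy_v2.doc": b"Travel policy v2",
    }
    print(find_duplicates(snapshot))
```

Exact hashing only catches identical copies. Telling apart the "5 different versions" would need near-duplicate techniques that compare content rather than bytes.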
38 Organizations need text analysis and natural language processing to effectively deal with large volumes of records. Over 80% of information being stored is unstructured (content rather than data). Text analytics unlocks the power of that information for a variety of functions and applications. What is Text Analytics? Text Analytics (NLP*) describes a set of linguistic, statistical, and machine learning techniques that allow text to be analyzed and key information extracted for business integration. * NLP = Natural Language Processing
39 Going from raw information to insightful information using natural language processing and analytics. Uncover the value of records and information through a visual-based approach.
Aggregate and extract from multiple sources to form large text-based collections from multiple internal and external sources (and types), including ECM repositories, structured data, social media, and more.
Organize, analyze, and visualize enterprise content (and data) by identifying trends, patterns, correlations, anomalies, and business context from collections.
Search and explore to derive insight from collections to confirm what is suspected or uncover something new without being forced to build models or deploy complex systems.
40 Content Analytics Explained. Source information: internal (ECM, files, DBMS, etc.) and external (social, news, etc.). Example sentence: "John sprained his ankle on the step..." Analyzed content (and data): parts of speech (noun, verb, noun phrase, prepositional phrase) and annotations (Person, Injury, Body Part, Location). Extracted concept: Claimant: Soft Tissue Injury. Automatic visualization for interactive exploration and assessment.
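The slide's example can be imitated with a toy dictionary-based annotator. This is only a sketch under loud assumptions: the lexicon, the list of soft-tissue terms, and the concept rule are invented here, and real content analytics relies on linguistic parsing rather than word lookup.

```python
import re

# Hypothetical mini-lexicon mapping words to annotation labels.
LEXICON = {
    "john": "Person",
    "sprained": "Injury",
    "ankle": "Body Part",
    "step": "Location",
}
# Hypothetical injury verbs treated as soft-tissue injuries.
SOFT_TISSUE_TERMS = {"sprained", "strained", "bruised"}

def annotate(sentence: str) -> list:
    """Return (word, label) pairs for words found in the lexicon."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [(w, LEXICON[w.lower()]) for w in words if w.lower() in LEXICON]

def extract_concept(sentence: str):
    """Derive a higher-level concept from the annotations, or None."""
    annotations = annotate(sentence)
    labels = {label for _, label in annotations}
    injuries = {w.lower() for w, label in annotations if label == "Injury"}
    if "Person" in labels and "Body Part" in labels and injuries & SOFT_TISSUE_TERMS:
        return "Claimant: Soft Tissue Injury"
    return None

if __name__ == "__main__":
    sentence = "John sprained his ankle on the step..."
    print(annotate(sentence))
    print(extract_concept(sentence))
```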
41 Content Analytics Enables Interactive Exploration.
Metadata: file type; file size; file location; creation date; author.
Content: topic of document; purpose of document; organizations mentioned; individuals mentioned; concepts mentioned.
Example questions: Are particular file types or locations correlated to particular languages? When did documents on a particular topic begin appearing in our file systems? Why is there content mentioning a particular organization from 12+ years ago?
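Questions like these reduce to counting documents across combinations of facets. A minimal sketch follows, using a hypothetical four-document inventory whose field names loosely mirror the metadata and content facets listed on the slide.

```python
from collections import Counter

# Hypothetical inventory; each dict is one document's facets.
DOCS = [
    {"file_type": "doc", "year": 2003, "topic": "safety policy", "language": "en"},
    {"file_type": "doc", "year": 2010, "topic": "safety policy", "language": "en"},
    {"file_type": "pdf", "year": 2010, "topic": "invoice", "language": "de"},
    {"file_type": "pdf", "year": 2011, "topic": "invoice", "language": "de"},
]

def facet_counts(docs, *fields):
    """Count documents per combination of the given facet fields."""
    return Counter(tuple(d[f] for f in fields) for d in docs)

def first_seen(docs, topic):
    """Earliest creation year for a topic, or None if it never appears."""
    years = [d["year"] for d in docs if d["topic"] == topic]
    return min(years) if years else None

if __name__ == "__main__":
    # Are file types correlated with languages?
    print(facet_counts(DOCS, "file_type", "language"))
    # When did documents on a topic begin appearing?
    print(first_seen(DOCS, "safety policy"))
```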
42 Extend the same concepts to eDiscovery. Quickly get a view of the people, sender and recipient domains, and companies involved. Combine facets and filters to quickly include and eliminate custodians and data, such as people from certain locations or other combinations. Automatically extracted phrases in the content show the essence of the information. Organize a topographical view by key category: the peaks show frequency and phrases to quickly identify relevant information.
43 How to Decommission Unnecessary Content.
1. Identify content sources to be assessed.
2. IT initial assessment to decommission IT-irrelevant content (duplicates, machine-generated emails, etc.).
3. LOB- and RIM-specific assessments to decommission over-retained and obsolete content and to collect and classify valued and obligated content (requires knowledge of content value).
4. System and application decommissioning by IT.
5. Periodic audits by IT, RIM, and LOB keep content environments optimized.
[Diagram: cycle of Identify Content Sources, Initial IT Assessment, Specific LOB Assessments, System & Application Decommissioning, and Periodic Audit, with Content Collection.]
44 Benefits of Content Analytics.
Before: overflowing file systems burying valuable content; storage is 17% of the IT budget; dormant and orphaned content on high-cost storage, necessitating software maintenance; knowledge workers searching for information 10 hrs/week; over-retained information stored beyond disposition.
After: up to 80% reduction in content storage costs; corresponding storage administration costs cut by 40-60%; elimination of up to $150K per retired software application; up to 20-40% of searching time eliminated; lower risk.
45 Level 5+ (Next Level, Beyond Transformational): A Look Towards the Future. Deep Question Answering: applying natural language question answering technology to RIM.
46 Truly understanding natural language is the next great computing challenge. Over 80% of information today is unstructured and based on natural language. The impact of Systems of Engagement, both inside and outside the firewall, is dramatic: such masses of information are not easily understandable by humans. Legacy approaches have all failed; searching is not the right approach. A new approach is needed, leveraging content analysis and natural language processing.
47 The Next Grand Challenge 47
48 Real language is real hard.
Chess: a finite, mathematically well-defined search space; limited number of moves and states; grounded in explicit, unambiguous mathematical rules.
Human language: ambiguous, contextual, and implicit; contains slang, riddles, idioms, abbreviations, acronyms, and more; grounded only in human cognition; seemingly infinite number of ways to express the same concepts and meaning.
49 Unstructured vs Structured. The hard part: understanding natural language with confidence and accuracy.
Where was Einstein born? "One day, from among his city views of Ulm, Otto chose a watercolor to send to Albert Einstein as a remembrance of Einstein's birthplace."
Welch ran this? "If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE."
(Panel labels: Unstructured, Structured.)
50 The Jeopardy! Challenge: 5 key dimensions to drive the technology: broad/open domain; complex language; high precision; accurate confidence; high speed.
$200: If you're standing, it's the direction you should look to check out the wainscoting.
$800: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus.
$1000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that's farthest north.
51 Examples from Jeopardy! clues and missing links.
Category: ENDS IN "TH". Clue: This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938. Answer: coelacanth
Category: General Science. Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form. Answer: light (or photons)
Category: Lincoln Blogs. Clue: Secy. Chase just submitted this to me for the third time--guess what, pal. This time I'm accepting it. Answer: his resignation
52 The Jeopardy! winner's cloud. Each dot represents an actual human Jeopardy! game. Top human players are remarkably good. [Chart: best human performance, showing winning human performance, grand champion human performance, and past computer results (2007 QA computer system), from more confident to less confident.]
53 The technology behind IBM Watson: How it Really Works with Content.
Question: Question & Topic Analysis (multiple natural language interpretations), then Question Decomposition.
Hypothesis Generation: Primary Search over Answer Sources (100s of sources), then Candidate Answer Generation.
Hypothesis and Evidence Scoring: Answer Scoring, Evidence Retrieval from Evidence Sources (1000s of pieces of evidence), and Deep Evidence Scoring (100,000s of scores from many deep analysis algorithms).
Synthesis: balance, weigh, and combine.
Final Confidence Merging & Ranking: learned models help combine and weigh the evidence.
Result: Answer with Confidence.
54 Isn't this just like search? Question: What happens if my shoelaces become untied? Search-only results: based on keyword popularity and search engine optimized; lots of shopping suggestions; results prove it didn't understand the question; can include profanity. Note: This is mocked up from two separate search query approaches.
55 Evidence Profiles summarize evidence analysis across many sources. Clue: Chile shares its longest land border with this country. Bolivia is more popular due to a commonly discussed border dispute, but Argentina has more reliable sources. Correct answer: Argentina
56 Different Types of Evidence: Keyword Evidence.
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.
Keyword matching aligns "In May 1898" with "In May", "celebrated" with "celebrated", "400th anniversary" with "anniversary", "Portugal" with "in Portugal", "arrival in India" with "arrived in" and "India", and "explorer" with "Gary".
The evidence suggests Gary is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
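The weakness of keyword evidence is easy to reproduce. The sketch below scores two passages by how many clue keywords they share; the stopword list and scoring function are assumptions made for illustration and are not how Watson scores evidence. The second passage is the Vasco da Gama sentence used on the deeper-evidence slide.

```python
import re

# Small hand-picked stopword list (an assumption for this example).
STOPWORDS = {"in", "on", "the", "of", "this", "s", "he", "his", "after"}

def keywords(text: str) -> set:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def keyword_overlap(clue: str, passage: str) -> int:
    """Number of distinct keywords shared by the clue and the passage."""
    return len(keywords(clue) & keywords(passage))

CLUE = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India.")
GARY = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
VASCO = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

if __name__ == "__main__":
    print("Gary passage: ", keyword_overlap(CLUE, GARY))
    print("Vasco passage:", keyword_overlap(CLUE, VASCO))
```

On keyword overlap alone, the misleading Gary passage outscores the passage that actually contains the answer, which is why keyword evidence has to be weighed against other evidence types.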
57 Different Types of Evidence: Deeper Evidence. Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India. Passage: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach. Search far and wide; explore many hypotheses; find and judge evidence; many inference algorithms. Temporal reasoning (date math) relates "In May 1898" and "400th anniversary" to "27th May 1498"; statistical paraphrasing (paraphrases) relates "arrival in" to "landed in"; geospatial reasoning (geo-KB) relates "India" to "Kappad Beach"; "explorer" aligns with "Vasco da Gama". Stronger evidence can be much harder to find and score.
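The date math behind the temporal reasoning step is small enough to sketch directly: a 400th anniversary celebrated in May 1898 implies an event in May 1498, which agrees with the date in the passage. The function names and the date pattern below are hypothetical helpers for illustration.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def parse_date(text):
    """Find a date such as '27th of May 1498' or '27 May 1498' in the text."""
    m = re.search(
        r"(\d{1,2})(?:st|nd|rd|th)?\s+(?:of\s+)?([A-Z][a-z]+)\s+(\d{4})", text
    )
    if m is None or m.group(2) not in MONTHS:
        return None
    return date(int(m.group(3)), MONTHS.index(m.group(2)) + 1, int(m.group(1)))


def temporal_match(clue_year, clue_month, anniversary, passage):
    """True if the passage date falls in the month and year the clue implies."""
    event = parse_date(passage)
    if event is None:
        return False
    return event.year == clue_year - anniversary and event.month == clue_month


deep = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach"
shallow = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."

print(temporal_match(1898, 5, 400, deep))
print(temporal_match(1898, 5, 400, shallow))
```

The Gary passage carries no year at all, so this scorer gives it no temporal support, while the Vasco da Gama passage matches both the implied year and the month.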
58 IBM at 100: ECM Innovation for Over 50 Years. Beginning in 1957: Searching and Classifying Content. Timeline entries: Syndication; Imaging; Tarian Software; Workflow / BPM; Watson; Advanced Case Management; Content Analytics (TAKMI); Records Management / eDiscovery; Aptrix; Green Pasture; Digital Libraries; Video Content; ECM Standards; Pure; iPhrase; FileNet; Production Imaging; 2011; PSS Systems; Datacap; Venetica; Edge. Over $15B Invested Since
59 Agenda: Obstacles to Managing the Information that Matters; Best Practice Readiness; Leveraging Content Analytics for Records and Information Management; Summary, and How to Get Started
60 RIM Benefits of Content Analytics: Smarter Decisions, Lower Costs, Reduced Risk, Increased Productivity. Invest smarter: develop an ROI case for information governance. Plan smarter: prioritize the riskiest areas for your focus. Cut storage costs up to 80% by eliminating the unnecessary. Reduce the administration burden for storage by 40-60%. Eliminate up to $150K associated with each system and application decommissioned. Lower risk via more consistent disposition. eDiscovery collection in hours vs. days. Cut eDiscovery review by 5-10%: fewer documents collected leads to lower eDiscovery review costs. Eliminate manual analysis and classification at 17 cents/doc. Slash the 10 hours per week knowledge workers spend searching by removing the irrelevant content. Rapid creation of rules, policies and models.
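To turn the figures on this slide into a first-pass ROI case, the arithmetic can be sketched as below. The three rates come from the slide, reading 17 cents/doc as the manual cost per document; the document count, storage budget and number of retired systems are hypothetical inputs, and the "up to" figures are treated as upper bounds, not expected savings.

```python
# Rates taken from the slide.
MANUAL_COST_CENTS_PER_DOC = 17      # manual analysis and classification
MAX_STORAGE_CUT = 0.80              # "up to 80%" of storage cost
MAX_SAVING_PER_SYSTEM = 150_000     # "up to $150K" per decommissioned system


def estimate_savings(docs, annual_storage_cost, systems_retired):
    """Return upper-bound savings in dollars for each benefit line."""
    return {
        "manual_classification": docs * MANUAL_COST_CENTS_PER_DOC / 100,
        "storage": annual_storage_cost * MAX_STORAGE_CUT,
        "decommissioning": systems_retired * MAX_SAVING_PER_SYSTEM,
    }


# Hypothetical organization: 1M documents, $500K storage budget, 3 retired systems.
savings = estimate_savings(docs=1_000_000, annual_storage_cost=500_000, systems_retired=3)
for line, dollars in savings.items():
    print(f"{line}: ${dollars:,.0f}")
print(f"total (upper bound): ${sum(savings.values()):,.0f}")
```

Replacing the hypothetical inputs with an organization's own counts gives a ceiling to validate against actual results.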
61 Best practice process for leveraging Content Analytics: ensure stakeholder alignment; develop or adopt a valuation model and test the model; choose the right technology based on your requirements; monitor, refine, audit and report on the results; track and promote the ROI.
62 Next Steps: Validate the Potential Savings and How to Achieve Them
63 Summary: The day of reckoning is here; take action now. Humans are not the answer; the only way forward is with content analytics. Dynamically analyze to empower your information stakeholders to make decisions. Decommission what's unnecessary. Preserve and exploit the content that matters and your IT budget. [Chart: Unnecessary Information vs. Necessary Information, scale 20% to 100%]
64 References
Compliance, Governance and Oversight Council (CGOC) Benchmark Report on Information Governance
Litigation Cost Survey of Major Companies, May 2010 (from the Conference on Civil Litigation, Duke Law School, May 2010). Link: ew/33a2682a2d4ef e4b5/$file/litigation%20cost%20s urvey%20of%20major%20companies.pdf?openelement
InformationWeek, December 2009
IDC 2010 Digital Universe Study, sponsored by EMC
Fulbright's 6th Annual Litigation Trends Survey Report, Oct 2009, with permission
U.S. Government Accountability Office (GAO): Federal Records: Agencies Face Challenges in Managing E-Mail, Apr 2008
Generally Accepted Recordkeeping Principles (GARP) from ARMA International
Information Management Reference Model (IMRM) from
Achieving compliance and controlling costs with automated categorization of e-mail for records management. A look at best practices and a U.S. Army ROI use case.
Unstructured Information Management Architecture (UIMA) Open Source Project
How Content Assessment Can Reduce Your Risk and Help Manage Storage More Efficiently, Feb 2010, by Osterman Research
Link: htmlfid=imw14166usen&attachment=imw14166usen.pdf ma.index.html U=33910
More information about IBM Content Analytics for Assessment
More information about IBM's Information Lifecycle Governance solutions. Link: e_warehouse/site/index.html
65 IBM is a Unique, Strategic Partner in Enabling Defensible Disposal
66 Thank you