Auto-Classification for Document Archiving and Records Declaration
Josemina Magdalen, Architect, IBM
November 15, 2013
Agenda: IBM ECM Content Classification for Document Archiving and Records Management
- IBM environment: Research and SWG working together to produce the best solution for our customers
- How does Content Classification bring value to ECM?
- Content Classification concepts, components and architecture
- How can Content Classification help with Document Archiving and Records Management?
  - Classification and compliance
  - Managing content at the entry point with Content Navigator and Classification
  - Optimizing your business workflow with Case Manager and Classification
Copyright International Business Machines Corporation 2013. All Rights Reserved. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Internal Use Only
Beating the competition and leading the industry
- IBM is ahead in both growth and market share (sources: Gartner, ECM Market Share, April 30, 2013; IDC, ECM Market Share, April 30, 2013)
- IBM is the undisputed leader in the 2013 Gartner ECM Magic Quadrant (Sep 2013)
It's no longer about one thing: volume, velocity, variety
- 12 terabytes of Tweets created daily: analyze product sentiment
- 5 million trade events per second: identify potential fraud
- 4 terabytes/site/day of surveillance video on average: monitor events of interest
- 15 petabytes of new information daily: determine relevance
- 500 million call detail records per day: prevent customer churn
- 80% of information growth is unstructured content: improve customer satisfaction
During this presentation, 458.81 terabytes of information will have been created.
Unleash the value of content in motion
Capture it. Activate it. Socialize it. Analyze it. Govern it.
Content at rest equals cost; content in motion equals value.
IBM Research: Open and Collaborative
The eras of IBM Research: the world is now our lab
- 1950s to 1990s: isolated research, focused on hardware
- 1990s to 2000s: joint projects with IBM divisions, clients and universities; scope extended to software and services
- 2010s onward: radical collaboration through in-world research and Smarter Planet research; scope extended to Smarter Planet
Let's talk about Watson
- What is IBM Watson?
- Why is it important?
- How is IBM putting Watson to work?
- What can we expect in the future?
IBM Watson combines transformational technologies
1. Understands natural language and human communication
2. Generates and evaluates evidence-based hypotheses
3. Adapts and learns from user selections and responses
All of this is built on a massively parallel architecture optimized for IBM POWER7.
Content is exploding. Content is evolving. Content is transforming.
The marketplace is driving greater volume, variety and velocity.
Organizations will need to redefine their content strategy to put content in motion, in order to gain control, optimize business outcomes, improve collaboration, achieve new insight, and govern for reduced cost and risk.
IBM helps companies realize the full value of content for better insight and outcomes
- Capture: harness and exploit
- Activate: optimize outcomes
- Socialize: share and collaborate
- Analyze: achieve new insights
- Govern: reduce costs and risks
Classification brings value to IBM ECM products, helping organizations with accessibility, usability, compliance and analytics
- Accessibility: Can you find relevant content quickly? "Search, refine, repeat" is no longer acceptable. (Image Capture, Content Collection, Enterprise Search)
- Usability: Is the right content available at the right time? Business processes require timely access to content. (Business Process Management, Case Management)
- Compliance: Are you complying with legal and business mandates? Content has a compliance lifecycle that must be enforced. (Content Collection, Enterprise Records, eDiscovery)
- Analytics: Are you uncovering business insight from your content? Organized content produces better insight. (Content Analytics)
Content Classification: analyze content to unlock critical insight
- Derive new business insight rapidly by accessing, interpreting and analyzing unstructured content
- Analyze content to gain 360-degree visibility into, and insight from, unstructured information
- Search, assess and analyze large volumes of text to determine relevant insight quickly
- Classify content through contextual understanding
Only IBM brings together the technologies that define the next generation of Smarter Analytics solutions that can reason and learn: (1) natural language, (2) hypothesis testing, (3) evidence-based learning.
IBM Content Classification: moving your organization from search to discovery, from possibilities to probabilities, and from simple outputs to intelligent options.
What does IBM Content Classification do?
Content Classification:
- discovers the intent of a document by analyzing its content
- automatically learns from examples
- lets you auto-classify huge volumes of documents into pretrained categories, consistently and efficiently
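The slides do not say which algorithm sits behind "learns from examples". As a stand-in, here is a minimal Python sketch of the general idea using a multinomial naive Bayes classifier (our substitution, not IBM's algorithm): it is trained on a handful of labelled examples and then assigns a category to new text. The class name `ExampleLearner` and the `Billing` and `PR` categories are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class ExampleLearner:
    """Toy multinomial naive Bayes classifier trained from labelled examples.

    Illustrative stand-in only: the slides do not describe the algorithm
    IBM Content Classification actually uses.
    """

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of examples
        self.vocab = set()                       # every word seen in training

    def learn(self, text: str, category: str) -> None:
        """Add one labelled example to the model."""
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text: str):
        """Return the category with the highest log-posterior, or None if untrained."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[category] / total_docs)  # log prior
            for token in tokenize(text):
                # Laplace (add-one) smoothing so an unseen word does not zero the score
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


learner = ExampleLearner()
learner.learn("Please send the invoice and payment details", "Billing")
learner.learn("The invoice total is wrong, refund the payment", "Billing")
learner.learn("Our press release goes out before the earnings call", "PR")
learner.learn("Talk to the press about the announcement", "PR")
print(learner.classify("Question about my invoice payment"))  # Billing
```

Once trained, classification needs no hand-written rules; adding more labelled examples is how the model is refined.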
What is IBM Content Classification used for?
Content Classification is most valuable when:
- a large number of documents need to be categorized
- documents need to be categorized based on their content
- an action needs to be taken as a result of the classification
- there is a need to bring order to chaos and structure to unstructured data
What is IBM Content Classification used for? (cont.)
Advantages of automatic classification over manual classification:
- Reduces training cost
- Reduces laborious activities
- Makes consistent decisions and reduces errors
- Coherent and legally defensible
- Extremely fast
Why organizations need Content Classification
Through automated, advanced classification, knowledge workers:
- have quick access to relevant content
- can use the information they need to complete tasks
- are not burdened with enforcing compliance and retention policies
- can analyze content relevant to specific subject matter
Automated classification allows workers to focus on key business tasks rather than spend time manually categorizing content.
In short, Content Classification improves productivity.
Content Classification: What If?
- Manual classification might yield "Correspondence" or "Complaint".
- Rules-based classification needs rules for "One Million Mile", "delay", "weather" and what else?
- Context-based classification enables you to:
  - classify as "Presidential" or "High Value Client"
  - route to the worker assigned to high-value clients
  - assign to the High Value Client record class and retention rules
  - analyze content for high-value client feedback
Content Classification: What If?
- Manual classification might yield "Correspondence" or "Complaint".
- Rules-based classification needs rules for "awful", "delay", "never" and what else?
- Context-based classification enables you to:
  - classify as "Customer Complaint"
  - route to the worker assigned to complaints
  - assign to the Complaint record class and retention rules
  - analyze content for customer satisfaction or dissatisfaction
Content Classification: What If?
- Manual classification might yield "Correspondence".
- Rules-based classification needs rules for "wonderful", "delay", "pleasure" and what else?
- Context-based classification enables you to:
  - classify as "Customer Compliment"
  - route to the worker assigned to cross-sell or up-sell
  - assign to the Compliment record class and retention rules
  - analyze content for customer satisfaction or dissatisfaction
What-If Summary
- The sample emails were not specifically about delays; they were about how customers were treated during a delay.
- Manual classification would have been slow and could have resulted in inaccurate categorization.
- Rules-based, keyword-only classification would require rules for specific keywords and might have miscategorized the emails based on words like "delay".
- Context-based classification allowed the system to understand the context of each email and classify, route, govern and analyze it with better accuracy.
The bottom line: Content Classification tells you what your content is about.
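To make the keyword pitfall concrete, here is a small Python sketch of a keyword-only rule. The three messages are hypothetical paraphrases written for this illustration (the original slide emails are not reproduced), and the `Delay` label is likewise made up. Because all three mention "delay", the rule files them identically, even though one comes from a high-value client, one is a complaint and one is a compliment.

```python
import re

# Hypothetical paraphrases of the three scenarios; not the original slide emails.
emails = {
    "high_value_client": "As a One Million Mile member I expected better care during the weather delay.",
    "complaint": "The delay was awful and nobody ever told us anything.",
    "compliment": "Despite the delay, your crew made the trip a pleasure. Wonderful service!",
}


def keyword_rule(text: str) -> str:
    """Keyword-only rule: any message mentioning 'delay' goes to a hypothetical 'Delay' category."""
    if re.search(r"\bdelay\b", text, re.IGNORECASE):
        return "Delay"
    return "Correspondence"


labels = {name: keyword_rule(text) for name, text in emails.items()}
print(labels)
# {'high_value_client': 'Delay', 'complaint': 'Delay', 'compliment': 'Delay'}
```

Telling the three apart requires looking at each message as a whole rather than at a single trigger word.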
Classification Process
1. Train: train the system using the Quick Start tool and build a decision plan.
2. Deploy: deploy to the Classification Server.
3. Auto-classify: a classification application sends an unlabeled document (for example, "The core market for this new product has been defined as such by IBM") to the server, which returns its category (A).
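The three steps can be mimicked end to end in a few lines of Python. This is a sketch under loud assumptions: the "knowledge base" is just per-category word counts, "deploying" means serializing it to JSON, and `ClassificationService` is a hypothetical in-process stand-in for the Classification Server, not its real API. The categories `Marketing` and `Legal` are invented for the example.

```python
import json
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def train(examples: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Step 1 (train): build a toy knowledge base of word counts per category."""
    kb = {}
    for text, category in examples:
        kb.setdefault(category, Counter()).update(tokenize(text))
    return {category: dict(counts) for category, counts in kb.items()}


def deploy(kb: dict[str, dict[str, int]]) -> str:
    """Step 2 (deploy): serialize the knowledge base into an artifact a server can load."""
    return json.dumps(kb)


class ClassificationService:
    """Step 3 (auto-classify): hypothetical stand-in for a classification server."""

    def __init__(self, artifact: str) -> None:
        self.kb = json.loads(artifact)

    def classify(self, text: str) -> str:
        tokens = tokenize(text)
        # Score each category by how often its training words appear in the text
        scores = {cat: sum(counts.get(t, 0) for t in tokens) for cat, counts in self.kb.items()}
        return max(scores, key=scores.get)


artifact = deploy(train([
    ("Our new product targets the consumer market", "Marketing"),
    ("Market research for the product launch", "Marketing"),
    ("The contract terms are binding on both parties", "Legal"),
    ("Legal review of the license agreement", "Legal"),
]))
service = ClassificationService(artifact)
print(service.classify("The core market for this new product has been defined as such by IBM"))  # Marketing
```

Keeping training and serving separate mirrors the slide: the service only sees the deployed artifact, never the training examples. Raw counts favor categories with more training text, so a real scorer would normalize.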
Classification by Contextual Understanding
Text analysis, statistics and learning by example, backed by a knowledge base; results are consumed by custom and partner applications and by IBM pre-built integrations (ECM,...).
Input (email): "Team, We need to determine how to handle the results of the most recent earnings report and how it will impact the reaction on Wall Street. We need to get out in front of this before the press does! Jack, get the status from Engineering ahead of time. Regards, John"
Output: PR (92%), FINANCE (82%), ENGINEERING (32%)
Feedback: intent = PR
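The output scores (92%, 82%, 32%) do not sum to 100%, so each reads as an independent per-category confidence. One common way to get such scores is a one-vs-rest model, where every category has its own scorer squashed through a sigmoid. The slides do not say this is how IBM Content Classification computes its scores; the weights and bias below are invented purely to show the shape of the output for the sample email.

```python
import math
import re

# Hypothetical one-vs-rest word weights; invented for illustration.
WEIGHTS = {
    "PR": {"press": 2.0, "wall": 1.0, "street": 1.0, "reaction": 1.0},
    "FINANCE": {"earnings": 2.0, "report": 1.0, "results": 1.0},
    "ENGINEERING": {"engineering": 1.5, "status": 0.5},
}
BIAS = -2.0  # shared offset so that a text with no evidence scores low


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_categories(text: str) -> list[tuple[str, float]]:
    """Score every category independently and return them ranked by confidence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {
        category: sigmoid(BIAS + sum(weights.get(t, 0.0) for t in tokens))
        for category, weights in WEIGHTS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


email = ("We need to determine how to handle the results of the most recent earnings report "
         "and how it will impact the reaction on Wall Street. We need to get out in front of this "
         "before the press does! Jack, get the status from Engineering ahead of time.")
for category, confidence in score_categories(email):
    print(f"{category}({confidence:.0%})")
```

With these made-up weights the sketch prints PR(95%), FINANCE(88%) and ENGINEERING(50%): the same ranking as the slide, not its numbers. Because the scores are independent, a document can be strongly relevant to more than one category at once.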
Control the level of classification automation
- Advanced classification can be executed as assistance to authors in user interfaces.
- Semi-automated advanced classification via monitoring
- Assisted classification in user interfaces such as SharePoint or, in the future, IBM's Office integration
Levels of automation, from 100% to 0%: complete automation; automation with auditing; automation of medium confidence and above; automation of high confidence and above; assisted manual classification.
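The automation levels can be read as confidence-based routing policies. The Python sketch below is one interpretation, with hypothetical cut-offs (`HIGH = 0.8`, `MEDIUM = 0.5`) and hypothetical outcome labels: `auto` files the document without review, `auto+audit` files it and queues it for a human audit, and `suggest` shows the category to a user who makes the final decision.

```python
from enum import Enum

HIGH, MEDIUM = 0.8, 0.5  # hypothetical confidence cut-offs


class Automation(Enum):
    """Automation levels from the slide, ordered from 100% to 0% automation."""
    COMPLETE = "complete automation"
    WITH_AUDITING = "automation with auditing"
    MEDIUM_AND_ABOVE = "automation of medium confidence and above"
    HIGH_AND_ABOVE = "automation of high confidence and above"
    ASSISTED_MANUAL = "assisted manual classification"


def route(confidence: float, level: Automation) -> str:
    """Return 'auto' (file without review), 'auto+audit' (file, then queue for audit)
    or 'suggest' (show the category to a user, who decides)."""
    if level is Automation.COMPLETE:
        return "auto"
    if level is Automation.WITH_AUDITING:
        return "auto+audit"
    if level is Automation.MEDIUM_AND_ABOVE:
        return "auto" if confidence >= MEDIUM else "suggest"
    if level is Automation.HIGH_AND_ABOVE:
        return "auto" if confidence >= HIGH else "suggest"
    return "suggest"  # ASSISTED_MANUAL: the system only assists


print([route(0.65, level) for level in Automation])
# ['auto', 'auto+audit', 'auto', 'suggest', 'suggest']
```

Moving down the list trades throughput for human control: the same 0.65-confidence result is filed automatically under the looser policies and handed to a person under the stricter ones.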
Data in motion: periodic human oversight facilitates automatic adjustment of policies
Content Classification learns from user feedback to improve and adapt policies: the Classification Server recommends a category, users interact with the recommendation, and their feedback flows back to the server.
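The feedback loop can be sketched as "every confirmed or corrected recommendation becomes a new training example". In this hypothetical Python sketch the scorer is a deliberately simple relative-word-frequency model (not IBM's), and `FeedbackClassifier` is an invented name; the point is only that a user's correction changes the next recommendation.

```python
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class FeedbackClassifier:
    """Hypothetical recommender that folds user feedback back into its word statistics."""

    def __init__(self) -> None:
        self.counts = defaultdict(Counter)  # category -> word frequencies

    def learn(self, text: str, category: str) -> None:
        self.counts[category].update(tokenize(text))

    def recommend(self, text: str) -> Optional[str]:
        """Recommend the category whose vocabulary best covers the text (relative frequency)."""
        if not self.counts:
            return None
        tokens = tokenize(text)

        def score(category: str) -> float:
            counts = self.counts[category]
            total = sum(counts.values()) or 1  # guard against an empty category
            return sum(counts[t] for t in tokens) / total

        return max(self.counts, key=score)

    def feedback(self, text: str, recommended: str, confirmed: str) -> bool:
        """Store the user's decision as a training example; return True if the recommendation was accepted."""
        self.learn(text, confirmed)
        return recommended == confirmed


clf = FeedbackClassifier()
clf.learn("quarterly earnings report", "FINANCE")
clf.learn("press release for the media", "PR")

message = "media questions about earnings"
first = clf.recommend(message)                     # 'FINANCE'
accepted = clf.feedback(message, first, confirmed="PR")  # the user corrects the category
second = clf.recommend(message)                    # 'PR'
```

Logging whether each recommendation was accepted also gives the periodic human oversight something to review, for example acceptance rates per category.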
Content Classification Rules: Decision Plan
A decision plan is a sequence of rules and calls to statistical analysis.
Rule capabilities:
- String search
- Word distance
- Regular expressions
- Pattern extraction
- Boolean expressions
Decision plan capabilities:
- Identify category (in more than one taxonomy)
- Set document metadata
- Invoke statistical analysis
- Language identification
- Recommend actions
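A decision plan can be pictured as an ordered list of trigger/action rules evaluated against each document. The Python sketch below illustrates that shape only: `Rule`, `Document`, `run_decision_plan`, the `INV-` invoice format and the `Billing/Refund` category are all hypothetical and are not the product's decision plan syntax or API. It exercises three of the listed capabilities: a regular-expression trigger with pattern extraction into metadata, a Boolean expression over string searches, and category identification.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)
    categories: list = field(default_factory=list)


@dataclass
class Rule:
    name: str
    trigger: Callable[[Document], bool]  # when should the rule fire?
    action: Callable[[Document], None]   # what does it do to the document?


def run_decision_plan(plan: list[Rule], doc: Document) -> Document:
    """Evaluate the rules in order; every rule whose trigger matches applies its action."""
    for rule in plan:
        if rule.trigger(doc):
            rule.action(doc)
    return doc


def has_word(word: str) -> Callable[[Document], bool]:
    """String-search trigger: whole-word, case-insensitive match."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return lambda doc: pattern.search(doc.text) is not None


INVOICE_ID = re.compile(r"\bINV-\d+\b")  # hypothetical invoice-number format

plan = [
    # Regular-expression trigger plus pattern extraction into metadata
    Rule("extract invoice id",
         lambda d: INVOICE_ID.search(d.text) is not None,
         lambda d: d.metadata.update(invoice_id=INVOICE_ID.search(d.text).group())),
    # Boolean expression over two string searches, then identify a category
    Rule("refund request",
         lambda d: has_word("refund")(d) and has_word("invoice")(d),
         lambda d: d.categories.append("Billing/Refund")),
]

doc = run_decision_plan(plan, Document("Please refund invoice INV-2041; it was charged twice."))
print(doc.metadata, doc.categories)  # {'invoice_id': 'INV-2041'} ['Billing/Refund']
```

A call to statistical analysis would simply be another rule whose action invokes a trained classifier and records its result.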
Content Classification Rules: Fine-tuning with Rules
- Use rules to select a category based on score
- Use rules to extract data from textual content
- Test rules to analyze their behavior with variable content items
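A score-based selection rule, together with a small table of test cases in the spirit of "test rules with variable content items", might look like this in Python. The thresholds (`min_score = 0.7`, `min_margin = 0.05`) and the name `select_category` are assumptions for the sketch, not product defaults; only the first score set is taken from the earlier email example.

```python
from typing import Optional


def select_category(scores: dict[str, float], min_score: float = 0.7,
                    min_margin: float = 0.05) -> Optional[str]:
    """Score-based rule: accept the top category only if it is confident and
    clearly ahead of the runner-up; otherwise return None (route to review)."""
    if not scores:
        return None
    ranked = sorted(scores.values(), reverse=True)
    best = max(scores, key=scores.get)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if ranked[0] >= min_score and ranked[0] - runner_up >= min_margin:
        return best
    return None


# Testing the rule against varied content items (hypothetical score sets)
cases = [
    ({"PR": 0.92, "FINANCE": 0.82, "ENGINEERING": 0.32}, "PR"),  # scores from the email example
    ({"PR": 0.75, "FINANCE": 0.73}, None),                       # too close to call
    ({"ENGINEERING": 0.40}, None),                               # not confident enough
]
for scores, expected in cases:
    assert select_category(scores) == expected, (scores, expected)
```

Requiring a margin over the runner-up, not just a minimum score, keeps the rule from auto-filing documents that are ambiguous between two categories.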
Content Classification Rules: More Decision Plan Rules
Triggers:
- Substring search, word search, words within a distance
- Search based on (large) lists of words or phrases
- Date/time search
Actions:
- Set date/time
- Set expiration date (retention date)
- Entity identification and extraction (standard regular expression syntax supported)
The decision plan pipeline has a published API. Customers can create their own custom classification methods or call out to other systems to enhance classification:
- You can invoke your preferred ontology.
- You can use UIMA annotators.
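Two of the listed building blocks, a words-within-a-distance trigger and a set-expiration-date action driven by a date extracted with a regular expression, are sketched below in Python. The function names, the ISO-only date pattern and the seven-year retention period in the usage are hypothetical choices for illustration, not the product's rule syntax or any actual retention schedule.

```python
import re
from datetime import date
from typing import Optional


def words_within(text: str, first: str, second: str, max_gap: int) -> bool:
    """Trigger: True if `first` and `second` occur within `max_gap` words of each other (either order)."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)


DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")  # ISO dates only, for brevity


def set_expiration(text: str, retention_years: int) -> Optional[date]:
    """Action: extract the first (well-formed) ISO date and shift it by the retention period."""
    match = DATE_RE.search(text)
    if not match:
        return None
    start = date(*map(int, match.groups()))
    try:
        return start.replace(year=start.year + retention_years)
    except ValueError:  # 29 February in a non-leap target year
        return start.replace(year=start.year + retention_years, day=28)


text = "This employment contract was signed on 2013-11-15 by both parties."
if words_within(text, "employment", "contract", max_gap=2):
    print(set_expiration(text, retention_years=7))  # 2020-11-15
```

Combining a proximity trigger with a date action is one way a rule could assign a retention date only to the documents it is meant for, rather than to every document that merely contains a date.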
Content Classification Methods: Contextual and Rules
Content Classification combines multiple categorization technologies to deliver automatic classification:
- contextual analysis based on machine learning techniques
- natural language processing and semantic analysis
- rules based on metadata or confidence score
The methods can be used in tandem or separately, depending on requirements.
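The "in tandem or separately" point can be illustrated with three small Python functions: a rule-only classifier keyed on capture metadata, a context-only classifier over contextual confidence scores, and a tandem mode in which a metadata rule either decides outright or narrows the candidates before the scores decide. All field names (`form_code`, `department`), mappings, categories and thresholds are hypothetical.

```python
from typing import Optional

# Hypothetical rule tables keyed on capture-time metadata
FORM_CODE_TO_CATEGORY = {"INV": "Invoice", "CV": "Resume"}
ALLOWED_BY_DEPARTMENT = {
    "finance": {"Invoice", "Expense Report"},
    "hr": {"Resume", "Employment Contract"},
}


def by_rule(metadata: dict) -> Optional[str]:
    """Rules only: map a metadata field straight to a category."""
    return FORM_CODE_TO_CATEGORY.get(metadata.get("form_code"))


def by_context(scores: dict[str, float], min_conf: float = 0.6,
               allowed: Optional[set[str]] = None) -> Optional[str]:
    """Context only: take the best contextual score, optionally within an allowed set."""
    candidates = {c: s for c, s in scores.items() if allowed is None or c in allowed}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best if candidates[best] >= min_conf else None


def in_tandem(metadata: dict, scores: dict[str, float]) -> Optional[str]:
    """Tandem: a decisive rule wins; otherwise a department rule narrows the
    candidate categories and the contextual scores pick among them."""
    allowed = ALLOWED_BY_DEPARTMENT.get(metadata.get("department"))
    return by_rule(metadata) or by_context(scores, allowed=allowed)


scores = {"Resume": 0.70, "Invoice": 0.65, "Expense Report": 0.62}
print(by_context(scores))                            # Resume
print(in_tandem({"department": "finance"}, scores))  # Invoice
```

Used alone, the contextual scores would file this document as a resume; in tandem, the department metadata rules that out and the best remaining finance category wins.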